Comments (10)

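ymcui commented on September 27, 2024
  1. The learning rate used for your continued pre-training may have been too high.
  2. Try tuning the learning rate in the fine-tuning stage: take the original model's learning rate as the baseline (i.e., the lr that gave you 90% accuracy), move it one notch up and one notch down, and check whether the accuracy changes.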
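
A minimal sketch of the sweep suggested in item 2. The thread does not say how large one "notch" is, so the multiplicative `factor` of 2 and the baseline value `1e-4` below are assumptions chosen only for illustration; substitute the learning rate that actually gave 90% accuracy.

```python
def lr_candidates(base_lr, factor=2.0, steps=1):
    """Return learning rates from `steps` notches below to `steps` notches above base_lr.

    One notch is modelled as multiplying or dividing by `factor` (an assumption).
    """
    return [base_lr * factor ** k for k in range(-steps, steps + 1)]


baseline_lr = 1e-4  # hypothetical baseline, not a value from the thread
for lr in lr_candidates(baseline_lr):
    # Run one fine-tuning job per candidate and compare dev-set accuracy.
    print(f"fine-tune with learning_rate={lr:g}")
```

Keeping everything else fixed (seed, batch size, number of epochs) makes it easier to attribute any accuracy change to the learning rate alone.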

from chinese-electra.

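sc1054 commented on September 27, 2024
  • The learning rate used for your continued pre-training may have been too high.
  • Try tuning the learning rate in the fine-tuning stage: take the original model's learning rate as the baseline (i.e., the lr that gave you 90% accuracy), move it one notch up and one notch down, and check whether the accuracy changes.

Thanks. I still can't tell whether the continued pre-training learning rate is the problem, but tuning the fine-tuning learning rate does help: accuracy now reaches 80%+. Also, could you share the discriminator and generator losses at the end of pre-training?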

ymcui commented on September 27, 2024

This may also depend on how much data you trained on.
According to TensorBoard, the loss (total_loss) of our base model at 1M steps was about 5.3.

sc1054 commented on September 27, 2024

Then my pre-training run probably has some problems as well: my total_loss is around 10.
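
For reference, a rough check of how a total_loss near 10 relates to the component losses reported in the original question (generator about 0.9, discriminator about 0.18). This assumes the total is a weighted sum with a generator weight of 1 and a discriminator weight of 50, which is our understanding of the ELECTRA defaults and is not confirmed anywhere in this thread.

```python
def total_loss(gen_loss, disc_loss, gen_weight=1.0, disc_weight=50.0):
    """Weighted sum of generator (MLM) loss and discriminator loss.

    The default weights are assumptions, not values stated in the thread.
    """
    return gen_weight * gen_loss + disc_weight * disc_loss


reported = total_loss(gen_loss=0.9, disc_loss=0.18)
print(f"total_loss ~ {reported:.1f}")  # prints "total_loss ~ 9.9"
```

Under that assumption, a total_loss of about 10 is simply what those two component losses add up to; the comparison with 5.3 is only meaningful if both runs use the same weights.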

ymcui commented on September 27, 2024

Closing this issue since there has been no further discussion. Feel free to reopen it if needed.

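ZJiaBin commented on September 27, 2024

I continued pre-training from the provided base model on my own data for about 1M steps; the generator loss is about 0.9 and the discriminator loss about 0.18. I then fine-tuned the resulting model on a classification task, but accuracy is only 10%+, and even evaluating on the training set gives only 30%. Fine-tuning the provided base model directly reaches 90%. What could be the reason?

Hello, I also want to continue training from the base model, but I noticed that the provided base model is a bit over 400 MB, whereas the base model I trained from random initialization is over 1 GB. I haven't figured out where the difference comes from.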

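Veyronl commented on September 27, 2024

It has been slimmed down: only part of the checkpoint, the model weights, was extracted. Optimizer state and the like were not extracted, so the file is much smaller.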
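
A back-of-the-envelope sketch of why dropping optimizer state shrinks the file so much. It assumes an Adam-style optimizer that keeps two extra slots (first and second moment) per weight; the thread does not name the optimizer, so treat `optimizer_slots_per_weight=2` as an assumption.

```python
def training_checkpoint_mb(weights_mb, optimizer_slots_per_weight=2):
    """Estimate full training checkpoint size from the weights-only size.

    Each optimizer slot is assumed to be the same size as the weight it tracks.
    """
    return weights_mb * (1 + optimizer_slots_per_weight)


print(training_checkpoint_mb(400))  # prints 1200
```

Roughly 400 MB of weights would then grow to about 1.2 GB with optimizer state, which is in line with the "over 1 GB" checkpoint reported above.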

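ZJiaBin commented on September 27, 2024

It has been slimmed down: only part of the checkpoint, the model weights, was extracted. Optimizer state and the like were not extracted, so the file is much smaller.

Got it. If I want to continue pre-training from this set of parameters, where do I need to specify that? I tried placing them directly in the original model output directory, but that doesn't seem to work.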

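Veyronl commented on September 27, 2024

That definitely won't work. To continue pre-training, the problem you need to solve is initializing the parameters that the model needs but that the released checkpoint does not provide.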
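
A framework-agnostic sketch of that idea: restore whatever the released checkpoint contains and freshly initialize the rest. The variable names (`encoder/kernel`, `adam_m`, `adam_v`, `global_step`) and the zero initialization are hypothetical and only illustrate the bookkeeping; the actual names and initializers depend on the training code.

```python
def split_variables(required, provided):
    """Split required variable names into those to restore and those to initialize.

    required: dict mapping name -> shape tuple (what the training graph needs)
    provided: dict mapping name -> values (what the slimmed checkpoint contains)
    """
    restore = sorted(name for name in required if name in provided)
    fresh = sorted(name for name in required if name not in provided)
    return restore, fresh


def zeros(shape):
    """Build a nested list of zeros with the given shape; () gives a scalar."""
    if not shape:
        return 0.0
    return [zeros(shape[1:]) for _ in range(shape[0])]


# Hypothetical graph: one weight matrix plus its optimizer slots and a step counter.
required = {
    "encoder/kernel": (2, 2),
    "encoder/kernel/adam_m": (2, 2),
    "encoder/kernel/adam_v": (2, 2),
    "global_step": (),
}
# Hypothetical slimmed checkpoint: model weights only.
provided = {"encoder/kernel": [[0.1, 0.2], [0.3, 0.4]]}

restore, fresh = split_variables(required, provided)
state = {name: provided[name] for name in restore}
state.update({name: zeros(required[name]) for name in fresh})

print("restored:", restore)
print("initialized:", fresh)
```

In a real setup the same split is usually expressed as a mapping from checkpoint variable names to graph variables, with everything outside the mapping left to its default initializer.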

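ZJiaBin commented on September 27, 2024

That definitely won't work. To continue pre-training, the problem you need to solve is initializing the parameters that the model needs but that the released checkpoint does not provide.

Got it, thanks a lot for the guidance!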
