Comments (10)
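- The learning rate used in your second-stage pre-training may have been too large.
- Try adjusting the learning rate in the fine-tuning stage: take the original model's learning rate (the one that gave you 90% accuracy) as the baseline, move it one notch up and one notch down, and check whether the accuracy changes.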
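The second suggestion can be sketched as a small helper that picks the learning rates to try. This is only an illustration: the thread does not say what "one notch" means, so the grid `LR_GRID` and the function name `neighbor_learning_rates` are assumptions, not part of the repository.

```python
# Hypothetical grid of fine-tuning learning rates. The thread does not define
# "one notch", so these values are an assumption for illustration.
LR_GRID = [1e-5, 2e-5, 3e-5, 5e-5, 1e-4, 2e-4, 3e-4]


def neighbor_learning_rates(base_lr, grid=LR_GRID):
    """Return the grid values one notch below, at, and one notch above base_lr."""
    ordered = sorted(grid)
    # Snap to the closest grid value so a slightly off-grid baseline still works.
    idx = min(range(len(ordered)), key=lambda i: abs(ordered[i] - base_lr))
    lo = max(idx - 1, 0)
    hi = min(idx + 1, len(ordered) - 1)
    return ordered[lo:hi + 1]


if __name__ == "__main__":
    # Assume the baseline that gave 90% accuracy was 1e-4 (hypothetical value).
    for lr in neighbor_learning_rates(1e-4):
        print(f"fine-tune with learning_rate={lr:g}")
```

Each returned value would be used for one separate fine-tuning run, and the resulting accuracies compared against the baseline run.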
from chinese-electra.
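Thanks. It is still unclear whether the learning rate for the second-stage pre-training is a problem, but adjusting the fine-tuning learning rate does help: accuracy now reaches 80%+. Also, could you share the discriminator and generator losses at the end of your pre-training?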
This may also have something to do with the size of your training data.
According to TensorBoard, the loss of our base model at 1M steps is about 5.3 (total_loss).
Then there may also be a problem with my pre-training process: my total_loss is around 10.
Closing this issue since there has been no further discussion. Feel free to reopen it if needed.
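Starting from the provided base model, I continued pre-training on my own data for about 1M steps, reaching a generator loss of about 0.9 and a discriminator loss of about 0.18. I then fine-tuned the resulting model on a classification task, but accuracy was only 10%+, and even evaluating on the training set gave only 30%. Fine-tuning directly from the provided base model reaches 90%. What could be the cause?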
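Hello, I would also like to continue training from the base model. However, the provided base model is a little over 400 MB, whereas the base model I trained from random initialization is over 1 GB. I have not figured out where the difference comes from.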
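The released checkpoint has been slimmed down: only part of the model weights was extracted, and things like the optimizer state were left out, so it is much smaller.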
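Got it. If I want to continue pre-training from this set of parameters, where do I need to specify them? I tried putting the checkpoint directly into the original model output directory, but that does not seem to work.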
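That definitely will not work. To continue pre-training, you need to initialize the parameters that the model requires but that the released checkpoint does not provide.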
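As a rough illustration of that idea, the sketch below splits a model's variable names into those that can be restored from a slimmed-down checkpoint and those that must be freshly initialized. The function `split_variables` and all variable names in it are hypothetical. In TensorFlow 1.x the names stored in a checkpoint can be listed with `tf.train.list_variables`, and a subset of variables can be loaded with `tf.train.init_from_checkpoint`; how to wire this into the repository's pre-training script is not described in this thread.

```python
def split_variables(model_vars, checkpoint_vars):
    """Split model variable names into (restorable, needs_init).

    restorable: present in the checkpoint, so they can be loaded from it.
    needs_init: absent from the checkpoint, so they must be freshly initialized.
    """
    available = set(checkpoint_vars)
    restorable = [name for name in model_vars if name in available]
    needs_init = [name for name in model_vars if name not in available]
    return restorable, needs_init


if __name__ == "__main__":
    # Hypothetical variable names, for illustration only.
    model_vars = [
        "electra/embeddings/word_embeddings",
        "electra/embeddings/word_embeddings/adam_m",
        "electra/embeddings/word_embeddings/adam_v",
        "global_step",
    ]
    # A slimmed-down checkpoint that keeps weights but no optimizer state.
    checkpoint_vars = ["electra/embeddings/word_embeddings"]

    restorable, needs_init = split_variables(model_vars, checkpoint_vars)
    print("restore from checkpoint:", restorable)
    print("initialize from scratch:", needs_init)
```

Only the first list would be loaded from the released checkpoint; everything in the second list (optimizer slots, step counter, and any weights that were not extracted) would start from its default initializer.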
Understood, thanks a lot for the guidance!