
bert_distill's People

Contributors

blmoistawinde, qiangsiwei


bert_distill's Issues

About the dataset

Hello, author. I'd like to ask: which dataset did you use?

Bug at utils.py line 86

It should be "np.random.rand() > p_mask", not "np.random.rand() < p_mask".

Question about the results in distill.py

First of all, thanks for sharing the code. I have a question about distill.py: the accuracy reported at the end is measured on the dev set, yet teach_on_dev = True by default, which effectively trains on the dev set as well. Doesn't this inflate the evaluation results?
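
A minimal sketch of the leak-free setup being asked for, on toy data; the variable names and the premise that simply disabling teach_on_dev suffices are assumptions, not the repo's documented behavior:

    from sklearn.model_selection import train_test_split

    # Toy stand-ins for the real corpus (illustrative only).
    texts = [f"example {i}" for i in range(100)]
    labels = [i % 2 for i in range(100)]

    # Hold the dev split out of distillation entirely. If the student also
    # trains on dev (the teach_on_dev=True default), dev accuracy stops
    # measuring generalization and will read high.
    texts_tr, texts_dev, y_tr, y_dev = train_test_split(
        texts, labels, test_size=0.2, random_state=42)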

About the loss function

The paper mentions distilling on the logits, but the released code uses the post-softmax outputs. May I ask the reason for this?
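
For reference, a minimal sketch of the two objectives being contrasted; the MSE-on-logits form is the one described in the distilled-BiLSTM paper (Tang et al., 2019) that this repo appears to follow, and all tensor names are illustrative:

    import torch
    import torch.nn.functional as F

    # Illustrative logits for a batch of 4 examples over 3 classes.
    student_logits = torch.randn(4, 3)
    teacher_logits = torch.randn(4, 3)

    # (a) Distill on raw logits, as in the paper: plain MSE.
    loss_logits = F.mse_loss(student_logits, teacher_logits)

    # (b) Distill on softmax outputs: KL divergence between the two
    # distributions, optionally softened with a temperature T.
    T = 2.0
    loss_soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction='batchmean',
    ) * (T * T)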

TypeError: dropout(): argument 'input' (position 1) must be Tensor, not str

Hello, may I ask a question? When I run python ptbert.py I get the error above, pointing at the line pooled_output = self.dropout(pooled_output). Printing pooled_output shows it is the string 'pooler_output' rather than a tensor, which is very strange. The call is _, pooled_output = self.bert(input_ids, None, input_mask); why would the pooled_output coming out of BERT be the string 'pooler_output'? I don't know where I went wrong; could you point me in the right direction? Many thanks!
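
The likely cause: since transformers v4, models return a ModelOutput (a dict-like object) by default, and tuple-unpacking iterates over its keys, so _, pooled_output = self.bert(...) binds the field-name string 'pooler_output' instead of the tensor. Two standard fixes, sketched standalone:

    from transformers import BertModel, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
    model = BertModel.from_pretrained('bert-base-uncased')
    enc = tokenizer('hello world', return_tensors='pt')

    # Fix 1: ask for the legacy tuple output.
    sequence_output, pooled_output = model(
        input_ids=enc['input_ids'],
        attention_mask=enc['attention_mask'],
        return_dict=False,
    )

    # Fix 2: keep the ModelOutput and access the field by name.
    pooled_output = model(**enc).pooler_output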

word2vec and t_tr not found

Running test.py raises an error: No such file or directory: 'data/cache/word2vec'.
I got past it by renaming a 200-dimensional plain-text word-vector file to that path, and I also adjusted the dimension in utils accordingly.

Then, running distill/small raises: No such file or directory: 'data/cache/t_tr'.
What is this file? Could it be made available for download?
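
For the first missing file, a minimal sketch of reading a plain-text embedding file of the kind the reporter substituted (one "word v1 ... v200" line per word); that this is the format expected at data/cache/word2vec is inferred from the workaround above, not documented. t_tr looks like a cached artifact written by an earlier script, so it is not covered here:

    import numpy as np

    # Read a plain-text embedding file: each line is "word v1 v2 ... v200".
    embeddings = {}
    with open('data/cache/word2vec', encoding='utf-8') as f:
        for line in f:
            parts = line.rstrip().split()
            embeddings[parts[0]] = np.asarray(parts[1:], dtype=np.float32)

    dim = len(next(iter(embeddings.values())))  # 200 for the file described above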

distill

At test time the dev-set data is used for evaluation, but that same dev set was also used for training; this seems like a problem.

How is the LSTM used after distillation?

Hello,
Thank you very much for sharing the code!
I have a question: after the distillation training in distill.py, run_distill() in test.py only uses the labels predicted by the BERT model as training labels and never uses the model distilled by distill.py. What is the rationale here?
Looking forward to your answer. Thanks!

Code question: argument passing to BertModel, and the teacher's prediction input in distill

sequence_output, _ = self.bert(input_ids, None, input_mask, output_all_encoded_layers=False)
In this call, isn't input_mask passed in the wrong position? As written, attention_mask is None and token_type_ids is input_mask.
Also, at https://github.com/qiangsiwei/bert_distill/blob/master/distill.py#L22, why are [CLS] and [SEP] not added at prediction time?
Thanks for open-sourcing this; I hope you can clear up my confusion.
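
Whether the positional order is wrong depends on which library backs self.bert. In the old pytorch-pretrained-bert package, whose output_all_encoded_layers flag this call uses, the signature is forward(input_ids, token_type_ids=None, attention_mask=None, output_all_encoded_layers=True), so the positional call above is actually correct; in the newer transformers package, attention_mask comes second, so the same positional call would be wrong. Rewriting the quoted call with keyword arguments removes the ambiguity:

    # Explicit keywords match pytorch-pretrained-bert's forward() signature
    # and avoid any dependence on positional argument order.
    sequence_output, _ = self.bert(
        input_ids,
        token_type_ids=None,
        attention_mask=input_mask,
        output_all_encoded_layers=False,
    )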
