
bert_distill's People

Contributors

blmoistawinde, qiangsiwei


bert_distill's Issues

About the dataset

Hello, author. I'd like to ask: which dataset did you use?

Bug at utils.py line 86

It should be "np.random.rand() > p_mask", not "np.random.rand() < p_mask".

Question about the results in distill.py

First of all, thanks for sharing the code. I have a question about distill.py: the accuracy reported at the end is measured on the dev set, yet teach_on_dev = True by default, which effectively trains on the dev set as well. Doesn't this inflate the evaluation results?
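
A minimal sketch of the leak-free setup being asked for, on toy data; the variable names and the premise that simply disabling teach_on_dev suffices are assumptions, not the repo's documented behavior:

    from sklearn.model_selection import train_test_split

    # Toy stand-ins for the real corpus (illustrative only).
    texts = [f"example {i}" for i in range(100)]
    labels = [i % 2 for i in range(100)]

    # Hold the dev split out of distillation entirely. If the student also
    # trains on dev (the teach_on_dev=True default), dev accuracy stops
    # measuring generalization and will read high.
    texts_tr, texts_dev, y_tr, y_dev = train_test_split(
        texts, labels, test_size=0.2, random_state=42)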

About the loss function

The paper mentions distilling on the logits, but the released code uses the post-softmax outputs. May I ask the reason for this?
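
For reference, a minimal sketch of the two objectives being contrasted; the MSE-on-logits form is the one described in the distilled-BiLSTM paper (Tang et al., 2019) that this repo appears to follow, and all tensor names are illustrative:

    import torch
    import torch.nn.functional as F

    # Illustrative logits for a batch of 4 examples over 3 classes.
    student_logits = torch.randn(4, 3)
    teacher_logits = torch.randn(4, 3)

    # (a) Distill on raw logits, as in the paper: plain MSE.
    loss_logits = F.mse_loss(student_logits, teacher_logits)

    # (b) Distill on softmax outputs: KL divergence between the two
    # distributions, optionally softened with a temperature T.
    T = 2.0
    loss_soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction='batchmean',
    ) * (T * T)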

TypeError: dropout(): argument 'input' (position 1) must be Tensor, not str

Hello, may I ask a question? When I run python ptbert.py I get the error above, pointing at the line pooled_output = self.dropout(pooled_output). Printing pooled_output shows it is the string 'pooler_output' rather than a tensor, which is very strange. The call is _, pooled_output = self.bert(input_ids, None, input_mask); why would the pooled_output coming out of BERT be the string 'pooler_output'? I don't know where I went wrong; could you point me in the right direction? Many thanks!
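
The likely cause: since transformers v4, models return a ModelOutput (a dict-like object) by default, and tuple-unpacking iterates over its keys, so _, pooled_output = self.bert(...) binds the field-name string 'pooler_output' instead of the tensor. Two standard fixes, sketched standalone:

    from transformers import BertModel, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
    model = BertModel.from_pretrained('bert-base-uncased')
    enc = tokenizer('hello world', return_tensors='pt')

    # Fix 1: ask for the legacy tuple output.
    sequence_output, pooled_output = model(
        input_ids=enc['input_ids'],
        attention_mask=enc['attention_mask'],
        return_dict=False,
    )

    # Fix 2: keep the ModelOutput and access the field by name.
    pooled_output = model(**enc).pooler_output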

word2vec and t_tr not found

Running test.py raises an error: No such file or directory: 'data/cache/word2vec'.
I got past it by renaming a 200-dimensional plain-text word-vector file to that path, and I also adjusted the dimension in utils accordingly.

Then, running distill/small raises: No such file or directory: 'data/cache/t_tr'.
What is this file? Could it be made available for download?
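
For the first missing file, a minimal sketch of reading a plain-text embedding file of the kind the reporter substituted (one "word v1 ... v200" line per word); that this is the format expected at data/cache/word2vec is inferred from the workaround above, not documented. t_tr looks like a cached artifact written by an earlier script, so it is not covered here:

    import numpy as np

    # Read a plain-text embedding file: each line is "word v1 v2 ... v200".
    embeddings = {}
    with open('data/cache/word2vec', encoding='utf-8') as f:
        for line in f:
            parts = line.rstrip().split()
            embeddings[parts[0]] = np.asarray(parts[1:], dtype=np.float32)

    dim = len(next(iter(embeddings.values())))  # 200 for the file described above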

distill

At test time the dev-set data is used for evaluation, but that same dev set was also used for training; this seems like a problem.

How is the LSTM used after distillation?

Hello,
Thank you very much for sharing the code!
I have a question: after the distillation training in distill.py, run_distill() in test.py only uses the labels predicted by the BERT model as training labels and never uses the model distilled by distill.py. What is the rationale here?
Looking forward to your answer. Thanks!

Code question: argument passing to BertModel, and the teacher's prediction input in distill

sequence_output, _ = self.bert(input_ids, None, input_mask, output_all_encoded_layers=False)
In this call, isn't input_mask passed in the wrong position? As written, attention_mask is None and token_type_ids is input_mask.
Also, at https://github.com/qiangsiwei/bert_distill/blob/master/distill.py#L22, why are [CLS] and [SEP] not added at prediction time?
Thanks for open-sourcing this; I hope you can clear up my confusion.
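
Whether the positional order is wrong depends on which library backs self.bert. In the old pytorch-pretrained-bert package, whose output_all_encoded_layers flag this call uses, the signature is forward(input_ids, token_type_ids=None, attention_mask=None, output_all_encoded_layers=True), so the positional call above is actually correct; in the newer transformers package, attention_mask comes second, so the same positional call would be wrong. Rewriting the quoted call with keyword arguments removes the ambiguity:

    # Explicit keywords match pytorch-pretrained-bert's forward() signature
    # and avoid any dependence on positional argument order.
    sequence_output, _ = self.bert(
        input_ids,
        token_type_ids=None,
        attention_mask=input_mask,
        output_all_encoded_layers=False,
    )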
