autoliuweijie / fastbert
The source code of FastBERT (ACL 2020)
Home Page: https://www.aclweb.org/anthology/2020.acl-main.537/
(Referring to line 132 at commit 5f9e98b.)
Is there a way to schedule the samples that still need computation more flexibly? For example, build a pool: every sample that reaches layer 10 goes into the pool and is scheduled together, so that each layer runs with a fixed batch size. Making full use of the GPU this way should make inference much faster. (A sketch of the idea follows.)
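A rough sketch of this pooled-scheduling idea (my own illustration under assumed interfaces, not the repo's code): keep one queue of still-uncertain samples per layer, and run each layer on fixed-size batches drawn from its queue. Here layers and students are assumed to be per-layer encoder blocks and student classifiers returning per-sample logits.

    import torch

    def pooled_inference(layers, students, embedded, batch_size=32, speed=0.5):
        # queues[d] holds (sample_id, hidden_state) pairs waiting at layer d
        queues = [[] for _ in layers]
        queues[0] = list(enumerate(embedded))   # embedded: list of (seq_len, hidden) tensors
        results = {}
        last = len(layers) - 1
        for depth, (layer, student) in enumerate(zip(layers, students)):
            q = queues[depth]
            for start in range(0, len(q), batch_size):   # fixed-size batch per layer
                chunk = q[start:start + batch_size]
                ids = [i for i, _ in chunk]
                h = layer(torch.stack([t for _, t in chunk]))
                probs = torch.softmax(student(h), dim=-1)   # (batch, num_labels)
                n = probs.size(-1)
                u = -(probs * probs.clamp_min(1e-10).log()).sum(-1) / torch.log(torch.tensor(float(n)))
                for j, i in enumerate(ids):
                    if u[j] < speed or depth == last:   # confident enough, or last layer
                        results[i] = probs[j]
                    else:
                        queues[depth + 1].append((i, h[j]))
        return [results[i] for i in range(len(embedded))]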
First of all, thanks for your kind work.
Why did you set the first dimension of segment_embedding to 3?
This is in FastBERT/uer/layers/embeddings.py (line 18).
Is this part flexible depending on the model architecture?
The paper does not cover this, so I'm asking here.
First, thanks for your work; it's very useful for inference with BERT-like models. I hope your paper gets published soon.
Something I'm confused about is the FLOPs of the dense layer in Section 4.1.
As far as I know, the FLOPs of a fully connected layer with bias are 2 * I * O,
where I is the number of input neurons and O the number of output neurons.
For the fully connected layer 128 -> 128, FLOPs = 2 * 128 * 128 = 32,768.
But Table 1 gives 4.2M, which is much higher than what I got.
Could you share how you calculated that number?
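One possible resolution, which is my own guess rather than the authors' confirmed accounting: if Table 1 counts FLOPs per sequence of 128 tokens rather than per token, the per-token figure above simply gets multiplied by the sequence length:

    # assumption: FLOPs counted per sequence of 128 tokens, not per token
    I, O, seq_len = 128, 128, 128
    per_token = 2 * I * O                 # 32,768 FLOPs for a single token
    per_sequence = per_token * seq_len    # 4,194,304 ~= 4.2M, matching Table 1
    print(per_token, per_sequence)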
Hello, I ran two different experiments. In the first, the batch size for both training and testing was 1 (which makes training slow); in the second, it was 32 for both. The classification accuracy of the second experiment was about 2 percentage points lower than the first.
I've been thinking about why batch size matters so much here. In general, a larger batch size improves a model's generalization. In FastBERT, could a larger batch size have a strong effect on accuracy at the inference stage?
Have you run any experiments on how batch size affects training and testing, or do you have any suggestions?
I wonder whether the author has tried this model on more complex classification datasets. On a 40-class dataset I tried, the uncertainty of every sample stays above 0.95.
Many thanks!
@autoliuweijie
The distill stage currently uses a fixed speed and a fixed number of epochs, and there is no early stopping. For different datasets, how should these hyperparameters be determined, and how should the final model be selected?
On Win10 with PyCharm, running FastBERT\pypi\examples\single_sentence_classification\test.py fails with:
ModuleNotFoundError: No module named 'uer.encoders.synt_encoder'
Running it from the Windows command line works fine.
Would you clarify what the Weibo dataset (one of the benchmark tasks in the paper) is, or provide a copy in this repo?
Could you provide the source code soon? We want to try and follow your work. Thanks!
Hello, first of all thank you for this outstanding work and for a genuinely eye-opening paper. After reading it I have a question: how is the speedup factor of FastBERT over BERT computed? I only found the code that computes FLOPs, but FLOPs and inference speed are not linearly related. I'd appreciate an explanation. Many thanks!
Thanks for sharing the paper's code and for your contribution. While reading the code and the paper, one thing puzzled me: how should the uncertainty formula in adaptive inference be understood? Why can the uncertainty of a result be determined this way?
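For context: the paper defines uncertainty as the normalized entropy of a student classifier's output distribution. A minimal sketch of that formula (my own code, not the repo's):

    import torch

    def uncertainty(probs, eps=1e-10):
        # normalized entropy: -(sum_i p_i log p_i) / log(N), ranges from 0 to 1
        n = probs.size(-1)
        entropy = -(probs * torch.log(probs + eps)).sum(dim=-1)
        return entropy / torch.log(torch.tensor(float(n)))

The intuition: entropy is 0 for a one-hot (fully confident) distribution and maximal for a uniform one, so it measures how far the classifier is from committing to a label; dividing by log(N) keeps the speed threshold comparable across datasets with different label counts N.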
Hi,
I found that in MultiHeadedAttention, thop only counts the FLOPs of the linear layers, missing the attention operation itself.
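For what it's worth, thop supports custom_ops hooks, so the missing matmuls can be added by hand. A sketch under my own assumptions (the attribute names heads_num and per_head_size, and the input layout, are guesses about the module's interface, not verified against the repo):

    import torch

    def count_attention(m, x, y):
        # adds the two attention matmuls, Q @ K^T and softmax(QK^T) @ V,
        # on top of the linear layers thop already counts
        batch, seq_len, _ = x[0].shape        # assumed input: (batch, seq_len, hidden)
        d, h = m.per_head_size, m.heads_num   # assumed module attributes
        macs = 2 * batch * h * seq_len * seq_len * d
        m.total_ops += torch.DoubleTensor([macs])

    # usage sketch:
    # profile(model, inputs=(ids,), custom_ops={MultiHeadedAttention: count_attention})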
I've installed fastbert; how do I run batch prediction?
I can't seem to find a code block for batch prediction. Where is it?
What is the inference time for a single sentence on CPU?
Line 344 is an empty line; is there anything wrong with it?
Hello, when I reproduce your experiments (without any modification), accuracy rises steadily while training the backbone, but in the distillation stage the dev and test accuracy of every epoch is identical to the backbone's last epoch. Did I do something wrong?
I am curious about the fast_mode argument: how and when should it be used?
For a multi-label task, where each dimension is an independent binary classification (see the sketch after these questions):
1. Can KL divergence be used directly as the distillation loss?
2. Can the entropy over the class dimension be used to represent uncertainty?
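A possible sketch of both ideas for the independent-binary-label setup (my own interpretation, not the authors' method): a per-label KL divergence between teacher and student Bernoulli distributions for distillation, and the mean normalized binary entropy as the uncertainty score.

    import torch

    def binary_kl(teacher_p, student_p, eps=1e-10):
        # KL(teacher || student), summed over the two outcomes of each label
        t = torch.stack([teacher_p, 1 - teacher_p], dim=-1)
        s = torch.stack([student_p, 1 - student_p], dim=-1)
        return (t * (torch.log(t + eps) - torch.log(s + eps))).sum(-1).mean()

    def multilabel_uncertainty(p, eps=1e-10):
        # mean binary entropy over labels, divided by log(2) to lie in [0, 1]
        ent = -(p * torch.log(p + eps) + (1 - p) * torch.log(1 - p + eps))
        return ent.mean(dim=-1) / torch.log(torch.tensor(2.0))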
Multi-class classification with your THUCNews dataset works fine, but with my own dataset I keep hitting the error below. How does the dataset need to be formatted?
Traceback (most recent call last):
  File "run_fastbert.py", line 652, in <module>
    main()
  File "run_fastbert.py", line 589, in main
    result = evaluate(args, False, False)
  File "run_fastbert.py", line 445, in evaluate
    p = confusion[i,i].item()/confusion[i,:].sum().item()
ZeroDivisionError: division by zero
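The traceback shows that confusion[i,:].sum() is zero, i.e., label i never occurs in the evaluation data, which usually means the custom dataset's label column doesn't match the label set the script expects. A hypothetical guard for the precision computation in evaluate() (not the authors' fix) would be:

    # skip labels that never appear in the gold data instead of dividing by zero
    denominator = confusion[i, :].sum().item()
    p = confusion[i, i].item() / denominator if denominator > 0 else 0.0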
Is there a TensorFlow version?
Thanks for releasing this great repo.
Could you share the DistilBERT (3-layer) and DistilBERT (1-layer) models? They would be very helpful for me.
Thanks!
Best,
Deming Ye
Why is the FLOPs of the BERT baseline in the paper 21785M?
According to Table 1, shouldn't BERT's FLOPs be 1809.9 * 12 + 46.1 = 21765M?
Hi,
I am unable to access the datasets linked in this repository. Please help me access them.
Thanks
Hi, it looks like this cloud storage link no longer works: https://fastbert-model-file-1257235592.cos.ap-beijing.myqcloud.com/
Looking at the code, it doesn't seem to be fixed?
In this implementation, https://github.com/BitVoyage/FastBERT, it is fixed.
You should add labels = ['T', 'F'].
Hello,
Do the BERT/DistilBERT FLOPs listed in Table 2 of the paper include the final classifier, i.e., the MLP attached after CLS? The FLOPs of that part should depend on the number of labels N, right? Also, what tool was used to compute the FLOPs in the paper?
Thanks
First of all, thanks for your kind work.
What do you think is the reason for adding self-attention to each classifier layer?
The paper also says the self-attention is done at 128 dimensions.
What do you think is the difference compared to deriving the result without self-attention, using only the 768-dimensional hidden state?
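For reference, a rough sketch of the student-classifier shape the paper describes (a projection from 768 down to 128 dimensions, self-attention at 128 dimensions, then a classifier); the layer sizes follow the paper, but the exact structure here is my paraphrase, not the repo's code:

    import torch.nn as nn

    class StudentClassifier(nn.Module):
        def __init__(self, hidden=768, narrow=128, heads=2, num_labels=2):
            super().__init__()
            self.proj = nn.Linear(hidden, narrow)   # shrink 768 -> 128 to keep it cheap
            self.attn = nn.MultiheadAttention(narrow, heads, batch_first=True)
            self.fc = nn.Linear(narrow, num_labels)

        def forward(self, x):                       # x: (batch, seq_len, 768)
            h = self.proj(x)
            h, _ = self.attn(h, h, h)               # self-attention at 128 dims
            return self.fc(h[:, 0])                 # classify on the first token

One plausible reading of the design: self-attention lets each early classifier pool evidence across all tokens of the still-shallow sequence representation, rather than trusting only the 768-dimensional vector at a single position, while working at 128 dimensions keeps the per-layer overhead small.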
Hi, I'm trying to load another Hugging Face pre-trained model, for example: https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-cased-pytorch_model.bin
but I found I cannot load it because the parameter names are different. Where did you get your pre-trained model?
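For context, this repo is built on UER, whose checkpoint parameter names differ from Hugging Face's. A generic workaround is to rename the state-dict keys before loading; the single mapping entry below is purely illustrative (the real correspondence has to be read off by comparing both models' state dicts):

    import torch

    hf_state = torch.load("bert-base-multilingual-cased-pytorch_model.bin", map_location="cpu")
    rename = {  # illustrative entry only; build the full map by printing both key sets
        "bert.embeddings.word_embeddings.weight": "embedding.word_embedding.weight",
    }
    uer_state = {rename.get(k, k): v for k, v in hf_state.items()}
    # model.load_state_dict(uer_state, strict=False)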
Hello, I ran into two problems while reproducing the paper's results and would like to ask about them.
Here are some details of my reproduction; not all of them are directly related to the questions:

Blocks | Belongs to model
---|---
Embeddings | M0
Transformer-0 | M0
Stu-Classifier-0 | M0
Transformer-1 | M1
Stu-Classifier-1 | M1
... | ...
Transformer-11 | M11
Tea-Classifier | M11

Table 1: the model is split into 12 segments, with M0-M11 corresponding to the 12 classifiers.

Based on the last point above, my guess is that the inference speedup on GPU is not significant because the extra input/output operations between segments take too much time.
Thanks again for your research; I hope you can share more of your inference experience. Is there an adaptive inference method that does not require splitting the model? (A sketch follows.)
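On the last question: early exit does not require physically splitting the model into M0..M11. A minimal sketch (my own illustration with assumed per-layer modules, not the repo's code) keeps a single model and loops over its encoder layers, exiting once the student's uncertainty drops below the speed threshold:

    import torch

    def adaptive_forward(embedding, layers, students, token_ids, speed=0.5):
        # one model, one loop over its layers: no M0..M11 segmentation needed
        h = embedding(token_ids)
        for layer, student in zip(layers, students):
            h = layer(h)
            probs = torch.softmax(student(h), dim=-1)   # (1, num_labels), batch of 1
            n = probs.size(-1)
            u = -(probs * probs.clamp_min(1e-10).log()).sum(-1) / torch.log(torch.tensor(float(n)))
            if u.item() < speed:                        # confident enough: exit early
                return probs
        return probs                                    # fell through: last classifier decides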
Line 234:

    self._self_distillation(
        sentences_train, batch_size, learning_rate, epochs_num,
        warmup, report_steps, model_saving_pathm, sentences_dev,
        labels_dev, dev_speed, verbose
    )

model_saving_pathm should be model_saving_path, right?
Hi,
I have a very rookie question: how can I calculate the FLOPs of a BERT model?
I tried to use thop:

    macs, params = profile(model, inputs=(input, ),
                           custom_ops={YourModule: count_your_model})

but I don't know what input and custom_ops={YourModule: count_your_model} should be.
For example, I want to run the models given by Hugging Face: https://github.com/huggingface/transformers/tree/master/examples/text-classification

    CUDA_VISIBLE_DEVICES=1 python run_glue.py \
        --model_type bert \
        --model_name_or_path /tmp/fintune_CoLA_output-bert/ \

I tried to put the macs, params = profile(model, inputs=...) call in run_glue.py, but I'm not sure where to put it.
I get errors like:

    [WARN] Cannot find rule for <class 'torch.nn.modules.sparse.Embedding'>. Treat it as zero Macs and zero Params.
    [WARN] Cannot find rule for <class 'torch.nn.modules.normalization.LayerNorm'>. Treat it as zero Macs and zero Params.
      File "/home/zhk20002/anaconda2/envs/Py3.6/lib/python3.6/site-packages/transformers/trainer.py", line 677, in _training_step
        model, inputs=inputs, custom_ops={
      File "/home/zhk20002/anaconda2/envs/Py3.6/lib/python3.6/site-packages/thop/profile.py", line 188, in profile
        model(*inputs)
      File "/home/zhk20002/anaconda2/envs/Py3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/zhk20002/anaconda2/envs/Py3.6/lib/python3.6/site-packages/transformers/modeling_bert.py", line 1144, in forward
        inputs_embeds=inputs_embeds,
      File "/home/zhk20002/anaconda2/envs/Py3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/zhk20002/anaconda2/envs/Py3.6/lib/python3.6/site-packages/transformers/modeling_bert.py", line 691, in forward
        input_shape = input_ids.size()
    AttributeError: 'str' object has no attribute 'size'
Do you have general code like this where I can test the FLOPs of models such as BERT, RoBERTa, and DistilBERT by just changing --model_type?
Thanks!
Tony
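A minimal sketch that should work for this (my own example, not from this repo): the crash above comes from passing a string where BERT expects an input_ids tensor, so build a dummy tensor and, if desired, hand thop a simple rule for the modules it warns about. The LayerNorm cost below is a rough assumption of mine, not thop's official rule.

    import torch
    from thop import profile
    from transformers import BertForSequenceClassification

    def count_layernorm(m, x, y):
        # rough assumption: about 2 ops per output element (normalize + affine)
        m.total_ops += torch.DoubleTensor([2 * y.numel()])

    model = BertForSequenceClassification.from_pretrained("bert-base-uncased")
    input_ids = torch.randint(0, model.config.vocab_size, (1, 128))  # batch 1, seq len 128
    macs, params = profile(
        model, inputs=(input_ids,),
        custom_ops={torch.nn.LayerNorm: count_layernorm},  # embedding lookups cost ~0 MACs anyway
    )
    print(macs, params)

Swapping in RobertaForSequenceClassification or DistilBertForSequenceClassification (or AutoModelForSequenceClassification with a different checkpoint name) gives the other models' numbers the same way.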
Anyone know where to get them?
Thank you very much.
Hello, thank you for your outstanding work.
I experimented on the Ant Financial semantic similarity corpus (from glue) with finetune_epochs = 20, distill_epochs = 10, learning_rate = 2e-5, and dev_speed = 0.5; after distillation, the dev_acc keeps hovering around 0.725.
To get the distilled dev_acc up to 0.9, should I increase the number of training epochs, or are there other factors at play?
Thanks for your help!