lisennlp / tinybert
An easy-to-use TinyBERT: a pre-trained language model obtained by knowledge distillation from BERT.
The paper says TinyBERT actually distills the embedding outputs, hidden states, and self-attention distributions. When the teacher and student models have different hidden dimensions, how is the MSE between them measured? The code doesn't seem to handle this; it just applies MSE directly.
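For reference, the TinyBERT paper handles the dimension mismatch with a learnable linear projection W_h that maps student hidden states (dimension d_s) into the teacher's space (d_t) before taking the MSE; attention distributions have shape (heads, seq, seq) on both sides when the student keeps the same number of heads, so those need no projection. A minimal numpy sketch of the hidden-state term (function and variable names are illustrative, not the repo's actual code):

```python
import numpy as np

def hidden_mse(h_student, h_teacher, W):
    """MSE(H_s @ W, H_t): project the student states into the
    teacher's hidden dimension, then compare elementwise."""
    diff = h_student @ W - h_teacher
    return np.mean(diff ** 2)

# TinyBERT-4 uses d_s = 312; BERT-base uses d_t = 768.
batch, seq, d_s, d_t = 2, 16, 312, 768
h_s = np.random.randn(batch, seq, d_s)
h_t = np.random.randn(batch, seq, d_t)
W = np.random.randn(d_s, d_t) * 0.01  # learnable; trained jointly with the student

loss = hidden_mse(h_s, h_t, W)
```

In the paper this projection is trained together with the student, so the student is free to organize its smaller hidden space however best matches the teacher after projection.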
CUDA_VISIBLE_DEVICES=2,3 python general_distill.py \
--teacher_model /nas/pretrain-bert/pretrain-pytorch/chinese_wwm_ext_pytorch/ \
--student_model student_model/ \
--train_file_path /nas/lishengping/datas/tiny_task_data/train.txt \
--do_lower_case \
--train_batch_size 20 \
--output_dir ./output_dir \
--learning_rate 5e-5 \
--num_train_epochs 3 \
--eval_step 5000 \
--max_seq_len 128 \
--gradient_accumulation_steps 1 3>&2 2>&1 1>&3 | tee logs/tiny_bert.log
Regarding the fourth line of the command above: why does general distillation also use task data (the path points to tiny_task_data)?
I used it to distill a RoBERTa model, but I hit errors that look like an "index out of range" problem. Can TinyBERT distill RoBERTa at all? Hope you can help me.
/opt/conda/conda-bld/pytorch_1565272279342/work/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [47,0,0], thread: [63,0,0] Assertion srcIndex < srcSelectDimSize failed.
Hello, I'm using TinyBERT for a prosody prediction task. After running data augmentation with the Chinese pre-trained model, the resulting data looks wrong and can't be used for prosody prediction. Should data augmentation simply be skipped in this case?
Hi, is data augmentation used during distillation? Does data augmentation improve accuracy on classification tasks?
Could you share the Chinese TinyBERT model obtained after general distillation? Many thanks!
Hello, does the data format have to be
Regarding step 2 of the training procedure ("fine-tune BERT on task data to obtain a fine-tuned BERT-base model"): which part of the training code implements this step? Thanks for replying.
python pregenerate_training_data.py --train_corpus ${CORPUS_RAW} \
    --bert_model ${BERT_MODEL} \
    --reduce_memory --do_lower_case \
    --epochs_to_generate 3 \
    --output_dir ${OUTPUT_DIR}
I'm a little confused by this part: should I download a pre-trained BERT model and the corpus first? This step also doesn't seem to be included in your script. Is there any difference? Thanks.
Excuse me, after distilling with the augmented training set, does inference actually become ~8x faster, as the paper claims? In my tests, single-example latency is basically the same as BERT-base, and batched throughput scales roughly linearly but is still no faster than BERT-base. Does the author know the reason? Thank you.
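On the speedup question: the expected gain comes from the student's smaller layer count and hidden size, and a rough compute comparison shows why batched inference should be much faster even when single-example latency (often dominated by framework overhead rather than matmuls) is not. A back-of-the-envelope sketch, assuming the paper's TinyBERT-4 shape (4 layers, hidden 312) versus BERT-base (12 layers, hidden 768), and using a common approximation for transformer-layer FLOPs that ignores the O(n²·d) attention term:

```python
def transformer_flops(num_layers, hidden, seq_len):
    """Approximate multiply-accumulate count for one forward pass:
    4 projection matmuls (Q, K, V, output) of size d x d, plus
    2 feed-forward matmuls of size d x 4d, per layer per token."""
    per_layer = seq_len * (4 * hidden * hidden + 2 * hidden * 4 * hidden)
    return num_layers * per_layer

seq_len = 128
teacher = transformer_flops(num_layers=12, hidden=768, seq_len=seq_len)
student = transformer_flops(num_layers=4, hidden=312, seq_len=seq_len)
ratio = teacher / student
print(f"theoretical compute ratio: {ratio:.1f}x")  # roughly 18x
```

The theoretical ratio exceeds the ~9.4x measured in the paper because real inference also pays for attention, memory bandwidth, and per-call overhead. If single-example latency matches BERT-base, it is worth checking that the student config actually loaded with 4 layers, and benchmarking with larger batches where the matmul cost dominates.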