lisennlp / tinybert
An easy-to-use TinyBERT: a pre-trained language model obtained by knowledge distillation from BERT.
The paper says TinyBERT actually distills the embedding outputs, hidden states, and self-attention distributions. When the teacher and student models have different hidden dimensions, how is the MSE between them measured? The code doesn't seem to handle this; it just applies MSE directly.
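For reference, the TinyBERT paper handles the dimension mismatch with a learnable linear projection W_h that maps student hidden states (dimension d_s) into the teacher's space (d_t) before taking the MSE; attention distributions have shape (heads, seq, seq) on both sides when the student keeps the same number of heads, so those need no projection. A minimal numpy sketch of the hidden-state term (function and variable names are illustrative, not the repo's actual code):

```python
import numpy as np

def hidden_mse(h_student, h_teacher, W):
    """MSE(H_s @ W, H_t): project the student states into the
    teacher's hidden dimension, then compare elementwise."""
    diff = h_student @ W - h_teacher
    return np.mean(diff ** 2)

# TinyBERT-4 uses d_s = 312; BERT-base uses d_t = 768.
batch, seq, d_s, d_t = 2, 16, 312, 768
h_s = np.random.randn(batch, seq, d_s)
h_t = np.random.randn(batch, seq, d_t)
W = np.random.randn(d_s, d_t) * 0.01  # learnable; trained jointly with the student

loss = hidden_mse(h_s, h_t, W)
```

In the paper this projection is trained together with the student, so the student is free to organize its smaller hidden space however best matches the teacher after projection.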
CUDA_VISIBLE_DEVICES=2,3 python general_distill.py \
--teacher_model /nas/pretrain-bert/pretrain-pytorch/chinese_wwm_ext_pytorch/ \
--student_model student_model/ \
--train_file_path /nas/lishengping/datas/tiny_task_data/train.txt \
--do_lower_case \
--train_batch_size 20 \
--output_dir ./output_dir \
--learning_rate 5e-5 \
--num_train_epochs 3 \
--eval_step 5000 \
--max_seq_len 128 \
--gradient_accumulation_steps 1 3>&2 2>&1 1>&3 | tee logs/tiny_bert.log
Regarding the fourth line of the command above: why does general distillation also use task data (the path points to tiny_task_data)?
I used it to distill a RoBERTa model, but I hit errors that look like an "index out of range" problem. Can TinyBERT distill RoBERTa at all? Hope you can help me.
/opt/conda/conda-bld/pytorch_1565272279342/work/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [47,0,0], thread: [63,0,0] Assertion srcIndex < srcSelectDimSize failed.
Hello, I'm using TinyBERT for a prosody prediction task. After running data augmentation with the Chinese pre-trained model, the resulting data looks wrong and can't be used for prosody prediction. Should data augmentation simply be skipped in this case?
Hi, is data augmentation used during distillation? Does data augmentation improve accuracy on classification tasks?
Could you share the Chinese TinyBERT model obtained after general distillation? Many thanks!
Hello, does the data format have to be
Regarding step 2 of the training procedure ("fine-tune BERT on task data to obtain a fine-tuned BERT-base model"): which part of the training code implements this step? Thanks for replying.
python pregenerate_training_data.py --train_corpus ${CORPUS_RAW} \
    --bert_model ${BERT_MODEL} \
    --reduce_memory --do_lower_case \
    --epochs_to_generate 3 \
    --output_dir ${OUTPUT_DIR}
I'm a little confused by this part: should I download a pre-trained BERT model and the corpus first? This step also doesn't seem to be included in your script. Is there any difference? Thanks.
Excuse me, after distilling with the augmented training set, does inference actually become ~8x faster, as the paper claims? In my tests, single-example latency is basically the same as BERT-base, and batched throughput scales roughly linearly but is still no faster than BERT-base. Does the author know the reason? Thank you.
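On the speedup question: the expected gain comes from the student's smaller layer count and hidden size, and a rough compute comparison shows why batched inference should be much faster even when single-example latency (often dominated by framework overhead rather than matmuls) is not. A back-of-the-envelope sketch, assuming the paper's TinyBERT-4 shape (4 layers, hidden 312) versus BERT-base (12 layers, hidden 768), and using a common approximation for transformer-layer FLOPs that ignores the O(n²·d) attention term:

```python
def transformer_flops(num_layers, hidden, seq_len):
    """Approximate multiply-accumulate count for one forward pass:
    4 projection matmuls (Q, K, V, output) of size d x d, plus
    2 feed-forward matmuls of size d x 4d, per layer per token."""
    per_layer = seq_len * (4 * hidden * hidden + 2 * hidden * 4 * hidden)
    return num_layers * per_layer

seq_len = 128
teacher = transformer_flops(num_layers=12, hidden=768, seq_len=seq_len)
student = transformer_flops(num_layers=4, hidden=312, seq_len=seq_len)
ratio = teacher / student
print(f"theoretical compute ratio: {ratio:.1f}x")  # roughly 18x
```

The theoretical ratio exceeds the ~9.4x measured in the paper because real inference also pays for attention, memory bandwidth, and per-call overhead. If single-example latency matches BERT-base, it is worth checking that the student config actually loaded with 4 layers, and benchmarking with larger batches where the matmul cost dominates.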