
tinybert's Issues

Question about distillation across different hidden dimensions

The paper says TinyBERT distills the embedding outputs, hidden states, and self-attention distributions. When the teacher model and student model have different hidden dimensions, how is the MSE between them computed? The code doesn't seem to handle this — it just applies MSE directly.
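For reference, the TinyBERT paper handles the dimension mismatch with a learnable linear transformation: the student's hidden states are projected into the teacher's space before the MSE is taken. A minimal NumPy sketch of that loss (names and shapes here are illustrative, not the repo's actual code):

```python
import numpy as np

# TinyBERT's hidden-state loss: student hidden states (dim d_s) are
# mapped into the teacher's space (dim d_t) by a learnable matrix W_h
# before the MSE, so the two models never need the same hidden size.
def hidden_mse(h_student, h_teacher, w_h):
    projected = h_student @ w_h            # (seq_len, d_t)
    return float(np.mean((projected - h_teacher) ** 2))

rng = np.random.default_rng(0)
h_s = rng.standard_normal((128, 312))      # student, e.g. d_s = 312
h_t = rng.standard_normal((128, 768))      # teacher, BERT-base d_t = 768
w_h = rng.standard_normal((312, 768)) * 0.01   # trained with the student
loss = hidden_mse(h_s, h_t, w_h)
```

`W_h` is trained jointly with the student, so the projection itself learns how to align the two spaces; the same trick is applied to the embedding-layer loss.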

Why does general distillation also use task data? Shouldn't it use general data?

CUDA_VISIBLE_DEVICES=2,3 python general_distill.py   \
                          --teacher_model /nas/pretrain-bert/pretrain-pytorch/chinese_wwm_ext_pytorch/ \
                          --student_model student_model/  \
                          --train_file_path  /nas/lishengping/datas/tiny_task_data/train.txt \
                          --do_lower_case \
                          --train_batch_size 20 \
                          --output_dir ./output_dir  \
                          --learning_rate 5e-5  \
                          --num_train_epochs  3  \
                          --eval_step  5000  \
                          --max_seq_len  128  \
                          --gradient_accumulation_steps  1  3>&2 2>&1 1>&3 | tee logs/tiny_bert.log

On the fourth line: why does general distillation also use task_data?

Can I use it to distill RoBERTa?

I used it to distill a RoBERTa model, but I hit some errors that look like an "index out of range" problem. Can TinyBERT be used to distill RoBERTa? Hope you can help me.

/opt/conda/conda-bld/pytorch_1565272279342/work/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [47,0,0], thread: [63,0,0] Assertion srcIndex < srcSelectDimSize failed.
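That CUDA assertion (`srcIndex < srcSelectDimSize`) in the embedding lookup almost always means an input id is outside the embedding table. BERT-base uses a 30522-token vocabulary while RoBERTa's is 50265 (and RoBERTa's position ids are offset by 2), so RoBERTa token ids can overflow a BERT-sized embedding. A quick host-side check, as a sketch (the function name is ours):

```python
# Flag token ids that would trip the CUDA embedding-lookup assertion
# ("srcIndex < srcSelectDimSize") before the batch reaches the GPU.
def out_of_range_ids(input_ids, vocab_size):
    return [i for i in input_ids if not 0 <= i < vocab_size]

# RoBERTa's <mask> id (50264) overflows a BERT-sized (30522) table:
bad = out_of_range_ids([0, 101, 50264], vocab_size=30522)
```

If this flags ids, make sure the student's `vocab_size` (and `max_position_embeddings`) in its config match the RoBERTa tokenizer you are feeding it.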

Data augmentation for Chinese tasks

Hello, I'm using TinyBERT for a prosody prediction task. After running data augmentation with a Chinese pre-trained model, the generated data looks wrong and can't be used for prosody prediction. Should data augmentation simply be skipped in this case?

Chinese data augmentation

Hi, is data augmentation used during distillation? Does it improve accuracy on classification tasks?
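For context, the TinyBERT paper's augmentation replaces words with candidates from BERT's masked-LM predictions (for single-piece words) or GloVe nearest neighbours (for multi-piece words) to enlarge the task dataset before task-specific distillation. A simplified sketch of the replacement loop, where `candidate_fn` is a hypothetical stand-in for that candidate machinery:

```python
import random

# Simplified TinyBERT-style augmentation: each token is replaced with
# probability p_mask by a candidate from candidate_fn (BERT MLM or
# GloVe neighbours in the real pipeline; stubbed out here).
def augment(tokens, candidate_fn, p_mask=0.4, n_samples=2, seed=0):
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        samples.append([
            rng.choice(candidate_fn(t)) if rng.random() < p_mask else t
            for t in tokens
        ])
    return samples

sents = augment(["prosody", "is", "hard"], lambda t: [t.upper()])
```

Note that word-level replacement preserves sentence-level labels (sentiment, entailment) reasonably well, but for token-aligned tasks like prosody prediction it changes the very positions being labeled, which may explain the problem reported in the previous issue.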

Question about the training procedure

Step 2 of the training procedure says to fine-tune BERT on the task data to obtain a fine-tuned BERT-base model. Where is this step reflected in the training code? Thanks for the reply.

Is there any format converting on the corpus (like step 1 of general distillation in the original TinyBERT repo)?

${BERT_BASE_DIR}$ includes the BERT-base teacher model.

python pregenerate_training_data.py --train_corpus ${CORPUS_RAW} \
    --bert_model ${BERT_BASE_DIR}$ \
    --reduce_memory --do_lower_case \
    --epochs_to_generate 3 \
    --output_dir ${CORPUS_JSON_DIR}$

I'm a little confused by this part: should I download a pre-trained BERT model and the corpus first? This step doesn't seem to be included in your script. Is there a difference? Thanks.

Does inference speed improve after distillation?

Excuse me: after distilling with the augmented training set, did you observe the roughly 8x inference speedup the paper mentions? In my tests, single-sentence latency is basically the same as BERT-base, and batched latency scales roughly linearly — it is not faster than BERT-base. Do you know what might cause this? Thanks.
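One thing worth ruling out when measuring this: the paper's speedup is measured on batched inference, while a single short sentence is often dominated by Python and framework overhead rather than model FLOPs, which can mask the reduction. A minimal latency probe, where `model_fn` is a placeholder for your model's forward pass:

```python
import time

# Rough per-call latency / throughput probe. Warmup iterations let
# caches (and on GPU, CUDA kernels) settle before timing starts.
def avg_latency(model_fn, batch, n_iters=20, warmup=3):
    for _ in range(warmup):
        model_fn(batch)
    t0 = time.perf_counter()
    for _ in range(n_iters):
        model_fn(batch)
    per_call = (time.perf_counter() - t0) / n_iters
    return per_call, len(batch) / per_call   # latency, items/sec

lat, throughput = avg_latency(lambda b: sum(b), [1] * 8)
```

When benchmarking on GPU, also synchronize (e.g. `torch.cuda.synchronize()`) before reading the clock; otherwise the measurement reflects only kernel-launch time, and a small and a large model can look deceptively similar.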
