princeton-nlp / DinkyTrain
Princeton NLP's pre-training library based on fairseq with DeepSpeed kernel integration
License: MIT License
Hi,
This is great work that makes BERT pre-training more transparent. I have a question about your architecture.
In https://github.com/princeton-nlp/DinkyTrain/blob/main/run_efficient_mlm_recipe.sh, you set the following two flags:
--arch roberta_large
--encoder-normalize-before
I understand you want to use a pre-norm BERT. However, the default setting for roberta_large is --layernorm-embedding=True, which means two LayerNorm layers are applied consecutively right after the word embedding layer. I think you also need to set --layernorm-embedding=False.
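To see why two consecutive LayerNorms are redundant, here is a minimal NumPy sketch (not DinkyTrain's actual code) showing that, at initialization (weight=1, bias=0), applying a second LayerNorm directly after another is essentially a no-op:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize over the last (feature) dimension, as nn.LayerNorm does
    # at initialization, i.e. with weight=1 and bias=0.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

x = np.random.randn(4, 16)   # a batch of 4 embedding vectors, dim 16
once = layer_norm(x)         # embedding LayerNorm
twice = layer_norm(once)     # first pre-norm LayerNorm right after it

# The second normalization changes almost nothing: its input already
# has zero mean and near-unit variance along the feature dimension.
print(np.abs(twice - once).max())  # tiny, on the order of eps
```

With learned affine parameters the two layers are no longer exactly equivalent, but at initialization the second one contributes nothing, which is why the questioner suggests disabling one of them.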
Hi,
Thanks for the wonderful work. I was wondering if you have the PPL for all the masking rates reported in Table 3?
Hello, I noticed that you gave a search space of hyperparameters for the GLUE datasets, and I am confused about how you searched over them. Did you train each task on each hyperparameter combination with different seeds? There are about fifty combinations of parameters; did you fine-tune GLUE with each of them? Thank you.
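For reference, the kind of exhaustive sweep the question asks about is just a Cartesian product over the grid. The values below are hypothetical placeholders, not the paper's actual search space:

```python
from itertools import product

# Hypothetical GLUE fine-tuning grid -- the real search space is exactly
# what the question is asking about, so these values are illustrative only.
learning_rates = [1e-5, 2e-5, 3e-5, 5e-5]
batch_sizes = [16, 32]
seeds = [0, 1, 2]

configs = [
    {"lr": lr, "batch_size": bs, "seed": seed}
    for lr, bs, seed in product(learning_rates, batch_sizes, seeds)
]

# Each config corresponds to one fine-tuning run per GLUE task; typically
# the run with the best dev-set score is kept.
print(len(configs))  # 4 * 2 * 3 = 24 runs per task
```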
Hi,
in Table 8 of https://arxiv.org/pdf/2202.08005.pdf, the recipe differs from the original RoBERTa recipe: RoBERTa-large uses a batch size of 8196, a peak learning rate of 4e-4, and trains for 100K steps. Your parameter setting seems to come from Table 3 of https://arxiv.org/pdf/1907.11692.pdf, but that recipe is for RoBERTa-base, not RoBERTa-large.
Hi,
As we know, the STS-B task is a regression task where the targets are in [0, 5]. The .csv file submitted to the GLUE leaderboard is also required to be in [0, 5]. Otherwise, errors appear in the GLUE submission system.
During data preprocessing for GLUE, the fairseq script normalizes the target values to [0, 1], and an MSE loss is computed between the logits and the normalized targets. For prediction, we need to multiply by 5.
However, during prediction, how can we make sure the predicted values are restricted to [0, 1], since there is no activation function (such as a sigmoid) applied to the logits?
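One common workaround (an assumption on my part, not necessarily what fairseq or DinkyTrain does) is simply to clip the raw logit to [0, 1] before undoing the normalization, so every submitted prediction lands in [0, 5]:

```python
def sts_b_prediction(logit: float) -> float:
    """Map a raw regression logit to the STS-B label range [0, 5].

    The model was trained against targets normalized to [0, 1], so we
    clip to that range first and then rescale by 5.
    """
    clipped = min(max(logit, 0.0), 1.0)
    return clipped * 5.0

print(sts_b_prediction(0.5))    # 2.5
print(sts_b_prediction(1.3))    # clipped to 1.0 -> 5.0
print(sts_b_prediction(-0.2))   # clipped to 0.0 -> 0.0
```

Since the MSE loss already pushes the logits toward [0, 1] on in-distribution inputs, clipping only affects the small fraction of predictions that overshoot the range.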
Hello, thank you for your code. I am trying to reproduce your results with "GPU=8 DATA_DIR=/dev/gbert/dataset DEEPSPEED=1 bash run_efficient_mlm_recipe.sh", but I got an error:
fairseq-train: error: argument --arch/-a: invalid choice: 'deepspeed_roberta_large'
I noticed that you prepend "deepspeed_" to the architecture name ("deepspeed_${ARCH}") in run_efficient_mlm_recipe.sh, and I don't think this arch is registered in fairseq. Could you please tell me why you add the prefix and how to solve this problem? Thank you!
Using scripts/convert_fs_ckpt_to_hf_ckpt.py with a source model trained using DeepSpeed fails with mismatched model keys.
Hi,
in your script https://github.com/princeton-nlp/DinkyTrain/blob/main/finetune_glue.sh, it seems you use the last checkpoint to validate. May I ask why you don't use the best checkpoint?
In addition, if you use the last checkpoint to validate, does that mean you also use it to test?