
dinkytrain's People

Contributors

carlosejimenez, codecreator, danqi, gaotianyu1350


dinkytrain's Issues

Why do you use both layer_norm for embedding and pre-norm at the same time?

Hi,

This is great work that makes BERT pre-training more transparent. I have a question about your architecture.

In https://github.com/princeton-nlp/DinkyTrain/blob/main/run_efficient_mlm_recipe.sh, you set the following two flags:

--arch roberta_large
--encoder-normalize-before

I understand you want to use pre-norm BERT. However, the default setting for roberta_large is --layernorm-embedding=True, which means two LayerNorm layers are applied consecutively right after the word embedding layer. I think you also need to set --layernorm-embedding=False.
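The concern above can be made concrete: without learnable scale/bias, a second LayerNorm applied right after a first one is a no-op. A minimal pure-Python sketch of this redundancy, assuming the standard LayerNorm formula (with affine parameters the two layers are not exactly redundant, but the input is still normalized twice in a row):

```python
import math

def layer_norm(x, eps=1e-5):
    """LayerNorm without learnable scale/bias: (x - mean) / std."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

x = [0.5, -1.2, 3.0, 0.1]
once = layer_norm(x)
twice = layer_norm(once)

# LN(LN(x)) ≈ LN(x): the second normalization adds nothing.
print(all(abs(a - b) < 1e-3 for a, b in zip(once, twice)))  # True
```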

Could you please tell us how to set the hyperparameters for GLUE?

Hello, I noticed that you gave a search space of hyperparameters for the GLUE datasets, and I am confused about how you searched over it. Did you train each task on each hyperparameter combination with different seeds? There are about fifty combinations of parameters. Did you fine-tune on GLUE with each of them? Thank you.
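For scale, a grid of this size can be enumerated mechanically. The values below are hypothetical placeholders (the actual search space is defined in the repo's GLUE fine-tuning instructions), but they show how roughly fifty runs per task arise from a small cross-product:

```python
from itertools import product

# Hypothetical search space; substitute the values from the repo's docs.
learning_rates = [1e-5, 2e-5, 3e-5, 5e-5]
batch_sizes = [16, 32]
seeds = [42, 43, 44, 45, 46, 47]

grid = list(product(learning_rates, batch_sizes, seeds))
print(len(grid))  # 4 * 2 * 6 = 48 runs per task
```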

Prediction values on the STS-B test set are not in [0, 5]

Hi,

As we know, STS-B is a regression task whose targets lie in [0, 5]. The .csv file submitted to the GLUE leaderboard is also required to contain values in [0, 5]; otherwise, the GLUE submission system reports errors.

During GLUE data preprocessing, the fairseq script normalizes the target values to [0, 1], and the MSE loss is computed between the logits and the normalized targets. At prediction time, we therefore need to multiply by 5.

However, during prediction, how can we make sure the predicted values are restricted to [0, 1]? There is no activation function, such as a sigmoid, applied to the logits.
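One common workaround (not necessarily what fairseq's GLUE prediction script does — that should be checked against the actual code) is to clamp the unbounded logit into the valid range before rescaling:

```python
def predict_sts_b(logit: float) -> float:
    """Map a raw regression logit to an STS-B score in [0, 5].

    The model was trained against targets normalized to [0, 1], so we
    clamp the unbounded logit to [0, 1] and then multiply by 5.
    """
    clamped = min(max(logit, 0.0), 1.0)
    return clamped * 5.0

print(predict_sts_b(1.3))  # 5.0  (out-of-range logit is clipped)
print(predict_sts_b(0.6))  # 3.0  (in-range logit scales linearly)
```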

fairseq-train: error: argument --arch/-a: invalid choice: 'deepspeed_roberta_large'

Hello, thank you for your code. I am trying to reproduce your results with "GPU=8 DATA_DIR=/dev/gbert/dataset DEEPSPEED=1 bash run_efficient_mlm_recipe.sh", but I got an error:

fairseq-train: error: argument --arch/-a: invalid choice: 'deepspeed_roberta_large'

I noticed that you prepend "deepspeed_" to the architecture name ("deepspeed_${ARCH}") in run_efficient_mlm_recipe.sh, and I don't think this arch is registered in fairseq. Could you please tell me why you add the prefix and how to solve this problem? Thank you!
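For context on the error: fairseq resolves --arch through a registry populated by @register_model_architecture decorators, so a name like deepspeed_roberta_large is only a valid choice if some imported module has registered it (presumably a DeepSpeed-aware module in this repo that must be importable). A simplified pure-Python mimic of that registry pattern (not fairseq itself) illustrates why an unregistered name fails:

```python
# Simplified mimic of fairseq's --arch registry pattern.
ARCH_REGISTRY = {}

def register_model_architecture(name):
    """Decorator mapping an arch name to its config function."""
    def wrapper(fn):
        ARCH_REGISTRY[name] = fn
        return fn
    return wrapper

@register_model_architecture("roberta_large")
def roberta_large(args):
    args["encoder_layers"] = 24

def resolve_arch(name):
    if name not in ARCH_REGISTRY:
        # Mirrors the CLI failure: invalid choice for --arch.
        raise ValueError(f"invalid choice: {name!r}")
    return ARCH_REGISTRY[name]

print("roberta_large" in ARCH_REGISTRY)            # True
print("deepspeed_roberta_large" in ARCH_REGISTRY)  # False: never registered
```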
