Comments (8)

jiahuigeng commented on July 29, 2024

Is anything wrong, or are there some tricks needed here?

glample commented on July 29, 2024

Hi,

Are you training with a single GPU? That may be the issue, as batch size is a critical parameter. Can you try to increase the batch size? If you only have a single GPU, you could try performing several forward/backward passes before each optimizer step; this simulates a bigger batch size.
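
A minimal sketch of that accumulation trick in PyTorch (the toy model, data, and step counts below are placeholders for illustration, not XLM's actual training loop):

import torch
import torch.nn as nn

# Toy stand-ins: in practice the model and batches come from your real setup.
model = nn.Linear(16, 4)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)

accumulation_steps = 8  # simulate a batch 8x bigger than what fits in memory
optimizer.zero_grad()

for step in range(64):
    inputs = torch.randn(32, 16)              # one GPU-sized mini-batch
    targets = torch.randint(0, 4, (32,))
    loss = criterion(model(inputs), targets)
    # Scale the loss so the accumulated gradient averages over the virtual batch.
    (loss / accumulation_steps).backward()
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()                       # one update per 8 forward/backward passes
        optimizer.zero_grad()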

In practice you should get this at the end of the first epoch (epoch size was 300000 in my case):

epoch             ->  0.00
valid_en_pred_ppl -> 32.37
valid_en_pred_acc -> 40.87
valid_fr_pred_ppl -> 24.22
valid_fr_pred_acc -> 44.25
valid_pred_ppl    -> 28.29
valid_pred_acc    -> 42.56
test_en_pred_ppl  -> 28.95
test_en_pred_acc  -> 42.24
test_fr_pred_ppl  -> 14.40
test_fr_pred_acc  -> 50.10
test_pred_ppl     -> 21.68
test_pred_acc     -> 46.17

And if you train for a while:

epoch             -> 146.00
valid_en_pred_ppl ->   8.12
valid_en_pred_acc ->  58.40
valid_fr_pred_ppl ->   4.36
valid_fr_pred_acc ->  68.57
valid_pred_ppl    ->   6.24
valid_pred_acc    ->  63.48
test_en_pred_ppl  ->   5.87
test_en_pred_acc  ->  63.02
test_fr_pred_ppl  ->   3.87
test_fr_pred_acc  ->  69.09
test_pred_ppl     ->   4.87
test_pred_acc     ->  66.06

jiahuigeng commented on July 29, 2024

Thank you for these details. Should I keep lr=0.0001, or make it larger?

glample commented on July 29, 2024

We tried different learning rates across all our experiments, and lr=0.0001 always worked; it was often the best value, so I think you can keep it.

jiahuigeng commented on July 29, 2024

Great! Could you share the batch_size and the number of GPUs (or recommended values) from your experiments?

glample commented on July 29, 2024

For the batch size, we just use the biggest value that fits in memory; it was 32 for this experiment. Here we used 32 GPUs. 8 would probably have been fine, just slower, but I didn't try.
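
For scale, that works out to 32 sequences per GPU x 32 GPUs = 1024 sequences per update. A back-of-the-envelope sketch of what a single GPU would need to match it (assuming the per-GPU batch stays at 32):

per_gpu_batch = 32
num_gpus = 32
effective_batch = per_gpu_batch * num_gpus              # 1024 sequences per update

single_gpu_batch = 32
accumulation_steps = effective_batch // single_gpu_batch
print(effective_batch, accumulation_steps)              # 1024 32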

sarthakgarg commented on July 29, 2024

Thanks a lot for providing your logs! One question: what value of sample_alpha did you use to get the results posted above?

glample commented on July 29, 2024

sample_alpha is only used for MLM pretraining. You can probably ignore this parameter and always set it to 0; that's what we did.
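
For intuition: parameters like sample_alpha typically act as a word2vec-style exponent on word counts when sampling tokens, in which case an exponent of 0 reduces to uniform sampling, consistent with ignoring the parameter. A hypothetical sketch of that convention (not XLM's actual code):

import numpy as np

# Hypothetical count^alpha sampling (word2vec-style); not XLM's implementation.
counts = np.array([1000.0, 100.0, 10.0, 1.0])  # made-up word frequencies

def sampling_probs(counts, alpha):
    weights = counts ** alpha
    return weights / weights.sum()

print(sampling_probs(counts, alpha=0.0))  # uniform: [0.25 0.25 0.25 0.25]
print(sampling_probs(counts, alpha=0.5))  # skewed toward frequent words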
