Comments (8)

jiahuigeng commented on July 29, 2024

Is anything wrong, or are there some tricks needed here?

glample commented on July 29, 2024

Hi,

Are you training with a single GPU? That may be the issue, as batch size is a critical parameter. Can you try to increase the batch size? If you only have a single GPU, you could try performing several forward/backward passes before each optimizer step; this simulates a bigger batch size.
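
A minimal sketch of that accumulation trick in PyTorch (the toy model, data, and step counts below are placeholders for illustration, not XLM's actual training loop):

import torch
import torch.nn as nn

# Toy stand-ins: in practice the model and batches come from your real setup.
model = nn.Linear(16, 4)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)

accumulation_steps = 8  # simulate a batch 8x bigger than what fits in memory
optimizer.zero_grad()

for step in range(64):
    inputs = torch.randn(32, 16)              # one GPU-sized mini-batch
    targets = torch.randint(0, 4, (32,))
    loss = criterion(model(inputs), targets)
    # Scale the loss so the accumulated gradient averages over the virtual batch.
    (loss / accumulation_steps).backward()
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()                       # one update per 8 forward/backward passes
        optimizer.zero_grad()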

In practice you should get this at the end of the first epoch (epoch size was 300000 in my case):

epoch             ->  0.00
valid_en_pred_ppl -> 32.37
valid_en_pred_acc -> 40.87
valid_fr_pred_ppl -> 24.22
valid_fr_pred_acc -> 44.25
valid_pred_ppl    -> 28.29
valid_pred_acc    -> 42.56
test_en_pred_ppl  -> 28.95
test_en_pred_acc  -> 42.24
test_fr_pred_ppl  -> 14.40
test_fr_pred_acc  -> 50.10
test_pred_ppl     -> 21.68
test_pred_acc     -> 46.17

And if you train for a while:

epoch             -> 146.00
valid_en_pred_ppl ->   8.12
valid_en_pred_acc ->  58.40
valid_fr_pred_ppl ->   4.36
valid_fr_pred_acc ->  68.57
valid_pred_ppl    ->   6.24
valid_pred_acc    ->  63.48
test_en_pred_ppl  ->   5.87
test_en_pred_acc  ->  63.02
test_fr_pred_ppl  ->   3.87
test_fr_pred_acc  ->  69.09
test_pred_ppl     ->   4.87
test_pred_acc     ->  66.06

jiahuigeng commented on July 29, 2024

Thank you for these details. Should I keep lr=0.0001, or make it larger?

glample commented on July 29, 2024

We tried different learning rates across all our experiments, and lr=0.0001 always worked; it was often the best value, so I think you can keep it.

jiahuigeng commented on July 29, 2024

Great! Could you share the batch_size and the number of GPUs (or recommended values) from your experiments?

glample commented on July 29, 2024

For the batch size, we just use the biggest value that fits in memory; it was 32 for this experiment. Here we used 32 GPUs. 8 would probably have been fine, just slower, but I didn't try.
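
For scale, that works out to 32 sequences per GPU x 32 GPUs = 1024 sequences per update. A back-of-the-envelope sketch of what a single GPU would need to match it (assuming the per-GPU batch stays at 32):

per_gpu_batch = 32
num_gpus = 32
effective_batch = per_gpu_batch * num_gpus              # 1024 sequences per update

single_gpu_batch = 32
accumulation_steps = effective_batch // single_gpu_batch
print(effective_batch, accumulation_steps)              # 1024 32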

sarthakgarg commented on July 29, 2024

Thanks a lot for providing your logs! One question: what value of sample_alpha did you use to get the results posted above?

glample commented on July 29, 2024

sample_alpha is only used for MLM pretraining. You can probably ignore this parameter and always set it to 0; that's what we did.
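
For intuition: parameters like sample_alpha typically act as a word2vec-style exponent on word counts when sampling tokens, in which case an exponent of 0 reduces to uniform sampling, consistent with ignoring the parameter. A hypothetical sketch of that convention (not XLM's actual code):

import numpy as np

# Hypothetical count^alpha sampling (word2vec-style); not XLM's implementation.
counts = np.array([1000.0, 100.0, 10.0, 1.0])  # made-up word frequencies

def sampling_probs(counts, alpha):
    weights = counts ** alpha
    return weights / weights.sum()

print(sampling_probs(counts, alpha=0.0))  # uniform: [0.25 0.25 0.25 0.25]
print(sampling_probs(counts, alpha=0.5))  # skewed toward frequent words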
