facebookresearch / XLM
PyTorch original implementation of Cross-lingual Language Model Pretraining.
License: Other
Hello! I have been running your translate.py script and keep hitting an error on a particular line of my input file. The line contains a BPE-ised URL but is otherwise unremarkable (only 13 subwords long). The error occurs at the following line of code:
decoded, dec_lengths = decoder.generate(encoded, lengths.cuda(), params.tgt_id, max_len=int(1.5 * lengths.max().item() + 10))
Do you have any suggestions about what might be causing this error and how it could be fixed? Thank you very much in advance!
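For reference, the max_len argument in the generate() call above follows a simple heuristic; here is a minimal sketch of it (the function name is illustrative, not from the repo):

```python
# Sketch of the max_len heuristic in the generate() call: the decoder may
# emit up to 1.5x the longest source length, plus a 10-token margin.
def max_decode_len(src_lengths, ratio=1.5, margin=10):
    """Upper bound on generated length, given source lengths in subwords."""
    return int(ratio * max(src_lengths) + margin)

print(max_decode_len([13]))  # a 13-subword line allows up to 29 generated tokens
```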
In the paper, Section 5.1, you write "We sample the learning rate of the Adam optimizer with values from 5·10^-4 to 2·10^-4", but on GitHub it is 5·10^-6.
Hi,
thanks for releasing the code for Cross-lingual Language Model Pretraining ❤️
I would like to know if it is possible to encode a whole sentence and get the embeddings for each token (or better, each subword). The notebook only contains an example of how to encode a sentence; could you also provide a way to get the embeddings for each subword?
Thanks :)
Hi, I noticed that for both unsupervised NMT training and MLM training, the learning rate is 0.0001. Is this the learning rate when training with 8 GPUs? If I use 4 GPUs, how should I adjust the learning rate and warm-up? Thank you very much.
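One common heuristic here (the linear scaling rule, not an official recommendation from the XLM authors) is to scale the learning rate with the effective batch size and stretch the warm-up accordingly; a hypothetical sketch:

```python
def scale_hparams(base_lr, base_warmup_steps, base_gpus, target_gpus):
    """Linear-scaling heuristic: with fewer GPUs the effective batch shrinks,
    so scale the learning rate down proportionally and lengthen the warm-up."""
    factor = target_gpus / base_gpus
    return base_lr * factor, int(base_warmup_steps / factor)

# Going from 8 GPUs at lr=0.0001 down to 4 GPUs:
lr, warmup = scale_hparams(1e-4, 4000, base_gpus=8, target_gpus=4)
print(lr, warmup)  # 5e-05 8000
```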
I looked into the notebook to embed a sentence:
https://github.com/facebookresearch/XLM/blob/master/generate-embeddings.ipynb
In cell In[6] there is this code:
sentences = [(('</s> %s </s>' % sent.strip()).split(), lang) for sent, lang in sentences]
Isn't it supposed to be '<s> %s </s>' instead? Am I misunderstanding something?
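For comparison, here is what the two delimiter choices produce (plain string manipulation, easy to check; whether '</s>' intentionally doubles as the BOS symbol depends on the model's dictionary and is worth confirming with the authors):

```python
# The notebook's preprocessing vs. the '<s> %s </s>' variant asked about:
def wrap_sentence(sent, bos="</s>", eos="</s>"):
    return ("%s %s %s" % (bos, sent.strip(), eos)).split()

print(wrap_sentence("hello world"))             # ['</s>', 'hello', 'world', '</s>']
print(wrap_sentence("hello world", bos="<s>"))  # ['<s>', 'hello', 'world', '</s>']
```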
Hello!
Thank you for sharing this code!
Is there an easy way to get the embedding of a particular word? (Those found in Table 5 of the paper.)
Thank you!
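If it helps, the usual pattern for a single word's vector is a dictionary lookup into the embedding matrix; a hypothetical sketch (word2id and embeddings are illustrative names, not the repo's actual attributes):

```python
# Toy vocabulary and a (vocab_size x dim) embedding matrix:
word2id = {"<s>": 0, "</s>": 1, "cat": 2, "dog": 3}
embeddings = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]

def word_vector(word):
    """A single word's embedding is one row of the embedding matrix."""
    return embeddings[word2id[word]]

print(word_vector("cat"))  # [0.5, 0.6]
```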
Dear authors,
Thank you so much for your code. I'm trying to reproduce the supervised MT results on WMT14 en-de. Training works fine with a single GPU (or multiple GPUs). However, I frequently hit an OOM error after one epoch, during the evaluate_mt() step. Here's the script I used and the error message:
python train.py --exp_name wmt14_ende --dump_path ./dumped/ --data_path ./data/processed/wmt14_de-en --lgs 'en-de' --encoder_only false --emb_dim 1024 --n_layers 6 --n_heads 8 --dropout 0.1 --attention_dropout 0.1 --gelu_activation true --tokens_per_batch 2000 --bptt 256 --optimizer adam_inverse_sqrt,beta1=0.9,beta2=0.98,lr=0.0001 --epoch_size 200000 --eval_bleu true --stopping_criterion 'valid_en-de_mt_bleu,10' --validation_metrics 'valid_en-de_mt_bleu' --mt_steps "en-de" --gpus '0'
(--gpus just indicates the GPU id to use)
Traceback (most recent call last):
File "train.py", line 325, in
main(params)
File "train.py", line 300, in main
scores = evaluator.run_all_evals(trainer)
File "/nfs/eecs-fserv/share/yangyil/XLM/src/evaluation/evaluator.py", line 181, in run_all_evals
self.evaluate_mt(scores, data_set, lang1, lang2, eval_bleu)
File "/nfs/eecs-fserv/share/yangyil/XLM/src/evaluation/evaluator.py", line 377, in evaluate_mt
word_scores, loss = decoder('predict', tensor=dec2, pred_mask=pred_mask, y=y, get_scores=True)
File "/nfs/stak/users/yangyil/shared/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/nfs/eecs-fserv/share/yangyil/XLM/src/model/transformer.py", line 313, in forward
return self.predict(**kwargs)
File "/nfs/eecs-fserv/share/yangyil/XLM/src/model/transformer.py", line 416, in predict
scores, loss = self.pred_layer(masked_tensor, y, get_scores)
File "/nfs/stak/users/yangyil/shared/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/nfs/eecs-fserv/share/yangyil/XLM/src/model/transformer.py", line 132, in forward
loss = F.cross_entropy(scores, y, reduction='elementwise_mean')
File "/nfs/stak/users/yangyil/shared/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py", line 1550, in cross_entropy
return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
File "/nfs/stak/users/yangyil/shared/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py", line 975, in log_softmax
return input.log_softmax(dim)
RuntimeError: CUDA error: out of memory
The OOM always happens inside F.cross_entropy(), although cross_entropy doesn't trigger it every time. Do you have any ideas for making this more stable?
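One workaround sometimes used for softmax/loss OOM (not a feature of this repo) is computing the loss over chunks of rows and summing; a pure-Python illustration that the chunked computation gives the same result while peak memory scales with the chunk size:

```python
import math

def log_softmax(row):
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def nll_sum(scores, targets):
    """Summed NLL over all rows at once (peak memory grows with rows x vocab)."""
    return -sum(log_softmax(row)[t] for row, t in zip(scores, targets))

def nll_sum_chunked(scores, targets, chunk=2):
    """Same loss computed chunk by chunk; peak memory scales with chunk size."""
    return sum(nll_sum(scores[i:i + chunk], targets[i:i + chunk])
               for i in range(0, len(scores), chunk))

scores = [[2.0, 0.5, -1.0], [0.1, 0.2, 0.3], [1.0, 1.0, 1.0]]
targets = [0, 2, 1]
assert abs(nll_sum(scores, targets) - nll_sum_chunked(scores, targets)) < 1e-9
```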
Another thing: I use PyTorch 0.4.1 and didn't experience #15, but if I update to 1.0.1 I hit another error, pytorch/pytorch#13273 (_queue_reduction() doesn't take a torch.distributed.ProcessGroupNCCL object).
Best.
Yilin
Line 52 in 14fe2d4
Hi @glample,
Can you explain why you make the assumption "SRC < TGT"?
I noticed it also in:
Line 307 in 14fe2d4
XLM/src/evaluation/evaluator.py, line 76 in 14fe2d4
XLM/src/evaluation/evaluator.py, line 95 in 14fe2d4
First, thanks for sharing your code!
I really appreciate it.
I have a question about pre-trained word embeddings for unsupervised NMT task.
While reviewing the code, I found that you never use pre-trained word embeddings
(since --reload_emb is empty).
If it is true that pre-trained word embeddings have not been used, is there a specific reason for not using them?
Thank You!
Hi @glample,
I pre-trained a language model and used it to train unsupervised NMT, but the BLEU score keeps getting lower. Is something wrong?
Details:
The language model:
INFO - 03/14/19 16:57:00 - 23:43:04 - ============ End of epoch 11 ============
INFO - 03/14/19 16:57:06 - 23:43:10 - epoch -> 11.000000
INFO - 03/14/19 16:57:06 - 23:43:10 - valid_mn_mlm_ppl -> 12.698742
INFO - 03/14/19 16:57:06 - 23:43:10 - valid_mn_mlm_acc -> 61.901453
INFO - 03/14/19 16:57:06 - 23:43:10 - valid_zh_mlm_ppl -> 482.045657
INFO - 03/14/19 16:57:06 - 23:43:10 - valid_zh_mlm_acc -> 24.392448
INFO - 03/14/19 16:57:06 - 23:43:10 - valid_mlm_ppl -> 247.372200
INFO - 03/14/19 16:57:06 - 23:43:10 - valid_mlm_acc -> 43.146951
INFO - 03/14/19 16:57:06 - 23:43:10 - test_mn_mlm_ppl -> 34.794975
INFO - 03/14/19 16:57:06 - 23:43:10 - test_mn_mlm_acc -> 52.602524
INFO - 03/14/19 16:57:06 - 23:43:10 - test_zh_mlm_ppl -> 124.785448
INFO - 03/14/19 16:57:06 - 23:43:10 - test_zh_mlm_acc -> 34.501062
INFO - 03/14/19 16:57:06 - 23:43:10 - test_mlm_ppl -> 79.790211
INFO - 03/14/19 16:57:06 - 23:43:10 - test_mlm_acc -> 43.551793
Unsupervised NMT:
python3.6.2 train.py --exp_name unsupMT_mnzh --dump_path ./dumped/ --exp_id '190315' --reload_model './dumped/my_mnzh_mlm/190313/best-valid_mlm_ppl.pth,./dumped/my_mnzh_mlm/190313/best-valid_mlm_ppl.pth' --data_path ./data/processed/mn-zh/ --lgs 'mn-zh' --ae_steps 'mn,zh' --bt_steps 'mn-zh-mn,zh-mn-zh' --word_shuffle 3 --word_dropout 0.1 --word_blank 0.1 --lambda_ae '0:1,100000:0.1,300000:0' --encoder_only false --emb_dim 768 --n_layers 6 --n_heads 8 --dropout 0.1 --attention_dropout 0.1 --gelu_activation true --tokens_per_batch 1000 --batch_size 16 --max_batch_size 64 --bptt 256 --optimizer adam_inverse_sqrt,beta1=0.9,beta2=0.98,lr=0.0001,weight_decay=0 --epoch_size 300000 --eval_bleu true --stopping_criterion 'valid_mn-zh_mt_bleu,10' --validation_metrics 'valid_mn-zh_mt_bleu'
INFO - 03/15/19 12:54:23 - 3:17:34 - ============ End of epoch 0 ============
INFO - 03/15/19 12:56:06 - 3:19:16 - BLEU ./dumped/unsupMT_mnzh/190315/hypotheses/hyp0.mn-zh.valid.txt ./dumped/unsupMT_mnzh/190315/hypotheses/ref.mn-zh.valid.txt : 0.180000
INFO - 03/15/19 12:58:15 - 3:21:25 - BLEU ./dumped/unsupMT_mnzh/190315/hypotheses/hyp0.zh-mn.valid.txt ./dumped/unsupMT_mnzh/190315/hypotheses/ref.zh-mn.valid.txt : 2.740000
INFO - 03/15/19 12:58:36 - 3:21:47 - BLEU ./dumped/unsupMT_mnzh/190315/hypotheses/hyp0.mn-zh.test.txt ./dumped/unsupMT_mnzh/190315/hypotheses/ref.mn-zh.test.txt : 0.000000
INFO - 03/15/19 12:59:01 - 3:22:12 - BLEU ./dumped/unsupMT_mnzh/190315/hypotheses/hyp0.zh-mn.test.txt ./dumped/unsupMT_mnzh/190315/hypotheses/ref.zh-mn.test.txt : 2.160000
INFO - 03/15/19 12:59:01 - 3:22:12 - epoch -> 0.000000
INFO - 03/15/19 12:59:01 - 3:22:12 - valid_mn-zh_mt_ppl -> 6020.106288
INFO - 03/15/19 12:59:01 - 3:22:12 - valid_mn-zh_mt_acc -> 9.684522
INFO - 03/15/19 12:59:01 - 3:22:12 - valid_mn-zh_mt_bleu -> 0.180000
INFO - 03/15/19 12:59:01 - 3:22:12 - valid_zh-mn_mt_ppl -> 146.305114
INFO - 03/15/19 12:59:01 - 3:22:12 - valid_zh-mn_mt_acc -> 40.263721
INFO - 03/15/19 12:59:01 - 3:22:12 - valid_zh-mn_mt_bleu -> 2.740000
INFO - 03/15/19 12:59:01 - 3:22:12 - test_mn-zh_mt_ppl -> 6059.479785
INFO - 03/15/19 12:59:01 - 3:22:12 - test_mn-zh_mt_acc -> 12.168889
INFO - 03/15/19 12:59:01 - 3:22:12 - test_mn-zh_mt_bleu -> 0.000000
INFO - 03/15/19 12:59:01 - 3:22:12 - test_zh-mn_mt_ppl -> 488.040713
INFO - 03/15/19 12:59:01 - 3:22:12 - test_zh-mn_mt_acc -> 34.044409
INFO - 03/15/19 12:59:01 - 3:22:12 - test_zh-mn_mt_bleu -> 2.160000
INFO - 03/16/19 06:23:05 - 20:46:16 - ============ End of epoch 5 ============
INFO - 03/16/19 06:25:41 - 20:48:51 - BLEU ./dumped/unsupMT_mnzh/190315/hypotheses/hyp5.mn-zh.valid.txt ./dumped/unsupMT_mnzh/190315/hypotheses/ref.mn-zh.valid.txt : 0.000000
INFO - 03/16/19 06:27:31 - 20:50:41 - BLEU ./dumped/unsupMT_mnzh/190315/hypotheses/hyp5.zh-mn.valid.txt ./dumped/unsupMT_mnzh/190315/hypotheses/ref.zh-mn.valid.txt : 0.280000
INFO - 03/16/19 06:27:58 - 20:51:09 - BLEU ./dumped/unsupMT_mnzh/190315/hypotheses/hyp5.mn-zh.test.txt ./dumped/unsupMT_mnzh/190315/hypotheses/ref.mn-zh.test.txt : 0.000000
INFO - 03/16/19 06:28:22 - 20:51:33 - BLEU ./dumped/unsupMT_mnzh/190315/hypotheses/hyp5.zh-mn.test.txt ./dumped/unsupMT_mnzh/190315/hypotheses/ref.zh-mn.test.txt : 0.920000
INFO - 03/16/19 06:28:22 - 20:51:33 - epoch -> 5.000000
INFO - 03/16/19 06:28:22 - 20:51:33 - valid_mn-zh_mt_ppl -> 9263.390210
INFO - 03/16/19 06:28:22 - 20:51:33 - valid_mn-zh_mt_acc -> 7.963293
INFO - 03/16/19 06:28:22 - 20:51:33 - valid_mn-zh_mt_bleu -> 0.000000
INFO - 03/16/19 06:28:22 - 20:51:33 - valid_zh-mn_mt_ppl -> 195.211674
INFO - 03/16/19 06:28:22 - 20:51:33 - valid_zh-mn_mt_acc -> 36.910448
INFO - 03/16/19 06:28:22 - 20:51:33 - valid_zh-mn_mt_bleu -> 0.280000
INFO - 03/16/19 06:28:22 - 20:51:33 - test_mn-zh_mt_ppl -> 9938.071239
INFO - 03/16/19 06:28:22 - 20:51:33 - test_mn-zh_mt_acc -> 6.666667
INFO - 03/16/19 06:28:22 - 20:51:33 - test_mn-zh_mt_bleu -> 0.000000
INFO - 03/16/19 06:28:22 - 20:51:33 - test_zh-mn_mt_ppl -> 619.158340
INFO - 03/16/19 06:28:22 - 20:51:33 - test_zh-mn_mt_acc -> 32.541759
INFO - 03/16/19 06:28:22 - 20:51:33 - test_zh-mn_mt_bleu -> 0.920000
I added --local_rank, but it raises an error.
SLURM job: False
Traceback (most recent call last):
File "train.py", line 322, in
main(params)
File "train.py", line 198, in main
init_distributed_mode(params)
File "XLM/src/slurm.py", line 110, in init_distributed_mode
params.global_rank = int(os.environ['RANK'])
File "/usr/lib/python3.5/os.py", line 725, in __getitem__
raise KeyError(key) from None
KeyError: 'RANK'
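For context, torch.distributed.launch is what sets RANK (and WORLD_SIZE) in the environment; passing --local_rank to train.py directly, without the launcher, leaves them unset, which is exactly this KeyError. A defensive read could look like this (a sketch, not the repo's actual slurm.py logic):

```python
import os

def read_rank_env():
    """Fall back to single-process defaults when the launcher's
    RANK / WORLD_SIZE environment variables are missing."""
    rank = int(os.environ.get("RANK", 0))
    world_size = int(os.environ.get("WORLD_SIZE", 1))
    return rank, world_size

# Simulate running without torch.distributed.launch:
os.environ.pop("RANK", None)
os.environ.pop("WORLD_SIZE", None)
print(read_rank_env())  # (0, 1)
```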
Hi, @glample. Thank you for your nice contribution.
I noticed that the released demo only uses 5M monolingual sentences. I tried it and it seems it cannot reach the accuracy reported in the paper, but I would like to know what accuracy it should reach with 5M monolingual sentences (just for reference). Can you provide some help?
Hi, how many GPUs are used when training a model with the MLM objective?
Hi,
did you do truecasing/lowercasing in your MT experiments? I can't find any sign of it in the code.
Is there a specific reason to do / not do it?
Thanks
Where can I find the lowercase_and_remove_accent.py file?
Hi @glample,
I pretrained a model with the MLM objective for Mongolian and Chinese, but when I use the pretrained model for mn-zh machine translation, an error occurs. I tried reducing --batch_size from the default 32 to 16, 8, 4, 2, and 1, but that didn't help. Do you have any good solutions to share?
The pretrained result is:
INFO - 02/28/19 09:47:19 - 1 day, 1:01:20 - ============ End of epoch 7 ============
INFO - 02/28/19 09:47:30 - 1 day, 1:01:31 - epoch -> 7.000000
INFO - 02/28/19 09:47:30 - 1 day, 1:01:31 - valid_mn_mlm_ppl -> 20.055305
INFO - 02/28/19 09:47:30 - 1 day, 1:01:31 - valid_mn_mlm_acc -> 56.151420
INFO - 02/28/19 09:47:30 - 1 day, 1:01:31 - valid_zh_mlm_ppl -> 1813.456839
INFO - 02/28/19 09:47:30 - 1 day, 1:01:31 - valid_zh_mlm_acc -> 28.312303
INFO - 02/28/19 09:47:30 - 1 day, 1:01:31 - valid_mlm_ppl -> 916.756072
INFO - 02/28/19 09:47:30 - 1 day, 1:01:31 - valid_mlm_acc -> 42.231861
INFO - 02/28/19 09:47:30 - 1 day, 1:01:31 - test_mn_mlm_ppl -> 8.259349
INFO - 02/28/19 09:47:30 - 1 day, 1:01:31 - test_mn_mlm_acc -> 65.375485
INFO - 02/28/19 09:47:30 - 1 day, 1:01:31 - test_zh_mlm_ppl -> 11569.002599
INFO - 02/28/19 09:47:30 - 1 day, 1:01:31 - test_zh_mlm_acc -> 15.452244
INFO - 02/28/19 09:47:30 - 1 day, 1:01:31 - test_mlm_ppl -> 5788.630974
INFO - 02/28/19 09:47:30 - 1 day, 1:01:31 - test_mlm_acc -> 40.413864
Training unsupervised MT from the pretrained model:
python train.py --exp_name unsupMT_mnzh --dump_path ./dumped/ --reload_model 'best-valid_mlm_ppl.pth,best-valid_mlm_ppl.pth' --data_path ./data/processed/mn-zh/ --lgs 'mn-zh' --ae_steps 'mn,zh' --bt_steps 'mn-zh-mn,zh-mn-zh' --word_shuffle 3 --word_dropout 0.1 --word_blank 0.1 --lambda_ae '0:1,100000:0.1,300000:0' --encoder_only false --emb_dim 1024 --n_layers 6 --n_heads 8 --dropout 0.1 --attention_dropout 0.1 --gelu_activation true --tokens_per_batch 2000 --batch_size 16 --bptt 256 --optimizer adam_inverse_sqrt,beta1=0.9,beta2=0.999,lr=0.0001 --epoch_size 300000 --eval_bleu true --stopping_criterion 'valid_mn-zh_mt_bleu,10' --validation_metrics 'valid_mn-zh_mt_bleu'
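Since lowering --batch_size alone didn't help, note that with --tokens_per_batch set, batches are sized by token count rather than sentence count, so lowering it is the usual memory lever. A hypothetical sketch of that kind of token bucketing (the repo's actual batching logic is more involved):

```python
def token_batches(lengths, tokens_per_batch):
    """Greedily group sentences (sorted by length) so that the padded size
    batch_size * max_len never exceeds tokens_per_batch."""
    batches, cur = [], []
    for n in sorted(lengths):
        if cur and (len(cur) + 1) * max(cur + [n]) > tokens_per_batch:
            batches.append(cur)
            cur = []
        cur.append(n)
    if cur:
        batches.append(cur)
    return batches

print(token_batches([5, 50, 7, 12, 48], tokens_per_batch=100))  # [[5, 7, 12], [48, 50]]
```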
Hi @glample,
The accuracy drops to 0 at the end of the second epoch when I pretrain a model with the MLM objective for Mongolian and Chinese. Is my preprocessing method inappropriate?
details:
python train.py --exp_name 'my_mnzh_mlm' --dump_path './dumped/' --exp_id '190225' --data_path './data/processed/mn-zh/' --lgs 'mn-zh' --clm_steps '' --mlm_steps 'mn,zh' --emb_dim '1024' --n_layers '6' --n_heads '8' --dropout '0.2' --attention_dropout '0.2' --gelu_activation 'true' --batch_size '16' --bptt '256' --optimizer 'adam,lr=0.0001' --epoch_size '300000' --validation_metrics '_valid_mlm_ppl' --stopping_criterion '_valid_mlm_ppl,10'
INFO - 02/25/19 13:21:37 - 3:07:50 - ============ End of epoch 0 ============
INFO - 02/25/19 13:21:48 - 3:08:01 - epoch -> 0.000000
INFO - 02/25/19 13:21:48 - 3:08:01 - valid_mn_mlm_ppl -> 574.678424
INFO - 02/25/19 13:21:48 - 3:08:01 - valid_mn_mlm_acc -> 17.192429
INFO - 02/25/19 13:21:48 - 3:08:01 - valid_zh_mlm_ppl -> 5591.294827
INFO - 02/25/19 13:21:48 - 3:08:01 - valid_zh_mlm_acc -> 14.550473
INFO - 02/25/19 13:21:48 - 3:08:01 - valid_mlm_ppl -> 3082.986625
INFO - 02/25/19 13:21:48 - 3:08:01 - valid_mlm_acc -> 15.871451
INFO - 02/25/19 13:21:48 - 3:08:01 - test_mn_mlm_ppl -> 436.168551
INFO - 02/25/19 13:21:48 - 3:08:01 - test_mn_mlm_acc -> 13.728215
INFO - 02/25/19 13:21:48 - 3:08:01 - test_zh_mlm_ppl -> 32195.137737
INFO - 02/25/19 13:21:48 - 3:08:01 - test_zh_mlm_acc -> 7.138838
INFO - 02/25/19 13:21:48 - 3:08:01 - test_mlm_ppl -> 16315.653144
INFO - 02/25/19 13:21:48 - 3:08:01 - test_mlm_acc -> 10.433527
INFO - 02/25/19 16:29:17 - 6:15:30 - ============ End of epoch 1 ============
INFO - 02/25/19 16:29:28 - 6:15:41 - epoch -> 1.000000
INFO - 02/25/19 16:29:28 - 6:15:41 - valid_mn_mlm_ppl -> 966.486405
INFO - 02/25/19 16:29:28 - 6:15:41 - valid_mn_mlm_acc -> 7.886435
INFO - 02/25/19 16:29:28 - 6:15:41 - valid_zh_mlm_ppl -> 8967.092445
INFO - 02/25/19 16:29:28 - 6:15:41 - valid_zh_mlm_acc -> 0.000000
INFO - 02/25/19 16:29:28 - 6:15:41 - valid_mlm_ppl -> 4966.789425
INFO - 02/25/19 16:29:28 - 6:15:41 - valid_mlm_acc -> 3.943218
INFO - 02/25/19 16:29:28 - 6:15:41 - test_mn_mlm_ppl -> 808.229061
INFO - 02/25/19 16:29:28 - 6:15:41 - test_mn_mlm_acc -> 12.853917
INFO - 02/25/19 16:29:28 - 6:15:41 - test_zh_mlm_ppl -> 43495.881859
INFO - 02/25/19 16:29:28 - 6:15:41 - test_zh_mlm_acc -> 0.000000
INFO - 02/25/19 16:29:28 - 6:15:41 - test_mlm_ppl -> 22152.055460
INFO - 02/25/19 16:29:28 - 6:15:41 - test_mlm_acc -> 6.426958
Hi,
I ran the MLM pretraining for en-fr using the default arguments.
I noticed that while the model learns with learned positional embeddings, with sinusoidal embeddings it completely fails to learn and the validation accuracy stays around 5%.
Did you face similar issues when using sinusoidal embeddings?
Thanks!
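For reference, the fixed sinusoidal encodings in question follow the "Attention Is All You Need" formulation; a minimal pure-Python sketch (even embedding dimension assumed):

```python
import math

def sinusoidal_embedding(max_len, dim):
    """PE[pos, 2i] = sin(pos / 10000^(2i/dim)), PE[pos, 2i+1] = cos(same angle)."""
    pe = [[0.0] * dim for _ in range(max_len)]
    for pos in range(max_len):
        for i in range(0, dim, 2):
            angle = pos / (10000 ** (i / dim))
            pe[pos][i] = math.sin(angle)
            pe[pos][i + 1] = math.cos(angle)
    return pe

pe = sinusoidal_embedding(4, 8)
print(pe[0])  # position 0 alternates sin(0)=0.0 and cos(0)=1.0 across frequencies
```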
Thanks for your work. I have a question: when training on language pairs with great differences, such as Chinese-English or English-Kazakh, is sharing all parameters a good choice? I notice that XLM usually shares all parameters.
Hello! Do you happen to have a translate.py script so that the model can be used to translate new data? I saw the --eval_only parameter, but it seems the file to be translated has to be named according to the conventions specified in the trainer (and the data folder has to contain all the training/validation files too). The evaluator also appears to use the target-language file to get the maximum sentence length, which we shouldn't have access to when translating a new document.
Thanks for your help!
Hi,
I ran the XNLI fine-tuning task (with MLM+TLM) and got an average accuracy of 73.5 (compared to 75.1 in your paper). The code generated params.pkl; however, I could not find the fine-tuned model. How do I save the model after fine-tuning (or after every epoch of fine-tuning)?
Hi,
How can I reload the checkpoint and model file to continue from the last epoch reached in a previous (aborted) run? I want to do this both in the pretraining stage and in the training stage.
Thanks,
Odel
Hi, thanks for your work. Could you also release the XNLI-15 model trained only with the MLM objective?
Hi,
Thanks a lot for the awesome project~
I appended an MLP after the XLM sentence embedding to build a QA model. But after running for a while (single GPU, 21000 steps, batch size 8), it blocks on the loss.backward() step without any error message. On 4 GPUs it blocks sooner (around 4200 steps, 4x8 batch size). Could you please give some hints on how I can fix this?
Thanks a lot~
Hi, I noticed in the code that fp16 training is manually disabled for the machine translation and back-translation updates with assert False statements.
Specifically, I am trying to use the MT step. I commented out the assert and added retain_graph=True to the first backward call, but after doing this my throughput was actually lower than without fp16 enabled.
Can you help me set up fp16 training correctly for the MT step?
Hi, there were a couple of bugs in the get-data-xnli.sh script related to file paths. The following is the fix:
comment out "mkdir -p $XNLI_PATH" (line 29) -- creating this directory prevents XNLI-1.0.zip from being downloaded
replace
mkdir -p $PROCESSED_PATH/eval/XNLI
rm $PROCESSED_PATH/eval/XNLI/* -- this raises "cannot remove...no such file..."
with
if [ -d $PROCESSED_PATH/eval/XNLI ]; then
  rm -rf $PROCESSED_PATH/eval/XNLI
fi
mkdir -p $PROCESSED_PATH/eval/XNLI
Hi, thanks for your work. When I reload the decoder from the pretrained mlm_1024.pth, the following warnings are raised:
WARNING - 03/03/19 10:07:57 - 0:00:37 - Parameter layer_norm15.0.weight not found.
WARNING - 03/03/19 10:07:57 - 0:00:37 - Parameter layer_norm15.0.bias not found.
WARNING - 03/03/19 10:07:57 - 0:00:37 - Parameter encoder_attn.0.q_lin.weight not found.
WARNING - 03/03/19 10:07:57 - 0:00:37 - Parameter encoder_attn.0.q_lin.bias not found.
WARNING - 03/03/19 10:07:57 - 0:00:37 - Parameter encoder_attn.0.k_lin.weight not found.
...
Line 151 in ffc64a9
Hi,
thanks for sharing your code!
I'm just wondering if you have implemented the subsampling of frequent outputs (I can't find it in your code), and whether it was crucial for performance.
Cheers,
Stephan
Hi, when the program ends, the memory on GPU 0 is released, but the memory on the other GPUs is not. Why is that?
Thanks for sharing this great work. Could you share the examples for supervised machine translation? Thank you.
Hi, do we need to ignore the loss on padding tokens when doing back-translation? It seems the code doesn't ignore padding when calculating the loss. Thank you very much.
Line 824 in 14fe2d4
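For what it's worth, "ignoring padding" usually means averaging the NLL over non-pad target positions only (cf. ignore_index in F.cross_entropy; in this codebase a pred_mask selects the scored positions, as seen in evaluate_mt above). A toy sketch of the masked loss, with illustrative names:

```python
import math

def masked_nll(log_probs, targets, pad_index):
    """Average negative log-likelihood over targets that are not padding."""
    total, count = 0.0, 0
    for lp, t in zip(log_probs, targets):
        if t == pad_index:
            continue  # padding positions contribute nothing to the loss
        total -= lp[t]
        count += 1
    return total / max(count, 1)

log_probs = [[math.log(0.5), math.log(0.5)], [math.log(0.9), math.log(0.1)]]
# With pad_index=1, the second target is excluded from the loss:
print(masked_nll(log_probs, targets=[0, 1], pad_index=1))  # -log(0.5) ~ 0.6931
```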
The paper reports a best en-fr BLEU of 33.4. The README shows:
'epoch -> 7
valid_fr-en_mt_bleu -> 28.36
valid_en-fr_mt_bleu -> 30.50
test_fr-en_mt_bleu -> 34.02
test_en-fr_mt_bleu -> 36.62'.
Does this result come from the max_len parameter, which removes long sentences from the parallel test corpus?
Hi, @glample
I trained with a single GPU and ran into the problem shown below.
Running commands:
First: ./get-data-nmt.sh --src en --tgt fr
which produced:
===== Data summary
Monolingual training data:
en: ./data/processed/en-fr/train.en.pth
fr: ./data/processed/en-fr/train.fr.pth
Monolingual validation data:
en: ./data/processed/en-fr/valid.en.pth
fr: ./data/processed/en-fr/valid.fr.pth
Monolingual test data:
en: ./data/processed/en-fr/test.en.pth
fr: ./data/processed/en-fr/test.fr.pth
Parallel validation data:
en: ./data/processed/en-fr/valid.en-fr.en.pth
fr: ./data/processed/en-fr/valid.en-fr.fr.pth
Parallel test data:
en: ./data/processed/en-fr/test.en-fr.en.pth
fr: ./data/processed/en-fr/test.en-fr.fr.pth
And then run: python train.py --exp_name 'my_enfr_mlm' --dump_path './dumped/' --exp_id 'bs.20' --data_path './data/processed/en-fr/' --lgs 'en-fr' --clm_steps '' --mlm_steps 'en,fr' --emb_dim '1024' --n_layers '6' --n_heads '8' --dropout '0.1' --attention_dropout '0.1' --gelu_activation 'true' --batch_size '8' --bptt '256' --optimizer 'adam,lr=0.0001' --epoch_size '300000' --validation_metrics '_valid_mlm_ppl' --stopping_criterion '_valid_mlm_ppl,10'
and got the error. Details:
Running command: python train.py --exp_name 'my_enfr_mlm' --dump_path './dumped/' --exp_id 'bs.20' --data_path './data/processed/en-fr/' --lgs 'en-fr' --clm_steps '' --mlm_steps 'en,fr' --emb_dim '1024' --n_layers '6' --n_heads '8' --dropout '0.1' --attention_dropout '0.1' --gelu_activation 'true' --batch_size '32' --bptt '256' --optimizer 'adam,lr=0.0001' --epoch_size '200000' --validation_metrics '_valid_mlm_ppl' --stopping_criterion '_valid_mlm_ppl,10'
INFO - 02/20/19 17:14:59 - 0:50:54 - valid_en_mlm_ppl -> 1413.372916
INFO - 02/20/19 17:14:59 - 0:50:54 - log:{"epoch": 0, "valid_en_mlm_ppl": 1413.3729161899485, "valid_en_mlm_acc": 4.681079149544399, "valid_fr_mlm_ppl": 1137.9702763241598, "valid_fr_mlm_acc": 4.591462520170163, "valid_mlm_ppl": 1275.6715962570543, "valid_mlm_acc": 4.636270834857281, "test_en_mlm_ppl": 1377.6397512089368, "test_en_mlm_acc": 4.500805152979066, "test_fr_mlm_ppl": 1547.092026693417, "test_fr_mlm_acc": 4.81150066011442, "test_mlm_ppl": 1462.3658889511769, "test_mlm_acc": 4.656152906546742}
INFO - 02/20/19 18:05:31 - 1:41:26 - valid_en_mlm_ppl -> 2161.567965
INFO - 02/20/19 18:05:31 - 1:41:26 - log:{"epoch": 1, "valid_en_mlm_ppl": 2161.56796481175, "valid_en_mlm_acc": 5.074146864391638, "valid_fr_mlm_ppl": 1688.979616470098, "valid_fr_mlm_acc": 4.254070705588969, "valid_mlm_ppl": 1925.2737906409238, "valid_mlm_acc": 4.664108784990304, "test_en_mlm_ppl": 2062.9860141920476, "test_en_mlm_acc": 4.186795491143317, "test_fr_mlm_ppl": 2497.6693821048448, "test_fr_mlm_acc": 3.9533519143318174, "test_mlm_ppl": 2280.327698148446, "test_mlm_acc": 4.0700737027375675}
INFO - 02/20/19 18:56:00 - 2:31:55 - valid_en_mlm_ppl -> 2245.817440
INFO - 02/20/19 18:56:00 - 2:31:55 - log:{"epoch": 2, "valid_en_mlm_ppl": 2245.8174404810325, "valid_en_mlm_acc": 5.074146864391638, "valid_fr_mlm_ppl": 1625.404408585545, "valid_fr_mlm_acc": 4.254070705588969, "valid_mlm_ppl": 1935.6109245332887, "valid_mlm_acc": 4.664108784990304, "test_en_mlm_ppl": 2138.2897057505943, "test_en_mlm_acc": 4.186795491143317, "test_fr_mlm_ppl": 2388.5677765876662, "test_fr_mlm_acc": 3.9533519143318174, "test_mlm_ppl": 2263.4287411691303, "test_mlm_acc": 4.0700737027375675}
INFO - 02/20/19 19:46:31 - 3:22:26 - valid_en_mlm_ppl -> 2165.622311
INFO - 02/20/19 19:46:31 - 3:22:26 - log:{"epoch": 3, "valid_en_mlm_ppl": 2165.6223114703407, "valid_en_mlm_acc": 5.074146864391638, "valid_fr_mlm_ppl": 1680.1268854516293, "valid_fr_mlm_acc": 4.254070705588969, "valid_mlm_ppl": 1922.874598460985, "valid_mlm_acc": 4.664108784990304, "test_en_mlm_ppl": 2075.5851921823105, "test_en_mlm_acc": 4.186795491143317, "test_fr_mlm_ppl": 2465.9347158442074, "test_fr_mlm_acc": 3.9533519143318174, "test_mlm_ppl": 2270.7599540132587, "test_mlm_acc": 4.0700737027375675}
INFO - 02/20/19 20:37:00 - 4:12:55 - valid_en_mlm_ppl -> 2062.631943
INFO - 02/20/19 20:37:00 - 4:12:55 - log:{"epoch": 4, "valid_en_mlm_ppl": 2062.6319433943568, "valid_en_mlm_acc": 5.074146864391638, "valid_fr_mlm_ppl": 1765.4204690043236, "valid_fr_mlm_acc": 4.254070705588969, "valid_mlm_ppl": 1914.0262061993403, "valid_mlm_acc": 4.664108784990304, "test_en_mlm_ppl": 1966.636764557332, "test_en_mlm_acc": 4.186795491143317, "test_fr_mlm_ppl": 2606.315150449565, "test_fr_mlm_acc": 3.9533519143318174, "test_mlm_ppl": 2286.4759575034486, "test_mlm_acc": 4.0700737027375675}
INFO - 02/20/19 21:27:28 - 5:03:23 - valid_en_mlm_ppl -> 2151.624741
INFO - 02/20/19 21:27:28 - 5:03:23 - log:{"epoch": 5, "valid_en_mlm_ppl": 2151.624740528933, "valid_en_mlm_acc": 5.074146864391638, "valid_fr_mlm_ppl": 1690.7461604349478, "valid_fr_mlm_acc": 4.254070705588969, "valid_mlm_ppl": 1921.1854504819405, "valid_mlm_acc": 4.664108784990304, "test_en_mlm_ppl": 2054.5326346790675, "test_en_mlm_acc": 4.186795491143317, "test_fr_mlm_ppl": 2479.448594677353, "test_fr_mlm_acc": 3.9533519143318174, "test_mlm_ppl": 2266.9906146782105, "test_mlm_acc": 4.0700737027375675}
INFO - 02/20/19 22:17:56 - 5:53:51 - valid_en_mlm_ppl -> 2155.638091
INFO - 02/20/19 22:17:56 - 5:53:51 - log:{"epoch": 6, "valid_en_mlm_ppl": 2155.6380909977584, "valid_en_mlm_acc": 5.074146864391638, "valid_fr_mlm_ppl": 1699.0517872173994, "valid_fr_mlm_acc": 4.254070705588969, "valid_mlm_ppl": 1927.3449391075787, "valid_mlm_acc": 4.664108784990304, "test_en_mlm_ppl": 2053.9586330892766, "test_en_mlm_acc": 4.186795491143317, "test_fr_mlm_ppl": 2483.16693279636, "test_fr_mlm_acc": 3.9533519143318174, "test_mlm_ppl": 2268.5627829428186, "test_mlm_acc": 4.0700737027375675}
INFO - 02/20/19 23:08:23 - 6:44:18 - valid_en_mlm_ppl -> 2133.608678
INFO - 02/20/19 23:08:23 - 6:44:18 - log:{"epoch": 7, "valid_en_mlm_ppl": 2133.608678409897, "valid_en_mlm_acc": 5.074146864391638, "valid_fr_mlm_ppl": 1695.3582695161938, "valid_fr_mlm_acc": 4.254070705588969, "valid_mlm_ppl": 1914.4834739630455, "valid_mlm_acc": 4.664108784990304, "test_en_mlm_ppl": 2038.1278812563512, "test_en_mlm_acc": 4.186795491143317, "test_fr_mlm_ppl": 2492.9029435971656, "test_fr_mlm_acc": 3.9533519143318174, "test_mlm_ppl": 2265.5154124267583, "test_mlm_acc": 4.0700737027375675}
INFO - 02/20/19 23:58:51 - 7:34:46 - valid_en_mlm_ppl -> 2065.049633
INFO - 02/20/19 23:58:51 - 7:34:46 - log:{"epoch": 8, "valid_en_mlm_ppl": 2065.049632547123, "valid_en_mlm_acc": 5.074146864391638, "valid_fr_mlm_ppl": 1770.2985750724292, "valid_fr_mlm_acc": 4.254070705588969, "valid_mlm_ppl": 1917.6741038097762, "valid_mlm_acc": 4.664108784990304, "test_en_mlm_ppl": 1973.5921541087191, "test_en_mlm_acc": 4.186795491143317, "test_fr_mlm_ppl": 2588.5655595835324, "test_fr_mlm_acc": 3.9533519143318174, "test_mlm_ppl": 2281.0788568461257, "test_mlm_acc": 4.0700737027375675}
INFO - 02/21/19 00:49:20 - 8:25:15 - valid_en_mlm_ppl -> 2177.331599
INFO - 02/21/19 00:49:20 - 8:25:15 - log:{"epoch": 9, "valid_en_mlm_ppl": 2177.331599451264, "valid_en_mlm_acc": 5.074146864391638, "valid_fr_mlm_ppl": 1664.960476646684, "valid_fr_mlm_acc": 4.254070705588969, "valid_mlm_ppl": 1921.1460380489739, "valid_mlm_acc": 4.664108784990304, "test_en_mlm_ppl": 2081.1290653201354, "test_en_mlm_acc": 4.186795491143317, "test_fr_mlm_ppl": 2436.2827245826775, "test_fr_mlm_acc": 3.9533519143318174, "test_mlm_ppl": 2258.7058949514067, "test_mlm_acc": 4.0700737027375675}
INFO - 02/21/19 01:39:46 - 9:15:41 - valid_en_mlm_ppl -> 2110.860061
INFO - 02/21/19 01:39:46 - 9:15:41 - log:{"epoch": 10, "valid_en_mlm_ppl": 2110.8600607294125, "valid_en_mlm_acc": 5.074146864391638, "valid_fr_mlm_ppl": 1716.5880506037283, "valid_fr_mlm_acc": 4.254070705588969, "valid_mlm_ppl": 1913.7240556665704, "valid_mlm_acc": 4.664108784990304, "test_en_mlm_ppl": 2007.549178045412, "test_en_mlm_acc": 4.186795491143317, "test_fr_mlm_ppl": 2522.7412353839986, "test_fr_mlm_acc": 3.9533519143318174, "test_mlm_ppl": 2265.145206714705, "test_mlm_acc": 4.0700737027375675}
INFO - 02/21/19 02:30:13 - 10:06:08 - valid_en_mlm_ppl -> 2208.660441
INFO - 02/21/19 02:30:13 - 10:06:08 - log:{"epoch": 11, "valid_en_mlm_ppl": 2208.6604406115257, "valid_en_mlm_acc": 5.074146864391638, "valid_fr_mlm_ppl": 1656.203270846642, "valid_fr_mlm_acc": 4.254070705588969, "valid_mlm_ppl": 1932.431855729084, "valid_mlm_acc": 4.664108784990304, "test_en_mlm_ppl": 2111.8613551170783, "test_en_mlm_acc": 4.186795491143317, "test_fr_mlm_ppl": 2405.011263807759, "test_fr_mlm_acc": 3.9533519143318174, "test_mlm_ppl": 2258.4363094624186, "test_mlm_acc": 4.0700737027375675}
INFO - 02/20/19 16:24:05 - 0:00:00 - ============ Monolingual data (en)
INFO - 02/20/19 16:24:05 - 0:00:00 - Loading data from ./data/processed/en-fr/train.en.pth ...
INFO - 02/20/19 16:24:06 - 0:00:01 - 129033877 words (64139 unique) in 5000000 sentences. 0 unknown words (0 unique) covering 0.00% of the data.
INFO - 02/20/19 16:24:08 - 0:00:03 - Loading data from ./data/processed/en-fr/valid.en.pth ...
INFO - 02/20/19 16:24:08 - 0:00:03 - 69727 words (64139 unique) in 3000 sentences. 1 unknown words (1 unique) covering 0.00% of the data.
INFO - 02/20/19 16:24:08 - 0:00:03 - Loading data from ./data/processed/en-fr/test.en.pth ...
INFO - 02/20/19 16:24:09 - 0:00:03 - 76017 words (64139 unique) in 3003 sentences. 0 unknown words (0 unique) covering 0.00% of the data.
INFO - 02/20/19 16:24:09 - 0:00:04 - ============ Monolingual data (fr)
INFO - 02/20/19 16:24:09 - 0:00:04 - Loading data from ./data/processed/en-fr/train.fr.pth ...
INFO - 02/20/19 16:24:09 - 0:00:04 - 130884578 words (64139 unique) in 5000000 sentences. 0 unknown words (0 unique) covering 0.00% of the data.
INFO - 02/20/19 16:24:12 - 0:00:06 - Loading data from ./data/processed/en-fr/valid.fr.pth ...
INFO - 02/20/19 16:24:12 - 0:00:07 - 79585 words (64139 unique) in 3000 sentences. 1 unknown words (1 unique) covering 0.00% of the data.
INFO - 02/20/19 16:24:12 - 0:00:07 - Loading data from ./data/processed/en-fr/test.fr.pth ...
INFO - 02/20/19 16:24:12 - 0:00:07 - 86351 words (64139 unique) in 3003 sentences. 0 unknown words (0 unique) covering 0.00% of the data.
INFO - 02/20/19 16:24:13 - 0:00:08 - ============ Data summary
INFO - 02/20/19 16:24:13 - 0:00:08 - Monolingual data - train - en: 5000000
INFO - 02/20/19 16:24:13 - 0:00:08 - Monolingual data - valid - en: 3000
INFO - 02/20/19 16:24:13 - 0:00:08 - Monolingual data - test - en: 3003
INFO - 02/20/19 16:24:13 - 0:00:08 - Monolingual data - train - fr: 5000000
INFO - 02/20/19 16:24:13 - 0:00:08 - Monolingual data - valid - fr: 3000
INFO - 02/20/19 16:24:13 - 0:00:08 - Monolingual data - test - fr: 3003
Great work and article! Just curious: how is XLM different from multilingual BERT (https://github.com/google-research/bert/blob/master/multilingual.md) in terms of pre-training objectives and methods?
Hi,
I am trying to replicate the supervised MT ro -> en baseline of 28.4 mentioned in the paper, and I was hoping you could give me some idea of the hyperparameters used.
Specifically, could you share the number of BPE operations, the learning rate and learning rate schedule, the dropout and attention dropout values, the embedding size, the batch size, and the number of GPUs used during training?
Thanks!
I tried to run several multi-GPU programs on a single server, but I encountered this problem:
RuntimeError: Address already in use at /pytorch/torch/lib/THD/process_group/General.cpp:17
So, if I have 4 GPUs on a single server and want to run two programs on GPUs 0,1 and 2,3, how should I set the local_rank and master_port parameters? @glample
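One way this is commonly done (a sketch, not an official recipe from this repo): give each job its own rendezvous port via torch.distributed.launch's --master_port flag and restrict the GPUs each job sees with CUDA_VISIBLE_DEVICES, so both jobs use local ranks 0 and 1. The port numbers and GPU split below are arbitrary, and `train.py ...` stands for your usual training arguments.

```shell
# Hypothetical two-job split on one 4-GPU server. 29501/29502 are arbitrary
# free ports; local_rank is assigned per process by the launcher, so only
# the port and the visible GPUs need to differ between the two jobs.
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch \
    --nproc_per_node=2 --master_port=29501 train.py ... &
CUDA_VISIBLE_DEVICES=2,3 python -m torch.distributed.launch \
    --nproc_per_node=2 --master_port=29502 train.py ... &
```

With CUDA_VISIBLE_DEVICES set, each process numbers its visible devices from 0, so the launcher's default device mapping works unchanged in both jobs.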
Hi
Thank you for the great contribution
Is there any script to download data of GLUE tasks?
Dear authors,
I understand this repo isn't primarily aimed at supervised MT. But your codebase contains a Transformer encoder-decoder model and, more importantly, is much simpler than the standard supervised MT codebases (e.g. T2T, Fairseq, OpenNMT).
With the intention of reproducing the WMT14 En-De SOTA, I used the data & BPE from Fairseq and trained the Transformer base (emb_dim=512) with only mt_steps="en-de" on 4x 2080 Ti (with one GPU the result is even lower). I finally got a tokenized BLEU score of 25.63 with beam_size 4 and length_penalty 0.6, which is more than 1 BLEU below what the Transformer paper reports.
Training script:
export NGPU=4; python -m torch.distributed.launch --nproc_per_node=$NGPU train.py --exp_name wmt14_ende --dump_path ./dumped/ --data_path ./data/processed/wmt14_de-en/fairseq --lgs 'en-de' --encoder_only false --emb_dim 512 --n_layers 6 --n_heads 8 --dropout 0.1 --attention_dropout 0.1 --gelu_activation true --tokens_per_batch 6000 --bptt 256 --optimizer adam_inverse_sqrt,beta1=0.9,beta2=0.98,lr=0.0001 --epoch_size 200000 --eval_bleu true --stopping_criterion 'valid_en-de_mt_bleu,10' --validation_metrics 'valid_en-de_mt_bleu' --mt_steps "en-de" --gpus '0,1,2,3'
Translation results:
valid_en-de_mt_ppl -> 5.401580
valid_en-de_mt_acc -> 65.806969
valid_en-de_mt_bleu -> 28.990000
test_en-de_mt_ppl -> 5.942769
test_en-de_mt_acc -> 66.605212
test_en-de_mt_bleu -> 25.630000
My intuition is that the model structure is slightly different (gelu, layer_norm, etc.). May I ask whether you have tried this codebase on the supervised WMT14 benchmark, and what your thoughts are on this gap?
Best.
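For context on the beam_size 4 / length_penalty 0.6 setting above: the Transformer paper's reported BLEU uses the GNMT-style length penalty lp(Y) = ((5 + |Y|) / 6)^alpha with alpha = 0.6. Whether XLM's --length_penalty implements this exact formula (rather than a plain division by length^alpha) is an assumption worth checking against the repo's beam-search code; a mismatch here is one candidate source of small BLEU differences. A minimal sketch of the GNMT variant:

```python
# GNMT-style length penalty (the variant used in the Transformer paper's
# evaluation, alpha = 0.6). NOTE: XLM's own beam search may normalize
# differently; check the generate_beam code in src/model/transformer.py.
def length_penalty(length: int, alpha: float = 0.6) -> float:
    return ((5 + length) / 6.0) ** alpha

# A hypothesis's summed log-probability is divided by this factor, so
# longer hypotheses are not unfairly penalized by accumulating more
# negative log-probabilities.
print(length_penalty(1))              # (6/6)^0.6 = 1.0
print(round(length_penalty(20), 3))   # grows slowly with length
```

Because the penalty grows sublinearly, alpha trades off between favoring short outputs (alpha = 0) and fully length-normalized scores (alpha = 1).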
I tried to train a machine translation model using parallel data only. The script I used for training is as follows:
export NGPU=4; python -m torch.distributed.launch --nproc_per_node=$NGPU train.py \
--exp_name supMT_deen \
--dump_path ./checkpoints/ \
--data_path /unsullied/sharefs/zhaoyuekai/data/WMT/corpus/de-en/processed/ \
--lgs 'de-en' \
--mt_steps 'de-en' \
--lambda_mt '0:1,100000:0.1,300000:0' \
--encoder_only false \
--emb_dim 1024 \
--n_layers 6 \
--n_heads 8 \
--dropout 0.1 \
--attention_dropout 0.1 \
--gelu_activation true \
--tokens_per_batch 2000 \
--batch_size 32 \
--bptt 256 \
--optimizer adam_inverse_sqrt,beta1=0.9,beta2=0.98,lr=0.0001 \
--epoch_size 200000 \
--eval_bleu true \
--stopping_criterion 'valid_de-en_mt_bleu,10' \
--validation_metrics 'valid_de-en_mt_bleu'
When training on only one GPU, no error was reported; however, when I tried to train on 4 GPUs, the following error occurred.
Each of the four worker processes printed the same traceback, interleaved:
Traceback (most recent call last):
  File "train.py", line 341, in <module>
    main(params)
  File "train.py", line 300, in main
    trainer.mt_step(lang1, lang2, params.lambda_mt)
  File "/unsullied/sharefs/zhaoyuekai/data/XLM/config/XLM.active/src/trainer.py", line 770, in mt_step
    self.optimize(loss, ['encoder', 'decoder'])
  File "/unsullied/sharefs/zhaoyuekai/data/XLM/config/XLM.active/src/trainer.py", line 131, in optimize
    loss.backward()
  File "/home/zhaoyuekai/miniconda3/lib/python3.7/site-packages/torch/tensor.py", line 102, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/zhaoyuekai/miniconda3/lib/python3.7/site-packages/torch/autograd/__init__.py", line 90, in backward
    allow_unreachable=True)  # allow_unreachable flag
  File "/home/zhaoyuekai/miniconda3/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 445, in distributed_data_parallel_hook
    self._queue_reduction(bucket_idx)
  File "/home/zhaoyuekai/miniconda3/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 475, in _queue_reduction
    self.device_ids)
TypeError: _queue_reduction(): incompatible function arguments. The following argument types are supported:
    1. (process_group: torch.distributed.ProcessGroup, grads_batch: List[List[at::Tensor]], devices: List[int]) -> Tuple[torch.distributed.Work, at::Tensor]
Hi,
I just found a weird piece of code at:
XLM/src/evaluation/evaluator.py
Lines 52 to 56 in 20c338e
If possible, may I ask what the intuition behind this "hack" is?
Thanks.
How does the decoder know which direction to go (lang1 or lang2) when the input language is lang1? In other words, how does the decoder know which task it is performing, DAE or MT?
In the previous version (UNMT), different projection layers were used. In XLM, self.pred_layer
is always the same. @glample
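As background for this question: XLM conditions its single shared decoder on the target language through language embeddings, which are added to the token and position embeddings at the input, so the same parameters (including the shared pred_layer) behave differently for lang1 vs lang2. A toy illustration of that input sum, with plain-Python random vectors standing in for learned embedding tables (not XLM's actual code):

```python
import random

random.seed(0)
DIM = 8

def rand_vec():
    return [random.uniform(-1, 1) for _ in range(DIM)]

# Hypothetical tiny lookup tables standing in for learned embedding layers.
tok_emb = {t: rand_vec() for t in range(100)}
pos_emb = {p: rand_vec() for p in range(32)}
lang_emb = {l: rand_vec() for l in range(2)}  # 0 = lang1, 1 = lang2

def embed(tokens, lang_id):
    """Token + position + language embeddings, summed per position,
    mirroring the input layer of a shared multilingual decoder."""
    return [
        [t + p + g for t, p, g in zip(tok_emb[tok], pos_emb[i], lang_emb[lang_id])]
        for i, tok in enumerate(tokens)
    ]

sent = [5, 17, 42]
# Same tokens, different lang_id -> different decoder inputs, so one shared
# decoder can still be steered toward the requested output language.
assert embed(sent, 0) != embed(sent, 1)
```

The generation code passes the target language id (e.g. params.tgt_id) when decoding, which selects the language embedding and thereby the direction, instead of switching between separate projection layers.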
Why is Italian not supported by almost any LM?