facebookresearch / XLM
PyTorch original implementation of Cross-lingual Language Model Pretraining.
License: Other
Hello! I have been running your translate.py script and keep hitting an error on a particular line of my input file. The line contains a BPE-ised URL but is otherwise unremarkable (only 13 subwords long). The error occurs at the following line of code:
decoded, dec_lengths = decoder.generate(encoded, lengths.cuda(), params.tgt_id, max_len=int(1.5 * lengths.max().item() + 10))
Do you have any suggestions about what might be causing this error and how it could be fixed? Thank you very much in advance!
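For reference, the max_len argument in the generate() call above follows a simple heuristic; here is a minimal sketch of it (the function name is illustrative, not from the repo):

```python
# Sketch of the max_len heuristic in the generate() call: the decoder may
# emit up to 1.5x the longest source length, plus a 10-token margin.
def max_decode_len(src_lengths, ratio=1.5, margin=10):
    """Upper bound on generated length, given source lengths in subwords."""
    return int(ratio * max(src_lengths) + margin)

print(max_decode_len([13]))  # a 13-subword line allows up to 29 generated tokens
```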
In the paper, Section 5.1, you write "We sample the learning rate of the Adam optimizer with values from 5·10^-4 to 2·10^-4", but on GitHub it is 5·10^-6.
Hi,
thanks for releasing the code for Cross-lingual Language Model Pretraining ❤️
I would like to know if it is possible to encode a whole sentence and get the embeddings for each token (or better, each subword). The notebook only contains an example of how to encode a sentence; could you also provide a way to get the embeddings for each subword?
Thanks :)
Hi, I noticed that for both unsupervised NMT training and MLM training, the learning rate is 0.0001. Is this the learning rate when training with 8 GPUs? If I use 4 GPUs, how should I adjust the learning rate and warm-up? Thank you very much.
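One common heuristic here (the linear scaling rule, not an official recommendation from the XLM authors) is to scale the learning rate with the effective batch size and stretch the warm-up accordingly; a hypothetical sketch:

```python
def scale_hparams(base_lr, base_warmup_steps, base_gpus, target_gpus):
    """Linear-scaling heuristic: with fewer GPUs the effective batch shrinks,
    so scale the learning rate down proportionally and lengthen the warm-up."""
    factor = target_gpus / base_gpus
    return base_lr * factor, int(base_warmup_steps / factor)

# Going from 8 GPUs at lr=0.0001 down to 4 GPUs:
lr, warmup = scale_hparams(1e-4, 4000, base_gpus=8, target_gpus=4)
print(lr, warmup)  # 5e-05 8000
```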
I looked into the notebook to embed a sentence:
https://github.com/facebookresearch/XLM/blob/master/generate-embeddings.ipynb
In cell In[6] there is this code:
sentences = [(('</s> %s </s>' % sent.strip()).split(), lang) for sent, lang in sentences]
Isn't it supposed to be '<s> %s </s>' instead? Am I misunderstanding something?
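For comparison, here is what the two delimiter choices produce (plain string manipulation, easy to check; whether '</s>' intentionally doubles as the BOS symbol depends on the model's dictionary and is worth confirming with the authors):

```python
# The notebook's preprocessing vs. the '<s> %s </s>' variant asked about:
def wrap_sentence(sent, bos="</s>", eos="</s>"):
    return ("%s %s %s" % (bos, sent.strip(), eos)).split()

print(wrap_sentence("hello world"))             # ['</s>', 'hello', 'world', '</s>']
print(wrap_sentence("hello world", bos="<s>"))  # ['<s>', 'hello', 'world', '</s>']
```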
Hello!
Thank you for sharing this code!
Is there an easy way to get the embedding of a particular word? (Those found in Table 5 of the paper.)
Thank you!
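If it helps, the usual pattern for a single word's vector is a dictionary lookup into the embedding matrix; a hypothetical sketch (word2id and embeddings are illustrative names, not the repo's actual attributes):

```python
# Toy vocabulary and a (vocab_size x dim) embedding matrix:
word2id = {"<s>": 0, "</s>": 1, "cat": 2, "dog": 3}
embeddings = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]

def word_vector(word):
    """A single word's embedding is one row of the embedding matrix."""
    return embeddings[word2id[word]]

print(word_vector("cat"))  # [0.5, 0.6]
```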
Dear authors,
Thank you so much for your code. I'm trying to reproduce the supervised MT results on WMT14 en-de. Training works fine with a single GPU (or multiple GPUs). However, I frequently hit an OOM error after one epoch, during the evaluate_mt() step. Here's the script I used and the error message:
python train.py --exp_name wmt14_ende --dump_path ./dumped/ --data_path ./data/processed/wmt14_de-en --lgs 'en-de' --encoder_only false --emb_dim 1024 --n_layers 6 --n_heads 8 --dropout 0.1 --attention_dropout 0.1 --gelu_activation true --tokens_per_batch 2000 --bptt 256 --optimizer adam_inverse_sqrt,beta1=0.9,beta2=0.98,lr=0.0001 --epoch_size 200000 --eval_bleu true --stopping_criterion 'valid_en-de_mt_bleu,10' --validation_metrics 'valid_en-de_mt_bleu' --mt_steps "en-de" --gpus '0'
(--gpus just indicates the GPU id to use)
Traceback (most recent call last):
File "train.py", line 325, in
main(params)
File "train.py", line 300, in main
scores = evaluator.run_all_evals(trainer)
File "/nfs/eecs-fserv/share/yangyil/XLM/src/evaluation/evaluator.py", line 181, in run_all_evals
self.evaluate_mt(scores, data_set, lang1, lang2, eval_bleu)
File "/nfs/eecs-fserv/share/yangyil/XLM/src/evaluation/evaluator.py", line 377, in evaluate_mt
word_scores, loss = decoder('predict', tensor=dec2, pred_mask=pred_mask, y=y, get_scores=True)
File "/nfs/stak/users/yangyil/shared/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/nfs/eecs-fserv/share/yangyil/XLM/src/model/transformer.py", line 313, in forward
return self.predict(**kwargs)
File "/nfs/eecs-fserv/share/yangyil/XLM/src/model/transformer.py", line 416, in predict
scores, loss = self.pred_layer(masked_tensor, y, get_scores)
File "/nfs/stak/users/yangyil/shared/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/nfs/eecs-fserv/share/yangyil/XLM/src/model/transformer.py", line 132, in forward
loss = F.cross_entropy(scores, y, reduction='elementwise_mean')
File "/nfs/stak/users/yangyil/shared/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py", line 1550, in cross_entropy
return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
File "/nfs/stak/users/yangyil/shared/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py", line 975, in log_softmax
return input.log_softmax(dim)
RuntimeError: CUDA error: out of memory
The OOM always happens inside F.cross_entropy(), although cross_entropy doesn't trigger it every time. Do you have any ideas for making this more stable?
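One workaround sometimes used for softmax/loss OOM (not a feature of this repo) is computing the loss over chunks of rows and summing; a pure-Python illustration that the chunked computation gives the same result while peak memory scales with the chunk size:

```python
import math

def log_softmax(row):
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def nll_sum(scores, targets):
    """Summed NLL over all rows at once (peak memory grows with rows x vocab)."""
    return -sum(log_softmax(row)[t] for row, t in zip(scores, targets))

def nll_sum_chunked(scores, targets, chunk=2):
    """Same loss computed chunk by chunk; peak memory scales with chunk size."""
    return sum(nll_sum(scores[i:i + chunk], targets[i:i + chunk])
               for i in range(0, len(scores), chunk))

scores = [[2.0, 0.5, -1.0], [0.1, 0.2, 0.3], [1.0, 1.0, 1.0]]
targets = [0, 2, 1]
assert abs(nll_sum(scores, targets) - nll_sum_chunked(scores, targets)) < 1e-9
```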
Another thing: I use PyTorch 0.4.1 and didn't experience #15, but if I update to 1.0.1 I hit another error, pytorch/pytorch#13273 (_queue_reduction() doesn't take a torch.distributed.ProcessGroupNCCL object).
Best.
Yilin
Line 52 in 14fe2d4
Hi @glample,
Can you explain why you make the assumption "SRC < TGT"?
I noticed it also in:
Line 307 in 14fe2d4
XLM/src/evaluation/evaluator.py, line 76 in 14fe2d4
XLM/src/evaluation/evaluator.py, line 95 in 14fe2d4
First, thanks for sharing your code!
I really appreciate it.
I have a question about pre-trained word embeddings for unsupervised NMT task.
While reviewing the code, I found that you never use pre-trained word embeddings
(since --reload_emb is empty).
If it is true that pre-trained word embeddings have not been used, is there a specific reason for not using them?
Thank You!
Hi @glample,
I pre-trained a language model and used it to train unsupervised NMT, but the BLEU score keeps getting lower. Is something wrong?
Details:
The language model:
INFO - 03/14/19 16:57:00 - 23:43:04 - ============ End of epoch 11 ============
INFO - 03/14/19 16:57:06 - 23:43:10 - epoch -> 11.000000
INFO - 03/14/19 16:57:06 - 23:43:10 - valid_mn_mlm_ppl -> 12.698742
INFO - 03/14/19 16:57:06 - 23:43:10 - valid_mn_mlm_acc -> 61.901453
INFO - 03/14/19 16:57:06 - 23:43:10 - valid_zh_mlm_ppl -> 482.045657
INFO - 03/14/19 16:57:06 - 23:43:10 - valid_zh_mlm_acc -> 24.392448
INFO - 03/14/19 16:57:06 - 23:43:10 - valid_mlm_ppl -> 247.372200
INFO - 03/14/19 16:57:06 - 23:43:10 - valid_mlm_acc -> 43.146951
INFO - 03/14/19 16:57:06 - 23:43:10 - test_mn_mlm_ppl -> 34.794975
INFO - 03/14/19 16:57:06 - 23:43:10 - test_mn_mlm_acc -> 52.602524
INFO - 03/14/19 16:57:06 - 23:43:10 - test_zh_mlm_ppl -> 124.785448
INFO - 03/14/19 16:57:06 - 23:43:10 - test_zh_mlm_acc -> 34.501062
INFO - 03/14/19 16:57:06 - 23:43:10 - test_mlm_ppl -> 79.790211
INFO - 03/14/19 16:57:06 - 23:43:10 - test_mlm_acc -> 43.551793
Unsupervised NMT:
python3.6.2 train.py --exp_name unsupMT_mnzh --dump_path ./dumped/ --exp_id '190315' --reload_model './dumped/my_mnzh_mlm/190313/best-valid_mlm_ppl.pth,./dumped/my_mnzh_mlm/190313/best-valid_mlm_ppl.pth' --data_path ./data/processed/mn-zh/ --lgs 'mn-zh' --ae_steps 'mn,zh' --bt_steps 'mn-zh-mn,zh-mn-zh' --word_shuffle 3 --word_dropout 0.1 --word_blank 0.1 --lambda_ae '0:1,100000:0.1,300000:0' --encoder_only false --emb_dim 768 --n_layers 6 --n_heads 8 --dropout 0.1 --attention_dropout 0.1 --gelu_activation true --tokens_per_batch 1000 --batch_size 16 --max_batch_size 64 --bptt 256 --optimizer adam_inverse_sqrt,beta1=0.9,beta2=0.98,lr=0.0001,weight_decay=0 --epoch_size 300000 --eval_bleu true --stopping_criterion 'valid_mn-zh_mt_bleu,10' --validation_metrics 'valid_mn-zh_mt_bleu'
INFO - 03/15/19 12:54:23 - 3:17:34 - ============ End of epoch 0 ============
INFO - 03/15/19 12:56:06 - 3:19:16 - BLEU ./dumped/unsupMT_mnzh/190315/hypotheses/hyp0.mn-zh.valid.txt ./dumped/unsupMT_mnzh/190315/hypotheses/ref.mn-zh.valid.txt : 0.180000
INFO - 03/15/19 12:58:15 - 3:21:25 - BLEU ./dumped/unsupMT_mnzh/190315/hypotheses/hyp0.zh-mn.valid.txt ./dumped/unsupMT_mnzh/190315/hypotheses/ref.zh-mn.valid.txt : 2.740000
INFO - 03/15/19 12:58:36 - 3:21:47 - BLEU ./dumped/unsupMT_mnzh/190315/hypotheses/hyp0.mn-zh.test.txt ./dumped/unsupMT_mnzh/190315/hypotheses/ref.mn-zh.test.txt : 0.000000
INFO - 03/15/19 12:59:01 - 3:22:12 - BLEU ./dumped/unsupMT_mnzh/190315/hypotheses/hyp0.zh-mn.test.txt ./dumped/unsupMT_mnzh/190315/hypotheses/ref.zh-mn.test.txt : 2.160000
INFO - 03/15/19 12:59:01 - 3:22:12 - epoch -> 0.000000
INFO - 03/15/19 12:59:01 - 3:22:12 - valid_mn-zh_mt_ppl -> 6020.106288
INFO - 03/15/19 12:59:01 - 3:22:12 - valid_mn-zh_mt_acc -> 9.684522
INFO - 03/15/19 12:59:01 - 3:22:12 - valid_mn-zh_mt_bleu -> 0.180000
INFO - 03/15/19 12:59:01 - 3:22:12 - valid_zh-mn_mt_ppl -> 146.305114
INFO - 03/15/19 12:59:01 - 3:22:12 - valid_zh-mn_mt_acc -> 40.263721
INFO - 03/15/19 12:59:01 - 3:22:12 - valid_zh-mn_mt_bleu -> 2.740000
INFO - 03/15/19 12:59:01 - 3:22:12 - test_mn-zh_mt_ppl -> 6059.479785
INFO - 03/15/19 12:59:01 - 3:22:12 - test_mn-zh_mt_acc -> 12.168889
INFO - 03/15/19 12:59:01 - 3:22:12 - test_mn-zh_mt_bleu -> 0.000000
INFO - 03/15/19 12:59:01 - 3:22:12 - test_zh-mn_mt_ppl -> 488.040713
INFO - 03/15/19 12:59:01 - 3:22:12 - test_zh-mn_mt_acc -> 34.044409
INFO - 03/15/19 12:59:01 - 3:22:12 - test_zh-mn_mt_bleu -> 2.160000
INFO - 03/16/19 06:23:05 - 20:46:16 - ============ End of epoch 5 ============
INFO - 03/16/19 06:25:41 - 20:48:51 - BLEU ./dumped/unsupMT_mnzh/190315/hypotheses/hyp5.mn-zh.valid.txt ./dumped/unsupMT_mnzh/190315/hypotheses/ref.mn-zh.valid.txt : 0.000000
INFO - 03/16/19 06:27:31 - 20:50:41 - BLEU ./dumped/unsupMT_mnzh/190315/hypotheses/hyp5.zh-mn.valid.txt ./dumped/unsupMT_mnzh/190315/hypotheses/ref.zh-mn.valid.txt : 0.280000
INFO - 03/16/19 06:27:58 - 20:51:09 - BLEU ./dumped/unsupMT_mnzh/190315/hypotheses/hyp5.mn-zh.test.txt ./dumped/unsupMT_mnzh/190315/hypotheses/ref.mn-zh.test.txt : 0.000000
INFO - 03/16/19 06:28:22 - 20:51:33 - BLEU ./dumped/unsupMT_mnzh/190315/hypotheses/hyp5.zh-mn.test.txt ./dumped/unsupMT_mnzh/190315/hypotheses/ref.zh-mn.test.txt : 0.920000
INFO - 03/16/19 06:28:22 - 20:51:33 - epoch -> 5.000000
INFO - 03/16/19 06:28:22 - 20:51:33 - valid_mn-zh_mt_ppl -> 9263.390210
INFO - 03/16/19 06:28:22 - 20:51:33 - valid_mn-zh_mt_acc -> 7.963293
INFO - 03/16/19 06:28:22 - 20:51:33 - valid_mn-zh_mt_bleu -> 0.000000
INFO - 03/16/19 06:28:22 - 20:51:33 - valid_zh-mn_mt_ppl -> 195.211674
INFO - 03/16/19 06:28:22 - 20:51:33 - valid_zh-mn_mt_acc -> 36.910448
INFO - 03/16/19 06:28:22 - 20:51:33 - valid_zh-mn_mt_bleu -> 0.280000
INFO - 03/16/19 06:28:22 - 20:51:33 - test_mn-zh_mt_ppl -> 9938.071239
INFO - 03/16/19 06:28:22 - 20:51:33 - test_mn-zh_mt_acc -> 6.666667
INFO - 03/16/19 06:28:22 - 20:51:33 - test_mn-zh_mt_bleu -> 0.000000
INFO - 03/16/19 06:28:22 - 20:51:33 - test_zh-mn_mt_ppl -> 619.158340
INFO - 03/16/19 06:28:22 - 20:51:33 - test_zh-mn_mt_acc -> 32.541759
INFO - 03/16/19 06:28:22 - 20:51:33 - test_zh-mn_mt_bleu -> 0.920000
I added --local_rank, but it raises an error.
SLURM job: False
Traceback (most recent call last):
File "train.py", line 322, in
main(params)
File "train.py", line 198, in main
init_distributed_mode(params)
File "XLM/src/slurm.py", line 110, in init_distributed_mode
params.global_rank = int(os.environ['RANK'])
File "/usr/lib/python3.5/os.py", line 725, in __getitem__
raise KeyError(key) from None
KeyError: 'RANK'
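For context, torch.distributed.launch is what sets RANK (and WORLD_SIZE) in the environment; passing --local_rank to train.py directly, without the launcher, leaves them unset, which is exactly this KeyError. A defensive read could look like this (a sketch, not the repo's actual slurm.py logic):

```python
import os

def read_rank_env():
    """Fall back to single-process defaults when the launcher's
    RANK / WORLD_SIZE environment variables are missing."""
    rank = int(os.environ.get("RANK", 0))
    world_size = int(os.environ.get("WORLD_SIZE", 1))
    return rank, world_size

# Simulate running without torch.distributed.launch:
os.environ.pop("RANK", None)
os.environ.pop("WORLD_SIZE", None)
print(read_rank_env())  # (0, 1)
```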
Hi, @glample. Thank you for your nice contribution.
I noticed that the released demo only uses 5M monolingual sentences. I tried it and it seems it cannot reach the accuracy reported in the paper, but I would like to know what accuracy it should reach with 5M monolingual sentences (just for reference). Can you provide some help?
Hi, how many GPUs are used when training a model with the MLM objective?
Hi,
did you do truecasing/lowercasing in your MT experiments? I can't find any sign of it in the code.
Is there a specific reason to do / not do it?
Thanks
Where can I find the lowercase_and_remove_accent.py file?
Hi @glample,
I pretrained a model with the MLM objective for Mongolian and Chinese, but when I use the pretrained model for mn-zh machine translation, an error occurs. I tried reducing --batch_size from the default 32 to 16, 8, 4, 2, and 1, but that didn't help. Do you have any good solutions to share?
The pretrained result is:
INFO - 02/28/19 09:47:19 - 1 day, 1:01:20 - ============ End of epoch 7 ============
INFO - 02/28/19 09:47:30 - 1 day, 1:01:31 - epoch -> 7.000000
INFO - 02/28/19 09:47:30 - 1 day, 1:01:31 - valid_mn_mlm_ppl -> 20.055305
INFO - 02/28/19 09:47:30 - 1 day, 1:01:31 - valid_mn_mlm_acc -> 56.151420
INFO - 02/28/19 09:47:30 - 1 day, 1:01:31 - valid_zh_mlm_ppl -> 1813.456839
INFO - 02/28/19 09:47:30 - 1 day, 1:01:31 - valid_zh_mlm_acc -> 28.312303
INFO - 02/28/19 09:47:30 - 1 day, 1:01:31 - valid_mlm_ppl -> 916.756072
INFO - 02/28/19 09:47:30 - 1 day, 1:01:31 - valid_mlm_acc -> 42.231861
INFO - 02/28/19 09:47:30 - 1 day, 1:01:31 - test_mn_mlm_ppl -> 8.259349
INFO - 02/28/19 09:47:30 - 1 day, 1:01:31 - test_mn_mlm_acc -> 65.375485
INFO - 02/28/19 09:47:30 - 1 day, 1:01:31 - test_zh_mlm_ppl -> 11569.002599
INFO - 02/28/19 09:47:30 - 1 day, 1:01:31 - test_zh_mlm_acc -> 15.452244
INFO - 02/28/19 09:47:30 - 1 day, 1:01:31 - test_mlm_ppl -> 5788.630974
INFO - 02/28/19 09:47:30 - 1 day, 1:01:31 - test_mlm_acc -> 40.413864
Training unsupervised MT from the pretrained model:
python train.py --exp_name unsupMT_mnzh --dump_path ./dumped/ --reload_model 'best-valid_mlm_ppl.pth,best-valid_mlm_ppl.pth' --data_path ./data/processed/mn-zh/ --lgs 'mn-zh' --ae_steps 'mn,zh' --bt_steps 'mn-zh-mn,zh-mn-zh' --word_shuffle 3 --word_dropout 0.1 --word_blank 0.1 --lambda_ae '0:1,100000:0.1,300000:0' --encoder_only false --emb_dim 1024 --n_layers 6 --n_heads 8 --dropout 0.1 --attention_dropout 0.1 --gelu_activation true --tokens_per_batch 2000 --batch_size 16 --bptt 256 --optimizer adam_inverse_sqrt,beta1=0.9,beta2=0.999,lr=0.0001 --epoch_size 300000 --eval_bleu true --stopping_criterion 'valid_mn-zh_mt_bleu,10' --validation_metrics 'valid_mn-zh_mt_bleu'
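Since lowering --batch_size alone didn't help, note that with --tokens_per_batch set, batches are sized by token count rather than sentence count, so lowering it is the usual memory lever. A hypothetical sketch of that kind of token bucketing (the repo's actual batching logic is more involved):

```python
def token_batches(lengths, tokens_per_batch):
    """Greedily group sentences (sorted by length) so that the padded size
    batch_size * max_len never exceeds tokens_per_batch."""
    batches, cur = [], []
    for n in sorted(lengths):
        if cur and (len(cur) + 1) * max(cur + [n]) > tokens_per_batch:
            batches.append(cur)
            cur = []
        cur.append(n)
    if cur:
        batches.append(cur)
    return batches

print(token_batches([5, 50, 7, 12, 48], tokens_per_batch=100))  # [[5, 7, 12], [48, 50]]
```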
Hi @glample,
The accuracy drops to 0 at the end of the second epoch when I pretrain a model with the MLM objective for Mongolian and Chinese. Is my preprocessing method inappropriate?
details:
python train.py --exp_name 'my_mnzh_mlm' --dump_path './dumped/' --exp_id '190225' --data_path './data/processed/mn-zh/' --lgs 'mn-zh' --clm_steps '' --mlm_steps 'mn,zh' --emb_dim '1024' --n_layers '6' --n_heads '8' --dropout '0.2' --attention_dropout '0.2' --gelu_activation 'true' --batch_size '16' --bptt '256' --optimizer 'adam,lr=0.0001' --epoch_size '300000' --validation_metrics '_valid_mlm_ppl' --stopping_criterion '_valid_mlm_ppl,10'
INFO - 02/25/19 13:21:37 - 3:07:50 - ============ End of epoch 0 ============
INFO - 02/25/19 13:21:48 - 3:08:01 - epoch -> 0.000000
INFO - 02/25/19 13:21:48 - 3:08:01 - valid_mn_mlm_ppl -> 574.678424
INFO - 02/25/19 13:21:48 - 3:08:01 - valid_mn_mlm_acc -> 17.192429
INFO - 02/25/19 13:21:48 - 3:08:01 - valid_zh_mlm_ppl -> 5591.294827
INFO - 02/25/19 13:21:48 - 3:08:01 - valid_zh_mlm_acc -> 14.550473
INFO - 02/25/19 13:21:48 - 3:08:01 - valid_mlm_ppl -> 3082.986625
INFO - 02/25/19 13:21:48 - 3:08:01 - valid_mlm_acc -> 15.871451
INFO - 02/25/19 13:21:48 - 3:08:01 - test_mn_mlm_ppl -> 436.168551
INFO - 02/25/19 13:21:48 - 3:08:01 - test_mn_mlm_acc -> 13.728215
INFO - 02/25/19 13:21:48 - 3:08:01 - test_zh_mlm_ppl -> 32195.137737
INFO - 02/25/19 13:21:48 - 3:08:01 - test_zh_mlm_acc -> 7.138838
INFO - 02/25/19 13:21:48 - 3:08:01 - test_mlm_ppl -> 16315.653144
INFO - 02/25/19 13:21:48 - 3:08:01 - test_mlm_acc -> 10.433527
INFO - 02/25/19 16:29:17 - 6:15:30 - ============ End of epoch 1 ============
INFO - 02/25/19 16:29:28 - 6:15:41 - epoch -> 1.000000
INFO - 02/25/19 16:29:28 - 6:15:41 - valid_mn_mlm_ppl -> 966.486405
INFO - 02/25/19 16:29:28 - 6:15:41 - valid_mn_mlm_acc -> 7.886435
INFO - 02/25/19 16:29:28 - 6:15:41 - valid_zh_mlm_ppl -> 8967.092445
INFO - 02/25/19 16:29:28 - 6:15:41 - valid_zh_mlm_acc -> 0.000000
INFO - 02/25/19 16:29:28 - 6:15:41 - valid_mlm_ppl -> 4966.789425
INFO - 02/25/19 16:29:28 - 6:15:41 - valid_mlm_acc -> 3.943218
INFO - 02/25/19 16:29:28 - 6:15:41 - test_mn_mlm_ppl -> 808.229061
INFO - 02/25/19 16:29:28 - 6:15:41 - test_mn_mlm_acc -> 12.853917
INFO - 02/25/19 16:29:28 - 6:15:41 - test_zh_mlm_ppl -> 43495.881859
INFO - 02/25/19 16:29:28 - 6:15:41 - test_zh_mlm_acc -> 0.000000
INFO - 02/25/19 16:29:28 - 6:15:41 - test_mlm_ppl -> 22152.055460
INFO - 02/25/19 16:29:28 - 6:15:41 - test_mlm_acc -> 6.426958
Hi,
I ran the MLM pretraining for en-fr using the default arguments.
I noticed that while the model learns with learned positional embeddings, with sinusoidal embeddings it completely fails to learn and the validation accuracy stays around 5%.
Did you face similar issues when using sinusoidal embeddings?
Thanks!
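For reference, the fixed sinusoidal encodings in question follow the "Attention Is All You Need" formulation; a minimal pure-Python sketch (even embedding dimension assumed):

```python
import math

def sinusoidal_embedding(max_len, dim):
    """PE[pos, 2i] = sin(pos / 10000^(2i/dim)), PE[pos, 2i+1] = cos(same angle)."""
    pe = [[0.0] * dim for _ in range(max_len)]
    for pos in range(max_len):
        for i in range(0, dim, 2):
            angle = pos / (10000 ** (i / dim))
            pe[pos][i] = math.sin(angle)
            pe[pos][i + 1] = math.cos(angle)
    return pe

pe = sinusoidal_embedding(4, 8)
print(pe[0])  # position 0 alternates sin(0)=0.0 and cos(0)=1.0 across frequencies
```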
Thanks for your work. I have a question: when training on language pairs with great differences, such as Chinese-English or English-Kazakh, is sharing all parameters a good choice? I notice that XLM usually shares all parameters.
Hello! Do you happen to have a translate.py script so that the model can be used to translate new data? I saw the --eval_only parameter, but it seems the file to be translated has to be named according to the conventions specified in the trainer (and the data folder has to contain all the training/validation files too). The evaluator also appears to use the target-language file to get the maximum sentence length, which we shouldn't have access to when translating a new document.
Thanks for your help!
Hi,
I ran the XNLI fine-tuning task (with MLM+TLM) and got an average accuracy of 73.5 (compared to 75.1 in your paper). The code generated params.pkl; however, I could not find the fine-tuned model. How do I save the model after fine-tuning (or after every epoch of fine-tuning)?
Hi,
How can I reload the checkpoint and model file to continue from the last epoch reached in a previous (aborted) run? I want to do this both in the pretraining stage and in the training stage.
Thanks,
Odel
Hi, thanks for your work. Could you also release the XNLI-15 model trained only with the MLM objective?
Hi,
Thanks a lot for the awesome project~
I appended an MLP after the XLM sentence embedding to build a QA model. But after running for a while (single GPU, 21000 steps, batch size 8), it blocks on the loss.backward() step without any error message. On 4 GPUs it blocks sooner (around 4200 steps, 4x8 batch size). Could you please give some hints on how I can fix this?
Thanks a lot~
Hi, I noticed in the code that fp16 training is manually disabled for the machine translation and back-translation updates with assert False statements.
Specifically, I am trying to use the MT step. I commented out the assert and added retain_graph=True to the first backward call, but after doing this my throughput was actually lower than without fp16 enabled.
Can you help me set up fp16 training correctly for the MT step?
Hi, there were a couple of bugs in the get-data-xnli.sh script related to file paths. The following is the fix:
comment out "mkdir -p $XNLI_PATH" (line 29) -- creating this directory prevents XNLI-1.0.zip from being downloaded
replace
mkdir -p $PROCESSED_PATH/eval/XNLI
rm $PROCESSED_PATH/eval/XNLI/* -- this raises "cannot remove...no such file..."
with
if [ -d $PROCESSED_PATH/eval/XNLI ]; then
  rm -rf $PROCESSED_PATH/eval/XNLI
fi
mkdir -p $PROCESSED_PATH/eval/XNLI
Hi, thanks for your work. When I reload the decoder from the pretrained mlm_1024.pth, the following warnings are raised:
WARNING - 03/03/19 10:07:57 - 0:00:37 - Parameter layer_norm15.0.weight not found.
WARNING - 03/03/19 10:07:57 - 0:00:37 - Parameter layer_norm15.0.bias not found.
WARNING - 03/03/19 10:07:57 - 0:00:37 - Parameter encoder_attn.0.q_lin.weight not found.
WARNING - 03/03/19 10:07:57 - 0:00:37 - Parameter encoder_attn.0.q_lin.bias not found.
WARNING - 03/03/19 10:07:57 - 0:00:37 - Parameter encoder_attn.0.k_lin.weight not found.
...
Line 151 in ffc64a9
Hi,
thanks for sharing your code!
I'm just wondering if you have implemented the subsampling of frequent outputs (I can't find it in your code), and whether it was crucial for performance.
Cheers,
Stephan
Hi, when the program ends, the memory on GPU 0 is released, but the memory on the other GPUs is not. Why is that?
Thanks for sharing this great work. Could you share the examples for supervised machine translation? Thank you.
Hi, do we need to ignore the loss on padding tokens when doing back-translation? It seems the code doesn't ignore padding when calculating the loss. Thank you very much.
Line 824 in 14fe2d4
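For what it's worth, "ignoring padding" usually means averaging the NLL over non-pad target positions only (cf. ignore_index in F.cross_entropy; in this codebase a pred_mask selects the scored positions, as seen in evaluate_mt above). A toy sketch of the masked loss, with illustrative names:

```python
import math

def masked_nll(log_probs, targets, pad_index):
    """Average negative log-likelihood over targets that are not padding."""
    total, count = 0.0, 0
    for lp, t in zip(log_probs, targets):
        if t == pad_index:
            continue  # padding positions contribute nothing to the loss
        total -= lp[t]
        count += 1
    return total / max(count, 1)

log_probs = [[math.log(0.5), math.log(0.5)], [math.log(0.9), math.log(0.1)]]
# With pad_index=1, the second target is excluded from the loss:
print(masked_nll(log_probs, targets=[0, 1], pad_index=1))  # -log(0.5) ~ 0.6931
```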
The paper reports a best en-fr BLEU of 33.4. The README shows:
'epoch -> 7
valid_fr-en_mt_bleu -> 28.36
valid_en-fr_mt_bleu -> 30.50
test_fr-en_mt_bleu -> 34.02
test_en-fr_mt_bleu -> 36.62'.
Does this result come from the max_len parameter, which removes long sentences from the parallel test corpus?
Hi, @glample
I trained with a single GPU and ran into the problem shown below.
Running commands:
First: ./get-data-nmt.sh --src en --tgt fr
which produced:
===== Data summary
Monolingual training data:
en: ./data/processed/en-fr/train.en.pth
fr: ./data/processed/en-fr/train.fr.pth
Monolingual validation data:
en: ./data/processed/en-fr/valid.en.pth
fr: ./data/processed/en-fr/valid.fr.pth
Monolingual test data:
en: ./data/processed/en-fr/test.en.pth
fr: ./data/processed/en-fr/test.fr.pth
Parallel validation data:
en: ./data/processed/en-fr/valid.en-fr.en.pth
fr: ./data/processed/en-fr/valid.en-fr.fr.pth
Parallel test data:
en: ./data/processed/en-fr/test.en-fr.en.pth
fr: ./data/processed/en-fr/test.en-fr.fr.pth
And then run: python train.py --exp_name 'my_enfr_mlm' --dump_path './dumped/' --exp_id 'bs.20' --data_path './data/processed/en-fr/' --lgs 'en-fr' --clm_steps '' --mlm_steps 'en,fr' --emb_dim '1024' --n_layers '6' --n_heads '8' --dropout '0.1' --attention_dropout '0.1' --gelu_activation 'true' --batch_size '8' --bptt '256' --optimizer 'adam,lr=0.0001' --epoch_size '300000' --validation_metrics '_valid_mlm_ppl' --stopping_criterion '_valid_mlm_ppl,10'
and got the error. Details:
Running command: python train.py --exp_name 'my_enfr_mlm' --dump_path './dumped/' --exp_id 'bs.20' --data_path './data/processed/en-fr/' --lgs 'en-fr' --clm_steps '' --mlm_steps 'en,fr' --emb_dim '1024' --n_layers '6' --n_heads '8' --dropout '0.1' --attention_dropout '0.1' --gelu_activation 'true' --batch_size '32' --bptt '256' --optimizer 'adam,lr=0.0001' --epoch_size '200000' --validation_metrics '_valid_mlm_ppl' --stopping_criterion '_valid_mlm_ppl,10'
INFO - 02/20/19 17:14:59 - 0:50:54 - valid_en_mlm_ppl -> 1413.372916
INFO - 02/20/19 17:14:59 - 0:50:54 - log:{"epoch": 0, "valid_en_mlm_ppl": 1413.3729161899485, "valid_en_mlm_acc": 4.681079149544399, "valid_fr_mlm_ppl": 1137.9702763241598, "valid_fr_mlm_acc": 4.591462520170163, "valid_mlm_ppl": 1275.6715962570543, "valid_mlm_acc": 4.636270834857281, "test_en_mlm_ppl": 1377.6397512089368, "test_en_mlm_acc": 4.500805152979066, "test_fr_mlm_ppl": 1547.092026693417, "test_fr_mlm_acc": 4.81150066011442, "test_mlm_ppl": 1462.3658889511769, "test_mlm_acc": 4.656152906546742}
INFO - 02/20/19 18:05:31 - 1:41:26 - valid_en_mlm_ppl -> 2161.567965
INFO - 02/20/19 18:05:31 - 1:41:26 - log:{"epoch": 1, "valid_en_mlm_ppl": 2161.56796481175, "valid_en_mlm_acc": 5.074146864391638, "valid_fr_mlm_ppl": 1688.979616470098, "valid_fr_mlm_acc": 4.254070705588969, "valid_mlm_ppl": 1925.2737906409238, "valid_mlm_acc": 4.664108784990304, "test_en_mlm_ppl": 2062.9860141920476, "test_en_mlm_acc": 4.186795491143317, "test_fr_mlm_ppl": 2497.6693821048448, "test_fr_mlm_acc": 3.9533519143318174, "test_mlm_ppl": 2280.327698148446, "test_mlm_acc": 4.0700737027375675}
INFO - 02/20/19 18:56:00 - 2:31:55 - valid_en_mlm_ppl -> 2245.817440
INFO - 02/20/19 18:56:00 - 2:31:55 - log:{"epoch": 2, "valid_en_mlm_ppl": 2245.8174404810325, "valid_en_mlm_acc": 5.074146864391638, "valid_fr_mlm_ppl": 1625.404408585545, "valid_fr_mlm_acc": 4.254070705588969, "valid_mlm_ppl": 1935.6109245332887, "valid_mlm_acc": 4.664108784990304, "test_en_mlm_ppl": 2138.2897057505943, "test_en_mlm_acc": 4.186795491143317, "test_fr_mlm_ppl": 2388.5677765876662, "test_fr_mlm_acc": 3.9533519143318174, "test_mlm_ppl": 2263.4287411691303, "test_mlm_acc": 4.0700737027375675}
INFO - 02/20/19 19:46:31 - 3:22:26 - valid_en_mlm_ppl -> 2165.622311
INFO - 02/20/19 19:46:31 - 3:22:26 - log:{"epoch": 3, "valid_en_mlm_ppl": 2165.6223114703407, "valid_en_mlm_acc": 5.074146864391638, "valid_fr_mlm_ppl": 1680.1268854516293, "valid_fr_mlm_acc": 4.254070705588969, "valid_mlm_ppl": 1922.874598460985, "valid_mlm_acc": 4.664108784990304, "test_en_mlm_ppl": 2075.5851921823105, "test_en_mlm_acc": 4.186795491143317, "test_fr_mlm_ppl": 2465.9347158442074, "test_fr_mlm_acc": 3.9533519143318174, "test_mlm_ppl": 2270.7599540132587, "test_mlm_acc": 4.0700737027375675}
INFO - 02/20/19 20:37:00 - 4:12:55 - valid_en_mlm_ppl -> 2062.631943
INFO - 02/20/19 20:37:00 - 4:12:55 - log:{"epoch": 4, "valid_en_mlm_ppl": 2062.6319433943568, "valid_en_mlm_acc": 5.074146864391638, "valid_fr_mlm_ppl": 1765.4204690043236, "valid_fr_mlm_acc": 4.254070705588969, "valid_mlm_ppl": 1914.0262061993403, "valid_mlm_acc": 4.664108784990304, "test_en_mlm_ppl": 1966.636764557332, "test_en_mlm_acc": 4.186795491143317, "test_fr_mlm_ppl": 2606.315150449565, "test_fr_mlm_acc": 3.9533519143318174, "test_mlm_ppl": 2286.4759575034486, "test_mlm_acc": 4.0700737027375675}
INFO - 02/20/19 21:27:28 - 5:03:23 - valid_en_mlm_ppl -> 2151.624741
INFO - 02/20/19 21:27:28 - 5:03:23 - log:{"epoch": 5, "valid_en_mlm_ppl": 2151.624740528933, "valid_en_mlm_acc": 5.074146864391638, "valid_fr_mlm_ppl": 1690.7461604349478, "valid_fr_mlm_acc": 4.254070705588969, "valid_mlm_ppl": 1921.1854504819405, "valid_mlm_acc": 4.664108784990304, "test_en_mlm_ppl": 2054.5326346790675, "test_en_mlm_acc": 4.186795491143317, "test_fr_mlm_ppl": 2479.448594677353, "test_fr_mlm_acc": 3.9533519143318174, "test_mlm_ppl": 2266.9906146782105, "test_mlm_acc": 4.0700737027375675}
INFO - 02/20/19 22:17:56 - 5:53:51 - valid_en_mlm_ppl -> 2155.638091
INFO - 02/20/19 22:17:56 - 5:53:51 - log:{"epoch": 6, "valid_en_mlm_ppl": 2155.6380909977584, "valid_en_mlm_acc": 5.074146864391638, "valid_fr_mlm_ppl": 1699.0517872173994, "valid_fr_mlm_acc": 4.254070705588969, "valid_mlm_ppl": 1927.3449391075787, "valid_mlm_acc": 4.664108784990304, "test_en_mlm_ppl": 2053.9586330892766, "test_en_mlm_acc": 4.186795491143317, "test_fr_mlm_ppl": 2483.16693279636, "test_fr_mlm_acc": 3.9533519143318174, "test_mlm_ppl": 2268.5627829428186, "test_mlm_acc": 4.0700737027375675}
INFO - 02/20/19 23:08:23 - 6:44:18 - valid_en_mlm_ppl -> 2133.608678
INFO - 02/20/19 23:08:23 - 6:44:18 - log:{"epoch": 7, "valid_en_mlm_ppl": 2133.608678409897, "valid_en_mlm_acc": 5.074146864391638, "valid_fr_mlm_ppl": 1695.3582695161938, "valid_fr_mlm_acc": 4.254070705588969, "valid_mlm_ppl": 1914.4834739630455, "valid_mlm_acc": 4.664108784990304, "test_en_mlm_ppl": 2038.1278812563512, "test_en_mlm_acc": 4.186795491143317, "test_fr_mlm_ppl": 2492.9029435971656, "test_fr_mlm_acc": 3.9533519143318174, "test_mlm_ppl": 2265.5154124267583, "test_mlm_acc": 4.0700737027375675}
INFO - 02/20/19 23:58:51 - 7:34:46 - valid_en_mlm_ppl -> 2065.049633
INFO - 02/20/19 23:58:51 - 7:34:46 - log:{"epoch": 8, "valid_en_mlm_ppl": 2065.049632547123, "valid_en_mlm_acc": 5.074146864391638, "valid_fr_mlm_ppl": 1770.2985750724292, "valid_fr_mlm_acc": 4.254070705588969, "valid_mlm_ppl": 1917.6741038097762, "valid_mlm_acc": 4.664108784990304, "test_en_mlm_ppl": 1973.5921541087191, "test_en_mlm_acc": 4.186795491143317, "test_fr_mlm_ppl": 2588.5655595835324, "test_fr_mlm_acc": 3.9533519143318174, "test_mlm_ppl": 2281.0788568461257, "test_mlm_acc": 4.0700737027375675}
INFO - 02/21/19 00:49:20 - 8:25:15 - valid_en_mlm_ppl -> 2177.331599
INFO - 02/21/19 00:49:20 - 8:25:15 - log:{"epoch": 9, "valid_en_mlm_ppl": 2177.331599451264, "valid_en_mlm_acc": 5.074146864391638, "valid_fr_mlm_ppl": 1664.960476646684, "valid_fr_mlm_acc": 4.254070705588969, "valid_mlm_ppl": 1921.1460380489739, "valid_mlm_acc": 4.664108784990304, "test_en_mlm_ppl": 2081.1290653201354, "test_en_mlm_acc": 4.186795491143317, "test_fr_mlm_ppl": 2436.2827245826775, "test_fr_mlm_acc": 3.9533519143318174, "test_mlm_ppl": 2258.7058949514067, "test_mlm_acc": 4.0700737027375675}
INFO - 02/21/19 01:39:46 - 9:15:41 - valid_en_mlm_ppl -> 2110.860061
INFO - 02/21/19 01:39:46 - 9:15:41 - log:{"epoch": 10, "valid_en_mlm_ppl": 2110.8600607294125, "valid_en_mlm_acc": 5.074146864391638, "valid_fr_mlm_ppl": 1716.5880506037283, "valid_fr_mlm_acc": 4.254070705588969, "valid_mlm_ppl": 1913.7240556665704, "valid_mlm_acc": 4.664108784990304, "test_en_mlm_ppl": 2007.549178045412, "test_en_mlm_acc": 4.186795491143317, "test_fr_mlm_ppl": 2522.7412353839986, "test_fr_mlm_acc": 3.9533519143318174, "test_mlm_ppl": 2265.145206714705, "test_mlm_acc": 4.0700737027375675}
INFO - 02/21/19 02:30:13 - 10:06:08 - valid_en_mlm_ppl -> 2208.660441
INFO - 02/21/19 02:30:13 - 10:06:08 - log:{"epoch": 11, "valid_en_mlm_ppl": 2208.6604406115257, "valid_en_mlm_acc": 5.074146864391638, "valid_fr_mlm_ppl": 1656.203270846642, "valid_fr_mlm_acc": 4.254070705588969, "valid_mlm_ppl": 1932.431855729084, "valid_mlm_acc": 4.664108784990304, "test_en_mlm_ppl": 2111.8613551170783, "test_en_mlm_acc": 4.186795491143317, "test_fr_mlm_ppl": 2405.011263807759, "test_fr_mlm_acc": 3.9533519143318174, "test_mlm_ppl": 2258.4363094624186, "test_mlm_acc": 4.0700737027375675}
INFO - 02/20/19 16:24:05 - 0:00:00 - ============ Monolingual data (en)
INFO - 02/20/19 16:24:05 - 0:00:00 - Loading data from ./data/processed/en-fr/train.en.pth ...
INFO - 02/20/19 16:24:06 - 0:00:01 - 129033877 words (64139 unique) in 5000000 sentences. 0 unknown words (0 unique) covering 0.00% of the data.
INFO - 02/20/19 16:24:08 - 0:00:03 - Loading data from ./data/processed/en-fr/valid.en.pth ...
INFO - 02/20/19 16:24:08 - 0:00:03 - 69727 words (64139 unique) in 3000 sentences. 1 unknown words (1 unique) covering 0.00% of the data.
INFO - 02/20/19 16:24:08 - 0:00:03 - Loading data from ./data/processed/en-fr/test.en.pth ...
INFO - 02/20/19 16:24:09 - 0:00:03 - 76017 words (64139 unique) in 3003 sentences. 0 unknown words (0 unique) covering 0.00% of the data.
INFO - 02/20/19 16:24:09 - 0:00:04 - ============ Monolingual data (fr)
INFO - 02/20/19 16:24:09 - 0:00:04 - Loading data from ./data/processed/en-fr/train.fr.pth ...
INFO - 02/20/19 16:24:09 - 0:00:04 - 130884578 words (64139 unique) in 5000000 sentences. 0 unknown words (0 unique) covering 0.00% of the data.
INFO - 02/20/19 16:24:12 - 0:00:06 - Loading data from ./data/processed/en-fr/valid.fr.pth ...
INFO - 02/20/19 16:24:12 - 0:00:07 - 79585 words (64139 unique) in 3000 sentences. 1 unknown words (1 unique) covering 0.00% of the data.
INFO - 02/20/19 16:24:12 - 0:00:07 - Loading data from ./data/processed/en-fr/test.fr.pth ...
INFO - 02/20/19 16:24:12 - 0:00:07 - 86351 words (64139 unique) in 3003 sentences. 0 unknown words (0 unique) covering 0.00% of the data.
INFO - 02/20/19 16:24:13 - 0:00:08 - ============ Data summary
INFO - 02/20/19 16:24:13 - 0:00:08 - Monolingual data - train - en: 5000000
INFO - 02/20/19 16:24:13 - 0:00:08 - Monolingual data - valid - en: 3000
INFO - 02/20/19 16:24:13 - 0:00:08 - Monolingual data - test - en: 3003
INFO - 02/20/19 16:24:13 - 0:00:08 - Monolingual data - train - fr: 5000000
INFO - 02/20/19 16:24:13 - 0:00:08 - Monolingual data - valid - fr: 3000
INFO - 02/20/19 16:24:13 - 0:00:08 - Monolingual data - test - fr: 3003
Great work and article! Just curious: how is XLM different from multilingual BERT (https://github.com/google-research/bert/blob/master/multilingual.md) in terms of pre-training objectives and methods?
Hi,
I am trying to replicate the supervised MT ro -> en baseline of 28.4 mentioned in the paper, and I was hoping you could give me some idea of the hyperparameters used.
Specifically, could you share the number of BPE operations, the learning rate and learning rate schedule, the dropout and attention dropout values, the embedding size, the batch size, and the number of GPUs used during training?
Thanks!
I tried to run several multi-GPU programs on a single server, but I encountered this problem:
RuntimeError: Address already in use at /pytorch/torch/lib/THD/process_group/General.cpp:17
So, if I have 4 GPUs on a single server and want to run two programs on GPUs 0,1 and 2,3, how should I set the local_rank and master_port parameters? @glample
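One way this is commonly done (a sketch, not an official recipe from this repo): give each job its own rendezvous port via torch.distributed.launch's --master_port flag and restrict the GPUs each job sees with CUDA_VISIBLE_DEVICES, so both jobs use local ranks 0 and 1. The port numbers and GPU split below are arbitrary, and `train.py ...` stands for your usual training arguments.

```shell
# Hypothetical two-job split on one 4-GPU server. 29501/29502 are arbitrary
# free ports; local_rank is assigned per process by the launcher, so only
# the port and the visible GPUs need to differ between the two jobs.
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch \
    --nproc_per_node=2 --master_port=29501 train.py ... &
CUDA_VISIBLE_DEVICES=2,3 python -m torch.distributed.launch \
    --nproc_per_node=2 --master_port=29502 train.py ... &
```

With CUDA_VISIBLE_DEVICES set, each process numbers its visible devices from 0, so the launcher's default device mapping works unchanged in both jobs.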
Hi
Thank you for the great contribution
Is there any script to download data of GLUE tasks?
Dear authors,
I understand this repo isn't primarily aimed at supervised MT. But your codebase contains a Transformer encoder-decoder model and, more importantly, is much simpler than the standard supervised MT codebases (e.g. T2T, Fairseq, OpenNMT).
With the intention of reproducing the WMT14 En-De SOTA, I used the data & BPE from Fairseq and trained the Transformer base (emb_dim=512) with only mt_steps="en-de" on 4x 2080 Ti (with one GPU the result is even lower). I finally got a tokenized BLEU score of 25.63 with beam_size 4 and length_penalty 0.6, which is more than 1 BLEU below what the Transformer paper reports.
Training script:
export NGPU=4; python -m torch.distributed.launch --nproc_per_node=$NGPU train.py --exp_name wmt14_ende --dump_path ./dumped/ --data_path ./data/processed/wmt14_de-en/fairseq --lgs 'en-de' --encoder_only false --emb_dim 512 --n_layers 6 --n_heads 8 --dropout 0.1 --attention_dropout 0.1 --gelu_activation true --tokens_per_batch 6000 --bptt 256 --optimizer adam_inverse_sqrt,beta1=0.9,beta2=0.98,lr=0.0001 --epoch_size 200000 --eval_bleu true --stopping_criterion 'valid_en-de_mt_bleu,10' --validation_metrics 'valid_en-de_mt_bleu' --mt_steps "en-de" --gpus '0,1,2,3'
Translation results:
valid_en-de_mt_ppl -> 5.401580
valid_en-de_mt_acc -> 65.806969
valid_en-de_mt_bleu -> 28.990000
test_en-de_mt_ppl -> 5.942769
test_en-de_mt_acc -> 66.605212
test_en-de_mt_bleu -> 25.630000
My intuition is that the model structure is slightly different (gelu, layer_norm, etc.). May I ask whether you have tried this codebase on the supervised WMT14 benchmark, and what your thoughts are on this gap?
Best.
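For context on the beam_size 4 / length_penalty 0.6 setting above: the Transformer paper's reported BLEU uses the GNMT-style length penalty lp(Y) = ((5 + |Y|) / 6)^alpha with alpha = 0.6. Whether XLM's --length_penalty implements this exact formula (rather than a plain division by length^alpha) is an assumption worth checking against the repo's beam-search code; a mismatch here is one candidate source of small BLEU differences. A minimal sketch of the GNMT variant:

```python
# GNMT-style length penalty (the variant used in the Transformer paper's
# evaluation, alpha = 0.6). NOTE: XLM's own beam search may normalize
# differently; check the generate_beam code in src/model/transformer.py.
def length_penalty(length: int, alpha: float = 0.6) -> float:
    return ((5 + length) / 6.0) ** alpha

# A hypothesis's summed log-probability is divided by this factor, so
# longer hypotheses are not unfairly penalized by accumulating more
# negative log-probabilities.
print(length_penalty(1))              # (6/6)^0.6 = 1.0
print(round(length_penalty(20), 3))   # grows slowly with length
```

Because the penalty grows sublinearly, alpha trades off between favoring short outputs (alpha = 0) and fully length-normalized scores (alpha = 1).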
I tried to train a machine translation model using parallel data only. The script I used for training is as follows:
export NGPU=4; python -m torch.distributed.launch --nproc_per_node=$NGPU train.py \
--exp_name supMT_deen \
--dump_path ./checkpoints/ \
--data_path /unsullied/sharefs/zhaoyuekai/data/WMT/corpus/de-en/processed/ \
--lgs 'de-en' \
--mt_steps 'de-en' \
--lambda_mt '0:1,100000:0.1,300000:0' \
--encoder_only false \
--emb_dim 1024 \
--n_layers 6 \
--n_heads 8 \
--dropout 0.1 \
--attention_dropout 0.1 \
--gelu_activation true \
--tokens_per_batch 2000 \
--batch_size 32 \
--bptt 256 \
--optimizer adam_inverse_sqrt,beta1=0.9,beta2=0.98,lr=0.0001 \
--epoch_size 200000 \
--eval_bleu true \
--stopping_criterion 'valid_de-en_mt_bleu,10' \
--validation_metrics 'valid_de-en_mt_bleu'
When training on only one GPU, no error was reported; however, when I tried to train on 4 GPUs, the following error occurred.
Each of the four worker processes printed the same traceback, interleaved:
Traceback (most recent call last):
  File "train.py", line 341, in <module>
    main(params)
  File "train.py", line 300, in main
    trainer.mt_step(lang1, lang2, params.lambda_mt)
  File "/unsullied/sharefs/zhaoyuekai/data/XLM/config/XLM.active/src/trainer.py", line 770, in mt_step
    self.optimize(loss, ['encoder', 'decoder'])
  File "/unsullied/sharefs/zhaoyuekai/data/XLM/config/XLM.active/src/trainer.py", line 131, in optimize
    loss.backward()
  File "/home/zhaoyuekai/miniconda3/lib/python3.7/site-packages/torch/tensor.py", line 102, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/zhaoyuekai/miniconda3/lib/python3.7/site-packages/torch/autograd/__init__.py", line 90, in backward
    allow_unreachable=True)  # allow_unreachable flag
  File "/home/zhaoyuekai/miniconda3/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 445, in distributed_data_parallel_hook
    self._queue_reduction(bucket_idx)
  File "/home/zhaoyuekai/miniconda3/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 475, in _queue_reduction
    self.device_ids)
TypeError: _queue_reduction(): incompatible function arguments. The following argument types are supported:
    1. (process_group: torch.distributed.ProcessGroup, grads_batch: List[List[at::Tensor]], devices: List[int]) -> Tuple[torch.distributed.Work, at::Tensor]
Hi,
I just found a weird piece of code at:
XLM/src/evaluation/evaluator.py
Lines 52 to 56 in 20c338e
If possible, may I ask what the intuition behind this "hack" is?
Thanks.
How does the decoder know which direction to go (lang1 or lang2) when the input language is lang1? In other words, how does the decoder know which task it is performing, DAE or MT?
In the previous version (UNMT), different projection layers were used. In XLM, self.pred_layer
is always the same. @glample
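As background for this question: XLM conditions its single shared decoder on the target language through language embeddings, which are added to the token and position embeddings at the input, so the same parameters (including the shared pred_layer) behave differently for lang1 vs lang2. A toy illustration of that input sum, with plain-Python random vectors standing in for learned embedding tables (not XLM's actual code):

```python
import random

random.seed(0)
DIM = 8

def rand_vec():
    return [random.uniform(-1, 1) for _ in range(DIM)]

# Hypothetical tiny lookup tables standing in for learned embedding layers.
tok_emb = {t: rand_vec() for t in range(100)}
pos_emb = {p: rand_vec() for p in range(32)}
lang_emb = {l: rand_vec() for l in range(2)}  # 0 = lang1, 1 = lang2

def embed(tokens, lang_id):
    """Token + position + language embeddings, summed per position,
    mirroring the input layer of a shared multilingual decoder."""
    return [
        [t + p + g for t, p, g in zip(tok_emb[tok], pos_emb[i], lang_emb[lang_id])]
        for i, tok in enumerate(tokens)
    ]

sent = [5, 17, 42]
# Same tokens, different lang_id -> different decoder inputs, so one shared
# decoder can still be steered toward the requested output language.
assert embed(sent, 0) != embed(sent, 1)
```

The generation code passes the target language id (e.g. params.tgt_id) when decoding, which selects the language embedding and thereby the direction, instead of switching between separate projection layers.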
Why is Italian not supported by almost any LM?