
Comments (19)

glample commented on July 29, 2024

Yes, we will try to add that soon. It's pretty straightforward given the get-data-nmt.sh file though: the only thing to do is to download and tokenize the parallel data, then apply the BPE codes learned on the monolingual data, and binarize these BPE files. The only difference is that you will have 2 extra files:

./data/processed/en-fr/train.en-fr.en.pth
./data/processed/en-fr/train.en-fr.fr.pth

Then you can take the same command as the one given for unsupervised MT, and simply remove the --ae_steps and add --mt_steps "en-fr,fr-en" instead.
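
For concreteness, here is a minimal shell sketch of those preprocessing steps, modeled on get-data-nmt.sh (the raw parallel file names, the fastBPE binary path, and the codes/vocab locations are assumptions; adapt them to your setup):

# tokenize the raw parallel data (tools/tokenize.sh ships with the repo)
cat para.en-fr.en | ./tools/tokenize.sh en > para.en-fr.en.tok
cat para.en-fr.fr | ./tools/tokenize.sh fr > para.en-fr.fr.tok

# apply the BPE codes learned on the monolingual data
./tools/fastBPE/fast applybpe ./data/processed/en-fr/train.en-fr.en para.en-fr.en.tok ./data/processed/en-fr/codes
./tools/fastBPE/fast applybpe ./data/processed/en-fr/train.en-fr.fr para.en-fr.fr.tok ./data/processed/en-fr/codes

# binarize with the monolingual vocabulary (this produces the two .pth files above)
python preprocess.py ./data/processed/en-fr/vocab ./data/processed/en-fr/train.en-fr.en
python preprocess.py ./data/processed/en-fr/vocab ./data/processed/en-fr/train.en-fr.fr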

glample commented on July 29, 2024

@bhardwaj1230 do you mean that you use Europarl as a training set, but that you evaluate on newstest 2014? The fact that you do not get a great BLEU score is not so surprising to me, as the domain is quite different. But you should probably get more than 14 BLEU. Could you provide your full training log?

KelleyYin commented on July 29, 2024

@glample I am trying to train a machine translation model (en-fr), and I am confused by a few of the parameters.
It's a supervised learning approach and I am using the parallel (en-fr) data for training.

I just wanted to know if the command below is correct for training an NMT model for (en-fr)?

I am running the following command:

Running command: python train.py --exp_name test_enfr_tlm --dump_path './dumped/testing_1/' --data_path './data/processed/en-fr/' --lgs 'en-fr' --clm_steps '' --mlm_steps 'en,fr,en-fr' --n_layers 6 --n_heads 8 --dropout '0.1' --attention_dropout '0.1' --gelu_activation true --batch_size 32 --bptt 256 --optimizer 'adam,lr=0.0001' --epoch_size 200000 --eval_bleu True --mt_steps 'en-fr' --encoder_only False --validation_metrics _valid_mlm_ppl --stopping_criterion '_valid_mlm_ppl,10'

Data Summary:

INFO - 06/10/19 18:08:13 - 0:00:08 - ============ Data summary
INFO - 06/10/19 18:08:13 - 0:00:08 - Monolingual data - train - en: 2007723
INFO - 06/10/19 18:08:13 - 0:00:08 - Monolingual data - valid - en: 3000
INFO - 06/10/19 18:08:13 - 0:00:08 - Monolingual data - test - en: 3003
INFO - 06/10/19 18:08:13 - 0:00:08 - Monolingual data - train - fr: 2007723
INFO - 06/10/19 18:08:13 - 0:00:08 - Monolingual data - valid - fr: 3000
INFO - 06/10/19 18:08:13 - 0:00:08 - Monolingual data - test - fr: 3003
INFO - 06/10/19 18:08:13 - 0:00:08 - Parallel data - train - en-fr: 1974530
INFO - 06/10/19 18:08:13 - 0:00:08 - Parallel data - valid - en-fr: 3000
INFO - 06/10/19 18:08:13 - 0:00:08 - Parallel data - test - en-fr: 3003

You should remove --mlm_steps. You can refer to #79 for details.

KelleyYin commented on July 29, 2024

What speed were you expecting to get? What speed do you get on 1 GPU?

It's strange that the training speed is the same on 1 GPU and on 8 GPUs.

Dolprimates commented on July 29, 2024

@glample Did you release the

train.en-fr.en.pth
train.en-fr.fr.pth

train.en-de.en.pth
train.en-de.de.pth

train.en-ro.en.pth
train.en-ro.ro.pth

files you experimented with?
(The last pair is already reported in your paper)

whr94621 commented on July 29, 2024

+1. It would be nice if you could share the scripts to reproduce the supervised NMT experiment in the paper.

whr94621 commented on July 29, 2024

Thank you! I would like to try it on WMT14 EN-DE translation.

yilinyang7 commented on July 29, 2024

Thank you! I would like to try it on WMT14 EN-DE translation.

Hi @whr94621, I'm also trying to reproduce the SOTA performance on wmt14 en-de. Have you succeeded?

whr94621 commented on July 29, 2024

Thank you! I would like to try it on WMT14 EN-DE translation.

Hi @whr94621, I'm also trying to reproduce the SOTA performance on wmt14 en-de. Have you succeeded?

Not yet.
I did not wait for training to converge, as computation resources in our lab have been strained recently due to so many deadlines.
From what I saw, the performance on the dev data is moving in the right direction, and it seems to be faster than training from scratch. But I cannot find where I put the training log -_-|
Anyway, how about your reproduction?

yilinyang7 commented on July 29, 2024

Thank you! I would like to try it on WMT14 EN-DE translation.

Hi @whr94621, I'm also trying to reproduce the SOTA performance on wmt14 en-de. Have you succeeded?

Not yet.
I did not wait for training to converge, as computation resources in our lab have been strained recently due to so many deadlines.
From what I saw, the performance on the dev data is moving in the right direction, and it seems to be faster than training from scratch. But I cannot find where I put the training log -_-|
Anyway, how about your reproduction?

Just starting it, will let you know later :)

sugeeth14 commented on July 29, 2024

Hi @whr94621, I am also working on En-De translation. I want to train a language model and then use it as the encoder or decoder to train on the WMT14 parallel data for En-De. Can you share any progress you made and the scripts to replicate it?
Thanks in advance!

bhardwaj1230 commented on July 29, 2024

@glample I am trying to train a machine translation model (en-fr), and I am confused by a few of the parameters.
It's a supervised learning approach and I am using the parallel (en-fr) data for training.

I just wanted to know if the command below is correct for training an NMT model for (en-fr)?

I am running the following command:

Running command: python train.py --exp_name test_enfr_tlm --dump_path './dumped/testing_1/' --data_path './data/processed/en-fr/' --lgs 'en-fr' --clm_steps '' --mlm_steps 'en,fr,en-fr' --n_layers 6 --n_heads 8 --dropout '0.1' --attention_dropout '0.1' --gelu_activation true --batch_size 32 --bptt 256 --optimizer 'adam,lr=0.0001' --epoch_size 200000 --eval_bleu True --mt_steps 'en-fr' --encoder_only False --validation_metrics _valid_mlm_ppl --stopping_criterion '_valid_mlm_ppl,10'

Data Summary:

INFO - 06/10/19 18:08:13 - 0:00:08 - ============ Data summary
INFO - 06/10/19 18:08:13 - 0:00:08 - Monolingual data - train - en: 2007723
INFO - 06/10/19 18:08:13 - 0:00:08 - Monolingual data - valid - en: 3000
INFO - 06/10/19 18:08:13 - 0:00:08 - Monolingual data - test - en: 3003
INFO - 06/10/19 18:08:13 - 0:00:08 - Monolingual data - train - fr: 2007723
INFO - 06/10/19 18:08:13 - 0:00:08 - Monolingual data - valid - fr: 3000
INFO - 06/10/19 18:08:13 - 0:00:08 - Monolingual data - test - fr: 3003
INFO - 06/10/19 18:08:13 - 0:00:08 - Parallel data - train - en-fr: 1974530
INFO - 06/10/19 18:08:13 - 0:00:08 - Parallel data - valid - en-fr: 3000
INFO - 06/10/19 18:08:13 - 0:00:08 - Parallel data - test - en-fr: 3003

glample commented on July 29, 2024

Yes, you can indeed remove --mlm_steps. --mt_steps en-fr will correspond to baseline supervised en->fr training.
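
Put together, a sketch of the corrected command (same data and architecture as the command above; the substantive changes are the emptied --mlm_steps and validating on MT BLEU instead of MLM perplexity, using the valid_en-fr_mt_bleu metric name that appears later in this thread):

python train.py --exp_name test_enfr_mt --dump_path './dumped/testing_1/' --data_path './data/processed/en-fr/' --lgs 'en-fr' --clm_steps '' --mlm_steps '' --mt_steps 'en-fr' --encoder_only False --n_layers 6 --n_heads 8 --dropout 0.1 --attention_dropout 0.1 --gelu_activation true --batch_size 32 --bptt 256 --optimizer 'adam,lr=0.0001' --epoch_size 200000 --eval_bleu True --validation_metrics 'valid_en-fr_mt_bleu' --stopping_criterion 'valid_en-fr_mt_bleu,10'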

bhardwaj1230 commented on July 29, 2024

Thank you @KelleyYin and @glample for the quick response; I am now able to run the NMT with the pre-trained language model you uploaded on GitHub for en-fr.
I have one doubt: why does --reload_model take two inputs? I guess one is for the encoder and another for the decoder, and I am using this parameter to load/reuse your model to train the NMT.

Code:

python train.py --exp_name super_mt_en_fr --dump_path './dumped/testing1' --reload_model 'en_fr_data/XLM_model/XLM/dumped/mlm_enfr_1024.pth, en_fr_data/XLM_model/XLM/dumped/mlm_enfr_1024.pth' --data_path './data/processed/en-fr/' --lgs 'en-fr' --mt_steps 'en-fr' --encoder_only False --emb_dim 1024 --n_layers 6 --n_heads 8 --dropout 0.1 --attention_dropout 0.1 --tokens_per_batch 4096 --bptt 256 --batch_size 32 --optimizer 'adam_inverse_sqrt,beta1=0.9,beta2=0.98,lr=0.0007' --epoch_size 200000 --eval_bleu True --stopping_criterion 'valid_en-fr_mt_bleu,10' --validation_metrics 'valid_en-fr_mt_bleu' --gelu_activation True

I used the same model for both inputs, since I have --encoder_only False, but I get the warnings below when I run this:

INFO - 06/11/19 23:55:48 - 0:00:00 - ============ Parallel data (en-fr)
INFO - 06/11/19 23:55:48 - 0:00:00 - Loading data from ./data/processed/en-fr/train.en-fr.en.pth ...
INFO - 06/11/19 23:55:48 - 0:00:00 - 62724457 words (64139 unique) in 2007723 sentences. 322009 unknown words (60 unique) covering 0.51% of the data.
INFO - 06/11/19 23:55:48 - 0:00:00 - Loading data from ./data/processed/en-fr/train.en-fr.fr.pth ...
INFO - 06/11/19 23:55:48 - 0:00:01 - 72796309 words (64139 unique) in 2007723 sentences. 975190 unknown words (54 unique) covering 1.34% of the data.
INFO - 06/11/19 23:55:48 - 0:00:01 - Removed 4967 empty sentences.
INFO - 06/11/19 23:55:49 - 0:00:02 - Removed 0 empty sentences.
INFO - 06/11/19 23:55:49 - 0:00:02 - Removed 28226 too long sentences.

INFO - 06/11/19 23:55:50 - 0:00:02 - Loading data from ./data/processed/en-fr/valid.en-fr.en.pth ...
INFO - 06/11/19 23:55:50 - 0:00:02 - 69727 words (64139 unique) in 3000 sentences. 1 unknown words (1 unique) covering 0.00% of the data.
INFO - 06/11/19 23:55:50 - 0:00:02 - Loading data from ./data/processed/en-fr/valid.en-fr.fr.pth ...
INFO - 06/11/19 23:55:50 - 0:00:02 - 79585 words (64139 unique) in 3000 sentences. 1 unknown words (1 unique) covering 0.00% of the data.
INFO - 06/11/19 23:55:50 - 0:00:03 - Removed 0 empty sentences.

INFO - 06/11/19 23:55:50 - 0:00:03 - Loading data from ./data/processed/en-fr/test.en-fr.en.pth ...
INFO - 06/11/19 23:55:50 - 0:00:03 - 76017 words (64139 unique) in 3003 sentences. 0 unknown words (0 unique) covering 0.00% of the data.
INFO - 06/11/19 23:55:50 - 0:00:03 - Loading data from ./data/processed/en-fr/test.en-fr.fr.pth ...
INFO - 06/11/19 23:55:50 - 0:00:03 - 86351 words (64139 unique) in 3003 sentences. 0 unknown words (0 unique) covering 0.00% of the data.
INFO - 06/11/19 23:55:50 - 0:00:03 - Removed 0 empty sentences.

INFO - 06/11/19 23:55:50 - 0:00:03 - ============ Data summary
INFO - 06/11/19 23:55:50 - 0:00:03 - Parallel data - train - en-fr: 1974530
INFO - 06/11/19 23:55:50 - 0:00:03 - Parallel data - valid - en-fr: 3000
INFO - 06/11/19 23:55:50 - 0:00:03 - Parallel data - test - en-fr: 3003

INFO - 06/11/19 23:55:55 - 0:00:07 - Reloading encoder from en_fr_data/XLM_model/XLM/dumped/mlm_enfr_1024.pth ...
INFO - 06/11/19 23:55:58 - 0:00:10 - Reloading decoder from en_fr_data/XLM_model/XLM/dumped/mlm_enfr_1024.pth ...
WARNING - 06/11/19 23:55:58 - 0:00:11 - Parameter layer_norm15.0.weight not found.
WARNING - 06/11/19 23:55:58 - 0:00:11 - Parameter layer_norm15.0.bias not found.
WARNING - 06/11/19 23:55:58 - 0:00:11 - Parameter encoder_attn.0.q_lin.weight not found.
WARNING - 06/11/19 23:55:58 - 0:00:11 - Parameter encoder_attn.0.q_lin.bias not found.
WARNING - 06/11/19 23:55:58 - 0:00:11 - Parameter encoder_attn.0.k_lin.weight not found.
WARNING - 06/11/19 23:55:58 - 0:00:11 - Parameter encoder_attn.0.k_lin.bias not found.
WARNING - 06/11/19 23:55:58 - 0:00:11 - Parameter encoder_attn.0.v_lin.weight not found.
WARNING - 06/11/19 23:55:58 - 0:00:11 - Parameter encoder_attn.0.v_lin.bias not found.
WARNING - 06/11/19 23:55:58 - 0:00:11 - Parameter encoder_attn.0.out_lin.weight not found.
WARNING - 06/11/19 23:55:58 - 0:00:11 - Parameter encoder_attn.0.out_lin.bias not found.
WARNING - 06/11/19 23:55:58 - 0:00:11 - Parameter layer_norm15.1.weight not found.
WARNING - 06/11/19 23:55:58 - 0:00:11 - Parameter layer_norm15.1.bias not found.
WARNING - 06/11/19 23:55:58 - 0:00:11 - Parameter encoder_attn.1.q_lin.weight not found.
WARNING - 06/11/19 23:55:58 - 0:00:11 - Parameter encoder_attn.1.q_lin.bias not found.
WARNING - 06/11/19 23:55:58 - 0:00:11 - Parameter encoder_attn.1.k_lin.weight not found.
WARNING - 06/11/19 23:55:58 - 0:00:11 - Parameter encoder_attn.1.k_lin.bias not found.
WARNING - 06/11/19 23:55:58 - 0:00:11 - Parameter encoder_attn.1.v_lin.weight not found.
WARNING - 06/11/19 23:55:58 - 0:00:11 - Parameter encoder_attn.1.v_lin.bias not found.
WARNING - 06/11/19 23:55:58 - 0:00:11 - Parameter encoder_attn.1.out_lin.weight not found.
WARNING - 06/11/19 23:55:58 - 0:00:11 - Parameter encoder_attn.1.out_lin.bias not found.
WARNING - 06/11/19 23:55:58 - 0:00:11 - Parameter layer_norm15.2.weight not found.
WARNING - 06/11/19 23:55:58 - 0:00:11 - Parameter layer_norm15.2.bias not found.
WARNING - 06/11/19 23:55:58 - 0:00:11 - Parameter encoder_attn.2.q_lin.weight not found.
WARNING - 06/11/19 23:55:58 - 0:00:11 - Parameter encoder_attn.2.q_lin.bias not found.
WARNING - 06/11/19 23:55:58 - 0:00:11 - Parameter encoder_attn.2.k_lin.weight not found.
WARNING - 06/11/19 23:55:58 - 0:00:11 - Parameter encoder_attn.2.k_lin.bias not found.
WARNING - 06/11/19 23:55:58 - 0:00:11 - Parameter encoder_attn.2.v_lin.weight not found.
WARNING - 06/11/19 23:55:58 - 0:00:11 - Parameter encoder_attn.2.v_lin.bias not found.
WARNING - 06/11/19 23:55:58 - 0:00:11 - Parameter encoder_attn.2.out_lin.weight not found.
WARNING - 06/11/19 23:55:58 - 0:00:11 - Parameter encoder_attn.2.out_lin.bias not found.
WARNING - 06/11/19 23:55:58 - 0:00:11 - Parameter layer_norm15.3.weight not found.
WARNING - 06/11/19 23:55:58 - 0:00:11 - Parameter layer_norm15.3.bias not found.
WARNING - 06/11/19 23:55:58 - 0:00:11 - Parameter encoder_attn.3.q_lin.weight not found.
WARNING - 06/11/19 23:55:58 - 0:00:11 - Parameter encoder_attn.3.q_lin.bias not found.
WARNING - 06/11/19 23:55:58 - 0:00:11 - Parameter encoder_attn.3.k_lin.weight not found.
WARNING - 06/11/19 23:55:58 - 0:00:11 - Parameter encoder_attn.3.k_lin.bias not found.
WARNING - 06/11/19 23:55:58 - 0:00:11 - Parameter encoder_attn.3.v_lin.weight not found.
WARNING - 06/11/19 23:55:58 - 0:00:11 - Parameter encoder_attn.3.v_lin.bias not found.
WARNING - 06/11/19 23:55:58 - 0:00:11 - Parameter encoder_attn.3.out_lin.weight not found.
WARNING - 06/11/19 23:55:58 - 0:00:11 - Parameter encoder_attn.3.out_lin.bias not found.
WARNING - 06/11/19 23:55:58 - 0:00:11 - Parameter layer_norm15.4.weight not found.
WARNING - 06/11/19 23:55:58 - 0:00:11 - Parameter layer_norm15.4.bias not found.
WARNING - 06/11/19 23:55:58 - 0:00:11 - Parameter encoder_attn.4.q_lin.weight not found.
WARNING - 06/11/19 23:55:58 - 0:00:11 - Parameter encoder_attn.4.q_lin.bias not found.
WARNING - 06/11/19 23:55:58 - 0:00:11 - Parameter encoder_attn.4.k_lin.weight not found.
WARNING - 06/11/19 23:55:58 - 0:00:11 - Parameter encoder_attn.4.k_lin.bias not found.
WARNING - 06/11/19 23:55:58 - 0:00:11 - Parameter encoder_attn.4.v_lin.weight not found.
WARNING - 06/11/19 23:55:58 - 0:00:11 - Parameter encoder_attn.4.v_lin.bias not found.
WARNING - 06/11/19 23:55:58 - 0:00:11 - Parameter encoder_attn.4.out_lin.weight not found.
WARNING - 06/11/19 23:55:58 - 0:00:11 - Parameter encoder_attn.4.out_lin.bias not found.
WARNING - 06/11/19 23:55:58 - 0:00:11 - Parameter layer_norm15.5.weight not found.
WARNING - 06/11/19 23:55:58 - 0:00:11 - Parameter layer_norm15.5.bias not found.
WARNING - 06/11/19 23:55:58 - 0:00:11 - Parameter encoder_attn.5.q_lin.weight not found.
WARNING - 06/11/19 23:55:58 - 0:00:11 - Parameter encoder_attn.5.q_lin.bias not found.
WARNING - 06/11/19 23:55:58 - 0:00:11 - Parameter encoder_attn.5.k_lin.weight not found.
WARNING - 06/11/19 23:55:58 - 0:00:11 - Parameter encoder_attn.5.k_lin.bias not found.
WARNING - 06/11/19 23:55:58 - 0:00:11 - Parameter encoder_attn.5.v_lin.weight not found.
WARNING - 06/11/19 23:55:58 - 0:00:11 - Parameter encoder_attn.5.v_lin.bias not found.
WARNING - 06/11/19 23:55:58 - 0:00:11 - Parameter encoder_attn.5.out_lin.weight not found.
WARNING - 06/11/19 23:55:58 - 0:00:11 - Parameter encoder_attn.5.out_lin.bias not found.

INFO - 06/11/19 23:55:58 - 0:00:11 - Number of parameters (encoder): 141848203
INFO - 06/11/19 23:55:58 - 0:00:11 - Number of parameters (decoder): 167050891
INFO - 06/11/19 23:56:00 - 0:00:13 - ============ Starting epoch 0 ... ============
INFO - 06/11/19 23:56:00 - 0:00:13 - Creating new training data iterator (mt,en,fr) ...
anaconda3/envs/tf_gpu/lib/python3.6/site-packages/torch/nn/_reduction.py:16: UserWarning: reduction='elementwise_mean' is deprecated, please use reduction='mean' instead.
warnings.warn("reduction='elementwise_mean' is deprecated, please use reduction='mean' instead.")
INFO - 06/11/19 23:56:04 - 0:00:16 - 5 - 5.35 sent/s - 207.25 words/s - MT-en-fr: 11.9361 - Transformer LR = 9.7488e-07
INFO - 06/11/19 23:56:05 - 0:00:17 - 10 - 20.94 sent/s - 987.95 words/s - MT-en-fr: 11.3755 - Transformer LR = 1.8497e-06
INFO - 06/11/19 23:56:06 - 0:00:18 - 15 - 34.65 sent/s - 1037.56 words/s - MT-en-fr: 10.9374 - Transformer LR = 2.7246e-06
INFO - 06/11/19 23:56:07 - 0:00:19 - 20 - 22.87 sent/s - 992.44 words/s - MT-en-fr: 9.6973 - Transformer LR = 3.5995e-06
INFO - 06/11/19 23:56:08 - 0:00:20 - 25 - 29.04 sent/s - 989.52 words/s - MT-en-fr: 9.1091 - Transformer LR = 4.4744e-06
INFO - 06/11/19 23:56:09 - 0:00:21 - 30 - 30.33 sent/s - 1021.45 words/s - MT-en-fr: 8.6433 - Transformer LR = 5.3493e-06
INFO - 06/11/19 23:56:10 - 0:00:22 - 35 - 36.40 sent/s - 1080.28 words/s - MT-en-fr: 8.0691 - Transformer LR = 6.2241e-06
INFO - 06/11/19 23:56:11 - 0:00:23 - 40 - 37.89 sent/s - 1045.04 words/s - MT-en-fr: 7.9315 - Transformer LR = 7.0990e-06
INFO - 06/11/19 23:56:12 - 0:00:24 - 45 - 37.96 sent/s - 1047.87 words/s - MT-en-fr: 8.0192 - Transformer LR = 7.9739e-06

Thank you,
Shivendra.

glample commented on July 29, 2024

That's fine. These warnings just mean that there are no source-attention parameters to reload from the pretrained model: the decoder performs source attention over the encoder output, but the pretrained model was trained on language modeling, so it did not have these source-attention parameters at all. They are simply left with their fresh initialization.

KelleyYin commented on July 29, 2024

@glample Thanks for your reply.
I found that my supervised training program is very slow, taking four hours per epoch.
My training data is WMT16 en-de, trained on 8 Titan XPs.
The specific training script and logs are as follows.

export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export NGPU=8

python -m torch.distributed.launch --nproc_per_node=$NGPU train.py \
    --debug_train False \
    --exp_name supervised_MT \
    --exp_id 01 \
    --dump_path ./dumped/ \
    --data_path ./data/wmt16-en-de/ \
    --encoder_only False \
    --lgs 'en-de' \
    --clm_steps '' \
    --mlm_steps '' \
    --mt_steps "en-de" \
    --emb_dim 512 \
    --n_layers 6 \
    --n_heads 8 \
    --dropout 0.1 \
    --attention_dropout 0.1 \
    --gelu_activation true \
    --tokens_per_batch 4096 \
    --batch_size 32 \
    --bptt 256 \
    --optimizer adam_inverse_sqrt,beta1=0.9,beta2=0.98,lr=0.0007 \
    --epoch_size 1000000 \
    --eval_bleu true \
    --stopping_criterion 'valid_en-de_mt_bleu,10' \
    --validation_metrics 'valid_en-de_mt_bleu'


INFO - 06/12/19 01:59:35 - 4:37:56 - 31240 - 176.01 sent/s - 4018.24 words/s - MT-en-de: 1.8243 - Transformer LR = 2.5048e-04
INFO - 06/12/19 01:59:37 - 4:37:59 - 31245 - 125.15 sent/s - 4096.38 words/s - MT-en-de: 2.0405 - Transformer LR = 2.5046e-04
INFO - 06/12/19 01:59:40 - 4:38:01 - 31250 - 113.32 sent/s - 3937.37 words/s - MT-en-de: 1.8901 - Transformer LR = 2.5044e-04
INFO - 06/12/19 01:59:40 - 4:38:01 - ============ End of epoch 0 ============
WARNING - 06/12/19 02:00:30 - 4:38:52 - Impossible to parse BLEU score! ""
INFO - 06/12/19 02:00:30 - 4:38:52 - BLEU ./dumped/supervised_MT/01/hypotheses/hyp0.en-de.valid.txt ./dumped/supervised_MT/01/hypotheses/ref.en-de.valid.txt : -1.000000
WARNING - 06/12/19 02:01:18 - 4:39:39 - Impossible to parse BLEU score! ""
INFO - 06/12/19 02:01:18 - 4:39:39 - BLEU ./dumped/supervised_MT/01/hypotheses/hyp0.en-de.test.txt ./dumped/supervised_MT/01/hypotheses/ref.en-de.test.txt : -1.000000
INFO - 06/12/19 02:01:18 - 4:39:39 - epoch -> 0.000000
INFO - 06/12/19 02:01:18 - 4:39:39 - valid_en-de_mt_ppl -> 5.492191
INFO - 06/12/19 02:01:18 - 4:39:39 - valid_en-de_mt_acc -> 65.289049
INFO - 06/12/19 02:01:18 - 4:39:39 - valid_en-de_mt_bleu -> -1.000000
INFO - 06/12/19 02:01:18 - 4:39:39 - test_en-de_mt_ppl -> 4.787062
INFO - 06/12/19 02:01:18 - 4:39:39 - test_en-de_mt_acc -> 66.969409
INFO - 06/12/19 02:01:18 - 4:39:39 - test_en-de_mt_bleu -> -1.000000
INFO - 06/12/19 02:01:18 - 4:39:39 - log:{"epoch": 0, "valid_en-de_mt_ppl": 5.492190698675863, "valid_en-de_mt_acc": 65.28904937256443, "valid_en-de_mt_bleu": -1, "test_en-de_mt_ppl": 4.787062307216703, "test_en-de_mt_acc": 66.96940890818091, "test_en-de_mt_bleu": -1}
INFO - 06/12/19 02:01:18 - 4:39:39 - New best score for valid_en-de_mt_bleu: -1.000000
INFO - 06/12/19 02:01:18 - 4:39:39 - Saving models to ./dumped/supervised_MT/01/best-valid_en-de_mt_bleu.pth ...
INFO - 06/12/19 02:01:18 - 4:39:40 - New best validation score: -1.000000
INFO - 06/12/19 02:01:18 - 4:39:40 - Saving checkpoint to ./dumped/supervised_MT/01/checkpoint.pth ...
INFO - 06/12/19 02:01:19 - 4:39:41 - ============ Starting epoch 1 ... ============
INFO - 06/12/19 02:01:21 - 4:39:43 - 31255 - 3.03 sent/s - 99.35 words/s - MT-en-de: 1.9065 - Transformer LR = 2.5042e-04


INFO - 06/12/19 15:33:06 - 18:11:27 - 124995 - 99.54 sent/s - 4124.21 words/s - MT-en-de: 1.8934 - Transformer LR = 1.2522e-04
INFO - 06/12/19 15:33:08 - 18:11:30 - 125000 - 110.78 sent/s - 3976.16 words/s - MT-en-de: 1.7640 - Transformer LR = 1.2522e-04
INFO - 06/12/19 15:33:08 - 18:11:30 - ============ End of epoch 3 ============
WARNING - 06/12/19 15:33:52 - 18:12:13 - Impossible to parse BLEU score! ""
INFO - 06/12/19 15:33:52 - 18:12:13 - BLEU ./dumped/supervised_MT/01/hypotheses/hyp3.en-de.valid.txt ./dumped/supervised_MT/01/hypotheses/ref.en-de.valid.txt : -1.000000
WARNING - 06/12/19 15:34:39 - 18:13:00 - Impossible to parse BLEU score! ""
INFO - 06/12/19 15:34:39 - 18:13:00 - BLEU ./dumped/supervised_MT/01/hypotheses/hyp3.en-de.test.txt ./dumped/supervised_MT/01/hypotheses/ref.en-de.test.txt : -1.000000
INFO - 06/12/19 15:34:39 - 18:13:00 - epoch -> 3.000000
INFO - 06/12/19 15:34:39 - 18:13:00 - valid_en-de_mt_ppl -> 4.721469
INFO - 06/12/19 15:34:39 - 18:13:00 - valid_en-de_mt_acc -> 67.391227
INFO - 06/12/19 15:34:39 - 18:13:00 - valid_en-de_mt_bleu -> -1.000000
INFO - 06/12/19 15:34:39 - 18:13:00 - test_en-de_mt_ppl -> 4.044715
INFO - 06/12/19 15:34:39 - 18:13:00 - test_en-de_mt_acc -> 69.530310
INFO - 06/12/19 15:34:39 - 18:13:00 - test_en-de_mt_bleu -> -1.000000
INFO - 06/12/19 15:34:39 - 18:13:00 - log:{"epoch": 3, "valid_en-de_mt_ppl": 4.721468734611085, "valid_en-de_mt_acc": 67.39122662757856, "valid_en-de_mt_bleu": -1, "test_en-de_mt_ppl": 4.044714902376891, "test_en-de_mt_acc": 69.53031049213749, "test_en-de_mt_bleu": -1}
INFO - 06/12/19 15:34:39 - 18:13:00 - Not a better validation score (2 / 10).
INFO - 06/12/19 15:34:39 - 18:13:00 - Saving checkpoint to ./dumped/supervised_MT/01/checkpoint.pth ...
INFO - 06/12/19 15:34:41 - 18:13:03 - ============ Starting epoch 4 ... ============
INFO - 06/12/19 15:34:43 - 18:13:05 - 125005 - 4.08 sent/s - 104.93 words/s - MT-en-de: 1.6936 - Transformer LR = 1.2522e-04
INFO - 06/12/19 15:34:46 - 18:13:07 - 125010 - 68.84 sent/s - 3901.15 words/s - MT-en-de: 1.7856 - Transformer LR = 1.2521e-04
INFO - 06/12/19 15:34:49 - 18:13:10 - 125015 - 122.12 sent/s - 3955.76 words/s - MT-en-de: 1.7050 - Transformer LR = 1.2521e-04

glample commented on July 29, 2024

What speed were you expecting to get? What speed do you get on 1 GPU?
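
For reference, one quick way to get the 1-GPU number is to run the same script without the distributed launcher (a sketch reusing the flags from the script above, with only the experiment name changed):

export CUDA_VISIBLE_DEVICES=0
python train.py --debug_train False --exp_name supervised_MT_1gpu --dump_path ./dumped/ --data_path ./data/wmt16-en-de/ --encoder_only False --lgs 'en-de' --clm_steps '' --mlm_steps '' --mt_steps "en-de" --emb_dim 512 --n_layers 6 --n_heads 8 --dropout 0.1 --attention_dropout 0.1 --gelu_activation true --tokens_per_batch 4096 --batch_size 32 --bptt 256 --optimizer adam_inverse_sqrt,beta1=0.9,beta2=0.98,lr=0.0007 --epoch_size 1000000 --eval_bleu true --stopping_criterion 'valid_en-de_mt_bleu,10' --validation_metrics 'valid_en-de_mt_bleu'

The words/s figures in the training log (e.g. "4018.24 words/s") can then be compared directly between the 1-GPU and 8-GPU runs.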

bhardwaj1230 commented on July 29, 2024

Hello @glample and @yilinyang7, thank you for the responses to my previous questions.

I have one problem. I am trying to train a supervised MT model based on all your previous help: my training parallel data is Europarl en-fr, and for validation/testing I use the data given in the preprocess.py file, but I am not able to achieve the baseline results.
Another doubt: for me it runs very fast compared to other people who are trying the unsupervised model, so I think I am making a very stupid mistake. Your advice would be helpful.

My results:
python train.py --exp_name super_pretrain_model --dump_path './dumped/testing/' --data_path './data/processed/en-fr/' --lgs 'en-fr' --n_layers 6 --n_heads 8 --dropout '0.1' --attention_dropout '0.1' --gelu_activation true --batch_size 32 --emb_dim 1024 --bptt 256 --optimizer 'adam_inverse_sqrt,beta1=0.9,beta2=0.98,lr=0.0001' --epoch_size 50000 --eval_bleu True --mt_steps 'en-fr' --encoder_only False --validation_metrics 'valid_en-fr_mt_bleu' --stopping_criterion 'valid_en-fr_mt_bleu,10' --tokens_per_batch 2000

bhardwajs@NRC-004220:/srv/gluster/users/bhardwajs/en_fr_data/XLM/dumped/testing/super_pretrain_model/ecmqxy9b8q$ tail train.log
INFO - 06/19/19 10:32:19 - 1 day, 22:39:05 - epoch -> 130.000000
INFO - 06/19/19 10:32:19 - 1 day, 22:39:05 - valid_en-fr_mt_ppl -> 39.023443
INFO - 06/19/19 10:32:19 - 1 day, 22:39:05 - valid_en-fr_mt_acc -> 44.860447
INFO - 06/19/19 10:32:19 - 1 day, 22:39:05 - valid_en-fr_mt_bleu -> 11.990000
INFO - 06/19/19 10:32:19 - 1 day, 22:39:05 - test_en-fr_mt_ppl -> 33.288534
INFO - 06/19/19 10:32:19 - 1 day, 22:39:05 - test_en-fr_mt_acc -> 48.119838
INFO - 06/19/19 10:32:19 - 1 day, 22:39:05 - test_en-fr_mt_bleu -> 13.980000
INFO - 06/19/19 10:32:19 - 1 day, 22:39:05 - log:{"epoch": 130, "valid_en-fr_mt_ppl": 39.02344311954371, "valid_en-fr_mt_acc": 44.86044681237513, "valid_en-fr_mt_bleu": 11.99, "test_en-fr_mt_ppl": 33.28853423464438, "test_en-fr_mt_acc": 48.11983794793742, "test_en-fr_mt_bleu": 13.98}
INFO - 06/19/19 10:32:19 - 1 day, 22:39:05 - Not a better validation score (10 / 10).
INFO - 06/19/19 10:32:19 - 1 day, 22:39:05 - Stopping criterion has been below its best value for more than 10 epochs. Ending the experiment...

bhardwaj1230 commented on July 29, 2024

Hello @glample,
Attaching the log file for the issue above:
#2 (comment)

I am using the Europarl data for training and testing on newstest 2014. I know the domains don't match, but I was surprised by the accuracy and BLEU score: they go up at the start but soon settle around 10-14.

Any comments based on the log would be helpful, as well as suggestions on how I can improve the model.
train.log
