
multilingual_nmt's Introduction

Multilingual Translation

This codebase was used for the multilingual translation experiments in the paper "Parameter Sharing Methods for Multilingual Self-Attentional Translation Models" (WMT, EMNLP 2018).

The multilingual model is based on the Transformer model and also contains the following features:

  • positional encoding
  • multi-head dot-product attention
  • label smoothing
  • learning-rate warm-up schedule for the Adam optimizer (see the sketch after this list)
  • shared weights of the embedding and softmax layers
  • beam search with length normalization
  • exponential moving average checkpoint of parameters
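
As a point of reference, the warm-up schedule commonly paired with Adam in the Transformer (the training config later on this page calls it the "Noam" optimizer) increases the learning rate linearly during warm-up and then decays it with the inverse square root of the step. The sketch below is illustrative only; d_model = 512 and warmup_steps = 16000 are taken from the default config shown further down, and the exact scaling constants used by the code are not verified here.

    # Minimal sketch of the "Noam" warm-up schedule (Vaswani et al., 2017).
    # d_model and warmup_steps mirror the defaults in the training config dump
    # below; the repository's exact scaling may differ.
    def noam_lr(step, d_model=512, warmup_steps=16000):
        step = max(step, 1)
        return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)

    # The rate rises roughly linearly for the first 16k steps, then decays ~ 1/sqrt(step).
    print(noam_lr(1000), noam_lr(16000), noam_lr(100000))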

Requirements

One can install the required packages from the requirements file.

pip install -r requirements.txt

Dataset

  • Download the TED talks dataset as:
bash download_teddata.sh

This command downloads, decompresses, and saves the train, dev, and test splits of the TED talks under the data directory.

  • One can use the script ted_reader.py to specify language pairs for both bilingual and multilingual translation tasks.
  • For bilingual or multilingual translation, specify the source and target languages as:
python ted_reader.py -s ja en zh fr ro -t en zh fr ro ja
  • For multilingual translation, the training data by default consists of the Cartesian product of all source and target languages.
  • If all possible combinations of language pairs are not needed, use the -ncp option:
python ted_reader.py -s ja en zh fr ro -t en zh fr ro ja -ncp
  • The above command only creates training data for the corresponding (position-wise) language pairs, i.e. [(ja, en), (en, zh), (zh, fr), (fr, ro), (ro, ja)]; the difference is illustrated in the sketch below this list.
  • For evaluating the multilingual model, one can generate the test set for each bilingual pair using the above command.
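
The following is a minimal sketch (not part of ted_reader.py) of how the two modes differ, assuming the pairing with -ncp is positional:

    # Illustration only: Cartesian-product pairs vs. position-wise pairs.
    from itertools import product

    src = ["ja", "en", "zh", "fr", "ro"]
    tgt = ["en", "zh", "fr", "ro", "ja"]

    # Default: Cartesian product of all source and target languages
    # (identical-language pairs such as (en, en) would presumably be skipped).
    cartesian_pairs = [(s, t) for s, t in product(src, tgt) if s != t]

    # With -ncp: only the corresponding (position-wise) pairs are used.
    corresponding_pairs = list(zip(src, tgt))
    print(corresponding_pairs)  # [('ja', 'en'), ('en', 'zh'), ('zh', 'fr'), ('fr', 'ro'), ('ro', 'ja')]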

Instructions

For convenience, there are example shell scripts under the tools directory:

  • Bilingual Translation (NS)
bash tools/bpe_pipeline_bilingual.sh src_lang tgt_lang
  • Fully Shared Multilingual Translation (FS)
bash tools/bpe_pipeline_fully_shared_multilingual.sh src_lang tgt_lang1 tgt_lang2 
  • Partial Sharing Multilingual Translation (PS)
bash tools/bpe_pipeline_MT.sh src_lang tgt_lang1 tgt_lang2 share_sublayer share_attn

An example of sharing the key (k) and query (q) projections in both attention layers (self and source):

bash tools/bpe_pipeline_MT.sh src_lang tgt_lang1 tgt_lang2 k,q self,source
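
As a concrete (hypothetical) illustration, assuming the same two-letter language codes used by ted_reader.py, an En->De+Nl model that shares only the key and query projections of the self-attention sublayers could be trained as:

bash tools/bpe_pipeline_MT.sh en de nl k,q self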

Experiments

  • Dataset Statistics

    | Dataset                         | Train   | Dev   | Test  |
    |---------------------------------|---------|-------|-------|
    | English-Vietnamese (IWSLT 2015) | 133,317 | 1,553 | 1,268 |
    | English-German (TED talks)      | 167,888 | 4,148 | 4,491 |
    | English-Romanian (TED talks)    | 180,484 | 3,904 | 4,631 |
    | English-Dutch (TED talks)       | 183,767 | 4,459 | 5,006 |
  • Bilingual Translation Tasks (BLEU)

    | Language pair         | this repo | tensor2tensor | GNMT  |
    |-----------------------|-----------|---------------|-------|
    | En -> Vi (IWSLT 2015) | 28.84     | 28.12         | 26.50 |
    | En -> De              | 29.31     | 28.68         | 27.01 |
    | En -> Ro              | 26.81     | 26.38         | 23.92 |
    | En -> Nl              | 32.42     | 31.74         | 30.64 |
    | De -> En              | 37.33     | 36.96         | 35.46 |
    | Ro -> En              | 37.00     | 35.45         | 34.77 |
    | Nl -> En              | 38.59     | 37.71         | 35.81 |
  • Multilingual Translation Tasks (BLEU)

    | Method         | En->De+Tr (->De) | En->De+Tr (->Tr) | En->De+Ja (->De) | En->De+Ja (->Ja) | En->Ro+Fr (->Ro) | En->Ro+Fr (->Fr) | En->De+Nl (->De) | En->De+Nl (->Nl) |
    |----------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|
    | GNMT NS        | 27.01            | 16.07            | 27.01            | 16.62            | 24.38            | 40.50            | 27.01            | 30.64            |
    | GNMT FS        | 29.07            | 18.09            | 28.24            | 17.33            | 26.41            | 42.46            | 28.52            | 31.72            |
    | Transformer NS | 29.31            | 18.62            | 29.31            | 17.92            | 26.81            | 42.95            | 29.31            | 32.43            |
    | Transformer FS | 28.74            | 18.69            | 29.68            | 18.50            | 28.52            | 44.28            | 30.45            | 33.69            |
    | Transformer PS | 30.71            | 19.67            | 30.48            | 19.00            | 27.58            | 43.84            | 30.70            | 34.05            |

Citation

If you find this code useful, please consider citing our paper as:

@InProceedings{devendra2018multilingual,
  author    = "Sachan, Devendra and Neubig, Graham",
  title     = "Parameter Sharing Methods for Multilingual Self-Attentional Translation Models",
  booktitle = "Proceedings of the Third Conference on Machine Translation",
  year      = "2018",
  publisher = "Association for Computational Linguistics",
  location  = "Brussels, Belgium"
}


multilingual_nmt's Issues

Use of uninitialized value

python: symbol lookup error: /usr/local/python3/lib/python3.6/site-packages/torch/lib/libtorch_python.so: undefined symbol: PySlice_Unpack
BPE decoding/detokenising target to match with references
Step 4a: Evaluate Test
Use of uninitialized value $length_reference in numeric eq (==) at /home/work/notebooks/multilingual_nmt-master/tools/multi-bleu.perl line 148.
Use of uninitialized value $length_reference in numeric eq (==) at /home/work/notebooks/multilingual_nmt-master/tools/multi-bleu.perl line 148.
Step 4b: Evaluate Dev
Use of uninitialized value $length_reference in numeric eq (==) at /home/work/notebooks/multilingual_nmt-master/tools/multi-bleu.perl line 148.
Use of uninitialized value $length_reference in numeric eq (==) at /home/work/notebooks/multilingual_nmt-master/tools/multi-bleu.perl line 148.
[root@79dd8fc

How to tokenize the multilingual text corpus?

After running python ted_reader.py -s zh ja -t en, I obtain the multilingual corpus. But before learning BPE, I need to tokenize a corpus composed of several different languages, such as Japanese, Chinese, and English. The script tokenizer.perl does not seem to work. How can I tokenize the multilingual text?

There are some problems with applying your code of multilingual_nmt

First of all, thank you for open-sourcing your code; it has been a great help. However, in practice I ran into some problems and would really appreciate your help. Could you explain the line yy_mask *= self.make_history_mask(y_in_block) at line 582 of /multilingual_nmt/models/transformer.py? I do not understand why *= is applied to this variable, and I cannot get it to run.
Second, when I use your example tools/bpe_pipeline_bilingual.sh to train an en-to-ja model, the loss does not decrease normally: the first value is around 10 and the second becomes NaN.
I am looking forward to your reply; this is my email: [email protected].
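
For context, the following is a minimal sketch, not the repository's actual code, of what an in-place multiplication of this kind typically does in a Transformer decoder: it combines a padding mask with a lower-triangular "history" (causal) mask so that each target position can only attend to itself and earlier positions. The make_attention_mask helper below is assumed for illustration.

    import torch

    def make_attention_mask(source_block, target_block, pad=0):
        # 1 where both positions hold real tokens, 0 wherever either is padding;
        # shape (batch, len_target, len_source).
        mask = (target_block != pad)[:, :, None] & (source_block != pad)[:, None, :]
        return mask.to(torch.uint8)

    def make_history_mask(block):
        # Lower-triangular causal mask: position i may attend to positions <= i.
        length = block.size(1)
        return torch.tril(torch.ones(length, length, dtype=torch.uint8))

    y_in_block = torch.tensor([[3, 7, 0]])          # 0 = padding index
    yy_mask = make_attention_mask(y_in_block, y_in_block)
    yy_mask *= make_history_mask(y_in_block)        # zero out attention to future positions
    print(yy_mask)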

Pretrained model

Thanks for the great work; your code is very clean and easy to understand. I have trained a Spanish-to-English translation model and it is working fine. I was wondering whether you already have a pretrained bilingual or multilingual translation model? If so, could you please share it?

RuntimeError: Expected object of type torch.FloatTensor but found type torch.cuda.FloatTensor for argument #2 'other'

Step 2: Train
Training command :: python /home/work/notebooks/multilingual_nmt/train.py -i temp/run_src_lang_tgt_lang/data --data processed --model_file temp/run_src_lang_tgt_lang/models/model_run_src_lang_tgt_lang.ckpt --best_model_file temp/run_src_lang_tgt_lang/models/model_best_run_src_lang_tgt_lang.ckpt --data processed --batchsize 30 --tied --beam_size 5 --epoch 30 --layers 6 --multi_heads 8 --gpu0 --max_decode_len 70 --dev_hyp temp/run_src_lang_tgt_lang/test/valid.out --test_hyp temp/run_src_lang_tgt_lang/test/test.out --model Transformer --metric bleu --wbatchsize 3000
{
"input": "temp/run_src_lang_tgt_lang/data",
"data": "processed",
"report_every": 50,
"model": "Transformer",
"pshare_decoder_param": false,
"pshare_encoder_param": false,
"lang1": null,
"lang2": null,
"share_sublayer": null,
"attn_share": null,
"batchsize": 30,
"wbatchsize": 3000,
"epoch": 30,
"gpu": 0,
"resume": false,
"start_epoch": 0,
"debug": false,
"grad_accumulator_count": 1,
"seed": 1234,
"fp16": false,
"static_loss_scale": 1,
"dynamic_loss_scale": false,
"multi_gpu": [
0
],
"n_units": 512,
"n_hidden": 2048,
"layers": 6,
"multi_heads": 8,
"dropout": 0.1,
"attention_dropout": 0.1,
"relu_dropout": 0.1,
"layer_prepostprocess_dropout": 0.1,
"tied": true,
"pos_attention": false,
"label_smoothing": 0.1,
"embed_position": false,
"max_length": 500,
"use_pad_remover": true,
"optimizer": "Noam",
"grad_norm_for_yogi": false,
"warmup_steps": 16000,
"learning_rate": 0.2,
"learning_rate_constant": 2.0,
"optimizer_adam_beta1": 0.9,
"optimizer_adam_beta2": 0.997,
"optimizer_adam_epsilon": 1e-09,
"ema_decay": 0.999,
"eval_steps": 1000,
"beam_size": 5,
"metric": "bleu",
"alpha": 1.0,
"max_sent_eval": 500,
"max_decode_len": 70,
"out": "results",
"model_file": "temp/run_src_lang_tgt_lang/models/model_run_src_lang_tgt_lang.ckpt",
"best_model_file": "temp/run_src_lang_tgt_lang/models/model_best_run_src_lang_tgt_lang.ckpt",
"dev_hyp": "temp/run_src_lang_tgt_lang/test/valid.out",
"test_hyp": "temp/run_src_lang_tgt_lang/test/test.out",
"log_path": "results/log.txt"
}
/usr/local/python3/lib/python3.6/site-packages/torch/nn/functional.py:52: UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.
warnings.warn(warning.format(ret))

  • number of parameters: 64387713
    encoder: 18903040
    decoder: 25200640
    Transformer(
    (embed_word): ScaledEmbedding(39041, 512, padding_idx=0)
    (embed_dropout): Dropout(p=0.1)
    (encoder): Encoder(
    (layers): ModuleList(
    (0): EncoderLayer(
    (ln_1): LayerNorm()
    (self_attention): MultiHeadAttention(
    (W_Q): Linear(in_features=512, out_features=512, bias=False)
    (W_K): Linear(in_features=512, out_features=512, bias=False)
    (W_V): Linear(in_features=512, out_features=512, bias=False)
    (finishing_linear_layer): Linear(in_features=512, out_features=512, bias=False)
    (dropout): Dropout(p=0.1)
    )
    (dropout1): Dropout(p=0.1)
    (ln_2): LayerNorm()
    (feed_forward): FeedForwardLayer(
    (W_1): Linear(in_features=512, out_features=2048, bias=True)
    (act): ReLU()
    (dropout): Dropout(p=0.1)
    (W_2): Linear(in_features=2048, out_features=512, bias=True)
    )
    (dropout2): Dropout(p=0.1)
    )
    (1): EncoderLayer(
    (ln_1): LayerNorm()
    (self_attention): MultiHeadAttention(
    (W_Q): Linear(in_features=512, out_features=512, bias=False)
    (W_K): Linear(in_features=512, out_features=512, bias=False)
    (W_V): Linear(in_features=512, out_features=512, bias=False)
    (finishing_linear_layer): Linear(in_features=512, out_features=512, bias=False)
    (dropout): Dropout(p=0.1)
    )
    (dropout1): Dropout(p=0.1)
    (ln_2): LayerNorm()
    (feed_forward): FeedForwardLayer(
    (W_1): Linear(in_features=512, out_features=2048, bias=True)
    (act): ReLU()
    (dropout): Dropout(p=0.1)
    (W_2): Linear(in_features=2048, out_features=512, bias=True)
    )
    (dropout2): Dropout(p=0.1)
    )
    (2): EncoderLayer(
    (ln_1): LayerNorm()
    (self_attention): MultiHeadAttention(
    (W_Q): Linear(in_features=512, out_features=512, bias=False)
    (W_K): Linear(in_features=512, out_features=512, bias=False)
    (W_V): Linear(in_features=512, out_features=512, bias=False)
    (finishing_linear_layer): Linear(in_features=512, out_features=512, bias=False)
    (dropout): Dropout(p=0.1)
    )
    (dropout1): Dropout(p=0.1)
    (ln_2): LayerNorm()
    (feed_forward): FeedForwardLayer(
    (W_1): Linear(in_features=512, out_features=2048, bias=True)
    (act): ReLU()
    (dropout): Dropout(p=0.1)
    (W_2): Linear(in_features=2048, out_features=512, bias=True)
    )
    (dropout2): Dropout(p=0.1)
    )
    (3): EncoderLayer(
    (ln_1): LayerNorm()
    (self_attention): MultiHeadAttention(
    (W_Q): Linear(in_features=512, out_features=512, bias=False)
    (W_K): Linear(in_features=512, out_features=512, bias=False)
    (W_V): Linear(in_features=512, out_features=512, bias=False)
    (finishing_linear_layer): Linear(in_features=512, out_features=512, bias=False)
    (dropout): Dropout(p=0.1)
    )
    (dropout1): Dropout(p=0.1)
    (ln_2): LayerNorm()
    (feed_forward): FeedForwardLayer(
    (W_1): Linear(in_features=512, out_features=2048, bias=True)
    (act): ReLU()
    (dropout): Dropout(p=0.1)
    (W_2): Linear(in_features=2048, out_features=512, bias=True)
    )
    (dropout2): Dropout(p=0.1)
    )
    (4): EncoderLayer(
    (ln_1): LayerNorm()
    (self_attention): MultiHeadAttention(
    (W_Q): Linear(in_features=512, out_features=512, bias=False)
    (W_K): Linear(in_features=512, out_features=512, bias=False)
    (W_V): Linear(in_features=512, out_features=512, bias=False)
    (finishing_linear_layer): Linear(in_features=512, out_features=512, bias=False)
    (dropout): Dropout(p=0.1)
    )
    (dropout1): Dropout(p=0.1)
    (ln_2): LayerNorm()
    (feed_forward): FeedForwardLayer(
    (W_1): Linear(in_features=512, out_features=2048, bias=True)
    (act): ReLU()
    (dropout): Dropout(p=0.1)
    (W_2): Linear(in_features=2048, out_features=512, bias=True)
    )
    (dropout2): Dropout(p=0.1)
    )
    (5): EncoderLayer(
    (ln_1): LayerNorm()
    (self_attention): MultiHeadAttention(
    (W_Q): Linear(in_features=512, out_features=512, bias=False)
    (W_K): Linear(in_features=512, out_features=512, bias=False)
    (W_V): Linear(in_features=512, out_features=512, bias=False)
    (finishing_linear_layer): Linear(in_features=512, out_features=512, bias=False)
    (dropout): Dropout(p=0.1)
    )
    (dropout1): Dropout(p=0.1)
    (ln_2): LayerNorm()
    (feed_forward): FeedForwardLayer(
    (W_1): Linear(in_features=512, out_features=2048, bias=True)
    (act): ReLU()
    (dropout): Dropout(p=0.1)
    (W_2): Linear(in_features=2048, out_features=512, bias=True)
    )
    (dropout2): Dropout(p=0.1)
    )
    )
    (ln): LayerNorm()
    )
    (decoder): Decoder(
    (layers): ModuleList(
    (0): DecoderLayer(
    (ln_1): LayerNorm()
    (self_attention): MultiHeadAttention(
    (W_Q): Linear(in_features=512, out_features=512, bias=False)
    (W_K): Linear(in_features=512, out_features=512, bias=False)
    (W_V): Linear(in_features=512, out_features=512, bias=False)
    (finishing_linear_layer): Linear(in_features=512, out_features=512, bias=False)
    (dropout): Dropout(p=0.1)
    )
    (dropout1): Dropout(p=0.1)
    (ln_2): LayerNorm()
    (source_attention): MultiHeadAttention(
    (W_Q): Linear(in_features=512, out_features=512, bias=False)
    (W_K): Linear(in_features=512, out_features=512, bias=False)
    (W_V): Linear(in_features=512, out_features=512, bias=False)
    (finishing_linear_layer): Linear(in_features=512, out_features=512, bias=False)
    (dropout): Dropout(p=0.1)
    )
    (dropout2): Dropout(p=0.1)
    (ln_3): LayerNorm()
    (feed_forward): FeedForwardLayer(
    (W_1): Linear(in_features=512, out_features=2048, bias=True)
    (act): ReLU()
    (dropout): Dropout(p=0.1)
    (W_2): Linear(in_features=2048, out_features=512, bias=True)
    )
    (dropout3): Dropout(p=0.1)
    )
    (1): DecoderLayer(
    (ln_1): LayerNorm()
    (self_attention): MultiHeadAttention(
    (W_Q): Linear(in_features=512, out_features=512, bias=False)
    (W_K): Linear(in_features=512, out_features=512, bias=False)
    (W_V): Linear(in_features=512, out_features=512, bias=False)
    (finishing_linear_layer): Linear(in_features=512, out_features=512, bias=False)
    (dropout): Dropout(p=0.1)
    )
    (dropout1): Dropout(p=0.1)
    (ln_2): LayerNorm()
    (source_attention): MultiHeadAttention(
    (W_Q): Linear(in_features=512, out_features=512, bias=False)
    (W_K): Linear(in_features=512, out_features=512, bias=False)
    (W_V): Linear(in_features=512, out_features=512, bias=False)
    (finishing_linear_layer): Linear(in_features=512, out_features=512, bias=False)
    (dropout): Dropout(p=0.1)
    )
    (dropout2): Dropout(p=0.1)
    (ln_3): LayerNorm()
    (feed_forward): FeedForwardLayer(
    (W_1): Linear(in_features=512, out_features=2048, bias=True)
    (act): ReLU()
    (dropout): Dropout(p=0.1)
    (W_2): Linear(in_features=2048, out_features=512, bias=True)
    )
    (dropout3): Dropout(p=0.1)
    )
    (2): DecoderLayer(
    (ln_1): LayerNorm()
    (self_attention): MultiHeadAttention(
    (W_Q): Linear(in_features=512, out_features=512, bias=False)
    (W_K): Linear(in_features=512, out_features=512, bias=False)
    (W_V): Linear(in_features=512, out_features=512, bias=False)
    (finishing_linear_layer): Linear(in_features=512, out_features=512, bias=False)
    (dropout): Dropout(p=0.1)
    )
    (dropout1): Dropout(p=0.1)
    (ln_2): LayerNorm()
    (source_attention): MultiHeadAttention(
    (W_Q): Linear(in_features=512, out_features=512, bias=False)
    (W_K): Linear(in_features=512, out_features=512, bias=False)
    (W_V): Linear(in_features=512, out_features=512, bias=False)
    (finishing_linear_layer): Linear(in_features=512, out_features=512, bias=False)
    (dropout): Dropout(p=0.1)
    )
    (dropout2): Dropout(p=0.1)
    (ln_3): LayerNorm()
    (feed_forward): FeedForwardLayer(
    (W_1): Linear(in_features=512, out_features=2048, bias=True)
    (act): ReLU()
    (dropout): Dropout(p=0.1)
    (W_2): Linear(in_features=2048, out_features=512, bias=True)
    )
    (dropout3): Dropout(p=0.1)
    )
    (3): DecoderLayer(
    (ln_1): LayerNorm()
    (self_attention): MultiHeadAttention(
    (W_Q): Linear(in_features=512, out_features=512, bias=False)
    (W_K): Linear(in_features=512, out_features=512, bias=False)
    (W_V): Linear(in_features=512, out_features=512, bias=False)
    (finishing_linear_layer): Linear(in_features=512, out_features=512, bias=False)
    (dropout): Dropout(p=0.1)
    )
    (dropout1): Dropout(p=0.1)
    (ln_2): LayerNorm()
    (source_attention): MultiHeadAttention(
    (W_Q): Linear(in_features=512, out_features=512, bias=False)
    (W_K): Linear(in_features=512, out_features=512, bias=False)
    (W_V): Linear(in_features=512, out_features=512, bias=False)
    (finishing_linear_layer): Linear(in_features=512, out_features=512, bias=False)
    (dropout): Dropout(p=0.1)
    )
    (dropout2): Dropout(p=0.1)
    (ln_3): LayerNorm()
    (feed_forward): FeedForwardLayer(
    (W_1): Linear(in_features=512, out_features=2048, bias=True)
    (act): ReLU()
    (dropout): Dropout(p=0.1)
    (W_2): Linear(in_features=2048, out_features=512, bias=True)
    )
    (dropout3): Dropout(p=0.1)
    )
    (4): DecoderLayer(
    (ln_1): LayerNorm()
    (self_attention): MultiHeadAttention(
    (W_Q): Linear(in_features=512, out_features=512, bias=False)
    (W_K): Linear(in_features=512, out_features=512, bias=False)
    (W_V): Linear(in_features=512, out_features=512, bias=False)
    (finishing_linear_layer): Linear(in_features=512, out_features=512, bias=False)
    (dropout): Dropout(p=0.1)
    )
    (dropout1): Dropout(p=0.1)
    (ln_2): LayerNorm()
    (source_attention): MultiHeadAttention(
    (W_Q): Linear(in_features=512, out_features=512, bias=False)
    (W_K): Linear(in_features=512, out_features=512, bias=False)
    (W_V): Linear(in_features=512, out_features=512, bias=False)
    (finishing_linear_layer): Linear(in_features=512, out_features=512, bias=False)
    (dropout): Dropout(p=0.1)
    )
    (dropout2): Dropout(p=0.1)
    (ln_3): LayerNorm()
    (feed_forward): FeedForwardLayer(
    (W_1): Linear(in_features=512, out_features=2048, bias=True)
    (act): ReLU()
    (dropout): Dropout(p=0.1)
    (W_2): Linear(in_features=2048, out_features=512, bias=True)
    )
    (dropout3): Dropout(p=0.1)
    )
    (5): DecoderLayer(
    (ln_1): LayerNorm()
    (self_attention): MultiHeadAttention(
    (W_Q): Linear(in_features=512, out_features=512, bias=False)
    (W_K): Linear(in_features=512, out_features=512, bias=False)
    (W_V): Linear(in_features=512, out_features=512, bias=False)
    (finishing_linear_layer): Linear(in_features=512, out_features=512, bias=False)
    (dropout): Dropout(p=0.1)
    )
    (dropout1): Dropout(p=0.1)
    (ln_2): LayerNorm()
    (source_attention): MultiHeadAttention(
    (W_Q): Linear(in_features=512, out_features=512, bias=False)
    (W_K): Linear(in_features=512, out_features=512, bias=False)
    (W_V): Linear(in_features=512, out_features=512, bias=False)
    (finishing_linear_layer): Linear(in_features=512, out_features=512, bias=False)
    (dropout): Dropout(p=0.1)
    )
    (dropout2): Dropout(p=0.1)
    (ln_3): LayerNorm()
    (feed_forward): FeedForwardLayer(
    (W_1): Linear(in_features=512, out_features=2048, bias=True)
    (act): ReLU()
    (dropout): Dropout(p=0.1)
    (W_2): Linear(in_features=2048, out_features=512, bias=True)
    )
    (dropout3): Dropout(p=0.1)
    )
    )
    (ln): LayerNorm()
    )
    (affine): Linear(in_features=512, out_features=39041, bias=True)
    (criterion): KLDivLoss()
    )
    Approximate number of iter/epoch = 3589
    Traceback (most recent call last):
    File "/home/work/notebooks/multilingual_nmt/train.py", line 457, in
    main()
    File "/home/work/notebooks/multilingual_nmt/train.py", line 315, in main
    loss, stat = model(*in_arrays)
    File "/usr/local/python3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in call
    result = self.forward(*input, **kwargs)
    File "/home/work/notebooks/multilingual_nmt/models/transformer.py", line 601, in forward
    y_out_block)
    File "/home/work/notebooks/multilingual_nmt/models/transformer.py", line 551, in output_and_loss
    stats = utils.Statistics(loss=loss.data.cpu() * n_total,
    RuntimeError: Expected object of type torch.FloatTensor but found type torch.cuda.FloatTensor for argument #2 'other'
    BPE decoding/detokenising target to match with references
    Step 4a: Evaluate Test
    Use of uninitialized value $length_reference in numeric eq (==) at /home/work/notebooks/multilingual_nmt/tools/multi-bleu.perl line 148.
    Use of uninitialized value $length_reference in numeric eq (==) at /home/work/notebooks/multilingual_nmt/tools/multi-bleu.perl line 148.
    Step 4b: Evaluate Dev
    Use of uninitialized value $length_reference in numeric eq (==) at /home/work/notebooks/multilingual_nmt/tools/multi-bleu.perl line 148.
    Use of uninitialized value $length_reference in numeric eq (==) at /home/work/notebooks/multilingual_nmt/tools/multi-bleu.perl line 148.
    Traceback (most recent call last):
    File "/home/work/notebooks/multilingual_nmt/bin/t2t-bleu", line 208, in
    case_sensitive=False)
    File "/home/work/notebooks/multilingual_nmt/bin/t2t-bleu", line 189, in bleu_wrapper
    assert len(ref_lines) == len(hyp_lines)
    AssertionError
    [root@ccb234d5f670 multilingual_nmt]#
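
Judging from the traceback alone (a guess rather than a confirmed fix), the device mismatch arises because loss.data.cpu() is a CPU tensor while the other operand of the multiplication still lives on the GPU; keeping both operands on the same device avoids the error. A minimal, self-contained illustration (the variable names mirror the traceback but are otherwise hypothetical):

    import torch

    device = "cuda" if torch.cuda.is_available() else "cpu"
    loss = torch.tensor(2.5, device=device)
    n_total = torch.tensor(300, device=device)

    # Mixing devices, e.g. loss.data.cpu() * n_total with n_total still on the GPU,
    # raises "Expected object of type torch.FloatTensor but found torch.cuda.FloatTensor"
    # on older PyTorch versions; moving both operands to the CPU first avoids it.
    stat_loss = loss.detach().cpu() * n_total.cpu()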

Could you please share the pretrained models?

I am interested in your experiments, since you have trained the Transformer model on the TED dataset. As I have very limited training resources, could you please share your pretrained model for the English-to-German task?

Thanks!

unicodeDecodeError

unicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 904: ordinal not in range(128) [root@ccb234d5f670 multilingual_nmt]

CUDA version is insufficient

I encountered a problem: the CUDA driver version is insufficient. The error log is as follows:
/usr/local/python3/lib/python3.6/site-packages/torch/nn/functional.py:52: UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.
warnings.warn(warning.format(ret))

  • number of parameters: 42336897
    encoder: 9452032
    decoder: 12600832
    THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=74 error=35 : CUDA driver version is insufficient for CUDA runtimeversion
    Traceback (most recent call last):
    File "/home/work/notebooks/multilingual_nmt-master/train.py", line 457, in
    main()
    File "/home/work/notebooks/multilingual_nmt-master/train.py", line 209, in main
    model.cuda(args.gpu)
    File "/usr/local/python3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 258, in cuda
    return self._apply(lambda t: t.cuda(device))
    File "/usr/local/python3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 185, in _apply
    module._apply(fn)
    File "/usr/local/python3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 191, in _apply
    param.data = fn(param.data)
    File "/usr/local/python3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 258, in
    return self._apply(lambda t: t.cuda(device))
    RuntimeError: cuda runtime error (35) : CUDA driver version is insufficient for CUDA runtime version at /pytorch/aten/src/THC/THCGeneral.cpp:74
    BPE decoding/detokenising target to match with references
    Step 4a: Evaluate Test
    Use of uninitialized value $length_reference in numeric eq (==) at /home/work/notebooks/multilingual_nmt-master/tools/multi-bleu.perl line 148.
    Use of uninitialized value $length_reference in numeric eq (==) at /home/work/notebooks/multilingual_nmt-master/tools/multi-bleu.perl line 148.
    Step 4b: Evaluate Dev
    Use of uninitialized value $length_reference in numeric eq (==) at /home/work/notebooks/multilingual_nmt-master/tools/multi-bleu.perl line 148.
    Use of uninitialized value $length_reference in numeric eq (==) at /home/work/notebooks/multilingual_nmt-master/tools/multi-bleu.perl line 148.

Environment: CentOS 7
Commands run:
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
bash download_teddata.sh
python ted_reader.py -s ja en zh fr ro -t en zh fr ro ja
python ted_reader.py -s ja en zh fr ro -t en zh fr ro ja -ncp
bash tools/bpe_pipeline_bilingual.sh src_lang tgt_lang
