
machine-translation's Introduction

Machine Translation with PyTorch

This repository contains an implementation of a sequence-to-sequence (Seq2seq) neural network model for machine translation. More details on sequence-to-sequence machine translation and hyperparameter tuning can be found in Massive Exploration of Neural Machine Translation Architectures (Britz et al., 2017).
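For orientation, the sketch below shows the shape of such a model: an embedding layer feeding a recurrent encoder, plus dot-product attention over the encoder outputs. It is a simplified, hypothetical illustration (a unidirectional GRU and invented class names, not the repository's actual code); the command-line flags documented below control the corresponding knobs of the real model (hidden size, embedding size, number of layers, dropout, attention type, directionality).

import torch
import torch.nn as nn

class Encoder(nn.Module):
    # GRU encoder over embedded source tokens (unidirectional for brevity).
    def __init__(self, vocab_size, emb_dim=300, hidden_size=300,
                 num_layers=2, dropout=0.3):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.dropout = nn.Dropout(dropout)
        self.rnn = nn.GRU(emb_dim, hidden_size, num_layers, dropout=dropout)

    def forward(self, src):                   # src: (src_len, batch) of token ids
        x = self.dropout(self.embedding(src))
        outputs, hidden = self.rnn(x)         # outputs: (src_len, batch, hidden)
        return outputs, hidden

class DotProductAttention(nn.Module):
    # Scores a decoder state against every encoder output by dot product.
    def forward(self, dec_state, enc_outputs):
        # dec_state: (batch, hidden); enc_outputs: (src_len, batch, hidden)
        scores = torch.einsum('bh,sbh->sb', dec_state, enc_outputs)
        weights = torch.softmax(scores, dim=0)        # normalize over source positions
        context = (weights.unsqueeze(-1) * enc_outputs).sum(dim=0)
        return context, weights                       # (batch, hidden), (src_len, batch)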

To train the model, clone the repository and run main.py:

usage: main.py [-h] [--lr N] [--hs N] [--emb N] [--nlayers N] [--dp N]
               [--unidir] [--attn STR] [--reverse_input] [-v N] [-b N]
               [--epochs N] [--model DIR] [-e] [--visualize] [--predict DIR]
               [--predict_outfile DIR] [--predict_from_input STR]

Machine Translation with Attention

optional arguments:
  -h, --help            show this help message and exit
  --lr N                learning rate, default: 2e-3
  --hs N                size of hidden state, default: 300
  --emb N               embedding size, default: 300
  --nlayers N           number of layers in rnn, default: 2
  --dp N                dropout probability, default: 0.30
  --unidir              use unidirectional encoder, default: bidirectional
  --attn STR            attention: dot-product, additive or none, default:
                        dot-product
  --reverse_input       reverse input to encoder, default: False
  -v N                  vocab size, use 0 for maximum size, default: 0
  -b N                  batch size, default: 64
  --epochs N            number of epochs, default: 50
  --model DIR           path to model, default: None
  -e, --evaluate        only evaluate model, default: False
  --visualize           visualize model attention distribution
  --predict DIR         directory with final input data for predictions,
                        default: None

For example, to train with the default parameters, run:

python main.py
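
The flags above can be combined to change the architecture and training setup. For example, the following command (illustrative values, not recommended settings) trains with a larger hidden state, additive attention, and reversed encoder inputs:

python main.py --lr 1e-3 --hs 512 --emb 512 --attn additive --reverse_input -b 32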

Note: The following commands swap the full or small version of the training dataset into place, which is useful when experimenting with the model at different scales (a sanity check follows the list).

  • Full dataset: cp .data/iwslt/de-en/train.de-en-full.en .data/iwslt/de-en/train.de-en.en && cp .data/iwslt/de-en/train.de-en-full.de .data/iwslt/de-en/train.de-en.de
  • Small dataset: cp .data/iwslt/de-en/train.de-en-small.en .data/iwslt/de-en/train.de-en.en && cp .data/iwslt/de-en/train.de-en-small.de .data/iwslt/de-en/train.de-en.de
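
Either command overwrites the active train.de-en.en/train.de-en.de pair in place. As a sanity check, you can compare line counts with the standard wc utility (the exact counts depend on your IWSLT download):

wc -l .data/iwslt/de-en/train.de-en.en .data/iwslt/de-en/train.de-en-small.en .data/iwslt/de-en/train.de-en-full.en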

machine-translation's People

Contributors

  • lukemelas

machine-translation's Issues

Embedding error: index out of range in self

This is a test of our preprocessing function. It took 46.7 seconds to load the data. Our German vocab has size 13354 and our English vocab has size 11560. Our training data has 7443 batches, each with 16 sentences, and our validation data has 570 batches.
Validating Epoch [0/50] Average loss: 9.379 Perplexity: 11833.884
Epoch [1/50] Batch [10/7443] Loss: 8.345
Epoch [1/50] Batch [1010/7443] Loss: 4.692
Epoch [1/50] Batch [2010/7443] Loss: 4.192
Traceback (most recent call last):
  File "/home/hasara/python code/benchmark- IWSLT data/test.py", line 334, in <module>
    train(train_iter, val_iter, model, criterion, optimizer, 50)
  File "/home/hasara/python code/benchmark- IWSLT data/test.py", line 255, in train
    scores = model(src, tgt)
  File "/home/hasara/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/hasara/python code/benchmark- IWSLT data/test.py", line 159, in forward
    out_e, final_e = self.encoder(src)
  File "/home/hasara/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/hasara/python code/benchmark- IWSLT data/test.py", line 75, in forward
    x = self.dropout(self.embedding(x))  # embedding
  File "/home/hasara/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/hasara/.local/lib/python3.6/site-packages/torch/nn/modules/sparse.py", line 114, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "/home/hasara/.local/lib/python3.6/site-packages/torch/nn/functional.py", line 1724, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self
How can I resolve this error?
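
This IndexError is raised by nn.Embedding when some token id in the input batch is greater than or equal to num_embeddings, i.e., the vocabulary size the embedding was built with does not match the vocabulary used to numericalize the batches. Below is a minimal sketch of the failure mode and a diagnostic check; the variable names (emb, src, check_batch) are assumptions based on the traceback, not the poster's actual code.

import torch
import torch.nn as nn

# Reproduce the failure mode: an index >= num_embeddings.
emb = nn.Embedding(num_embeddings=100, embedding_dim=8, padding_idx=1)
ok = torch.tensor([[0, 5, 99]])    # all ids < 100: fine
bad = torch.tensor([[0, 5, 100]])  # id 100 is out of range

print(emb(ok).shape)               # torch.Size([1, 3, 8])
try:
    emb(bad)
except IndexError as e:
    print(e)                       # index out of range in self

# A quick diagnostic to run just before the forward pass in train():
def check_batch(src, embedding):
    # src: (src_len, batch) of token ids
    assert src.max().item() < embedding.num_embeddings, (
        f"max token id {src.max().item()} >= vocab size {embedding.num_embeddings}")

check_batch(ok, emb)               # passes
# check_batch(bad, emb)            # would raise AssertionError

If the check fires, rebuild the embedding with the size of the same vocabulary object used to numericalize the batches (e.g., len(vocab) for the source-language field feeding the encoder), and note that changing the -v vocab-size flag changes that number.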
