mshadloo / neural-machine-translation-with-attention

I implement encoder-decoder based seq2seq models with attention using Keras. The encoder can be a Bidirectional LSTM, a simple LSTM, or a GRU, and the decoder can be an LSTM or a GRU. I evaluate the models on an English-French dataset.

Topics: neural-machine-translation, bidirectional-lstm, lstm, gru, encoder-decoder-architecture, attention-mechanism

Introduction

I implement encoder-decoder seq2seq models with attention. The encoder can be a Bidirectional LSTM, a simple LSTM, or a GRU, and the decoder can be an LSTM or a GRU. An encoder-type argument selects the RNN used in the encoder; it can be 'bidirectional', 'lstm', or 'gru'. When this argument is set to 'bidirectional', the model has a Bidirectional LSTM as the encoder and a simple LSTM as the decoder. When it is set to 'lstm', the encoder and decoder are both simple LSTMs, and for 'gru', they are both GRUs. Thus, there are three different models, selected as sketched below.
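
A minimal sketch of how such an encoder-type argument could pick the layers (function and parameter names here are illustrative assumptions, not the repository's exact code):

```python
# Sketch of mapping the encoder-type argument to Keras RNN layers.
# Names and hyperparameters (e.g. units=128) are assumptions; the
# repository's actual implementation may differ.
from tensorflow.keras.layers import LSTM, GRU, Bidirectional

def build_rnns(encoder_type, units=128):
    """Return (encoder_rnn, decoder_rnn) for the requested variant."""
    if encoder_type == 'bidirectional':
        encoder = Bidirectional(LSTM(units, return_sequences=True))
        decoder = LSTM(units, return_state=True)
    elif encoder_type == 'lstm':
        encoder = LSTM(units, return_sequences=True)
        decoder = LSTM(units, return_state=True)
    elif encoder_type == 'gru':
        encoder = GRU(units, return_sequences=True)
        decoder = GRU(units, return_state=True)
    else:
        raise ValueError(f"unknown encoder type: {encoder_type}")
    return encoder, decoder
```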

To translate a sentence from one language to another, a human translator reads the sentence part by part and generates the translation part by part. A neural machine translation model with attention behaves like a human translator: it looks at the sentence part by part. To generate each part of the translation, the attention mechanism tells the model which parts of the input it should pay attention to. A simple encoder-decoder model without attention tends to forget the earlier parts of a sequence as it processes further; with the attention mechanism, the model can handle long sequences.

Dataset

To evaluate the models, I use the English-French dataset provided by http://www.manythings.org/anki/
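
For reference, a minimal loader for this kind of file (assuming the usual Anki export format of tab-separated English-French pairs, one pair per line, possibly followed by an attribution column; the file name fra.txt is an assumption):

```python
# Sketch of loading manythings.org/anki-style data.
# Assumes lines of the form "english<TAB>french[<TAB>attribution]".
def load_pairs(path='fra.txt'):
    pairs = []
    with open(path, encoding='utf-8') as f:
        for line in f:
            parts = line.rstrip('\n').split('\t')
            if len(parts) >= 2:
                pairs.append((parts[0], parts[1]))  # (English, French)
    return pairs
```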

Experiment

I computed accuracy and loss on both the training and validation sets for all three models and compared the results. The experiments show that the model with a Bidirectional LSTM as the encoder outperforms the others.

Figure: NMT with a Bidirectional LSTM encoder has the lowest loss over 20 epochs.
Figure: NMT with a Bidirectional LSTM encoder has the highest accuracy over 20 epochs.

How to run:

git clone https://github.com/mshadloo/Neural-Machine-Translation-with-Attention.git
cd Neural-Machine-Translation-with-Attention
chmod +x data.sh && ./data.sh
chmod +x run.sh && ./run.sh

Steps

Data Preprocessing

First, as in any other NLP task, we load the text data, preprocess it, and split it into training and test sets.

The data needs some cleaning and preparation before being used to train our neural translation model; a code sketch of these steps follows the list.

  1. Normalize case to lowercase.
  2. Remove punctuation from each word.
  3. Remove non-printable characters.
  4. Convert accented French characters to plain Latin (ASCII) characters.
  5. Remove words that contain non-alphabetic characters.
  6. Add a special token <eos> at the end of each target sentence.
  7. Create two dictionaries, mapping each word in the vocabulary to an id and each id back to its word.
  8. Mark all out-of-vocabulary (OOV) words with a special token <unk>.
  9. Pad each sentence to a maximum length by adding the special token <pad> at the end.
  10. Convert each sentence to its feature vector (its sequence of word ids).
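
The following is a minimal sketch of these steps. Helper names, the vocabulary cutoff, and max_len are illustrative assumptions; the repository's actual scripts may differ in details such as the exact special tokens.

```python
# Sketch of the preprocessing pipeline (steps 1-10 above).
import string
import unicodedata
from collections import Counter

def clean_sentence(s):
    s = s.lower()                                    # 1. lowercase
    s = unicodedata.normalize('NFD', s)              # 4. strip accents
    s = s.encode('ascii', 'ignore').decode('ascii')
    s = s.translate(str.maketrans('', '', string.punctuation))  # 2. punctuation
    words = [w for w in s.split() if w.isprintable()]  # 3. non-printables
    words = [w for w in words if w.isalpha()]          # 5. alphabetic only
    return words

def build_vocab(sentences, min_count=2):
    # 7. word -> id and id -> word dictionaries; ids 0-2 reserved
    counts = Counter(w for s in sentences for w in s)
    words = ['<pad>', '<unk>', '<eos>'] + \
            [w for w, c in counts.items() if c >= min_count]
    word2id = {w: i for i, w in enumerate(words)}
    id2word = {i: w for w, i in word2id.items()}
    return word2id, id2word

def encode(sentence, word2id, max_len, add_eos=False):
    # 6. <eos> on targets; 8. <unk> for OOV; 9. <pad>; 10. id vector
    ids = [word2id.get(w, word2id['<unk>']) for w in sentence]
    if add_eos:
        ids.append(word2id['<eos>'])
    return ids[:max_len] + [word2id['<pad>']] * (max_len - len(ids))
```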

Define The Model

I implement encoder-decoder seq2seq models with attention. The encoder and the decoder are the pre-attention and post-attention RNNs sitting on either side of the attention mechanism.

  • Encoder: an RNN (Bidirectional LSTM, LSTM, or GRU).
    • The encoder goes through Tx time steps (Tx: maximum length of the input sequence).
  • Decoder: an RNN (LSTM or GRU).
    • The decoder goes through Ty time steps (Ty: maximum length of the output sequence).
  • The attention mechanism computes the context variable context⟨t⟩ for each time step t in the output (t = 1, …, Ty).
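
As an illustration, here is a minimal Keras sketch of one additive-attention step that computes context⟨t⟩ from all encoder hidden states and the previous decoder state. Layer sizes and names are assumptions, not the repository's exact code.

```python
# Sketch of one attention step: given all encoder hidden states a
# (shape: batch, Tx, 2*units for a BiLSTM encoder) and the previous
# decoder state s_prev (shape: batch, units), compute context<t>.
from tensorflow.keras.layers import (Dense, RepeatVector, Concatenate,
                                     Dot, Softmax, Input)
from tensorflow.keras.models import Model

Tx, units = 20, 64  # illustrative sizes

def one_step_attention(a, s_prev):
    s_prev = RepeatVector(Tx)(s_prev)            # copy s_prev for each of the Tx steps
    concat = Concatenate(axis=-1)([a, s_prev])   # pair each encoder state with s_prev
    e = Dense(10, activation='tanh')(concat)     # small "energy" network
    energies = Dense(1, activation='relu')(e)    # one score per input time step
    alphas = Softmax(axis=1)(energies)           # attention weights over the Tx steps
    context = Dot(axes=1)([alphas, a])           # weighted sum of encoder states
    return context                               # shape: (batch, 1, 2*units)

# Wiring check with symbolic inputs:
a = Input(shape=(Tx, 2 * units))
s_prev = Input(shape=(units,))
model = Model([a, s_prev], one_step_attention(a, s_prev))
```

The decoder then consumes context⟨t⟩ at each of its Ty steps to produce the next output word.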

