Bahdanau Attention Mechanism Implementation for Language Translation.

Introduction:

The raw data consists of 362861 Italian-to-English sentence pairs.
Appropriate preprocessing was applied. Each English sentence used as decoder input was given a 'start' token at the beginning, and each decoder output (target) sentence was given an 'end' token at the end.
307077 sentence pairs were used for training, 54191 for validation and 1088 as the test dataset.
Both Italian and English sentences were tokenized, and a maximum sequence length of 20 tokens was used. The resulting vocabularies contain 13335 English tokens and 27402 Italian tokens.
A dataset loader returns the encoder input sequence together with the decoder input and decoder output sequences.
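A minimal sketch of what such a loader might produce for one sentence pair, assuming whitespace tokenization and token id 0 reserved for padding. The helper names (build_vocab, encode, make_example) are hypothetical and not taken from the repository; the real preprocessing may differ.

```python
# Hypothetical sketch of the loader's per-pair output; not the repository's code.
MAX_LEN = 20  # maximum sequence length used in the text
PAD_ID = 0    # assumption: id 0 is reserved for padding


def build_vocab(sentences):
    """Map each distinct whitespace token to an id >= 1 (0 stays free for padding)."""
    vocab = {}
    for sentence in sentences:
        for token in sentence.split():
            if token not in vocab:
                vocab[token] = len(vocab) + 1
    return vocab


def encode(sentence, vocab, max_len=MAX_LEN):
    """Convert a sentence to token ids, truncate to max_len and right-pad with PAD_ID."""
    ids = [vocab[token] for token in sentence.split()][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


def make_example(ita, eng, ita_vocab, eng_vocab):
    """Return (encoder_input, decoder_input, decoder_output) for one sentence pair."""
    encoder_input = encode(ita, ita_vocab)
    decoder_input = encode("start " + eng, eng_vocab)  # 'start' token first
    decoder_output = encode(eng + " end", eng_vocab)   # 'end' token last
    return encoder_input, decoder_input, decoder_output


pairs = [("tom non è un fisico", "tom is not a physician")]
ita_vocab = build_vocab(ita for ita, _ in pairs)
eng_vocab = build_vocab("start " + eng + " end" for _, eng in pairs)
enc_in, dec_in, dec_out = make_example(*pairs[0], ita_vocab, eng_vocab)
```

The decoder output is the decoder input shifted left by one position, which is what teacher forcing needs: at step t the decoder sees token t and is trained to predict token t+1.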

Model:

ref: https://guillaumegenthial.github.io/sequence-to-sequence.html

Encoder Layer:

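An embedding layer maps each Italian token to a dense vector of the configured embedding size.
Embedding output dimensions: [batch, max_len, embed_size]
The embeddings are passed through an LSTM that returns its output at every time step; these per-step outputs (the encoder hidden states) are used to compute the cross attention scores in the following layers.
LSTM output dimensions: [batch, max_len, lstm_units]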
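The encoder's shape flow can be illustrated with a small NumPy sketch: an embedding lookup followed by a single-layer LSTM that keeps the hidden state of every time step. The weights are random and untrained, the gate layout and the small vocabulary are assumptions, and NumPy stands in for the Keras Embedding and LSTM(return_sequences=True) layers a TensorFlow implementation would typically use.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, max_len, vocab, embed_size, units = 2, 20, 50, 100, 128


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_forward(x, W, U, b):
    """Run a single-layer LSTM over x [batch, T, d]; return all hidden states plus the final (h, c)."""
    n, T, _ = x.shape
    n_units = U.shape[0]
    h = np.zeros((n, n_units))
    c = np.zeros((n, n_units))
    outputs = []
    for t in range(T):
        z = x[:, t] @ W + h @ U + b            # [batch, 4 * units]
        i, f, g, o = np.split(z, 4, axis=1)    # input, forget, candidate, output gates
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs, axis=1), h, c     # [batch, T, units]


embedding = rng.normal(size=(vocab, embed_size))
token_ids = rng.integers(1, vocab, size=(batch, max_len))
embedded = embedding[token_ids]                # [batch, max_len, embed_size]
W = rng.normal(scale=0.1, size=(embed_size, 4 * units))
U = rng.normal(scale=0.1, size=(units, 4 * units))
b = np.zeros(4 * units)
enc_outputs, enc_h, enc_c = lstm_forward(embedded, W, U, b)
```

Keeping every step's output, rather than only the final state, is what lets the attention layer look back at any source position.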

Attention Mechanism Layer:

The decoder input is transformed to match the encoder output dimension. Attention scores are the dot products between this transformed vector and each encoder hidden state; the scores are normalized (typically with a softmax) into attention weights, and the weighted sum of the encoder hidden states is returned as the context vector used by the decoder.
Context vector dimensions: [batch, encoder_lstm_units]
Attention weights dimensions: [batch, max_len, 1]
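A NumPy sketch of the scoring described above, assuming a single linear projection W (hypothetical) and a softmax over the max_len axis. Note that this follows the text's dot-product score; Bahdanau et al.'s original formulation uses an additive score, v^T tanh(W1 h + W2 s), so the two should not be confused.

```python
import numpy as np


def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def dot_attention(query, enc_outputs, W):
    """query: [batch, q_dim]; enc_outputs: [batch, max_len, units]; W: [q_dim, units]."""
    projected = query @ W                                       # [batch, units]
    scores = np.einsum("btu,bu->bt", enc_outputs, projected)    # [batch, max_len]
    weights = softmax(scores, axis=1)[..., None]                # [batch, max_len, 1]
    context = (weights * enc_outputs).sum(axis=1)               # [batch, units]
    return context, weights


rng = np.random.default_rng(0)
batch, max_len, q_dim, units = 2, 20, 100, 128
enc_outputs = rng.normal(size=(batch, max_len, units))
query = rng.normal(size=(batch, q_dim))          # e.g. an embedded decoder token
W = rng.normal(scale=0.1, size=(q_dim, units))
context, weights = dot_attention(query, enc_outputs, W)
```

Because the weights sum to 1 over the max_len axis, the context vector is a convex combination of the encoder hidden states. This sketch does not mask padded encoder positions.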

Decoder Encoder Cross Attention Layer:

This layer performs a single decoding step. Using the attention mechanism layer above, it computes cross attention between the embedded decoder input (the embeddings can be initialized with GloVe vectors) and the encoder outputs. The resulting context vector is concatenated with the embedded decoder input and passed to the decoder LSTM, whose output goes to a dense layer with as many units as the target vocabulary size. The decoder input is fed one token position at a time for the whole batch (in matrix form), i.e. cross attention is computed for one embedded token at a time across the batch.
Final output dimensions: [batch, tar_vocab_size]
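One decoding step can be sketched in NumPy as embed, attend, concatenate, LSTM, dense. The parameter names in the dict p, the zero initial LSTM state and the 'start' token id are all hypothetical; the repository may, for example, initialize the decoder from the encoder's final state.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def lstm_cell(x, h, c, W, U, b):
    """One LSTM time step; gates are packed as [input, forget, candidate, output]."""
    z = x @ W + h @ U + b
    i, f, g, o = np.split(z, 4, axis=1)
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c


def decoder_step(token_ids, h, c, enc_outputs, p):
    """Embed one token per sentence, attend over encoder outputs, run the LSTM, project to the vocabulary."""
    emb = p["embedding"][token_ids]                                   # [batch, embed_dim]
    scores = np.einsum("btu,bu->bt", enc_outputs, emb @ p["W_att"])   # [batch, max_len]
    weights = softmax(scores, axis=1)[..., None]                      # [batch, max_len, 1]
    context = (weights * enc_outputs).sum(axis=1)                     # [batch, enc_units]
    x = np.concatenate([emb, context], axis=1)                        # [batch, embed_dim + enc_units]
    h, c = lstm_cell(x, h, c, p["W"], p["U"], p["b"])                 # [batch, dec_units]
    logits = h @ p["W_out"] + p["b_out"]                              # [batch, tar_vocab_size]
    return logits, h, c, weights


rng = np.random.default_rng(0)
batch, max_len, vocab, embed_dim, enc_units, dec_units = 2, 20, 50, 100, 128, 128
p = {
    "embedding": rng.normal(size=(vocab, embed_dim)),
    "W_att": rng.normal(scale=0.1, size=(embed_dim, enc_units)),
    "W": rng.normal(scale=0.1, size=(embed_dim + enc_units, 4 * dec_units)),
    "U": rng.normal(scale=0.1, size=(dec_units, 4 * dec_units)),
    "b": np.zeros(4 * dec_units),
    "W_out": rng.normal(scale=0.1, size=(dec_units, vocab)),
    "b_out": np.zeros(vocab),
}
enc_outputs = rng.normal(size=(batch, max_len, enc_units))
h, c = np.zeros((batch, dec_units)), np.zeros((batch, dec_units))
start_ids = np.array([1, 1])  # hypothetical id of the 'start' token for both sentences
logits, h, c, weights = decoder_step(start_ids, h, c, enc_outputs, p)
```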

Decoder Layer:

It calls the decoder-encoder cross attention layer once for each decoder time step and stacks the results, giving logits for the full decoder input length.
Final logits output shape: [batch, max_len, tar_vocab_size]
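The per-step loop with teacher forcing can be sketched as follows. The step function is passed in; dummy_step is a hypothetical stand-in for the cross attention layer that simply returns one-hot "logits" for the input token, so that the stacking is easy to check.

```python
import numpy as np


def run_decoder(decoder_input_ids, init_state, step_fn):
    """Call step_fn once per time step (teacher forcing) and stack the per-step logits."""
    state = init_state
    step_logits = []
    for t in range(decoder_input_ids.shape[1]):
        logits, state = step_fn(decoder_input_ids[:, t], state)  # [batch, tar_vocab_size]
        step_logits.append(logits)
    return np.stack(step_logits, axis=1)  # [batch, max_len, tar_vocab_size]


tar_vocab_size = 7


def dummy_step(token_ids, state):
    """Stand-in for the cross attention step: one-hot rows for the input tokens."""
    return np.eye(tar_vocab_size)[token_ids], state + 1


decoder_input_ids = np.array([[1, 2, 3, 0], [1, 4, 0, 0]])  # [batch=2, max_len=4]
all_logits = run_decoder(decoder_input_ids, 0, dummy_step)
```

During training the ground-truth decoder input is fed at every step regardless of what the model predicted at the previous step.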

Final Translation Model:

The encoder input sequence from the dataset generator is passed to the encoder block and the decoder input sequence to the decoder block; the model returns the final logits for the whole batch.

Custom Loss Function and Metric:

The custom loss function and accuracy metric ignore padded positions (token id 0), so padding does not contribute to the loss or the accuracy.
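A NumPy sketch of masked sparse categorical cross-entropy and masked accuracy, assuming padding id 0 and averaging over the non-padded positions only. This mirrors the idea described above, not necessarily the repository's exact reduction.

```python
import numpy as np


def log_softmax(logits):
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def masked_loss(y_true, logits, pad_id=0):
    """Mean cross-entropy over positions where y_true != pad_id."""
    mask = y_true != pad_id
    logp = log_softmax(logits)                                            # [batch, T, vocab]
    nll = -np.take_along_axis(logp, y_true[..., None], axis=-1)[..., 0]   # [batch, T]
    return (nll * mask).sum() / mask.sum()


def masked_accuracy(y_true, logits, pad_id=0):
    """Fraction of non-padded positions whose argmax prediction is correct."""
    mask = y_true != pad_id
    correct = (logits.argmax(axis=-1) == y_true) & mask
    return correct.sum() / mask.sum()


y_true = np.array([[2, 3, 0, 0]])   # two real tokens, two padding positions
logits = np.zeros((1, 4, 5))
logits[0, 0, 2] = 10.0              # correct prediction
logits[0, 1, 4] = 10.0              # wrong prediction (true id is 3)
logits[0, 2:, 1] = 10.0             # predictions at padded positions are ignored
loss = masked_loss(y_true, logits)
acc = masked_accuracy(y_true, logits)
```

Without the mask, the many padding positions in short sentences would dominate both numbers.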

Training:

The following hyperparameters were chosen for training the model:
- encoder_inputs_length = 20
- decoder_inputs_length = 20
- vocab_size_ita = size of the Italian vocabulary
- vocab_size_eng = size of the English vocabulary
- embedding_dim_enc = 100
- embedding_dim_dec = 100
- enc_units = 128
- dec_units = 128
- lstm_dropout = 0.2
- recurrent_dropout = 0.2
- optimizer = tf.keras.optimizers.Adam()
After 70 epochs the model reached a validation accuracy of 0.86 (it was not trained further due to resource constraints).
Some translations from the test dataset:
- Italian: vedo cosavete fatto lì
  
  English True: i see what you did there
  
  Model Translation: i i see what you have done there
- Italian: tom non è un fisico
  
  English True: tom is not a physician
  
  Model Translation: tom tom is not a physician
- Italian: cè un costo di consegna
  
  English True: is there a delivery charge
  
  Model Translation: there there is a charge of the delivery
- Italian: è un tizio strano
  
  English True: he is a strange guy
  
  Model Translation: it it is a strange guy
- Italian: tutti qua sanno che non mangiamo la carne di maiale
  
  English True: everyone here knows that we do not eat pork
  
  Model Translation: everyone everyone here knows we do not eat pork
Average BLEU score on the test data: 0.4451662890214658
Average cumulative 4-gram BLEU score on the test data: 1.479362713798278e-231 (effectively zero)
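Cumulative 4-gram BLEU is the geometric mean of the modified 1- to 4-gram precisions times a brevity penalty, so a single zero precision drives the score to zero. NLTK's sentence_bleu without smoothing substitutes sys.float_info.min for a zero precision (and warns), which is one way tiny values on the order of 1e-231 can appear instead of exactly 0. The pure-Python sketch below assumes a single reference, uniform weights and no smoothing, and returns exactly 0.0 in that case; it is not the scoring code used for the numbers above.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(reference, hypothesis, n):
    """Clipped n-gram matches divided by the number of hypothesis n-grams."""
    hyp_counts = Counter(ngrams(hypothesis, n))
    ref_counts = Counter(ngrams(reference, n))
    overlap = sum(min(count, ref_counts[g]) for g, count in hyp_counts.items())
    return overlap / max(sum(hyp_counts.values()), 1)


def sentence_bleu(reference, hypothesis, max_n=4):
    """Cumulative BLEU with uniform weights 1/max_n, one reference, no smoothing."""
    precisions = [modified_precision(reference, hypothesis, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    log_avg = sum(math.log(p) for p in precisions) / max_n
    c, r = len(hypothesis), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_avg)


ref = "tom is not a physician".split()
hyp = "tom tom is not a physician".split()
score = sentence_bleu(ref, hyp)
```

For this test-set example the precisions are 5/6, 4/5, 3/4 and 2/3, and the hypothesis is longer than the reference, so the score is (1/3) ** 0.25, roughly 0.76; the repeated first word costs one clipped match at every n-gram order.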

m92vyas / implementing_attention_mechanism_language_translation