dongjunlee / transformer-tensorflow Goto Github PK

View Code? Open in Web Editor NEW

346.0 14.0 111.0 842 KB

TensorFlow implementation of 'Attention Is All You Need (2017. 6)'

Python 66.06% Jupyter Notebook 32.86% Shell 1.08%

tensorflow translation nlp transformer attention experiments deep-learning hb-experiment

transformer-tensorflow's Introduction

transformer

TensorFlow implementation of Attention Is All You Need. (2017. 6)

Requirements

Python 3.6
TensorFlow 1.8
hb-config (Singleton Config)
nltk (tokenizer and blue score)
tqdm (progress bar)
Slack Incoming Webhook URL

Project Structure

init Project by hb-base

.
├── config                  # Config files (.yml, .json) using with hb-config
├── data                    # dataset path
├── notebooks               # Prototyping with numpy or tf.interactivesession
├── transformer             # transformer architecture graphs (from input to logits)
    ├── __init__.py             # Graph logic
    ├── attention.py            # Attention (multi-head, scaled_dot_product and etc..)
    ├── encoder.py              # Encoder logic
    ├── decoder.py              # Decoder logic
    └── layer.py                # Layers (FFN)
├── data_loader.py          # raw_date -> precossed_data -> generate_batch (using Dataset)
├── hook.py                 # training or test hook feature (eg. print_variables)
├── main.py                 # define experiment_fn
└── model.py                # define EstimatorSpec

Reference : hb-config, Dataset, experiments_fn, EstimatorSpec

Todo

Train and evaluate with 'WMT German-English (2016)' dataset

Config

Can control all Experimental environment.

example: check-tiny.yml

data:
  base_path: 'data/'
  raw_data_path: 'tiny_kor_eng'
  processed_path: 'tiny_processed_data'
  word_threshold: 1

  PAD_ID: 0
  UNK_ID: 1
  START_ID: 2
  EOS_ID: 3

model:
  batch_size: 4
  num_layers: 2
  model_dim: 32
  num_heads: 4
  linear_key_dim: 20
  linear_value_dim: 24
  ffn_dim: 30
  dropout: 0.2

train:
  learning_rate: 0.0001
  optimizer: 'Adam'  ('Adagrad', 'Adam', 'Ftrl', 'Momentum', 'RMSProp', 'SGD')
  
  train_steps: 15000
  model_dir: 'logs/check_tiny'
  
  save_checkpoints_steps: 1000
  check_hook_n_iter: 100
  min_eval_frequency: 100
  
  print_verbose: True
  debug: False
  
slack:
  webhook_url: ""  # after training notify you using slack-webhook

debug mode : using tfdbg
check-tiny is a data set with about 30 sentences that are translated from Korean into English. (recommend read it :) )

Usage

Install requirements.

pip install -r requirements.txt

Then, pre-process raw data.

python data_loader.py --config check-tiny

Finally, start train and evaluate model

python main.py --config check-tiny --mode train_and_evaluate

Or, you can use IWSLT'15 English-Vietnamese dataset.

sh prepare-iwslt15.en-vi.sh                                        # download dataset
python data_loader.py --config iwslt15-en-vi                       # preprocessing
python main.py --config iwslt15-en-vi --mode train_and_evalueate   # start training

Predict

After training, you can test the model.

command

python predict.py --config {config} --src {src_sentence}

example

$ python predict.py --config check-tiny --src "안녕하세요. 반갑습니다."

------------------------------------
Source: 안녕하세요. 반갑습니다.
 > Result: Hello . I'm glad to see you . <\s> vectors . <\s> Hello locations . <\s> will . <\s> . <\s> you . <\s>

Experiments modes

✅ : Working
◽ : Not tested yet.

✅ evaluate : Evaluate on the evaluation data.
◽ extend_train_hooks : Extends the hooks for training.
◽ reset_export_strategies : Resets the export strategies with the new_export_strategies.
◽ run_std_server : Starts a TensorFlow server and joins the serving thread.
◽ test : Tests training, evaluating and exporting the estimator for a single step.
✅ train : Fit the estimator using the training data.
✅ train_and_evaluate : Interleaves training and evaluation.

Tensorboar

tensorboard --logdir logs

check-tiny example

Reference

hb-research/notes - Attention Is All You Need
Paper - Attention Is All You Need (2017. 6) by A Vaswani (Google Brain Team)
tensor2tensor - A library for generalized sequence to sequence models (official code)

Author

Dongjun Lee ([email protected])

transformer-tensorflow's People

Contributors

Stargazers

Watchers

Forkers

bogdanfedoniuk duke24k junbeomlee codeaudit rubenszimbres wanjinchang cclauss nagyistge liyuanyaun zhongxingpeng yangyaoyunshu josephpaik fendaq jeanxi cvtower hb-research rahulremanan lgdkobe24 cxncu001 njust-taoye ahyuan xibin hccho2 hongxi233 wangtao1321 zbn123 caitouwh sherlockls datianshi21 thanhtd91 khuangatwork hsiaoyetgun huangpeng1126 flyflywang bloodd xzzxxzzx alphacyc joytianya xiantotikui delaiahz zhongyunuestc liu-nlper wurentidai gumanchang myvrml woohaoshu iris8beiny wangxshu gbacillus machine4life shaoxiaoyu yanhank jakisou aozhi borispolonsky envibus imath123 thurday yjygo janeshenyy owen864720655 gccrpm wangkanger zorrock lianxiaolei qxcssdl tuanshanyou rosssong waynesuzq bailianfa ibrahim85 xiaodanjiao crystal22 forjiankeliuend lovekey111 davidpeng11 xucong053 jiajiadf jerrycatleung kingwpf vlice wibrow hayleyshim ssitb overseer66 algebra-cadabra 0215arthur eminemrain covernal fenss pquochuy wangderfulth 24flyman jgr98 rsdljm daeseunglee vatthaphon leoota luyumingl wangdayaya

transformer-tensorflow's Issues

questions and answers are undefined names

flake8 testing of https://github.com/DongjunLee/transformer-tensorflow on Python 3.6.3

$ flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics

./data_loader.py:262:21: F821 undefined name 'questions'
    prepare_dataset(questions, answers)
                    ^
./data_loader.py:262:32: F821 undefined name 'answers'
    prepare_dataset(questions, answers)
                               ^

https://github.com/DongjunLee/transformer-tensorflow/blob/master/data_loader.py#L262

Stop prediction when EOS is reached

Hi Dongjun,
In line 38, of the Graph class, the following loop continues until max sequence length is decoded.
for i in range(2, Config.data.max_seq_length):

Is it possible to break the loop when EOS(end of the sequence) is reached for the batch?

Thank you,
Surendra

new problem

in the model.py 49line
why: self.decoder_inputs = tf.concat([start_tokens, target_slice_last_1], axis=1)
self.decoder_inputs is word_id why concat a zeros matrix

Does this support beam search?

positional encoding seems different from the paper

In the paper, it says:

PE(pos,2i)=sin(pos/10000 ** (2i/dmodel))
PE(pos,2i+1)=cos(pos/10000 ** (2i/dmode)l)

So basically for dim i, the denominator should be 10000 ** (2 * (i//2)/dmodel).

I rewrite the function as:

def get_positional_encoding(dim, sentence_length, dtype=tf.float32):
    div_term = numpy.power(10000.0, - (numpy.arange(dim)//2).astype(numpy.float32) * 2.0 / dim)
    div_term = div_term.reshape(1, -1)
    pos = numpy.arange(sentence_length, dtype=numpy.float32).reshape(-1, 1)
    encoded_vec = numpy.matmul(pos, div_term)
    encoded_vec[:, 0::2] = numpy.sin(encoded_vec[:, 0::2])
    encoded_vec[:, 1::2] = numpy.cos(encoded_vec[:, 1::2])

    return tf.convert_to_tensor(encoded_vec.reshape([sentence_length, dim]), dtype=dtype)

How to run predictions?

I am getting an error "iterator has not been initialized" while running with --mode=test option.
This project would be really useful if we can predict from it. Could you help fix this issue?

Encoder self_attention input tensor clarification.

Hello, I was looking through your implementation, and I got a little confused by your encoder and decoder input tensors.
For encoder you have this in the build() for encoder.py:

def build(self, encoder_inputs):
        o1 = tf.identity(encoder_inputs)

        for i in range(1, self.num_layers+1):
            with tf.variable_scope(f"layer-{i}"):
                o2 = self._add_and_norm(o1, self._self_attention(q=encoder_inputs,
                                                                 k=encoder_inputs,
                                                                 v=encoder_inputs), num=1)
                o3 = self._add_and_norm(o2, self._positional_feed_forward(o2), num=2)
                o1 = tf.identity(o3)

        return o3

What I'm confused about is why you're using encoder_inputs as the query, key, and value tensors for self_attention function. For the stacked layers, shouldn't the q, k, v tensors be the "o1 = tf.identity(o3)" outputs? In other words, shouldn't the build function be like so:

def build(self, encoder_inputs):
        o1 = tf.identity(encoder_inputs)

        for i in range(1, self.num_layers+1):
            with tf.variable_scope(f"layer-{i}"):
                o2 = self._add_and_norm(o1, self._self_attention(q=encoder_inputs,
                                                                 k=encoder_inputs,
                                                                 v=encoder_inputs), num=1)
                o3 = self._add_and_norm(o2, self._positional_feed_forward(o2), num=2)
                o1 = tf.identity(o3)
                encoder_inputs = o1     # Set the attention input tensors for next stack to be the output of this stack?

        return o3

Likewise in the decoder, "decoder_inputs" never get reset in the stacks building loop. Any clarification would be great! Thanks!