
transformer's Introduction

[UPDATED] A TensorFlow Implementation of Attention Is All You Need

When I opened this repository in 2017, there was no official code yet. I tried to implement the paper as I understood it, and, unsurprisingly, my code had several bugs. I became aware of most of them thanks to the people who filed issues here, so I'm very grateful to all of them. Though there is now an official implementation as well as several other unofficial GitHub repos, I decided to update my own. This update focuses on:

  • readable / understandable code
  • modularization (but not too much)
  • fixing known bugs (masking, positional encoding, ...)
  • updating to TF1.12 (tf.data, ...)
  • adding some missing components (BPE, shared weight matrix, ...)
  • including useful comments in the code

I still stick to IWSLT 2016 de-en. I guess if you'd like to test on a big dataset such as WMT, you would rely on the official implementation. After all, it's pleasant to be able to check quickly whether your model works. The initial code for TF1.2 has been moved to the tf1.2_legacy folder for the record.

Requirements

  • python==3.x (Let's move on to python 3 if you still use python 2)
  • tensorflow==1.12.0
  • numpy>=1.15.4
  • sentencepiece==0.1.8
  • tqdm>=4.28.1

Training

  • STEP 1. Run the command below to download the IWSLT 2016 de-en parallel corpus.
bash download.sh

It should be extracted to the iwslt2016/de-en folder automatically.

  • STEP 2. Run the command below to create preprocessed train/eval/test data.
python prepro.py

If you want to change the vocabulary size (default: 32000), do this:

python prepro.py --vocab_size 8000

It should create two folders iwslt2016/prepro and iwslt2016/segmented.

  • STEP 3. Run the following command.
python train.py

Check hparams.py to see which parameters are possible. For example,

python train.py --logdir myLog --batch_size 256 --dropout_rate 0.5

  • STEP 3 (alternative). Or download the pretrained models.
wget https://dl.dropbox.com/s/4lom1czy5xfzr4q/log.zip; unzip log.zip; rm log.zip

Training plots (omitted here): training loss curve, learning rate schedule, and BLEU score on the dev set.

Inference (=test)

  • Run
python test.py --ckpt log/1/iwslt2016_E19L2.64-29146 (OR yourCkptFile OR yourCkptFileDirectory)

Results

  • Typically, machine translation is evaluated with the BLEU score.
  • All evaluation results are available in eval/1 and test/1.

tst2013 (dev): 28.06
tst2014 (test): 23.88
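
One way to compute a BLEU score like the numbers above (not necessarily the exact script used here) is NLTK's corpus_bleu, which one of the issues further below also imports; a toy, made-up example:

    from nltk.translate.bleu_score import corpus_bleu

    # Toy, made-up data: a list of reference lists (one per hypothesis) and hypotheses.
    references = [[["wir", "erzaehlen", "ihnen", "einige", "geschichten"]]]
    hypotheses = [["wir", "erzaehlen", "ihnen", "einige", "geschichten", "heute"]]

    # corpus_bleu expects tokenized sentences and returns a score in [0, 1];
    # it is conventionally reported multiplied by 100.
    print(100 * corpus_bleu(references, hypotheses))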

Notes

  • Beam decoding will be added soon.
  • I'm going to update the code when TF2.0 comes out if possible.

transformer's People

Contributors

andy-yangz, eternalfeather, kimdwkimdw, kyubyong, maximedb, xu-song, yinnxinn


transformer's Issues

hello

I have a small problem: I think you do a little less, and twice, in each sublayer of the encoder and decoder, and I do not know whether my understanding is right or wrong.

batch() got an unexpected keyword argument

Hi, I ran the code directly and it fails as shown below:
File "C:\Users\hp\Desktop\transformer-master0\transformer-master\data_load.py", line 104, in get_batch_data
allow_smaller_final_batch=False)
TypeError: batch() got an unexpected keyword argument 'min_after_dequeue'

All my packages meet the requirements and I can't figure out why. Thank you.

Training process killed

I tried to train the Transformer model on my own parallel corpus (about 250MB).

But after the graph is constructed, the process is killed before the session starts.

Graph loaded
WARNING:tensorflow:From train.py:171: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
2018-11-27 12:32:22.021904: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-11-27 12:32:22.279206: I tensorflow/compiler/xla/service/service.cc:149] XLA service 0x5607d324dc90 executing computations on platform CUDA. Devices:
2018-11-27 12:32:22.279319: I tensorflow/compiler/xla/service/service.cc:157]   StreamExecutor device (0): Tesla P100-PCIE-12GB, Compute Capability 6.0
2018-11-27 12:32:22.286826: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: Tesla P100-PCIE-12GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:04:00.0
totalMemory: 11.91GiB freeMemory: 10.98GiB
2018-11-27 12:32:22.286958: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2018-11-27 12:32:22.288905: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-11-27 12:32:22.288978: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0
2018-11-27 12:32:22.289007: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N
2018-11-27 12:32:22.289527: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10682 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-12GB, pci bus id: 0000:04:00.0, compute capability: 6.0)
Killed

Any ideas?

Saver error occurs

Caused by op u'save/Assign_136', defined at:
File "eval.py", line 81, in
eval()
File "eval.py", line 35, in eval
sv = tf.train.Supervisor()
File "/home/yinxiaoyi/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 300, in init
self._init_saver(saver=saver)
File "/home/yinxiaoyi/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 446, in _init_saver
saver = saver_mod.Saver()
File "/home/yinxiaoyi/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1056, in init
self.build()
File "/home/yinxiaoyi/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1086, in build
restore_sequentially=self._restore_sequentially)
File "/home/yinxiaoyi/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 691, in build
restore_sequentially, reshape)
File "/home/yinxiaoyi/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 419, in _AddRestoreOps
assign_ops.append(saveable.restore(tensors, shapes))
File "/home/yinxiaoyi/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 155, in restore
self.op.get_shape().is_fully_defined())
File "/home/yinxiaoyi/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/state_ops.py", line 270, in assign
validate_shape=validate_shape)
File "/home/yinxiaoyi/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/gen_state_ops.py", line 47, in assign
use_locking=use_locking, name=name)
File "/home/yinxiaoyi/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 768, in apply_op
op_def=op_def)
File "/home/yinxiaoyi/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2336, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/yinxiaoyi/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1228, in init
self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [9786,512] rhs shape= [9796,512]
[[Node: save/Assign_136 = Assign[T=DT_FLOAT, _class=["loc:@encoder/enc_embed/lookup_table"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](encoder/enc_embed/lookup_table, save/RestoreV2_136/_901)]]

Question on maskings

Hi @Kyubyong,

Can you help explain a bit the following masking code (the Key Masking and Query Masking) in modules.py? Why do we need them? We only need the causality mask, right?

# Key Masking
        key_masks = tf.sign(tf.abs(tf.reduce_sum(keys, axis=-1))) # (N, T_k)
        key_masks = tf.tile(key_masks, [num_heads, 1]) # (h*N, T_k)
        key_masks = tf.tile(tf.expand_dims(key_masks, 1), [1, tf.shape(queries)[1], 1]) # (h*N, T_q, T_k)
        
        paddings = tf.ones_like(outputs)*(-2**32+1)
        outputs = tf.where(tf.equal(key_masks, 0), paddings, outputs) # (h*N, T_q, T_k)
# Query Masking
        query_masks = tf.sign(tf.abs(tf.reduce_sum(queries, axis=-1))) # (N, T_q)
        query_masks = tf.tile(query_masks, [num_heads, 1]) # (h*N, T_q)
        query_masks = tf.tile(tf.expand_dims(query_masks, -1), [1, 1, tf.shape(keys)[1]]) # (h*N, T_q, T_k)
        outputs *= query_masks # broadcasting. (N, T_q, C)

Thanks!
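
For what it's worth, my own reading (not the author's wording): the key mask pushes attention logits at padded key positions to a large negative number so the softmax gives them near-zero weight, and the query mask zeroes the outputs at padded query positions; the causality mask only hides future positions and does nothing about padding. A minimal NumPy sketch of the key-masking idea, with made-up shapes:

    import numpy as np

    # Toy scores for one head and one sequence: 2 query positions x 4 key
    # positions, where the last two key positions are padding.
    scores = np.random.randn(2, 4)
    key_is_pad = np.array([False, False, True, True])

    # Replace scores at padded keys with a large negative number so that
    # softmax assigns them (almost) zero probability.
    masked = np.where(key_is_pad, -1e9, scores)
    weights = np.exp(masked) / np.exp(masked).sum(axis=-1, keepdims=True)
    print(weights)  # the last two columns are ~0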

why normalization variables are trainable

In the function normalize (in modules.py), beta and gamma are created as variables. I don't know why they should be trainable. Couldn't I just use 0. and 1.?

def normalize(inputs,
              epsilon=1e-8,
              scope="ln",
              reuse=None):
    """Applies layer normalization.

    Args:
      inputs: A tensor with 2 or more dimensions, where the first dimension has
        `batch_size`.
      epsilon: A floating number. A very small number for preventing ZeroDivision Error.
      scope: Optional scope for `variable_scope`.
      reuse: Boolean, whether to reuse the weights of a previous layer
        by the same name.

    Returns:
      A tensor with the same shape and data dtype as `inputs`.
    """
    with tf.variable_scope(scope, reuse=reuse):
        inputs_shape = inputs.get_shape()
        params_shape = inputs_shape[-1:]

        mean, variance = tf.nn.moments(inputs, [-1], keep_dims=True)
        beta = tf.Variable(tf.zeros(params_shape))
        gamma = tf.Variable(tf.ones(params_shape))
        normalized = (inputs - mean) / ((variance + epsilon) ** (.5))
        outputs = gamma * normalized + beta

    return outputs
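
Not the author's answer, just context: a trainable scale and shift are standard in layer normalization (they let the network undo the forced zero-mean/unit-variance where that helps). For comparison, TF 1.x's built-in layer norm also creates trainable beta and gamma by default:

    import tensorflow as tf

    x = tf.random_normal([8, 10, 512])
    # center=True / scale=True create trainable beta and gamma, mirroring the
    # hand-rolled normalize() above; setting them to False would fix beta=0, gamma=1.
    y = tf.contrib.layers.layer_norm(x, center=True, scale=True, begin_norm_axis=-1)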

eval.py: the number of items differs between the test and result files

File eval.py line 48

for i in range(len(X) // hp.batch_size):

I am modifying some of your code to fit my data.
I don't understand what this line means, but I noticed that the number of items in the result file is not equal to the number of items in the test file. The same holds for your result and test data.
Can you help me understand this?
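
Not an official answer, but the likely cause: len(X) // hp.batch_size floors the division, so the last partial batch is silently dropped and the output file ends up shorter than the test file. A sketch (not the repo's code) of counting batches with ceiling division instead:

    import math

    def num_batches(num_examples, batch_size):
        # Ceiling division keeps the final partial batch instead of dropping it.
        return math.ceil(num_examples / batch_size)

    # Example: 1003 test sentences, batch size 32.
    print(1003 // 32)             # 31 batches -> 992 outputs, 11 sentences lost
    print(num_batches(1003, 32))  # 32 batches -> all 1003 sentences covered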

This model cannot handle extremely large dataset

Just to point out that using
tf.convert_to_tensor -> tf.train.slice_input_producer -> tf.train.shuffle_batch
will raise the following error if the dataset is too large:

ValueError: Cannot create a tensor proto whose content is larger than 2GB.
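
A common workaround, sketched here under the assumption that the corpus is stored as plain-text files with one sentence per line: stream the data with tf.data instead of embedding it in the graph as a constant (the TF1.12 update of this repo also moves to tf.data).

    import tensorflow as tf

    # Hypothetical file names; each line is one sentence.
    src_file, tgt_file = "train.de", "train.en"

    src = tf.data.TextLineDataset(src_file)
    tgt = tf.data.TextLineDataset(tgt_file)

    # Zip source/target lines, shuffle with a bounded buffer, and batch.
    # Nothing is baked into the graph as a giant constant, so the
    # 2GB tensor-proto limit no longer applies.
    dataset = (tf.data.Dataset.zip((src, tgt))
               .shuffle(buffer_size=100000)
               .repeat()
               .batch(128)
               .prefetch(1))
    src_batch, tgt_batch = dataset.make_one_shot_iterator().get_next()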

Shape mismatch error in eval

I just downloaded the corpora and the trained model, and then ran the eval.py script. I'm getting the following error:

$ python eval.py 
Graph loaded
2017-07-31 22:29:04.547163: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-31 22:29:04.547205: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-07-31 22:29:04.547210: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-31 22:29:04.547215: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
WARNING:tensorflow:Standard services need a 'logdir' passed to the SessionManager
Traceback (most recent call last):
  File "eval.py", line 81, in <module>
    eval()
  File "eval.py", line 38, in eval
    sv.saver.restore(sess, tf.train.latest_checkpoint(hp.logdir))
  File "/Users/erick/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1548, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/Users/erick/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 789, in run
    run_metadata_ptr)
  File "/Users/erick/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 997, in _run
    feed_dict_string, options, run_metadata)
  File "/Users/erick/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1132, in _do_run
    target_list, options, run_metadata)
  File "/Users/erick/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1152, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [9786,512] rhs shape= [9796,512]
	 [[Node: save/Assign_136 = Assign[T=DT_FLOAT, _class=["loc:@encoder/enc_embed/lookup_table"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/cpu:0"](encoder/enc_embed/lookup_table, save/RestoreV2_136)]]

Caused by op u'save/Assign_136', defined at:
  File "eval.py", line 81, in <module>
    eval()
  File "eval.py", line 35, in eval
    sv = tf.train.Supervisor()
  File "/Users/erick/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 300, in __init__
    self._init_saver(saver=saver)
  File "/Users/erick/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 448, in _init_saver
    saver = saver_mod.Saver()
  File "/Users/erick/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1139, in __init__
    self.build()
  File "/Users/erick/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1170, in build
    restore_sequentially=self._restore_sequentially)
  File "/Users/erick/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 691, in build
    restore_sequentially, reshape)
  File "/Users/erick/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 419, in _AddRestoreOps
    assign_ops.append(saveable.restore(tensors, shapes))
  File "/Users/erick/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 155, in restore
    self.op.get_shape().is_fully_defined())
  File "/Users/erick/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/state_ops.py", line 271, in assign
    validate_shape=validate_shape)
  File "/Users/erick/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/gen_state_ops.py", line 45, in assign
    use_locking=use_locking, name=name)
  File "/Users/erick/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/Users/erick/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/Users/erick/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
    self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [9786,512] rhs shape= [9796,512]
	 [[Node: save/Assign_136 = Assign[T=DT_FLOAT, _class=["loc:@encoder/enc_embed/lookup_table"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/cpu:0"](encoder/enc_embed/lookup_table, save/RestoreV2_136)]]

Any idea what might be wrong?

why "split" to get multi-head?

As the paper describes, and as in some other implementations:
self.w_qs = nn.Linear(d_model, n_head * d_k)
the projected size is larger. But in this project, it is

       Q_ = tf.concat(tf.split(Q, num_heads, axis=2), axis=0) # (h*N, T_q, C/h) 
       K_ = tf.concat(tf.split(K, num_heads, axis=2), axis=0) # (h*N, T_k, C/h) 
       V_ = tf.concat(tf.split(V, num_heads, axis=2), axis=0) # (h*N, T_k, C/h)

It looks like each head is formed from only part of Q/K/V.
Can anyone help explain why "split" and "concat" are used to get the multiple heads?

Thanks!
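
Not the author's explanation, but the way I read it: the dense layer already produces num_units = h * d_k features, so splitting along the channel axis and stacking along the batch axis is just a reshaping trick in which each head attends over its own C/h slice; that matches the paper's d_model = h * d_k. A quick NumPy check of the equivalence with a reshape/transpose formulation:

    import numpy as np

    N, T, C, h = 2, 5, 8, 4          # batch, time, model dim, number of heads
    Q = np.random.randn(N, T, C)

    # Repo-style: split along the channel axis, concatenate along the batch axis.
    split_concat = np.concatenate(np.split(Q, h, axis=2), axis=0)        # (h*N, T, C/h)

    # Equivalent reshape/transpose formulation.
    reshaped = Q.reshape(N, T, h, C // h).transpose(2, 0, 1, 3).reshape(h * N, T, C // h)

    print(np.allclose(split_concat, reshaped))  # True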

tqdm dependency

Not really a big issue: I believe the tqdm module also belongs in the requirements.txt file?

PS: I really like your implementation! It was quite "painless" to get this working, compared to many other seq2seq repos out there.

👍 🥇

Are the projection layers among multiple blocks shared?

Hi, I have a question about the codes.

        # Linear projections
        Q = tf.layers.dense(queries, num_units, activation=tf.nn.relu) # (N, T_q, C)
        K = tf.layers.dense(keys, num_units, activation=tf.nn.relu) # (N, T_k, C)
        V = tf.layers.dense(keys, num_units, activation=tf.nn.relu) # (N, T_k, C)

Is there a mechanism that ties these three layers across the multiple blocks? It seems their parameters are not shared between different blocks. What should I do to tie them?

Thanks!
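
My reading of the code (not confirmed by the author): each block builds its dense layers under its own variable scope, so the projections are not shared. If you did want to tie them across blocks, one way in TF 1.x is to reuse a single variable scope, for example (a sketch with a made-up scope name):

    import tensorflow as tf

    def shared_projections(queries, keys, num_units):
        # AUTO_REUSE creates the kernels on the first call and reuses the same
        # variables on every later call, tying the layers across blocks.
        with tf.variable_scope("shared_qkv", reuse=tf.AUTO_REUSE):
            Q = tf.layers.dense(queries, num_units, name="q")  # (N, T_q, C)
            K = tf.layers.dense(keys, num_units, name="k")     # (N, T_k, C)
            V = tf.layers.dense(keys, num_units, name="v")     # (N, T_k, C)
        return Q, K, V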

error running code

I followed all the steps but get the following error.

File "/home/ashishkr/projects/python/transformer/modules.py", line 227, in multihead_attention
tril = tf.contrib.linalg.LinearOperatorTriL(diag_vals).to_dense() # (T_q, T_k)
AttributeError: 'module' object has no attribute 'LinearOperatorTriL'

TF versions 1.5.0 and 1.7.0.
It would be great if you could point out a solution.

Now my nltk, numpy, regex and tensorflow all meet the requirements, and I don't know why this happens. I changed tf.contrib.linalg.LinearOperatorTriL to tf.linalg.LinearOperatorLowerTriangular and the project runs, but I don't know whether that is right or not.
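
For what it's worth, that swap looks right: tf.contrib.linalg.LinearOperatorTriL was removed in later TF 1.x releases, and tf.linalg.LinearOperatorLowerTriangular is its successor, so the behaviour should be the same. A sketch of the causal mask with the newer name (the outputs tensor here is a made-up stand-in for the attention logits in modules.py):

    import tensorflow as tf

    # Stand-in attention logits of shape (h*N, T_q, T_k).
    outputs = tf.random_normal([8, 5, 5])

    diag_vals = tf.ones_like(outputs[0, :, :])                               # (T_q, T_k)
    # Lower-triangular matrix of ones: query position q may only attend to keys <= q.
    tril = tf.linalg.LinearOperatorLowerTriangular(diag_vals).to_dense()
    masks = tf.tile(tf.expand_dims(tril, 0), [tf.shape(outputs)[0], 1, 1])   # (h*N, T_q, T_k)

    paddings = tf.ones_like(masks) * (-2 ** 32 + 1)
    outputs = tf.where(tf.equal(masks, 0), paddings, outputs)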

train.py error

python train.py


Traceback (most recent call last):
File "train.py", line 147, in
g = Graph("train"); print("Graph loaded")
File "train.py", line 28, in init
self.decoder_inputs = tf.concat((tf.ones_like(self.y[:, :1])*2, self.y[:, :-1]), -1) # 2:
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 867, in concat
dtype=dtypes.int32).get_shape(
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 657, in convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 743, in _autopacking_conversion_function
return _autopacking_helper(v, inferred_dtype, name or "packed")
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 706, in _autopacking_helper
return gen_array_ops._pack(elems_as_tensors, name=scope)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 1680, in _pack
result = _op_def_lib.apply_op("Pack", values=values, axis=axis, name=name)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 749, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2382, in create_op
set_shapes_for_outputs(ret)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1783, in set_shapes_for_outputs
shapes = shape_func(op)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/common_shapes.py", line 596, in call_cpp_shape_fn
raise ValueError(err.message)
ValueError: Dimension 1 in both shapes must be equal, but are 1 and 9
From merging shape 0 with other shapes.


How do I fix this error?

Wrong Batch Normalization

In the function normalize():

    with tf.variable_scope(scope, reuse=reuse):
        inputs_shape = inputs.get_shape()
        params_shape = inputs_shape[-1:]
        mean, variance = tf.nn.moments(inputs, [-1], keep_dims=True)
        print ('mean.get_shape()',mean.get_shape())
        beta= tf.Variable(tf.zeros(params_shape))
        gamma = tf.Variable(tf.ones(params_shape))
        normalized = (inputs - mean) / ( (variance + epsilon) ** (.5) )
        outputs = gamma * normalized + beta

but I think the second parameter of tf.nn.moments() should not be [-1], since we need to take the batch dimension into account.
After modification, the code is shown below:


 with tf.variable_scope(scope, reuse=reuse):
        inputs_shape = inputs.get_shape()
        params_shape = inputs_shape[-1:]
        axis = list(range(len(inputs_shape) - 1))
        mean, variance = tf.nn.moments(inputs, axis, keep_dims=True)
        print ('mean.get_shape()',mean.get_shape())
        beta= tf.Variable(tf.zeros(params_shape))
        gamma = tf.Variable(tf.ones(params_shape))
        normalized = (inputs - mean) / ( (variance + epsilon) ** (.5) )
        outputs = gamma * normalized + beta


Training Problem

When I trained this model, something went wrong with a tensor:

'Tensor' object has no attribute 'to_proto'
WARNING:tensorflow:Error encountered when serializing global_step.
Type is unsupported, or the types of the items don't match field type in CollectionDef.

How can I solve this?

beam search implementation

I really love the results of training on my custom dataset. It's simple and doesn't consume a lot of GPU memory, as no extra GPU is needed for the dev and validation datasets.

Read the train sequences

How do you deal with training sequences of more than 10 words? In your code, you seem to throw them away. I mean, if I set the sentence length to N and a training sentence has N+3 words, do the extra 3 words have to be thrown away?

A problem about decoder input

At training time, the input of the decoder is the right-shifted gold output embeddings. But at inference time, the input of the decoder is zero embeddings. Is that right?

Wrong positional encoding

position_enc = np.array([
      [pos / np.power(10000, 2.*i/num_units) for i in range(num_units)]
      for pos in range(T)])

In the paper, Section 3.5, the angle (before the sine or cosine is applied) at the even channel 2*i is the same as at the corresponding odd channel 2*i+1, for channels in [0..num_units).
The correct code should be

position_enc = np.array([
      [pos / np.power(10000, (i-i%2)/num_units) for i in range(num_units)]
      for pos in range(T)])
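
For comparison, a full sinusoidal table along the lines of the paper (a NumPy sketch; the function name is mine, not the repo's): even channels take the sine and odd channels the cosine of the same rate, which is exactly what the (i - i%2) exponent above produces.

    import numpy as np

    def sinusoidal_position_encoding(max_len, num_units):
        # Same exponent for the (sin, cos) pair at channels 2i and 2i+1.
        pos = np.arange(max_len)[:, None]                  # (T, 1)
        i = np.arange(num_units)[None, :]                  # (1, C)
        angle = pos / np.power(10000, (i - i % 2) / num_units)
        enc = np.zeros((max_len, num_units))
        enc[:, 0::2] = np.sin(angle[:, 0::2])              # even channels
        enc[:, 1::2] = np.cos(angle[:, 1::2])              # odd channels
        return enc

    print(sinusoidal_position_encoding(4, 8).shape)        # (4, 8)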

Why can't I run this file

The error says there is no tf.contrib.linalg.LinearOperatorTriL function. I'm now using TF 1.8, where it seems to have been removed.

Understanding load_train_data.

Hey,

I'm new to NMT. This is not an issue, just a newbie question to understand the function load_train_data().
I wanted to understand this preprocessing step:

de_sents = [re.sub("[^\s\p{Latin}']", "", line) for line in codecs.open(hp.source_train, 'r', 'utf-8').read().split("\n") if line and line[0] != "<"]

printing de_sents[1] gives me this "i n nn ini itn a in i n"

if I don't use the reg ex while creating de_sents,
de_sents = [line for line in codecs.open(hp.source_train, 'r', 'utf-8').read().split("\n") if line and line[0] != "<"]

de_sents[1] now becomes "Wir werden Ihnen einige Geschichten über das Meer in Videoform erzählen."

I wanted to know why we use the regex substitution step while creating de_sents or en_sents, since the resulting content "i n nn ini itn a in i n" doesn't make much sense.

Thanks,

Evaluation

If I understood correctly, at evaluation you run

preds = np.zeros((hp.batch_size, hp.maxlen), np.int32)
for j in range(hp.maxlen):
    _preds = sess.run(g.preds, {g.x: x, g.y: preds})
    preds[:, j] = _preds[:, j]

Does that mean that the encoding part runs at every timestep of the decoding process?

Thanks for the great work 👍

Embedding()

In the paper, the authors have specifically mentioned that they used learned embeddings to convert the input tokens and output tokens to vectors. Why did you learn these embeddings as opposed to using learned embeddings?

corpus_bleu module errors

the error occurs when running:
from nltk.translate.bleu_score import corpus_bleu

The log is shown below:

File "C:\Users\10649\AppData\Roaming\Python\Python36\site-packages\sklearn\datasets\mldata.py", line 12, in <module>
    from urllib2 import HTTPError
File "E:\python36\lib\site-packages\urllib2.py", line 220
    raise AttributeError, attr
                        ^
SyntaxError: invalid syntax

Is this because of the Python version or something else?

The way feed data in training

Hello. In your code, 'batch_size' is the number of sentences, but in the paper 'batch_size' means the number of tokens in a batch. Have you tried the approach described in the paper?
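
Not something this code does, but for illustration, batching by token count usually means packing sentences until a token budget is reached. A minimal sketch in plain Python (the sentences and the budget are made up):

    def batch_by_tokens(sentences, max_tokens=4096):
        """Group tokenized sentences so each batch holds at most max_tokens tokens."""
        batch, batch_tokens = [], 0
        for sent in sentences:
            if batch and batch_tokens + len(sent) > max_tokens:
                yield batch
                batch, batch_tokens = [], 0
            batch.append(sent)
            batch_tokens += len(sent)
        if batch:
            yield batch

    # Example with a tiny budget of 8 tokens per batch.
    sents = [["a"] * 3, ["b"] * 4, ["c"] * 2, ["d"] * 5]
    print([len(b) for b in batch_by_tokens(sents, max_tokens=8)])  # [2, 2]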

Add validation set

I can see that the model trains for a fixed number of epochs and there is no validation set. How do you know when to stop training, and how would I add a validation set? Thank you!

Error for positional encoding

I am trying to run with the sinusoid positional encoding, but it throws the following error.

File "train.py", line 51, in __init__ scope="enc_pe") File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 885, in binary_op_wrapper y = ops.convert_to_tensor(y, dtype=x.dtype.base_dtype, name="y") File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 836, in convert_to_tensor as_ref=False) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 926, in internal_convert_to_tensor ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 774, in _TensorTensorConversionFunction (dtype.name, t.dtype.name, str(t))) ValueError: Tensor conversion requested dtype float32 for Tensor with dtype float64: 'Tensor("encoder/enc_pe/embedding_lookup:0", shape=(32, 49, 512), dtype=float64)'

About Train Time

Hi, when I run train.py it seems to work, but the log stays at
0%| | 0/1703 [00:00<?, ?b/s] and does not change.

I think the training process is not going normally...
I don't know where it went wrong, and I want to know how long one training step takes.
I used the given dataset.
But my environment is TensorFlow 1.8.

Can anyone who runs the training code normally tell me?

PS: the GPU is a 1070.

does the key masking work?

Hi @Kyubyong
as you can see in the following key-masking code:

# Key Masking
key_masks = tf.sign(tf.abs(tf.reduce_sum(keys, axis=-1))) # (N, T_k)
key_masks = tf.tile(key_masks, [num_heads, 1]) # (h*N, T_k)
key_masks = tf.tile(tf.expand_dims(key_masks, 1), [1, tf.shape(queries)[1], 1]) # (h*N, T_q, T_k)

The keys parameter is the sum of the word embedding and the position embedding. That means even if a word in a sentence is padding (id 0), adding the position embedding makes the final embedding a non-zero vector. Therefore key_masks would be all ones, with no zeros, so I'm confused about whether this code works.
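
I think the concern is valid for code that derives the mask from the embedded keys after the positional encoding has been added, since those sums are no longer zero at padded positions. A common fix is to build the padding mask from the raw token ids instead; a sketch with made-up inputs:

    import tensorflow as tf

    # Hypothetical integer inputs, where id 0 is <pad>.
    x = tf.constant([[5, 7, 9, 0, 0],
                     [3, 2, 0, 0, 0]])

    # Padding mask computed from the token ids, before any embedding or
    # positional encoding is added: 1.0 for real tokens, 0.0 for padding.
    key_masks = tf.to_float(tf.not_equal(x, 0))   # (N, T_k)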

Linear transform with bias at multi-head attention

In the paper, Attention Is All You Need, the query, key and value are linearly transformed without a bias in the multi-head attention.
However, the variables in your code are transformed with a bias. Is there any reason for using a bias? Or is there something I don't know...?

Thanks.

transformer/modules.py

Lines 201 to 203 in 6672f93

Q = tf.layers.dense(queries, num_units, activation=tf.nn.relu) # (N, T_q, C)
K = tf.layers.dense(keys, num_units, activation=tf.nn.relu) # (N, T_k, C)
V = tf.layers.dense(keys, num_units, activation=tf.nn.relu) # (N, T_k, C)
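
For reference, projections matching the paper (purely linear, no bias, no activation) would look like the following in TF 1.x; whether the extra bias and ReLU in these lines were intentional is something only the author can say.

    import tensorflow as tf

    queries = tf.random_normal([2, 5, 512])   # hypothetical (N, T_q, C)
    keys = tf.random_normal([2, 7, 512])      # hypothetical (N, T_k, C)
    num_units = 512

    # Plain linear projections as in "Attention Is All You Need":
    # no bias term and no activation function.
    Q = tf.layers.dense(queries, num_units, use_bias=False)   # (N, T_q, C)
    K = tf.layers.dense(keys, num_units, use_bias=False)      # (N, T_k, C)
    V = tf.layers.dense(keys, num_units, use_bias=False)      # (N, T_k, C)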

possible error in positional encoding computation

Hi, I was just looking through the positional encoding code, and I see this line:

rad_block = tf.pow(tf.div(position_block, tf.multiply(10000, 1)), tf.div(unit_block, num_units // 2))

It looks wrong to me. Shouldn't it be something like the following?

 rad_block = tf.div(position_block, tf.pow(10000, tf.div(unit_block, num_units // 2))) 
