
attention_keras's Introduction

Hi there! Amazing fellow human being 👋

I am a data scientist, and I love getting my hands 🤚 and feet 🦶 wet in the problem-infested ☠ murky waters 🚤 of data to unearth invaluable information and actionable insights 💎.

What do I love?

  • 🐍 Python
  • 🤖 Machine Learning, Deep Learning, Artificial Intelligence
  • 🧮 TensorFlow

Some of my publications

attention_keras's People

Contributors

thushv89

attention_keras's Issues

TypeError: Failed to convert object of type <class 'list'> to Tensor. Contents: [1, Dimension(None)]. Consider casting elements to a supported type.

I am trying to add your attention layer to my encoder-decoder model. The code looks like this:

from keras.layers import Input, LSTM, Dense, Concatenate
from keras.models import Model

latent_dim = 256  # LSTM hidden units
dropout = .20
encoder_inputs = Input(shape=(None, n_inputs), name='encoder_inputs')
encoder_lstm = LSTM(latent_dim, dropout=dropout, return_sequences=True, return_state=True, name='encoder_lstm')
encoder_outputs, state_h, state_c = encoder_lstm(encoder_inputs)
encoder_states = [state_h, state_c]
decoder_inputs = Input(shape=(None, n_outputs), name='decoder_inputs')

decoder_lstm = LSTM(latent_dim, dropout=dropout, return_sequences=True, return_state=True, name='decoder_lstm')
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)

attn_layer = AttentionLayer(name='attention_layer')
attn_outputs, attn_states = attn_layer([encoder_outputs, decoder_outputs])

decoder_concat_inputs = Concatenate(axis=-1, name='concat_layer')([decoder_outputs, attn_outputs])

decoder_dense = Dense(n_outputs, activation='softmax', name='softmax_layer') # 1 continuous output at each timestep
decoder_outputs = decoder_dense(decoder_concat_inputs)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)

but I get an error like this:

TypeError Traceback (most recent call last)
/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/tensor_util.py in make_tensor_proto(values, dtype, shape, verify_shape)
526 try:
--> 527 str_values = [compat.as_bytes(x) for x in proto_values]
528 except TypeError:

/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/tensor_util.py in <listcomp>(.0)
526 try:
--> 527 str_values = [compat.as_bytes(x) for x in proto_values]
528 except TypeError:

/anaconda3/lib/python3.6/site-packages/tensorflow/python/util/compat.py in as_bytes(bytes_or_text, encoding)
60 raise TypeError('Expected binary or unicode string, got %r' %
---> 61 (bytes_or_text,))
62

TypeError: Expected binary or unicode string, got 1

During handling of the above exception, another exception occurred:

TypeError Traceback (most recent call last)
in <module>()
29 # define attention layer
30 attn_layer = AttentionLayer(name='attention_layer')
---> 31 attn_outputs, attn_states = attn_layer([encoder_outputs, decoder_outputs])
32
33 # concatenate attention output and decoder output as an input to the softmax layer

~/.local/lib/python3.6/site-packages/keras/engine/base_layer.py in __call__(self, inputs, **kwargs)
455 # Actually call the layer,
456 # collecting output(s), mask(s), and shape(s).
--> 457 output = self.call(inputs, **kwargs)
458 output_mask = self.compute_mask(inputs, previous_mask)
459

in call(self, inputs, verbose)
95
96 fake_state_c = create_inital_state(encoder_out_seq, encoder_out_seq.shape[-1])
---> 97 fake_state_e = create_inital_state(encoder_out_seq, encoder_out_seq.shape[1]) # <= (batch_size, enc_seq_len)
98
99 """ Computing energy outputs """

in create_inital_state(inputs, hidden_size)
91 fake_state = K.sum(fake_state, axis=[1, 2]) # <= (batch_size)
92 fake_state = K.expand_dims(fake_state) # <= (batch_size, 1)
---> 93 fake_state = K.tile(fake_state, [1, hidden_size]) # <= (batch_size, latent_dim)
94 return fake_state
95

~/.local/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py in tile(x, n)
2189 if isinstance(n, int):
2190 n = [n]
-> 2191 return tf.tile(x, n)
2192
2193

/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py in tile(input, multiples, name)
8803 if _ctx is None or not _ctx._eager_context.is_eager:
8804 _, _, _op = _op_def_lib._apply_op_helper(
-> 8805 "Tile", input=input, multiples=multiples, name=name)
8806 _result = _op.outputs[:]
8807 _inputs_flat = _op.inputs

/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py in _apply_op_helper(self, op_type_name, name, **keywords)
511 except TypeError as err:
512 if dtype is None:
--> 513 raise err
514 else:
515 raise TypeError(

/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py in _apply_op_helper(self, op_type_name, name, **keywords)
508 dtype=dtype,
509 as_ref=input_arg.is_ref,
--> 510 preferred_dtype=default_dtype)
511 except TypeError as err:
512 if dtype is None:

/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py in internal_convert_to_tensor(value, dtype, name, as_ref, preferred_dtype, ctx)
1144
1145 if ret is None:
-> 1146 ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
1147
1148 if ret is NotImplemented:

/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py in _constant_tensor_conversion_function(v, dtype, name, as_ref)
227 as_ref=False):
228 _ = as_ref
--> 229 return constant(v, dtype=dtype, name=name)
230
231

/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py in constant(value, dtype, shape, name, verify_shape)
206 tensor_value.tensor.CopyFrom(
207 tensor_util.make_tensor_proto(
--> 208 value, dtype=dtype, shape=shape, verify_shape=verify_shape))
209 dtype_value = attr_value_pb2.AttrValue(type=tensor_value.tensor.dtype)
210 const_tensor = g.create_op(

/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/tensor_util.py in make_tensor_proto(values, dtype, shape, verify_shape)
529 raise TypeError("Failed to convert object of type %s to Tensor. "
530 "Contents: %s. Consider casting elements to a "
--> 531 "supported type." % (type(values), values))
532 tensor_proto.string_val.extend(str_values)
533 return tensor_proto

TypeError: Failed to convert object of type <class 'list'> to Tensor. Contents: [1, Dimension(None)]. Consider casting elements to a supported type.

Do you have any idea how to solve this? Thanks a lot!
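For what it's worth, the failure happens inside the layer's create_inital_state helper: encoder_out_seq.shape[1] is Dimension(None) at graph-build time, and a Python list containing a Dimension cannot be converted to the tensor that K.tile expects. A minimal sketch of one possible workaround, assuming you can edit the layer (the helper name below is mine, not the repo's):

from keras import backend as K

def make_fake_states(encoder_out_seq):
    # Slicing keeps the batch and sequence dimensions symbolic, so no
    # static Dimension(None) ever reaches K.tile.
    fake_state_c = K.zeros_like(encoder_out_seq[:, 0, :])  # (batch_size, latent_dim)
    fake_state_e = K.zeros_like(encoder_out_seq[:, :, 0])  # (batch_size, enc_seq_len)
    return fake_state_c, fake_state_e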

License?

Hi - Thank you for this awesome work! Can you add a license file to your code so that we can understand the reuse policies?

ValueError: The first argument to `Layer.call` must always be passed.

Hello, thank you for sharing this.

I am getting this error when I try to run it in Colab:
"ValueError: The first argument to Layer.call must always be passed."

This is my model code:
from attention import AttentionLayer

from keras import backend as K
K.clear_session()
latent_dim = 100
embedding_dim=100

Encoder

encoder_inputs = Input(shape=(max_len_text,))
enc_emb = Embedding(x_voc_size, latent_dim,trainable=True)(encoder_inputs)

#LSTM 1
encoder_lstm1 = LSTM(latent_dim,return_sequences=True,return_state=True)
encoder_output1, state_h1, state_c1 = encoder_lstm1(enc_emb)

#LSTM 2
encoder_lstm2 = LSTM(latent_dim,return_sequences=True,return_state=True)
encoder_output2, state_h2, state_c2 = encoder_lstm2(encoder_output1)

#LSTM 3
encoder_lstm3=LSTM(latent_dim, return_state=True, return_sequences=True)
encoder_outputs, state_h, state_c= encoder_lstm3(encoder_output2)

Set up the decoder.

decoder_inputs = Input(shape=(None,))
dec_emb_layer = Embedding(y_voc_size, latent_dim,trainable=True)
dec_emb = dec_emb_layer(decoder_inputs)

#LSTM using encoder_states as initial state
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs,decoder_fwd_state, decoder_back_state = decoder_lstm(dec_emb,initial_state=[state_h, state_c])

#Attention Layer
attn_layer = AttentionLayer(name='attention_layer')
attn_out, attn_states = attn_layer()([encoder_outputs, decoder_outputs])

Concat attention output and decoder LSTM output

decoder_concat_input = Concatenate(axis=-1, name='concat_layer')([decoder_outputs, attn_out])

#Dense layer
decoder_dense = TimeDistributed(Dense(y_voc_size, activation='softmax'))
decoder_outputs = decoder_dense(decoder_outputs)

Define the model

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.summary()


Please advise if I am missing something. Thank you.
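The traceback points at the attention call, and the extra pair of parentheses there (attn_layer()([...])) invokes the layer with no inputs, which is exactly what the error complains about. A sketch of the corrected lines (note also that decoder_dense above consumes decoder_outputs, so the attention output is currently unused; presumably it should consume the concatenation):

attn_layer = AttentionLayer(name='attention_layer')
attn_out, attn_states = attn_layer([encoder_outputs, decoder_outputs])

decoder_concat_input = Concatenate(axis=-1, name='concat_layer')([decoder_outputs, attn_out])
decoder_outputs = decoder_dense(decoder_concat_input)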

Support for using None in sequence length

The problem
TensorFlow has an awesome feature where one can use None as the sequence length, which allows variable sequence lengths across different batches. The code of the AttentionLayer gives an error when trying to use None as the sequence length.

The error trace

Desktop\attention_keras\examples\nmt_bidirectional\layers\attention.py:98 create_inital_state  *
        fake_state = K.tile(fake_state, [1, hidden_size])  # <= (batch_size, latent_dim)
    AppData\Local\Temp\tmptt0gyc3b.py:153 create_inital_state
        fake_state = ag__.converted_call(K.tile, create_inital_state_scope.callopts, (fake_state, [1, hidden_size]), None, create_inital_state_scope)
    anaconda3\lib\site-packages\tensorflow_core\python\keras\backend.py:3014 tile
        return array_ops.tile(x, n)
    anaconda3\lib\site-packages\tensorflow_core\python\ops\gen_array_ops.py:11310 tile
        "Tile", input=input, multiples=multiples, name=name)
    anaconda3\lib\site-packages\tensorflow_core\python\framework\op_def_library.py:530 _apply_op_helper
        raise err
    anaconda3\lib\site-packages\tensorflow_core\python\framework\op_def_library.py:527 _apply_op_helper
        preferred_dtype=default_dtype)
    anaconda3\lib\site-packages\tensorflow_core\python\framework\ops.py:1296 internal_convert_to_tensor
        ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
    anaconda3\lib\site-packages\tensorflow_core\python\framework\constant_op.py:286 _constant_tensor_conversion_function
        return constant(v, dtype=dtype, name=name)
    anaconda3\lib\site-packages\tensorflow_core\python\framework\constant_op.py:227 constant
        allow_broadcast=True)
    anaconda3\lib\site-packages\tensorflow_core\python\framework\constant_op.py:265 _constant_impl
        allow_broadcast=allow_broadcast))
    anaconda3\lib\site-packages\tensorflow_core\python\framework\tensor_util.py:545 make_tensor_proto
        "supported type." % (type(values), values))

    TypeError: Failed to convert object of type <class 'list'> to Tensor. Contents: [1, None]. Consider casting elements to a supported type.
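A sketch of one possible way to support this (my assumption, not a committed fix): feed K.tile the runtime dimensions from tf.shape() rather than the static shape, so None never appears in the multiples list:

import tensorflow as tf
from tensorflow.keras import backend as K

def create_inital_state(inputs, hidden_size):
    # hidden_size may now be a scalar tensor; tf.tile accepts a mixed
    # [int, tensor] multiples list by auto-packing it into a tensor.
    fake_state = K.zeros_like(inputs)                   # (batch, enc_seq_len, latent_dim)
    fake_state = K.sum(fake_state, axis=[1, 2])         # (batch,)
    fake_state = K.expand_dims(fake_state)              # (batch, 1)
    fake_state = K.tile(fake_state, [1, hidden_size])   # (batch, hidden_size)
    return fake_state

# called with dynamic sizes instead of static ones:
# fake_state_c = create_inital_state(encoder_out_seq, tf.shape(encoder_out_seq)[-1])
# fake_state_e = create_inital_state(encoder_out_seq, tf.shape(encoder_out_seq)[1])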

AttributeError: 'tuple' object has no attribute '_keras_shape'

I'm getting this error when using this code:

def adjust(x):
    return K.cast(K.reshape(x, (batch_size, 697, 64)), dtype='float32')

input_1 = Input(shape=(maxlen,), name="input1")
x_ = Embedding(vocab_size, embed_size, weights=[embed], name='embedding_1', trainable=False)(input_1)
encoder_out, forward_h, backward_h = Bidirectional(GRU(32, return_sequences=True, return_state=True))(x_)
decoder_out, forward_h, backward_h = Bidirectional(GRU(32, return_sequences=True, return_state=True))(x_, initial_state=[forward_h, backward_h])
encoder_out = Lambda(adjust)(encoder_out)
decoder_out = Lambda(adjust)(decoder_out)
attn_out, attn_states = AttentionLayer()([encoder_out, decoder_out])
a = concatenate([decoder_out, attn_out], axis=1)

I'm under the impression this has something to do with how I'm reshaping the encoder and decoder inputs. When I don't reshape them to a defined shape, their shape is (?, ?, 64) and I get the error "ValueError: Cannot convert a partially known TensorShape to a Tensor: (?, 64)" followed by "TypeError: __int__ returned non-int (type NoneType)".

When I use:

def adjust(x):
    return K.cast(tf.shape(x), dtype='float32')

Instead, my encoder and decoder shapes become (3,), which is quite confusing.

I'm assuming this is my mistake, but if it isn't, I thought I'd make you aware of it! Great repository, by the way. I appreciate any and all advice!
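On the second point, the (3,) shape from that version of adjust is expected behaviour rather than a bug: tf.shape(x) returns a rank-1 tensor holding the three dimension sizes, so the Lambda outputs that 3-element vector instead of a reshaped x:

import tensorflow as tf

x = tf.zeros((2, 697, 64))
print(tf.shape(x))  # tf.Tensor([  2 697  64], shape=(3,), dtype=int32)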

Getting "TypeError: Exception encountered when calling layer "tf.keras.backend.rnn" (type TFOpLambda)" when I employ the Attention Layer

I'm trying to re-implement the text summarization tutorial here. I get the following error when I employ the attention layer:

/usr/local/lib/python3.7/dist-packages/keras/engine/keras_tensor.py in __array__(self, dtype)
    253   def __array__(self, dtype=None):
    254     raise TypeError(
--> 255         f'You are passing {self}, an intermediate Keras symbolic input/output, '
    256         'to a TF API that does not allow registering custom dispatchers, such '
    257         'as `tf.cond`, `tf.function`, gradient tapes, or `tf.map_fn`. '

TypeError: Exception encountered when calling layer "tf.keras.backend.rnn" (type TFOpLambda).

You are passing KerasTensor(type_spec=TensorSpec(shape=(None, 101), dtype=tf.float32, name=None), name='tf.compat.v1.nn.softmax_1/Softmax:0', description="created by layer 'tf.compat.v1.nn.softmax_1'"), an intermediate Keras symbolic input/output, to a TF API that does not allow registering custom dispatchers, such as `tf.cond`, `tf.function`, gradient tapes, or `tf.map_fn`. Keras Functional model construction only supports TF API calls that *do* support dispatching, such as `tf.math.add` or `tf.reshape`. Other APIs cannot be called directly on symbolic Keras inputs/outputs. You can work around this limitation by putting the operation in a custom Keras layer `call` and calling that layer on this symbolic input/output.

Call arguments received:
  • step_function=<function AttentionLayer.call.<locals>.energy_step at 0x7f1d5ff279e0>
  • inputs=tf.Tensor(shape=(None, None, 256), dtype=float32)
  • initial_states=['tf.Tensor(shape=(None, 101), dtype=float32)']
  • go_backwards=False
  • mask=None
  • constants=None
  • unroll=False
  • input_length=None
  • time_major=False
  • zero_output_for_mask=False

How can I overcome this error? I've added my software stack below:

- OS: macOS
- TensorFlow: 2.8.0
- Keras: 2.8.0
- Python Version: 3.7.12 (default, Jan 15 2022, 18:48:18)
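The message itself points at the standard workaround: run the offending backend call inside a custom layer's call, so it never executes on symbolic Keras tensors during functional-model construction. A generic sketch of that pattern (the wrapper class below is mine, not part of this repository):

import tensorflow as tf

class RNNWrapper(tf.keras.layers.Layer):  # hypothetical helper
    """Runs tf.keras.backend.rnn on concrete tensors inside call()."""

    def __init__(self, step_function, **kwargs):
        super().__init__(**kwargs)
        self.step_function = step_function

    def call(self, inputs, initial_states):
        # K.rnn returns (last_output, outputs, new_states)
        _, outputs, _ = tf.keras.backend.rnn(
            self.step_function, inputs, initial_states)
        return outputs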

Help Required

Hello Thushan,

I hope you're doing well.

I went through your article on Attention mechanism. It was really informative.

I have applied an attention mechanism to a classification task and want to visualize the attention weights. I tried several ways but failed. I found your article very interesting and easy to understand.

Would you be able to help with this?

Thank You.

Best Wishes,
Dhvani
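Not the author, but one approach that may work with this layer: it returns the attention energies as its second output, so after training a sub-model can expose them for plotting. A sketch, assuming the layer was created with name='attention_layer' and a two-input model as in the README:

from tensorflow.keras.models import Model

attn_weights_model = Model(inputs=model.inputs,
                           outputs=model.get_layer('attention_layer').output[1])
weights = attn_weights_model.predict([enc_batch, dec_batch])  # (batch, dec_len, enc_len)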

Attention implementation for time series

I was wondering whether the code in this repository can be applied to a multivariate time series problem. I have been trying to do it myself, but I am not sure it is possible with the code given here.

I have managed to implement and compile the model, but the weights do not seem to update.

That's why I would like to confirm whether this code can be applied directly to a time series problem, or whether it would become possible with some changes.

index out of range

Hey

Just trying to test this out with your example.
I downloaded the data set you mentioned and I'm getting:

File "train.py", line 124, in
tr_en_text, tr_fr_text, ts_en_text, ts_fr_text = get_data(train_size=train_size)
File "train.py", line 38, in get_data
tr_fr_text = [fr_text[ti] for ti in train_inds]
File "train.py", line 38, in
tr_fr_text = [fr_text[ti] for ti in
IndexError: list index out of range

I've tried changing the train_size, which doesn't help.

AttentionLayer does not support model.save() and load_model()

I am not able to use the load_model() function in Keras, as the AttentionLayer() definition does not implement get_config() for proper model saving.
More details on this problem: keras-team/keras#4871
For now, I can only use save_weights() and load_weights() for this custom layer, so the model has to be rebuilt every time it's loaded.
If anyone has a modified version that supports this, please consider merging it into the main branch.
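For what it's worth, if the layer's __init__ only forwards **kwargs to Layer (as it appears to), the inherited get_config() may already be sufficient, and loading then only needs the class registered via custom_objects. A sketch under that assumption:

from tensorflow.keras.models import load_model
from attention import AttentionLayer  # import path as used elsewhere in these issues

model.save('my_model.h5')
model = load_model('my_model.h5',
                   custom_objects={'AttentionLayer': AttentionLayer})

If that still fails, a minimal pass-through get_config returning super().get_config() inside the layer is the smallest fix.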

module object is not callable

Hi,
I used AttentionLayer(name='attention_layer') and received the following error for this line:
TypeError: 'module' object is not callable
Why?
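A hedged guess, since no code was posted: this error usually means the module itself was imported and called, rather than the class inside it:

import attention
attn_layer = attention(name='attention_layer')   # TypeError: 'module' object is not callable

# importing the class fixes it
from attention import AttentionLayer
attn_layer = AttentionLayer(name='attention_layer')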

infer_nmt for bidirectional

Hi,
Can you update your infer_nmt() for a bidirectional GRU? I wrote an updated version myself, but it is not working.
Thank you

None Input Shape

Hi,
First, thanks a lot for your contribution; it's really appreciated.

I'm training a sequence-to-sequence model and would like to have a None dimension in my input_shape, which is currently not possible.
Any advice on this issue?

Thank you.
Best.

Translation is wrong + error ?

Hi,

Nice module! However, I have some concerns with the examples.
I am using Keras 2.2.4 and TF 1.13.1 (GPU).
The nmt_bidirectional example is failing:

examples.nmt_bidirectional.train | INFO | Loss in epoch 5: 0.00024022466588457427
examples.nmt_bidirectional.train | INFO | Translating: the united states is sometimes chilly during december , but it is sometimes freezing in june .

examples.nmt_bidirectional.train | INFO |       French:
Traceback (most recent call last):
  File "train.py", line 164, in <module>
    plot_attention_weights(test_en_seq, attn_weights, en_index2word, fr_index2word)
  File "/gpfs/workdir/popineau/attention_keras/examples/utils/model_helper.py", line 17, in plot_attention_weights
    assert len(attention_weights) != 0, assert_msg
AssertionError: Your attention weights was empty. Please check if the decoder produced  a proper translation

Regards,

I got this error: 'tuple' object has no attribute '_keras_shape'. Any help please?


TypeError: object of type 'Concatenate' has no len()

I was trying to run an NMT model using a GRU and the posted attention layer, but I am getting "TypeError: object of type 'Concatenate' has no len()". I am using sparse_categorical_crossentropy, as my data is integer encoded rather than one-hot encoded. I am very new to Keras, and it would be helpful if you could point out my mistake; I am attaching the code snippet below. Another question: since I have set return_sequences=True in the decoder, do I need to use a TimeDistributed Dense (answered in the note after the snippet)?

enc_inp = Input(shape=(286,))
enc_emb = Embedding(input_dim=91, output_dim=100)(enc_inp)
enc_out, forward_h, backward_h = Bidirectional(GRU(64, return_sequences=True, return_state=True))(enc_emb)
state_h = Concatenate()([forward_h, backward_h])
encoder_states = [state_h]

dec_inp = Input(shape=(352,))
dec_emb = Embedding(input_dim=117, output_dim=100)(dec_inp)
dec_out, _ = GRU(128, return_sequences=True, return_state=True)(dec_emb, initial_state=encoder_states)

attn_layer = AttentionLayer(name='attention_layer')
attn_out, attn_states = attn_layer([enc_out, dec_out])

decoder_concat_input = Concatenate(axis=-1, name='concat_layer')([dec_out, attn_out])

dec_dense = TimeDistributed(Dense(117, activation='softmax'))(decoder_concat_input)

model = Model([enc_inp, dec_inp], dec_dense)
model.compile(optimizer='nadam', loss='sparse_categorical_crossentropy', metrics=["sparse_categorical_accuracy"])
model.fit([en_inp, fr_inp], fr_out, batch_size=512, epochs=25, validation_split=0.1)
model.save('model.h5')
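Two notes, offered as guesses rather than a confirmed diagnosis. First, "object of type 'Concatenate' has no len()" commonly appears when keras.* and tensorflow.keras.* objects end up in the same graph, so it is worth importing every layer, and the Layer base class that attention.py subclasses, from a single namespace:

from tensorflow.keras.layers import (Input, Embedding, GRU, Bidirectional,
                                     Concatenate, Dense, TimeDistributed)
from tensorflow.keras.models import Model

Second, on the TimeDistributed question: Dense already applies to the last axis of a 3-D tensor, so with return_sequences=True, TimeDistributed(Dense(...)) and a plain Dense(...) produce the same result here.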

Minor issues - but nothing's broken

Thanks for this amazing code. I was working with your implementation when I came across a bunch of issues. Since my changes might not be to your liking, I'm not going to make a PR (my implementation is a little different), but I thought I should report the issues I've faced.

Before I list them, you should know that none of these issues break the implementation; your code works just the way it is. It's just that anyone who wants to modify it (like me) will have a headache, and some parts are simply wrong.

1. The way the fake states are initialized is unnecessarily convoluted:

def create_inital_state(inputs, hidden_size):
    # We are not using initial states, but need to pass something to the K.rnn function
    fake_state = K.zeros_like(inputs)  # <= (batch_size, enc_seq_len, latent_dim)
    fake_state = K.sum(fake_state, axis=[1, 2])  # <= (batch_size)
    fake_state = K.expand_dims(fake_state)  # <= (batch_size, 1)
    fake_state = K.tile(fake_state, [1, hidden_size])  # <= (batch_size, latent_dim)
    return fake_state

fake_state_c = create_inital_state(encoder_out_seq, encoder_out_seq.shape[-1])
fake_state_e = create_inital_state(encoder_out_seq, encoder_out_seq.shape[1])  # <= (batch_size, enc_seq_len)

Just simply initialize the tensors with zeros:

fake_state_e = K.zeros_like(K.placeholder(shape=(decoder_out_seq.shape[0], 1)))
fake_state_c = K.zeros_like(K.placeholder(shape=(decoder_out_seq.shape[0], 1)))

2. In both your step functions you return the state like this:

def energy_step(inputs, states):
    ...
    return e_i, [e_i]

def context_step(inputs, states):
    ...
    return c_i, [c_i]

While this does not throw any errors (because the states are discarded), it is actually wrong. You should just return the incoming states, like this:

def energy_step(inputs, states):
    ...
    return e_i, states

def context_step(inputs, states):
    ...
    return c_i, states

3. You already have a PR on this. I'm just going to mention it for the sake of completeness. Your output shape is this:

def compute_output_shape(self, input_shape):
    """ Outputs produced by the layer """
    return [
        tf.TensorShape((input_shape[1][0], input_shape[1][1], input_shape[1][2])),
        tf.TensorShape((input_shape[1][0], input_shape[1][1], input_shape[0][1]))
    ]

But it should be this:

def compute_output_shape(self, input_shape):
    """ Outputs produced by the layer """
    return [
        tf.TensorShape((input_shape[1][0], input_shape[1][1], input_shape[0][2])),
        tf.TensorShape((input_shape[1][0], input_shape[1][1], input_shape[0][1]))
    ]

Thanks again. I learned a lot from your code.

Failed to convert object of type

I'm trying to use this attention implementation to build a model for converting graphemes to phonemes, but I got this error:
TypeError: Failed to convert object of type <class 'list'> to Tensor. Contents: [1, Dimension(None)]. Consider casting elements to a supported type.

Any help?

TypeError: __int__ returned non-int (type NoneType)

First, thank you for your implementation of attention.
When I built an LSTM seq2seq chatbot using your implementation, I got an error on the line
attn_out, attn_states = attn_layer([encoder_out, decoder_lstm])
which throws
TypeError: __int__ returned non-int (type NoneType)
My core code is here:
embed_layer = Embedding(input_dim=vocab_size, output_dim=50, trainable=True)
embed_layer.build((None,))
embed_layer.set_weights([embedding_matrix])

LSTM_cell = Bidirectional(LSTM(128, return_sequences=True, return_state=True))
LSTM_decoder = LSTM(256, return_sequences=True, return_state=True)

dense = TimeDistributed(Dense(vocab_size, activation='softmax'))

input_context = Input(shape=(maxLen,), dtype='int32', name='input_context')  # maxLen = 20
input_target = Input(shape=(maxLen,), dtype='int32', name='input_target')

input_context_embed = embed_layer(input_context)
input_target_embed = embed_layer(input_target)

encoder_out, forward_h, forward_c, backward_h, backward_c = LSTM_cell(input_context_embed)
context_h = Concatenate()([forward_h, backward_h])
context_c = Concatenate()([forward_c, backward_c])

decoder_lstm, _, _ = LSTM_decoder(input_target_embed, initial_state=[context_h, context_c])

print('decoder_lstm.shape: ', decoder_lstm.shape)  # (?, ?, 256)
print('encoder_out.shape: ', encoder_out.shape)  # (?, ?, 256)

# *********************** Start Code Here **********************

''' Attention layer ***** A '''
attn_layer = AttentionLayer(name='attention_layer')
attn_out, attn_states = attn_layer([encoder_out, decoder_lstm])
merge = Concatenate(axis=-1, name='concat_layer')([decoder_lstm, attn_out])

# *********************** End Code Here **********************

output = dense(merge)
model = Model([input_context, input_target, s0, c0], output)
model.summary()

model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])

model.fit([context_, final_target_], outs, epochs=2, batch_size=128, validation_split=0.2)

And the error details are below:

---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
c:\users\rnn_n\appdata\local\programs\python\python36\lib\site-packages\tensorflow\python\ops\array_ops.py in zeros(shape, dtype, name)
1810 shape = constant_op._tensor_shape_tensor_conversion_function(
-> 1811 tensor_shape.TensorShape(shape))
1812 except (TypeError, ValueError):

c:\users\rnn_n\appdata\local\programs\python\python36\lib\site-packages\tensorflow\python\framework\constant_op.py in _tensor_shape_tensor_conversion_function(s, dtype, name, as_ref)
324 raise ValueError(
--> 325 "Cannot convert a partially known TensorShape to a Tensor: %s" % s)
326 s_list = s.as_list()

ValueError: Cannot convert a partially known TensorShape to a Tensor: (?, 256)

During handling of the above exception, another exception occurred:

TypeError Traceback (most recent call last)
in <module>
35 '''
36 attn_layer = AttentionLayer(name='attention_layer')
---> 37 attn_out, attn_states = attn_layer([encoder_out, decoder_lstm])
38 merge = Concatenate(axis=-1,
39 name='concat_layer'

c:\users\rnn_n\appdata\local\programs\python\python36\lib\site-packages\tensorflow\python\keras\engine\base_layer.py in __call__(self, inputs, *args, **kwargs)
552 # In graph mode, failure to build the layer's graph
553 # implies a user-side bug. We don't catch exceptions.
--> 554 outputs = self.call(inputs, *args, **kwargs)
555 else:
556 try:

~\AppData\Roaming\Python\Python36\site-packages\keras\layers\attention.py in call(self, inputs, verbose)
93
94 # We are not using initial states, but need to pass something to K.rnn funciton
---> 95 fake_state_c = K.zeros(shape=(encoder_out_seq.shape[0], encoder_out_seq.shape[-1]))
96 fake_state_e = K.zeros(shape=(encoder_out_seq.shape[0], encoder_out_seq.shape[1]))
97

c:\users\rnn_n\appdata\local\programs\python\python36\lib\site-packages\tensorflow\python\keras\backend.py in zeros(shape, dtype, name)
1066 dtype = floatx()
1067 tf_dtype = dtypes_module.as_dtype(dtype)
-> 1068 v = array_ops.zeros(shape=shape, dtype=tf_dtype, name=name)
1069 if py_all(v.shape.as_list()):
1070 return variable(v, dtype=dtype, name=name)

c:\users\rnn_n\appdata\local\programs\python\python36\lib\site-packages\tensorflow\python\ops\array_ops.py in zeros(shape, dtype, name)
1812 except (TypeError, ValueError):
1813 # Happens when shape is a list with tensor elements
-> 1814 shape = ops.convert_to_tensor(shape, dtype=dtypes.int32)
1815 if not shape._shape_tuple():
1816 shape = reshape(shape, [-1]) # Ensure it's a vector

c:\users\rnn_n\appdata\local\programs\python\python36\lib\site-packages\tensorflow\python\framework\ops.py in convert_to_tensor(value, dtype, name, preferred_dtype)
1037 ValueError: If the value is a tensor not of given dtype in graph mode.
1038 """
-> 1039 return convert_to_tensor_v2(value, dtype, preferred_dtype, name)
1040
1041

c:\users\rnn_n\appdata\local\programs\python\python36\lib\site-packages\tensorflow\python\framework\ops.py in convert_to_tensor_v2(value, dtype, dtype_hint, name)
1095 name=name,
1096 preferred_dtype=dtype_hint,
-> 1097 as_ref=False)
1098
1099

c:\users\rnn_n\appdata\local\programs\python\python36\lib\site-packages\tensorflow\python\framework\ops.py in internal_convert_to_tensor(value, dtype, name, as_ref, preferred_dtype, ctx, accept_symbolic_tensors)
1173
1174 if ret is None:
-> 1175 ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
1176
1177 if ret is NotImplemented:

c:\users\rnn_n\appdata\local\programs\python\python36\lib\site-packages\tensorflow\python\framework\constant_op.py in _constant_tensor_conversion_function(v, dtype, name, as_ref)
302 as_ref=False):
303 _ = as_ref
--> 304 return constant(v, dtype=dtype, name=name)
305
306

c:\users\rnn_n\appdata\local\programs\python\python36\lib\site-packages\tensorflow\python\framework\constant_op.py in constant(value, dtype, shape, name)
243 """
244 return _constant_impl(value, dtype, shape, name, verify_shape=False,
--> 245 allow_broadcast=True)
246
247

c:\users\rnn_n\appdata\local\programs\python\python36\lib\site-packages\tensorflow\python\framework\constant_op.py in _constant_impl(value, dtype, shape, name, verify_shape, allow_broadcast)
281 tensor_util.make_tensor_proto(
282 value, dtype=dtype, shape=shape, verify_shape=verify_shape,
--> 283 allow_broadcast=allow_broadcast))
284 dtype_value = attr_value_pb2.AttrValue(type=tensor_value.tensor.dtype)
285 const_tensor = g.create_op(

c:\users\rnn_n\appdata\local\programs\python\python36\lib\site-packages\tensorflow\python\framework\tensor_util.py in make_tensor_proto(values, dtype, shape, verify_shape, allow_broadcast)
465 else:
466 _AssertCompatible(values, dtype)
--> 467 nparray = np.array(values, dtype=np_dt)
468 # check to them.
469 # We need to pass in quantized values as tuples, so don't apply the shape

TypeError: __int__ returned non-int (type NoneType)

Thank you for your contribution!
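For anyone hitting this: the failing call is K.zeros(shape=(encoder_out_seq.shape[0], ...)), and the batch dimension is None while the graph is being built, so a concrete zeros tensor cannot be materialized. A sketch of one workaround (my suggestion, not the author's): derive the sizes at runtime with tf.shape, which keeps unknown dimensions symbolic:

import tensorflow as tf

batch = tf.shape(encoder_out_seq)[0]
fake_state_c = tf.zeros(tf.stack([batch, tf.shape(encoder_out_seq)[-1]]))
fake_state_e = tf.zeros(tf.stack([batch, tf.shape(encoder_out_seq)[1]]))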

TypeError: The added layer must be an instance of class Layer.

Hey, I included your code as a Python package in my PyCharm project and imported it like this: from my_project.attention.layers.attention import AttentionLayer. Then I created the attention layer in the way shown in the README: attn_layer = AttentionLayer(name='attention_layer').

The code for creating my model is:

    embedding_layer = set_up_embedding_layer(word_idx_training, em_dim, maximum_sequence_length, embedding_type)
    model.add(embedding_layer)
    attn_layer = AttentionLayer(name='attention_layer')
    model.add(attn_layer)
    model.add(LSTM(16))
    model.add(Dense(int(class_number), activation='softmax'))

When I run the script, I get an error message saying that the attention layer is not of class Layer. I understand that the layer is intended to work with GRUs, but your class AttentionLayer nonetheless appears to be of type Layer. Is this correct? And can I use the attention mechanism as shown above, or only by concatenating the attention output and the decoder GRU output?

Thanks
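Not the author, but the layer expects a list of two tensors, [encoder outputs, decoder outputs], so it cannot be stacked in a Sequential model, which hands each layer exactly one input; the functional API is required. ("The added layer must be an instance of class Layer" is also a classic symptom of mixing keras and tensorflow.keras, so check that your model and AttentionLayer subclass the same Layer base.) The call pattern looks like this (the tensor names are placeholders):

attn_out, attn_states = AttentionLayer(name='attention_layer')(
    [encoder_outputs, decoder_outputs])

And yes, the intended use is to concatenate attn_out with the decoder output before the final Dense, as in the README.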

Installation failed

Hey, thanks for sharing your code for attention in Keras.
I tried to install the package with pip install git+https://github.com/thushv89/attention_keras.git (having already activated my conda environment for deep learning experiments), and this is the error I get:

  Cloning https://github.com/thushv89/attention_keras.git to /tmp/pip-req-build-qrcv9rfd
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/home/konstantina/anaconda3/envs/my_env/lib/python3.6/tokenize.py", line 452, in open
        buffer = _builtin_open(filename, 'rb')
    FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pip-req-build-qrcv9rfd/setup.py'
    
    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-req-build-qrcv9rfd/

Any idea why?
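The traceback shows pip looking for a setup.py that the repository does not ship, which is why the build fails. One workaround (the exact subdirectory may differ between revisions of the repo, so treat the path as an assumption) is to clone the repository and put the layer code on the import path:

import sys

# assumption: the repo was cloned into the current working directory and the
# layer lives under src/layers/attention.py; adjust to your checkout's layout
sys.path.append('./attention_keras/src')
from layers.attention import AttentionLayer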

The vocab size definition for English and French Tokenizers is incorrect

The variables en_vsize and fr_vsize depend on the chosen size of the training set and therefore vary with each train/test ratio. It would be better to compute these two variables by fitting the tokenizer on the entire dataset, so the vocabulary size covers the whole dataset, and then to store it statically as a property of the given dataset.

This avoids unseen-word errors while testing/validating the model and removes unnecessary vocab-size dependencies.
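A sketch of the suggested change (variable names follow the example scripts and are assumptions on my part): fit each tokenizer on the full corpus before splitting, then derive the vocabulary sizes once:

from tensorflow.keras.preprocessing.text import Tokenizer

# fit on the complete corpus, not just the training split
en_tokenizer = Tokenizer(oov_token='UNK')
en_tokenizer.fit_on_texts(en_text)
en_vsize = max(en_tokenizer.index_word.keys()) + 1

fr_tokenizer = Tokenizer(oov_token='UNK')
fr_tokenizer.fit_on_texts(fr_text)
fr_vsize = max(fr_tokenizer.index_word.keys()) + 1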

AttributeError: 'tuple' object has no attribute 'layer'

Hi, I get the following error message. Can you help resolve this?
Thanks.

Dean

Code:

from keras.layers import Bidirectional, CuDNNLSTM
from keras.callbacks import History, ReduceLROnPlateau, EarlyStopping
from keras.optimizers import RMSprop, Adam
from keras import regularizers

#callbacks
h = History()
rlr = ReduceLROnPlateau(monitor='val_loss', factor=0.5,patience=10, min_lr=0.000001, verbose=1, min_delta=1e-5)
es = EarlyStopping(monitor='val_loss', min_delta=1e-6, patience=10, verbose=0, mode='auto')


#---------------------- Encoder--------------------------#
# previously used 250
lstm_dim = 500
input_shape = pr_train.shape[1:]

#input shape
encoder_input = Input(shape=input_shape)

#first encoder layer
encoder_LSTM = Bidirectional(CuDNNLSTM(lstm_dim // 2, return_state=True, return_sequences=True, name='bd_enc_LSTM_01'))
encoder1_output, forward_h1, forward_c1, backward_h1, backward_c1 = encoder_LSTM(encoder_input)


#second encoder layer
encoder_LSTM2 = Bidirectional(CuDNNLSTM(lstm_dim // 2, return_state=True, return_sequences=True, name='bd_enc_LSTM_02'))
encoder2_output, forward_h2, forward_c2, backward_h2, backward_c2 = encoder_LSTM2(encoder1_output)

# Concatenate all states together
encoder_states = Concatenate(axis=-1)([forward_h1, forward_c1, forward_h2, forward_c2,
                                       backward_h1, backward_c1, backward_h2, backward_c2])

encoder_dense = Dense(lstm_dim, activation='relu', name="enc_dense")(encoder_states)

#---------------------- Decoder--------------------------#

#states for the first LSTM layer
decoder_input = Input(shape=(None, len(charset))) #teacher training
dense_h = Dense(lstm_dim, activation='relu', name="dec_dense_h1")
dense_c = Dense(lstm_dim, activation='relu', name="dec_dense_c1")
state_h = dense_h(encoder_dense)
state_c = dense_c(encoder_dense)
states1 =[state_h, state_c]


#states for the second LSTM layer
dense_h2 = Dense(lstm_dim, activation='relu', name="dec_dense_h2")
dense_c2 = Dense(lstm_dim, activation='relu', name="dec_dense_c2")
state_h2 = dense_h2(encoder_dense)
state_c2 = dense_c2(encoder_dense)
states2 =[state_h2, state_c2]

#this goes through a decoding lstm
decoder_LSTM1 = CuDNNLSTM(lstm_dim, return_sequences=True, name='bd_dec_LSTM_01')
decoder1_output = decoder_LSTM1(decoder_input, initial_state=states1)


#couple the first LSTM with the 2nd LSTM
decoder_LSTM2 = CuDNNLSTM(lstm_dim, return_sequences=True, name='bd_dec_LSTM_02')
decoder2_output = decoder_LSTM2(decoder1_output, initial_state=states2) 

#attention layers
attn_layer = AttentionLayer(name='attention_layer')
attn_out, attn_states = attn_layer([encoder2_output, decoder2_output])
decoder_concat_input = Concatenate(axis=-1, name='concat_layer')([decoder_out, attn_out])


#pass hidden states of decoder2_outputs to dense layer with softmax
decoder_dense = Dense(len(charset), activation='softmax', name="dec_dense_softmax")
decoder_out = decoder_dense(decoder_concat_input)



#----------------------compilations------------#

#model compilation (canonical)
model = Model(inputs=[encoder_input, decoder_input], outputs=[decoder_out])
#Run training
start = time.time()
opt=Adam(lr=0.01) #try 0.005
#test other optimizers (rmsprop)
model.compile(optimizer=opt, loss='categorical_crossentropy')

#fit
model.fit(x=[pr_train, rx_train], 
          y=rx_target, 
          batch_size=250, 
          epochs=2,
          shuffle = True,
          validation_split=0.2,
          callbacks = [h, rlr, es])

end = time.time()
print(end - start)

Error:

AttributeError Traceback (most recent call last)
in <module>
64 #attention layers
65 attn_layer = AttentionLayer(name='attention_layer')
---> 66 attn_out, attn_states = attn_layer([encoder2_output, decoder2_output])
67 decoder_concat_input = Concatenate(axis=-1, name='concat_layer')([decoder_out, attn_out])
68

~/miniconda3/envs/tf_gpu/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py in __call__(self, inputs, *args, **kwargs)
661 kwargs.pop('training')
662 inputs, outputs = self.set_connectivity_metadata(
--> 663 inputs, outputs, args, kwargs)
664 self._handle_activity_regularization(inputs, outputs)
665 self._set_mask_metadata(inputs, outputs, previous_mask)

~/miniconda3/envs/tf_gpu/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py in set_connectivity_metadata(self, inputs, outputs, args, kwargs)
1706 kwargs.pop('mask', None) # mask should not be serialized.
1707 self._add_inbound_node(
-> 1708 input_tensors=inputs, output_tensors=outputs, arguments=kwargs)
1709 return inputs, outputs
1710

~/miniconda3/envs/tf_gpu/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py in _add_inbound_node(self, input_tensors, output_tensors, arguments)
1793 """
1794 inbound_layers = nest.map_structure(lambda t: t._keras_history.layer,
-> 1795 input_tensors)
1796 node_indices = nest.map_structure(lambda t: t._keras_history.node_index,
1797 input_tensors)

~/miniconda3/envs/tf_gpu/lib/python3.7/site-packages/tensorflow/python/util/nest.py in map_structure(func, *structure, **kwargs)
513
514 return pack_sequence_as(
--> 515 structure[0], [func(*x) for x in entries],
516 expand_composites=expand_composites)
517

~/miniconda3/envs/tf_gpu/lib/python3.7/site-packages/tensorflow/python/util/nest.py in <listcomp>(.0)
513
514 return pack_sequence_as(
--> 515 structure[0], [func(*x) for x in entries],
516 expand_composites=expand_composites)
517

~/miniconda3/envs/tf_gpu/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py in <lambda>(t)
1792 call method of the layer at the call that created the node.
1793 """
-> 1794 inbound_layers = nest.map_structure(lambda t: t._keras_history.layer,
1795 input_tensors)
1796 node_indices = nest.map_structure(lambda t: t._keras_history.node_index,

AttributeError: 'tuple' object has no attribute 'layer'
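Not the author, but two observations, both offered as guesses. The traceback itself ('tuple' object has no attribute 'layer') is typical of mixing keras.* imports (as at the top of the script) with a layer built on tensorflow.python.keras, so aligning all imports on one namespace is worth trying first. Separately, decoder_concat_input references decoder_out one line before decoder_out is defined; presumably the intended wiring is:

attn_out, attn_states = attn_layer([encoder2_output, decoder2_output])
decoder_concat_input = Concatenate(axis=-1, name='concat_layer')([decoder2_output, attn_out])
decoder_out = decoder_dense(decoder_concat_input)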

Bahdanau attention

decoder_concat_input = Concatenate(axis=-1, name='concat_layer')([decoder_out, attn_out])

Is the implementation here a variation of the Bahdanau attention paper? As per the paper, during training the alignment (context) vector is concatenated with the embedded target of the previous timestep, and this vector is supplied to the decoder.

decoder_pred = dense_time(decoder_concat_input)

In the code base here, this concatenated vector is directly 'softmaxed' to get the predicted output.

Are these implementations fundamentally the same?
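My tentative reading: the scoring function here matches Bahdanau's additive energy, but the place where the context vector enters differs. In Bahdanau et al. the context c_i feeds the decoder recurrence itself, while in this code base attention is computed after the decoder RNN has already run, and the concatenation goes straight into the softmax, which is closer to the output side of Luong-style attention:

% Bahdanau: context enters the recurrence and the output layer
s_i = f(s_{i-1}, y_{i-1}, c_i), \qquad p(y_i \mid y_{<i}, x) = g(y_{i-1}, s_i, c_i)

% This code base: context is concatenated with the decoder output d_i
p(y_i \mid y_{<i}, x) = \mathrm{softmax}\left(W\,[d_i ; c_i] + b\right)

So the two are fundamentally similar in how alignment is scored, but not identical in how the context conditions the next-word prediction.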

AttributeError: 'tuple' object has no attribute '_keras_mask'

I'm having this problem after reshaping the tensors 'encoder_out' and 'decoder_out' so that their shapes are explicit, in order to avoid the error addressed in https://github.com/thushv89/attention_keras/issues/9.
But then I get the following error:

File "C:\Users\ganem\AppData\Local\Programs\Python\Python36\Lib\site-packages\tensorflow\python\layers\base.py", line 738, in call
outputs._keras_mask = output_mask # pylint: disable=protected-access

AttributeError: 'tuple' object has no attribute '_keras_mask'

Here's the full code:

#import modules

import tensorflow as tf
from attention import AttentionLayer
from tensorflow.python.keras.layers import Input, GRU, Dense, Concatenate, TimeDistributed
from tensorflow.python.keras.models import Model
from tensorflow.python.keras.callbacks import EarlyStopping
from tensorflow.python.keras.layers import Dropout, Concatenate ,Dense ,LSTM,MaxPooling1D ,Conv1D, MaxPooling1D,Input, TimeDistributed, Flatten, Conv2D,Reshape,Permute, Flatten
from tensorflow.python.keras.models import Model, Sequential
from tensorflow.python.keras.utils import plot_model
import TimeSeriesUtils as TSU
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import KFold
from sklearn.model_selection import GridSearchCV
import pandas as pd
import numpy as np

set model params

latent_dim = 10
look_back_period = 12
pred_period = 8
dropout_rate = 0.5
batch_size = 128

#read data and convert to np array

data = pd.read_table(r'~\gas_prices_brazil\2004-2019.tsv',delim_whitespace=False,header=0)
data['DATA INICIAL'] = pd.to_datetime(data['DATA INICIAL'])
data['DATA FINAL'] = pd.to_datetime(data['DATA FINAL'])
data.set_index('DATA FINAL',inplace = True)
data.replace('-',np.nan, inplace = True)
number_columns = []
for column in data.columns:
    try:
        data[column] = data[column].astype(float)
        number_columns.append(column)
        print(column)
    except:
        pass

features = ['PREÇO MÉDIO REVENDA', 'DESVIO PADRÃO REVENDA', 'PREÇO MÍNIMO REVENDA', 'PREÇO MÁXIMO REVENDA', 'MARGEM MÉDIA REVENDA', 'COEF DE VARIAÇÃO REVENDA', 'PREÇO MÉDIO DISTRIBUIÇÃO', 'DESVIO PADRÃO DISTRIBUIÇÃO', 'PREÇO MÍNIMO DISTRIBUIÇÃO', 'PREÇO MÁXIMO DISTRIBUIÇÃO', 'COEF DE VARIAÇÃO DISTRIBUIÇÃO']
train_data = data[(data['ESTADO'] == 'SAO PAULO') & (data['PRODUTO'] == 'GASOLINA COMUM')].resample('W').mean()
train_data = train_data[features]

scaled_featues = []
dependent_variable = ['PREÇO MÉDIO REVENDA']
train_data = train_data[features+dependent_variable]
train_data = train_data.fillna(method = 'ffill', limit = 300)
train_data = train_data.astype('float')
assert not any(any(train_data.isna()))

X = TSU.chunk_data_by_date(train_data.values,pred_period,look_back_period)
X_train, y_train, X_val, y_val = X
X_train_teacher_forcing,X_val_teacher_forcing,X_train_no_teacher_forcing,X_val_no_teacher_forcing = TSU.teacher_forcing_generator(y_train,y_val)

defining layers

encoder_input = Input(shape = (X_train.shape[1],X_train.shape[2]),batch_size = 128,name = 'encoder_input')
decoder_input = Input(shape = (look_back_period,1),batch_size = 128,name = 'decoder_input')

encoderLSTM = LSTM(units = latent_dim,return_state = True,return_sequences = True,name = 'enc_LSTM',dropout = dropout_rate)
attention1 = AttentionLayer()
decoderLSTM = LSTM(units = latent_dim,return_state = True,return_sequences = True,name = 'dec_LSTM',dropout = dropout_rate)
dense_output = TimeDistributed(Dense(1, activation = 'relu'),name = 'time_distirbuted_dense_output')

building model

encoder_out, encoder_states = encoderLSTM(encoder_input)[0], encoderLSTM(encoder_input)[1:]

decoder_out, decoder_states = decoderLSTM(decoder_input,initial_state = encoder_states)[0],decoderLSTM(decoder_input,initial_state = encoder_states)[1:]

output = dense_output(decoder_out)

explicitly define tensor shapes as TensorShape([Dimension(128), Dimension(12), Dimension(10)]) and TensorShape([Dimension(128), Dimension(8), Dimension(10)])

decoder_out.set_shape((tf.Dimension(batch_size),tf.Dimension(8),tf.Dimension(10)))
encoder_out.set_shape((tf.Dimension(batch_size),tf.Dimension(12),tf.Dimension(10)))

attention1_out, attention1_states = attention1([encoder_out, decoder_out])

I get the error in the last line.

Any heads-up on this issue?
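A guess rather than a diagnosis: the failing line in tensorflow/python/layers/base.py assigns outputs._keras_mask, which only works when call() returns a tensor or a list of tensors, not a plain Python tuple. If your copy of the layer produces a tuple, rewrapping it as a list inside call may get past this. A fragment of what I mean (the names are placeholders, not from this repo):

# inside AttentionLayer.call (sketch)
def call(self, inputs, verbose=False):
    context_vectors, attention_energies = self._compute_attention(inputs)  # hypothetical helper
    return [context_vectors, attention_energies]  # a list, not a tuple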

Masking Support

Hi @thushv89,

Thank you for your code, and for the blog post describing it here.
I wonder whether your implementation supports masking.

In Keras, if mask_zero=True is set in the embedding layer, then proper masking should be supported by the subsequent layers.
Any thoughts?

Best Regards,
Omnia
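Not the author, but from the snippets quoted in other issues here, the layer's K.rnn calls pass no mask, so masks from mask_zero=True are most likely ignored rather than honored. Supporting them would mean opting in explicitly; a sketch of the Keras plumbing involved (the subclass name is mine, not in the repo):

from tensorflow.keras.layers import Layer

class MaskedAttentionLayer(Layer):   # hypothetical subclass
    """Sketch of opting in to Keras mask propagation."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.supports_masking = True

    def compute_mask(self, inputs, mask=None):
        # inputs/mask arrive as [encoder, decoder] lists; the context output
        # is aligned with decoder steps, the energies get no mask
        return [mask[1], None] if mask is not None else None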

state

$a(s_{i-1}, h_j) = v_a^\top \tanh(W_a s_{i-1} + U_a h_j)$

After looking at the code, I think you use the input state as $s_i$; have you perhaps confused $s_i$ and $h_i$?
