philipperemy / keras-attention

Keras Attention Layer (Luong and Bahdanau scores).

License: Apache License 2.0

Python 100.00%

keras keras-neural-networks attention-mechanism attention-model deep-learning

keras-attention's Issues

Attention Visualization

The final visualization of the attention weights is described as showing the attention over input dimensions, but the x axis runs over the time steps. So it shows how important each time step is, not each feature. Shouldn't it be the other way around, with each x being a feature?

When I apply this to my own dataset it just says the most recent time steps are the most important.

get_activations not producing list

Thanks for uploading this to GitHub! It is great for learning more about attention models. When I run attention_dense.py, however, I get this error (after the model finishes training):

IndexError Traceback (most recent call last)
in ()
37 # Attention vector corresponds to the second matrix.
38 # The first one is the Inputs output.
---> 39 attention_vector = get_activations(m, testing_inputs_1, print_shape_only=True)[1].flatten()
40 print('attention =', attention_vector)
41

IndexError: list index out of range

Any idea why the get_activations function isn't working properly?

Visualizing attention weights with input arrays

When predicting on test data with the trained model, how can I visualize the attention weights? I'd like to study which areas the model designates as "important".

For reference, my input data is usually of shape (100, 900, 4) with 3 output classification options.

Thanks!

Output with multiple time steps

Hi,

Can this be used for predicting output with multiple time steps?
If not, how can the code be changed to accommodate this? Thanks.

possible bug in attention_lstm.py

lines 56-59 should be

if APPLY_ATTENTION_BEFORE_LSTM:
  m = model_attention_applied_before_lstm()
else:
  m = model_attention_applied_after_lstm()

Is this attention applicable for use with the encoder/decoder mechanism?

use attention_3d_block in many to many mapping

Hi, I'm a beginner with Keras and am trying to use attention_3d_block in a translation module.
My input is 5 sentences, each padded to 6 words, and each word is represented by a 620-dimensional embedding.
The output is 5 sentences, each padded to 9 words, and each word is encoded 1-of-k over 30 dimensions (the vocabulary size).
How can I use attention_3d_block in this scenario, given that the LSTM is many-to-many?

get_config

Hi,
Do you perhaps have another implementation with a get_config function for saving the model in Keras? I have been trying, but I always get this error:
raise ValueError('A Dot layer should be called '

ValueError: A Dot layer should be called on a list of 2 inputs.

Thanks!

visualizing soft attention

How can we visualize the soft attention similar to the Bengio et al. paper?

TypeError: 'module' object is not callable

The error happens at this line:

output_attention_mul = merge([inputs, a_probs], name='attention_mul', mode='mul')
TypeError: 'module' object is not callable

I'm not sure what is wrong. Could you help me resolve it?

attention when using more than one feature

Hi Philip
Your attention example has 1 feature, with shape (2000, 20, 1). My dataset has 60 features, with shape (200, 1000, 60). In that case, do I have to do something different from what you do in your example?

Thank you!

What is the logic behind the attention layer?

I would like to understand, intuitively or theoretically, how the attention layer reflects what the model attends to when making a prediction.
Because it is easy for the model to give equal weight for each input feature in the attention layer, and that defeats the purpose of the attention layer.
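The concern above can be made concrete with a small NumPy sketch (not the repo's code; the `softmax` and `attend` helpers and the score values are made up for illustration). When every time step gets the same score, the attention weights are uniform and the weighted sum is just the mean over time, so attention only adds information when training pushes the scores apart.

```python
import numpy as np

def softmax(scores, axis=-1):
    # Numerically stable softmax along the given axis.
    shifted = scores - scores.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def attend(hidden_states, scores):
    # hidden_states: (time_steps, hidden_size); scores: (time_steps,)
    weights = softmax(scores)
    return weights, weights @ hidden_states

rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(5, 3))

# Equal scores give uniform weights: the context is the plain mean over time.
uniform_weights, uniform_context = attend(hidden_states, np.zeros(5))

# Unequal scores (hypothetical values) concentrate the weights on one time step.
peaked_weights, peaked_context = attend(hidden_states, np.array([0.0, 0.0, 4.0, 0.0, 0.0]))
```

Whether a trained model ends up with peaked or near-uniform weights depends on the data and the loss; the sketch only shows what each case computes.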

One to One keras model with Attention in Keras

Hello,

I have a Keras model that takes a sequence of inputs and produces a sequence of outputs, where each input has an associated output (label), for example part-of-speech (POS) tagging.

Seq_in[0][0:3]
array([[15],[28], [23]])

Seq_out[0][0:3]
array([[0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.]],
dtype=float32)

I want to build attention on top of the LSTM layer. I am following the paper "Attention-Based Bidirectional Long Short-Term Memory Networks for Relation Classification" (Zhou et al., 2016).

X_train, X_val, Y_train, Y_val = train_test_split(Seq_in,Seq_out, test_size=0.20)

TIME_STEPS = 500
INPUT_DIM = 1
lstm_units = 256

inputs = Input(shape=(TIME_STEPS, INPUT_DIM))

activations = Bidirectional(LSTM(lstm_units, return_sequences=True))(inputs) # First bidirectional layer
activations = Dropout(0.2)(activations)
activations = Bidirectional(LSTM(lstm_units, return_sequences=True))(activations) # Second bidirectional layer
activations = Dropout(0.2)(activations)
attention = Dense(1,activation='tanh')(activations) # This is equation (9) in the paper. Squashes each output state vector to a scalar.
attention = Flatten()(attention)
attention = Activation('softmax')(attention) # This is equation (10) in the paper.
attention = RepeatVector(512)(attention) # Repeat the softmax vector to match the dimension of the output state vector (512)
attention = Permute([2,1])(attention) # Permute back to (time steps, 512)

sent_representation = multiply([activations,attention]) # Multiply the output state vectors by the attention vector element-wise.
sent_representation = Lambda(lambda xin: K.sum(xin, axis=-1))(sent_representation) # Summation of all output state vectors
sent_representation = RepeatVector(TIME_STEPS)(sent_representation) # Repeat the vector to match the number of time steps
sent_representation = concatenate([activations,sent_representation]) # Concatenate the sentence representation to the output states

output = Dense(15, activation='softmax')(sent_representation)#(out_attention_mul) # Softmax over the labels for each time step
model = Model(inputs=inputs, outputs=output)

sgd = optimizers.SGD(lr=.1,momentum=0.9,decay=1e-3,nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
model.fit(X_train,Y_train,epochs=2, validation_data=(X_val, Y_val),verbose=1)

Layer (type)                      Output Shape        Param #   Connected to
input_1 (InputLayer)              (None, 500, 1)      0
bidirectional_1 (Bidirectional)   (None, 500, 512)    528384    input_1[0][0]
dropout_1 (Dropout)               (None, 500, 512)    0         bidirectional_1[0][0]
bidirectional_2 (Bidirectional)   (None, 500, 512)    1574912   dropout_1[0][0]
dropout_2 (Dropout)               (None, 500, 512)    0         bidirectional_2[0][0]
dense_1 (Dense)                   (None, 500, 1)      513       dropout_2[0][0]
flatten_1 (Flatten)               (None, 500)         0         dense_1[0][0]
activation_1 (Activation)         (None, 500)         0         flatten_1[0][0]
repeat_vector_1 (RepeatVector)    (None, 512, 500)    0         activation_1[0][0]
permute_1 (Permute)               (None, 500, 512)    0         repeat_vector_1[0][0]
multiply_1 (Multiply)             (None, 500, 512)    0         dropout_2[0][0]
                                                                permute_1[0][0]
lambda_1 (Lambda)                 (None, 500)         0         multiply_1[0][0]
repeat_vector_2 (RepeatVector)    (None, 500, 500)    0         lambda_1[0][0]
concatenate_1 (Concatenate)       (None, 500, 1012)   0         dropout_2[0][0]
                                                                repeat_vector_2[0][0]
dense_2 (Dense)                   (None, 500, 15)     15195     concatenate_1[0][0]

Total params: 2,119,004
Trainable params: 2,119,004
Non-trainable params: 0

I think this code does what the paper does, except that the concatenate step appends the same attention-weighted representation to every output state vector, so it does not change from one time step (and therefore one output label) to the next.
So I think I have to do something so that the attention weights differ for each time step's output. Am I right?
Any help is appreciated.

Thanks in advance
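One detail in the posted code can be checked with NumPy alone. The summary lists lambda_1 with output shape (None, 500), which means K.sum(xin, axis=-1) sums over the 512 state dimensions and leaves one number per time step. If the intent is a weighted sum of the output state vectors over time (an assumption about what was meant, not a statement about the paper's exact code), the reduction would run over axis 1 instead. A minimal sketch with random stand-in data:

```python
import numpy as np

batch, time_steps, hidden = 2, 500, 512
rng = np.random.default_rng(0)

# Stand-ins for the BiLSTM outputs and the softmax attention weights.
activations = rng.normal(size=(batch, time_steps, hidden))
alpha = rng.random(size=(batch, time_steps))
alpha /= alpha.sum(axis=1, keepdims=True)

# Equivalent to multiply([activations, attention]) after RepeatVector + Permute.
weighted = activations * alpha[:, :, None]

# What K.sum(xin, axis=-1) computes: one scalar per time step.
sum_over_features = weighted.sum(axis=-1)   # shape (batch, time_steps)

# A weighted sum of state vectors over time: one vector per sequence.
sum_over_time = weighted.sum(axis=1)        # shape (batch, hidden)
```

The second form gives a single 512-dimensional vector per sequence, which is why it is the same for every time step unless the weights themselves are computed per output position.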

Loading model problems

When I try to load a saved model, I get the following error: "A Dot layer should be called on a list of 2 inputs".

How to implement Multi-Hop Attention using Keras?

Multi-hop attention was proposed by Facebook.

why Permute before attention dense layer in attention_3d_block?

    a = Permute((2, 1))(inputs)
    a = Dense(TIME_STEPS, activation='softmax')(a)

In these lines, why do you permute time_steps and input_dim?
What if I don't permute, and follow the input with a dense layer of size input_dim instead? The dense layer currently has a weight shape of "time_steps * time_steps"; what is the difference if I change it to "input_dim * input_dim"?
Dense(input_dim, activation='softmax')(a)
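The difference between the two variants can be sketched in NumPy, using the fact that a Dense layer acts on the last axis of its input. This is an illustration, not the repo's code: `dense_softmax` is a hypothetical stand-in for Dense(units, activation='softmax') with the bias left out, and the kernels are random.

```python
import numpy as np

def softmax(x, axis=-1):
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def dense_softmax(x, kernel):
    # Stand-in for Dense(..., activation='softmax') without bias:
    # the kernel multiplies the last axis, and the softmax runs over it too.
    return softmax(x @ kernel, axis=-1)

batch, time_steps, input_dim = 4, 20, 3
rng = np.random.default_rng(0)
inputs = rng.normal(size=(batch, time_steps, input_dim))

# With Permute((2, 1)): the last axis is time, the kernel is (time_steps, time_steps).
permuted = np.transpose(inputs, (0, 2, 1))
a = dense_softmax(permuted, rng.normal(size=(time_steps, time_steps)))
a_probs = np.transpose(a, (0, 2, 1))      # back to (batch, time_steps, input_dim)

# Without the permute: the last axis is the features, the kernel is (input_dim, input_dim).
b = dense_softmax(inputs, rng.normal(size=(input_dim, input_dim)))
```

Here `a_probs` sums to 1 along the time axis for each feature, so it weights time steps against each other, while `b` sums to 1 along the feature axis at each time step, so it weights features against each other.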

Is this Reshape step redundant?

See this line of code: https://github.com/philipperemy/keras-attention-mechanism/blob/master/attention_lstm.py#L19

Isn't this redundant? The Permute layer right before it already reshapes the tensor.

Let me know if I'm missing something. I am trying to understand attention, and thus far your writeup is helping.

2D LSTM attention

Can we use the same code for 2D LSTM attention ?

SINGLE_ATTENTION_VECTOR = False

Do you have a reference paper for SINGLE_ATTENTION_VECTOR = False?

As far as I know, most papers use the equivalent of SINGLE_ATTENTION_VECTOR = True.
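For readers unfamiliar with the flag, the following NumPy sketch mirrors what the attention_3d_block code quoted elsewhere on this page does with it: K.mean over axis 1, then RepeatVector(input_dim), then Permute((2, 1)). The function name `attention_probs` and the random input are illustrative only.

```python
import numpy as np

batch, input_dim, time_steps = 2, 3, 5
rng = np.random.default_rng(0)

# Stand-in for the Dense softmax output: one distribution over time per feature.
a = rng.random(size=(batch, input_dim, time_steps))
a /= a.sum(axis=-1, keepdims=True)

def attention_probs(a, single_attention_vector):
    if single_attention_vector:
        mean = a.mean(axis=1)                                  # like K.mean(x, axis=1)
        a = np.repeat(mean[:, None, :], a.shape[1], axis=1)    # like RepeatVector(input_dim)
    return np.transpose(a, (0, 2, 1))                          # like Permute((2, 1))

shared = attention_probs(a, True)        # every feature uses the same weights over time
per_feature = attention_probs(a, False)  # each feature keeps its own weights over time
```

With the flag on, the per-feature distributions are averaged into one attention vector over time; with it off, each input dimension gets its own.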

Where is the dense attention implementation?

Your code is outdated!

Your code doesn't work with newer versions of Keras.
To fix it, make these changes in "attention_dense.py":

1. Replace "from keras.layers import Input, Dense, merge" with "from keras.layers import Input, Dense, multiply".
2. Replace "attention_mul = merge([inputs, attention_probs], output_shape=32, name='attention_mul', mode='mul')" with "attention_mul = multiply([inputs, attention_probs], name='attention_mul')".

And in "attention_lstm.py":

1. Import multiply as well.
2. Replace "output_attention_mul = merge([inputs, a_probs], name='attention_mul', mode='mul')" with "output_attention_mul = multiply([inputs, a_probs], name='attention_mul')".

Restricting attention weights to domain

In my application, the attention weights are centering on locations which are indicative of a subset of the classes. Therefore, while the algorithm performs well on this subset, it sometimes misclassifies on the other classes because the attention weights cause the obvious differences to be considered "residual".

Is there a documented way of restricting the attention weights to a certain value or index domain, to enforce constraints on where the attention focuses? This question makes me think of NLP problems where frameworks commonly pair ML methods with a set of predetermined rules (usually defined with spaCy).

Any thoughts? Thanks in advance.
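One common way to confine attention to an index domain is to mask the scores before the softmax. The sketch below is a generic NumPy illustration of that idea, not a documented feature of this repo; `masked_softmax`, the scores, and the `allowed` mask are all hypothetical, and every row is assumed to have at least one allowed position.

```python
import numpy as np

def masked_softmax(scores, allowed):
    # scores: (batch, time_steps); allowed: boolean mask broadcastable to scores.
    # Disallowed positions get a score of -inf, so their weight is exactly 0.
    masked = np.where(allowed, scores, -np.inf)
    shifted = masked - masked.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

scores = np.array([[2.0, 1.0, 0.0, 3.0]])
allowed = np.array([False, True, True, False])   # only indices 1 and 2 may receive attention
weights = masked_softmax(scores, allowed)
```

The remaining weights still sum to 1, so the model redistributes its attention inside the allowed region rather than losing mass.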

what is the meaning of the second parameter in dot([], [1, 1], name='context_vector')

Hi, Thanks for your awesome work.
I am confused about this code: context_vector = dot([hidden_states, attention_weights], [1, 1], name='context_vector')
What is the meaning of the second parameter?
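In Keras, the second argument of dot is axes: the axis of the first input and the axis of the second input along which the product is taken, counted with the batch axis as axis 0. Assuming hidden_states has shape (batch, time_steps, hidden_size) and attention_weights has shape (batch, time_steps), [1, 1] contracts the time axis of both. A NumPy sketch of the same computation, with random stand-in tensors:

```python
import numpy as np

batch, time_steps, hidden_size = 2, 4, 3
rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(batch, time_steps, hidden_size))
attention_weights = rng.random(size=(batch, time_steps))

# Contract axis 1 (time) of both tensors, separately for each sample in the batch.
context_vector = np.einsum('bth,bt->bh', hidden_states, attention_weights)

# The same thing written as an explicit weighted sum over time steps.
explicit = np.zeros((batch, hidden_size))
for b in range(batch):
    for t in range(time_steps):
        explicit[b] += attention_weights[b, t] * hidden_states[b, t]
```

The result has shape (batch, hidden_size): one context vector per sample, built as the attention-weighted sum of that sample's hidden states.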

Add guidance to README to use Functional API for saving models that use this layer

Hi there!

Thanks so much for implementing this and all of the other work that you do!

I ran into an issue when loading a model that uses the Attention layer inside a Sequential model. The Attention layer is defined using the Functional API, and Keras does not like it when you try to load a mixed model.

Specifically, my error was

m = keras.models.load_model('saved_mixed_model_path',
            custom_objects = { 'Attention': Attention}
           )

=> ValueError: A merge layer should be called on a list of inputs.

To solve this, I had to convert my model to one that uses the functional API and retrain.

Part of my confusion stems from the examples where both the Sequential and Functional APIs are used. In this example you successfully save and load a model using only Functional API. But in this lstm example the Sequential API is used and no loading/saving is done.

Could a caveat be added to the README.md saying that if you plan to load/save these models, only the Functional API should be used when building the model that uses the Attention layer?

Cheers

weird attention weights when adding sequence of numbers.

I am trying to slightly modify your number-adding example so that the target is the sum of all the numbers in the sequence before the delimiter. Below is the modified code:

def add_numbers_before_delimiter(n: int, seq_length: int, delimiter: float = 0.0,
                                         index_1: int = None) -> (np.array, np.array):
    """
    Task: Add all the numbers that come before the delimiter.
    x = [1, 2, 3, 0, 4, 5, 6, 7, 8, 9]. Result is y =  6.
    @param n: number of samples in (x, y).
    @param seq_length: length of the sequence of x.
    @param delimiter: value of the delimiter. Default is 0.0
    @param index_1: index of the number that comes after the first 0.
    @return: returns two numpy.array x and y of shape (n, seq_length, 1) and (n, 1).
    """
    x = np.random.uniform(0, 1, (n, seq_length))
    y = np.zeros(shape=(n, 1))
    for i in range(len(x)):
        if index_1 is None:
            a = np.random.choice(range(1, len(x[i])), size=1, replace=False)
        else:
            a = index_1
        y[i] =  np.sum(x[i, 0:a])
        x[i, a] = delimiter

    x = np.expand_dims(x, axis=-1)
    return x, y


def main():
    numpy.random.seed(7)

    # data. definition of the problem.
    seq_length = 20
    x_train, y_train = add_numbers_before_delimiter(20_000, seq_length)
    x_val, y_val = add_numbers_before_delimiter(4_000, seq_length)

    # just arbitrary values. it's for visual purposes. easy to see than random values.
    test_index_1 = 4
    x_test, _ = add_numbers_before_delimiter(10, seq_length, 0, test_index_1)
    # x_test_mask is just a mask that, if applied to x_test, would still contain the information to solve the problem.
    # we expect the attention map to look like this mask.
    x_test_mask = np.zeros_like(x_test[..., 0])
    x_test_mask[:, test_index_1:test_index_1 + 1] = 1

    model = Sequential([
        LSTM(100, input_shape=(seq_length, 1), return_sequences=True),
        SelfAttention(name='attention_weight'),
        Dropout(0.2),
        Dense(1, activation='linear')
    ])

    model.compile(loss='mse', optimizer='adam')
    print(model.summary())

    output_dir = 'task_add_two_numbers'
    if not os.path.exists(output_dir):
        os.makedirs(output_dir)

    max_epoch = int(sys.argv[1]) if len(sys.argv) > 1 else 200

    class VisualiseAttentionMap(Callback):

        def on_epoch_end(self, epoch, logs=None):
            attention_map = get_activations(model, x_test, layer_names='attention_weight')['attention_weight']

            # top is attention map.
            # bottom is ground truth.
            plt.imshow(np.concatenate([attention_map, x_test_mask]), cmap='hot')

            iteration_no = str(epoch).zfill(3)
            plt.axis('off')
            plt.title(f'Iteration {iteration_no} / {max_epoch}')
            plt.savefig(f'{output_dir}/epoch_{iteration_no}.png')
            plt.close()
            plt.clf()

    model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=max_epoch,
              batch_size=64, callbacks=[VisualiseAttentionMap()])


if __name__ == '__main__':
    main()

I was expecting the model to focus on all values in the x_test sequence before index 4. However, as you can see in the gif, the model focuses on just one point. Could you please explain where I am going wrong?

Thanks in advance.
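One thing worth checking in the posted script is the reference mask drawn under the attention map. As written, x_test_mask marks only the delimiter position, while the stated expectation is attention on every position before it. The NumPy sketch below builds both masks for comparison; `x_test_mask_prefix` is a hypothetical name, and this does not by itself explain why the learned weights concentrate on one point.

```python
import numpy as np

seq_length, test_index_1, n_test = 20, 4, 10

# The mask from the posted code: only the delimiter position is set.
x_test_mask_posted = np.zeros((n_test, seq_length))
x_test_mask_posted[:, test_index_1:test_index_1 + 1] = 1

# A mask matching the stated expectation: every position before the delimiter.
x_test_mask_prefix = np.zeros((n_test, seq_length))
x_test_mask_prefix[:, :test_index_1] = 1

posted_positions = np.flatnonzero(x_test_mask_posted[0]).tolist()
prefix_positions = np.flatnonzero(x_test_mask_prefix[0]).tolist()
```

Plotting the attention map against the prefix mask instead would make the comparison match the task being trained.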

Attention Mechanism not working

Hi,
I have added an attention layer (following the example) to my simple LSTM network shown below.

timestep = timesteps
features = 11
model = Sequential()
model.add(LSTM(64, input_shape=(timestep,features), return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(32, return_sequences=True))
model.add(LSTM(16, return_sequences=True))
model.add(Attention(32))
model.add(Dense(32))
model.add(Dense(16))
model.add(Dense(1))
print(model.summary())

The code worked fine until last week, and the model summary included the attention layer details, like this:

However, now running the same code gives me a weird error.
ValueError: tf.function-decorated function tried to create variables on non-first call.

What I noticed is that the model summary has changed too:

I am short on time due to an upcoming deadline. Any assistance would be highly appreciated.
P.S. This was a fully working model that stopped working all of a sudden for no apparent reason.

Is this a CNN version of attention?

Is this a Keras implementation of a CNN with attention?

pip install and numpy, keras packages are forced to be uninstalled

Hi,

When I install keras-attention-mechanism into my conda3 environment with pip, the essential packages numpy and keras are unexpectedly uninstalled. Do you know why?

Best,
Peiwan

IndexError: list index out of range

Dear sir, when I run python attention_dense.py, the following error shows:

----- activations -----
Traceback (most recent call last):
File "attention_dense.py", line 39, in <module>
attention_vector = get_activations(m, testing_inputs_1, print_shape_only=True)[1].flatten()
IndexError: list index out of range

Would you please help me? Thank you very much!

Many to many sequence generation

Can you give an example of how to use this for many to many sequence generation with different input and output lengths (greater than 1)? For example, if we have input of 10 timesteps say [1,2,3,4,5,6,7,8,9,10] and we want to generate output [1,10].

attention_lstm.py and Tensorflow

In the attention_3d_block, I have some questions/bug (I think). I am running on Tensorflow.
(1) inputs doesn't have a shape method. So it crashes. I assume you meant to call the shape function on the numpy array on inputs_1.
(2) Is there a reason for calling Permute?
(3) What is the Reshape layer supposed to do? After the call to Permute, isn't the output of the previous permute layer already in shape (Batch Size, input_dim, TIME_STEPS)?
(4) The next call to Dense expects ndim =2, not 3. So the code crashes for me. I assume you meant the previous Reshape layer to map the 3d input to 2d?
(5) I would just like to point out that APPLY_ATTENTION_BEFORE_LSTM is False iff you call model_attention_applied_before_lstm.

Interpreting attention weights for more than one input features.

How can we get attention weights for each input feature when our input consists of multiple inputs?
I am getting only one array of attention weights and I am not sure how to interpret it for multiple inputs.

shape of attention weights (attached as fig) is:
(300, 6)
where 6 is the sequence_length/lookback steps/time steps.
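A shape of (300, 6) reads as (samples, time steps): one weight per time step, with nothing along a feature axis. The NumPy sketch below shows how such weights act on a (samples, time steps, features) tensor and one simple way to summarize them; the feature count of 4 and the random data are made up for illustration.

```python
import numpy as np

n_samples, time_steps, n_features = 300, 6, 4   # n_features is a hypothetical value
rng = np.random.default_rng(0)

hidden = rng.normal(size=(n_samples, time_steps, n_features))
attention_weights = rng.random(size=(n_samples, time_steps))
attention_weights /= attention_weights.sum(axis=1, keepdims=True)

# Each time step's weight is broadcast over all features at that step.
weighted = hidden * attention_weights[:, :, None]

# Average importance of each lookback step across the 300 samples.
mean_importance = attention_weights.mean(axis=0)
```

Under this reading the weights say which lookback steps matter, not which input features; a per-feature view would need weights with a feature axis.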

How to visualise as 2dimensional heatmap?

Let's say we are predicting with 24 time steps and get 24 results as output. How can we visualise this as a heatmap, like in https://github.com/datalogue/keras-attention

Questions on implementation details

Update on 2019/2/14, nearly one year later:

The implementation in this repo is definitely bugged. Please refer to my implementation in a reply below for correction. My version has been working in our product since this thread and it outperforms both vanilla LSTM without attention and the incorrect version in this repo by a significant margin. I am not the only one raising the question 1.

Both this repo and my version of attention are intended for sequence-to-one networks (although it can be easily tweaked for seq2seq by replacing h_t with current state of the decoder step). If you are looking for a ready-to-use attention for sequence-to-sequence networks, check this out: https://github.com/farizrahman4u/seq2seq.

============Original answer==============

I am currently working on a text generation task and learnt attention from the TensorFlow tutorials. The implementation details seem quite different from your code.

This is how TensorFlow tutorial describes the process:

If I am understanding it correctly, all learnable parameters in the attention mechanism are stored in , which has a shape of (rnn_size, rnn_size) (rnn_size is the size of hidden state). So first you need to use to calculate the score of each hidden state based on the value of the hidden state and , but I am not seeing anywhere in your code. Instead, you applied a dense layer on all . And that means (Edit: h_t should be h_s in this equation) becomes the in the paper. This seems wrong.

In the next step you multiply the attention weights element-wise with the hidden states, as in equation (2). But equation (3) is somehow missing.

I noticed the tutorial is about Seq2Seq (Encoder-Decoder) model and your code is an RNN. Maybe that is why your code is different. Do you have any source on how attention is applied to a non Seq2Seq network?

Here is your code:

def attention_3d_block(inputs):
    # inputs.shape = (batch_size, time_steps, input_dim)
    input_dim = int(inputs.shape[2])
    a = Permute((2, 1))(inputs)
    a = Reshape((input_dim, TIME_STEPS))(a) # this line is not useful. It's just to know which dimension is what.
    a = Dense(TIME_STEPS, activation='softmax')(a)
    if SINGLE_ATTENTION_VECTOR:
        a = Lambda(lambda x: K.mean(x, axis=1), name='dim_reduction')(a)
        a = RepeatVector(input_dim)(a)
    a_probs = Permute((2, 1), name='attention_vec')(a)
    output_attention_mul = merge([inputs, a_probs], name='attention_mul', mode='mul')
    return output_attention_mul


def model_attention_applied_after_lstm():
    inputs = Input(shape=(TIME_STEPS, INPUT_DIM,))
    lstm_units = 32
    lstm_out = LSTM(lstm_units, return_sequences=True)(inputs)
    attention_mul = attention_3d_block(lstm_out)
    attention_mul = Flatten()(attention_mul)
    output = Dense(1, activation='sigmoid')(attention_mul)
    model = Model(input=[inputs], output=output)
    return model
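For comparison, here is a NumPy sketch of Luong-style multiplicative attention for a sequence-to-one network. It assumes the equations referred to above are the usual ones: (1) attention weights as a softmax over score(h_t, h_s) = h_t^T W h_s, (2) a context vector as the weighted sum of the h_s, and (3) an attention vector tanh(W_c [c_t; h_t]). The names `W` and `W_c`, the use of the last hidden state as h_t, and the random values are assumptions for illustration; this is neither the repo's code nor the corrected implementation mentioned in this thread.

```python
import numpy as np

def softmax(scores):
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()

def luong_attention(hidden_states, W, W_c):
    # hidden_states: (time_steps, rnn_size); W: (rnn_size, rnn_size);
    # W_c: (rnn_size, 2 * rnn_size).
    h_t = hidden_states[-1]                       # query: last hidden state (sequence-to-one)
    scores = hidden_states @ (W.T @ h_t)          # h_t^T W h_s for every source step s
    alpha = softmax(scores)                       # (1) attention weights over time
    context = alpha @ hidden_states               # (2) context vector
    attention_vector = np.tanh(W_c @ np.concatenate([context, h_t]))   # (3)
    return alpha, context, attention_vector

time_steps, rnn_size = 6, 4
rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(time_steps, rnn_size))
W = rng.normal(size=(rnn_size, rnn_size))
W_c = rng.normal(size=(rnn_size, 2 * rnn_size))

alpha, context, attention_vector = luong_attention(hidden_states, W, W_c)
```

For a seq2seq model, h_t would be the current decoder state instead of the last encoder state, as the update at the top of this issue notes.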

papers using dense attention mechanism

Hello,

Is the dense attention mechanism based on a particular paper?
Or are there papers using this mechanism?

fig

Hi, I am wondering about the figures in your markdown.
What app did you use to create these beautiful hand-drawn figures?
Thanks

what do the h_t mean in the Attention model?

Hi there!
Thanks so much for implementing this and all of the other work that you do!
I want to know the meaning of h_t, i.e. h_t = Lambda(lambda x: x[:, -1, :], output_shape=(hidden_size,), name='last_hidden_state')(hidden_states). In Luong's paper, h_t was used as the input hidden state. But how should it be interpreted in a setting that is not seq2seq?

bucketing problem

My sequences have varying lengths and I’m using bucketing to solve the issue. Therefore I define the LSTM input shape as (None, None, features), i.e. there are no explicit timesteps. I wonder if the code can fit my input? Thanks.

Hidden state parameter, what really should be passed?

Hi, thanks for the implementation!
I have been trying to implement this code:

model = Sequential()
model.add(Embedding(300000, 100, input_length=250))
model.add(LSTM(units=250, return_sequences=True, dropout=0.1, recurrent_dropout=0.2))
model.add(attention_3d_block( ))
model.add(Flatten())
model.add(Dense(200, activation='relu'))
model.add(Dense(3, activation='softmax'))

Error TypeError: attention_3d_block() missing 1 required positional argument: 'hidden_states'
I tried to explore the given documentation but I couldn't understand what really should be passed there.

get_activations use multi-input data, does not work.

Here is the error message

    layer_name='attention_vec')[0], axis=2).squeeze()
  File "/Users/yu/proj/cancel_blame/code/src/lib/attention/attention_utils.py", line 16, in get_activations
    layer_outputs = [func([inputs, 1.])[0] for func in funcs]
  File "/Users/yu/proj/cancel_blame/code/src/lib/attention/attention_utils.py", line 16, in <listcomp>
    layer_outputs = [func([inputs, 1.])[0] for func in funcs]
  File "/anaconda3/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2666, in __call__
    return self._call(inputs)
  File "/anaconda3/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2619, in _call
    dtype=tf.as_dtype(tensor.dtype).as_numpy_dtype))
AttributeError: 'list' object has no attribute 'dtype'

some confusions.

Hello, thanks for code that is easy to read. But I am confused about a few things.

Your attention function takes the hidden states of the input, i.e. the LSTM outputs from the encoder, and does all of its processing on them. But according to what I have read, it must form some kind of function with the hidden state of the target, as in the given picture. Why haven't you done that? Otherwise you are just building an LSTM function manually.
Why have you used Permute layers before the softmax layer?
Why have you averaged the outputs of the softmax layer?

2D attention

@philipperemy

Do you know how I can apply the attention module to a 2D-shaped input? I would like to apply attention after the LSTM layer:

Layer (type)                    Output Shape         Param #     Connected to                     
features (InputLayer)           (None, 16, 1816)     0                                            
__________________________________________________________________________________________________
lstm_1 (LSTM)                   (None, 2048)         31662080    features[0][0]                   
__________________________________________________________________________________________________
dense_2 (Dense)                 (None, 1024)         2098176     lstm_1[0][0]                     
__________________________________________________________________________________________________
leaky_re_lu_2 (LeakyReLU)       (None, 1024)         0           dense_2[0][0]                    
__________________________________________________________________________________________________
dense_3 (Dense)                 (None, 120)          123000      leaky_re_lu_2[0][0]              
__________________________________________________________________________________________________
feature_weights (InputLayer)    (None, 120)          0                                            
__________________________________________________________________________________________________
multiply_1 (Multiply)           (None, 120)          0           dense_3[0][0]                    
                                                                 feature_weights[0][0]            

Total params: 33,883,256
Trainable params: 33,883,256
Non-trainable params: 0
__________________________________________________________________________________________________

I would really appreciate your suggestions on how to modify attention_3d_block to make it work for a 2D input as well. Thanks.

Using attention with multivariate timeseries data

Hey, I am trying to use attention with time series data that has more than 1 feature, and this leads to an incompatible shapes error. What changes do I need to make to get it to work?

Attention not working for MLP

I need to add attention to the following model. It works perfectly for an LSTM model, but here I get the error below:

def get_ANN_attention_model(num_hidden_layers, num_neurons_per_layer, dropout_rate, activation_func, train_X):
    with tf.device('/gpu:0'):
        model_input = tf.keras.Input(shape=(train_X.shape[1]))  # input layer.
        for i in range(num_hidden_layers):
            x = layers.Dense(num_neurons_per_layer,activation=activation_func,bias_regularizer=L1L2(l1=0.0, l2=0.0001),activity_regularizer=L1L2(1e-5,1e-4))(model_input)
            x = layers.Dropout(dropout_rate)(x)
            x = Attention(num_hidden_layers)(x)
        outputs = layers.Dense(1, activation='linear')(x)
        model = tf.keras.Model(inputs=model_input, outputs=outputs)
        model.summary()
    return model

ERROR
hidden_size = int(hidden_states.shape[2])
File "C:\Users\bhask\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\framework\tensor_shape.py", line 896, in __getitem__
return self._dims[key].value
IndexError: list index out of range
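The traceback points at hidden_states.shape[2], which only exists for a 3D tensor of shape (batch, time_steps, hidden_size). The output of a Dense layer applied to a 2D input is (batch, units), so there is no axis 2. The NumPy sketch below reproduces the shape problem and shows two ways of adding the missing axis; `hidden_size_of` is a hypothetical helper mirroring the failing line, and which layout makes sense is a modelling choice rather than something the repo prescribes.

```python
import numpy as np

def hidden_size_of(hidden_states):
    # Mirrors the failing line: hidden_size = int(hidden_states.shape[2])
    return int(hidden_states.shape[2])

dense_output = np.zeros((32, 64))   # (batch, units), as produced by Dense on a 2D input

try:
    hidden_size_of(dense_output)
    error = None
except IndexError as exc:           # a 2D shape has no index 2
    error = exc

# Option 1: treat the whole vector as a single time step (attention becomes trivial).
as_single_step = dense_output[:, None, :]     # (batch, 1, units)

# Option 2: treat each unit as a "time step" carrying one feature.
as_feature_steps = dense_output[:, :, None]   # (batch, units, 1)
```

In Keras the same reshaping could be done with a Reshape layer before the attention layer, but with a single time step there is nothing for attention to choose between.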

A question about your code

In your code, you want the model to pay more attention to the 10th step, and your experimental results also show this.
But your code does not seem to focus on the 10th step. Please read the following code:

score_first_part = Dense(hidden_size, use_bias=False, name='attention_score_vec')(hidden_states)
# score_first_part dot last_hidden_state => attention_weights
# (batch_size, time_steps, hidden_size) dot (batch_size, hidden_size) => (batch_size, time_steps)
h_t = Lambda(lambda x: x[:, -1, :], output_shape=(hidden_size,), name='last_hidden_state')(hidden_states)
score = dot([score_first_part, h_t], [2, 1], name='attention_score')

You calculate 'score' as score_first_part dot h_t.
You get h_t with h_t = Lambda(lambda x: x[:, -1, :], output_shape=(hidden_size,), name='last_hidden_state'). In my view, 'lambda x: x[:, -1, :]' means you choose the last step in the time sequence; in other words, you pay more attention to the 20th step (your code defines TIME_STEPS = 20).
So, if my understanding is right, you should change your code to h_t = Lambda(lambda x: x[:, 9, :], output_shape=(hidden_size,), name='last_hidden_state').
Of course, my understanding may be wrong. I am looking forward to your reply.
Thank you.

Thanks and regards.
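A small NumPy sketch may help separate the two roles in the quoted code: h_t is the query that every time step is scored against, and the resulting weights are spread over all time steps. Which step receives the most weight depends on the hidden states, not on which index the query was taken from. The values below are contrived so that the query is the last step while the largest weight lands on index 9; the identity matrix stands in for the attention_score_vec kernel and is not a trained weight.

```python
import numpy as np

def softmax(scores):
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()

time_steps, hidden_size = 20, 4
hidden_states = np.full((time_steps, hidden_size), 0.1)
hidden_states[-1] = [1.0, 0.0, 0.0, 0.0]   # last step, used as the query h_t
hidden_states[9] = [5.0, 0.0, 0.0, 0.0]    # 10th step, strongly aligned with the query

W = np.eye(hidden_size)                    # hypothetical stand-in for the Dense kernel
score_first_part = hidden_states @ W       # (time_steps, hidden_size)
h_t = hidden_states[-1]                    # like lambda x: x[:, -1, :]
score = score_first_part @ h_t             # (time_steps,): one score per time step
weights = softmax(score)
```

So taking h_t from the last step does not by itself mean the model attends to the last step; it means the last state decides what to look for across the sequence.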

How to do Stacked LSTM with attention using this framework ?

Hello,

I have run your code successfully.

I have also included a stacked LSTM in your code:

def model_attention_applied_before_lstm():
    inputs = Input(shape=(TIME_STEPS, INPUT_DIM,))
    attention_mul = attention_3d_block(inputs)
    lstm_units = 32
    attention_mul = LSTM(lstm_units, return_sequences=True)(attention_mul)
    attention_mul = LSTM(lstm_units, return_sequences=False)(attention_mul)
    output = Dense(1, activation='sigmoid')(attention_mul)
    model = Model(input=[inputs], output=output)
    return model

But maybe this is not the correct way to apply a stacked LSTM with attention?

My ultimate goal is to include attention in this code (classification of multivariate time series):


class LSTMNet:
    @staticmethod
    def build(timeSteps,variables,classes):
        inputNet = Input(shape=(timeSteps,variables))
        lstm=Bidirectional(GRU(100,recurrent_dropout=0.4,dropout=0.4,return_sequences=True),merge_mode='concat')(inputNet)
        lstm=Bidirectional(GRU(50,recurrent_dropout=0.4,dropout=0.4,return_sequences=True),merge_mode='concat')(lstm)
        lstm=Bidirectional(GRU(20,recurrent_dropout=0.4,dropout=0.4,return_sequences=False),merge_mode='concat')(lstm)
        # a softmax classifier
        classificationLayer=Dense(classes,activation='softmax')(lstm)
        model=Model(inputNet,classificationLayer)
        return model

Thanks in advance for any possible info

Why add a Dense(64) layer after the attention layer?

What's the point of adding another attention_mul = Dense(units=64)(attention_mul)?
