philipperemy / keras-attention
Keras Attention Layer (Luong and Bahdanau scores).
License: Apache License 2.0
The final visualization of the attention weights says it shows the attention over the input dimensions, but the x axis spans the time steps. So it shows how important each time step is, not each feature. Shouldn't it be the other way around, with one x position per feature?
When I apply this to my own dataset, it just says the most recent time steps are the most important.
Thanks for uploading this to GitHub! It's great for learning more about attention models. However, when I run attention_dense.py, I get this error after the model finishes training:
IndexError                               Traceback (most recent call last)
in ()
     37 # Attention vector corresponds to the second matrix.
     38 # The first one is the Inputs output.
---> 39 attention_vector = get_activations(m, testing_inputs_1, print_shape_only=True)[1].flatten()
     40 print('attention =', attention_vector)
     41
IndexError: list index out of range
Any idea why the get_activations function isn't working properly?
When predicting on test data with the trained model, how can I visualize the attention weights? I'd like to study which areas the model treats as "important".
For reference, my input data usually has shape (100, 900, 4), with 3 output classes.
Thanks!
Hi,
Can this be used to predict outputs with multiple time steps?
If not, how could the code be changed to accommodate this? Thanks.
Lines 56-59 should be:
if APPLY_ATTENTION_BEFORE_LSTM:
    m = model_attention_applied_before_lstm()
else:
    m = model_attention_applied_after_lstm()
Hi, I'm a beginner with Keras and I'm trying to use attention_3d_block in a translation model.
My input is 5 sentences, each padded to 6 words, and each word is represented by a 620-dimensional embedding.
The output is 5 sentences, each padded to 9 words, and each word is represented as a 1-of-k vector of dimension 30 (the vocabulary size).
How can I use attention_3d_block in this scenario, given that the LSTM is many-to-many?
Hi,
Do you perhaps have another implementation with a get_config function, so the model can be saved in Keras? I have been trying, but I always get this error:
    raise ValueError('A `Dot` layer should be called '
ValueError: A `Dot` layer should be called on a list of 2 inputs.
Thanks!
How can we visualize the soft attention similar to the Bengio et al. paper?
The error happens at this line:
output_attention_mul = merge([inputs, a_probs], name='attention_mul', mode='mul')
TypeError: 'module' object is not callable
I'm not sure what is wrong. Could you help me resolve it?
Hi Philip
Your attention example has 1 feature (2000, 20, 1), but my dataset has 60 features (200, 1000, 60). Do I need to do something different from your example in that case?
Thank you!
I would like to understand, intuitively or theoretically, how the attention layer reflects what the model attends to when making a prediction.
It seems easy for the model to give equal weight to every input feature in the attention layer, which would defeat the purpose of the attention layer.
Hello,
I have a Keras model with a sequence of inputs and a sequence of outputs, where each input has an associated output (label), e.g. part-of-speech (POS) tagging:
Seq_in[0][0:3]
array([[15], [28], [23]])
Seq_out[0][0:3]
array([[0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.]],
      dtype=float32)
I want to build attention on top of the LSTM layer, following "Attention-Based Bidirectional Long Short-Term Memory Networks for Relation Classification" (Zhou et al., 2016):
from sklearn.model_selection import train_test_split
from keras import backend as K
from keras import optimizers
from keras.layers import (Input, Bidirectional, LSTM, Dropout, Dense, Flatten, Activation,
                          RepeatVector, Permute, Lambda, multiply, concatenate)
from keras.models import Model

X_train, X_val, Y_train, Y_val = train_test_split(Seq_in, Seq_out, test_size=0.20)
TIME_STEPS = 500
INPUT_DIM = 1
lstm_units = 256
inputs = Input(shape=(TIME_STEPS, INPUT_DIM))
activations = Bidirectional(LSTM(lstm_units, return_sequences=True))(inputs)  # first bidirectional layer
activations = Dropout(0.2)(activations)
activations = Bidirectional(LSTM(lstm_units, return_sequences=True))(activations)  # second bidirectional layer
activations = Dropout(0.2)(activations)
attention = Dense(1, activation='tanh')(activations)  # equation (9) in the paper: squash each output state vector to a scalar
attention = Flatten()(attention)
attention = Activation('softmax')(attention)  # equation (10) in the paper
attention = RepeatVector(512)(attention)  # repeat the softmax vector to match the output state size (512)
attention = Permute([2, 1])(attention)  # (None, 512, 500) -> (None, 500, 512)
sent_representation = multiply([activations, attention])  # element-wise: attention weights x output state vectors
# axis=-1 sums over the 512 features of each time step, giving (None, 500);
# a weighted sum of the output state vectors over time would use axis=1 instead.
sent_representation = Lambda(lambda xin: K.sum(xin, axis=-1))(sent_representation)
sent_representation = RepeatVector(TIME_STEPS)(sent_representation)  # repeat to match the number of time steps
sent_representation = concatenate([activations, sent_representation])  # append the representation to every output state
output = Dense(15, activation='softmax')(sent_representation)  # softmax over the 15 labels at each time step
model = Model(inputs=inputs, outputs=output)
sgd = optimizers.SGD(lr=.1, momentum=0.9, decay=1e-3, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
model.fit(X_train, Y_train, epochs=2, validation_data=(X_val, Y_val), verbose=1)
Layer (type)                     Output Shape        Param #   Connected to
input_1 (InputLayer)             (None, 500, 1)      0
bidirectional_1 (Bidirectional)  (None, 500, 512)    528384    input_1[0][0]
dropout_1 (Dropout)              (None, 500, 512)    0         bidirectional_1[0][0]
bidirectional_2 (Bidirectional)  (None, 500, 512)    1574912   dropout_1[0][0]
dropout_2 (Dropout)              (None, 500, 512)    0         bidirectional_2[0][0]
dense_1 (Dense)                  (None, 500, 1)      513       dropout_2[0][0]
flatten_1 (Flatten)              (None, 500)         0         dense_1[0][0]
activation_1 (Activation)        (None, 500)         0         flatten_1[0][0]
repeat_vector_1 (RepeatVector)   (None, 512, 500)    0         activation_1[0][0]
permute_1 (Permute)              (None, 500, 512)    0         repeat_vector_1[0][0]
multiply_1 (Multiply)            (None, 500, 512)    0         dropout_2[0][0]
                                                               permute_1[0][0]
lambda_1 (Lambda)                (None, 500)         0         multiply_1[0][0]
repeat_vector_2 (RepeatVector)   (None, 500, 500)    0         lambda_1[0][0]
concatenate_1 (Concatenate)      (None, 500, 1012)   0         dropout_2[0][0]
                                                               repeat_vector_2[0][0]
Total params: 2,119,004
Trainable params: 2,119,004
Non-trainable params: 0
I think this code does what the paper does, except that the concatenate step appends the same representation to every output state vector. It does not change for each time step, and therefore not for each output label.
So I think I need to do something so that the attention weights differ for each time-step output. Am I right?
Any help is appreciated
Thanks in advance
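One detail worth checking in the code above is that `K.sum(xin, axis=-1)` sums over the 512 features of each time step, which is why `lambda_1` has shape (None, 500). In Zhou et al. (2016), the sentence representation is the attention-weighted sum of the output vectors over the time steps. That corresponds to summing over axis=1 and gives one 512-dimensional vector per sequence. The NumPy sketch below compares the two reductions. Random arrays stand in for the real BiLSTM outputs and attention logits, for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)
batch, time_steps, units = 2, 500, 512

# Stand-ins for the BiLSTM outputs H and the per-step attention logits.
activations = rng.normal(size=(batch, time_steps, units))
logits = rng.normal(size=(batch, time_steps))

# Softmax over the time axis: one weight per time step.
alpha = np.exp(logits - logits.max(axis=1, keepdims=True))
alpha /= alpha.sum(axis=1, keepdims=True)

# Equivalent of RepeatVector + Permute + multiply: scale each output vector by its weight.
weighted = activations * alpha[:, :, np.newaxis]

sum_over_features = weighted.sum(axis=-1)  # what K.sum(..., axis=-1) computes: (batch, time_steps)
sentence_repr = weighted.sum(axis=1)       # weighted sum over time steps: (batch, units)
```

In Keras, the second reduction would be `Lambda(lambda xin: K.sum(xin, axis=1))`. It still yields a single vector per sequence, so repeating it across time steps gives every step the same representation.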
When I try to load a saved model, I get the following error: "A `Dot` layer should be called on a list of 2 inputs".
MultiHopAttention was proposed by Facebook.
a = Permute((2, 1))(inputs)
a = Dense(TIME_STEPS, activation='softmax')(a)
In these lines, why do you permute time_steps and input_dim?
What if I don't permute and follow with a Dense layer of size input_dim instead? The Dense kernel here has shape time_steps × time_steps, so what changes when it becomes input_dim × input_dim?
Dense(input_dim, activation='softmax')(a)
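A NumPy sketch of what the two variants compute. It ignores the Dense bias and uses random weights in place of trained ones. Both are simplifying assumptions for illustration:

```python
import numpy as np


def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


rng = np.random.default_rng(0)
batch, time_steps, input_dim = 2, 20, 3
inputs = rng.normal(size=(batch, time_steps, input_dim))

# With Permute((2, 1)): Dense(TIME_STEPS, softmax) mixes and normalizes along the time axis.
# Kernel shape (time_steps, time_steps): each feature gets its own distribution over time.
w_time = rng.normal(size=(time_steps, time_steps))
permuted = inputs.transpose(0, 2, 1)              # (batch, input_dim, time_steps)
a_time = softmax(permuted @ w_time, axis=-1)      # (batch, input_dim, time_steps)
a_time = a_time.transpose(0, 2, 1)                # back to (batch, time_steps, input_dim)

# Without Permute: Dense(input_dim, softmax) works along the feature axis.
# Kernel shape (input_dim, input_dim): each time step gets a distribution over features.
w_feat = rng.normal(size=(input_dim, input_dim))
a_feat = softmax(inputs @ w_feat, axis=-1)        # (batch, time_steps, input_dim)
```

The permute therefore decides what the weights are normalized over. With it, the weights say which time steps matter. Without it, they say which input dimensions matter.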
See this line of code: https://github.com/philipperemy/keras-attention-mechanism/blob/master/attention_lstm.py#L19
Isn't this redundant? The Permute layer right before it already reshapes the tensor.
Let me know if I'm missing something. I am trying to understand attention, and so far your write-up is helping.
Can we use the same code for 2D LSTM attention?
Do you have a reference paper for SINGLE_ATTENTION_VECTOR = False?
As far as I know, most papers set SINGLE_ATTENTION_VECTOR = True.
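In the attention_3d_block code quoted later in this thread, the flag only controls one thing: whether the per-feature attention vectors are averaged into a single vector shared by all features. A NumPy sketch of both settings, with random logits standing in for the Dense output. This is an illustration, not the repository's code:

```python
import numpy as np


def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


rng = np.random.default_rng(1)
batch, time_steps, input_dim = 2, 10, 4

# Output of Dense(TIME_STEPS, softmax) after Permute((2, 1)):
# (batch, input_dim, time_steps), one distribution over time per feature.
a = softmax(rng.normal(size=(batch, input_dim, time_steps)), axis=-1)

# SINGLE_ATTENTION_VECTOR = False: keep one attention vector per feature.
a_probs_per_feature = a.transpose(0, 2, 1)                 # (batch, time_steps, input_dim)

# SINGLE_ATTENTION_VECTOR = True: average over features (like K.mean(x, axis=1)),
# repeat it input_dim times (like RepeatVector), then permute back.
shared = a.mean(axis=1)                                    # (batch, time_steps)
shared = np.repeat(shared[:, np.newaxis, :], input_dim, axis=1)
a_probs_shared = shared.transpose(0, 2, 1)                 # (batch, time_steps, input_dim)
```

The average of distributions over time is still a distribution over time, so the shared vector remains a valid set of attention weights.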
Your code doesn't work with newer versions of Keras.
To fix it, change these strings in "attention_dense.py":
replace "from keras.layers import Input, Dense, merge" with "from keras.layers import Input, Dense, multiply";
replace "attention_mul = merge([inputs, attention_probs], output_shape=32, name='attention_mul', mode='mul')" with "attention_mul = multiply([inputs, attention_probs], name='attention_mul')";
and in "attention_lstm.py":
In my application, the attention weights concentrate on locations that are indicative of a subset of the classes. As a result, the algorithm performs well on that subset but sometimes misclassifies the other classes, because the attention weights cause the obvious differences to be treated as "residual".
Is there a documented way to restrict the attention weights to a certain value or index domain, so as to constrain where the model focuses? This question reminds me of NLP problems, where frameworks commonly pair ML methods with a set of predetermined rules (usually defined with spaCy).
Any thoughts? Thanks in advance.
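One general technique for restricting attention to an index domain is to mask the scores before the softmax, so that disallowed positions receive exactly zero weight. As far as this thread shows, the repository does not provide this. The NumPy sketch below assumes the allowed positions are known in advance as a boolean mask. The scores and mask values are made up:

```python
import numpy as np


def masked_softmax(scores, allowed):
    """Softmax over the last axis that gives exactly zero weight where `allowed` is False."""
    scores = np.where(allowed, scores, -np.inf)           # disallowed positions -> -inf
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)                              # exp(-inf) == 0
    return weights / weights.sum(axis=-1, keepdims=True)


scores = np.array([[2.0, 0.5, 3.0, 1.0]])
allowed = np.array([[True, True, False, True]])  # hypothetical constraint: never attend to index 2
w = masked_softmax(scores, allowed)
```

Each row must allow at least one position. Otherwise every score is -inf and the result is NaN. In a Keras model, the same effect is usually obtained by adding a large negative constant to the masked logits before the softmax.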
Hi, Thanks for your awesome work.
I am confused by this line of code: context_vector = dot([hidden_states, attention_weights], [1, 1], name='context_vector')
What does the second parameter mean?
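The second argument of Keras's `dot` is `axes`. It gives the axis of each input along which the dot product is taken, with the batch axis counted as 0. With `[1, 1]`, both inputs are contracted along their time axis. A `(batch, time_steps, hidden_size)` tensor and a `(batch, time_steps)` tensor therefore give a `(batch, hidden_size)` context vector, which is the attention-weighted sum of the hidden states. A NumPy equivalent, with arbitrarily chosen shapes:

```python
import numpy as np

rng = np.random.default_rng(0)
batch, time_steps, hidden_size = 2, 5, 3
hidden_states = rng.normal(size=(batch, time_steps, hidden_size))
attention_weights = rng.random(size=(batch, time_steps))
attention_weights /= attention_weights.sum(axis=1, keepdims=True)  # weights over time sum to 1

# dot(..., [1, 1]): contract axis 1 (time) of both inputs, per sample in the batch.
context_vector = np.einsum('bth,bt->bh', hidden_states, attention_weights)

# The same thing written as an explicit loop over the batch.
loop = np.stack([attention_weights[b] @ hidden_states[b] for b in range(batch)])
```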
Hi there!
Thanks so much for implementing this and all of the other work that you do!
I ran into an issue loading a model that uses the Attention layer inside a Sequential model. The Attention layer is defined using the Functional API, and Keras does not like it when you try to load such a mixed model.
Specifically, my error was:
m = keras.models.load_model('saved_mixed_model_path',
                            custom_objects={'Attention': Attention})
=> ValueError: A merge layer should be called on a list of inputs.
To solve this, I had to convert my model to one that uses the functional API and retrain.
Part of my confusion stems from the examples, where both the Sequential and Functional APIs are used. In this example, you successfully save and load a model built only with the Functional API. But in this LSTM example, the Sequential API is used and no loading or saving is done.
Could a caveat be added to the README.md saying that if you plan to load/save these models, only the Functional API should be used when building the model that uses the Attention layer?
Cheers
I am trying to slightly modify your example of adding numbers so that the target is the sum of all the numbers in the sequence before the delimiter. Below is the modified code:
import os
import sys

import matplotlib.pyplot as plt
import numpy as np
from keract import get_activations
from tensorflow.keras.callbacks import Callback
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.models import Sequential
# SelfAttention is the attention layer used in this issue; its import is not shown in the original.


def add_numbers_before_delimiter(n: int, seq_length: int, delimiter: float = 0.0,
                                 index_1: int = None) -> (np.array, np.array):
    """
    Task: Add all the numbers that come before the delimiter.
    x = [1, 2, 3, 0, 4, 5, 6, 7, 8, 9]. Result is y = 6.
    @param n: number of samples in (x, y).
    @param seq_length: length of the sequence of x.
    @param delimiter: value of the delimiter. Default is 0.0
    @param index_1: index of the delimiter. Drawn at random in [1, seq_length) if None.
    @return: returns two numpy.array x and y of shape (n, seq_length, 1) and (n, 1).
    """
    x = np.random.uniform(0, 1, (n, seq_length))
    y = np.zeros(shape=(n, 1))
    for i in range(len(x)):
        if index_1 is None:
            # randint returns a plain int, so x[i, 0:a] below is a valid slice.
            a = np.random.randint(1, seq_length)
        else:
            a = index_1
        y[i] = np.sum(x[i, 0:a])  # target: sum of everything before the delimiter
        x[i, a] = delimiter
    x = np.expand_dims(x, axis=-1)
    return x, y


def main():
    np.random.seed(7)
    # data. definition of the problem.
    seq_length = 20
    x_train, y_train = add_numbers_before_delimiter(20_000, seq_length)
    x_val, y_val = add_numbers_before_delimiter(4_000, seq_length)
    # just arbitrary values. it's for visual purposes. easier to see than random values.
    test_index_1 = 4
    x_test, _ = add_numbers_before_delimiter(10, seq_length, 0, test_index_1)
    # x_test_mask marks the positions that hold the information needed to solve the problem
    # (every number before the delimiter). We expect the attention map to look like this mask.
    x_test_mask = np.zeros_like(x_test[..., 0])
    x_test_mask[:, 0:test_index_1] = 1
    model = Sequential([
        LSTM(100, input_shape=(seq_length, 1), return_sequences=True),
        SelfAttention(name='attention_weight'),
        Dropout(0.2),
        Dense(1, activation='linear')
    ])
    model.compile(loss='mse', optimizer='adam')
    print(model.summary())
    output_dir = 'task_add_two_numbers'
    if not os.path.exists(output_dir):
        os.makedirs(output_dir)
    max_epoch = int(sys.argv[1]) if len(sys.argv) > 1 else 200

    class VisualiseAttentionMap(Callback):
        def on_epoch_end(self, epoch, logs=None):
            attention_map = get_activations(model, x_test, layer_names='attention_weight')['attention_weight']
            # top is attention map.
            # bottom is ground truth.
            plt.imshow(np.concatenate([attention_map, x_test_mask]), cmap='hot')
            iteration_no = str(epoch).zfill(3)
            plt.axis('off')
            plt.title(f'Iteration {iteration_no} / {max_epoch}')
            plt.savefig(f'{output_dir}/epoch_{iteration_no}.png')
            plt.close()
            plt.clf()

    model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=max_epoch,
              batch_size=64, callbacks=[VisualiseAttentionMap()])


if __name__ == '__main__':
    main()
I was expecting the model to focus on all values in the x_test sequence before index 4. However, as you can see in the GIF, the model focuses on just one point. Can you please explain where I am going wrong?
Thanks in advance.
Hi,
I have added an attention layer (following the example) to my simple LSTM network shown below.
timestep = timesteps
features = 11
model = Sequential()
model.add(LSTM(64, input_shape=(timestep,features), return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(32, return_sequences=True))
model.add(LSTM(16, return_sequences=True))
model.add(Attention(32))
model.add(Dense(32))
model.add(Dense(16))
model.add(Dense(1))
print(model.summary())
The code worked fine until last week, and I got a model summary with the attention layer details like this:
However, running the same code now gives me a strange error:
ValueError: tf.function-decorated function tried to create variables on non-first call.
What I noticed is that the model summary has changed too:
I am short on time due to an upcoming deadline. Any assistance would be highly appreciated.
P.S. This was a fully working model that suddenly stopped working for no apparent reason.
Is this Keras code for a CNN with attention?
Hi,
When I install keras-attention-mechanism into my conda3 environment with pip, the essential packages numpy and keras are unexpectedly uninstalled. Do you know why?
Bests,
Peiwan
Dear sir, when I run python attention_dense.py, I get the following error:
----- activations -----
Traceback (most recent call last):
  File "attention_dense.py", line 39, in
    attention_vector = get_activations(m, testing_inputs_1, print_shape_only=True)[1].flatten()
IndexError: list index out of range
Would you please help me? Thank you very much!
Can you give an example of how to use this for many-to-many sequence generation with different input and output lengths (greater than 1)? For example, suppose the input has 10 timesteps, say [1,2,3,4,5,6,7,8,9,10], and we want to generate the output [1,10].
I have some questions about attention_3d_block, and I think I have found some bugs. I am running on TensorFlow.
(1) inputs doesn't have a shape attribute, so it crashes. I assume you meant to call shape on the numpy array inputs_1.
(2) Is there a reason for calling Permute?
(3) What is the Reshape layer supposed to do? After the call to Permute, isn't the output of the previous permute layer already in shape (Batch Size, input_dim, TIME_STEPS)?
(4) The next call to Dense expects ndim =2, not 3. So the code crashes for me. I assume you meant the previous Reshape layer to map the 3d input to 2d?
(5) I would just like to point out that APPLY_ATTENTION_BEFORE_LSTM is False iff you call model_attention_applied_before_lstm.
How can we get attention weights for each input feature when our input consists of multiple inputs?
I am getting only one array of attention weights, and I am not sure how to interpret it for multiple inputs.
The shape of the attention weights (attached as a figure) is (300, 6), where 6 is the sequence length (lookback steps, or time steps).
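Each row of a (300, 6) array like this is a distribution over the 6 time steps, so it says which time steps mattered for each sample, not which input feature did. Per-feature weights would require attention over the feature axis instead. One way to summarize such an array, sketched in NumPy with random values standing in for the real weights:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the real attention weights: 300 samples x 6 time steps, each row a softmax over time.
logits = rng.normal(size=(300, 6))
weights = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

mean_per_step = weights.mean(axis=0)         # average weight of each time step, shape (6,)
top_step = weights.argmax(axis=1)            # most-attended time step for each sample, shape (300,)
counts = np.bincount(top_step, minlength=6)  # how often each time step was the most attended
```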
Let's say we predict with 24 time steps and get 24 results as output. How can we visualize the attention as a heatmap, as in https://github.com/datalogue/keras-attention?
Update on 2019/2/14, nearly one year later:
The implementation in this repo is definitely buggy. Please refer to my implementation in a reply below for a correction. My version has been running in our product since this thread started, and it outperforms both a vanilla LSTM without attention and the incorrect version in this repo by a significant margin. I am not the only one raising question 1.
Both this repo and my version of attention are intended for sequence-to-one networks, although they can easily be adapted for seq2seq by replacing h_t with the current state of the decoder step. If you are looking for ready-to-use attention for sequence-to-sequence networks, check this out: https://github.com/farizrahman4u/seq2seq.
============Original answer==============
I am currently working on a text generation task and learned attention from the TensorFlow tutorials. The implementation details seem quite different from your code.
This is how the TensorFlow tutorial describes the process:
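The tutorial's equations did not survive in this copy of the thread. For reference, the following is the form used in the TensorFlow NMT tutorial, reconstructed here with the same numbering that the discussion below refers to. Here $h_t$ is the current target (decoder) hidden state, $\bar{h}_s$ are the source (encoder) hidden states, and $S$ is the source length:

```latex
\begin{aligned}
\alpha_{ts} &= \frac{\exp\left(\operatorname{score}(h_t, \bar{h}_s)\right)}{\sum_{s'=1}^{S} \exp\left(\operatorname{score}(h_t, \bar{h}_{s'})\right)} && \text{[attention weights]} \quad (1) \\
c_t &= \sum_{s} \alpha_{ts} \bar{h}_s && \text{[context vector]} \quad (2) \\
a_t &= f(c_t, h_t) = \tanh\left(W_c [c_t; h_t]\right) && \text{[attention vector]} \quad (3) \\
\operatorname{score}(h_t, \bar{h}_s) &=
\begin{cases}
h_t^\top W \bar{h}_s & \text{[Luong's multiplicative style]} \\
v_a^\top \tanh\left(W_1 h_t + W_2 \bar{h}_s\right) & \text{[Bahdanau's additive style]}
\end{cases} && (4)
\end{aligned}
```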
If I understand it correctly, all learnable parameters in the attention mechanism are stored in W, which has shape (rnn_size, rnn_size), where rnn_size is the size of the hidden state. So you first need W to compute the score of each hidden state from the source hidden state h_s and the target hidden state h_t, but I don't see h_t anywhere in your code. Instead, you applied a dense layer to all h_s. That means [equation not preserved] (Edit: h_t should be h_s in this equation) becomes the [symbol not preserved] in the paper. This seems wrong.
In the next step, you multiply the attention weights element-wise with the hidden states, as in equation (2). Then you somehow skip equation (3).
I noticed that the tutorial is about a Seq2Seq (encoder-decoder) model, while your code is a plain RNN. Maybe that is why your code is different. Do you have a source on how attention is applied to a non-Seq2Seq network?
Here is your code:
def attention_3d_block(inputs):
    # inputs.shape = (batch_size, time_steps, input_dim)
    input_dim = int(inputs.shape[2])
    a = Permute((2, 1))(inputs)
    a = Reshape((input_dim, TIME_STEPS))(a)  # this line is not useful. It's just to know which dimension is what.
    a = Dense(TIME_STEPS, activation='softmax')(a)
    if SINGLE_ATTENTION_VECTOR:
        a = Lambda(lambda x: K.mean(x, axis=1), name='dim_reduction')(a)
        a = RepeatVector(input_dim)(a)
    a_probs = Permute((2, 1), name='attention_vec')(a)
    output_attention_mul = merge([inputs, a_probs], name='attention_mul', mode='mul')
    return output_attention_mul


def model_attention_applied_after_lstm():
    inputs = Input(shape=(TIME_STEPS, INPUT_DIM,))
    lstm_units = 32
    lstm_out = LSTM(lstm_units, return_sequences=True)(inputs)
    attention_mul = attention_3d_block(lstm_out)
    attention_mul = Flatten()(attention_mul)
    output = Dense(1, activation='sigmoid')(attention_mul)
    model = Model(input=[inputs], output=output)
    return model
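A NumPy sketch of how the tutorial's Luong-style attention could be applied to a sequence-to-one network. As discussed above, it assumes the last RNN hidden state plays the role of h_t and the RNN outputs are the source states h_s. W and W_c are random stand-ins for learned weights, and biases are omitted. This is an illustration, not the repository's implementation:

```python
import numpy as np


def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def luong_attention_seq_to_one(hidden_states, W, W_c):
    """Multiplicative (Luong-style) attention with the last hidden state as the query.

    hidden_states: (batch, time_steps, hidden) RNN outputs, used as the source states h_s.
    W: (hidden, hidden) score matrix; W_c: (2 * hidden, out) output projection.
    """
    h_t = hidden_states[:, -1, :]                                  # query: last hidden state
    scores = np.einsum('bk,kh,bsh->bs', h_t, W, hidden_states)     # score(h_t, h_s) = h_t^T W h_s
    alpha = softmax(scores, axis=1)                                # attention weights over time
    context = np.einsum('bs,bsh->bh', alpha, hidden_states)        # context vector c_t
    attention_vector = np.tanh(np.concatenate([context, h_t], axis=1) @ W_c)  # tanh of projected [c_t; h_t]
    return attention_vector, alpha


rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(2, 20, 8))
W = rng.normal(size=(8, 8)) * 0.1
W_c = rng.normal(size=(16, 5)) * 0.1
attention_vector, alpha = luong_attention_seq_to_one(hidden_states, W, W_c)
```

The learnable score matrix combines every h_s with h_t before the softmax, and the final projection mixes the context vector with h_t. These are the two steps the question above finds missing from the quoted block.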
Hello,
Is the dense attention mechanism based on a particular paper?
Or are there papers that use this mechanism?
Hi, I am curious about the figures in your markdown.
What app did you use to create these beautiful hand-drawn figures?
Thx
Hi there!
Thanks so much for implementing this and all of the other work that you do!
I want to know the meaning of h_t, i.e. h_t = Lambda(lambda x: x[:, -1, :], output_shape=(hidden_size,), name='last_hidden_state')(hidden_states). In Luong's paper, h_t is the current target hidden state. But how should it be interpreted in a setting that is not seq2seq?
My sequences have varying lengths, and I'm using bucketing to handle that. I therefore define the LSTM input shape as (None, None, features), i.e. with no explicit number of timesteps. Can the code handle this kind of input? Thanks.
Hi, thanks for the implementation!
I have been trying to implement this code:
model = Sequential()
model.add(Embedding(300000, 100, input_length=250))
model.add(LSTM(units=250, return_sequences=True, dropout=0.1, recurrent_dropout=0.2))
model.add(attention_3d_block())
model.add(Flatten())
model.add(Dense(200, activation='relu'))
model.add(Dense(3, activation='softmax'))
Error: TypeError: attention_3d_block() missing 1 required positional argument: 'hidden_states'
I read the documentation, but I couldn't understand what should actually be passed there.
Here is the error message:
    layer_name='attention_vec')[0], axis=2).squeeze()
  File "/Users/yu/proj/cancel_blame/code/src/lib/attention/attention_utils.py", line 16, in get_activations
    layer_outputs = [func([inputs, 1.])[0] for func in funcs]
  File "/Users/yu/proj/cancel_blame/code/src/lib/attention/attention_utils.py", line 16, in <listcomp>
    layer_outputs = [func([inputs, 1.])[0] for func in funcs]
  File "/anaconda3/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2666, in __call__
    return self._call(inputs)
  File "/anaconda3/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2619, in _call
    dtype=tf.as_dtype(tensor.dtype).as_numpy_dtype))
AttributeError: 'list' object has no attribute 'dtype'
Hello, thanks for such easy-to-read code, but I am confused about a few things.
Your attention function takes the hidden states of the input (i.e. the LSTM outputs from the encoder) and does all of its processing on those. But from what I have read, attention should combine them with the hidden state of the target, as in the given picture. Why haven't you done that? Otherwise, you are just building an LSTM-like function by hand.
Why have you used Permute layers before the softmax layer?
Why have you averaged the outputs of the softmax layer?
Do you know how I can apply the attention module to a 2D input? I would like to apply attention after the LSTM layer:
Layer (type)                    Output Shape         Param #    Connected to
__________________________________________________________________________________________________
features (InputLayer)           (None, 16, 1816)     0
__________________________________________________________________________________________________
lstm_1 (LSTM)                   (None, 2048)         31662080   features[0][0]
__________________________________________________________________________________________________
dense_2 (Dense)                 (None, 1024)         2098176    lstm_1[0][0]
__________________________________________________________________________________________________
leaky_re_lu_2 (LeakyReLU)       (None, 1024)         0          dense_2[0][0]
__________________________________________________________________________________________________
dense_3 (Dense)                 (None, 120)          123000     leaky_re_lu_2[0][0]
__________________________________________________________________________________________________
feature_weights (InputLayer)    (None, 120)          0
__________________________________________________________________________________________________
multiply_1 (Multiply)           (None, 120)          0          dense_3[0][0]
                                                                feature_weights[0][0]
Total params: 33,883,256
Trainable params: 33,883,256
Non-trainable params: 0
__________________________________________________________________________________________________
I would really appreciate your suggestions on how to modify attention_3d_block so that it also works for a 2D input. Thanks.
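A 2D tensor (batch, features) has no time axis to attend over, so a 3D block like attention_3d_block does not apply directly. A feature-wise variant does apply, as in the `multiply([inputs, attention_probs])` pattern from attention_dense.py: a softmax over the features, multiplied element-wise with the input. A NumPy sketch of that computation, assuming random weights and no bias:

```python
import numpy as np


def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


rng = np.random.default_rng(0)
batch, input_dim = 4, 120
inputs = rng.normal(size=(batch, input_dim))              # e.g. a (None, 120) Dense output

# Stand-in for the kernel of a Dense(input_dim, activation='softmax') layer.
kernel = rng.normal(size=(input_dim, input_dim)) * 0.1
attention_probs = softmax(inputs @ kernel, axis=-1)       # (batch, input_dim): one weight per feature
attention_mul = inputs * attention_probs                  # element-wise, like multiply([inputs, attention_probs])
```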
Hey, I'm trying to use attention with time-series data that has more than one feature, and this leads to an incompatible shapes error. What changes do I need to make to get it to work?
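The error message is not shown, so this is an assumption. One likely cause is a per-time-step weight array of shape (batch, time_steps) being multiplied with a (batch, time_steps, features) tensor. The weights need an explicit feature axis so that they broadcast over all features. In Keras, that means RepeatVector plus Permute, as in attention_3d_block. A NumPy sketch using the (200, 1000, 60) example from earlier in this thread (with a smaller batch):

```python
import numpy as np

rng = np.random.default_rng(0)
batch, time_steps, n_features = 8, 1000, 60
x = rng.normal(size=(batch, time_steps, n_features))

# One attention weight per time step, softmax over the time axis.
logits = rng.normal(size=(batch, time_steps))
alpha = np.exp(logits - logits.max(axis=1, keepdims=True))
alpha /= alpha.sum(axis=1, keepdims=True)                 # (batch, time_steps)

# Multiplying directly fails: (8, 1000, 60) and (8, 1000) do not broadcast.
try:
    x * alpha
    shapes_compatible = True
except ValueError:
    shapes_compatible = False

# A trailing feature axis lets the same weight scale all 60 features of a time step.
weighted = x * alpha[:, :, np.newaxis]                    # (batch, time_steps, n_features)
context = weighted.sum(axis=1)                            # (batch, n_features)
```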
I need to add attention to the following model. It works perfectly for an LSTM model, but here I get the error below:
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.regularizers import L1L2
from attention import Attention  # this repository's layer (import path assumed)


def get_ANN_attention_model(num_hidden_layers, num_neurons_per_layer, dropout_rate, activation_func, train_X):
    with tf.device('/gpu:0'):
        model_input = tf.keras.Input(shape=(train_X.shape[1],))  # input layer: 2D (batch, features)
        x = model_input
        for i in range(num_hidden_layers):
            # chain the layers: each Dense takes the previous output, not model_input
            x = layers.Dense(num_neurons_per_layer, activation=activation_func,
                             bias_regularizer=L1L2(l1=0.0, l2=0.0001),
                             activity_regularizer=L1L2(1e-5, 1e-4))(x)
            x = layers.Dropout(dropout_rate)(x)
        x = Attention(num_hidden_layers)(x)
        outputs = layers.Dense(1, activation='linear')(x)
        model = tf.keras.Model(inputs=model_input, outputs=outputs)
        model.summary()
        return model
ERROR
    hidden_size = int(hidden_states.shape[2])
  File "C:\Users\bhask\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\framework\tensor_shape.py", line 896, in __getitem__
    return self._dims[key].value
IndexError: list index out of range
In your code, you want to pay more attention to the 10th step, and your experimental results seem to confirm it.
But the code does not seem to focus on the 10th step. Please look at the following code:
score_first_part = Dense(hidden_size, use_bias=False, name='attention_score_vec')(hidden_states)
# score_first_part dot last_hidden_state => attention_weights
# (batch_size, time_steps, hidden_size) dot (batch_size, hidden_size) => (batch_size, time_steps)
h_t = Lambda(lambda x: x[:, -1, :], output_shape=(hidden_size,), name='last_hidden_state')(hidden_states)
score = dot([score_first_part, h_t], [2, 1], name='attention_score')
You calculate 'score' as score_first_part dot h_t.
You get h_t with h_t = Lambda(lambda x: x[:, -1, :], output_shape=(hidden_size,), name='last_hidden_state'). In my view, 'lambda x: x[:, -1, :]' selects the last step of the time sequence; in other words, you pay more attention to the 20th step (your code defines TIME_STEPS = 20).
So, if my understanding is right, you should change your code to h_t = Lambda(lambda x: x[:, 9, :], output_shape=(hidden_size,), name='last_hidden_state').
Of course, my understanding may be wrong. I look forward to your reply.
Thank you.
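`x[:, -1, :]` does select the last time step, but only as the query h_t. The scores are still computed against every time step, so the weights can peak anywhere. The small NumPy check below uses a hand-made sequence in which step 9 (0-based, i.e. the 10th step) is the one most aligned with the query. It uses an identity matrix in place of the learned score weights, which is an assumption made for illustration:

```python
import numpy as np


def attention_weights(hidden_states, W):
    h_t = hidden_states[:, -1, :]                             # last step, used only as the query
    score_first_part = hidden_states @ W                      # like Dense(hidden_size, use_bias=False)
    scores = np.einsum('bth,bh->bt', score_first_part, h_t)   # like dot([score_first_part, h_t], [2, 1])
    scores = scores - scores.max(axis=1, keepdims=True)
    w = np.exp(scores)
    return w / w.sum(axis=1, keepdims=True)                   # softmax over all 20 time steps


hidden_states = np.zeros((1, 20, 4))
hidden_states[0, -1] = [1.0, 0.0, 0.0, 0.0]   # the query direction (last step)
hidden_states[0, 9] = [3.0, 0.0, 0.0, 0.0]    # the 10th step is the most aligned with it
w = attention_weights(hidden_states, np.eye(4))
```

With learned weights, training decides where the peak lands. This is how attention computed from the last step can still highlight the 10th step.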
Hi,
How can I add the attention model to the Keras image_ocr implementation?
When I run the script attention_lstm.py, there is a problem in the line 17. Just like the following problem:
"input_dim=int(inputs.shape[2])"
"TypeError: int() argument must be a string or a number, not 'TensorVariable'"
Dear Sir,
Would it be possible to use this repo for a CNN as well?
Thanks and regards.
Hello,
I have run your code successfully.
I have also added stacked LSTMs to your code:
def model_attention_applied_before_lstm():
    inputs = Input(shape=(TIME_STEPS, INPUT_DIM,))
    attention_mul = attention_3d_block(inputs)
    lstm_units = 32
    attention_mul = LSTM(lstm_units, return_sequences=True)(attention_mul)
    attention_mul = LSTM(lstm_units, return_sequences=False)(attention_mul)
    output = Dense(1, activation='sigmoid')(attention_mul)
    model = Model(input=[inputs], output=output)
    return model
But maybe this is not the correct way to combine stacked LSTMs with attention, right?
My ultimate goal is to add attention to this code (classification of multivariate time series):
class LSTMNet:
    @staticmethod
    def build(timeSteps, variables, classes):
        inputNet = Input(shape=(timeSteps, variables))
        lstm = Bidirectional(GRU(100, recurrent_dropout=0.4, dropout=0.4, return_sequences=True), merge_mode='concat')(inputNet)
        lstm = Bidirectional(GRU(50, recurrent_dropout=0.4, dropout=0.4, return_sequences=True), merge_mode='concat')(lstm)
        lstm = Bidirectional(GRU(20, recurrent_dropout=0.4, dropout=0.4, return_sequences=False), merge_mode='concat')(lstm)
        # a softmax classifier
        classificationLayer = Dense(classes, activation='softmax')(lstm)
        model = Model(inputNet, classificationLayer)
        return model
Thanks in advance for any possible info
What's the point of adding another attention_mul = Dense(units=64)(attention_mul)?