
textclassifier's Introduction

textClassifier

textClassifierHATT.py has the implementation of Hierarchical Attention Networks for Document Classification. Please see my blog for full details. Also see the Keras Google group discussion.

textClassifierConv.py implements Convolutional Neural Networks for Sentence Classification (Yoon Kim). Please see my blog for full details.

textClassifierRNN.py implements a bidirectional LSTM and a one-level attentional RNN. Please see my blog for full details.

update on 6/22/2017

To derive the attention weights, which can be useful for identifying the words that matter for the classification, please see my latest update on the post. All you need to do is run a forward pass right before the attention layer output. The result is not very promising so far; I will update the post once I have further results.
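As a rough sketch of what that forward pass can look like with the Keras functional API (the variables model and x_val and the layer name att_layer_1 are assumptions for illustration, not names taken from the script):

from keras.models import Model

# sub-model that stops right before the attention layer, so predict()
# returns the hidden states that the attention weights are computed from
att_layer = model.get_layer('att_layer_1')  # assumed layer name
pre_attention = Model(inputs=model.input, outputs=att_layer.input)
hidden_states = pre_attention.predict(x_val[:1])  # shape (1, timesteps, hidden_dim)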


This repo is forked from https://github.com/richliao/textClassifier. We found some issues there, so we updated textClassifierHATT.py for Python 2.7 and Keras 2.0.8.

# clone the repo
git clone {repo address}

# install dependencies
cd textClassifier
pip install -r req.txt

# download the IMDB labeled training data from Kaggle via the link below and keep the file in the working directory
https://www.kaggle.com/c/word2vec-nlp-tutorial/download/labeledTrainData.tsv
# download the GloVe word vectors
wget http://nlp.stanford.edu/data/glove.6B.zip
unzip glove.6B.zip

# install the nltk 'punkt' tokenizer using the following code in a Python interpreter
>>> import nltk
>>> nltk.download('punkt')

# train the model
python textClassifierHATT.py

# note: if a Cython error occurs while installing word2vec, run
pip install --upgrade cython

Enjoy!

textclassifier's People

Contributors

haritha91, manojbalaji1, richliao, zhangsh950618


textclassifier's Issues

Incorporating this model into tensorflow project

I am currently working on a project to improve code analysis using deep learning models. Because my project is based on TensorFlow, I found myself having to make significant code changes to this textClassifier in order to incorporate it. It turns out that this model improved the accuracy substantially. I am ready to contribute code back. Is there a suggestion on the best way to approach this, given that my changes have diverged from the current code base?

textClassifierRNN.py:'TensorVariable' object has no attribute 'get_value'

When I run the model in textClassifierRNN.py, I get an error and don't know how to fix it.

Traceback (most recent call last):
File "lstm_text_classification.py", line 241, in
model.fit(x_train, y_train, nb_epoch=15, shuffle=True, validation_split=0.1)
File "/usr/local/lib/python3.4/dist-packages/keras/engine/training.py", line 1494, in fit
self._make_train_function()
File "/usr/local/lib/python3.4/dist-packages/keras/engine/training.py", line 1018, in _make_train_function
self.total_loss)
File "/usr/local/lib/python3.4/dist-packages/keras/optimizers.py", line 416, in get_updates
shapes = [K.get_variable_shape(p) for p in params]
File "/usr/local/lib/python3.4/dist-packages/keras/optimizers.py", line 416, in
shapes = [K.get_variable_shape(p) for p in params]
File "/usr/local/lib/python3.4/dist-packages/keras/backend/theano_backend.py", line 1162, in get_variable_shape
return x.get_value(borrow=True, return_internal_type=True).shape
AttributeError: 'TensorVariable' object has no attribute 'get_value'

Do you know any solution? Thanks.

Non-exact Implementation of CNN Sentence Classifier

Hi Richard,

I just read the paper "Convolutional Neural Networks for Sentence Classification" and your blog as well as code.

It seems that your implementation is not the network described in the paper. According to the paper, the convolution layers are parallel. However, they are serial in your implementation.

You may want to re-implement your network to see which one achieves better performance; a sketch of the parallel version follows.
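For reference, a minimal sketch of the parallel-filter architecture described in the paper (hyperparameters and variable names here are illustrative assumptions, not values from textClassifierConv.py):

from keras.layers import Input, Embedding, Conv1D, GlobalMaxPooling1D, Concatenate, Dropout, Dense
from keras.models import Model

MAX_LEN, VOCAB_SIZE, EMB_DIM = 100, 20000, 100  # assumed hyperparameters

inp = Input(shape=(MAX_LEN,), dtype='int32')
emb = Embedding(VOCAB_SIZE, EMB_DIM, input_length=MAX_LEN)(inp)

# one convolution branch per filter size, applied to the same embedding output in parallel
branches = []
for size in (3, 4, 5):
    conv = Conv1D(filters=100, kernel_size=size, activation='relu')(emb)
    branches.append(GlobalMaxPooling1D()(conv))  # max-over-time pooling per branch

merged = Dropout(0.5)(Concatenate()(branches))
preds = Dense(2, activation='softmax')(merged)
model = Model(inp, preds)
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['acc'])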

NaN Loss Function

I am attempting to classify relatively large documents using HAN (~800 lines).
Currently, I am experiencing a NaN loss from the first epoch, and I am struggling to debug it. The Google Groups page for this method discusses issues with masking, but as I understand it, this implementation should have no issues with that. Any recommendations for debugging this loss overflow?
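Not a root-cause fix, but two standard Keras knobs that can make a NaN loss easier to localize; the optimizer choice and clip value below are assumptions:

from keras.optimizers import RMSprop
from keras.callbacks import TerminateOnNaN

# clip gradients so one exploding step does not turn the loss into NaN,
# and stop training on the first NaN so the offending batch is easy to find
model.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(clipnorm=1.0),
              metrics=['acc'])
model.fit(x_train, y_train, validation_data=(x_val, y_val),
          epochs=10, batch_size=50, callbacks=[TerminateOnNaN()])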

strange out of range

Using TensorFlow backend.
Traceback (most recent call last):
File "C:/Users/LawLi/PycharmProjects/first/attention/keras_att.py", line 181, in
l_att = AttLayer()(l_dense)
File "D:\ProgramData\Anaconda3\lib\site-packages\keras\engine\topology.py", line 596, in call
output = self.call(inputs, **kwargs)
File "C:/Users/LawLi/PycharmProjects/first/attention/keras_att.py", line 165, in call
eij = K.tanh(K.dot(x, self.W))
File "D:\ProgramData\Anaconda3\lib\site-packages\keras\backend\tensorflow_backend.py", line 968, in dot
y_permute_dim = [y_permute_dim.pop(-2)] + y_permute_dim
IndexError: pop index out of range

AttentionDecoder error

Hello,

I'm getting the following errors:
ERROR (theano.gof.opt): SeqOptimizer apply <theano.scan_module.scan_opt.PushOutScanOutput object at 0x0B762E90>
ERROR (theano.gof.opt): Traceback:
ERROR (theano.gof.opt): Traceback (most recent call last):
(...)
raise theano.gof.InconsistencyError("Trying to reintroduce a removed node")
InconsistencyError: Trying to reintroduce a removed node

while using the AttentionDecoder. Do you have any idea why?

Thanks in advance,

I met an error and I do not know how to solve it. Please help me solve it.

Traceback (most recent call last):
File "textClassifierRNN_ga.py", line 187, in
create_model_gru_attention(texts_train_index,texts_test_index,label_train,label_test,word_index,embeddings)
File "textClassifierRNN_ga.py", line 174, in create_model_gru_attention
model.compile(loss='categorical_crossentropy',optimizer='rmsprop',metrics=['accuracy'])
File "/usr/local/lib/python3.4/dist-packages/keras/engine/training.py", line 915, in compile
sample_weight, mask)
File "/usr/local/lib/python3.4/dist-packages/keras/engine/training.py", line 436, in weighted
score_array = fn(y_true, y_pred)
File "/usr/local/lib/python3.4/dist-packages/keras/losses.py", line 49, in categorical_crossentropy
return K.categorical_crossentropy(y_pred, y_true)
File "/usr/local/lib/python3.4/dist-packages/keras/backend/theano_backend.py", line 1498, in categorical_crossentropy
return T.nnet.categorical_crossentropy(output, target)
File "/usr/local/lib/python3.4/dist-packages/theano/tensor/nnet/nnet.py", line 2070, in categorical_crossentropy
raise TypeError('rank mismatch between coding and true distributions')
TypeError: rank mismatch between coding and true distributions

some problem about data preprocess

There is a problem with the pandas usage:

AttributeError: 'DataFrame' object has no attribute 'review'

at line 46, 55:
46: data_train = pd.read_csv('labeledTrainData.tsv', sep='\t')
55 :for idx in range(data_train.review.shape[0]):

variable "data_train" does not got attribute "review"

===========

I have tried different versions of pandas and the problem is still not fixed.
Any suggestions about this problem?

THANKS!!!

Consistency with the article (HATT)

I've seen some discussion about it, but I'm afraid I still don't get it:

The tanh activation is applied in the original paper over an MLP layer which accepts only the bilstm vector as input (eq 5).

Assuming self.W is the context vector in our case, tanh is applied to the product of the biLSTM vector and the context vector (the Dense layer has no activation of its own).

What is the explanation for this?
Thanks!
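For reference, the word-level attention in the paper (eqs. 5-7) first feeds each hidden state through a one-layer MLP with a tanh activation and then scores the result against the context vector:

u_{it} = \tanh(W_w h_{it} + b_w)
\alpha_{it} = \frac{\exp(u_{it}^\top u_w)}{\sum_t \exp(u_{it}^\top u_w)}
s_i = \sum_t \alpha_{it} h_{it}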

problem with textClassifierHATT.py, thank you

Traceback (most recent call last):
File "textClassifierHATT.py", line 203, in
l_dense_sent = TimeDistributed(Dense(200,input_shape=(200,)))(l_lstm_sent)
File "/usr/local/lib/python2.7/dist-packages/Keras-2.0.4-py2.7.egg/keras/engine/topology.py", line 569, in call
self.assert_input_compatibility(inputs)
File "/usr/local/lib/python2.7/dist-packages/Keras-2.0.4-py2.7.egg/keras/engine/topology.py", line 440, in assert_input_compatibility
str(K.ndim(x)))
ValueError: Input 0 is incompatible with layer time_distributed_4: expected ndim=3, found ndim=4

CUDA GpuDnnSoftmax error while fitting the model

Hi,

I have compiled the HAN model, but while fitting it I am getting a CUDA GpuDnnSoftmax error. Please help.

Complete stacktrace for the same:

('The following error happened while compiling the node', GpuDnnSoftmax{tensor_format='bc01', mode='channel', algo='accurate'}(GpuContiguous.0), '\n', 'nvcc returvi n status', 2, 'for cmd', '/usr/local/cuda-8.0/bin/nvcc -shared -O3 -Xlinker -rpath,/usr/local/cuda-8.0/lib64 -arch=sm_61 -m64 -Xcompiler -fno-math-errno,-Wno-unused-label,-Wno-unused-variable,-Wno-write-strings,-D_FORCE_INLINES,-DCUDA_NDARRAY_CUH=c72d035fdf91890f3b36710688069b2e,-DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION,-fPIC,-fvisibility=hidden -Xlinker -rpath,/hom/.theano/compiledir_Linux-3.13--generic-x86_64-with-Ubuntu-14.04-trusty-x86_64-2.7.6-64/cuda_ndarray -I/home/.theano/compiledir_Linux-3.13--generic-x86_64-with-Ubuntu-14.04-trusty-x86_64-2.7.6-64/cuda_ndarray -I/usr/local/cuda-8.0/include -I/usr/local/lib/python2.7/dist-packages/theano/sandbox/cuda -I/usr/local/lib/python2.7/dist-packages/numpy/core/include -I/usr/include/python2.7 -I/usr/local/lib/python2.7/dist-packages/theano/gof -L/home/.theano/compiledir_Linux-3.13--generic-x86_64-with-Ubuntu-14.04-trusty-x86_64-2.7.6-64/cuda_ndarray -L/usr/lib -o /home/.theano/compiledir_Linux-3.13--generic-x86_64-with-Ubuntu-14.04-trusty-x86_64-2.7.6-64/tmp23jxHq/4532e6b24a81e914613fd46d9f2cfae2.so mod.cu -lcudart -lcublas -lcuda_ndarray -lcudnn -lpython2.7', "[GpuDnnSoftmax{tensor_format='bc01', mode='channel', algo='accurate'}(<CudaNdarrayType(float32, (False, False, True, True))>)]")

textClassifierRNN.py

File "/usr/local/lib/python3.5/dist-packages/keras/models.py", line 455, in add
output_tensor = layer(self.outputs[0])
File "/usr/local/lib/python3.5/dist-packages/keras/engine/topology.py", line 554, in call
output = self.call(inputs, **kwargs)
File "/home/l148/xuyang/workshop/EEGDNN/Motor imagery classification/seg_CSP_ConvLSTM_debug.py", line 112, in call
eij = K.tanh(K.dot(x,self.W))
File "/usr/local/lib/python3.5/dist-packages/keras/backend/tensorflow_backend.py", line 838, in dot
y_permute_dim = [y_permute_dim.pop(-2)] + y_permute_dim
IndexError: pop index out of range

AttLayer has no attribute init

Hi,
When I am using your code, this is the error that I get:

    l_att = AttLayer()(l_dense)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 491, in __call__
    self.build(input_shapes[0])
  File "text_ClassifierHATT.py", line 222, in build
    self.W = self.init((input_shape[-1],))
AttributeError: 'AttLayer' object has no attribute 'init'
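A likely cause is that the constructor that actually ran never set self.init before build() tried to use it. In the AttLayer version shipped in this repo (see the full class further down this page), the constructor looks roughly like this:

from keras import initializers
from keras.engine.topology import Layer

class AttLayer(Layer):
    def __init__(self, attention_dim):
        # build() reads self.init, so it has to be set here before build() runs
        self.init = initializers.get('normal')
        self.supports_masking = True
        self.attention_dim = attention_dim
        super(AttLayer, self).__init__()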

The performance is worse

Hi, I tested the code on the 'labeledTrainData.tsv' dataset, with 80% of the dataset as training data and 20% as validation data.
I use the Theano backend. However, the performance is poor. The results are as follows.


Train on 20000 samples, validate on 5000 samples
Epoch 1/10
20000/20000 [==============================] - 104s - loss: 0.6974 - acc: 0.5082 - val_loss: 0.6938 - val_acc: 0.5058
Epoch 2/10
20000/20000 [==============================] - 103s - loss: 0.6982 - acc: 0.5025 - val_loss: 0.6930 - val_acc: 0.5124
Epoch 3/10
20000/20000 [==============================] - 104s - loss: 0.6959 - acc: 0.5128 - val_loss: 0.6950 - val_acc: 0.5058
Epoch 4/10
20000/20000 [==============================] - 104s - loss: 0.6978 - acc: 0.4936 - val_loss: 0.6939 - val_acc: 0.4942
Epoch 5/10
20000/20000 [==============================] - 103s - loss: 0.6983 - acc: 0.4958 - val_loss: 0.6934 - val_acc: 0.4954
Epoch 6/10
20000/20000 [==============================] - 103s - loss: 0.6994 - acc: 0.5002 - val_loss: 0.7012 - val_acc: 0.4944
Epoch 7/10
20000/20000 [==============================] - 104s - loss: 0.6992 - acc: 0.4973 - val_loss: 0.6931 - val_acc: 0.5054
Epoch 8/10
20000/20000 [==============================] - 103s - loss: 0.6977 - acc: 0.5032 - val_loss: 0.6931 - val_acc: 0.4940
Epoch 9/10
20000/20000 [==============================] - 103s - loss: 0.6966 - acc: 0.5070 - val_loss: 0.6937 - val_acc: 0.4942
Epoch 10/10
20000/20000 [==============================] - 103s - loss: 0.6961 - acc: 0.5068 - val_loss: 0.7287 - val_acc: 0.4942

The best performance reported on your website (https://richliao.github.io/supervised/classification/2016/12/26/textclassifier-HATN/) is about 90.4%, but my results are nowhere near that.

I wonder whether the attention layer code below is correct for the Theano backend. The implementation of the attention layer is as follows:

class AttLayer(Layer):
    def __init__(self, attention_dim):
        self.init = initializers.get('normal')
        self.supports_masking = True
        self.attention_dim = attention_dim
        super(AttLayer, self).__init__()

    def build(self, input_shape):
        assert len(input_shape) == 3
        self.W = K.variable(self.init((input_shape[-1], self.attention_dim)))
        self.b = K.variable(self.init((self.attention_dim, )))
        self.u = K.variable(self.init((self.attention_dim, 1)))
        self.trainable_weights = [self.W, self.b, self.u]
        super(AttLayer, self).build(input_shape)

    def compute_mask(self, inputs, mask=None):
        return mask

    def call(self, x, mask=None):
        # size of x: [batch_size, sel_len, attention_dim]
        # size of u: [batch_size, attention_dim]
        # uit = tanh(xW + b)
        uit = K.tanh(K.bias_add(K.dot(x, self.W), self.b))
        ait = K.dot(uit, self.u)
        ait = K.squeeze(ait, -1)

        ait = K.exp(ait)

        if mask is not None:
            # Cast the mask to floatX to avoid float64 upcasting in theano
            ait *= K.cast(mask, K.floatx())
        ait /= K.cast(K.sum(ait, axis=1, keepdims=True) + K.epsilon(), K.floatx())
        ait = K.expand_dims(ait)
        weighted_input = x * ait
        output = K.sum(weighted_input, axis=1)

        return output

    def compute_output_shape(self, input_shape):
        return (input_shape[0], input_shape[-1])

Thanks a lot

y_permute_dim.pop(-2) pop index out of range

Hi, this is an awesome implementation.
However, when I use AttLayer, I get an IndexError: pop index out of range.
It occurs in eij = K.tanh(K.dot(x, self.W)),
then in y_permute_dim = [y_permute_dim.pop(-2)] + y_permute_dim.
My Keras version is 2.1.5.
Thanks a lot.

Why TimeDistributed layer before attention layer?

In the HATT model, a TimeDistributed layer is used before both the word-level and sentence-level attention layers. I'm confused about what the TimeDistributed layer does here. In the RNN model, there is no TimeDistributed layer before the attention layer. What's the difference? Thank you!

ValueError: Dimensions must be equal, but are 15 and 100 for '{{node Equal}} = Equal[T=DT_BOOL, incompatible_shape_error=true](mask, SequenceMask/Less)' with input shapes: [?,15,100], [?,100,?].

File "E:\Anaconda35\envs\tensorflow\lib\site-packages\tensorflow\python\framework\ops.py", line 1812, in _create_c_op
c_op = pywrap_tf_session.TF_FinishOperation(op_desc)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Dimensions must be equal, but are 15 and 100 for '{{node Equal}} = Equal[T=DT_BOOL, incompatible_shape_error=true](mask, SequenceMask/Less)' with input shapes: [?,15,100], [?,100,?].

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/textClassifier/textClassifierHATT_tf2.py", line 196, in
l_lstm_sent = Bidirectional(GRU(100, return_sequences=True))(review_encoder)
File "E:\Anaconda35\envs\tensorflow\lib\site-packages\tensorflow\python\keras\layers\wrappers.py", line 530, in call
return super(Bidirectional, self).call(inputs, **kwargs)
File "E:\Anaconda35\envs\tensorflow\lib\site-packages\tensorflow\python\keras\engine\base_layer.py", line 926, in call
input_list)
File "E:\Anaconda35\envs\tensorflow\lib\site-packages\tensorflow\python\keras\engine\base_layer.py", line 1117, in _functional_construction_call
outputs = call_fn(cast_inputs, *args, **kwargs)
File "E:\Anaconda35\envs\tensorflow\lib\site-packages\tensorflow\python\keras\layers\wrappers.py", line 644, in call
initial_state=forward_state, **kwargs)
File "E:\Anaconda35\envs\tensorflow\lib\site-packages\tensorflow\python\keras\layers\recurrent.py", line 663, in call
return super(RNN, self).call(inputs, **kwargs)
File "E:\Anaconda35\envs\tensorflow\lib\site-packages\tensorflow\python\keras\engine\base_layer.py", line 926, in call
input_list)
File "E:\Anaconda35\envs\tensorflow\lib\site-packages\tensorflow\python\keras\engine\base_layer.py", line 1117, in _functional_construction_call
outputs = call_fn(cast_inputs, *args, **kwargs)
File "E:\Anaconda35\envs\tensorflow\lib\site-packages\tensorflow\python\keras\layers\recurrent_v2.py", line 441, in call
inputs, initial_state, training, mask, row_lengths)
File "E:\Anaconda35\envs\tensorflow\lib\site-packages\tensorflow\python\keras\layers\recurrent_v2.py", line 501, in _defun_gru_call
**normal_gru_kwargs)
File "E:\Anaconda35\envs\tensorflow\lib\site-packages\tensorflow\python\keras\layers\recurrent_v2.py", line 785, in gru_with_backend_selection
function.register(defun_gpu_gru, **params)
File "E:\Anaconda35\envs\tensorflow\lib\site-packages\tensorflow\python\eager\function.py", line 3239, in register
concrete_func = func.get_concrete_function(*args, **kwargs)
File "E:\Anaconda35\envs\tensorflow\lib\site-packages\tensorflow\python\eager\function.py", line 2939, in get_concrete_function
*args, **kwargs)
File "E:\Anaconda35\envs\tensorflow\lib\site-packages\tensorflow\python\eager\function.py", line 2906, in _get_concrete_function_garbage_collected
graph_function, args, kwargs = self._maybe_define_function(args, kwargs)
File "E:\Anaconda35\envs\tensorflow\lib\site-packages\tensorflow\python\eager\function.py", line 3213, in _maybe_define_function
graph_function = self._create_graph_function(args, kwargs)
File "E:\Anaconda35\envs\tensorflow\lib\site-packages\tensorflow\python\eager\function.py", line 3075, in _create_graph_function
capture_by_value=self._capture_by_value),
File "E:\Anaconda35\envs\tensorflow\lib\site-packages\tensorflow\python\framework\func_graph.py", line 986, in func_graph_from_py_func
func_outputs = python_func(*func_args, **func_kwargs)
File "E:\Anaconda35\envs\tensorflow\lib\site-packages\tensorflow\python\keras\layers\recurrent_v2.py", line 764, in gpu_gru_with_fallback
is_sequence_right_padded(mask, time_major),
File "E:\Anaconda35\envs\tensorflow\lib\site-packages\tensorflow\python\keras\layers\recurrent_v2.py", line 1594, in is_sequence_right_padded
return math_ops.reduce_all(math_ops.equal(mask, right_padded_mask))
File "E:\Anaconda35\envs\tensorflow\lib\site-packages\tensorflow\python\util\dispatch.py", line 201, in wrapper
return target(*args, **kwargs)
File "E:\Anaconda35\envs\tensorflow\lib\site-packages\tensorflow\python\ops\math_ops.py", line 1614, in equal
return gen_math_ops.equal(x, y, name=name)
File "E:\Anaconda35\envs\tensorflow\lib\site-packages\tensorflow\python\ops\gen_math_ops.py", line 3224, in equal
name=name)
File "E:\Anaconda35\envs\tensorflow\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 744, in _apply_op_helper
attrs=attr_protos, op_def=op_def)
File "E:\Anaconda35\envs\tensorflow\lib\site-packages\tensorflow\python\framework\func_graph.py", line 593, in _create_op_internal
compute_device)
File "E:\Anaconda35\envs\tensorflow\lib\site-packages\tensorflow\python\framework\ops.py", line 3485, in _create_op_internal
op_def=op_def)
File "E:\Anaconda35\envs\tensorflow\lib\site-packages\tensorflow\python\framework\ops.py", line 1975, in init
control_input_ops, op_def)
File "E:\Anaconda35\envs\tensorflow\lib\site-packages\tensorflow\python\framework\ops.py", line 1815, in _create_c_op
raise ValueError(str(e))
ValueError: Dimensions must be equal, but are 15 and 100 for '{{node Equal}} = Equal[T=DT_BOOL, incompatible_shape_error=true](mask, SequenceMask/Less)' with input shapes: [?,15,100], [?,100,?].

Process finished with exit code 1

AttributeError: 'DataFrame' object has no attribute 'review'

Using TensorFlow backend.
(111, 1)
Traceback (most recent call last):
File "textClassifierHATT.py", line 55, in
for idx in range(data_train.review.shape[0]):
File "/root/anaconda3/envs/py2/lib/python2.7/site-packages/pandas/core/generic.py", line 3081, in getattr
return object.getattribute(self, name)
AttributeError: 'DataFrame' object has no attribute 'review'

mask zero and activation in HATT

We used this code to build our project, but we found that the accuracy dropped. So we reviewed the code and found the following issues.

  1. This code does not implement masking in the AttLayer class.
  2. We believe the Dense layer should be implemented inside the AttLayer class, instead of using a Dense layer outside of it.
  3. The activation function is missing from the Dense layer.

We made the above changes, and the accuracy increased by 4-5 percentage points over the baseline in our task (text classification).

Here is our AttLayer class; its input is the direct output of the GRU, without an additional Dense layer:

class AttLayer(Layer):
    def __init__(self, attention_dim):
        self.init = initializers.get('normal')
        self.supports_masking = True
        self.attention_dim = attention_dim
        super(AttLayer, self).__init__()

    def build(self, input_shape):
        assert len(input_shape) == 3
        self.W = K.variable(self.init((input_shape[-1], self.attention_dim)))
        self.b = K.variable(self.init((self.attention_dim, )))
        self.u = K.variable(self.init((self.attention_dim, 1)))
        self.trainable_weights = [self.W, self.b, self.u]
        super(AttLayer, self).build(input_shape)

    def compute_mask(self, inputs, mask=None):
        return mask

    def call(self, x, mask=None):
        # size of x :[batch_size, sel_len, attention_dim]
        # size of u :[batch_size, attention_dim]
        # uit = tanh(xW+b)
        uit = K.tanh(K.bias_add(K.dot(x, self.W), self.b))
        ait = K.dot(uit, self.u)
        ait = K.squeeze(ait, -1)

        ait = K.exp(ait)

        if mask is not None:
            # Cast the mask to floatX to avoid float64 upcasting in theano
            ait *= K.cast(mask, K.floatx())
        ait /= K.cast(K.sum(ait, axis=1, keepdims=True) + K.epsilon(), K.floatx())
        ait = K.expand_dims(ait)
        weighted_input = x * ait
        output = K.sum(weighted_input, axis=1)

        return output

    def compute_output_shape(self, input_shape):
        return (input_shape[0], input_shape[-1])

Problem with output of embedding

With a batch size of 50, the output of the embedding should be (50, 15, 100, 100)
for HATT, but your code gives (50, 100, 100). It is taking only one sentence, so I think the whole implementation is wrong. Please check it.

Getting error on TimeDistributed()

Hello, I am getting the following error on the TimeDistributed() layer while running the code in Colab. Please help.

WARNING:tensorflow:The following Variables were used in a Lambda layer's call (tf.nn.bias_add), but are not present in its tracked objects: <tf.Variable 'att_layer_2/b:0' shape=(100,) dtype=float32>. This is a strong indication that the Lambda layer should be rewritten as a subclassed Layer.

NotImplementedError Traceback (most recent call last)
in
6
7 review_input = Input(shape=(MAX_SENTS, MAX_SENT_LENGTH), dtype='int32')
----> 8 review_encoder = TimeDistributed(sentEncoder)(review_input)
9 l_lstm_sent = Bidirectional(GRU(100, return_sequences=True))(review_encoder)
10 l_att_sent = AttLayer(100)(l_lstm_sent)

1 frames
/usr/local/lib/python3.7/dist-packages/keras/engine/base_layer.py in compute_output_shape(self, input_shape)
840 raise NotImplementedError(
841 'Please run in eager mode or implement the compute_output_shape '
--> 842 'method on your layer (%s).' % self.class.name)
843
844 @doc_controls.for_subclass_implementers

NotImplementedError: Exception encountered when calling layer "time_distributed" (type TimeDistributed).

Please run in eager mode or implement the compute_output_shape method on your layer (TFOpLambda).

Call arguments received:
• inputs=tf.Tensor(shape=(None, 15, 100), dtype=int32)
• training=None
• mask=None

the run problem in textClassifierHATT

Hi, when I run textClassifierHATT.py, a dimension problem shows up as below:
l_att_sent = AttLayer(100)(l_lstm_sent)
Traceback (most recent call last):

File "", line 1, in
l_att_sent = AttLayer(100)(l_lstm_sent)

File "C:\Users\GangLyu\Anaconda3\lib\site-packages\keras\engine\base_layer.py", line 457, in call
output = self.call(inputs, **kwargs)

File "", line 31, in call
ait *= K.cast(mask, K.floatx())

File "C:\Users\GangLyu\Anaconda3\lib\site-packages\tensorflow\python\ops\math_ops.py", line 862, in binary_op_wrapper
return func(x, y, name=name)

File "C:\Users\GangLyu\Anaconda3\lib\site-packages\tensorflow\python\ops\math_ops.py", line 1129, in _mul_dispatch
return gen_math_ops.mul(x, y, name=name)

File "C:\Users\GangLyu\Anaconda3\lib\site-packages\tensorflow\python\ops\gen_math_ops.py", line 5357, in mul
"Mul", x=x, y=y, name=name)

File "C:\Users\GangLyu\Anaconda3\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)

File "C:\Users\GangLyu\Anaconda3\lib\site-packages\tensorflow\python\util\deprecation.py", line 488, in new_func
return func(*args, **kwargs)

File "C:\Users\GangLyu\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 3272, in create_op
op_def=op_def)

File "C:\Users\GangLyu\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1790, in init
control_input_ops)

File "C:\Users\GangLyu\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1629, in _create_c_op
raise ValueError(str(e))

ValueError: Dimensions must be equal, but are 15 and 100 for 'att_layer_12/mul' (op: 'Mul') with input shapes: [?,15], [?,15,100].

ValueError

ValueError: Dimension 1 in both shapes must be equal, but are 100 and 10000. Shapes are [?,100] and [?,10000]. for 'bidirectional_20/while/Select' (op: 'Select') with input shapes: [?,10000], [?,100], [?,100].

Memory Problem

I have 32 GB of memory, but when I tried your code, the program takes up all of it after several batches. I am not sure why the program requires so much memory.
My environment is Keras 2.0.6, Python 3.6, CPU mode.

Code to Visualize Attention Weights

Need some help in writing code to obtain and visualize the attention weights like those in the HAN paper (heat maps). To obtain the attention weights, I'm currently thinking of obtaining the hidden representations of the GRUs (h_it) and then manually using h_it to compute the attention weights with the equations from the call function of the attention layer.

layer_name = 'GRU'  # must match the actual name of the GRU layer in the model
intermediate_layer_model = Model(inputs=model.input, outputs=model.get_layer(layer_name).output)
intermediate_output = intermediate_layer_model.predict(input_variable)
h_it = intermediate_output
# use h_it from above to compute the attention weights

If there is a more direct way (direct function call in Keras or some existing code available), it will be helpful.
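A sketch of that manual computation, reusing the learned parameters of the AttLayer shown elsewhere on this page (the model variable, the layer name, and input_variable are assumptions; this targets the sentence-level attention layer of the outer model, since the word-level layer sits inside the TimeDistributed sentence encoder):

import numpy as np
from keras.models import Model

# 1. forward pass up to the input of the attention layer (the GRU hidden states)
att_layer = model.get_layer('att_layer_2')   # assumed layer name
hidden_model = Model(inputs=model.input, outputs=att_layer.input)
h_it = hidden_model.predict(input_variable)  # (batch, timesteps, 2*gru_units)

# 2. recompute the attention weights with the layer's learned W, b, u
W, b, u = att_layer.get_weights()
uit = np.tanh(np.dot(h_it, W) + b)
ait = np.exp(np.dot(uit, u).squeeze(-1))
ait = ait / ait.sum(axis=1, keepdims=True)   # rows sum to 1: one weight per timestep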

pre-processing issue

With the way the pre-processing is written, it seems to leak information into the validation set: the word index is created using data from both the training and validation sets. When test samples come in, they won't have the same treatment. Does anyone else agree?
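One way to avoid the leak, sketched under the assumption that the raw texts have already been split before any tokenization (variable names are illustrative):

from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

MAX_NB_WORDS, MAX_SEQ_LEN = 20000, 1000  # assumed limits

tokenizer = Tokenizer(num_words=MAX_NB_WORDS)
tokenizer.fit_on_texts(train_texts)      # word index built from the training split only

x_train = pad_sequences(tokenizer.texts_to_sequences(train_texts), maxlen=MAX_SEQ_LEN)
x_val = pad_sequences(tokenizer.texts_to_sequences(val_texts), maxlen=MAX_SEQ_LEN)
# words never seen in training are dropped, matching how unseen test data would be treated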

save model error

In textClassifierHATT.py, I try to save the model using the following callback:
mcp = ModelCheckpoint('HANmodel_weights.h5', monitor="val_acc", save_best_only=True, save_weights_only=False)
model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=20, batch_size=50, callbacks = [mcp])
But the following error occurred
...
RuntimeError: Unable to create link (Name already exists)

There are suggestions elsewhere that non-unique layer names cause this problem, but I haven't seen any duplicate names in this model.
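That HDF5 error generally means two weight tensors end up with the same name when the file is written. One hedged thing to try is giving the AttLayer weights explicit, per-layer names when they are created; a sketch of the build method with only the name arguments added:

    def build(self, input_shape):
        assert len(input_shape) == 3
        # name each variable after the layer so the weight names stay unique in the HDF5 file
        self.W = K.variable(self.init((input_shape[-1], self.attention_dim)),
                            name='{}_W'.format(self.name))
        self.b = K.variable(self.init((self.attention_dim, )),
                            name='{}_b'.format(self.name))
        self.u = K.variable(self.init((self.attention_dim, 1)),
                            name='{}_u'.format(self.name))
        self.trainable_weights = [self.W, self.b, self.u]
        super(AttLayer, self).build(input_shape)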

Dimensions must be equal, but are 15 and 100 for 'att_layer_2/mul' (op: 'Mul') with input shapes: [?,15], [?,15,100].

When I ran python textClassifierHATT.py in Anaconda, I got this error:
Using TensorFlow backend.
(25000, 3)
textClassifierHATT.py:56: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("html.parser"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

The code that caused this warning is on line 56 of the file textClassifierHATT.py. To get rid of this warning, pass the additional argument 'features="html.parser"' to the BeautifulSoup constructor.

text = BeautifulSoup(data_train.review[idx])
/home/user/anaconda3/envs/py27_env/lib/python2.7/site-packages/keras_preprocessing/text.py:177: UserWarning: The nb_words argument in Tokenizer has been renamed num_words.
warnings.warn('The nb_words argument in Tokenizer '
Total 80568 unique tokens.
('Shape of data tensor:', (25000, 15, 100))
('Shape of label tensor:', (25000, 2))
Number of positive and negative reviews in traing and validation set
[10026. 9974.]
[2474. 2526.]
Total 400000 word vectors.
2018-11-20 09:49:08.166457: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Traceback (most recent call last):
File "textClassifierHATT.py", line 188, in
l_att_sent = AttLayer(100)(l_lstm_sent)
File "/home/user/anaconda3/envs/py27_env/lib/python2.7/site-packages/keras/engine/base_layer.py", line 457, in call
output = self.call(inputs, **kwargs)
File "textClassifierHATT.py", line 167, in call
ait *= K.cast(mask, K.floatx())
File "/home/user/anaconda3/envs/py27_env/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py", line 866, in binary_op_wrapper
return func(x, y, name=name)
File "/home/user/anaconda3/envs/py27_env/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py", line 1131, in _mul_dispatch
return gen_math_ops.mul(x, y, name=name)
File "/home/user/anaconda3/envs/py27_env/lib/python2.7/site-packages/tensorflow/python/ops/gen_math_ops.py", line 5042, in mul
"Mul", x=x, y=y, name=name)
File "/home/user/anaconda3/envs/py27_env/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/user/anaconda3/envs/py27_env/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/home/user/anaconda3/envs/py27_env/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
op_def=op_def)
File "/home/user/anaconda3/envs/py27_env/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1792, in init
control_input_ops)
File "/home/user/anaconda3/envs/py27_env/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1631, in _create_c_op
raise ValueError(str(e))
ValueError: Dimensions must be equal, but are 15 and 100 for 'att_layer_2/mul' (op: 'Mul') with input shapes: [?,15], [?,15,100].
What went wrong? Can anyone help me?

Performance on Yelp 2015 (HAN)

I cannot get the same result as the paper reports.
I used the same dataset (download link: http://ir.hit.edu.cn/~dytang/paper/emnlp2015/emnlp-2015-data.7z), but I can only get 68.5% on Yelp 2015 (the paper reports 71%). Is there anything wrong with my parameters? Here are my parameters:
vocab_size: 49000 (Byte-Pair-Encoding with 50000 byte pairs; all tokens that appear no less than 5 times)
learning_rate: 0.001
max tokens in a sentence: 48 (over 95% sentences are shorter than 48 tokens)
max sentences in a document: 32 (over 95% docs are shorter than 32 sentences)
word_embedding_size: 300 (pre-trained with word2vec)
word_output_size: 128
sentence_output_size: 128
LSTM hidden_dim: 64
LSTM layer_num: 5
dropout_keep_prob: 0.8 (using tf.nn.dropout, add dropout after word_output and sentence_output)

Not able to train HAN because of the following error.

sentence_input = Input(shape=(MAX_SENT_LENGTH,), dtype='int32')
embedded_sequences = embedding_layer(sentence_input)
l_lstm = Bidirectional(GRU(100, return_sequences=True))(embedded_sequences)
l_att = AttLayer(100)(l_lstm)
sentEncoder = Model(sentence_input, l_att)

review_input = Input(shape=(MAX_SENTS, MAX_SENT_LENGTH), dtype='int32')
review_encoder = TimeDistributed(sentEncoder)(review_input)
l_lstm_sent = Bidirectional(GRU(100, return_sequences=True))(review_encoder)
l_att_sent = AttLayer(100)(l_lstm_sent)
preds = Dense(2, activation='softmax')(l_att_sent)
model = Model(review_input, preds)

model.compile(loss='categorical_crossentropy',
optimizer='rmsprop',
metrics=['acc'])

print("model fitting - Hierachical attention network")

Error
ValueError: Dimensions must be equal, but are 15 and 100 for 'att_layer_10/mul' (op: 'Mul') with input shapes: [?,15], [?,15,100].

Hatt error

Training is fine; however, I got an error after training and I don't know how to solve it. Could anyone lend me a hand?
Epoch 10/10
21054/21054 [==============================] - 257s - loss: 0.3286 - acc: 0.8863 - val_loss: 0.3596 - val_acc: 0.8643

Traceback (most recent call last):
File "Hatt_modify.py", line 231, in
l_dense_sent = TimeDistributed(Dense(200))(l_lstm_sent)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 586, in call
self.assert_input_compatibility(inputs)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 457, in assert_input_compatibility
str(K.ndim(x)))
ValueError: Input 0 is incompatible with layer time_distributed_4: expected ndim=3, found ndim=4

there is no one-layer MLP in attention Layer

Hello, thank you for the excellent code for CNN, LSTM, and HAN; I have learnt a lot from it.
One question, though:
In the paper, I find there is a one-layer MLP in the attention layer, but I cannot see this in your implemented AttLayer; in the class AttLayer there is only the context vector. Could you explain it for me?
