
airalcorn2 / recurrent-convolutional-neural-network-text-classifier


My (slightly modified) Keras implementation of the Recurrent Convolutional Neural Network (RCNN) described here: http://www.aaai.org/ocs/index.php/AAAI/AAAI15/paper/view/9745.

License: MIT License

Python 100.00%
classification deep-learning keras natural-language-processing nlp

recurrent-convolutional-neural-network-text-classifier's Introduction

Recurrent Convolutional Neural Network Text Classifier

My (slightly modified) Keras implementation of the Recurrent Convolutional Neural Network (RCNN) described here.

recurrent-convolutional-neural-network-text-classifier's Issues

Something confusing about the input

# We shift the document to the right to obtain the left-side contexts.
left_context_as_array = np.array([[MAX_TOKENS] + tokens[:-1]])
# We shift the document to the left to obtain the right-side contexts.
right_context_as_array = np.array([tokens[1:] + [MAX_TOKENS]])

Would you mind explaining why this is done? I think the code should look like this:
left_context_as_array = np.array([token[:]]) and
right_context_as_array = np.array([token[::-1]]).
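One possible reading of the shift, offered as a sketch rather than the author's stated reasoning: the left context of the word at position i should be built from the words before it, so the left-context input at position i is the previous token, and the right-context input is the next token. The padding index fills the slot that has no neighbour. The helper name `shift_contexts` and the value chosen for `MAX_TOKENS` below are assumptions made for illustration.

```python
MAX_TOKENS = 5  # hypothetical padding index, one past the largest real token id


def shift_contexts(tokens, pad_index):
    """Return (left, right) inputs aligned position by position with `tokens`."""
    # Shift right: position i now holds token i-1 (the word to its left).
    left = [pad_index] + tokens[:-1]
    # Shift left: position i now holds token i+1 (the word to its right).
    right = tokens[1:] + [pad_index]
    return left, right


tokens = [1, 2, 3, 4]
left, right = shift_contexts(tokens, MAX_TOKENS)
for word, l, r in zip(tokens, left, right):
    print("word", word, "previous", l, "next", r)
```

By contrast, `token[:]` would feed each word itself into its own left context, and `token[::-1]` would reverse the whole document, so position i would no longer line up with the word to its right.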

About the performance

Hi,

Thank you for sharing the code. I ran your code on the IMDB dataset, and the accuracy was 0.5. I wonder why this happened. Have you run the model on any dataset to test its performance? Thanks.

Performance on different cards

Hello, I ran your code successfully on my PC and noticed something strange: it seems to run faster on a K80 than on a Titan Xp. Do you know the reason? Looking forward to your answer!
Thanks!

Different embeddings must be used

It seems that the same embeddings are used for the right context, the left context and the word embedding. The training section of the paper states clearly that these are different parameters,

so it should probably be something like


document      = Input(shape=(None,), dtype = "int32")

document_embedding       = Embedding(vocab_size, WORD_EMB_SIZE, weights=[initial_embeddings],
                                     input_length=DOC_SEQ_LEN, trainable=True)(document)
left_context_embedding   = Embedding(vocab_size, WORD_EMB_SIZE, weights=[left_context_embeddings],
                                     input_length=DOC_SEQ_LEN, trainable=True)(document)
right_context_embedding  = Embedding(vocab_size, WORD_EMB_SIZE, weights=[right_context_embeddings],
                                     input_length=DOC_SEQ_LEN, trainable=True)(document)

Question about Equations (1) and (2)

Thanks for your code! However, I have a question about the implementation of Equations (1) and (2). Do Equations (1) and (2) just describe an ordinary RNN, which you have replaced with an LSTM?

TypeError: Expected int32, got list containing Tensors of type '_Message' instead

I get this error using tensorflow as backend.
Traceback (most recent call last):
File "test1.py", line 35, in
forward = LSTM(hidden_dim_1, return_sequences = True)(l_embedding) # See equation (1).
File "/home/s/anaconda2/lib/python2.7/site-packages/keras/layers/recurrent.py", line 243, in __call__
return super(Recurrent, self).__call__(inputs, **kwargs)
File "/home/s/anaconda2/lib/python2.7/site-packages/keras/engine/topology.py", line 558, in __call__
self.build(input_shapes[0])
File "/home/s/anaconda2/lib/python2.7/site-packages/keras/layers/recurrent.py", line 1012, in build
constraint=self.bias_constraint)
File "/home/s/anaconda2/lib/python2.7/site-packages/keras/legacy/interfaces.py", line 88, in wrapper
return func(*args, **kwargs)
File "/home/s/anaconda2/lib/python2.7/site-packages/keras/engine/topology.py", line 391, in add_weight
weight = K.variable(initializer(shape), dtype=dtype, name=name)
File "/home/s/anaconda2/lib/python2.7/site-packages/keras/layers/recurrent.py", line 1004, in bias_initializer
self.bias_initializer((self.units * 2,), *args, **kwargs),
File "/home/s/anaconda2/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 1681, in concatenate
return tf.concat([to_dense(x) for x in tensors], axis)
File "/home/s/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 1075, in concat
dtype=dtypes.int32).get_shape(
File "/home/s/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 669, in convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/home/s/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/constant_op.py", line 176, in _constant_tensor_conversion_function
return constant(v, dtype=dtype, name=name)
File "/home/s/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/constant_op.py", line 165, in constant
tensor_util.make_tensor_proto(value, dtype=dtype, shape=shape, verify_shape=verify_shape))
File "/home/s/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/tensor_util.py", line 367, in make_tensor_proto
_AssertCompatible(values, dtype)
File "/home/s/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/tensor_util.py", line 302, in _AssertCompatible
(dtype.name, repr(mismatch), type(mismatch).__name__))
TypeError: Expected int32, got list containing Tensors of type '_Message' instead.

TypeError: maketrans() takes exactly 2 arguments (1 given)

When running the code, I get this error:
TypeError: maketrans() takes exactly 2 arguments (1 given)

at this line of code:
text = text.strip().lower().translate(string.maketrans({key: " {0} ".format(key) for key in string.punctuation}))

How can I resolve this?
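For context, `string.maketrans` is the Python 2 function and always expects two equal-length strings, whereas the single-dictionary form exists only on Python 3's `str.maketrans`. A minimal sketch of the Python 3 form follows, assuming the goal is to surround each punctuation character with spaces so it becomes its own token; the function name `pad_punctuation` is made up for this example.

```python
import string


def pad_punctuation(text):
    """Lower-case the text and put a space on each side of every punctuation mark."""
    # Python 3 only: str.maketrans accepts a dict mapping single characters
    # to replacement strings of any length.
    table = str.maketrans({key: " {0} ".format(key) for key in string.punctuation})
    return text.strip().lower().translate(table)


print(pad_punctuation("Hello, world!").split())
```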

AttributeError: type object 'str' has no attribute 'maketrans'

You meant string.maketrans?

  File "recurrent_convolutional_keras.py", line 53, in <module>
    text = text.strip().lower().translate(str.maketrans({key: " {0} ".format(key) for key in string.punctuation}))
AttributeError: type object 'str' has no attribute 'maketrans'
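This second error is what Python 2 reports, since `str.maketrans` was only added in Python 3. If the script has to run under either interpreter, one option is to avoid `maketrans` altogether. The sketch below swaps in a regular expression for the translation table; it is an alternative technique, not the repository's code, and the names `_PUNCTUATION` and `pad_punctuation` are hypothetical.

```python
import re
import string

# A character class matching any single ASCII punctuation character.
# re.escape guards the characters that are special inside a class (], \, ^, -).
_PUNCTUATION = re.compile("([{0}])".format(re.escape(string.punctuation)))


def pad_punctuation(text):
    """Lower-case the text and put a space on each side of every punctuation mark."""
    return _PUNCTUATION.sub(r" \1 ", text.strip().lower())


print(pad_punctuation("Hello, world!").split())
```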

Does your code also support labelled data?

Hello!
Your code is really good, but it seems that your project doesn't support labelled text classification, so I wonder how I can modify your code to support supervised classification?
Thanks!

NameError: global name 'backend' is not defined

Hi, I tried to run your code and found the following error:

File "recurrent_convolutional_keras.py", line 44, in <lambda>
    pool_rnn = Lambda(lambda x: backend.max(x, axis = 1), output_shape = (hidden_dim_2, ))(semantic) # See equation (5).
NameError: global name 'backend' is not defined

After installing backend by running pip install backend (and importing backend into recurrent_convolutional_keras.py), I got another error

File "recurrent_convolutional_keras.py", line 45, in <lambda>
    pool_rnn = Lambda(lambda x: backend.max(x, axis = 1), output_shape = (hidden_dim_2, ))(semantic) # See equation (5).
AttributeError: 'module' object has no attribute 'max'

Could you provide more information about what the backend module is? Thanks
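The `backend` name in that line most likely refers to Keras's own backend module, which is normally brought in with `from keras import backend`, rather than to the unrelated `backend` package on PyPI (which would explain why it has no `max`). To show what `backend.max(x, axis = 1)` is expected to compute, here is a NumPy stand-in that pools over the time axis; the array values and the function name `max_over_time` are invented for illustration.

```python
import numpy as np


def max_over_time(semantic):
    """Max-pool a (batch, time_steps, hidden) array over the time axis."""
    # Keeps, for each hidden unit, the largest activation seen at any position.
    return semantic.max(axis=1)


# One document, three time steps, two hidden units.
semantic = np.array([[[0.1, 0.9],
                      [0.7, 0.2],
                      [0.3, 0.4]]])
pooled = max_over_time(semantic)
print(pooled.shape)
```

The result has one row per document and one column per hidden unit, which matches the `output_shape = (hidden_dim_2, )` given to the `Lambda` layer.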

Training data

I don't know the structure of word2vec.gensim. Could you explain it?

input format

Hi,
In your code you don't specify how you embed the sentence / document, and there's no use of word2vec aside from indicating the embedding size. I imagine the part
doc_as_array = np.array([[1, 2, 3, 4]])
left_context_as_array = np.array([[MAX_TOKENS, 1, 2, 3]])
right_context_as_array = np.array([[2, 3, 4, MAX_TOKENS]])
is merely an example - but how do you actually go about embedding and feeding a sentence or document to the network?

Thanks
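A rough sketch of how raw text might be turned into the three index arrays, under several assumptions: the vocabulary is a plain dictionary from word to row index in the embedding matrix, words missing from it fall back to the padding index, and tokenisation is a simple whitespace split. None of these names (`encode`, `vocab`, `pad_index`) come from the repository.

```python
def encode(text, vocab, pad_index):
    """Map a text to (document, left_context, right_context) index lists."""
    tokens = text.strip().lower().split()
    # Unknown words are sent to the padding index (an assumption of this sketch).
    doc = [vocab.get(token, pad_index) for token in tokens]
    left = [pad_index] + doc[:-1]   # previous word at each position
    right = doc[1:] + [pad_index]   # next word at each position
    return doc, left, right


vocab = {"the": 0, "movie": 1, "was": 2, "great": 3}  # hypothetical vocabulary
pad_index = len(vocab)                                 # plays the role of MAX_TOKENS
doc, left, right = encode("The movie was great", vocab, pad_index)
print(doc, left, right)
```

Each list would then be wrapped as a batch of one, for example `np.array([doc])`, to match the shape of the arrays in the example above. The embedding lookup itself happens inside the network's `Embedding` layer, which is where pretrained word2vec vectors would be supplied as initial weights.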

Getting error while running word2vec.gensi

Hello,
I've tried to run your code but I get this error:
FileNotFoundError: [Errno 2] No such file or directory: 'word2vec.gensim'

while running word2vec = gensim.models.Word2Vec.load("word2vec.gensim")

Please help me fix this error.

training with multiple documents at once

Thank you for posting this code on GitHub. It works for me as is, but I was looking into batch training with this network. Do you know how to approach this if the documents being classified have variable lengths? I was considering padding the inputs to the same size, but since my documents vary hugely in length, many of them would be heavily padded. I am looking for a better approach.
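One common way to limit padding, sketched here as a suggestion rather than something the repository provides, is to bucket documents of similar length and pad only within each batch. The function names, the bucket width and the batch size below are all arbitrary choices for illustration.

```python
from collections import defaultdict


def bucket_by_length(docs, bucket_width=10):
    """Group token-id lists so that lengths within a bucket differ by < bucket_width."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[(len(doc) - 1) // bucket_width].append(doc)
    return buckets


def pad_batch(batch, pad_index):
    """Pad every document in the batch to the length of its longest member."""
    longest = max(len(doc) for doc in batch)
    return [doc + [pad_index] * (longest - len(doc)) for doc in batch]


def make_batches(docs, pad_index, bucket_width=10, batch_size=32):
    """Return padded batches in which documents have similar lengths."""
    buckets = bucket_by_length(docs, bucket_width)
    batches = []
    for key in sorted(buckets):
        group = buckets[key]
        for start in range(0, len(group), batch_size):
            batches.append(pad_batch(group[start:start + batch_size], pad_index))
    return batches


docs = [[1], [1, 2], [1] * 15, [1] * 12]
batches = make_batches(docs, pad_index=0)
print([len(batch[0]) for batch in batches])
```

With this scheme a two-token document is never padded out to the length of a fifteen-token one. Whether the padded positions also need to be masked inside the network is a separate question that this sketch does not address.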

model performance

I tried to reproduce the performance of this network on the 20NewsGroups dataset. I used Google's pretrained word2vec embeddings with a vector dimensionality of 300, which was the only difference from the paper apart from using LSTMs instead of the paper's bidirectional RNNs.

During training, I tracked the ROC AUC on the validation set.

epoch 1: roc-auc on val - 0.840
epoch 2: roc-auc on val - 0.995
epoch 3: roc-auc on val - 0.996
epoch 4: roc-auc on val - 0.998
epoch 5: roc-auc on val - 0.997
epoch 6: roc-auc on val - 0.998
epoch 7: roc-auc on val - 0.997

At model test time, after 7 epochs, my f1 macro score was 0.91, compared to 0.9649 reported by the paper.

Have you run into issues with the model matching the paper? If so, any ideas what could be causing the discrepancy? And do you think I am training this network for too many epochs?

Much appreciated!

‘recurrent’ in the paper doesn't seem to involve LSTM

Hi, thanks for sharing your implementation of the paper "RCNN for text classification". I cloned this repo and tried it on my text classification task, but the performance was not what I expected. I am not sure whether that is due to my data pre-processing or to the model implementation.

After looking through the paper, I found that Equations (1) and (2) give the computation of cl and cr:
[image: Equations (1) and (2) from the paper]

where cr and cl are the result of a simple matrix multiplication followed by an activation function. In this code, however, the matrix multiplications are replaced with LSTM cells. Has that been shown to be more effective than the original formulation?

thanks.
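For comparison, here is a NumPy sketch of the plain recurrence as I understand Equations (1) and (2) of the paper: each context vector is an activation applied to a linear map of the neighbouring context plus a linear map of the neighbouring word embedding. The parameter names (`W_l`, `W_sl`, `W_r`, `W_sr`, `c_init`), the choice of `tanh`, and the random values are assumptions of this sketch, not taken from the repository.

```python
import numpy as np


def left_contexts(embeddings, W_l, W_sl, c_init, f=np.tanh):
    """c_l(w_i) = f(W_l . c_l(w_{i-1}) + W_sl . e(w_{i-1})), scanning left to right."""
    contexts = [c_init]
    for i in range(1, len(embeddings)):
        contexts.append(f(np.dot(W_l, contexts[i - 1]) + np.dot(W_sl, embeddings[i - 1])))
    return np.stack(contexts)


def right_contexts(embeddings, W_r, W_sr, c_init, f=np.tanh):
    """c_r(w_i) = f(W_r . c_r(w_{i+1}) + W_sr . e(w_{i+1})), scanning right to left."""
    n = len(embeddings)
    contexts = [None] * n
    contexts[n - 1] = c_init
    for i in range(n - 2, -1, -1):
        contexts[i] = f(np.dot(W_r, contexts[i + 1]) + np.dot(W_sr, embeddings[i + 1]))
    return np.stack(contexts)


rng = np.random.RandomState(0)
emb_dim, ctx_dim, n_words = 4, 3, 5
E = rng.randn(n_words, emb_dim)      # one embedding per word in the document
W_c = rng.randn(ctx_dim, ctx_dim)    # context-to-context weights
W_e = rng.randn(ctx_dim, emb_dim)    # embedding-to-context weights
c0 = np.zeros(ctx_dim)               # context for the word with no neighbour

c_left = left_contexts(E, W_c, W_e, c0)
c_right = right_contexts(E, W_c, W_e, c0)
print(c_left.shape, c_right.shape)
```

An LSTM fed with the shifted inputs follows the same left-to-right and right-to-left pattern but adds gating, so it is a richer recurrence than the one sketched here rather than an equivalent one.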
