py_crepe's Introduction

py_crepe

This is a re-implementation of the Crepe character-level convolutional neural network model described in this paper. The re-implementation uses the Python library Keras (v2.1.5) with the TensorFlow-GPU (v1.6) backend. Because Keras sits on top of TensorFlow, it can make use of cuDNN for faster training.

Details

The model implemented here follows the one described in the paper rather closely: it uses the same fixed vocabulary and the same stochastic gradient descent optimizer. Running the model for 9 epochs returns results in line with those reported in the paper.

It's relatively easy to modify the code to use a different vocabulary (e.g., one learned from the training data) or a different optimizer such as Adam.
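
For example, a vocabulary could be learned from the training corpus instead of using the paper's fixed alphabet. A minimal sketch (function and variable names here are illustrative, not from this repo):

```python
from collections import Counter

def build_vocab(texts, max_size=None):
    """Map each character seen in the corpus to an integer index.
    Index 0 is left free for unknown/padding characters."""
    counts = Counter(ch for text in texts for ch in text.lower())
    chars = [ch for ch, _ in counts.most_common(max_size)]
    return {ch: i + 1 for i, ch in enumerate(chars)}

texts = ["Wall St. Bears Claw Back", "Oil prices rise"]
char_to_idx = build_vocab(texts)
```

Passing `max_size` caps the vocabulary at the most frequent characters, which keeps the one-hot dimension small on noisy corpora.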

Dataset

https://github.com/mhjabreel/CharCNN/tree/master/data/ag_news_csv

Running

The usual command to run the model (with the tensorflow-gpu backend) is:

python main.py

Score

With the current code, the model reaches a level of accuracy similar to that stated in the paper: test accuracy above 0.88 after 10 epochs of training (with the Adam(lr=0.001) optimizer).

Note

We had to specify the kernel_initializer for all the Convolution1D layers as RandomNormal(mean=0.0, stddev=0.05, seed=None); with the default initializer in Keras (v2.1.5), the model does not converge. We don't know the exact reason, but the lesson learned is that initialization matters!
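
A pure-numpy sketch of what that initializer draws (the kernel shape below is the Crepe first-layer configuration, assumed here for illustration): every weight is sampled i.i.d. from N(0, 0.05²), independent of the layer's fan-in, whereas the Keras 2.x default (glorot_uniform) uses a fan-in-dependent scale.

```python
import numpy as np

# What kernel_initializer=RandomNormal(mean=0.0, stddev=0.05) produces:
# i.i.d. draws from N(0, 0.05^2), regardless of layer width.
rng = np.random.RandomState(0)
kernel_shape = (7, 69, 256)  # (filter_length, vocab_size, nb_filter), Crepe's first layer
weights = rng.normal(loc=0.0, scale=0.05, size=kernel_shape)

print(weights.mean(), weights.std())  # close to 0.0 and 0.05
```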

py_crepe's People

Contributors

johnb30, liufuyang


py_crepe's Issues

Adapt for another Language

Hi,

I'm really new to machine learning, though I'm very experienced in computational linguistics. I'd like to know whether I could adapt your Python implementation of Crepe for German, and how I would do that. Assuming I have Keras and all related software installed, I'd also like to know how I can get a prediction for an unseen sentence with your software. Can you give me an example of that?

It would be great!

Thank you in advance!

J
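
A reply sketch (all names here are hypothetical, not from this repo): adapting to German mainly means extending the alphabet with ä, ö, ü, ß and retraining; to classify an unseen sentence, one-hot encode it exactly as at training time and pass the tensor to the trained model.

```python
import numpy as np

# Abridged alphabet for the sketch; the real one should match training.
alphabet = "abcdefghijklmnopqrstuvwxyzäöüß0123456789.,;:!?"
char_to_idx = {ch: i for i, ch in enumerate(alphabet)}
maxlen = 1014  # input length from the paper

def encode(sentence, maxlen=maxlen):
    """One-hot encode a sentence into the (1, maxlen, vocab) tensor the model expects."""
    x = np.zeros((1, maxlen, len(alphabet)), dtype=np.uint8)
    for t, ch in enumerate(sentence.lower()[:maxlen]):
        if ch in char_to_idx:  # characters outside the alphabet stay all-zero
            x[0, t, char_to_idx[ch]] = 1
    return x

x = encode("Das ist ein Test.")
# With a trained Keras model: probs = model.predict(x); label = probs.argmax()
```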

Setting a type into the zeros matrix

I am trying to run the network on the "Amazon Review Polarity" dataset, but Python can't create a cubic matrix with dimensions (400000, 1014, 69); it throws a MemoryError. Only after I changed np.zeros((len(x), maxlen, vocab_size)) to np.zeros((len(x), maxlen, vocab_size), dtype=np.uint8) did Python manage to create the matrix, although it still used more than 20 GB.

Given that the matrix is a one-hot representation, it only needs 0s and 1s, not floats (as it was being created before). Can I safely assume it will still work correctly? I just changed the type from float to a smaller int; am I missing something?
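
The memory arithmetic bears this out, and for a one-hot input uint8 should be safe: the values are only 0 and 1, and Keras casts input arrays to its float dtype before computation. A sketch:

```python
import numpy as np

# Memory footprint of the (400000, 1014, 69) one-hot tensor:
# numpy's default dtype is float64 (8 bytes/entry); uint8 needs 1 byte.
n_entries = 400000 * 1014 * 69
gb = 1024 ** 3
print(n_entries * 8 / gb)  # float64: ~208 GB, hence the MemoryError
print(n_entries * 1 / gb)  # uint8:   ~26 GB, matching the >20 GB observed

# uint8 one-hot values survive the upcast to float unchanged.
x = np.zeros((2, 5, 3), dtype=np.uint8)
x[0, 0, 1] = 1
assert x.astype("float32").sum() == 1.0
```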

Accuracy for the ag_news set stays at ~25%?

Hey there,

I've been looking at the Text Understanding from Scratch paper and am attempting to re-implement using Keras. I stumbled across your github during my research.

I cloned the repository and tried to run your code for the AG News dataset (downloaded from here: https://drive.google.com/drive/folders/0Bz8a_Dbh9Qhbfll6bVpmNUtUcFdjYmF2SEpmZUZUcVNiMUw1TWN6RDV3a0JHT3kxLVhVR2M)

It seems, however, that the accuracy plateaus at around 25%.

  Step: 100
	Loss: 1.38648176193. Accuracy: 0.254765625
  Step: 200
	Loss: 1.38646401644. Accuracy: 0.25125
  Step: 300
	Loss: 1.38643943111. Accuracy: 0.251510416667
  Step: 400
	Loss: 1.38645074219. Accuracy: 0.2502734375
  Step: 500
	Loss: 1.38647514749. Accuracy: 0.25084375
  Step: 600
	Loss: 1.38645878375. Accuracy: 0.251015625
  Step: 700
	Loss: 1.38649008172. Accuracy: 0.251238839286
  Step: 800
	Loss: 1.38650537863. Accuracy: 0.250751953125
  Step: 900
	Loss: 1.38651740736. Accuracy: 0.250251736111
Epoch 0. Loss: 1.38653714458. Accuracy: 0.250217014054
Epoch time: 0:24:32.763016. Total time: 0:24:35.527443

Epoch: 1
  Step: 100
	Loss: 1.38628291965. Accuracy: 0.255078125
  Step: 200
	Loss: 1.38648118854. Accuracy: 0.2527734375
  Step: 300
	Loss: 1.38653985461. Accuracy: 0.252526041667
  Step: 400
	Loss: 1.38657643467. Accuracy: 0.25287109375
  Step: 500
	Loss: 1.38657280207. Accuracy: 0.252375
  Step: 600
	Loss: 1.38655447821. Accuracy: 0.251770833333
  Step: 700
	Loss: 1.38658778497. Accuracy: 0.25125
  Step: 800
	Loss: 1.38658027261. Accuracy: 0.250205078125
  Step: 900
	Loss: 1.38657693717. Accuracy: 0.250052083333
Epoch 1. Loss: 1.3863426288. Accuracy: 0.249565972139
Epoch time: 0:26:23.999443. Total time: 0:51:02.722974

Epoch: 2
  Step: 100
	Loss: 1.38650754809. Accuracy: 0.25046875
  Step: 200
	Loss: 1.38648163974. Accuracy: 0.2524609375
  Step: 300
	Loss: 1.38651079893. Accuracy: 0.2515625
  Step: 400
	Loss: 1.38658412933. Accuracy: 0.2498828125
  Step: 500
	Loss: 1.38661288977. Accuracy: 0.248421875
  Step: 600
	Loss: 1.38656823456. Accuracy: 0.2479296875
  Step: 700
	Loss: 1.38656216434. Accuracy: 0.248359375
  Step: 800
	Loss: 1.38656521723. Accuracy: 0.24884765625
  Step: 900
	Loss: 1.38658299618. Accuracy: 0.248802083333
Epoch 2. Loss: 1.38656448523. Accuracy: 0.250651041667
Epoch time: 0:24:38.453619. Total time: 1:15:44.417696

Epoch: 3
  Step: 100
	Loss: 1.38659875035. Accuracy: 0.253203125
  Step: 200
	Loss: 1.38658913493. Accuracy: 0.250703125
  Step: 300
	Loss: 1.38660971284. Accuracy: 0.249505208333
  Step: 400
	Loss: 1.38659906924. Accuracy: 0.250078125
  Step: 500
	Loss: 1.38659796071. Accuracy: 0.25034375

I am using a newer version of Keras (2.0.3), also with the TensorFlow backend, but given that I didn't modify your code in any other way (apart from the paths to the training/test data), I am unclear why this is happening.

I will do some further testing with the Theano backend and an older version of Keras to see if I can replicate your results.

However, in the meantime, I am just curious whether this is something you encountered at all when you were writing this code.

At least it's consistent with my own implementation, which also converges to 25% accuracy and stays there. :|

issue about accuracy on test set

Hello,
First of all, thanks for your implementation.

I just ran the code for 10 epochs (the default setting in main.py) and the accuracy I got on the test set was 25%; the model predicts completely at random, not at the accuracy reported in the paper!

Which part of the code should I change to reproduce the paper's results?

It would be great if you could explain the reason.

With regards

some problems in main.py

Dear John,
I have some problems with main.py.
I don't understand this line: np.random.seed(0123)  # for reproducibility
Python gives me an error about 0123. What should I do?
Thanks.
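
The error is a Python-version issue, not a Keras one: 0123 is an octal literal in Python 2 (value 83), but in Python 3 a leading zero on a decimal literal is a SyntaxError; octal now requires the 0o prefix. A minimal fix:

```python
import numpy as np

# `0123` is octal 83 in Python 2 but a SyntaxError in Python 3.
# Use a plain decimal seed:
np.random.seed(123)  # for reproducibility

# or, to keep the exact value the Python 2 code used:
np.random.seed(0o123)  # 0o123 == 83
```

Either seed works; the value only matters for reproducing an identical run.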

How long does it take to run another dataset?

Hello, I am a student interested in CNNs.
I'd like to run the other datasets from this paper.
However, computing ag_news on a GPU (Titan X) with other Keras code took a long time.
How long did it take you to train?
If you used a different dataset, how did you solve the memory error, and how long did that take?

Thank you very much..

Convergence

Hi,
thank you for this awesome Keras implementation! I just tried to run the code example and the model doesn't seem to converge. After 7 epochs, the loss is:
Loss: 1.38647970024. Accuracy: 0.250000002667
I obtained the AG News dataset from https://github.com/mhjabreel/CharCNN. The xt, yt matrices seem to be OK. Can someone recheck whether the model is converging? I tried both backends, TensorFlow and Theano, with the same result.
Thanks in advance!
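
A plateau at exactly 0.25 on AG News (four balanced classes) is chance level, which usually means either the labels are broken or the weights never leave their initial scale. A hedged sanity check for the first cause; `yt` below is a random stand-in for the real label matrix loaded in main.py:

```python
import numpy as np

# Stand-in for the real training labels; replace with the yt from main.py.
yt = np.eye(4)[np.random.randint(0, 4, size=1000)]

assert yt.shape[1] == 4              # one column per AG News class
assert np.all(yt.sum(axis=1) == 1)   # exactly one hot entry per row
print(yt.mean(axis=0))               # per-class frequency; ~0.25 each if balanced
```

If the labels check out, the README's note about initialization applies: with the Keras 2.x default initializer the model can fail to converge, and setting RandomNormal(mean=0.0, stddev=0.05) on the Convolution1D layers fixed it.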

something about the output

Hi, thanks for your source code.

I found that in main.py the output items include test_loss and test_accuracy; should they perhaps be test_loss_avg and test_accuracy_avg?
