character-level-cnn-pytorch's Introduction

[PYTORCH] Character-level Convolutional Networks for Text Classification

Introduction

Here is my PyTorch implementation of the model described in the paper Character-level Convolutional Networks for Text Classification.

Datasets:

Statistics of the datasets I used for experiments. These datasets can be downloaded from link

Dataset Classes Train samples Test samples
AG’s News 4 120 000 7 600
Sogou News 5 450 000 60 000
DBPedia 14 560 000 70 000
Yelp Review Polarity 2 560 000 38 000
Yelp Review Full 5 650 000 50 000
Yahoo! Answers 10 1 400 000 60 000
Amazon Review Full 5 3 000 000 650 000
Amazon Review Polarity 2 3 600 000 400 000

Setting:

I mostly keep the default settings described in the paper. For the optimizer and learning rate, there are 2 settings I use:

  • SGD optimizer with initial learning rate of 0.01. The learning rate is halved every 3 epochs.
  • Adam optimizer with initial learning rate of 0.001.
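The two settings above can be sketched with standard PyTorch optimizers and a step scheduler (a minimal sketch; the `nn.Linear` stand-in and the 9-epoch loop are illustrative assumptions, not the repo's actual training loop):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 4)  # stand-in for the character-level CNN

# Setting 1: SGD with initial lr = 0.01, halved every 3 epochs
sgd = torch.optim.SGD(model.parameters(), lr=0.01)
scheduler = torch.optim.lr_scheduler.StepLR(sgd, step_size=3, gamma=0.5)

# Setting 2: Adam with initial lr = 0.001
adam = torch.optim.Adam(model.parameters(), lr=0.001)

for epoch in range(9):
    # ... one epoch of training with `sgd` would go here ...
    scheduler.step()

# After 9 epochs the SGD learning rate has been halved 3 times
print(sgd.param_groups[0]["lr"])
```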

Additionally, in the original paper, one epoch is defined as a loop over batch_size x num_batches records (128x5000, 128x10000, or 128x30000), which means some records are used more than once within a single epoch. In my implementation, one epoch is a complete pass over the whole dataset, where each record is used exactly once.
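The difference between the two epoch definitions can be made concrete with a quick count (the 120,000-record dataset size is AG's News from the table above; the 5000-batch setting is one of the paper's configurations):

```python
# Paper's definition: one "epoch" = a fixed number of batch draws,
# so records may be sampled more than once.
batch_size, num_batches = 128, 5000
records_seen_paper = batch_size * num_batches

# This repo's definition: one epoch = one full pass over the dataset,
# each record used exactly once.
dataset_size = 120_000  # e.g. AG's News training set
records_seen_here = dataset_size

print(records_seen_paper, records_seen_here)
```

With these numbers, the paper's "epoch" draws 640,000 records, more than five passes over the 120,000-record dataset.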

Training

If you want to train a model on one of the standard datasets with default parameters, you can run:

  • python train.py -d dataset_name: For example, python train.py -d dbpedia

If you want to train a model on one of the standard datasets with your preferred parameters, such as the optimizer and learning rate, you can run:

  • python train.py -d dataset_name -p optimizer_name -l learning_rate: For example, python train.py -d dbpedia -p sgd -l 0.01

If you want to train a model on your own dataset, you need to specify the paths to the input and output folders:

  • python train.py -i path/to/input/folder -o path/to/output/folder
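The flags above suggest a command-line interface along these lines (a hypothetical sketch with argparse; the long option names and defaults are my assumptions, only the short flags `-d`, `-p`, `-l`, `-i`, `-o` come from the commands shown):

```python
import argparse

# Hypothetical reconstruction of train.py's argument parsing.
parser = argparse.ArgumentParser(description="Train a character-level CNN")
parser.add_argument("-d", "--dataset", type=str, help="standard dataset name, e.g. dbpedia")
parser.add_argument("-p", "--optimizer", type=str, default="sgd", choices=["sgd", "adam"])
parser.add_argument("-l", "--lr", type=float, default=0.01, help="initial learning rate")
parser.add_argument("-i", "--input", type=str, help="path to input folder")
parser.add_argument("-o", "--output", type=str, help="path to output folder")

# Simulate: python train.py -d dbpedia -p sgd -l 0.01
args = parser.parse_args(["-d", "dbpedia", "-p", "sgd", "-l", "0.01"])
print(args.dataset, args.optimizer, args.lr)
```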

You can find all the models I have trained in link

Experiments:

I ran experiments on 2 machines, one with an NVIDIA TITAN X 12 GB GPU and the other with an NVIDIA Quadro 6000 24 GB GPU. The small and large models need about 1.6 GB and 3.5 GB of GPU memory respectively.

Results for the test set are presented as A(B), where:

  • A is the accuracy reproduced here.
  • B is the accuracy reported in the paper.

I used SGD and Adam as optimizers, with different initial learning rates. You can find the specific configuration for each experiment in output/datasetname_scale/logs.txt, for example output/ag_news_small/logs.txt

Each experiment ran for at most 20 epochs. Early stopping was applied, with the patience set to 3 by default.
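Early stopping with patience 3 means training stops once the test loss has failed to improve for 3 consecutive epochs. A minimal sketch of that logic (the parameter names mirror `es_patience` and `es_min_delta` from the training logs; the function itself is my illustration, not the repo's code):

```python
def should_stop(losses, patience=3, min_delta=0.0):
    """Return True if the last `patience` epochs brought no improvement
    (beyond min_delta) over the best loss seen before them."""
    if len(losses) <= patience:
        return False
    best_before = min(losses[:-patience])
    recent_best = min(losses[-patience:])
    return recent_best > best_before - min_delta

# Loss improved at epoch 2, then stagnated for 3 epochs -> stop
print(should_stop([0.70, 0.65, 0.66, 0.66, 0.67]))
# Loss still improving within the last 3 epochs -> keep training
print(should_stop([0.70, 0.65, 0.60, 0.66, 0.67]))
```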

Size Small Large
ag_news 86.71(84.35) 88.13(87.18)
sogou_news 95.08(91.35) 94.90(95.12)
db_pedia 97.53(98.02) 97.60(98.27)
yelp_polarity 91.40(93.47) 93.50(94.11)
yelp_review 56.09(59.16) 58.93(60.38)
yahoo_answer 65.91(70.16) 64.93(70.45)
amazon_review 56.77(59.47) 59.01(58.69)
amazon_polarity 92.54(94.50) 93.85(94.49)

The training/test loss/accuracy curves for each dataset's experiments are shown below (figures for the small model are on the left):

  • ag_news

  • sogou_news

  • db_pedia

  • yelp_polarity

  • yelp_review

  • yahoo! answers

  • amazon_review

  • amazon_polarity

You can find the detailed log of each experiment, containing the loss, accuracy, and confusion matrix at the end of each epoch, in output/datasetname_scale/logs.txt, for example output/ag_news_small/logs.txt
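For reference, the model's input encoding can be reconstructed from the `alphabet` and `max_length` values that appear in the logged parameters: each text is lowercased, truncated to 1014 characters, and turned into a one-hot matrix over the alphabet, with out-of-alphabet characters encoded as all-zero columns. A sketch (this is my reading of the paper's quantization, not code from this repo):

```python
import numpy as np

# Alphabet and max_length as shown in the training log
alphabet = "abcdefghijklmnopqrstuvwxyz0123456789,;.!?:'\"/\\|_@#$%^&*~`+-=<>()[]{}"
max_length = 1014

def quantize(text):
    """Encode text as a (len(alphabet), max_length) one-hot matrix.
    Characters not in the alphabet (e.g. spaces) become zero columns."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    out = np.zeros((len(alphabet), max_length), dtype=np.float32)
    for pos, ch in enumerate(text.lower()[:max_length]):
        if ch in index:
            out[index[ch], pos] = 1.0
    return out

x = quantize("Hello, world!")
print(x.shape)
```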

character-level-cnn-pytorch's People

Contributors

nhthang9x, uvipen


character-level-cnn-pytorch's Issues

The accuracy always be 0.5

I have trained the yelp_review_polarity_small model and got an accuracy of 0.5. The output looks like this:

Epoch: 1/20, Iteration: 1751/2188, Lr: 0.001, Loss: 0.6997024416923523, Accuracy: 0.4375
Epoch: 1/20, Iteration: 1801/2188, Lr: 0.001, Loss: 0.7156904935836792, Accuracy: 0.4765625
Epoch: 1/20, Iteration: 1851/2188, Lr: 0.001, Loss: 0.6968995928764343, Accuracy: 0.51953125
Epoch: 1/20, Iteration: 1901/2188, Lr: 0.001, Loss: 0.6940716505050659, Accuracy: 0.50390625
Epoch: 1/20, Iteration: 1951/2188, Lr: 0.001, Loss: 0.6941938400268555, Accuracy: 0.4609375
Epoch: 1/20, Iteration: 2001/2188, Lr: 0.001, Loss: 0.694003701210022, Accuracy: 0.50390625
Epoch: 1/20, Iteration: 2051/2188, Lr: 0.001, Loss: 0.6908363103866577, Accuracy: 0.52734375
Epoch: 1/20, Iteration: 2101/2188, Lr: 0.001, Loss: 0.7009902596473694, Accuracy: 0.46484375
Epoch: 1/20, Iteration: 2151/2188, Lr: 0.001, Loss: 0.6958677172660828, Accuracy: 0.484375
Epoch: 1/20, Lr: 0.001, Loss: 0.6933960318565369, Accuracy: 0.5
Epoch: 2/20, Iteration: 1/2188, Lr: 0.001, Loss: 0.6975014209747314, Accuracy: 0.4296875
Epoch: 2/20, Iteration: 51/2188, Lr: 0.001, Loss: 0.6942330002784729, Accuracy: 0.50390625
Epoch: 2/20, Iteration: 101/2188, Lr: 0.001, Loss: 0.6899641156196594, Accuracy: 0.5234375

And the log file:

Model's parameters: {'alphabet': 'abcdefghijklmnopqrstuvwxyz0123456789,;.!?:\'"/\\|_@#$%^&*~`+-=<>()[]{}', 'max_length': 1014, 'feature': 'small', 'optimizer': 'adam', 'batch_size': 256, 'num_epochs': 20, 'lr': 0.001, 'dataset': 'yelp_review_polarity', 'es_min_delta': 0.0, 'es_patience': 3, 'input': 'input/yelp_review_polarity_csv', 'output': 'output/yelp_review_polarity_small', 'log_path': 'tensorboard/char-cnn'}
Epoch: 1/20
Test loss: 0.6933960318565369 Test accuracy: 0.5
Test confusion matrix:
[[    0 19000]
 [    0 19000]]

I don't know why; this is different from your result. Please help me, thank you.
