text-cnn-tensorflow's Introduction

text-cnn

This code implements, in TensorFlow, the models from the paper Convolutional Neural Networks for Sentence Classification (Kim, 2014).

Figure 1: Illustration of a CNN architecture for sentence classification
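
The core operation in the figure can be sketched in plain Python: one filter slides over windows of consecutive word vectors, and max-over-time pooling keeps only its strongest response. This is a minimal illustration, not the repository's graph code; the function name, the toy embeddings and the kernel values are all made up for the example.

```python
# Minimal sketch of one convolutional filter with max-over-time pooling.
# All names and numbers are illustrative, not taken from the repository.

def conv_max_pool(embedded, kernel, bias=0.0):
    """Slide `kernel` over windows of consecutive word vectors, apply ReLU,
    and return the largest activation (max-over-time pooling).

    `embedded` is a list of word vectors; `kernel` holds one weight vector
    per word in the window. The sentence must be at least as long as the
    window, which is why real inputs are padded.
    """
    window = len(kernel)  # filter size: number of words per window
    activations = []
    for start in range(len(embedded) - window + 1):
        total = bias
        for offset in range(window):
            word_vec = embedded[start + offset]
            weights = kernel[offset]
            total += sum(x * w for x, w in zip(word_vec, weights))
        activations.append(max(0.0, total))  # ReLU
    return max(activations)


# A 4-word sentence with 2-dimensional embeddings.
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
# One filter spanning 2 words.
kernel = [[1.0, -1.0], [0.5, 0.5]]

print(conv_max_pool(sentence, kernel))  # 1.5
```

A full model applies many such filters of several sizes and concatenates the pooled values into one feature vector for the classifier.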

Requirements

Python 3.6
TensorFlow 1.4
hb-config (Singleton Config)
tqdm
requests
Slack Incoming Webhook URL

Project Structure

Project initialized with hb-base.

.
├── config                  # Config files (.yml, .json) used with hb-config
├── data                    # dataset path
├── notebooks               # Prototyping with numpy or tf.InteractiveSession
├── scripts                 # download or prepare dataset using shell scripts
├── text-cnn                # text-cnn architecture graphs (from input to logits)
│   └── __init__.py         # Graph logic
├── data_loader.py          # raw_data -> processed_data -> generate_batch (using Dataset)
├── hook.py                 # training or test hook feature (e.g. print_variables)
├── main.py                 # define experiment_fn
├── model.py                # define EstimatorSpec
└── predict.py              # test trained model

Reference: hb-config, Dataset, experiment_fn, EstimatorSpec

Dataset : rt-polarity, Sentiment Analysis on Movie Reviews

Todo

apply embed_type
- CNN-rand
- CNN-static
- CNN-nonstatic
- CNN-multichannel
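
For orientation, the four variants differ in how the embedding table is initialized and whether it is updated during training, following the descriptions in Kim (2014). The helper below is a hypothetical sketch of that mapping, keyed by the embed_type values used in the config; it is not code from this repository, where the feature is still a to-do.

```python
# Hypothetical helper (not part of the repository): maps an embed_type name
# to its embedding channels. Each channel is a pair
# (initialized_from_pretrained_vectors, trainable).

def embedding_channels(embed_type):
    variants = {
        "rand": [(False, True)],           # random init, learned in training
        "static": [(True, False)],         # pretrained vectors, kept frozen
        "non-static": [(True, True)],      # pretrained vectors, fine-tuned
        "multichannel": [(True, False),    # two copies of pretrained vectors:
                         (True, True)],    # one frozen, one fine-tuned
    }
    if embed_type not in variants:
        raise ValueError("unknown embed_type: %r" % embed_type)
    return variants[embed_type]


print(embedding_channels("multichannel"))  # [(True, False), (True, True)]
```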

Config

example: kaggle_movie_review.yml

data:
  type: 'kaggle_movie_review'
  base_path: 'data/'
  raw_data_path: 'kaggle_movie_reviews/'
  processed_path: 'kaggle_processed_data'
  testset_size: 25000
  num_classes: 5
  PAD_ID: 0

model:
  batch_size: 64
  embed_type: 'rand'     #(rand, static, non-static, multichannel)
  pretrained_embed: "" 
  embed_dim: 300
  num_filters: 256
  filter_sizes:
    - 2
    - 3
    - 4
    - 5
  dropout: 0.5

train:
  learning_rate: 0.00005
  
  train_steps: 100000
  model_dir: 'logs/kaggle_movie_review'
  
  save_checkpoints_steps: 1000
  loss_hook_n_iter: 1000
  check_hook_n_iter: 1000
  min_eval_frequency: 1000
  
slack:
  webhook_url: ""   # Slack Incoming Webhook URL; a notification is sent when training finishes
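
As a rough guide to what the model section of the config above implies, the sketch below computes the shapes produced by filter_sizes and num_filters. It assumes unpadded ("valid") convolutions and one max-pooled value per filter, which may not match the repository's graph exactly; the function names and the sequence length of 50 are illustrative only.

```python
# Sketch: shapes implied by filter_sizes and num_filters.
# Assumes "valid" (unpadded) convolutions and one pooled value per filter;
# the repository's implementation may differ.

def conv_output_lengths(seq_length, filter_sizes):
    """Number of window positions each filter size sees in one sentence."""
    return {h: seq_length - h + 1 for h in filter_sizes}


def pooled_feature_dim(num_filters, filter_sizes):
    """Length of the concatenated, max-pooled feature vector."""
    return num_filters * len(filter_sizes)


filter_sizes = [2, 3, 4, 5]   # from the config
num_filters = 256             # from the config
seq_length = 50               # illustrative value, not from the config

print(conv_output_lengths(seq_length, filter_sizes))  # {2: 49, 3: 48, 4: 47, 5: 46}
print(pooled_feature_dim(num_filters, filter_sizes))  # 1024
```

Under these assumptions, raising num_filters or adding a filter size grows the feature vector linearly, while the sequence length only affects how many positions are pooled over.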

Usage

Install requirements.

pip install -r requirements.txt

Then prepare the dataset and train the model.

sh prepare_kaggle_movie_reviews.sh
python main.py --config kaggle_movie_review --mode train_and_evaluate

After training, you can classify your own sentences interactively with predict.py.

python predict.py --config rt-polarity

Predict example

python predict.py --config rt-polarity
Setting max_seq_length to Config : 62
load vocab ...
Typing anything :)

> good
1
> bad
0
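
Before a typed sentence reaches the model, it has to become a fixed-length sequence of vocabulary ids, which is why the session reports max_seq_length and loads the vocabulary first. The sketch below shows one plausible way to do this; the whitespace tokenization, the UNK_ID value and the toy vocabulary are assumptions, and only PAD_ID: 0 comes from the config.

```python
# Sketch of turning a typed sentence into padded vocabulary ids.
# Tokenization, UNK_ID and the vocabulary are assumptions for illustration.

PAD_ID = 0   # matches PAD_ID in the config
UNK_ID = 1   # hypothetical id for out-of-vocabulary words


def sentence_to_ids(sentence, vocab, max_seq_length):
    """Lowercase, split on whitespace, map to ids, then truncate or pad."""
    tokens = sentence.lower().split()
    ids = [vocab.get(tok, UNK_ID) for tok in tokens][:max_seq_length]
    return ids + [PAD_ID] * (max_seq_length - len(ids))


vocab = {"good": 2, "bad": 3, "movie": 4}   # toy vocabulary
print(sentence_to_ids("Good movie", vocab, 6))  # [2, 4, 0, 0, 0, 0]
```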

Experiment modes

✅ : Working
◽ : Not tested yet.

✅ evaluate : Evaluate on the evaluation data.
◽ extend_train_hooks : Extends the hooks for training.
◽ reset_export_strategies : Resets the export strategies with the new_export_strategies.
◽ run_std_server : Starts a TensorFlow server and joins the serving thread.
◽ test : Tests training, evaluating and exporting the estimator for a single step.
✅ train : Fit the estimator using the training data.
✅ train_and_evaluate : Interleaves training and evaluation.

Tensorboard

tensorboard --logdir logs

Category Color

rt-polarity (binary classification)

kaggle_movie_review (multiclass classification)

Reference

Implementing a CNN for Text Classification in TensorFlow by Denny Britz
Paper - Convolutional Neural Networks for Sentence Classification (2014) by Y Kim
Paper - A Sensitivity Analysis of (and Practitioners' Guide to) Convolutional Neural Networks for Sentence Classification (2015) by Y Zhang

text-cnn-tensorflow's Issues

[Question] Is there any way to speed up prediction?

I'm trying to run prediction on a medium-sized corpus (approx. 100k records) and it is quite slow: it processes approx. 2-3 sentences per second. Is there any way to speed this up?