
chainer-rnn-ner

Note: This repository was created for an assignment in the Information Communication Theory (情報伝達学) lecture at Tohoku University.

Students were expected to do some feature engineering with CRFsuite, but I personally preferred to implement an RNN.

About

This is an implementation of a Named Entity Recognition (NER) model based on a Recurrent Neural Network (RNN). The model is heavily inspired by the following papers:

  • Chiu, Jason PC, and Eric Nichols. "Named entity recognition with bidirectional LSTM-CNNs." Transactions of the Association for Computational Linguistics 4 (2016): 357-370.
  • James Hammerton. "Named Entity Recognition with Long Short-Term Memory." CONLL '03 Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4 Pages 172-175
  • Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami and Chris Dyer. "Neural Architectures for Named Entity Recognition." Proceedings of NAACL-HLT 2016, pages 260–270

Note that this repo is not a re-implementation of these models.

The purpose of implementing these models is to see how performance improves as the model architecture becomes more complex (e.g. LSTM --> Bidirectional LSTM --> Bidirectional LSTM with Character-Encoding).

I suppose that these models can be applied to other sequence labeling tasks, but I have not tried yet.

Model Details

The following models are implemented with Chainer.

Models with Cross Entropy as Loss Function

  • LSTM (Model.py/NERTagger)
  • Bi-directional LSTM (Model.py/BiNERTagger)
  • Bi-directional LSTM with Character-based encoding (Model.py/BiCharNERTagger)
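
To illustrate what the bi-directional variant adds over the plain LSTM, here is a toy sketch in pure Python. It is not Chainer code and does not use a real LSTM cell: the hypothetical `remember` function simply records which tokens a state has seen, so the output shows that each position gets both its left context (forward pass) and its right context (backward pass).

```python
def run_recurrent(tokens, step, initial_state):
    """Apply `step` left to right and return the state after each token."""
    states = []
    state = initial_state
    for token in tokens:
        state = step(state, token)
        states.append(state)
    return states


def remember(state, token):
    """Toy stand-in for an LSTM cell: the state is the tuple of tokens seen so far."""
    return state + (token,)


def bidirectional_states(tokens):
    """Pair each position's forward state with its backward state."""
    forward = run_recurrent(tokens, remember, ())
    # Run over the reversed sentence, then flip back so index i lines up with token i.
    backward = run_recurrent(tokens[::-1], remember, ())[::-1]
    return list(zip(forward, backward))


sentence = ["EU", "rejects", "German", "call"]
for token, (left, right) in zip(sentence, bidirectional_states(sentence)):
    print(token, "| forward saw:", left, "| backward saw:", right)
```

In a real tagger the two states would be vectors that are concatenated before the output layer; the tuples here only make the flow of information visible.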

Models with CRF Layer as Loss Function

This loss function performs much better than simple cross entropy because it (latently) takes into account the restrictions on BIO tag sequences.

  • LSTM (CRFModel.py/CRFNERTagger)
  • Bi-directional LSTM (CRFModel.py/CRFBiNERTagger)
  • Bi-directional LSTM with Character-based encoding (CRFModel.py/CRFBiCharNERTagger)
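
The restriction on BIO tags mentioned above can be made concrete with a small check. This is a sketch that assumes the IOB2 convention (an `I-X` tag may only follow `B-X` or `I-X`); the function names are illustrative and not part of this repo. A per-token cross entropy loss scores each tag independently and can therefore produce sequences that fail this check, whereas a CRF layer scores tag transitions and can learn to avoid them.

```python
def is_valid_transition(prev_tag, tag):
    """Return True if `tag` may follow `prev_tag` under the IOB2 convention."""
    if not tag.startswith("I-"):
        return True  # "O" and "B-*" may follow any tag
    entity_type = tag[2:]
    return prev_tag in ("B-" + entity_type, "I-" + entity_type)


def is_valid_sequence(tags):
    """Check every adjacent pair, treating the sentence start like an "O" tag."""
    prev_tag = "O"
    for tag in tags:
        if not is_valid_transition(prev_tag, tag):
            return False
        prev_tag = tag
    return True


print(is_valid_sequence(["B-PER", "I-PER", "O", "B-LOC"]))  # well-formed
print(is_valid_sequence(["O", "I-PER"]))                    # I-PER without B-PER
print(is_valid_sequence(["B-PER", "I-LOC"]))                # entity type changes mid-span
```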

Requirements

Software

  • Python 3.*
  • Chainer 1.19 (or higher)

Resources

  • Pretrained Word Vector (e.g. GloVe)
    • The scripts still run (and learn) without this, but performance deteriorates significantly. (See the papers for details.)
  • CoNLL 2003 Dataset
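
For reference, the standard GloVe text files store one word per line followed by its vector components, separated by spaces. The sketch below parses that format into a dictionary; `load_glove` is a hypothetical helper written for illustration and is not necessarily how the training scripts read the file.

```python
def load_glove(lines):
    """Parse GloVe-style text lines into a {word: [float, ...]} dictionary."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(value) for value in parts[1:]]
    return vectors


# Two made-up 3-dimensional vectors standing in for a real GloVe file.
sample = ["the 0.1 -0.2 0.3", "cat 0.4 0.5 -0.6"]
vectors = load_glove(sample)
print(vectors["cat"])  # [0.4, 0.5, -0.6]
```

Because the function accepts any iterable of lines, an open file object can be passed in directly instead of the sample list.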

Usage

Preprocessing

Place the CoNLL datasets (train, dev, test) in data/ and run preprocess.sh. This converts the raw datasets into a model-readable JSON format.
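
As a rough illustration of this conversion step, the sketch below reads CoNLL 2003 style lines (one token per line with the NER tag in the last column, blank lines between sentences, `-DOCSTART-` lines between documents) and emits one JSON object per sentence. The JSON layout with `tokens` and `tags` keys is an assumption made for this example; the format actually produced by preprocess.sh may differ.

```python
import json


def read_conll(lines):
    """Group CoNLL-style lines into sentences of tokens and NER tags."""
    sentences, tokens, tags = [], [], []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("-DOCSTART-"):
            # A blank line or document marker closes the current sentence.
            if tokens:
                sentences.append({"tokens": tokens, "tags": tags})
                tokens, tags = [], []
            continue
        fields = line.split()
        tokens.append(fields[0])   # first column: the word
        tags.append(fields[-1])    # last column: the NER tag
    if tokens:
        sentences.append({"tokens": tokens, "tags": tags})
    return sentences


# Illustrative sample, not copied from the dataset files.
sample = """-DOCSTART- -X- -X- O

EU NNP B-NP B-ORG
rejects VBZ B-VP O
German JJ B-NP B-MISC

Peter NNP B-NP B-PER
""".splitlines()

for sentence in read_conll(sample):
    print(json.dumps(sentence))
```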

Then run generate_vocab.py and generate_char_vocab.py to generate vocabulary files.
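
Conceptually, a vocabulary file maps each word (or character) to an integer ID. The following sketch shows one common way to build such mappings; the `<unk>` entry, the frequency ordering, and the `build_vocab` helper are assumptions for illustration and may not match what generate_vocab.py and generate_char_vocab.py actually write.

```python
from collections import Counter


def build_vocab(sequences, min_count=1, specials=("<unk>",)):
    """Map each item to an integer ID, most frequent first, after the specials."""
    counts = Counter(item for sequence in sequences for item in sequence)
    vocab = {token: index for index, token in enumerate(specials)}
    # Sort by descending count, then alphabetically, so the IDs are deterministic.
    for item, count in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if count >= min_count and item not in vocab:
            vocab[item] = len(vocab)
    return vocab


sentences = [["EU", "rejects", "German", "call"], ["German", "call"]]
word_vocab = build_vocab(sentences)
# For the character vocabulary, every word is treated as a sequence of characters.
char_vocab = build_vocab([list(word) for sentence in sentences for word in sentence])

print(word_vocab)
print(len(char_vocab))
```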

Training

  • Training the model without CRF layer: train_model.py
  • Training the model with CRF layer: train_crf_model.py

Both scripts accept exactly the same options:

  usage: train_model.py [-h] [--batchsize BATCHSIZE] [--epoch EPOCH] [--gpu GPU]
                        [--out OUT] [--resume RESUME] [--test] [--unit UNIT]
                        [--glove GLOVE] [--dropout] --model-type MODEL_TYPE
                        [--final-layer FINAL_LAYER]

  optional arguments:
    -h, --help            show this help message and exit
    --batchsize BATCHSIZE, -b BATCHSIZE
                          Number of examples in each mini-batch
    --epoch EPOCH, -e EPOCH
                          Number of sweeps over the dataset to train
    --gpu GPU, -g GPU     GPU ID (negative value indicates CPU)
    --out OUT, -o OUT     Directory to output the result
    --resume RESUME, -r RESUME
                          Resume the training from snapshot
    --test                Use tiny datasets for quick tests
    --unit UNIT, -u UNIT  Number of LSTM units in each layer
    --glove GLOVE         path to glove vector
    --dropout             use dropout?
    --model-type MODEL_TYPE
                          bilstm / lstm / charlstm
    --final-layer FINAL_LAYER
                          loss function
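
For readers who want to experiment with the option handling in isolation, the help text above can be reproduced with `argparse` roughly as follows. This is a reconstruction from the usage message only: the argument types are guesses, no defaults are set because the help text does not state them, and the real scripts may define their parser differently.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the flags listed in the usage message."""
    parser = argparse.ArgumentParser(prog="train_model.py")
    parser.add_argument("--batchsize", "-b", type=int,
                        help="Number of examples in each mini-batch")
    parser.add_argument("--epoch", "-e", type=int,
                        help="Number of sweeps over the dataset to train")
    parser.add_argument("--gpu", "-g", type=int,
                        help="GPU ID (negative value indicates CPU)")
    parser.add_argument("--out", "-o", help="Directory to output the result")
    parser.add_argument("--resume", "-r", help="Resume the training from snapshot")
    parser.add_argument("--test", action="store_true",
                        help="Use tiny datasets for quick tests")
    parser.add_argument("--unit", "-u", type=int,
                        help="Number of LSTM units in each layer")
    parser.add_argument("--glove", help="path to glove vector")
    parser.add_argument("--dropout", action="store_true", help="use dropout?")
    # Shown without brackets in the usage line, so it is treated as required.
    parser.add_argument("--model-type", required=True,
                        help="bilstm / lstm / charlstm")
    parser.add_argument("--final-layer", help="loss function")
    return parser


args = build_parser().parse_args(
    ["--model-type", "bilstm", "--epoch", "10", "--dropout"])
print(args.model_type, args.epoch, args.dropout)  # bilstm 10 True
```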

Testing

  • Testing the model without CRF layer: predict.py
  • Testing the model with CRF layer: crf_predict.py

Options:

  optional arguments:
    -h, --help            show this help message and exit
    --unit UNIT, -u UNIT  Number of LSTM units in each layer
    --glove GLOVE         path to glove vector
    --model-type MODEL_TYPE
                          bilstm / lstm / charlstm
    --model MODEL         path to model file
    --dev                 If true, use validation data

Do not forget to specify --model-type and --model (the path to the trained model file).

Performance (accuracy, precision, recall, F-score) can be measured with conlleval.pl (not included in this repo).
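
conlleval.pl scores whole entity spans rather than individual tokens. The sketch below approximates that entity-level precision, recall and F-score in Python, assuming IOB2 tags. It is a simplification for illustration (for example, an `I-X` tag that does not continue a span is ignored here) and is not a replacement for the official script.

```python
def extract_entities(tags):
    """Return the set of (start, end, type) spans in an IOB2 tag sequence."""
    entities = set()
    start, entity_type = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing "O" flushes the last span
        continues = tag.startswith("I-") and tag[2:] == entity_type
        if start is not None and not continues:
            entities.add((start, i, entity_type))
            start, entity_type = None, None
        if tag.startswith("B-"):
            start, entity_type = i, tag[2:]
    return entities


def precision_recall_f1(gold_sentences, predicted_sentences):
    """Count exactly matching spans over all sentences and derive P, R and F1."""
    correct = n_predicted = n_gold = 0
    for gold, predicted in zip(gold_sentences, predicted_sentences):
        gold_spans = extract_entities(gold)
        predicted_spans = extract_entities(predicted)
        correct += len(gold_spans & predicted_spans)
        n_predicted += len(predicted_spans)
        n_gold += len(gold_spans)
    precision = correct / n_predicted if n_predicted else 0.0
    recall = correct / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


gold = [["B-PER", "I-PER", "O", "B-LOC"]]
predicted = [["B-PER", "I-PER", "O", "O"]]
print(precision_recall_f1(gold, predicted))
```

In this example the single predicted span is correct (precision 1.0), but only one of the two gold spans is found (recall 0.5).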

Results

Model         CRF?  Precision  Recall  F-Score
LSTM          No    70.87      65.38   68.01
Bi-LSTM       No    76.41      74.39   75.39
Bi-Char-LSTM  No    84.93      81.65   83.26
LSTM          Yes   75.49      77.17   76.32
Bi-LSTM       Yes   79.71      81.49   80.59
Bi-Char-LSTM  Yes   84.17      83.80   83.98
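
The F-Score column is the harmonic mean of precision and recall. As a quick sanity check, the snippet below recomputes it from the precision and recall values in the table and prints it next to the reported figure.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    return 2 * precision * recall / (precision + recall)


# (model, CRF?, precision, recall, reported F-score), copied from the table above.
rows = [
    ("LSTM", "No", 70.87, 65.38, 68.01),
    ("Bi-LSTM", "No", 76.41, 74.39, 75.39),
    ("Bi-Char-LSTM", "No", 84.93, 81.65, 83.26),
    ("LSTM", "Yes", 75.49, 77.17, 76.32),
    ("Bi-LSTM", "Yes", 79.71, 81.49, 80.59),
    ("Bi-Char-LSTM", "Yes", 84.17, 83.80, 83.98),
]

for model, crf, precision, recall, reported in rows:
    print(model, crf, round(f_score(precision, recall), 2), reported)
```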

Learning Curves

Epoch vs. Training Data Loss

[Figure: epoch vs. training data loss]

Epoch vs. Validation Data Loss

[Figure: epoch vs. validation data loss]

Epoch vs. Training Data Accuracy

[Figure: epoch vs. training data accuracy]

Epoch vs. Validation Data Accuracy

[Figure: epoch vs. validation data accuracy]

Contributors

  • butsugiri
