Giter VIP home page Giter VIP logo

deep_learning_nlp's Introduction

Deep Learning architectures for NLP

This repository contains Keras implementations of the architectures listed below. For a quick theoretical intro about Deep Learning for NLP, I encourage you to have a look at my notes.

Hierarchical Attention Network for Document Classification

A RAM-friendly implementation of the model introduced by Yang et al. (2016), with step-by-step explanations and links to relevant resources: https://github.com/Tixierae/deep_learning_NLP/blob/master/HAN/HAN_final.ipynb

In my experiments on the Amazon review dataset (3,650,000 documents, 5 classes), I reach 62.6% accuracy after 8 epochs, and 63.6% accuracy (the accuracy reported in the paper) after 42 epochs. Each epoch takes about 20 mins on my TitanX GPU. I deployed the model as a web app. As shown in the image below, you can paste your own review and visualize how the model pays attention to words and sentences.

Concepts covered

The notebook makes use of the following concepts:

  • batch training. Batches are loaded from disk and passed to the model one by one with a generator. This way, it's possible to train on datasets that are too big to fit on RAM.
  • bucketing. To have batches that are as dense as possible and make the most of each tensor product, the batches contain documents of similar sizes.
  • cyclical learning rate and cyclical momentum schedules, as in Smith (2017) and Smith (2018). The cyclical learning rate schedule is a new, promising approach to optimization in which the learning rate increases and decreases in a pre-defined interval rather than keeping decreasing. It worked better than Adam and SGD alone for me1.
  • self-attention (aka inner attention). We use the formulation of the original paper.
  • bidirectional RNN
  • Gated Recurrent Unit (GRU)

1There is more and more evidence that adaptive optimizers like Adam, Adagrad, etc. converge faster but generalize poorly compared to SGD-based approaches. For example: Wilson et al. (2018), this blogpost. Traditional SGD is very slow, but a cyclical learning rate schedule can bring a significant speedup, and even sometimes allow to reach better performance.

1D Convolutional Neural Network for short text classification

An implementation of (Kim 2014)'s 1D Convolutional Neural Network for short text classification: https://github.com/Tixierae/deep_learning_NLP/blob/master/cnn_imdb.ipynb

Drawing

2D CNN for image classification

Agreed, this is not for NLP. But an implementation can be found here https://github.com/Tixierae/deep_learning_NLP/blob/master/mnist_cnn.py. I reach 99.45% accuracy on MNIST with it.

Cite

If you use some of the code in this repository, please cite

@article{tixier2018notes,
  title={Notes on Deep Learning for NLP},
  author={Tixier, Antoine J.-P.},
  journal={arXiv preprint arXiv:1808.09772},
  year={2018}
}

deep_learning_nlp's People

Contributors

tixierae avatar

Watchers

James Cloos avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.