NLPzoo

A collection of the most popular Natural Language Processing algorithms, frameworks and applications (inspired by tensorlayer/RLzoo).

Structure

ML (Machine Learning)

  • This folder contains vanilla machine learning models that classify sample text data. ml_models.py is the main script; it compares the accuracy of the different models, which currently include Naive Bayes and Support Vector Machines. Each model type is also paired with different vectorizers, although so far the choice of vectorizer has not made a significant difference to any model's accuracy.
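
As a rough illustration of the vectorize, classify and score loop described above, here is a minimal sketch using only the Python standard library. It is not the code in ml_models.py: the `vectorize` function, the `NaiveBayes` class, the `accuracy` helper and the toy tweets are all hypothetical names and data invented for this example, and a real comparison would more likely rely on an established machine learning library.

```python
import math
from collections import Counter, defaultdict


def vectorize(text):
    """Hypothetical bag-of-words vectorizer: lowercase, split on whitespace."""
    return Counter(text.lower().split())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing (illustrative only)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # word frequencies per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(vectorize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word, freq in vectorize(text).items():
                if word in self.vocab:             # ignore unseen words
                    score += freq * math.log((counts[word] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


def accuracy(model, texts, labels):
    """Fraction of texts whose predicted label matches the true label."""
    correct = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return correct / len(labels)


# Toy data invented for this sketch; not taken from any dataset in this repo.
train_texts = [
    "forest fire spreading near the town",
    "earthquake destroys several buildings",
    "flood waters rising evacuate now",
    "this new album is fire",
    "my exam was a total disaster lol",
    "traffic was an earthquake of honking",
]
train_labels = ["real", "real", "real", "fake", "fake", "fake"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("fire near the buildings"))
print(accuracy(model, train_texts, train_labels))
```

Swapping in a different `vectorize` function (for example one that drops stop words or counts word pairs) and re-running `accuracy` is the kind of comparison the paragraph above refers to; scoring on held-out data rather than the training set would give a fairer estimate.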

RNN (Recurrent Neural Network)

  • RNNs and their variants were the mainstay of most natural language processing tasks, including language modelling, named entity recognition, part-of-speech tagging and sentence classification. From 2013 to 2015, Long Short-Term Memory (LSTM) models became the dominant approach. They have since been superseded by RNNs with attention and by CNNs.

Data (not present here)

  • disaster-tweets.csv contains tweets about real disasters and exaggerated 'fake' tweets which are not directly related to any disaster. This dataset is part of the "Real or Not? NLP with Disaster Tweets" Kaggle competition. Download here

  • shakespeare.txt contains the complete texts of William Shakespeare. Download here

  • airline-tweets.csv contains tweets from 2015 in which travellers describe how they felt about their flying experience. Download here

  • cornell-movie-dialogs-corpus is a classic Natural Language Processing training dataset. It contains 220,579 conversational exchanges. Download here

  • ubuntu-dialogs-corpus contains dialogs taken from online chat forums on the topic of Ubuntu. Download here

TODO

The plan for the coming weeks is as follows.

  • Add more models to the ML folder.
  • Include deep learning models covering different varieties of neural networks.
  • Demonstrate and implement a baseline LSTM model to compare with BERT.
  • Demonstrate and implement the BERT language model.
  • Add other models (order to be decided later): UniLM, MASS, BART.
