kaggle-movie-review-sentiment's Introduction

Non-deep-learning Sentiment Classification

This analysis uses non-deep-learning algorithms to identify the sentiment of movie reviews.

Description

The dataset is published on Kaggle as part of a word2vec tutorial. The tutorial also used a random forest classifier. I was curious about the performance of less computationally expensive algorithms, so I chose logistic regression and Naive Bayes.

My text preprocessing was similar to the tutorial's (HTML tags, punctuation, numbers and stopwords were removed), except that I added lemmatization. For feature construction, I trained a Bag of Words model similar to the tutorial's; in addition, I used feature hashing to reduce the size of the feature space, in keeping with my goal of experimenting with computationally inexpensive methods.
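The cleaning and feature-hashing steps described above can be sketched roughly as follows. This is not the code from the analysis, only a minimal standard-library illustration: the function names `clean_review` and `hash_features`, the tiny stopword list, the MD5-based hash and the bucket count are all assumptions made for the example, and lemmatization is left out because it needs an external library.

```python
import hashlib
import re

# Tiny illustrative stopword list (an assumption; a real run would use a fuller one).
STOPWORDS = {"a", "an", "the", "and", "is", "was", "it", "this", "of", "to", "in"}


def clean_review(raw_html):
    """Strip HTML tags, punctuation and numbers, lowercase, and drop stopwords."""
    text = re.sub(r"<[^>]+>", " ", raw_html)  # remove HTML tags
    text = re.sub(r"[^a-zA-Z]", " ", text)    # remove punctuation and digits
    return [w for w in text.lower().split() if w not in STOPWORDS]


def hash_features(tokens, n_buckets=16):
    """Map tokens to a fixed-length count vector (the hashing trick)."""
    vec = [0] * n_buckets
    for tok in tokens:
        # A stable hash keeps bucket assignments identical across runs.
        digest = hashlib.md5(tok.encode("utf-8")).hexdigest()
        vec[int(digest, 16) % n_buckets] += 1
    return vec


# Hypothetical review text, used only to exercise the functions.
tokens = clean_review("<br />This movie was GREAT, 10/10!")
features = hash_features(tokens)
```

The vector length is fixed by `n_buckets` regardless of vocabulary size, which is what keeps the feature space small; the trade-off is that unrelated words can collide in the same bucket.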

Benchmark

Researchers have found that different individuals do not always agree on the sentiment polarity (positive/negative/neutral) of a phrase or a sentence. A study by Wilson et al. found 82% agreement between two individuals assigning phrase-level sentiment polarity. As a result, I expect a good algorithm to have approximately 80% sentiment-scoring accuracy.

At the time of writing, the highest AUC score on the public leaderboard is 0.99259, but the leaderboard does not show accuracy. My simple models have ~0.93 AUC and ~0.87 accuracy.
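AUC and accuracy measure different things, which is why the two numbers above are not directly comparable. The sketch below shows one way to compute both from predicted scores. The labels and scores are made-up toy values, not results from this analysis, and the helper names `accuracy` and `roc_auc` are assumptions for illustration.

```python
def accuracy(y_true, scores, threshold=0.5):
    """Fraction of examples whose thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, y_true)) / len(y_true)


def roc_auc(y_true, scores):
    """Chance that a random positive outscores a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): three positive and three negative reviews.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]

acc = accuracy(y_true, scores)  # depends on the 0.5 threshold
auc = roc_auc(y_true, scores)   # depends only on how the scores rank
```

On this toy data the AUC (8/9) is higher than the accuracy (4/6), because AUC rewards ranking positives above negatives even when a fixed threshold misclassifies some of them.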
