An IPython notebook that tests Graphify's feature extraction and selection algorithm by using it to build a logistic regression classifier. The classifier is benchmarked on Stanford's Large Movie Review Dataset and the Cornell Movie Review Dataset.
- The slides are at https://github.com/kbastani/sentiment-analysis-movie-reviews/tree/master/pdf
- The interactive notebooks are in the main folder.
- Scores ~90% accuracy on the Cornell Movie Review dataset.
- Scores ~80% accuracy on the Stanford Large Movie Review dataset.
###Feature learning
- Features are extracted and learned using Java and Neo4j, and evaluated by building a logistic regression classifier on a weighted tf-idf feature vector.
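
The evaluation step can be illustrated with a minimal sketch: a logistic regression classifier trained on tf-idf vectors. This is not Graphify's Java and Neo4j feature extraction. It assumes plain whitespace tokens as features, a smoothed idf, and four made-up toy reviews, and every function name here (`fit_idf`, `tfidf_vector`, `train`, `classify`) is hypothetical. It is written in pure Python to stay self-contained; in the notebooks, scikit-learn would normally do this work.

```python
import math
from collections import Counter


def fit_idf(tokenized_docs):
    """Compute a smoothed inverse document frequency for every term."""
    n_docs = len(tokenized_docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in tokenized_docs for term in set(doc))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0
            for t, df in doc_freq.items()}


def tfidf_vector(tokens, vocab, idf):
    """Map one tokenized document to a tf-idf vector; unknown terms are ignored."""
    counts = Counter(tokens)
    length = max(len(tokens), 1)
    return [counts[t] / length * idf[t] for t in vocab]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(X, y, lr=1.0, epochs=300):
    """Fit weights and bias by batch gradient descent on the log loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        # Prediction error (probability minus label) for every sample.
        errors = [sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - t
                  for x, t in zip(X, y)]
        for j in range(len(w)):
            w[j] -= lr * sum(e * x[j] for e, x in zip(errors, X)) / len(X)
        b -= lr * sum(errors) / len(X)
    return w, b


# Toy data (assumption, not taken from either benchmark): 1 = positive, 0 = negative.
reviews = [
    ("a wonderful moving film", 1),
    ("great acting and a wonderful story", 1),
    ("a dull boring film", 0),
    ("terrible acting and a boring story", 0),
]
docs = [text.lower().split() for text, _ in reviews]
labels = [label for _, label in reviews]

idf = fit_idf(docs)
vocab = sorted(idf)
X = [tfidf_vector(doc, vocab, idf) for doc in docs]
w, b = train(X, labels)


def classify(text):
    """Return 1 (positive) or 0 (negative) for a raw review string."""
    x = tfidf_vector(text.lower().split(), vocab, idf)
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


train_preds = [classify(text) for text, _ in reviews]
print(train_preds)
```

Terms that occur equally often in both classes, such as "a" or "film", end up with weights near zero, so the prediction is driven by the terms that separate the classes.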
The content of the notebooks can be viewed online through nbviewer.ipython.org.
For truly interactive use of the notebooks, you need to install Python, IPython (for the notebooks), and the required libraries: scikit-learn, matplotlib, and numpy.
You can install everything at once using a complete scientific Python distribution. Two good ones are the Enthought Python distribution (EPD, free for academic use) and Python(x,y) (free for everyone).
For OS X, you can also use the Enthought Python distribution or the scipy-superpack.
On Linux, just use your package manager. For example, on Ubuntu or Debian, run `apt-get install python ipython python-matplotlib python-numpy python-sklearn`.
Make sure you have IPython 0.11 or later installed. You can update it using the program `easy_install`.
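
One way to check the IPython version requirement from Python is sketched below. The helpers `parse_version` and `meets_minimum` are hypothetical names introduced for illustration, and the parsing is deliberately simple: it assumes dot-separated versions and stops at the first non-numeric component.

```python
from itertools import takewhile


def parse_version(version):
    """Turn a string such as '0.13.1' into a tuple of ints, here (0, 13, 1)."""
    parts = []
    for piece in version.split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break  # stop at suffixes such as 'dev' or 'rc1'
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(version, minimum=(0, 11)):
    """Return True if the version string is at least the minimum tuple."""
    return parse_version(version) >= minimum


try:
    import IPython
    installed = IPython.__version__
except ImportError:
    installed = None

if installed is None:
    print("IPython is not installed")
elif meets_minimum(installed):
    print("IPython %s is recent enough" % installed)
else:
    print("IPython %s is older than 0.11, please update" % installed)
```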
More tips on installing scikit-learn can be found on the scikit-learn website.
This repository was modeled on tutorial_ml_gkbionics.