Sentiment-Analysis-of-Amazon-review-data

This project uses the dataset from the Amazon Reviews Kaggle competition. The goal is to build a classifier in Python that determines whether an Amazon review is positive or negative.

Sentiment analysis is often used to derive the emotion or opinion expressed in a text.

The goal of this project is to conduct sentiment analysis on Amazon product reviews using machine learning techniques.

The trained model is then used to predict users’ sentiment based on their online reviews.

Sentiment Analysis on Amazon Product Reviews data

Part 1. Data Exploration

The dataset consisted of about 400 thousand product reviews from amazon.com.

The dataset has the following fields:

  • Text: the review text
  • Label: binary label (positive/negative)

Below are some summary statistics about the data:

  • Total number of reviews: 399939
  • Number of positive reviews: 199968
  • Number of negative reviews: 199971

Part 2. Data Preparation

The dataset was separated into test and training data as follows: every 5th sample belongs to test data, the remaining samples belong to training data.
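The split rule described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the samples sit in a list and that "every 5th" means positions 5, 10, 15, ... counted from 1; the function name `split_every_fifth` and the toy data are hypothetical and not taken from the project code.

```python
def split_every_fifth(samples):
    """Send every 5th sample (1-based positions 5, 10, ...) to the test set."""
    train, test = [], []
    for position, sample in enumerate(samples, start=1):
        # Assumption: counting starts at 1, so the 5th, 10th, ... samples are held out.
        if position % 5 == 0:
            test.append(sample)
        else:
            train.append(sample)
    return train, test


# Toy (text, label) pairs standing in for the real reviews.
samples = [(f"review {i}", i % 2) for i in range(1, 11)]
train, test = split_every_fifth(samples)
print(len(train), len(test))  # 8 2
```

Applied to the full dataset, this rule gives roughly an 80/20 split between training and test data.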

Text pre-processing is needed to convert raw reviews into cleaned reviews. Necessary steps include conversion to lowercase, removal of non-letter characters, removal of stop words, and removal of HTML tags.
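The cleaning steps listed above might look like the following sketch. It relies on assumptions: the stop-word set is a tiny illustrative list (the README does not say which list was used), HTML is stripped with a simple regular expression, and the function name `clean_review` is hypothetical.

```python
import re

# Assumption: a tiny illustrative stop-word list; the original project may have
# used a fuller list from an NLP library, which the README does not specify.
STOP_WORDS = {"a", "an", "and", "is", "it", "of", "the", "this", "to", "was"}


def clean_review(raw_review):
    """Turn a raw review string into a cleaned, space-separated string."""
    text = re.sub(r"<[^>]+>", " ", raw_review)  # remove HTML tags
    text = re.sub(r"[^a-zA-Z]", " ", text)      # keep letters only
    words = text.lower().split()                # lowercase and split on whitespace
    words = [w for w in words if w not in STOP_WORDS]  # drop stop words
    return " ".join(words)


print(clean_review("<br />This phone is GREAT, 10/10!"))  # phone great
```

A regular expression is a crude way to strip markup; for heavily formatted reviews a real HTML parser would likely be more robust.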

The first main step in text classification is to convert the text into a numerical representation. I used frequency-based embedding models for this:

  • CountVectorizer in sklearn, to count word occurrences
  • TfidfVectorizer in sklearn, to compute tf-idf weighted counts

Once we have numerical representations of the text data, we are ready to fit the feature vectors to supervised learning algorithms.
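As a small illustration of the two vectorizers named above, the sketch below fits both on three toy reviews. The toy sentences and variable names are made up for the example, and default vectorizer settings are assumed since the README does not list the parameters that were used.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Toy cleaned reviews standing in for the training data.
train_reviews = ["great phone great battery", "terrible battery", "great value"]

count_vectorizer = CountVectorizer()
X_counts = count_vectorizer.fit_transform(train_reviews)  # sparse matrix of raw word counts

tfidf_vectorizer = TfidfVectorizer()
X_tfidf = tfidf_vectorizer.fit_transform(train_reviews)   # sparse matrix of tf-idf weights

# Columns are ordered alphabetically by vocabulary term.
vocabulary = sorted(count_vectorizer.vocabulary_)
print(vocabulary)              # ['battery', 'great', 'phone', 'terrible', 'value']
print(X_counts.toarray()[0])   # [1 2 1 0 0]
print(X_tfidf.shape)           # (3, 5)
```

In practice the vectorizer is fitted on the training reviews only, and the test reviews are mapped with `transform` so that both sets share the same vocabulary.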

Part 4. Word2Vec

Word2vec is a two-layer neural net that processes text. Its input is a text corpus and its output is a set of vectors: feature vectors for words in that corpus.

Part 5. Machine Learning algorithm

I have constructed the following models for evaluation:

  • Multinomial Naïve Bayes
  • Neural Networks
  • Decision Tree

Implementation:

  • Fit feature vectors to supervised learning algorithms using Multinomial Naïve Bayes, Neural Networks and Decision Tree in sklearn.
  • Load the pre-trained model and predict the sentiment of new data.
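The two implementation steps can be sketched as follows. This is an illustration under assumptions, not the project's actual code: the toy data, the hyperparameters, the use of `MLPClassifier` as the neural network, and the use of `pickle` to store the model are all placeholders, since the README does not specify them.

```python
import pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Toy data: 1 = positive, 0 = negative.
train_texts = ["great phone", "love this product", "terrible battery", "awful waste of money"]
train_labels = [1, 1, 0, 0]

# Assumption: placeholder hyperparameters, not the values used in the project.
models = {
    "naive_bayes": MultinomialNB(),
    "neural_network": MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}

fitted = {}
for name, classifier in models.items():
    # Bundling vectorizer and classifier keeps training and prediction features consistent.
    pipeline = make_pipeline(TfidfVectorizer(), classifier)
    fitted[name] = pipeline.fit(train_texts, train_labels)

# Store one trained pipeline and load it back before scoring new reviews.
saved = pickle.dumps(fitted["naive_bayes"])
restored = pickle.loads(saved)
prediction = restored.predict(["great product"])
print(prediction)
```

Saving the whole pipeline rather than the bare classifier means the vocabulary learned during training travels with the model, so new reviews are vectorized the same way.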

Multinomial Naïve Bayes algorithm

Naive Bayes is a simple algorithm, and because of that simplicity it might outperform more complex models when the data set isn’t large enough and the categories are kept simple. Given estimates of the parameters calculated from the training documents, classification is performed on test documents by calculating the posterior probability of each class given the evidence of the test document, and selecting the class with the highest probability. We formulate this by applying Bayes’ rule:

$$P(c_j \mid d_i; \hat{\theta}) = \frac{P(c_j \mid \hat{\theta})\, P(d_i \mid c_j; \hat{\theta}_j)}{P(d_i \mid \hat{\theta})}$$
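To make Bayes' rule concrete, here is a small from-scratch sketch that computes the posterior of each class for one tokenised document. It assumes Laplace (add-one) smoothing via a hypothetical `alpha` parameter and drops the multinomial coefficient, which is the same for every class and cancels in the ratio; the function names and toy documents are illustrative only.

```python
import math
from collections import Counter


def train_multinomial_nb(documents, labels, alpha=1.0):
    """Estimate log priors and smoothed log word probabilities per class."""
    vocabulary = {word for doc in documents for word in doc}
    class_counts = Counter(labels)
    word_counts = {c: Counter() for c in class_counts}
    for doc, label in zip(documents, labels):
        word_counts[label].update(doc)
    log_prior = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    log_likelihood = {}
    for c in class_counts:
        # Assumption: add-alpha smoothing so unseen (word, class) pairs get non-zero mass.
        total = sum(word_counts[c].values()) + alpha * len(vocabulary)
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + alpha) / total) for w in vocabulary
        }
    return log_prior, log_likelihood


def posterior(document, log_prior, log_likelihood):
    """Return P(c_j | d_i) for every class c_j via Bayes' rule."""
    # Numerator: log P(c_j) + sum of log P(w | c_j); out-of-vocabulary words are skipped.
    scores = {
        c: log_prior[c]
        + sum(log_likelihood[c][w] for w in document if w in log_likelihood[c])
        for c in log_prior
    }
    # Denominator P(d_i): the numerators summed over all classes.
    log_evidence = math.log(sum(math.exp(s) for s in scores.values()))
    return {c: math.exp(s - log_evidence) for c, s in scores.items()}


docs = [["great", "phone"], ["great", "value"], ["terrible", "battery"]]
labels = ["pos", "neg"][:1] * 2 + ["neg"]  # ["pos", "pos", "neg"]
log_prior, log_likelihood = train_multinomial_nb(docs, labels)
probs = posterior(["great", "battery"], log_prior, log_likelihood)
print(max(probs, key=probs.get), probs)
```

The predicted class is the one with the highest posterior; because the denominator is shared by all classes, comparing the numerators alone would select the same class.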

Classification report

Model            Precision   Recall   Accuracy
Naïve Bayes      0.85        0.85     0.8483
Neural Network   0.90        0.90     0.9002
Decision Tree    0.75        0.74     0.7424
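For reference, the three metrics in the report can be computed from true and predicted labels as in the sketch below. It is an illustration with made-up labels, and it assumes precision and recall are measured for the positive class; the README does not say whether the reported figures are per-class or averaged.

```python
def binary_report(y_true, y_pred, positive=1):
    """Return (precision, recall, accuracy) with respect to the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = correct / len(pairs)
    return precision, recall, accuracy


# Made-up labels for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
print(binary_report(y_true, y_pred))  # (0.75, 0.75, 0.75)
```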


Part 6. Visualisation

I used plotpy to visualise the results and some analysis of the data.

Contributors

jagruthisprabhudev
