I created a tool that classifies Tripadvisor reviews into three categories: negative, neutral, and positive.
- Created a tool that classifies Tripadvisor reviews into three categories (accuracy: 85%).
- Collected the data from Kaggle: https://www.kaggle.com/andrewmvd/trip-advisor-hotel-reviews.
- Performed exploratory data analysis using word clouds.
- Preprocessed the text: punctuation removal, stemming, word embedding ...
- Created a simple LSTM model.
- Tested the model for inference on a single sentence.
Python Version: 3.7
Data Collection: https://www.kaggle.com/andrewmvd/trip-advisor-hotel-reviews
The dataset contains 20k reviews crawled from Tripadvisor, each labeled from 1 to 5 stars. Stop words have already been removed from the reviews. This dataset can be used to predict review ratings or to explore the key aspects that make hotels good or bad.
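Since the dataset is labeled with 1 to 5 stars while the tool predicts three categories, the ratings have to be grouped at some point. The sketch below shows one plausible grouping; the thresholds (1-2 negative, 3 neutral, 4-5 positive) and the function name `stars_to_label` are assumptions for illustration, not necessarily the mapping used in this project.

```python
def stars_to_label(stars: int) -> str:
    """Map a 1-5 star rating to one of three sentiment classes.

    The thresholds are an assumption made for this sketch.
    """
    if not 1 <= stars <= 5:
        raise ValueError(f"expected a rating between 1 and 5, got {stars}")
    if stars <= 2:
        return "negative"
    if stars == 3:
        return "neutral"
    return "positive"


# Apply the mapping to a few example ratings
ratings = [1, 2, 3, 4, 5]
labels = [stars_to_label(r) for r in ratings]
print(labels)
```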
I used word clouds to check whether there are major differences in word frequencies between the categories. The main difference between positive and neutral/negative reviews is the presence of 'n't'. It seems that people mostly use 'n't' to express their dissatisfaction.
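A word cloud gives a visual impression; the same comparison can be checked numerically. The following is a minimal sketch using only the standard library, with made-up toy reviews (not taken from the dataset) and a hypothetical helper `relative_frequency` that measures how often a token appears in a group of reviews.

```python
from collections import Counter


def relative_frequency(reviews, token):
    """Return the share of all whitespace-separated tokens equal to `token`."""
    counts = Counter()
    for review in reviews:
        counts.update(review.lower().split())
    total = sum(counts.values())
    return counts[token] / total if total else 0.0


# Toy reviews, invented for illustration only
positive = ["great location friendly staff", "nice room great breakfast"]
negative = ["room n't clean staff n't helpful", "did n't enjoy stay"]

print(relative_frequency(positive, "n't"))
print(relative_frequency(negative, "n't"))
```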
First, I removed all non-alphabetic characters from the reviews, then applied stemming. After that, I transformed each review into a sequence of integers using the Keras tokenizer. Then, I padded the sequences so that they all have the same length. Finally, I used a word embedding in my Keras model.
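The steps above can be sketched in plain Python to show what each one does. This is a standard-library stand-in, not the project's code: the helpers `clean`, `build_vocab`, `encode`, and `pad` are hypothetical names, stemming is left out, and the padding imitates the default behavior of Keras `pad_sequences` (zeros added at the front, long sequences truncated from the front).

```python
import re
from collections import Counter


def clean(text):
    """Replace non-alphabetic characters with spaces, lowercase, and split."""
    return re.sub(r"[^A-Za-z]+", " ", text).lower().split()


def build_vocab(token_lists):
    """Assign integer ids by descending frequency; 0 is reserved for padding."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    return {word: i for i, (word, _) in enumerate(counts.most_common(), start=1)}


def encode(tokens, vocab):
    """Turn tokens into integer ids, skipping words missing from the vocabulary."""
    return [vocab[t] for t in tokens if t in vocab]


def pad(seq, maxlen, value=0):
    """Keep the last `maxlen` ids and left-pad shorter sequences with `value`."""
    seq = seq[-maxlen:]
    return [value] * (maxlen - len(seq)) + seq


# Toy reviews, invented for illustration only
reviews = ["Great room, great staff!", "Room was small."]
tokens = [clean(r) for r in reviews]
vocab = build_vocab(tokens)
padded = [pad(encode(t, vocab), maxlen=5) for t in tokens]
print(padded)  # [[0, 1, 2, 1, 3], [0, 0, 2, 4, 5]]
```

Reserving id 0 for padding lets the model's embedding treat padded positions as a dedicated "empty" token.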
I created a simple LSTM model.
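As a rough idea of what such a model can look like, here is a minimal Keras sketch with an embedding layer, one LSTM layer, and a three-way softmax output. All hyperparameters (`VOCAB_SIZE`, `EMBED_DIM`, `MAX_LEN`, the 64 LSTM units) are assumptions chosen for illustration; the actual architecture in this project may differ.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Assumed values, for illustration only
VOCAB_SIZE = 10000   # number of distinct token ids, including padding id 0
EMBED_DIM = 64       # size of each word embedding vector
MAX_LEN = 100        # padded sequence length
NUM_CLASSES = 3      # negative, neutral, positive


def build_model():
    """Build an Embedding -> LSTM -> softmax classifier."""
    model = keras.Sequential([
        layers.Embedding(input_dim=VOCAB_SIZE, output_dim=EMBED_DIM),
        layers.LSTM(64),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # expects integer class labels
        metrics=["accuracy"],
    )
    return model


model = build_model()

# Two random integer sequences standing in for padded reviews
batch = np.random.randint(1, VOCAB_SIZE, size=(2, MAX_LEN))
probs = model.predict(batch, verbose=0)
print(probs.shape)
```

Each row of `probs` holds the predicted probabilities for the three classes; on an untrained model these values are not meaningful.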
Coming Soon!