This repository contains the code and documentation for an NLP project that classifies Yelp reviews as either 1-star or 5-star based on their text. The goal is to build an accurate text classification model using TF-IDF features and a machine learning pipeline.
- Data exploration and cleaning were performed to prepare the Yelp review dataset for analysis.
- Text data underwent preprocessing, including handling missing values and text cleaning.
- Text data was converted into numerical feature vectors using TF-IDF (Term Frequency-Inverse Document Frequency), which weights each term by how often it appears in a review and how rare it is across all reviews.
- A machine learning pipeline was set up to streamline the classification process. This pipeline included data preprocessing, TF-IDF vectorization, and model training.
- Various classification algorithms were explored, and the model with the best performance was selected.
- The final model predicts whether a Yelp review is 1-star or 5-star based on the review text.
- The model's performance was assessed using key classification metrics, including precision, recall, and F1-score.
- The model achieved a precision of 0.66, a recall of 0.81, and an F1-score of 0.73 (the harmonic mean of precision and recall). It favors recall over precision: it catches most reviews of the target class, at the cost of some false positives.
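
The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes scikit-learn, uses `LogisticRegression` as a stand-in classifier (the selected model is not named here), and trains on a handful of made-up reviews instead of the Yelp dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.pipeline import Pipeline

# Hypothetical toy reviews standing in for the cleaned Yelp data.
train_texts = [
    "terrible service and cold food",
    "worst experience ever, rude staff",
    "awful place, never coming back",
    "amazing food and friendly staff",
    "best experience ever, loved it",
    "wonderful place, will come back",
]
train_stars = [1, 1, 1, 5, 5, 5]

# Chain TF-IDF vectorization and the classifier so that both are fitted
# and applied together. The classifier choice is an assumption.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(lowercase=True, stop_words="english")),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipeline.fit(train_texts, train_stars)

# Evaluate on held-out text with precision, recall, and F1-score.
test_texts = ["rude staff and awful food", "friendly staff, amazing place"]
test_stars = [1, 5]
predicted = pipeline.predict(test_texts)
print(classification_report(test_stars, predicted, zero_division=0))
```

Wrapping the vectorizer and the model in one `Pipeline` means the TF-IDF vocabulary is learned only from the training data, which avoids leaking test text into the features. Swapping in a different classifier only requires changing the `"clf"` step.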
In conclusion, this project demonstrates how TF-IDF and a machine learning pipeline can be applied to Yelp review classification. With a recall of 0.81, a precision of 0.66, and an F1-score of 0.73, the model distinguishes 1-star from 5-star reviews reasonably well, though the lower precision leaves room for improvement. The same approach applies to other sentiment analysis and text classification tasks, such as e-commerce and customer feedback analysis.