This project showcases data and ML capabilities applied to a real-world problem: telling legitimate news from fake news. The labeled data was collected from Kaggle as two CSV files. The first contains legitimate news and the second contains fake news.
The classifier is a Natural Language Processing (NLP) model trained on these prelabeled datasets of legitimate and fake news.
The project includes:
- A data pipeline that cleans the data, balances it, and selects the relevant entities.
- Saving the pipeline output as CSV files for further processing.
- Training an ML model for fake news classification on the processed data.
- Saving the trained model for classification.
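The data steps above can be sketched roughly as follows. This is a minimal illustration with pandas, not the project's actual pipeline. The column names (`title`, `text`), the label encoding (0 = legitimate, 1 = fake), the output file name, and the helpers `clean_text` and `build_dataset` are all assumptions made for the example, and tiny in-memory frames stand in for the two Kaggle CSV files.

```python
import os
import re
import tempfile

import pandas as pd


def clean_text(text: str) -> str:
    """Lowercase, replace non-letter characters with spaces, collapse whitespace."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def build_dataset(real: pd.DataFrame, fake: pd.DataFrame, seed: int = 42) -> pd.DataFrame:
    """Label, merge, clean, and balance the two sources (hypothetical helper)."""
    real = real.assign(label=0)  # assumed encoding: 0 = legitimate
    fake = fake.assign(label=1)  # assumed encoding: 1 = fake
    data = pd.concat([real, fake], ignore_index=True)

    # Select only the relevant columns and drop rows without text.
    data = data[["text", "label"]].dropna()
    data = data.assign(text=data["text"].map(clean_text))
    data = data[data["text"] != ""].drop_duplicates(subset="text")

    # Balance by downsampling every class to the size of the smallest one.
    smallest = data["label"].value_counts().min()
    balanced = data.groupby("label").sample(n=smallest, random_state=seed)

    # Shuffle so the classes are interleaved.
    return balanced.sample(frac=1, random_state=seed).reset_index(drop=True)


# Tiny stand-ins for the two CSV files (in practice: pd.read_csv on each file).
real_df = pd.DataFrame({
    "title": ["Markets", "Council", "Empty"],
    "text": ["Markets rose 2% today.", "The council approved the budget.", None],
})
fake_df = pd.DataFrame({
    "title": ["Aliens", "Aliens again", "Cure", "Moon"],
    "text": [
        "Aliens endorse local mayor!!!",
        "Aliens endorse local mayor!!!",
        "Miracle cure hidden by doctors",
        "Moon landing filmed in a basement",
    ],
})

dataset = build_dataset(real_df, fake_df)

# Save the processed data for the modelling step (hypothetical file name).
out_path = os.path.join(tempfile.mkdtemp(), "processed_news.csv")
dataset.to_csv(out_path, index=False)
```

Downsampling is only one way to balance the classes. It is used here because it needs no extra dependencies.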
Requirements:
- Python 3.8
- Notebook: Jupyter Notebook
- Data analysis libraries: pandas, NumPy
- Machine learning libraries: scikit-learn
- Natural Language Processing libraries: NLTK
Installation:
- Clone the repository:
git clone [email protected]:iamkamleshrangi/fake_news_classifier.git
- Prepare a conda or virtualenv environment with Python 3.
- Install the libraries listed in the requirements.txt file.
- Follow the instructions provided in the next section.
Instructions:
- Run the following command in the project's root directory to run the ETL and ML pipeline, which cleans the data and trains the model.
python model/classifier.py
- The output is a trained classifier.
- You may need a reasonably powerful machine to run the process.
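As a rough idea of what training, saving, and reusing such a classifier can look like, here is a minimal scikit-learn sketch. It does not reproduce `model/classifier.py`. The TF-IDF plus logistic regression pipeline, the toy training texts, the label encoding (0 = legitimate, 1 = fake), and the file name `classifier.joblib` are all assumptions chosen for illustration.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data; a real run would use the processed CSV output instead.
texts = [
    "the council approved the annual budget on tuesday",
    "markets rose after the central bank announcement",
    "scientists publish peer reviewed study on vaccines",
    "aliens secretly endorse local mayor in shocking video",
    "miracle cure hidden by doctors revealed at last",
    "moon landing was filmed in a basement insiders say",
]
labels = [0, 0, 0, 1, 1, 1]  # assumed encoding: 0 = legitimate, 1 = fake

# Vectorizer and classifier in one pipeline, so they are saved together.
model = make_pipeline(
    TfidfVectorizer(stop_words="english"),
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)

# Persist the fitted pipeline (hypothetical file name).
model_path = os.path.join(tempfile.mkdtemp(), "classifier.joblib")
joblib.dump(model, model_path)

# Later, or in another process: load the model and classify new text.
restored = joblib.load(model_path)
prediction = restored.predict(["doctors hide shocking miracle cure"])
```

Saving the whole pipeline rather than only the classifier means new text goes through the same vectorization at prediction time as it did during training.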
Acknowledgements:
- Kaggle, for proposing the idea and providing the dataset.