This project showcases data and machine-learning capabilities applied to a real-world problem. Using labelled data from disaster relief agencies, we built a beta app that categorises incoming messages so they can be routed to the appropriate disaster relief agency.
The app is built on top of a Natural Language Processing (NLP) model trained on a pre-labelled dataset. This project is part of the Data Science Nanodegree Program by Udacity, in collaboration with Figure Eight.
The project includes:
- A data pipeline to clean and balance the data and select the relevant entities.
- Saving the pipeline output to a SQLite database.
- An ML model for disaster response, trained on the processed data.
- A web app that exposes the model to classify messages in real time.
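The ETL step above can be sketched as follows. This is a minimal illustration on toy data, not the actual `data/process_data.py`; the column names and the semicolon-separated label format are assumptions:

```python
import pandas as pd
from sqlalchemy import create_engine

# Toy stand-ins for disaster_messages.csv / disaster_categories.csv
# (column names here are assumptions, not the repo's exact schema)
messages = pd.DataFrame({"id": [1, 2], "message": ["we need water", "roads are blocked"]})
categories = pd.DataFrame({"id": [1, 2], "categories": ["water-1;food-0", "water-0;food-0"]})

# Merge the two sources on their shared id
df = messages.merge(categories, on="id")

# Expand the delimited label string into one binary column per category
labels = df["categories"].str.split(";", expand=True)
labels.columns = [c.split("-")[0] for c in labels.iloc[0]]
labels = labels.apply(lambda col: col.str[-1].astype(int))
df = pd.concat([df.drop(columns="categories"), labels], axis=1)

# Persist the cleaned table to SQLite, as the pipeline does
engine = create_engine("sqlite:///DisasterResponse.db")
df.to_sql("messages", engine, index=False, if_exists="replace")
```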
- Python 3.8
- Notebook: Jupyter Notebook
- SQLite database libraries: SQLAlchemy
- Data analysis libraries: Pandas, NumPy
- Machine Learning libraries: Scikit-Learn
- Natural Language Processing libraries: NLTK
- Web App and Data Visualization libraries: Flask, Plotly
- app
| - template
| |- master.html # main page of web app
| |- go.html # classification result page of web app
|- run.py # Flask file that runs app
- data
|- disaster_categories.csv # data to process
|- disaster_messages.csv # data to process
|- process_data.py # preprocessing of the CSV files
|- DisasterResponse.db # database to save clean data
- models
|- train_classifier.py
|- classifier.joblib # saved model
- notebook
|- etl_pipeline.ipynb # notebook of the ETL work
|- ml_pipeline.ipynb # notebook of the ML work
- sceenshots
|- 1.png
|- 2.png
|- 3.png
- README.md
- LICENSE
- Clone the repository.
git clone https://github.com/iamkamleshrangi/disaster-response.git
- Prepare a conda/virtualenv environment with Python 3+.
- Install the necessary libraries listed in the requirements.txt file.
- Follow the instructions provided in the next section.
Instructions:
- Run the following commands in the project's root directory to set up your database and model.
  - To run the ETL pipeline that cleans the data and stores it in the database:
    python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponse.db
  - To run the ML pipeline that trains the classifier and saves it:
    python models/train_classifier.py data/DisasterResponse.db models/classifier.pkl
- Run the following command in the app's directory to start the web app.
  python run.py
- Go to http://0.0.0.0:3001/
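The training step is, in spirit, a scikit-learn text-classification pipeline. A minimal sketch on toy data follows; the estimator choices and parameters here are assumptions, not the exact configuration in models/train_classifier.py:

```python
import joblib
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multioutput import MultiOutputClassifier
from sklearn.ensemble import RandomForestClassifier

# Toy messages with binary labels for two assumed categories: ["water", "food"]
X = ["we need water", "roads are blocked", "send food please"]
y = [[1, 0], [0, 0], [0, 1]]

# Vectorize text, then fit one classifier per output category
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", MultiOutputClassifier(RandomForestClassifier(n_estimators=10, random_state=0))),
])
pipeline.fit(X, y)

# Persist and reload the fitted model, as the project does with its classifier
joblib.dump(pipeline, "classifier.joblib")
model = joblib.load("classifier.joblib")
pred = model.predict(["there is no water"])
```

The web app loads the saved model the same way and calls `predict` on each submitted message.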
production: To deploy the project live, you need a cloud service provider such as AWS. Run run.py on the server and access the service at server_public_ip:3001. For a more robust setup, I suggest serving the app behind nginx and using PostgreSQL as the database.
development: Run the app locally as described in the instructions above.
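A minimal sketch of such a server setup. The use of gunicorn as the WSGI server is an assumption (it is not part of this repo), and it assumes app/run.py exposes a module-level Flask object named `app`:

```shell
# Install and run a production WSGI server instead of Flask's dev server
pip install gunicorn
gunicorn --chdir app run:app --bind 0.0.0.0:3001 --workers 2
```

nginx would then proxy port 80 to the gunicorn port, and PostgreSQL can replace SQLite by changing the SQLAlchemy connection string.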
- The main page shows some graphs about the training dataset.
- Enter a message as an example.
- Click Classify Message; the message will be categorized and the matching categories will be highlighted in green.
- Udacity for proposing this project as part of the Data Science Nanodegree Program.
- Figure Eight for providing the data.