- Libraries used for the project
- Objective
- File Descriptions
- Summary Of Models
- Instructions
- Charts
- Acknowledgements
## Libraries used for the project

The project uses the following Python libraries:
- Collections
- Matplotlib
- NLTK
- NumPy
- Pandas
- Seaborn
- scikit-learn (sklearn)
  - Pipeline, train_test_split, GridSearchCV, LinearRegression, r2_score, classification_report, accuracy_score, recall_score, precision_score, f1_score, TfidfVectorizer, MultiOutputClassifier, AdaBoostClassifier, GradientBoostingClassifier, BaseEstimator, TransformerMixin, MLPClassifier
- SQLAlchemy
I used the Anaconda Python distribution with Python 3.
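As a rough sketch, the scikit-learn components listed above would be imported like this (module paths follow scikit-learn's public API and are not copied from the project's code):

```python
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LinearRegression
from sklearn.metrics import (r2_score, classification_report, accuracy_score,
                             recall_score, precision_score, f1_score)
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multioutput import MultiOutputClassifier
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.neural_network import MLPClassifier
```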
## Objective

The objective of this project is to build a model that classifies messages sent during a disaster. We have been given a data set of disaster Twitter messages with 36 pre-defined categories. With the help of the model, we can classify each message into these categories and route it to the appropriate disaster relief agency. For example, we do not want a Medical Help message sent to a food agency, as they won't be able to help the person in time.
This project involves building an ETL pipeline and a machine learning pipeline. The task is also a multi-label classification problem: one message can be assigned to multiple categories if needed.
This data set is provided to us by [Figure Eight](https://www.figure-eight.com/).
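A hypothetical sketch of the category-splitting step in such an ETL pipeline, assuming each row packs all labels into one string like `"related-1;request-0;..."` (the separator and label format here are illustrative, not taken from the actual data files):

```python
import pandas as pd

# toy stand-in for the raw categories column
categories = pd.Series(["related-1;request-0;offer-0"])

# one column per category
split = categories.str.split(";", expand=True)

# derive column names from the text before the "-" in the first row
split.columns = split.iloc[0].str.split("-").str[0]

# keep only the numeric flag after the "-" and cast it to int
split = split.apply(lambda col: col.str.split("-").str[-1].astype(int))
print(split)
```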
## File Descriptions

data:
- disaster_messages.csv
- disaster_categories.csv
- DisasterResponse.db
- ETL pipeline Preparation.ipynb
- process_data.py
models:
- train_classifier.py
- ML Pipeline Preparation.ipynb
app:
- templates
- go.html
- master.html
- run.py
## Summary Of Models

We use the following model to classify messages:
- In the first model, we use the TfidfVectorizer to transform messages and then an AdaBoostClassifier to classify them.
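A minimal sketch of this model as a scikit-learn pipeline, with MultiOutputClassifier wrapping AdaBoost so one classifier is fit per category column (variable names and the toy data are illustrative, not the project's exact code):

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multioutput import MultiOutputClassifier
from sklearn.ensemble import AdaBoostClassifier

# TF-IDF features feeding one AdaBoost classifier per category
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", MultiOutputClassifier(AdaBoostClassifier())),
])

# toy example: two messages, three binary category labels each
X = ["we need water and food", "medical help required urgently"]
y = [[1, 1, 0], [0, 0, 1]]
pipeline.fit(X, y)

# predictions come back as one row per message, one column per category
print(pipeline.predict(["send food please"]).shape)
```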
## Instructions

Run the following commands in the project's root directory to set up your database and model.
- To run the ETL pipeline that cleans the data and stores it in the database:
python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponse.db
- To run the ML pipeline that trains the classifier and saves it:
python models/train_classifier.py data/DisasterResponse.db models/classifier.pkl
Run the following command in the app's directory to run your web app.
python run.py
Go to http://0.0.0.0:3001/
## Acknowledgements

- This data set is provided to us by [Figure Eight](https://www.figure-eight.com/)
- Train/test split error: https://datascience.stackexchange.com/questions/20199/train-test-split-error-found-input-variables-with-inconsistent-numbers-of-sam
- Pandas string split: https://pandas.pydata.org/pandas-docs/version/0.23/generated/pandas.Series.str.split.html