This is a TensorFlow-backed Keras model that predicts which tweets are about real disasters and which are not. It's derived from the popular "Basic EDA,Cleaning and GloVe" Kaggle notebook.
The project is structured with whisk, an ML project framework that makes collaboration, reproducibility, and deployment "just work".
Besides TensorFlow and Keras, the project uses DVC to version-control the data download and training stages. Because the training stage takes ~20 minutes on a laptop, versioning these stages can save a significant amount of time when bootstrapping the project.
You can use the trained model in two different ways:
Install this model via pip:
pip install git+https://github.com/whisk-ml/disaster_tweets/
See the quickstart section for usage info.
Click the button below to deploy the Flask web service to Heroku. See app/README.md for the HTTP API.
You can check out the project source code and run the model, notebooks, Flask app, and more. There are two options:
The project comes with a devcontainer.json file for the default workspace configuration. This runs the project setup commands - no extra configuration is required.
When using the terminal, activate the project venv first:
source venv/bin/activate
See the quickstart section for usage info.
The following is required to run this project:
- Git
- Python 3.6+
- A Linux-based OS
After cloning this repo, cd into disaster_tweets and run the following in your terminal:
pip install whisk
whisk setup
source venv/bin/activate
whisk dvc setup
dvc pull
The commands above install whisk, set up the project environment, activate the created venv, set up DVC, and download the data stored in DVC.
See the quickstart section for usage info.
After installing via pip or running setup, invoke the model from the command line:
disaster_tweets predict "Theyd probably still show more life than Arsenal did yesterday, eh? EH?"
0.19104013
disaster_tweets predict "Just happened a terrible car crash"
0.658098
Use within Python:
from disaster_tweets.models.model import Model
model = Model()
model.predict(["Theyd probably still show more life than Arsenal did yesterday, eh? EH?"])
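The scores returned by the model still need to be turned into a decision. The following is a minimal sketch of one way to do that, not part of the project itself. It assumes that a higher score means the tweet is more likely about a real disaster, which is what the two command-line examples suggest. The helper name label_scores and the 0.5 cutoff are hypothetical choices made for illustration.

```python
from typing import Iterable, List


def label_scores(scores: Iterable[float], threshold: float = 0.5) -> List[str]:
    """Map each score to a label using a fixed cutoff.

    Assumption: scores at or above `threshold` indicate a real disaster.
    The 0.5 default is illustrative, not a value defined by the project.
    """
    return [
        "disaster" if float(score) >= threshold else "not disaster"
        for score in scores
    ]


# Scores copied from the command-line examples in this README.
print(label_scores([0.19104013, 0.658098]))
# prints ['not disaster', 'disaster']
```

The output of model.predict could be passed to a helper like this, flattening it first if it is a nested array. Raising the threshold trades missed disasters for fewer false alarms, so the right cutoff depends on the use case.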
If you checked out the project source code, you can run the DVC stages.
Run the training stage:
dvc repro train.dvc
Run the download stages:
dvc repro download_dataset.dvc
dvc repro download_glove.dvc
To learn more about whisk, here are a few helpful doc pages: