A personal project demonstrating an end-to-end ML pipeline, using the MovieLens dataset as an example.
Tech stack:
- Prefect for workflow orchestration
- Google Cloud Storage and BigQuery
- BentoML for serving the ML model
- fastai and PyTorch for training a collaborative filtering model
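At its core, the collaborative filtering model learns a user embedding and an item embedding whose dot product predicts a rating. The project trains this with fastai's collaborative-filtering tooling on PyTorch; the sketch below shows the same idea in plain Python with made-up toy data (all names and numbers here are illustrative, not the project's actual code):

```python
import random

# Toy (user, item, rating) triples standing in for MovieLens rows.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
n_users, n_items, k = 3, 3, 4  # k = embedding size

random.seed(0)
U = [[random.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]  # user factors
V = [[random.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]  # item factors

def predict(u, i):
    """Predicted rating = dot product of user and item embeddings."""
    return sum(a * b for a, b in zip(U[u], V[i]))

def epoch(lr=0.05):
    """One SGD pass over the ratings; returns mean squared error."""
    total = 0.0
    for u, i, r in ratings:
        err = predict(u, i) - r
        total += err * err
        for f in range(k):
            gu = err * V[i][f]   # gradient w.r.t. U[u][f]
            gv = err * U[u][f]   # gradient w.r.t. V[i][f]
            U[u][f] -= lr * gu
            V[i][f] -= lr * gv
    return total / len(ratings)

losses = [epoch() for _ in range(200)]
```

fastai replaces this hand-rolled SGD with batched training, bias terms, and sigmoid-scaled outputs, but the learned objects are the same two embedding tables.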
Pipeline components:
- Data pipeline
- Training pipeline
- Experiment tracking
- Model registry
- Model serving
- Testing
- CI/CD
- Docker
Setup:
- Install dependencies:

  ```shell
  pip install -r requirements.txt
  ```
- Start the Prefect server and agent in Docker containers. This also starts a MinIO server to store flow code:

  ```shell
  make prefect
  ```
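The exact contents of the `make prefect` target live in this repo's Makefile. A hypothetical docker-compose equivalent of that step (service names, image tags, and ports below are illustrative assumptions, not the project's actual config) looks like:

```yaml
# Hypothetical compose file: Prefect server, Prefect agent, and MinIO
# for flow-code storage. Tags and addresses are placeholders.
services:
  prefect-server:
    image: prefecthq/prefect:2-latest
    command: prefect server start --host 0.0.0.0
    ports:
      - "4200:4200"
  prefect-agent:
    image: prefecthq/prefect:2-latest
    command: prefect agent start -q default
    environment:
      PREFECT_API_URL: http://prefect-server:4200/api
    depends_on:
      - prefect-server
  minio:
    image: minio/minio
    command: server /data --console-address ":9001"
    ports:
      - "9000:9000"
      - "9001:9001"
```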
- Export your wandb API key:

  ```shell
  export WANDB_API_KEY=***
  ```

  or put your key in a `.env` file.
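If you go the `.env` route, the key still has to end up in the process environment before wandb runs. Projects typically use `python-dotenv` for this; a minimal stdlib version of that idea (the function name and quoting rules here are this sketch's own simplifications) would be:

```python
import os

def load_dotenv(path=".env"):
    """Minimal .env loader: KEY=VALUE lines, '#' comments,
    no quoting or multiline support. Existing env vars win."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())

# load_dotenv()                      # after this, wandb can read
# os.environ.get("WANDB_API_KEY")    # the key from the environment
```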
- Build the model server. This downloads the model weights from the model registry, then builds and containerizes the model server:

  ```shell
  make bento
  ```
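`make bento` wraps BentoML's build and containerize steps, which are driven by a `bentofile.yaml`. A hypothetical build config for a service like this one (the service entry point and package list are illustrative, not copied from this repo) might look like:

```yaml
# Hypothetical bentofile.yaml; the real entry point and dependency
# list live in this repo.
service: "service:svc"   # module:variable of the BentoML service
include:
  - "service.py"
python:
  packages:
    - fastai
    - torch
```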
- Install the pre-commit hooks for auto-formatting and linting:

  ```shell
  pre-commit install
  ```
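The hooks themselves are declared in `.pre-commit-config.yaml`. A typical formatting-plus-linting setup (the repos and `rev` pins below are an example, not necessarily what this project uses) looks like:

```yaml
# Example hook set: black for formatting, ruff for linting.
# Pin `rev` to the versions you actually want.
repos:
  - repo: https://github.com/psf/black
    rev: 24.4.2
    hooks:
      - id: black
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.4.4
    hooks:
      - id: ruff
```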