This is a cookiecutter template for bootstrapping your work on ML / data science projects. It contains:
- a directory structure for sorting your notebooks, data, models, figures, tasks and source code to reuse in notebooks
- a conda environment file with the basic Python libraries and some extras:
  - numpy / pandas / scikit-learn / seaborn / statsmodels / plotly / jupyterlab: the classic data science stack
  - streamlit for building and running top-to-bottom data apps
  - pyspark and h2o for distributed processing
  - pandas-profiling for generating HTML reports on pandas dataframes
  - missingno for missing data analysis
  - invoke as a replacement for Makefile for managing project tasks
  - nbdime for diffing and merging notebooks
  - kaggle-api, a CLI for interacting with the Kaggle API
  - keras and lightgbm for prediction
  - path.py for browsing files in Python

Requirements:
- Anaconda >= 5.x
- Cookiecutter >= 1.4.0. This can be installed with pip or conda, depending on how you manage your Python packages:
$ pip install cookiecutter
or
$ conda config --add channels conda-forge
$ conda install cookiecutter
In the folder where you want your project generated, run:
cookiecutter https://github.com/flight505/la-cookiecutter
You can also clone the project in <path/to/template>, and from the folder where you want to generate your project, launch cookiecutter <path/to/template>
It will ask for the following values:
full_name
email
project_name
project_slug
project_short_description
version
(project_slug is the name you will use to install your package elsewhere, using pip install your_package_name)
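Since project_slug ends up as a package name for pip, it helps to keep it lowercase with no spaces or punctuation. The sketch below shows one common way to derive such a slug from project_name. It is an assumption for illustration only: the helper name `slugify_project_name` is hypothetical, and the default this template actually proposes for project_slug may differ.

```python
import re


def slugify_project_name(project_name: str) -> str:
    """Turn a human-readable project name into a package-friendly slug.

    Hypothetical helper: not part of the template or of cookiecutter.
    """
    # Lowercase, then collapse every run of characters other than
    # a-z and 0-9 into a single underscore.
    slug = re.sub(r"[^a-z0-9]+", "_", project_name.strip().lower())
    # Remove underscores left at either end by leading/trailing punctuation.
    return slug.strip("_")


print(slugify_project_name("My Streamlit Project"))  # my_streamlit_project
```

Whatever value you settle on, use the same string after pip install when installing the generated package elsewhere.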
Complete the values for your project and voilà! Then follow the README inside your new project for further installation.
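If you would rather not answer the prompts interactively (for example in a script), the same values can be passed through cookiecutter's Python API. This is a minimal sketch, not the template's documented workflow: `build_context` and `generate_project` are hypothetical helper names, the sample values in the usage comment are placeholders, and it assumes the template's prompt keys are exactly the six listed above.

```python
# Prompt keys listed in this README; assumed to match the template's prompts.
PROMPT_KEYS = (
    "full_name",
    "email",
    "project_name",
    "project_slug",
    "project_short_description",
    "version",
)


def build_context(**values):
    """Check that every prompt key is supplied and return the context dict."""
    missing = [key for key in PROMPT_KEYS if key not in values]
    if missing:
        raise ValueError(f"missing values for: {', '.join(missing)}")
    # Keep only the known keys, in the order cookiecutter would ask for them.
    return {key: values[key] for key in PROMPT_KEYS}


def generate_project(template, context, output_dir="."):
    """Render the template without prompting, using the given context."""
    # Imported lazily so build_context works even without cookiecutter installed.
    from cookiecutter.main import cookiecutter

    return cookiecutter(
        template,
        no_input=True,          # skip the interactive prompts
        extra_context=context,  # override the template defaults
        output_dir=output_dir,
    )


# Example usage (placeholder values; requires cookiecutter and network access):
# context = build_context(
#     full_name="Jane Doe",
#     email="jane@example.com",
#     project_name="My Streamlit Project",
#     project_slug="my_streamlit_project",
#     project_short_description="Demo project",
#     version="0.1.0",
# )
# generate_project("https://github.com/flight505/la-cookiecutter", context)
```

Validating the keys up front keeps a typo in a key name from silently falling back to the template default.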
All contributions, bug reports, bug fixes, documentation improvements, enhancements and ideas are welcome.
This is a fork of andfanilo's project, modified for Streamlit applications.
This project is heavily influenced by drivendata's Data Science cookiecutter.
Other links that helped shape this cookiecutter: