Heart Disease Prediction Model Development

Dataset taken from https://www.kaggle.com/datasets/johnsmith88/heart-disease-dataset.

Exploratory data analysis (EDA), model development and model explainability for the heart disease web application. The EDA and modelling (Logistic Regression, AutoML and Gradient-boosted Trees) were performed in Azure Databricks (files in Databricks_workspace) and tracked with MLflow.

The GBT model was then replicated in the local environment in create_model.py, which saves the model and all the datasets used to /data/. Model explainability was implemented with SHAP in explain_model.py.

Getting Started

Dependencies

Docker is only required if you wish to run the scripts in a container:

Docker

Linux: To install Docker on Linux, follow the instructions for your specific distribution on the Docker website.

Windows: If you're using Windows, you can install Docker Desktop by downloading it from the Docker Desktop for Windows page.

Installing

Without Docker container

To install this application without using a docker container, follow these steps:

  1. Clone this repository to your local machine:
    git clone https://github.com/leo-cb/HeartDiseasePrediction_ModelDev.git  
  2. Install dependencies:
    pip install -r requirements.txt
    

With docker container

To install this application using docker, follow these steps:

  1. Clone this repository to your local machine:
    git clone https://github.com/leo-cb/HeartDiseasePrediction_ModelDev.git
  2. Create docker image:
    docker build -t heartdisease_modeldev .
    

Executing program

Without docker

To run the scripts without docker, follow these steps:

  1. Execute create_model.py to create the GBT model and output it to /data/:
    python create_model.py
  2. Execute explain_model.py to output the SHAP plots to /images/ and show them:
    python explain_model.py --show-plots
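
The `--show-plots` option above is a boolean switch. A minimal sketch of how such a flag could be parsed with `argparse`, assuming it only toggles interactive display on top of saving the plots. The function name `parse_args` and the help text are hypothetical and not taken from explain_model.py:

```python
import argparse


def parse_args(argv=None):
    """Parse command-line options for a hypothetical explain script."""
    parser = argparse.ArgumentParser(
        description="Create SHAP plots for the saved model."
    )
    # store_true: the value is False unless the flag is passed.
    parser.add_argument(
        "--show-plots",
        action="store_true",
        help="display the plots in addition to saving them",
    )
    return parser.parse_args(argv)


# argparse exposes "--show-plots" as the attribute "show_plots".
with_flag = parse_args(["--show-plots"]).show_plots
without_flag = parse_args([]).show_plots
```

Defaulting the flag to off lets the same script run unattended, for example inside a container without a display, where the plots are only written to disk.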
    

With docker

To run the scripts with docker, follow these steps:

  1. Execute create_model.py to create the GBT model and output it to /data/:
    docker run -it heartdisease_modeldev:latest python create_model.py
  2. Execute explain_model.py to output the SHAP plots to /images/:
    docker run -it heartdisease_modeldev:latest python explain_model.py
    

Description

The following steps were taken:

Exploratory data analysis in Azure Databricks with PySpark

Files: Databricks_workspace/eda.ipynb

Figure: Databricks workspace

Figure: Feature importances

Modelling and model tracking with MLflow

Modelling was performed with Logistic Regression, AutoML and Gradient-boosted Trees (GBT) models in Azure Databricks with PySpark, and the runs were tracked with MLflow. The model chosen for production was the one with the highest AUC on the test set: a GBT trained on the 9 features with the highest feature importances.

Files: Databricks_workspace/model.py
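
The selection rule described above (keep the most important features, then pick the run with the highest test-set AUC) can be sketched in plain Python. This is an illustration only: the importance values, run names and scores are made up, and `top_k_features`, `roc_auc` and `best_run` are hypothetical helpers, not code from the Databricks workspace.

```python
def top_k_features(importances, k):
    """Return the k feature names with the largest importance values."""
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def roc_auc(y_true, y_score):
    """AUC as the probability that a random positive outranks a random negative.

    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for y, s in zip(y_true, y_score) if y == 1]
    negatives = [s for y, s in zip(y_true, y_score) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def best_run(test_auc_by_run):
    """Return the name of the run with the highest test-set AUC."""
    return max(test_auc_by_run, key=test_auc_by_run.get)


# Made-up importances, only to show the ranking step (k=3 instead of 9).
importances = {"cp": 0.30, "thalach": 0.25, "age": 0.05, "oldpeak": 0.40}
selected = top_k_features(importances, k=3)

# Made-up labels and predicted probabilities for two hypothetical runs.
y_test = [1, 0, 1, 0]
scores = {
    "logistic_regression": [0.6, 0.7, 0.8, 0.2],
    "gbt_top_features": [0.9, 0.3, 0.8, 0.1],
}
aucs = {name: roc_auc(y_test, s) for name, s in scores.items()}
chosen = best_run(aucs)
```

The pairwise AUC here is quadratic in the number of samples and is meant for intuition; in Databricks the metric would normally come from the evaluation tooling and be read back from the tracked runs.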

Figure: Logistic Regression in MLflow

Figure: GBT in MLflow

Figure: Runs with different feature sets in MLflow

Figure: F1-score between different MLflow runs

Local GBT model creation

Files: create_model.py
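
A minimal sketch of the persistence step, assuming the trained model is a picklable Python object and each dataset is a non-empty list of row dictionaries. `save_artifacts`, the file names and the choice of `pickle` and `csv` are assumptions for illustration; create_model.py may serialize the model and data differently.

```python
import csv
import pickle
import tempfile
from pathlib import Path


def save_artifacts(model, datasets, out_dir):
    """Write the model as a pickle and each dataset as a CSV file in out_dir."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)

    model_file = out_path / "model.pkl"  # hypothetical file name
    with model_file.open("wb") as fh:
        pickle.dump(model, fh)

    written = [model_file]
    for name, rows in datasets.items():
        data_file = out_path / f"{name}.csv"
        with data_file.open("w", newline="") as fh:
            # Column names are taken from the first row of each dataset.
            writer = csv.DictWriter(fh, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)
        written.append(data_file)
    return written


# Usage with a placeholder object standing in for the trained GBT model.
with tempfile.TemporaryDirectory() as tmp:
    stand_in_model = {"kind": "placeholder"}
    files = save_artifacts(
        stand_in_model,
        {"train": [{"age": 52, "target": 0}], "test": [{"age": 61, "target": 1}]},
        Path(tmp) / "data",
    )
    saved_names = sorted(f.name for f in files)
```

Saving the train and test splits next to the model lets a later step, such as the explainability script, reload exactly the data the model was fitted and evaluated on.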

ML explainability with SHAP (Shapley values)

Files: explain_model.py
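
SHAP attributes a prediction to individual features using Shapley values. For intuition, the sketch below computes exact Shapley values for a toy three-feature scoring function by averaging each feature's marginal contribution over all feature orderings. The function `toy_model`, its coefficients and the baseline are invented for illustration and are not the GBT model; the SHAP library provides much faster algorithms than this brute-force enumeration.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values of predict(instance) relative to predict(baseline).

    Features that have not yet joined the coalition keep their baseline value.
    """
    features = list(instance)
    orderings = list(permutations(features))
    contributions = {name: 0.0 for name in features}
    for ordering in orderings:
        current = dict(baseline)
        previous_output = predict(current)
        for name in ordering:
            # Switch one feature from its baseline value to its actual value.
            current[name] = instance[name]
            output = predict(current)
            contributions[name] += output - previous_output
            previous_output = output
    return {name: total / len(orderings) for name, total in contributions.items()}


def toy_model(x):
    """Hypothetical risk score with one interaction term."""
    return 2.0 * x["age"] + 1.0 * x["chol"] + 4.0 * x["age"] * x["exang"]


baseline = {"age": 0.0, "chol": 0.0, "exang": 0.0}
instance = {"age": 1.0, "chol": 1.0, "exang": 1.0}
phi = shapley_values(toy_model, instance, baseline)
# phi == {"age": 4.0, "chol": 1.0, "exang": 2.0}
```

The interaction term is split evenly between `age` and `exang`, and the three values sum to the difference between the prediction for the instance (7.0) and for the baseline (0.0). This additivity is what lets a summary or bar plot be read as a decomposition of the model output.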

Figure: SHAP summary plot

Figure: SHAP bar plot

Containerization with Docker

Files: Dockerfile
