shahrukh2016 / cardiovascular_risk_prediction

Predict CHD Risk with Precision: This machine learning model analyzes patient demographics, behaviors, and medical factors to accurately predict the likelihood of developing coronary heart disease within the next 10 years.

Home Page: https://linktr.ee/shahrukh2016

Jupyter Notebook 100.00%
classification data-cleaning data-transformation exploratory-data-analysis f1-score feature-engineering imbalanced-dataset knn logistic-regression machine-learning model-evaluation model-selection naive-bayes-classifier precision preprocessing python random-forest recall svm xgboost

Cardiovascular Risk Prediction

This project aims to predict the 10-year risk of developing coronary heart disease (CHD) using the Cardiovascular Risk Prediction dataset. The dataset contains information on 3,390 individuals with 16 predictor variables and 1 target variable. The variables represent potential demographic, behavioral, and medical risk factors.

Problem Statement

The classification goal is to predict whether a patient has a 10-year risk of CHD. Given the dataset's attributes and patient information, we need to develop a machine learning model that accurately predicts the likelihood of developing CHD.

Summary

The Cardiovascular Risk Prediction dataset provides detailed information on patients' demographics, behaviors, and medical risk factors. In this project, we tackled the challenge of predicting the 10-year risk of developing coronary heart disease by implementing a machine learning pipeline.

Here is a summary of our approach and key steps:

  1. Data Gathering and Cleaning: We started by gathering the dataset, which contains over 4,000 records and 15 attributes. We performed data cleaning, handling null values, checking data distribution, and handling outliers to ensure data quality.

  2. Exploratory Data Analysis (EDA): In this phase, we conducted an in-depth analysis of the dataset. We employed various visualization techniques, separating them into univariate, bivariate, and multivariate categories. Through EDA, we gained meaningful insights that guided our decision-making for the subsequent steps.

  3. Feature Engineering and Preprocessing: We performed feature engineering to extract new features that could potentially impact the prediction of the 10-year CHD risk. Additionally, we addressed multicollinearity among independent variables using the Variance Inflation Factor (VIF). Categorical features were encoded into numerical values using binary label encoding.

  4. Data Transformation: To bring skewed features closer to a normal distribution, we applied transformations such as logarithmic, exponential, and square root. We used quantile-quantile (Q-Q) plots to assess how far each feature deviated from a normal distribution. We then scaled the data with the StandardScaler from the sklearn library.

  5. Handling Imbalanced Dataset: The distribution of the target variable, TenYearCHD, was found to be imbalanced, with only 15% of individuals classified as having a high risk of developing CHD. To address this issue, we employed the Synthetic Minority Oversampling Technique (SMOTE) to create a balanced dataset.

  6. Model Selection and Evaluation: We split the data into train and test sets, using stratified sampling so that both classes were represented in each set. We evaluated several classification models, including Logistic Regression, Random Forest, XGBoost, Naive Bayes, KNN, and SVM. We compared Precision, Recall, F1 Score, Accuracy, and AUC-ROC using classification reports and confusion matrices. Because a false negative means a high-risk patient goes unflagged, we prioritized Recall. After thorough experimentation, XGBoost emerged as the optimal model, achieving the highest metrics overall.

  7. Model Deployment: Based on our evaluations and comparisons, we selected XGBoost as our final model for deployment. This model, fine-tuned with a learning rate of 0.1, maximum tree depth of 5, and 350 trees, demonstrated superior Recall, Precision, F1 Score, Accuracy, and AUC-ROC scores.
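
The oversampling idea in step 5 can be illustrated with a minimal, from-scratch sketch of the interpolation behind SMOTE. This is not the imbalanced-learn implementation and may differ from what the notebook does; the function name `smote_sketch`, its parameters `k` and `seed`, and the toy data are all assumptions made for illustration.

```python
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    For each synthetic point: pick a minority sample, pick one of its k nearest
    minority neighbours, and take a random point on the segment joining them.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Toy data (made up): 17 low-risk rows and 3 high-risk rows, i.e. 15% positives.
majority = [(float(i), float(i % 5)) for i in range(17)]
minority = [(20.0, 1.0), (22.0, 2.0), (21.0, 4.0)]

# Generate enough synthetic minority rows to match the majority class size.
new_points = smote_sketch(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))  # prints: 17 17
```

Every synthetic row lies on a line segment between two real minority rows, so the new points stay inside the region the minority class already occupies. In practice, oversampling of this kind is usually applied to the training split only, so that the test set keeps its original class balance.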

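The emphasis on Recall in step 6 can be made concrete with a small worked example. The sketch below computes Precision, Recall, F1 Score, and Accuracy from confusion-matrix counts in plain Python. The labels are made up (100 patients, 15 positive, mirroring the 15% rate noted in step 5), and the helper names `confusion_counts` and `classification_metrics` are hypothetical, not taken from the repository.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 = high CHD risk."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Compute the four headline metrics; undefined ratios are reported as 0.0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(y_true)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Made-up ground truth: 15 high-risk patients followed by 85 low-risk patients.
y_true = [1] * 15 + [0] * 85

# Baseline that never flags anyone.
always_negative = [0] * 100

# Hypothetical model: catches 12 of the 15 positives and raises 10 false alarms.
model_pred = [1] * 12 + [0] * 3 + [1] * 10 + [0] * 75

baseline_metrics = classification_metrics(y_true, always_negative)
model_metrics = classification_metrics(y_true, model_pred)
print(baseline_metrics)
print(model_metrics)
```

On this toy data the always-negative baseline reaches 0.85 Accuracy while its Recall is 0, which is why Accuracy alone is a poor guide on an imbalanced target and why Recall is the metric to watch when missed high-risk patients are the costly error.
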
In conclusion, this project explored the Cardiovascular Risk Prediction dataset in depth and applied a range of machine learning techniques to predict the 10-year risk of developing coronary heart disease. By combining data processing, feature engineering, model evaluation, and model selection, we built a model capable of identifying patients who are likely to develop CHD.


For the complete project video explanation and to download the dataset: Click here

Feel free to explore the repository to gain further insights into the code implementation, methodology, and findings.

Connect with me on LinkedIn.

Happy Learning!

