Giter VIP home page Giter VIP logo

cad's Introduction

CAD Detection

Introduction

Coronary Heart Disease (CAD) was one of the most lethal disease in United States. In order to create an accurate model and analyze the remarkable risk factors, a data mining classification task that involes 5 different methods (logistic, LDA, QDA, Tree, SVM) is applied onn the Z-Alizadeh Sani Dataset.

Data Set Information

The Z-Alizadeh Sani Dataset (Z-Ali Dataset) is collected by Dr. Zahra Alizadeh Sani, the Associate Professor of Cardiology at Iran University in November 2017. The dataset has 303 observations and 56 variables, This dataset is restored in UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/index.php

Variable Selection

Generally, more complex model = more variance. According to correlation matrix, a lot of 55 predictors will provide invaluable or redundant information.

Correlation matrix for Z-Ali dataset(Top 20 with "Cath")

In order to findout valuable predictors, 200 Random Forests are performed and the top 8 predictors for each random forest are recorded.
Variable Frequency
Typical.Ches.Pain 200
Age 200
Atypical 200
EF.TTE 200
Region.RWMA 179
TG 106
FBS 104
HTN 100
BMI 88
Nonanginal 69
ESR 56
Tinversation 50
BP 48

At the end, we are able to show that the top 8 predictors in the frequency table are valuable in CAD detection. And those 8 variable will be used to build reduced models and obtaining 10-fold Cross Validation Error:

Performance with reduced model and full model

Classification Method 10-fold CV Error (Full) 10-fold CV Error (Reduced)
Logistic Regression 15.86% 13.12%
LDA 14.81% 14.46%
QDA N/A 15.75%
Decision Tree 22.46% 12.21% (Random Forest)
SVM (Radial Kernel) 13.23% 13.24%
SVM (Linear Kernel) 16.47% 13.24%
Average 16.57% 13.67%

Conclusions

10-fold cross validation are performed for both full and reduced model. From the results we can conclude that: By uisng the reduced model, the misclassification decrased about 3%.

By comparing the performance of 5 different classification methods, we also can make the following assumptions:

  • Logistic Regression is not sensitive to predictors which provide redundant information. Variable selection process can significantly improve the performance.

  • LDA has problem with more categorical predictors.

  • QDA is similar to LDA. But the number of parameters will become huge as having more predictors.

  • Decision Tree with Random Forest is the best classifier for this data set. Decision tree models nonlinear relationships and handles categorical predictors effectively.

  • SVM shows a strong ability to deal with redundant information. In general, SVMs are more focused on model regulation rather than feature selection. This is the reason why the error rate with SVM almost no change after variable selection.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.