This project applies several machine learning algorithms to the heart.csv dataset for classification and clustering tasks. The dataset contains the following columns:
- age: Age of the individual
- sex: Gender (0 for female, 1 for male)
- cp: Chest pain type
- trestbps: Resting blood pressure
- chol: Serum cholesterol level
- fbs: Fasting blood sugar > 120 mg/dl (1 for true, 0 for false)
- restecg: Resting electrocardiographic results
- thalach: Maximum heart rate achieved
- exang: Exercise-induced angina (1 for yes, 0 for no)
- oldpeak: ST depression induced by exercise relative to rest
- slope: Slope of the peak exercise ST segment
- ca: Number of major vessels colored by fluoroscopy
- thal: Thalassemia type
- target: Target variable (1 for presence of heart disease, 0 for absence)
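Before running any analysis, it can help to confirm that the CSV header matches the column list above. The following is a minimal sketch using only the Python standard library; `EXPECTED_COLUMNS` and `missing_columns` are hypothetical names introduced here for illustration, not part of the project.

```python
import csv
import io

# Column names as listed in this README; "target" is the label,
# the remaining 13 columns are features.
EXPECTED_COLUMNS = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal", "target",
]


def missing_columns(csv_file):
    """Return the expected columns that are absent from the CSV header row."""
    header = next(csv.reader(csv_file), [])  # empty file -> empty header
    present = {name.strip() for name in header}
    return [name for name in EXPECTED_COLUMNS if name not in present]


# Demo on an in-memory header line only (no data rows are invented here).
demo = io.StringIO(",".join(EXPECTED_COLUMNS[:-1]) + "\n")
print(missing_columns(demo))  # → ['target'], since the demo header omits it
```

To check the real file, open it with `open("heart.csv", newline="")` and pass the file object to `missing_columns`; an empty list means every expected column is present.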
The classification task predicts the presence or absence of heart disease from the given features. Algorithms that can be explored include (scikit-learn class names in parentheses where the original list gave them):
- Logistic Regression
- Decision Trees (DecisionTreeClassifier)
- Random Forest (RandomForestClassifier)
- Support Vector Machines
- K-Nearest Neighbors (KNN)
- Gaussian Naive Bayes (GaussianNB)
- AdaBoost (AdaBoostClassifier)
- Bagging (BaggingClassifier)
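One way to compare the classifiers listed above is cross-validated accuracy with scikit-learn. This is a sketch under stated assumptions, not the project's actual notebook code: `build_models` and `compare_classifiers` are hypothetical helpers, all hyperparameters are library defaults (apart from `max_iter` and the random seed), every column is treated as numeric, and feature scaling is added only for the models that are sensitive to it.

```python
from pathlib import Path

from sklearn.ensemble import (
    AdaBoostClassifier,
    BaggingClassifier,
    RandomForestClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_models(seed=0):
    """Hypothetical helper: one estimator per algorithm named in the list."""
    return {
        "Logistic Regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "Random Forest": RandomForestClassifier(random_state=seed),
        "SVM": make_pipeline(StandardScaler(), SVC()),
        "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
        "GaussianNB": GaussianNB(),
        "AdaBoost": AdaBoostClassifier(random_state=seed),
        "Bagging": BaggingClassifier(random_state=seed),
    }


def compare_classifiers(X, y, cv=5):
    """Return the mean cross-validated accuracy of each model."""
    return {
        name: float(cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean())
        for name, model in build_models().items()
    }


# Runs only when heart.csv is present in the working directory.
if Path("heart.csv").exists():
    import pandas as pd

    df = pd.read_csv("heart.csv")
    X, y = df.drop(columns="target"), df["target"]
    ranked = sorted(compare_classifiers(X, y).items(), key=lambda kv: -kv[1])
    for name, acc in ranked:
        print(f"{name:20s} {acc:.3f}")
```

Accuracy is only one possible metric; for a medical screening problem, recall or ROC AUC (passed through the `scoring` argument of `cross_val_score`) may be more informative.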
In the clustering task, we aim to group individuals based on similar characteristics. Some clustering algorithms to consider are:
- K-Means Clustering
- Hierarchical Clustering
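The two clustering approaches above could be run side by side as follows. This is a hedged sketch with scikit-learn: `cluster_both` is a hypothetical helper, `n_clusters=2` is an assumed default rather than a value taken from the project, and the silhouette score is one possible way to compare the groupings.

```python
from pathlib import Path

from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def cluster_both(X, n_clusters=2, seed=0):
    """Return {method: (labels, silhouette)} for K-Means and hierarchical clustering."""
    # Standardize so that large-valued columns do not dominate distances.
    X_scaled = StandardScaler().fit_transform(X)
    models = {
        "K-Means": KMeans(n_clusters=n_clusters, n_init=10, random_state=seed),
        "Hierarchical": AgglomerativeClustering(n_clusters=n_clusters),
    }
    results = {}
    for name, model in models.items():
        labels = model.fit_predict(X_scaled)
        results[name] = (labels, float(silhouette_score(X_scaled, labels)))
    return results


# Runs only when heart.csv is present in the working directory.
if Path("heart.csv").exists():
    import pandas as pd

    # Drop the label so the clustering stays unsupervised.
    features = pd.read_csv("heart.csv").drop(columns="target")
    for name, (labels, sil) in cluster_both(features).items():
        print(f"{name:14s} silhouette={sil:.3f}")
```

Trying several values of `n_clusters` and comparing silhouette scores is a common way to pick the number of groups.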
To run the project:
- Clone this repository.
- Install the required libraries and dependencies.
- Run the Jupyter notebooks or Python scripts to perform classification and clustering.
- Analyze the results and make improvements as needed.
The heart.csv dataset is the source of our data and can be found in the project folder.
Parsa Khavarinejad
This is a project for the data mining course at Tarbiat Modares University.