Aim: Comparative analysis of the efficiency of classification and clustering algorithms for diabetes detection
Steps
- Exploring Dataset
- Dropping columns that are not required
- Checking for null values
- Upscaling the dataset (in the case of Pima Indians)
- X and Y split
- Countplot for the datasets
- t-SNE plot for the datasets to see the spread
- Scaling the dataset
- Train and test split of the dataset
- Training classification models
- Training clustering models
- Comparing their performance
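
The preprocessing and training steps above can be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code. It makes several assumptions: the data is a synthetic table from `make_classification` rather than either diabetes dataset, "upscaling" is read as random oversampling of the minority class, scikit-learn's `GradientBoostingClassifier` stands in for XGBoost so that the sketch needs only scikit-learn, and the 80/20 split and `random_state` values are arbitrary.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.utils import resample

# Synthetic, imbalanced stand-in for a diabetes table (assumption: NOT the real data).
X, y = make_classification(
    n_samples=400, n_features=8, weights=[0.65, 0.35], random_state=0
)

# Upscaling, read here as random oversampling: draw minority rows with
# replacement until both classes have the same number of rows.
counts = np.bincount(y)
minority_idx = np.flatnonzero(y == counts.argmin())
extra = resample(
    minority_idx,
    replace=True,
    n_samples=int(counts.max() - counts.min()),
    random_state=0,
)
X_bal = np.vstack([X, X[extra]])
y_bal = np.concatenate([y, y[extra]])

# Scaling, then train/test split, in the order the steps are listed.
X_scaled = StandardScaler().fit_transform(X_bal)
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y_bal, test_size=0.2, stratify=y_bal, random_state=0
)

# Classification: a boosting model (stand-in for XGBoost) fitted on the training part.
clf = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
clf_acc = accuracy_score(y_test, clf.predict(X_test))

# Clustering: agglomerative clustering sees the features only, never the labels.
clusters = AgglomerativeClustering(n_clusters=2).fit_predict(X_scaled)

print("class counts after upscaling:", np.bincount(y_bal))
print("classifier test accuracy:", round(clf_acc, 3))
print("cluster sizes:", np.bincount(clusters))
```

Following the listed order means oversampling and scaling happen before the split, so copies of a training row can land in the test set. Resampling and fitting the scaler on the training portion only is a common way to avoid that.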
Conclusion
- Standard machine learning and boosting algorithms were deployed
- XGBoost gave the best performance on both datasets among the classification models
- Agglomerative clustering gave the best performance on both datasets among the clustering models
- Overall, XGBoost performed better, making classification models more efficient on these datasets
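
Comparing a clustering model with a classifier requires putting both on one scale, because cluster ids carry no class meaning. One possible approach, assumed here and not necessarily the one used in this project, is to map each cluster to its most common true label and then compute ordinary accuracy. The helper name `cluster_accuracy` and the toy labels below are hypothetical and serve only as illustration.

```python
from collections import Counter


def cluster_accuracy(y_true, cluster_ids):
    """Accuracy after mapping each cluster to its most common true label."""
    majority = {}
    for c in set(cluster_ids):
        members = [t for t, k in zip(y_true, cluster_ids) if k == c]
        majority[c] = Counter(members).most_common(1)[0][0]
    mapped = [majority[k] for k in cluster_ids]
    return sum(m == t for m, t in zip(mapped, y_true)) / len(y_true)


# Toy labels for illustration only (not results from either dataset).
y_true = [0, 0, 0, 1, 1, 1]
cluster_ids = [1, 1, 0, 0, 0, 0]

# Cluster 1 maps to class 0 and cluster 0 maps to class 1,
# so five of the six points are counted as correct.
print(cluster_accuracy(y_true, cluster_ids))
```

The resulting score can be set next to a classifier's test accuracy, with the caveat that the mapping itself uses the true labels, so it is a favourable reading of the clustering result.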