This project predicts the likelihood of diabetes in patients using a Random Forest Classifier. The dataset comes from the UCI Machine Learning Repository and contains diabetes-related features such as glucose level, blood pressure, skin thickness, insulin, BMI, pedigree function, and age.
Libraries used: pandas, numpy, seaborn, matplotlib, scikit-learn
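- Data Preprocessing: The dataset is preprocessed with label encoding and handling of zero values. For measurements such as glucose or BMI, a zero is not physiologically plausible and most likely marks a missing value.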
- Model Training: A Random Forest Classifier is trained on the preprocessed dataset.
- Evaluation: The model is evaluated with metrics including a confusion matrix and accuracy.
- Data Visualization: Seaborn and Matplotlib are used to plot the dataset distribution, the correlation matrix, and the confusion matrix, giving insight into both the data and the model's performance.
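
The preprocessing, training, and evaluation steps listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the column names (`Glucose`, `BloodPressure`, `BMI`, `Age`, `Outcome`), the helper functions, the median imputation of zeros, the 75/25 split, and the synthetic stand-in data are all assumptions made so that the example runs without the dataset file.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder


def make_synthetic_data(n_rows=300, seed=0):
    """Build a small synthetic stand-in table (NOT the real dataset)."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "Glucose": rng.normal(120, 30, n_rows).round(),
        "BloodPressure": rng.normal(70, 12, n_rows).round(),
        "BMI": rng.normal(32, 6, n_rows).round(1),
        "Age": rng.integers(21, 80, n_rows),
    })
    # Tie the label loosely to glucose so the model has a signal to learn.
    noisy = df["Glucose"] + rng.normal(0, 15, n_rows)
    df["Outcome"] = np.where(noisy > 130, "positive", "negative")
    # Inject zeros to mimic missing measurements recorded as 0.
    missing_rows = rng.choice(n_rows, size=20, replace=False)
    df.loc[missing_rows, "Glucose"] = 0
    return df


def preprocess(df, zero_as_missing=("Glucose", "BloodPressure", "BMI")):
    """Replace implausible zeros with the column median and encode the label.

    Median imputation is an assumption; the project may handle zeros differently.
    """
    df = df.copy()
    for col in zero_as_missing:
        df[col] = df[col].replace(0, np.nan)
        df[col] = df[col].fillna(df[col].median())
    encoder = LabelEncoder()
    df["Outcome"] = encoder.fit_transform(df["Outcome"])
    return df, encoder


def train_and_evaluate(df, seed=0):
    """Train a Random Forest and return the model, confusion matrix, accuracy."""
    X = df.drop(columns="Outcome")
    y = df["Outcome"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    return model, confusion_matrix(y_test, y_pred), accuracy_score(y_test, y_pred)


if __name__ == "__main__":
    clean_df, label_encoder = preprocess(make_synthetic_data())
    _, cm, acc = train_and_evaluate(clean_df)
    print("Classes:", list(label_encoder.classes_))
    print("Confusion matrix:\n", cm)
    print("Accuracy:", acc)
```

To run this against the real data, `make_synthetic_data()` would be replaced by loading the dataset with `pd.read_csv`, with the column names adjusted to match the file. For brevity the sketch computes the medians on the full table; computing them on the training split only would avoid leaking test-set information into preprocessing.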