This repo contains sample usage of widely used classification models. All models are tested on the well-known Wisconsin breast cancer dataset for ease of comparison. A discussion of each classification model is included at the end of its notebook. The following models can be found in the src folder:
- Naive Bayes
- Decision tree (C4.5 - https://github.com/michaeldorner/DecisionTrees)
- Logistic regression
- Random forest
- KNN (k-nearest neighbours)
- LVQ classifier (https://pypi.org/project/sklearn-lvq/)
- Support Vector classifier (svm.SVC)
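
As a rough illustration of the kind of comparison the notebooks make, the sketch below scores several of the listed model families on one dataset with 5-fold cross-validation. It is not the notebooks' code. It assumes scikit-learn is installed and uses scikit-learn's bundled Wisconsin Diagnostic dataset (`load_breast_cancer`), which differs from the Original dataset linked in the references. The scaling steps and hyperparameters are illustrative choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: the bundled Wisconsin Diagnostic dataset stands in for the
# dataset used in the notebooks.
X, y = load_breast_cancer(return_X_y=True)

# Illustrative settings only; distance- and margin-based models are scaled.
models = {
    "Naive Bayes": GaussianNB(),
    "Logistic regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "Support Vector classifier": make_pipeline(StandardScaler(), SVC()),
}

# Mean accuracy over 5 stratified folds for each model.
scores = {}
for name, model in models.items():
    scores[name] = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()

# Print the models from best to worst mean accuracy.
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
```

Using the same folds and the same metric for every model keeps the comparison like-for-like, which is the reason for testing all models on a single dataset.
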
The following models and boosting methods are planned for the future:
- Decision tree (ID3, CART)
- Extra Trees
- Multilayer perceptron (neural network)
- Linear Discriminant Analysis
- AdaBoost
- Gradient Boosting
- XGBoost / CatBoost / LightGBM
Future additions to the repo:
- Add a ROC curve for each model to elaborate on the results.
- Add hyperparameter tuning for all models. Currently, parameter tuning is incorporated for only a few.
- Improve the README with a summary of the results from the model comparison. (Graphs are available in the results folder.)
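
The first two planned additions could look roughly like the sketch below, which tunes a random forest with a small grid search and then computes ROC curve points and the AUC on a held-out split. This is a minimal sketch, not the repo's implementation. It assumes scikit-learn and its bundled Wisconsin Diagnostic dataset, and the parameter grid is a hypothetical example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import GridSearchCV, train_test_split

# Assumption: bundled Wisconsin Diagnostic dataset, 75/25 stratified split.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical grid; a real search would likely cover more values.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="roc_auc",
)
search.fit(X_train, y_train)

# Predicted probability of the positive class from the refit best model.
proba = search.predict_proba(X_test)[:, 1]

# Points of the ROC curve and the area under it on the held-out data.
fpr, tpr, thresholds = roc_curve(y_test, proba)
auc = roc_auc_score(y_test, proba)

print("Best parameters:", search.best_params_)
print(f"Test ROC AUC: {auc:.3f}")
```

The `fpr` and `tpr` arrays can be passed to any plotting library to draw the curve. Tuning on the training split only keeps the test-set AUC an honest estimate.
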
References:
- http://mlr.cs.umass.edu/ml/datasets/Breast+Cancer+Wisconsin+(Original)
- https://www.scikit-yb.org/en/latest/api/classifier/classification_report.html
- http://www.scikit-yb.org/en/latest/api/classifier/confusion_matrix.html
- https://towardsdatascience.com/hyperparameter-tuning-the-random-forest-in-python-using-scikit-learn-28d2aa77dd74