Credit Risk Analysis

Author

Lydia Delgado Uriarte

Overview

Evaluate machine learning models or algorithms to predict credit risk.

Results

Balanced Random Forest Classifier

Balanced accuracy score: 0.7877 -> 79%
Sensitivity/recall: 0.67. A low recall indicates a large number of false negatives.
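
For reference, the two metrics reported for each model can be written as follows for a binary problem with high risk as the positive class. This assumes the balanced accuracy score is the mean of the per-class recalls, which is how scikit-learn's balanced_accuracy_score defines it; TP, FN, TN and FP are the true positive, false negative, true negative and false positive counts.

```latex
\[
\text{recall} = \frac{TP}{TP + FN},
\qquad
\text{specificity} = \frac{TN}{TN + FP},
\qquad
\text{balanced accuracy} = \frac{\text{recall} + \text{specificity}}{2}
\]
```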

Confusion Matrix

Out of 87 actual high-risk cases, 58 were predicted to be high risk; these are the true positives.
Out of 87 actual high-risk cases, 29 were predicted to be low risk; these are the false negatives.
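
As a quick check, the recall can be recomputed from the two confusion matrix counts above. This is a minimal sketch; the function name recall and the variable brf_recall are illustrative and not taken from the project code.

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Fraction of actual positives that the model flagged as positive."""
    actual_positives = true_positives + false_negatives
    if actual_positives == 0:
        raise ValueError("recall is undefined when there are no actual positives")
    return true_positives / actual_positives


# Counts reported above for the Balanced Random Forest Classifier.
brf_recall = recall(true_positives=58, false_negatives=29)
print(round(brf_recall, 2))  # 0.67
```

The same function applied to the counts of the other models reproduces their reported sensitivity values.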

Imbalanced classification report

Easy Ensemble AdaBoost Classifier

Balanced accuracy score: 0.9254 -> 93%
Sensitivity/recall: 0.91. This is the highest recall of all the models, meaning that few actual high-risk cases are missed (few false negatives).

Confusion Matrix

Out of 87 actual high-risk cases, 79 were predicted to be high risk; these are the true positives.
Out of 87 actual high-risk cases, 8 were predicted to be low risk; these are the false negatives.

Imbalanced classification report

Naive Random Oversampling

Balanced accuracy score: 0.6533 -> 65%
Sensitivity/recall: 0.61. A low recall indicates a large number of false negatives.

Confusion Matrix

Out of 87 actual high-risk cases, 53 were predicted to be high risk; these are the true positives.
Out of 87 actual high-risk cases, 34 were predicted to be low risk; these are the false negatives.

Imbalanced classification report

SMOTE Oversampling

Balanced accuracy score: 0.6512 -> 65%
Sensitivity/recall: 0.62. A low recall indicates a large number of false negatives.

Confusion Matrix

Out of 87 actual high-risk cases, 54 were predicted to be high risk; these are the true positives.
Out of 87 actual high-risk cases, 33 were predicted to be low risk; these are the false negatives.

Imbalanced classification report

Undersampling ClusterCentroids

Balanced accuracy score: 0.5103 -> 51%
Sensitivity/recall: 0.64. A low recall indicates a large number of false negatives.

Confusion Matrix

Out of 87 actual high-risk cases, 56 were predicted to be high risk; these are the true positives.
Out of 87 actual high-risk cases, 31 were predicted to be low risk; these are the false negatives.

Imbalanced classification report

Combination (Over and Under) Sampling SMOTEENN

Balanced accuracy score: 0.6375 -> 64%
Sensitivity/recall: 0.70

Confusion Matrix

Out of 87 actual high-risk cases, 61 were predicted to be high risk; these are the true positives.
Out of 87 actual high-risk cases, 26 were predicted to be low risk; these are the false negatives.

Imbalanced classification report

Summary

Most of the models have a balanced accuracy between roughly 50% and 80%, and their sensitivity falls in a similar range. In this case, the sensitivity of each model is the metric to take into consideration.

Recommended Model

The best model is the Easy Ensemble AdaBoost Classifier because it has the highest sensitivity, and its balanced accuracy, which also matters for predictions, is the highest as well. Detecting potentially high-risk cases is the priority here: high sensitivity means that, among the people who actually have credit risk, most are identified correctly, so the problem can be treated right away.
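
The comparison behind this recommendation can be sketched in a few lines. The snippet below only restates the scores listed under Results; the names results, ranked and best_model are illustrative, and ranking by recall first with balanced accuracy as a tie-breaker is an assumption about how to weigh the two metrics, not a rule taken from the project.

```python
# (balanced accuracy, recall for the high-risk class) as reported under Results.
results = {
    "Balanced Random Forest Classifier": (0.7877, 0.67),
    "Easy Ensemble AdaBoost Classifier": (0.9254, 0.91),
    "Naive Random Oversampling": (0.6533, 0.61),
    "SMOTE Oversampling": (0.6512, 0.62),
    "Undersampling ClusterCentroids": (0.5103, 0.64),
    "Combination Sampling SMOTEENN": (0.6375, 0.70),
}

# Rank by recall first; balanced accuracy is only used to break ties.
ranked = sorted(
    results.items(),
    key=lambda item: (item[1][1], item[1][0]),
    reverse=True,
)

for name, (balanced_accuracy, recall) in ranked:
    print(f"{name}: recall={recall:.2f}, balanced accuracy={balanced_accuracy:.4f}")

best_model = ranked[0][0]
```

With the reported scores, the Easy Ensemble AdaBoost Classifier comes first under either metric, so the choice does not depend on the tie-breaking rule.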
