
anandr07 / loan-defaulters-prediction


Mitigate loan default risk with our predictive analysis model on GitHub. Leveraging customer data, our tool assists banks in identifying borrowers likely to repay loans in full, enhancing decision-making accuracy. Explore the code and resources for a transparent and informed approach to loan approvals, reducing default risks.

Home Page: http://customer-default-prediction.herokuapp.com/

Python 70.58% CSS 18.51% HTML 10.91%
flask heruko machine-learning-algorithms python


Bank Loan Defaulters Prediction

Business Problem:

The objective of the analysis is to predict whether a customer will default on a loan.

Summary: A bank accepts deposits from customers and, from the corpus thus available, lends to borrowers who want to carry out business activities for growth and profit. Loans are often not paid in full, and are instead charged off or written off. Common reasons include business failure, the company making losses, or the company becoming delinquent or bankrupt. The bank therefore has to identify the borrowers who can repay in full and avoid lending to borrowers who are likely to default. This model lets the bank analyse the relevant data before deciding whether to grant a loan, which reduces the bank's exposure to loan defaults.

Project Architecture:

Project_Architecture image

Data Cleaning

1. KNN imputation
2. Median imputation
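The README names two imputation strategies but does not say which columns each one was applied to. The sketch below applies both to the same data. It is a minimal illustration that assumes scikit-learn's `KNNImputer` and `SimpleImputer`, and it uses a small made-up frame whose column names are borrowed from the variables used later in this README.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer, SimpleImputer

# Made-up rows with one missing value per column (illustration only).
df = pd.DataFrame({
    "DisbursementGross": [60000.0, 40000.0, np.nan, 287000.0, 35000.0],
    "Term": [84.0, 60.0, 180.0, np.nan, 240.0],
    "NoEmp": [4.0, 2.0, 7.0, 2.0, np.nan],
})

# Option 1: KNN imputation fills a missing value with the mean of that column
# over the k nearest rows (distance computed on the non-missing features).
knn = KNNImputer(n_neighbors=2)
df_knn = pd.DataFrame(knn.fit_transform(df), columns=df.columns)

# Option 2: median imputation replaces a missing value with its column median,
# which is robust to the heavy right tails seen in this data.
med = SimpleImputer(strategy="median")
df_median = pd.DataFrame(med.fit_transform(df), columns=df.columns)

print(df_knn)
print(df_median)
```

This sketch computes KNN distances on raw values, so a large-valued column such as DisbursementGross dominates the neighbour search. Scaling the features before KNN imputation is usually advisable.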

Exploratory Data Analysis (EDA)

The Disbursement Gross plot shows that most loans have small disbursement amounts. It also shows that the chance of default decreases as the disbursement gross amount increases.

image

Existing businesses have a marginally higher chance of defaulting than new businesses.

image

5260 businesses have franchises, and businesses with franchises are less likely to default.

image

When no jobs are retained, defaults are very rare. Otherwise, the chance of default comes down as the number of retained jobs increases.

image

Urban businesses are more likely to default than rural businesses.

image

Loans covered under LowDoc are very unlikely to default.

image

Loans with terms of 0-5 and 30-40 months have a higher chance of default. Loans with terms of 5-30 months have a lower chance of default.

image

As the number of employees in a business increases, the chance of default decreases.

image

The chance of default is lowest when the number of jobs created is between 10 and 400, and highest when it is greater than 400.

image

Loans with a revolving line of credit have a lower chance of default than loans without one.

image
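Each EDA finding above compares the default rate across the groups of one variable. The pandas sketch below shows that calculation. It uses made-up rows that do not reproduce the README's charts, and it assumes `MIS_Status` has been encoded as 1 for a charged-off loan and 0 for a loan paid in full.

```python
import pandas as pd

# Made-up loans (illustration only). UrbanRural follows the SBA coding
# 1 = urban, 2 = rural. MIS_Status is assumed encoded as 1 = default, 0 = paid.
loans = pd.DataFrame({
    "UrbanRural": [1, 1, 1, 2, 2, 2, 1, 2],
    "LowDoc":     ["N", "N", "Y", "N", "Y", "N", "N", "Y"],
    "Term":       [3, 36, 84, 12, 120, 38, 24, 60],
    "MIS_Status": [1, 1, 0, 0, 0, 1, 0, 0],
})

# Default rate per category = share of loans in that category with MIS_Status == 1.
rates = {col: loans.groupby(col)["MIS_Status"].mean() for col in ["UrbanRural", "LowDoc"]}
for col, rate in rates.items():
    print(rate)

# A continuous variable such as Term (months) is binned first.
# These bin edges are illustrative, chosen to match the ranges discussed above.
term_bins = pd.cut(loans["Term"], bins=[0, 5, 30, 40, 300])
term_rate = loans.groupby(term_bins, observed=True)["MIS_Status"].mean()
print(term_rate)
```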

Among the SBA_Appv, GrAppv and DisbursementGross variables, DisbursementGross is the most informative. When a customer first comes to the bank with a loan request, the bank has no information about SBA_Appv. The bank therefore considers only the customer's requested amount for risk prediction.

GrAppv and SBA_Appv are therefore dropped from the data. This solves the multicollinearity problem and also removes the duplicated information.

The chargeoffprin column is also dropped, because this amount is not available when a customer applies to the bank for a loan.
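The sketch below shows the column drops in pandas, along with why the charged-off amount must go. It is a minimal illustration on made-up rows. It assumes the README's "chargeoffprin" is the `ChgOffPrinGr` column of the public SBA loan dataset, and that `MIS_Status` uses that dataset's "P I F" (paid in full) and "CHGOFF" (charged off) labels.

```python
import pandas as pd

# Made-up rows (illustration only); column names assumed from the public SBA dataset.
df = pd.DataFrame({
    "DisbursementGross": [60000.0, 40000.0, 287000.0, 35000.0],
    "GrAppv":            [60000.0, 40000.0, 287000.0, 35000.0],
    "SBA_Appv":          [48000.0, 20000.0, 215250.0, 28000.0],
    "ChgOffPrinGr":      [0.0, 0.0, 208959.0, 0.0],
    "MIS_Status":        ["P I F", "P I F", "CHGOFF", "P I F"],
})

# The charged-off principal is only non-zero for loans that were already charged off.
# Keeping it would leak the target, and it is unknown at application time anyway.
print(df.groupby("MIS_Status")["ChgOffPrinGr"].max())

# Drop the leaky column and the two approval amounts that duplicate DisbursementGross.
features = df.drop(columns=["GrAppv", "SBA_Appv", "ChgOffPrinGr"])
print(features.columns.tolist())
```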

Correlation Matrix

Heatmap of the correlation matrix after removing GrAppv, SBA_Appv and chargeoffprin: the multicollinearity problem is solved.

image
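Besides reading the heatmap, you can list the feature pairs whose absolute correlation exceeds a threshold. The sketch below does this on synthetic data and assumes an arbitrary threshold of 0.9. GrAppv and SBA_Appv are generated from DisbursementGross purely to reproduce the redundancy described above; their real relationship may differ.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
gross = rng.uniform(10_000, 500_000, n)
df = pd.DataFrame({
    "DisbursementGross": gross,
    "GrAppv": gross * rng.uniform(0.95, 1.0, n),  # near-copy of the disbursed amount
    "SBA_Appv": gross * 0.75,                     # fixed guaranteed share (assumption)
    "Term": rng.integers(1, 300, n),
    "NoEmp": rng.integers(0, 50, n),
})

def high_corr_pairs(frame, threshold=0.9):
    """Return column pairs whose absolute Pearson correlation exceeds threshold."""
    corr = frame.corr().abs()
    cols = corr.columns
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):  # upper triangle only, no self-pairs
            if corr.iloc[i, j] > threshold:
                pairs.append((cols[i], cols[j]))
    return pairs

before = high_corr_pairs(df)
after = high_corr_pairs(df.drop(columns=["GrAppv", "SBA_Appv"]))
print(before)
print(after)
```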

Pairplot

Used to find the overlap between the classes of the dependent variable and the relationships between variables.

image

Summary Statistics of Numerical Variables

Measuring the distribution of the data.

image

The variables are positively skewed. For CreateJob, 75% of the values are zero, so the importance of this variable for the output has to be checked. RetainedJob is heavily affected by outliers: its 75th percentile is 4, while its maximum is 9500, which is far larger.
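The statistics quoted above come from a table like pandas' `describe()`, and `skew()` quantifies the positive skew. The sketch below uses ten made-up values per column, chosen only to mimic the pattern described. It does not reproduce the project's actual table.

```python
import pandas as pd

# Made-up values with long right tails (illustration only).
df = pd.DataFrame({
    "NoEmp":       [1, 2, 2, 3, 4, 5, 8, 12, 20, 150],
    "CreateJob":   [0, 0, 0, 0, 0, 0, 0, 0, 3, 40],
    "RetainedJob": [0, 0, 1, 2, 2, 3, 4, 4, 10, 9500],
})

summary = df.describe()        # count, mean, std, min, 25%, 50%, 75%, max
skewness = df.skew()           # > 0 indicates a long right tail (positive skew)
share_zero = (df == 0).mean()  # fraction of rows equal to zero, per column

print(summary)
print(skewness)
print(share_zero)
```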

Box Plot

For visualisation of outliers

image

Based on the box plot, CreateJob and RetainedJob are more influenced by outliers than the NoEmp variable.

Considering this, CreateJob and RetainedJob are not used, and the total number of employees (NoEmp) is given priority for model prediction.
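By default, box-plot whiskers in matplotlib and seaborn extend 1.5 times the interquartile range (IQR) beyond the quartiles, and points outside them are drawn as outliers. The sketch below counts those points per column, giving a numeric version of the box-plot comparison. It reuses the same made-up values as a stand-in for the real data.

```python
import pandas as pd

def iqr_outlier_counts(frame, k=1.5):
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR] for each column."""
    q1 = frame.quantile(0.25)
    q3 = frame.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Comparing a DataFrame with a Series aligns the Series index on the columns.
    return ((frame < lower) | (frame > upper)).sum()

# Made-up values (illustration only).
df = pd.DataFrame({
    "NoEmp":       [1, 2, 2, 3, 4, 5, 8, 12, 20, 150],
    "CreateJob":   [0, 0, 0, 0, 0, 0, 0, 0, 3, 40],
    "RetainedJob": [0, 0, 1, 2, 2, 3, 4, 4, 10, 9500],
})

counts = iqr_outlier_counts(df)
print(counts)
```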

Model Building

Variables Taken for Model Building:
1. DisbursementGross
2. Term
3. NoEmp
4. NewExist
5. FranchiseCode
6. UrbanRural
7. LowDoc
8. RevLineCr
9. MIS_Status (Output Variable)

Count Plot of Output Variable

Visualisation of the class imbalance in the training data.

image

After the train-test split, the output variable is imbalanced in the training data. So, before model building, the imbalance in the training set is treated with an oversampling technique called SMOTE.
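SMOTE creates synthetic minority-class rows by interpolating between a minority sample and one of its nearest minority-class neighbours. The NumPy sketch below shows that core idea from scratch. It is not the project's code; in practice, the imbalanced-learn package's `SMOTE().fit_resample(X_train, y_train)` is the usual choice. The toy training set is made up. As in the README, oversampling is applied to the training data only, after the split.

```python
import numpy as np

def smote(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic rows from minority-class rows X_min (core SMOTE idea)."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    k = min(k, len(X_min) - 1)
    # Pairwise Euclidean distances within the minority class.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                # a point is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]  # k nearest minority neighbours per row
    base = rng.integers(0, len(X_min), n_new)  # randomly chosen minority samples
    nb = neighbours[base, rng.integers(0, k, n_new)]
    gap = rng.random((n_new, 1))               # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nb] - X_min[base])

# Toy imbalanced training set: 90 majority (0) rows and 10 minority (1) rows.
rng = np.random.default_rng(1)
X_train = np.vstack([rng.normal(0, 1, (90, 2)), rng.normal(3, 1, (10, 2))])
y_train = np.array([0] * 90 + [1] * 10)

# Add 80 synthetic minority rows so both classes end up with 90 rows.
X_new = smote(X_train[y_train == 1], n_new=80)
X_bal = np.vstack([X_train, X_new])
y_bal = np.concatenate([y_train, np.ones(len(X_new), dtype=int)])
print(np.bincount(y_bal))
```

Synthetic points always lie on line segments between existing minority points. SMOTE therefore fills in the minority region rather than duplicating rows, which is its advantage over simple random oversampling.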

Models Built

Distance-based machine learning algorithms need standardization because the independent variables are on different scales. The scaling issue is therefore fixed with standardization.

The models below were built after standardization:
1. Logistic Regression
2. KNN
3. SVM
4. Naive Bayes

Models built without standardization:
1. Decision Tree
2. Random Forest
3. XGB
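The split between scaled and unscaled models can be expressed with scikit-learn pipelines, as in the sketch below. Placing `StandardScaler` inside each pipeline means the scaler is fitted on the training data only. The data is synthetic, from `make_classification`, and one column is inflated to mimic a large-valued feature such as DisbursementGross. XGB is left out only so the sketch needs nothing beyond scikit-learn; as a tree model, it would go in the unscaled group.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic data; column 0 is inflated to mimic a large-scale amount column.
X, y = make_classification(n_samples=600, n_features=8, random_state=0)
X[:, 0] *= 100_000

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scale-sensitive models: the scaler is fitted inside the pipeline on training data only.
scaled_models = {
    "LogisticRegression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "NaiveBayes": make_pipeline(StandardScaler(), GaussianNB()),
}
# Tree-based models split on thresholds, so feature scale does not matter.
unscaled_models = {
    "DecisionTree": DecisionTreeClassifier(random_state=0),
    "RandomForest": RandomForestClassifier(random_state=0),
}

scores = {}
for name, model in {**scaled_models, **unscaled_models}.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # accuracy on held-out data
print(scores)
```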

Results:

Highest accuracy, achieved using XGB: 93.68%

Selected Model: XGB classifier with parameter tuning

XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1, importance_type='gain', interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=5, missing=nan, monotone_constraints='()', n_estimators=109, n_jobs=0, num_parallel_tree=1, random_state=0, reg_alpha=0, reg_lambda=1, scale_pos_weight=2, subsample=1, tree_method='exact', validate_parameters=1, verbosity=None)

Deployment Link:

http://customer-default-prediction.herokuapp.com/

(The link may have expired.)

image
