Giter VIP home page Giter VIP logo

credit-card-fraud-detection's Introduction

Credit Card Fraud Detection

This repository contains a Python script that implements various machine learning models to detect credit card fraud based on a dataset of anonymized credit card transactions.

The dataset used in this project can be found on Kaggle: Credit Card Fraud Detection Dataset

Overview

The script follows these steps:

  1. Import libraries
  2. Load the dataset
  3. Preprocess the data (handle missing values, duplicates, and outliers)
  4. Standardize numerical features
  5. Split the data into training and testing sets
  6. Handle imbalanced data using SMOTE
  7. Train multiple models (Logistic Regression, Random Forest, KNN, XGBoost, and LightGBM)
  8. Evaluate model performance using various metrics
  9. Perform hyperparameter tuning (optional)

Based on the evaluation metrics, the Random Forest model performs the best among the selected models.

Model Performance

The following table shows the performance of the models used in this project:

Model AUPRC Precision (class 1) Recall (class 1) F1-score (class 1) False Positives False Negatives
Logistic Regression 0.7322 0.80 0.64 0.71 16 28
Random Forest 0.8348 0.91 0.76 0.83 7 23
KNN 0.5461 0.24 0.83 0.37 209 13
XGBoost 0.7943 0.89 0.68 0.77 9 25
LightGBM 0.8159 0.88 0.73 0.80 10 21

Usage

To run the script, you will need JupyterNotebook and the following libraries installed:

  1. numpy
  2. pandas
  3. scikit-learn
  4. imbalanced-learn
  5. scipy
  6. xgboost
  7. lightgbm

Download the dataset from the source and place it in the same directory as the script. Then, simply run the script using Jupyternotebook.

Conclusion

Based on the evaluation metrics, the Random Forest model performs the best among the selected models for this imbalanced dataset. The reasoning behind this conclusion is as follows:

  1. AUPRC (Area Under the Precision-Recall Curve): Random Forest has the highest AUPRC (0.8348) among all models, indicating better overall performance in distinguishing between the classes when dealing with imbalanced data.
  2. Precision, Recall, and F1-score: Random Forest has the highest precision (0.91) for the positive class (fraud), which means it has the lowest false positive rate among the models. It also has a good recall (0.76), which means it can detect a considerable proportion of the actual fraud cases. The F1-score (0.83) for the positive class in Random Forest is also the highest, indicating a good balance between precision and recall.
  3. Confusion Matrix: The confusion matrix of the Random Forest model shows the smallest number of false positives (7) and a relatively low number of false negatives (23) compared to other models.

Although the accuracy is high for all the models, it is not a reliable metric in this case due to the highly imbalanced nature of the dataset. The other metrics mentioned above provide a better perspective on the model's performance.

Considering all these factors, the Random Forest model seems to be the best performer among the selected models for this imbalanced dataset.

credit-card-fraud-detection's People

Contributors

qvunguyen avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.