Ethereum Fraud Detection Visualization

Sepand Haghighi - Farzad Ramezani

September 2022

Overview

Fraud detection is the set of activities undertaken to detect and block attempts by fraudsters to obtain money or property through false means. Fraud is an expensive and complicated problem. To detect and investigate it effectively, you need to see connections between people, accounts, transactions, and dates, and understand complex sequences of events. That means analyzing a lot of data. Fraud detection is prevalent across banking, insurance, medical, government, and public sectors, as well as in law enforcement agencies.

Advantages of Visualizations in Fraud Detection:

The detection of fraud schemes requires an investigation of a vast amount of data that stems from many different anti-fraud systems with varying types of data. The auditors have to combine all the data and use statistical methods to uncover suspicious claims, which is time-consuming and inefficient in most cases.

Visualizations, on the other hand, make it quicker to identify relationships and significant structures and to detect suspicious patterns that may be hidden in the mass of data. Beyond visual exploration, interacting with the data gives a deeper understanding of how dependencies within the data change over time.

One of the most challenging tasks when using visualization for fraud detection is the sheer amount of data that is usually obtained by auditing systems. First, the auditor has to retrieve the data from the auditing system. Visualizing such a large amount of data is the next challenge: the data needs a meaningful arrangement to create a human-readable representation. Providing suitable styling should enable users to identify different types of entities and relations.

Because there are many different types of fraud schemes, no single solution can detect all of them. A visualization meant to fight fraud therefore has to adapt to the needs of each auditor.

First, it must not be limited to a specific amount or type of data, since the volume of the investigated data grows exponentially and comes from different sources. In some cases, it must also be able to support and visualize time-dependent data.

A sophisticated visualization should also provide the means for arranging the elements on the screen in multiple ways, for example with arrangements that reveal clusters or that highlight hierarchical structures. Additionally, more advanced graph analysis algorithms, such as cycle detection or shortest paths, should be supported for the detection of fraud schemes.
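As a rough illustration of the graph analyses mentioned above, the sketch below runs cycle detection and a shortest-path search on a small directed graph of transfers. The addresses, the sample edges, and the helper names `build_graph`, `find_cycle`, and `shortest_path` are hypothetical and are not taken from the notebooks; a real analysis would more likely rely on a dedicated graph library.

```python
from collections import defaultdict, deque


def build_graph(transfers):
    """Build a directed adjacency map from (sender, receiver) pairs."""
    graph = defaultdict(set)
    for sender, receiver in transfers:
        graph[sender].add(receiver)
    return graph


def find_cycle(graph):
    """Return one directed cycle as a list of nodes, or None if there is none."""
    unvisited, in_progress, done = 0, 1, 2
    state = defaultdict(int)  # every node starts as unvisited (0)
    path = []

    def visit(node):
        state[node] = in_progress
        path.append(node)
        for neighbor in sorted(graph.get(node, ())):
            if state[neighbor] == in_progress:
                # Back edge: the cycle is the tail of the current DFS path.
                return path[path.index(neighbor):] + [neighbor]
            if state[neighbor] == unvisited:
                cycle = visit(neighbor)
                if cycle:
                    return cycle
        path.pop()
        state[node] = done
        return None

    for start in sorted(graph):
        if state[start] == unvisited:
            cycle = visit(start)
            if cycle:
                return cycle
    return None


def shortest_path(graph, source, target):
    """Breadth-first search for a path with the fewest hops, or None."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in sorted(graph.get(node, ())):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


# Hypothetical transfers: funds loop A -> B -> C -> A, and C also pays D.
sample_transfers = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
sample_graph = build_graph(sample_transfers)
print(find_cycle(sample_graph))
print(shortest_path(sample_graph, "A", "D"))
```

The recursive depth-first search keeps the sketch short; on very large graphs an iterative version would avoid Python's recursion limit.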

Regarding the representation of the elements of the visualization, auditors should be able to customize the look and feel of the graph elements based on their needs and to display additional properties of those elements. Finally, interaction is one of the essential operations when visualizing fraud data, since it allows auditors to explore their dataset.

Fraud detection techniques fall into two broad groups: statistical data analysis and artificial intelligence.

Statistical data analysis techniques include:

calculating statistical parameters
regression analysis
probability distributions and models
data matching
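The first item in this list can be sketched with Python's standard `statistics` module. The example below computes the mean and sample standard deviation of a list of values and flags entries whose z-score exceeds a threshold. The sample values, the threshold of 2.5, and the `flag_outliers` helper are assumptions chosen for illustration and do not come from the datasets.

```python
import statistics


def flag_outliers(values, threshold=2.5):
    """Return the indices of values whose absolute z-score exceeds threshold."""
    mean = statistics.mean(values)
    spread = statistics.stdev(values)  # sample standard deviation
    if spread == 0:
        return []  # all values identical, nothing stands out
    return [
        index
        for index, value in enumerate(values)
        if abs(value - mean) / spread > threshold
    ]


# Hypothetical transaction values; the last one is far larger than the rest.
sample_values = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.3, 1.0, 0.9, 25.0]
print(flag_outliers(sample_values))
```

A z-score is only one simple parameter-based check; the outlier itself inflates the mean and standard deviation, so robust measures such as the median may work better on heavily skewed data.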

AI techniques used to detect fraud include:

Data mining classifies, groups and segments data to search through millions of transactions to find patterns and detect fraud.
Neural networks learn suspicious-looking patterns and use them to detect further instances.
Machine learning automatically identifies characteristics found in fraud.
Pattern recognition detects classes, clusters and patterns of suspicious behavior.

Cryptocurrency fraud analysts look at huge volumes of historical data spanning long time periods. Our main idea is to comprehensively examine and visualize the available data related to fraud detection in the Ethereum network.

Our suggested steps to visualize data:

Downloading and collecting data
Data cleaning
Data statistics and distribution
Comparing different features of data between fraud and non-fraud classes
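The steps above can be sketched with pandas as follows. This is a minimal outline under stated assumptions, not the code used in the notebooks: the column names (`address`, `flag`, `avg_val_received`, `total_transactions`), the inline sample data, and the helper functions are all hypothetical, and the cleaning choices (dropping duplicate addresses, filling gaps with the median) are just one reasonable option.

```python
import io

import pandas as pd

# Hypothetical sample standing in for a downloaded CSV file.
# flag = 1 marks a fraud case, flag = 0 a non-fraud case (assumed encoding).
RAW_CSV = """address,flag,avg_val_received,total_transactions
0xa1,0,1.5,10
0xa2,1,40.0,3
0xa2,1,40.0,3
0xa3,0,,25
0xa4,1,55.0,2
0xa5,0,2.5,40
"""


def load_and_clean(source):
    """Steps 1 and 2: read the data, drop duplicate addresses, fill gaps."""
    df = pd.read_csv(source)
    df = df.drop_duplicates(subset="address").copy()
    numeric_cols = df.select_dtypes(include="number").columns.drop("flag")
    # Replace missing numeric values with the column median.
    df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())
    return df.reset_index(drop=True)


def class_distribution(df):
    """Step 3: count the cases in each class."""
    return df["flag"].value_counts().sort_index()


def compare_by_class(df):
    """Step 4: mean of every numeric feature, per class."""
    return df.drop(columns="address").groupby("flag").mean()


clean = load_and_clean(io.StringIO(RAW_CSV))
print(class_distribution(clean))
print(compare_by_class(clean))
```

The resulting per-class table is the kind of summary that can then be passed to matplotlib or seaborn for plotting.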

Datasets

We will use two datasets in this report.

We will analyze these two datasets both individually and in combination.

	Number of Features	Total Cases	Fraud Cases	Non-Fraud Cases
Ethereum Fraud Detection Dataset	37	9816	2179	7637
Ethereum Fraud Dataset	31	12146	5150	6996
Merged Dataset	17	20302	5675	14627

Table1. Datasets Overview
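The class balance of each dataset follows directly from the counts in Table 1. The short sketch below recomputes the share of fraud cases per dataset and checks that the fraud and non-fraud counts add up to the total. The `DATASETS` dictionary and the `fraud_share` helper are illustrative names, not part of the repository.

```python
# (total cases, fraud cases, non-fraud cases), copied from Table 1.
DATASETS = {
    "Ethereum Fraud Detection Dataset": (9816, 2179, 7637),
    "Ethereum Fraud Dataset": (12146, 5150, 6996),
    "Merged Dataset": (20302, 5675, 14627),
}


def fraud_share(total, fraud, non_fraud):
    """Return the fraction of fraud cases after checking the counts agree."""
    if fraud + non_fraud != total:
        raise ValueError("fraud and non-fraud counts do not sum to the total")
    return fraud / total


for name, counts in DATASETS.items():
    print(f"{name}: {fraud_share(*counts):.1%} fraud")
```

All three datasets are imbalanced toward non-fraud cases, which is worth keeping in mind when comparing feature distributions between the two classes.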

Requirements

Python >= 3.5
pandas >= 0.24.2
matplotlib >= 3.0.3
seaborn >= 0.9.1
numpy >= 1.18.5
notebook >= 5.7.4

Run pip install -r requirements.txt or pip3 install -r requirements.txt

Notebooks

	GitHub Viewer	NB Viewer	Google Colab
Ethereum Fraud Detection Dataset	Link	Link	Link
Ethereum Fraud Dataset	Link	Link	Link
Merged Dataset	Link	Link	Link

Table2. Notebooks

Visualization Example

Only a few examples are shown here. The full visualizations and all the code are available in the notebooks.

Fig1. Data Distribution Pie Diagram

Fig2. Most Received Token Type Pie Diagram (Fraud Cases)

Fig3. Comparison of Received Transaction Statistics

Fig4. Features Correlation Diagram

Fig5. Features Distribution Diagram

Cite

If you use this repo in your work, please cite it using the following metadata:

Haghighi, S., & Ramezani, F. (2022). Ethereum Fraud Detection Models (Version 1.0) [Computer software]. https://github.com/sepandhaghighi/Ethereum-Fraud-Detection-Models

@software{Haghighi_Ethereum_Fraud_Detection_2022,
author = {Haghighi, Sepand and Ramezani, Farzad},
license = {MIT},
month = {10},
title = {{Ethereum Fraud Detection Models}},
url = {https://github.com/sepandhaghighi/Ethereum-Fraud-Detection-Models},
version = {1.0},
year = {2022}
}
