Elliptic++ Dataset: A Graph Network of Bitcoin Blockchain Transactions and Wallet Addresses

The Elliptic++ dataset consists of 203k Bitcoin transactions and 822k wallet addresses to enable both the detection of fraudulent transactions and the detection of illicit addresses (actors) in the Bitcoin network by leveraging graph data.

If you have any questions or create something with this dataset, please let us know by email: [email protected].

DATASET CAN BE FOUND HERE: Google Drive

Dataset Summary

The Elliptic++ dataset contains a transactions dataset and an actors (wallet addresses) dataset.

Elliptic++ Transactions Dataset:

# Nodes (transactions)            203,769
# Edges (money flow)              234,355
# Time steps                           49
# Illicit (class-1)                 4,545
# Licit (class-2)                  42,019
# Unknown (class-3)               157,205
# Features                            183
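The three class counts partition the transaction nodes: illicit + licit + unknown equals the total node count. The short check below uses numbers copied from the table (the helper name `summarize` is ours, not part of the dataset tooling). It confirms the partition and shows how small and imbalanced the labeled subset is: roughly 23% of transactions carry a label, and roughly 10% of those are illicit.

```python
# Class counts copied from the Elliptic++ Transactions Dataset table.
TX_COUNTS = {
    "illicit": 4_545,    # class 1
    "licit": 42_019,     # class 2
    "unknown": 157_205,  # class 3
}
TX_NODES = 203_769


def summarize(counts: dict, total: int) -> dict:
    """Check that the class counts partition the nodes and report label shares."""
    if sum(counts.values()) != total:
        raise ValueError("class counts do not sum to the node total")
    labeled = counts["illicit"] + counts["licit"]
    return {
        "labeled": labeled,
        "labeled_share": labeled / total,
        "illicit_share_of_labeled": counts["illicit"] / labeled,
    }


stats = summarize(TX_COUNTS, TX_NODES)
print(f"labeled: {stats['labeled']:,} ({stats['labeled_share']:.1%} of all transactions)")
print(f"illicit among labeled: {stats['illicit_share_of_labeled']:.1%}")
```

The same helper applies unchanged to the class counts of the actors dataset.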

Elliptic++ Actors (Wallet Addresses) Dataset:

# Wallet addresses                822,942
# Nodes (temporal interactions) 1,268,260
# Edges (addr-addr)             2,868,964
# Edges (addr-tx-addr)          1,314,241
# Time steps                           49
# Illicit (class-1)                14,266
# Licit (class-2)                 251,088
# Unknown (class-3)               557,588
# Features                             56
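The actors dataset offers two graph views: an address-address (actor interaction) graph and an address-transaction-address graph. To illustrate how they relate, the sketch below joins input-address-to-transaction edges with transaction-to-output-address edges on the transaction ID, turning every addr-tx-addr path into one input-to-output address pair. It assumes AddrTx_edgelist.csv has the header `input_address,txId` and TxAddr_edgelist.csv has `txId,output_address`; those column names are assumptions, the helper names are ours, and we have not verified that the provided AddrAddr_edgelist.csv was built exactly this way.

```python
import csv
import io
from collections import defaultdict


def read_pairs(fh, left: str, right: str):
    """Yield (left, right) value pairs from a CSV file that has a header row."""
    for row in csv.DictReader(fh):
        yield row[left], row[right]


def addr_addr_from_addr_tx(addr_tx_fh, tx_addr_fh) -> set:
    """Join input->tx and tx->output edges on txId to get input->output address pairs.

    Column names (input_address, txId, output_address) are assumed, not confirmed.
    """
    inputs_by_tx = defaultdict(set)
    for addr, tx in read_pairs(addr_tx_fh, "input_address", "txId"):
        inputs_by_tx[tx].add(addr)

    pairs = set()  # a set collapses pairs that recur across transactions
    for tx, out_addr in read_pairs(tx_addr_fh, "txId", "output_address"):
        for in_addr in inputs_by_tx.get(tx, ()):
            pairs.add((in_addr, out_addr))
    return pairs


# Tiny in-memory stand-ins for the real CSV files (made-up IDs).
addr_tx = io.StringIO("input_address,txId\nA,t1\nB,t1\nC,t2\n")
tx_addr = io.StringIO("txId,output_address\nt1,X\nt1,Y\nt2,A\n")
print(sorted(addr_addr_from_addr_tx(addr_tx, tx_addr)))
```

Keeping pairs in a set means an address pair that interacts in several transactions is counted once; drop the set (or keep the txId in each tuple) if per-transaction interactions are needed.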

Dataset Tutorials

We are sharing tutorial notebooks for users and researchers to explore, study, and learn from. The tutorial notebooks are available for both datasets and cover dataset statistics, graph visualization, model training and classification, case analysis, and feature refinement.

Transactions dataset statistics: overall transactions data statistics.

Actors dataset statistics: overall actors data statistics.

Transactions graph visualization: visualizations of the Money Flow Transaction graph (tx-tx graph).

Actors graph visualization (Actor Interaction): visualizations of the Actor Interaction graph (addr-addr graph).

Actors graph visualization (Address-Transaction): visualizations of the Address-Transaction graph (addr-tx-addr graph).

Transactions classification: model training and classification on the transactions data.

Actors classification: model training and classification on the actors data.

Transactions case analysis: analysis of unique cases (EASY, HARD, AVERAGE) using the transactions data.

Transactions feature analysis: feature importance analysis of the transactions data.

Actors feature analysis: feature importance analysis of the actors data.
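Both datasets span 49 time steps, so a natural setup for the classification experiments is a temporal split: train on earlier time steps and test on later ones, dropping unknown (class-3) nodes because they carry no label. The sketch below assumes a cutoff of time step 34 (steps 1 to 34 for training, 35 to 49 for testing), a split commonly used with the original Elliptic data; it is a parameter, so set it to whatever the classification notebooks actually use. The record layout `(node_id, time_step, class_label)` and the function name are hypothetical.

```python
from typing import Iterable, List, Tuple

# Hypothetical record layout: (node_id, time_step, class_label), with labels as in
# the dataset summary: 1 = illicit, 2 = licit, 3 = unknown.
Record = Tuple[str, int, int]


def temporal_split(
    records: Iterable[Record], last_train_step: int = 34
) -> Tuple[List[Record], List[Record]]:
    """Split labeled records into train (time step <= cutoff) and test (later steps).

    Unknown (class-3) records are dropped because they have no ground truth.
    """
    train: List[Record] = []
    test: List[Record] = []
    for rec in records:
        _, step, label = rec
        if label == 3:
            continue
        (train if step <= last_train_step else test).append(rec)
    return train, test


sample = [("t1", 1, 1), ("t2", 10, 2), ("t3", 34, 3), ("t4", 35, 2), ("t5", 49, 1)]
train, test = temporal_split(sample)
print([r[0] for r in train], [r[0] for r in test])
```

Splitting by time rather than at random keeps later transactions out of training, which better reflects detecting fraud in new activity.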

Top-Level Directory Organization

The folder structure of this dataset repository is as follows:

.
├── Transactions Dataset                                    # Contains csv files and tutorial notebooks for the Elliptic++ Transactions Dataset
│   ├── txs_features.csv                                    # Feature data for all transactions
│   ├── txs_classes.csv                                     # Class data for all transactions
│   ├── txs_edgelist.csv                                    # Transaction-Transaction graph edgelist
│   ├── Elliptic++ Transactions Dataset Statistics.ipynb    # Tutorial notebook: dataset statistics
│   ├── Elliptic++ Transactions Graph Visualization.ipynb   # Tutorial notebook: transaction-transaction graph visualization
│   ├── Elliptic++ Transactions Classification.ipynb        # Tutorial notebook: model training and classification
│   ├── Elliptic++ Transactions Case Analysis.ipynb         # Tutorial notebook: unique case (EASY, HARD, AVERAGE) analysis
│   └── Elliptic++ Transactions Feature Analysis.ipynb      # Tutorial notebook: feature importance analysis
├── Actors Dataset                                          # Contains csv files and tutorial notebooks for the Elliptic++ Actors Dataset
│   ├── wallets_features.csv                                # Feature data for all actors
│   ├── wallets_classes.csv                                 # Class data for all actors
│   ├── AddrAddr_edgelist.csv                               # Address-Address graph edgelist
│   ├── AddrTx_edgelist.csv                                 # Address-Transaction graph edgelist
│   ├── TxAddr_edgelist.csv                                 # Transaction-Address graph edgelist
│   ├── Elliptic++ Actors Dataset Statistics.ipynb          # Tutorial notebook: dataset statistics
│   ├── Elliptic++ Actors ActorInteraction Graph Viz.ipynb  # Tutorial notebook: address-address graph visualization
│   ├── Elliptic++ Actors AddrTx Graph Viz.ipynb            # Tutorial notebook: address-transaction-address graph visualization
│   ├── Elliptic++ Actors Classification.ipynb              # Tutorial notebook: model training and classification
│   └── Elliptic++ Actors Feature Analysis.ipynb            # Tutorial notebook: feature importance analysis
└── README.md
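A minimal loading sketch for the transactions files, using only the Python standard library. It assumes txs_classes.csv has the header `txId,class` and txs_edgelist.csv has `txId1,txId2`; those names are assumptions, so check the actual headers before use. As a simple neighborhood feature it counts, for each transaction, how many neighbors are labeled illicit, treating the money-flow edges as undirected even though they have a direction. The helper names are ours.

```python
import csv
import io
from collections import defaultdict


def load_classes(fh) -> dict:
    """Map txId -> class label (1 illicit, 2 licit, 3 unknown); header names are assumed."""
    return {row["txId"]: int(row["class"]) for row in csv.DictReader(fh)}


def load_edges(fh) -> dict:
    """Build an undirected adjacency list from a txId1,txId2 edgelist (assumed headers)."""
    adj = defaultdict(set)
    for row in csv.DictReader(fh):
        a, b = row["txId1"], row["txId2"]
        adj[a].add(b)
        adj[b].add(a)
    return adj


def illicit_neighbor_counts(adj: dict, classes: dict) -> dict:
    """For every transaction with at least one edge, count neighbors labeled illicit (class 1)."""
    return {tx: sum(classes.get(n) == 1 for n in nbrs) for tx, nbrs in adj.items()}


# Tiny in-memory stand-ins for txs_classes.csv and txs_edgelist.csv (made-up IDs).
classes = load_classes(io.StringIO("txId,class\n1,1\n2,2\n3,3\n4,1\n"))
adj = load_edges(io.StringIO("txId1,txId2\n1,2\n2,3\n4,2\n"))
print(illicit_neighbor_counts(adj, classes))
```

With the real files, pass a handle such as `open("Transactions Dataset/txs_classes.csv", newline="")` in place of each `io.StringIO` object.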

Citation

If you use our dataset in your work, please cite our paper.

Youssef Elmougy and Ling Liu. 2023. Demystifying Fraudulent Transactions and Illicit Nodes in the Bitcoin Network for Financial Forensics. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’23), August 6–10, 2023, Long Beach, CA, USA. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3580305.3599803

For a longer version of the paper, please refer to our arXiv paper (arXiv:2306.06108).

@article{elmougy2023demystifying,
  title={Demystifying Fraudulent Transactions and Illicit Nodes in the Bitcoin Network for Financial Forensics},
  author={Elmougy, Youssef and Liu, Ling},
  journal={arXiv preprint arXiv:2306.06108},
  year={2023}
}

Acknowledgement

Released by: Youssef Elmougy, Ling Liu

School of Computer Science, Georgia Institute of Technology
