Giter VIP home page Giter VIP logo

abbo-tools's Introduction

ABBO Toolbox

This toolbox has been implemented as part of the project Analyse und Bekämpfung von bandenmäßigem Betrug im Onlinehandel (ABBO). The project was funded by the German Federal Ministry of Education and Research and aimed to impede the success of organized fraud targeting online retailers.

Installation

Ubuntu 12.04 LTS:

The ABBO toolbox uses Cython. Therefore, you first need to install:

sudo apt-get install python-dev g++
pip install cython

Once you have installed these packages, you can compile the Cython code and finally install the ABBO toolbox with

python setup.py build_ext -i
python setup.py install

Usage

In the following, we present some examples of the usage of the toolbox. An overview of all available modules can be displayed with abbo_cli.py --help. Furthermore, a demo of the tool can be found in the demo directory.

Data Generation

The tool allows creating artificial toy data which can be used to examine attacks on the pseudonymization method. In particular, the data is derived from several marginal distributions (mostly German name distributions) stored in abbo-tools/modules/generate/data.

The following command creates a random dataset with 100 entries:

abbo_cli generate -c 50 -n 100 example.json

This will create 100 orders from 50 different customers and store them in JSON format. Note that in this example, multiple orders can be assigned to the same customer. In case you require unique assignments, you can use the --unique_customers option.

Pseudonymization

The data set created in the previous step can now be pseudonymized. To do so, we use the pseudonymization module of the toolbox. For instance, we can pseudonymize the data using Bloom filters with a length of 4000 Bits (-m 4000):

abbo_cli pseudonymize -m 4000 -d colored -n 3 -k 3 -t keyed -e KEY
example.json example.dat

In this example, the orders are first decomposed using a colored 3-gram decomposition (-d colored and -n 3). Subsequently, the resulting elements are inserted into the Bloom filters using 3 keyed hash functions (-k 3 and -t keyed) where the used key is defined by the -e option.

A full list of available options can be displayed with abbo_cli.py pseudonymize --help.

Hardening Bloom filters

We can further improve the pseudonymization strength of the Bloom filter by applying the mechanisms implemented in the hardening module. In particular, it is possible to add noise to the filters or merge them to decrease the probability of a de-pseudonymization.

abbo_cli hardening -l 3 -n 50 example.dat harden.dat

This will merge groups of three filters (-l 3) and add fifty percent noise (-n 50) to the resulting filter.

Converting to LIBSVM format

The toolbox also provides a module to convert data into the LIBSVM format. This can be helpful in order to apply common machine learning tools like LIBSVM or LIBLINEAR.

abbo_cli convert example.dat example.libsvm

Fraud Prediction

Finally, the toolbox allows predicting fraud using a linear SVM model. Please note that the provided model is for demonstration purposes only since it has been trained on toy data. However, the model can easily be replaced by a model trained on actual data. An example on how to train and use a (simple) custom model can be found in the demo directory.

abbo-tools's People

Contributors

darp avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.