Code & Data for "Tabular Transformers for Modeling Multivariate Time Series" (ICASSP, 2021)

Home Page: https://arxiv.org/abs/2011.01843

License: Apache License 2.0

Tabular Transformers for Modeling Multivariate Time Series

This repository provides the PyTorch source code and data for tabular transformers (TabFormer). Details are described in the paper Tabular Transformers for Modeling Multivariate Time Series, presented at ICASSP 2021.

Summary

  • Modules for hierarchical transformers for tabular data
  • A synthetic credit card transaction dataset
  • Modified Adaptive Softmax for handling masking
  • Modified DataCollatorForLanguageModeling for tabular data
  • The modules are built within transformers from HuggingFace 🤗. (HuggingFace is ❤️)

Requirements

  • Python (3.7)
  • PyTorch (1.6.0)
  • HuggingFace Transformers (3.2.0)
  • scikit-learn (0.23.2)
  • Pandas (1.1.2)

(X) denotes the version on which the code was tested.

These can be installed from the provided YAML environment file by running:

conda env create -f setup.yml
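
After creating the environment, it can be useful to confirm that the installed packages match the tested versions listed above. The following is a minimal sketch, not part of the repository: the function name compare_versions and the TESTED table are illustrative, and the distribution names (for example torch for PyTorch) are assumptions about how the packages are registered. It relies on importlib.metadata, which requires Python 3.8 or later; on Python 3.7 the importlib_metadata backport would be needed instead.

```python
from importlib import metadata  # Python 3.8+; on 3.7 use the importlib_metadata backport

# Tested versions copied from the Requirements list.
# The distribution names on the left are assumptions.
TESTED = {
    "torch": "1.6.0",
    "transformers": "3.2.0",
    "scikit-learn": "0.23.2",
    "pandas": "1.1.2",
}


def compare_versions(tested, lookup=metadata.version):
    """Map each package to (tested version, installed version or None, exact match?)."""
    report = {}
    for name, wanted in tested.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed in this environment
        report[name] = (wanted, found, found == wanted)
    return report


# Print one line per package so mismatches are easy to spot.
for name, (wanted, found, ok) in compare_versions(TESTED).items():
    status = "ok" if ok else "differs"
    print(f"{name}: tested {wanted}, installed {found} ({status})")
```

A mismatch does not necessarily mean the code will fail; it only indicates that the environment differs from the one the code was tested on.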

Credit Card Transaction Dataset

The synthetic credit card transaction dataset is provided in ./data/credit_card. It contains 24M records with 12 fields. You need git-lfs to access the data. If you run into issues related to LFS bandwidth, you can use this direct link to access the data instead. In that case, you can skip the git-lfs files by prefixing the git clone .. command with GIT_LFS_SKIP_SMUDGE=1.
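
With 24M records, it can help to inspect the field names and a few rows before loading the whole file. The sketch below is an illustration only, assuming the data has been extracted to a comma-separated file with a header row; the helper name peek_csv and any file path passed to it are hypothetical and not taken from the repository.

```python
import csv
from itertools import islice


def peek_csv(path, n_rows=5):
    """Return (header, first n_rows data rows) without reading the whole file."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)               # first line is assumed to be the header
        rows = list(islice(reader, n_rows))  # stop after n_rows records
    return header, rows
```

Calling peek_csv on the extracted file and checking len(header) is a quick way to confirm the 12 fields mentioned above. Because Pandas is already a requirement, pandas.read_csv with its nrows argument is an equivalent option for the same check.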


PRSA Dataset

For the PRSA dataset, download the data from Kaggle and place the files in the ./data/card directory.


Tabular BERT

To train a tabular BERT model on the credit card transaction or PRSA dataset, run:

$ python main.py --do_train --mlm --field_ce --lm_type bert \
                 --field_hs 64 --data_type [prsa/card] \
                 --output_dir [output_dir]

Tabular GPT2

To train a tabular GPT2 model on credit card transactions for a particular user-id, run:

$ python main.py --do_train --lm_type gpt2 --field_ce --flatten --data_type card \
                 --data_root [path_to_data] --user_ids [user-id] \
                 --output_dir [output_dir]
    

Description of some options (more can be found in args.py):

  • --data_type: choices are prsa and card, for the Beijing PM2.5 dataset and the credit card transaction dataset respectively.
  • --mlm: use masked language modeling; this is the transformer trainer option for BERT.
  • --field_hs: hidden size of the field-level transformer.
  • --lm_type: choices are bert and gpt2.
  • --user_ids: select only transactions from particular user ids.
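
To make the option descriptions above concrete, here is a minimal argparse sketch that accepts the flags used in the two training commands. It is not the repository's args.py: the function name build_parser is hypothetical, and the types, defaults, and the choice to let --user_ids take one or more values are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Illustrative parser for the flags shown in this README (not the real args.py)."""
    parser = argparse.ArgumentParser(description="TabFormer option sketch")
    # Boolean switches: present means True, absent means False.
    parser.add_argument("--do_train", action="store_true")
    parser.add_argument("--mlm", action="store_true")
    parser.add_argument("--field_ce", action="store_true")
    parser.add_argument("--flatten", action="store_true")
    # Options restricted to the choices named in the README.
    parser.add_argument("--lm_type", choices=["bert", "gpt2"], default="bert")
    parser.add_argument("--data_type", choices=["prsa", "card"], default="card")
    # Value options; types and defaults here are assumptions.
    parser.add_argument("--field_hs", type=int, default=64)
    parser.add_argument("--data_root", type=str, default=None)
    parser.add_argument("--user_ids", nargs="+", default=None)
    parser.add_argument("--output_dir", type=str, default=None)
    return parser
```

Parsing the tabular BERT command line with this sketch yields lm_type "bert", field_hs 64 as an integer, and mlm set to True, while flatten stays False because that flag only appears in the GPT2 command.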

Citation

@inproceedings{padhi2021tabular,
  title={Tabular transformers for modeling multivariate time series},
  author={Padhi, Inkit and Schiff, Yair and Melnyk, Igor and Rigotti, Mattia and Mroueh, Youssef and Dognin, Pierre and Ross, Jerret and Nair, Ravi and Altman, Erik},
  booktitle={ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={3565--3569},
  year={2021},
  organization={IEEE},
  url={https://ieeexplore.ieee.org/document/9414142}
}
