
Sobolev Independence Criterion

PyTorch source code for the paper

Mroueh, Sercu, Rigotti, Padhi, dos Santos, "Sobolev Independence Criterion", NeurIPS 2019 [arXiv:1910.14212] [NeurIPS 2019 Proceedings]

Requirements

  • Python 3.6 or above
  • PyTorch 1.1.0
  • Torchvision 0.3.0
  • Scikit-learn 0.21
  • Pandas 0.25 (for CCLE dataset)

These can be installed using pip by running:

pip install -r requirements.txt

Usage

We walk through an example of feature selection on one of the toy datasets studied in Zhang et al., arXiv:1606.07892 (see Sections 5.1 and 5.2), which we call SinExp.

  • Baseline models:

    • To train an elastic net (one of the implemented baseline models) on 250 samples from SinExp, run:
      python run_baselines.py --model en --dataset sinexp --numSamples 250 --do-hrt
    • Analogously, to train a random forest on 250 samples from SinExp, run:
      python run_baselines.py --model rf --dataset sinexp --numSamples 250 --do-hrt
      The flag --do-hrt tells the script to use the Holdout Randomization Test (HRT) by Tansey et al., arXiv:1811.00645, to rank the features by importance and control the False Discovery Rate (FDR).
  • Multi-layer neural network regression with Sobolev penalty: to train a multi-layer neural network that regresses the responses y on the inputs X subject to a gradient (Sobolev) penalty, again on 250 samples from SinExp, run:

    python run_sic_supervised.py --dataset sinexp --numSamples 250 --do-hrt
  • Sobolev Independence Criterion: to train a multi-layer discriminator network using the Sobolev Independence Criterion (SIC) between the responses y and the inputs X on 250 samples from SinExp, run:

    python run_sic.py --dataset sinexp --numSamples 250 --do-hrt
  • The results can be plotted with the script plot_results.py, which generates the following figure:

    Figure: Visualization of the results of the previous commands. We plot the True Positive Rate (TPR, i.e. power) and the False Discovery Rate (FDR) for the three algorithms, indicating when the FDR is controlled with HRT. Higher is better for TPR (blue bars) and lower is better for FDR (red bars). The red horizontal dashed line marks an FDR of 10%, the target FDR used for HRT. In this case, SIC combined with HRT (bars on the right) achieves the highest TPR while keeping the FDR low.
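
The supervised regression and the SIC commands both penalize the gradient of a network with respect to its inputs. The sketch below illustrates two such penalties in NumPy for a hypothetical one-hidden-layer tanh network; `mlp_forward`, `input_gradients` and `sobolev_penalties` are illustrative names, not part of this repository. It computes the plain Sobolev penalty E||grad_x f||^2 and a sparsity-inducing variant (sum_j ||d_j f||)^2. The latter equals the minimum over the probability simplex of sum_j ||d_j f||^2 / eta_j, attained at eta_j proportional to ||d_j f||, so eta can be read as a per-feature importance score. This assumes penalties of this form estimated by sample averages; the exact objectives and constants used by run_sic_supervised.py and run_sic.py may differ. In PyTorch the input gradient would normally come from torch.autograd.grad(..., create_graph=True); here it is written out by hand so the sketch runs with NumPy alone.

```python
import numpy as np


def mlp_forward(X, W1, b1, w2):
    """Hypothetical one-hidden-layer network f(x) = w2 . tanh(W1 x + b1)."""
    return np.tanh(X @ W1.T + b1) @ w2


def input_gradients(X, W1, b1, w2):
    """Gradient of f with respect to each input feature, shape (n, d)."""
    h = np.tanh(X @ W1.T + b1)               # hidden activations, (n, m)
    return ((1.0 - h ** 2) * w2) @ W1        # chain rule through tanh: (n, m) @ (m, d)


def sobolev_penalties(X, W1, b1, w2):
    """Return E||grad f||^2, (sum_j ||d_j f||)^2 and the importance weights eta."""
    grads = input_gradients(X, W1, b1, w2)
    per_feature = np.sqrt(np.mean(grads ** 2, axis=0))   # ||d_j f|| over the sample
    plain = np.mean(np.sum(grads ** 2, axis=1))          # plain Sobolev penalty
    sparse = per_feature.sum() ** 2                      # sparsity-inducing variant
    # (sum_j a_j)^2 = min over the simplex of sum_j a_j^2 / eta_j,
    # attained at eta_j = a_j / sum_k a_k.
    eta = per_feature / per_feature.sum()
    return plain, sparse, eta


rng = np.random.default_rng(0)
d, m = 5, 16
W1 = rng.normal(size=(m, d))
W1[:, 2:] = 0.0                  # features 2, 3 and 4 cannot influence f
b1, w2 = rng.normal(size=m), rng.normal(size=m)
X = rng.normal(size=(256, d))
plain, sparse, eta = sobolev_penalties(X, W1, b1, w2)
print(np.round(eta, 3))          # eta is exactly zero for features 2, 3 and 4
```

Because the squared sum of per-feature norms is never smaller than the plain penalty, it pushes whole input coordinates toward zero gradient, which is what makes eta usable for feature selection.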

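The --do-hrt flag and the TPR/FDR figure rest on three steps: compute one p-value per feature on held-out data, select features at a target FDR (10% in the figure), and score the selection against the known relevant features. The following is a minimal NumPy sketch of those steps under stated simplifications, not the repository's implementation. The function names are hypothetical, the test statistic is held-out mean squared error, the Benjamini-Hochberg procedure is used as a standard way to turn p-values into an FDR-controlled selection, and column j is resampled by random permutation rather than drawn from a fitted model of X_j given the other features, as the HRT of Tansey et al. prescribes. Permutation is only a valid null when feature j is independent of the rest, as in the toy data below.

```python
import numpy as np


def hrt_pvalues(predict, X_hold, y_hold, n_resamples=100, seed=0):
    """One randomization p-value per feature, computed on a held-out split.

    SIMPLIFICATION: column j is resampled by permutation instead of being drawn
    from a fitted model of X_j given X_-j, as the HRT prescribes. This is only
    a valid null when feature j is independent of the other features.
    """
    rng = np.random.default_rng(seed)
    base_loss = np.mean((y_hold - predict(X_hold)) ** 2)
    pvals = np.empty(X_hold.shape[1])
    for j in range(X_hold.shape[1]):
        n_not_worse = 0
        for _ in range(n_resamples):
            X_null = X_hold.copy()
            X_null[:, j] = rng.permutation(X_hold[:, j])
            null_loss = np.mean((y_hold - predict(X_null)) ** 2)
            # A null draw that fits at least as well is evidence against feature j.
            n_not_worse += int(null_loss <= base_loss)
        pvals[j] = (1 + n_not_worse) / (1 + n_resamples)
    return pvals


def benjamini_hochberg(pvals, q=0.1):
    """Sorted indices selected by the Benjamini-Hochberg procedure at target FDR q."""
    pvals = np.asarray(pvals)
    d = len(pvals)
    order = np.argsort(pvals)
    passed = np.nonzero(pvals[order] <= q * np.arange(1, d + 1) / d)[0]
    if passed.size == 0:
        return []
    # Reject every hypothesis up to the largest rank that passes its threshold.
    return sorted(order[: passed.max() + 1].tolist())


def tpr_fdr(selected, true_support):
    """True positive rate (power) and false discovery proportion of a selection."""
    selected, true_support = set(selected), set(true_support)
    true_pos = len(selected & true_support)
    tpr = true_pos / len(true_support) if true_support else 0.0
    fdr = (len(selected) - true_pos) / len(selected) if selected else 0.0
    return tpr, fdr


# Toy usage: only features 0 and 1 influence y.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=400)
X_train, y_train, X_hold, y_hold = X[:200], y[:200], X[200:], y[200:]
coef, *_ = np.linalg.lstsq(np.c_[X_train, np.ones(200)], y_train, rcond=None)


def predict(Z):
    """Least-squares model (with intercept) fitted on the training split."""
    return np.c_[Z, np.ones(len(Z))] @ coef


pvals = hrt_pvalues(predict, X_hold, y_hold)
selected = benjamini_hochberg(pvals, q=0.1)
print(selected, tpr_fdr(selected, true_support=[0, 1]))
```

Counting the null draws whose loss is no worse than the observed loss, and adding one to both numerator and denominator, gives a valid randomization p-value that is never exactly zero; the smallest attainable value is 1 / (n_resamples + 1), so n_resamples must be large enough for that value to clear the Benjamini-Hochberg thresholds.
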
Citation

Youssef Mroueh, Tom Sercu, Mattia Rigotti, Inkit Padhi, Cicero Dos Santos, "Sobolev Independence Criterion", NeurIPS, 2019 [arXiv] [NeurIPS Proceedings]

@incollection{NIPS2019_9147,
title = {Sobolev Independence Criterion},
author = {Mroueh, Youssef and Sercu, Tom and Rigotti, Mattia and Padhi, Inkit and Nogueira dos Santos, Cicero},
booktitle = {Advances in Neural Information Processing Systems 32},
editor = {H. Wallach and H. Larochelle and A. Beygelzimer and F. d\textquotesingle Alch\'{e}-Buc and E. Fox and R. Garnett},
pages = {9505--9515},
year = {2019},
publisher = {Curran Associates, Inc.},
url = {http://papers.nips.cc/paper/9147-sobolev-independence-criterion.pdf}
}
