
rbig's Introduction

Rotation-Based Iterative Gaussianization

A method that provides a transformation scheme from any multi-dimensional distribution to a Gaussian distribution. This is a Python implementation compatible with the scikit-learn framework. For the MATLAB version, please see this repository.
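A minimal usage sketch is shown below; the import path and constructor arguments are assumptions mirroring the usage quoted in the issues further down, so check the demo notebooks for the exact API.

import numpy as np
from rbig import RBIG  # assumed import path

rng = np.random.default_rng(123)
X = rng.gamma(2.0, size=(1_000, 2))  # toy non-Gaussian data

# scikit-learn style fit/transform (arguments assumed)
rbig_model = RBIG(n_layers=100, rotation_type='pca', random_state=123)
Z = rbig_model.fit_transform(X)      # Z should be approximately N(0, I)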

Abstract from Paper

Most signal processing problems involve the challenging task of multidimensional probability density function (PDF) estimation. In this work, we propose a solution to this problem by using a family of Rotation-based Iterative Gaussianization (RBIG) transforms. The general framework consists of the sequential application of a univariate marginal Gaussianization transform followed by an orthonormal transform. The proposed procedure looks for differentiable transforms to a known PDF so that the unknown PDF can be estimated at any point of the original domain. In particular, we aim at a zero mean unit covariance Gaussian for convenience. RBIG is formally similar to classical iterative Projection Pursuit (PP) algorithms. However, we show that, unlike in PP methods, the particular class of rotations used has no special qualitative relevance in this context, since looking for interestingness is not a critical issue for PDF estimation. The key difference is that our approach focuses on the univariate part (marginal Gaussianization) of the problem rather than on the multivariate part (rotation). This difference implies that one may select the most convenient rotation suited to each practical application. The differentiability, invertibility and convergence of RBIG are theoretically and experimentally analyzed. Relation to other methods, such as Radial Gaussianization (RG), one-class support vector domain description (SVDD), and deep neural networks (DNN) is also pointed out. The practical performance of RBIG is successfully illustrated in a number of multidimensional problems such as image synthesis, classification, denoising, and multi-information estimation.
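To make the framework concrete, here is an illustrative numpy sketch of a single iteration (a univariate marginal Gaussianization followed by an orthonormal rotation). It is a toy illustration of the idea in the abstract, not the package's implementation.

import numpy as np
from scipy import stats

def rbig_layer(X, rng):
    n, d = X.shape
    # 1) Marginal Gaussianization: push each dimension through its
    #    empirical CDF, then through the inverse Gaussian CDF.
    U = (np.argsort(np.argsort(X, axis=0), axis=0) + 0.5) / n
    Z = stats.norm.ppf(U)
    # 2) Rotation: here a random orthonormal matrix; PCA or ICA
    #    rotations are the other common choices.
    Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
    return Z @ Q

rng = np.random.default_rng(0)
X = rng.gamma(2.0, size=(1_000, 2))  # non-Gaussian toy data
for _ in range(20):                  # iterate layers toward N(0, I)
    X = rbig_layer(X, rng)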




Installation Instructions

pip

We can install it directly using pip:

pip install "git+https://github.com/ipl-uv/rbig.git"

git

Use this route if you want to contribute:

  1. Make sure miniconda is installed.

  2. Clone the git repository.

    git clone https://github.com/ipl-uv/rbig.git
  3. Create a new environment from the .yml file and activate it.

    conda env create -f environment.yml
    conda activate [package]

Demo Notebooks

RBIG Demo

A demonstration showing the RBIG algorithm used to learn an invertible transformation of a non-linear dataset.

RBIG Walk-Through

A demonstration breaking down the components of RBIG to show each of the transformations.

Information Theory

A notebook showing how one can estimate information-theoretic measures such as entropy, total correlation, and mutual information using RBIG.
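A minimal sketch of a mutual-information estimate; the MutualInfoRBIG name, max_layers argument, and mutual_info() method are taken from usage quoted in the issues below and should be treated as assumptions, with the notebook as the reference.

import numpy as np
from rbig import MutualInfoRBIG  # assumed import path

rng = np.random.default_rng(0)
X = rng.normal(size=(2_000, 2))
Y = X + 0.5 * rng.normal(size=(2_000, 2))  # Y is a noisy copy of X

mi_model = MutualInfoRBIG(max_layers=1_000)
mi_model.fit(X, Y)
print(mi_model.mutual_info())  # units (bits vs. nats) depend on the implementation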

Acknowledgements

This work was supported by the European Research Council (ERC) Synergy Grant “Understanding and Modelling the Earth System with Machine Learning (USMILE)” under Grant Agreement No 855187.

rbig's People

Contributors

jejjohnson, mattclifford1, miguelangelft, valerolaparra


rbig's Issues

NaN after trained rbig transform

First of all, thanks for the paper and code, it's very inspiring.

Specifically, I generated 100 classes of vectors (each class contains 300 samples). These vectors follow multivariate Student-t distributions with 100 class-dependent means and variances. I then split the 100 classes into 80 for training and 20 for inference. Finally, I use RBIG as follows:

from rbig import RBIG  # assumed import path

# RBIG hyperparameters
n_layers = 1000
rotation_type = 'pca'
random_state = 123
zero_tolerance = 10  # I also tried 60

# Initialize the RBIG model
rbig_model = RBIG(n_layers=n_layers, rotation_type=rotation_type,
                  random_state=random_state, zero_tolerance=zero_tolerance)

train_dataset_rbig = rbig_model.fit_transform(train_dataset)
test_dataset_rbig = rbig_model.transform(test_dataset)

train_dataset_rbig comes out fine, but NaNs appear in test_dataset_rbig.

Am I using RBIG incorrectly, or is something else going on?

Thanks!

Fix inverse to be perfect

Right now the inverse is not exact: you don't get the same input back, i.e. x' = f^-1(f(x)) but x' ≠ x.

For instance, you can see it in this notebook.

This is probably related to #6.
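A quick way to quantify the mismatch is a round-trip residual check; this sketch assumes a scikit-learn-style inverse_transform method and the same assumed import path as above.

import numpy as np
from rbig import RBIG  # assumed import path

rng = np.random.default_rng(0)
X = rng.gamma(2.0, size=(500, 2))

model = RBIG(n_layers=50, rotation_type='pca', random_state=0)
Z = model.fit_transform(X)
X_rec = model.inverse_transform(Z)  # assumed inverse method
print(np.abs(X - X_rec).max())      # should be ~0 for a perfect inverse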

Slim Down IT Estimates

We can save a lot of memory by not saving the parameters when estimating information theory measures.

  • Total Correlation
  • Entropy

RBIG for KL Divergence

Hey,
Is the code for estimating KL divergence working?
Right now the code appears to be commented out.

rbig._src.kld.RBIGKLD

Information theory metrics calculation issue

Hi Team,
Thanks for addressing the issue of density estimation for multidimensional data.
I have a few questions as I am trying to implement information theory metrics:

  • Q1. Is this method apt for high-dimensional tabular data?
  • Q2. I have been running RBIG's mutual_info() over tabular data and the results are exactly the same for all targets. I checked the results using scikit-learn's MI score and got varying results (results not normalized in either case). I don't understand the error; can you help me with this?

Below is the piece of code I used:

import numpy as np
import pandas as pd
from rbig import MutualInfoRBIG  # assumed import path

# X: features (attributes not in Y)
# Y: set of y attributes (attributes not in X), e.g. y1, y2, y3, y4
def calculate_miscore_xa(data, X, Y):
    mis_xy = []
    y_attributes = []
    for y in Y:
        rbig_model = MutualInfoRBIG(max_layers=10000)
        rbig_model.fit(data[X], data[[y]])
        mi_rbig = rbig_model.mutual_info() * np.log(2)
        mis_xy.append(mi_rbig)
        y_attributes.append(y)
    return pd.DataFrame({'Y': y_attributes, 'I(Xi,Y)': mis_xy})

Basically, the result I am getting is
I(X,y1) = I(X,y2) = I(X,y3) = I(X,y4), i.e. all exactly the same.
That is unusual, so I checked against https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mutual_info_score.html, and there the results for I(X,y1), I(X,y2), I(X,y3), and I(X,y4) all differ.
Can you help me understand if there is anything I am doing wrong?
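For reference, here is a sketch of the scikit-learn side of such a comparison. Note that mutual_info_regression reports one kNN-based estimate I(x_i; y) per feature, whereas RBIG estimates the joint I(X; Y), so the two numbers are not directly comparable.

import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = X[:, 0] + 0.1 * rng.normal(size=500)  # y driven by the first feature

print(mutual_info_regression(X, y, random_state=0))  # first entry should dominate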

Also, can the entropy-based calculation implemented in the information theory notebook be used as a base for tabular data by substituting the respective X and Y in 2D format?

Thanks and Regards
Surbhi

Probability estimation takes extremely long

Something in the probability estimation makes it take far too long.

I trained a model on 50k samples with 3k features and tried to compute the probability for a batch of 100; even a single sample takes more time than training did.

Add other projects.

There are other projects that need to be added to the documentation:

  • RBIG 4 IT
  • RBIG 4 EO
    • Paper
    • Repo

Add Demonstration Notebooks

Need some demo notebooks for the documentation.


Demo Notebooks

  • Univariate Transformations
  • Full Gaussianization
  • Stopping Criteria
  • Sampling
  • Information Theory Measures
    • Less-Memory Version

Hide default params in notebooks.

In the notebooks, there is no consistency about which params are left at their defaults and which are set explicitly. Need to ensure they are all consistent.

Params so far:

  • zero_tolerance=60 - the number of iterations before convergence is declared (example).
  • pdf_extension=10 - the extension of the support (example).

Simple Plotting Functionality

Need to incorporate some simple plotting functionality for the 2D examples; Seaborn is good for this (see the sketch after the list below).

  • 1D Histograms
  • 2D Joint Plots
    • KDE
    • Scatter
  • GIFs for RBIG Layers
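A minimal sketch of the 1D histogram and 2D joint plots with seaborn, on toy correlated data:

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], size=1_000)

sns.histplot(X[:, 0], kde=True)                      # 1D histogram
sns.jointplot(x=X[:, 0], y=X[:, 1], kind="kde")      # 2D joint plot (KDE)
sns.jointplot(x=X[:, 0], y=X[:, 1], kind="scatter")  # 2D joint plot (scatter)
plt.show()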

Test Univariate Transformations

Need to test the univariate transformations (an invertibility-test sketch follows the lists below).

Tests

  • Input Output Shapes
  • Invertibility (small residuals)
  • Compound transforms (MG)

Transforms

  • Univariate Histogram
  • Inverse CDF
  • KDE Transforms
    • Exact
    • FFT
    • KNN
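A sketch of what the shape and invertibility tests could look like for a univariate histogram Gaussianization; hist_gaussianize and hist_invert are illustrative stand-ins written for this example, not the package's transform classes.

import numpy as np
from scipy import stats

def hist_gaussianize(x, x_ref):
    # Empirical CDF (plotting positions), then inverse Gaussian CDF.
    xs = np.sort(x_ref)
    pos = (np.arange(xs.size) + 0.5) / xs.size
    u = np.clip(np.interp(x, xs, pos), 1e-6, 1 - 1e-6)
    return stats.norm.ppf(u)

def hist_invert(z, x_ref):
    # Gaussian CDF, then back through the empirical quantile function.
    xs = np.sort(x_ref)
    pos = (np.arange(xs.size) + 0.5) / xs.size
    return np.interp(stats.norm.cdf(z), pos, xs)

def test_invertibility():
    rng = np.random.default_rng(123)
    x = rng.gamma(2.0, size=1_000)          # non-Gaussian input
    z = hist_gaussianize(x, x)
    assert z.shape == x.shape               # input/output shapes match
    x_rec = hist_invert(z, x)
    assert np.abs(x - x_rec).max() < 1e-6   # small residuals only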
