Giter VIP home page Giter VIP logo

syanga / pycit Goto Github PK

View Code? Open in Web Editor NEW
20.0 1.0 5.0 125 KB

(Conditional) Independence testing & Markov blanket feature selection using k-NN mutual information and conditional mutual information estimators. Supports continuous, discrete, and mixed data, as well as multiprocessing.

License: MIT License

Python 100.00%
mutual-information-estimators independence-testing knn-estimator discrete-continuous-mixtures conditional-independence

pycit's Introduction

pycit

Framework for independence testing and conditional independence testing, with multiprocessing. Currently uses mutual information (MI) and conditional mutual information (CMI) as test statistics, estimated using k-NN methods. Also supports a routine for Markov blanket feature selection. Reports permutation-based p-values.

Installation

pip install pycit

Available Test Statistic Estimators

Mutual Information Estimators

  • ksg_mi: k-NN estimator for continuous data
  • bi_ksg_mi: "bias-improved" k-NN estimator for continuous data
  • mixed_mi: k-NN estimator for discrete-continuous mixtures

Conditional Mutual Information Estimators

  • ksg_cmi: k-NN estimator for continuous data
  • bi_ksg_cmi: "bias-improved" k-NN estimator for continuous data
  • mixed_cmi: k-NN estimator for discrete-continuous mixtures

Note: Also includes a differential entropy estimator: kl_entropy.

Example Usage

Independence Testing

from pycit import itest

# Test whether or not x and y are independent
pval = itest(x, y, test_args={'statistic': 'ksg_mi', 'n_jobs': 2})
is_independent = (pval >= 1.- confidence_level)

Conditional Independence Testing

from pycit import citest

# Test whether or not x and y are conditionally independent given z
pval = citest(x, y, z, test_args={'statistic': 'ksg_mi', 'n_jobs': 2})
is_conditionally_independent = (pval >= 1.- confidence_level)

Markov Blanket Feature Selection

from pycit.markov_blanket import MarkovBlanket

# specify CI test configuration
cit_funcs = {
    'it_args': {
        'test_args': {
            'statistic': 'ksg_mi',
            'n_jobs': 2
        }
    },
    'cit_args': {
        'test_args': {
            'statistic': 'ksg_cmi',
            'n_jobs': 2
        }
    }
}

# find Markov blanket of Y. x_data contains data from predictor variables, X_1,...,X_m
mb = MarkovBlanket(x_data, y_data, cit_funcs)
markov_blanket = mb.find_markov_blanket()

Dependencies:

  • numpy
  • scipy
  • scikit-learn

References:

  • Kozachenko, L. and Leonenko, N. (1987). Sample estimate of the entropy of a random vector. Problemy Peredachi Informatsii, 23(2):9–16.
  • Kraskov, A., Stögbauer, H., and Grassberger, P. (2004). Estimating mutual information. Physical Review E, 69(6):066138.
  • Frenzel, S. and Pompe, B. (2007). Partial mutual information for coupling analysis of multivariate time series. Physical Review Letters, 99(20):204101.
  • Gao, W., Kannan, S., Oh, S., and Viswanath, P. (2017). Estimating mutual information for discrete-continuous mixtures. In NIPS'2017.
  • Gao, W., Oh, S., and Viswanath, P. (2018). Demystifying fixed k-nearest neighbor information estimators. IEEE Transactions on Information Theory, 64(8):5629–5661.
  • Runge, J. (2018). Conditional independence testing based on a nearest-neighbor estimator of conditional mutual information. In AISTATS'18.
  • Yang, A., Ghassami, A., Raginsky, M., Kiyavash, N., and Rosenbaum, E. (2020). Model-Augmented Estimation of Conditional Mutual Information for Feature Selection. In UAI'2020.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.