MIQUBO Method of Feature Selection

This demo illustrates the MIQUBO method by finding an optimal feature set for predicting the survival of Titanic passengers. It uses the records in the file formatted_titanic.csv, a feature-engineered version of a public database of passenger information recorded by the ship's crew. In addition to a column showing whether each passenger survived, the file contains information on gender, title, class, port of embarkation, etc. The output is a ranking of feature subsets that have high mutual information (MI) with the variable of interest (survival) and low redundancy.

Usage

python titanic.py

Code Overview

Statistical and machine-learning models use a set of input variables (features) to predict output variables of interest. Feature selection, which can be part of the model design process, simplifies the model and reduces dimensionality by selecting, from a given set of potential features, a subset of highly informative ones. One statistical criterion that can guide this selection is mutual information (MI).
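As a rough illustration of the MI criterion, the sketch below estimates I(X;Y) in bits for two discrete columns from their joint frequency table. The function name `mutual_information` and the toy columns are assumptions made for this example; they are not taken from titanic.py.

```python
import numpy as np

def mutual_information(x, y):
    """Estimate I(X;Y) in bits for two discrete sequences of equal length."""
    # Map each distinct value to an integer index.
    _, xi = np.unique(np.asarray(x), return_inverse=True)
    _, yi = np.unique(np.asarray(y), return_inverse=True)
    xi, yi = xi.ravel(), yi.ravel()

    # Joint count table, normalized to a joint probability table.
    joint = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(joint, (xi, yi), 1)
    joint /= joint.sum()

    # Marginals, kept 2-D so that px @ py is the independence table.
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)

    # Sum p(x,y) * log2(p(x,y) / (p(x) p(y))) over non-empty cells.
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (px @ py)[nz])))

# Toy data (made up for illustration, not from formatted_titanic.csv).
survived = [1, 1, 0, 0, 1, 0, 1, 0]
coin = [0, 1, 0, 1, 1, 0, 0, 1]

print(mutual_information(survived, survived))
print(mutual_information(coin, survived))
```

A column that duplicates a balanced binary target carries one full bit of information about it, while the `coin` column here is statistically independent of the target and carries none.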

Ideally, to select the k most relevant features, you might maximize I(Xs;Y), the MI between a set of k features, Xs, and the variable of interest, Y. This is a hard calculation because the number of states grows exponentially with k.
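To make the exponential growth concrete, the following sketch counts the cells in the joint probability table needed to estimate the MI between k features and the target directly. It assumes, purely for illustration, that every feature has the same number of discrete levels; `joint_table_cells` is a hypothetical helper.

```python
def joint_table_cells(k, feature_levels=2, target_levels=2):
    """Cells in the joint table of k discrete features plus the target."""
    return (feature_levels ** k) * target_levels

# Even for binary features the table quickly outgrows any realistic dataset.
for k in (2, 5, 10, 20):
    print(k, joint_table_cells(k))
```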

The Mutual Information QUBO (MIQUBO) method of feature selection formulates a quadratic unconstrained binary optimization (QUBO) problem based on an approximation of I(Xs;Y) and submits it to a D-Wave quantum computer for solution.

Code Specifics

MIQUBO

There are different methods of approximating the hard calculation of optimally selecting k of n features to maximize MI. The approach followed here assumes conditional independence of features and limits conditional MI calculations to permutations of three features. The optimal set of features, S, is then approximated by:

$$
\arg\max_{S} \sum_{i \in S} \left\{ I(X_i; Y) + \sum_{j \in S,\, j \neq i} I(X_j; Y \mid X_i) \right\}
$$

The left-hand component, I(Xi;Y), represents MI between the variable of interest and a particular feature; maximizing it selects features that best predict the variable of interest. The right-hand component, I(Xj;Y|Xi), represents conditional MI between the variable of interest and a feature given the prior selection of another feature; maximizing it selects features that complement information about the variable of interest rather than provide redundant information.
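The two components can be sketched with entropy identities. The helpers below (`entropy`, `mi`, `cmi`, `subset_score`) are hypothetical names for this illustration and assume integer-coded discrete columns; the toy data is constructed so that the target is the XOR of two features, with a third feature duplicating the first.

```python
import numpy as np
from itertools import permutations

def entropy(*columns):
    """Joint Shannon entropy in bits of one or more discrete columns."""
    rows = np.column_stack(columns)
    _, counts = np.unique(rows, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def mi(x, y):
    # I(X;Y) = H(X) + H(Y) - H(X,Y)
    return entropy(x) + entropy(y) - entropy(x, y)

def cmi(x, y, z):
    # I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)

def subset_score(features, y, subset):
    """Sum of I(Xi;Y) over the subset plus I(Xj;Y|Xi) over ordered pairs i != j."""
    relevance = sum(mi(features[i], y) for i in subset)
    complement = sum(
        cmi(features[j], y, features[i]) for i, j in permutations(subset, 2)
    )
    return relevance + complement

# Toy data: y = a XOR b, and c is a redundant copy of a.
a = np.array([0, 0, 1, 1, 0, 0, 1, 1])
b = np.array([0, 1, 0, 1, 0, 1, 0, 1])
c = a.copy()
y = a ^ b
features = [a, b, c]

print(subset_score(features, y, [0, 1]))  # complementary pair
print(subset_score(features, y, [0, 2]))  # redundant pair
```

In this toy case neither a nor b alone says anything about y, so the score of the pair (a, b) comes entirely from the conditional terms and totals 2 bits, while the redundant pair (a, c) scores 0.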

This approximation is still a hard calculation. MIQUBO formulates it for solution on a D-Wave quantum computer, following the 2014 paper, Effective Global Approaches for Mutual Information Based Feature Selection, by Nguyen, Chan, Romano, and Bailey, published in the Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
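One way such a QUBO could be assembled is sketched below: negated MI values go on the diagonal, negated conditional MI values go off the diagonal, and a quadratic penalty favors selecting exactly k features. The penalty form, the `penalty` value, the MI numbers, and the helper names are all assumptions for illustration, and an exhaustive search stands in for the quantum sampler; this is not necessarily how titanic.py builds or solves its model.

```python
import numpy as np
from itertools import product

def build_qubo(mi_vec, cmi_mat, k, penalty):
    """Return an n x n matrix Q so that x @ Q @ x is the energy to minimize.

    mi_vec[i]     : I(Xi;Y)
    cmi_mat[i, j] : I(Xj;Y|Xi) (diagonal entries are ignored)
    """
    mi_vec = np.asarray(mi_vec, dtype=float)
    n = len(mi_vec)
    # Minimizing the negated MI terms maximizes the approximation.
    Q = -np.array(cmi_mat, dtype=float)
    np.fill_diagonal(Q, -mi_vec)
    # penalty * (sum(x) - k)**2, using x_i**2 == x_i and dropping the constant.
    Q += penalty * (np.ones((n, n)) - np.eye(n))
    Q[np.diag_indices(n)] += penalty * (1 - 2 * k)
    return Q

def brute_force_minimum(Q):
    """Exhaustively find the lowest-energy bit string (small n only)."""
    n = Q.shape[0]
    best = min(
        product((0, 1), repeat=n),
        key=lambda bits: np.array(bits) @ Q @ np.array(bits),
    )
    return [i for i, bit in enumerate(best) if bit]

# Made-up MI values for three features (not computed from the Titanic data).
mi_vec = [0.30, 0.25, 0.05]
cmi_mat = [
    [0.00, 0.02, 0.20],
    [0.07, 0.00, 0.05],
    [0.45, 0.25, 0.00],
]

Q = build_qubo(mi_vec, cmi_mat, k=2, penalty=2.0)
print(brute_force_minimum(Q))
```

With these numbers the search picks features 0 and 2: feature 2 is weak on its own but complements feature 0, whereas features 0 and 1 are largely redundant. The penalty must be large enough that adding a third feature costs more than the MI it contributes.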

References

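X. V. Nguyen, J. Chan, S. Romano, and J. Bailey, "Effective global approaches for mutual information based feature selection," in Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2014. https://dl.acm.org/citation.cfm?id=2623611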

License

Released under the Apache License 2.0. See LICENSE file.
