skggm / skggm

Scikit-learn compatible estimation of general graphical models

Home Page: https://skggm.github.io/skggm/tour

License: MIT License

Python 89.30% Makefile 0.08% C 9.78% Cython 0.84%
scikit-learn machine-learning graphical-models gaussian-graphical-models covariance-matrix ensemble-learning nonparametric rank-correlation general-graphical-models skggm

skggm's Introduction


skggm : Gaussian graphical models using the scikit-learn API

In the last decade, learning networks that encode conditional independence relationships has become an important problem in machine learning and statistics. For many important probability distributions, such as multivariate Gaussians, this amounts to estimation of inverse covariance matrices. Inverse covariance estimation is now widely used to infer gene regulatory networks in cellular biology and neural interactions in neuroscience.

However, many statistical advances and best practices in fitting such models to data are not yet widely adopted and not available in common python packages for machine learning. Furthermore, inverse covariance estimation is an active area of research where researchers continue to improve algorithms and estimators. With skggm we seek to provide these new developments to a wider audience, and also enable researchers to effectively benchmark their methods in regimes relevant to their applications of interest.

While skggm is currently geared toward Gaussian graphical models, we hope to eventually evolve it to support General graphical models. Read more here.

Inverse Covariance Estimation

Given n independently drawn, p-dimensional Gaussian random samples X with sample covariance S, the maximum likelihood estimate of the inverse covariance matrix \Theta can be computed via the graphical lasso, i.e., the program

\hat{\Theta}(\Lambda) = \arg\min_{\Theta \succ 0} \; -\mathrm{logdet}(\Theta) + \mathrm{tr}(S\Theta) + \|\Theta\|_{1,\Lambda}

where \Lambda is a symmetric matrix with non-negative entries and

\|\Theta\|_{1,\Lambda} = \sum_{i,j} \Lambda_{ij} |\Theta_{ij}|

Typically, the diagonal is not penalized (the diagonal entries of \Lambda are set to zero) to ensure that \Theta remains positive definite. The objective reduces to the standard graphical lasso formulation of Friedman et al. when all off-diagonal entries of the penalty matrix take a constant scalar value \lambda. The standard graphical lasso has been implemented in scikit-learn.

In this package we provide a scikit-learn-compatible implementation of the program above and a collection of modern best practices for working with the graphical lasso. A rough breakdown of how this package differs from scikit's built-in GraphLasso is depicted by this chart:

sklearn/skggm feature comparison

Quick start

To get started, install the package (via pip, see below) and:
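Below is a minimal sketch of basic usage (the original quick-start snippet is not reproduced here; this sketch assumes only the QuicGraphicalLasso interface described later and toy random data):

import numpy as np
from inverse_covariance import QuicGraphicalLasso

# toy data: 100 samples of a 10-dimensional Gaussian
prng = np.random.RandomState(1)
X = prng.randn(100, 10)

# fit a sparse inverse covariance estimate with a fixed scalar penalty
model = QuicGraphicalLasso(lam=0.5)
model.fit(X)

print(model.covariance_)  # estimated covariance
print(model.precision_)   # estimated sparse inverse covariance
print(model.lam_)         # penalty that was used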


This is an ongoing effort. We'd love your feedback on which algorithms and techniques we should include and how you're using the package. We also welcome contributions.

@jasonlaska and @mnarayan


Included in inverse_covariance

An overview of the skggm graphical lasso facilities is depicted by the following diagram:

skggm graphical lasso facilities overview

Information on basic usage can be found at https://skggm.github.io/skggm/tour. The package includes the following classes and submodules.

  • QuicGraphicalLasso [doc]

    QuicGraphicalLasso is an implementation of QUIC wrapped as a scikit-learn compatible estimator [Hsieh et al.]. The estimator can be run in default mode for a fixed penalty or in path mode to explore a sequence of penalties efficiently. The penalty lam can be a scalar or matrix.

    The primary outputs of interest are: covariance_, precision_, and lam_.

    The interface largely mirrors the built-in GraphLasso although some param names have been changed (e.g., alpha to lam). Some notable advantages of this implementation over GraphicalLasso are support for a matrix penalization term and speed.

  • QuicGraphicalLassoCV [doc]

    QuicGraphicalLassoCV is an optimized cross-validation model selection implementation similar to scikit-learn's GraphLassoCV. As with QuicGraphicalLasso, this implementation also supports matrix penalization.

  • QuicGraphicalLassoEBIC [doc]

    QuicGraphicalLassoEBIC is provided as a convenience class to use the Extended Bayesian Information Criteria (EBIC) for model selection [Foygel et al.].

  • ModelAverage [doc]

    ModelAverage is an ensemble meta-estimator that computes several fits with a user-specified estimator and averages the support of the resulting precision estimates. The result is a proportion_ matrix indicating the sample probability of a non-zero at each index. This is a similar facility to scikit-learn's RandomizedLasso, but for the graphical lasso.

    In each trial, this class will:

    1. Draw bootstrap samples by randomly subsampling X.

    2. Draw a random matrix penalty.

    The random penalty can be chosen in a variety of ways, specified by the penalization parameter. This technique is also known as stability selection or random lasso.

  • AdaptiveGraphicalLasso [doc]

    AdaptiveGraphicalLasso performs a two step estimation procedure:

    1. Obtain an initial sparse estimate.

    2. Derive a new penalization matrix from the original estimate. We currently provide three methods for this: binary, 1/|coeffs|, and 1/|coeffs|^2. The binary method only requires the initial estimate's support (and this can be used with ModelAverage below).

    This technique works well to refine the non-zero precision values given a reasonable initial support estimate (see the sketch after this list).

  • inverse_covariance.plot_util.trace_plot

    Utility to plot lam_ paths.

  • inverse_covariance.profiling

    The .profiling submodule contains a MonteCarloProfile() class for evaluating methods over different graphs and metrics. We currently include the following graph types:

      - LatticeGraph
      - ClusterGraph
      - ErdosRenyiGraph (via sklearn)
    

    An example of how to use these tools can be found in examples/profiling_example.py.
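As a rough illustration of how some of these pieces compose (a sketch only, using the class and parameter names described above on toy data; exact defaults and signatures may differ):

import numpy as np
from inverse_covariance import (
    QuicGraphicalLasso,
    QuicGraphicalLassoCV,
    AdaptiveGraphicalLasso,
)

prng = np.random.RandomState(2)
X = prng.randn(200, 15)  # toy data

# cross-validated penalty selection
cv_model = QuicGraphicalLassoCV(cv=3)
cv_model.fit(X)
print(cv_model.lam_)

# two-step adaptive refit: an initial sparse estimate determines a new
# penalty matrix (here via the binary method), then the model is refit
adaptive = AdaptiveGraphicalLasso(
    estimator=QuicGraphicalLasso(lam=0.5),
    method="binary",
)
adaptive.fit(X)
print(adaptive.estimator_.precision_)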

Parallelization Support

skggm supports parallel computation through joblib and Apache Spark. Independent trials, cross validation, and other embarrassingly parallel operations can be farmed out to multiple processes, cores, or worker machines. In particular,

  • QuicGraphicalLassoCV
  • ModelAverage
  • profiling.MonteCarloProfile

can make use of this through either the n_jobs or sc (sparkContext) parameters.

Since these are naive implementations, it is not possible to enable parallel work on all three of these objects simultaneously when they are being composited together. For example, in this snippet:

model = ModelAverage(
    estimator=QuicGraphicalLassoCV(
        cv=2,
        n_refinements=6,
    ),
    penalization=penalization,
    lam=lam,
    sc=spark.sparkContext,
)
model.fit(X)

only one of ModelAverage or QuicGraphicalLassoCV can make use of the spark context. The problem size and number of trials will determine the resolution that gives the fastest performance.

Installation

Both Python 2.7 and Python 3.6.x are supported. We use the black autoformatter to format our code. If contributing, please run this formatter, otherwise the checks will fail.

Clone this repo and run

python setup.py install

or via PyPI

pip install skggm

or from a cloned repo

cd inverse_covariance/pyquic
make
make python3  (for python3)

The package requires that numpy, scipy, and cython are installed independently into your environment first.

If you would like to fork the pyquic bindings directly, use the Makefile provided in inverse_covariance/pyquic.

This package requires the lapack libraries to be installed on your system. A configuration example with these dependencies for Ubuntu and Anaconda 2 can be found here.

Tests

To run the tests, execute the following lines.

python -m pytest inverse_covariance (python3 -m pytest inverse_covariance)
black --check inverse_covariance
black --check examples

Examples

Usage

In examples/estimator_suite.py we reproduce the plot_sparse_cov example from the scikit-learn documentation for each method provided (however, the variations chosen are not exhaustive).

An example run for n_examples=100 and n_features=20 yielded the following results.

(n_examples, n_features) = (100, 20)

For slightly higher dimensions of n_examples=600 and n_features=120 we obtained:

(n_examples, n_features) = (600, 120)

Plotting the regularization path

We've provided a utility function inverse_covariance.plot_util.trace_plot that can be used to display the coefficients as a function of lam_. This can be used with any estimator that returns a path. The example in examples/trace_plot_example.py yields:

Trace plot

Citation

If you use skggm or reference our blog post in a presentation or publication, we would appreciate citations of our package.

Jason Laska, Manjari Narayan, 2017. skggm 0.2.7: A scikit-learn compatible package for Gaussian and related Graphical Models. doi:10.5281/zenodo.830033

Here is the corresponding Bibtex entry

@misc{laska_narayan_2017_830033,
  author       = {Jason Laska and
                  Manjari Narayan},
  title        = {{skggm 0.2.7: A scikit-learn compatible package for
                   Gaussian and related Graphical Models}},
  month        = jul,
  year         = 2017,
  doi          = {10.5281/zenodo.830033},
  url          = {https://doi.org/10.5281/zenodo.830033}
}

References

BIC / EBIC Model Selection

QuicGraphicalLasso / QuicGraphicalLassoCV

Adaptive refitting (two-step methods)

Randomized model averaging

Convergence test

Repeated KFold cross-validation

skggm's People

Contributors

calvinmccarter-at-tempus, jasonlaska, joshuaybang, matzhaugen, mnarayan


skggm's Issues

Changes to random matrix penalties

Three different categories of random matrix penalties

  1. "subsampling" case: Send in scalar lam value, simply subsample or bootstrap, refit estimator, aggregate model selection.
  2. "random" case: Send in scalar lam value as input
  • For all non-diagonal and unique (i,j) entries
  • Weight matrix W with W_{ij} \in {c, 1/c}, chosen with probability 1/2 each (Bernoulli(.5)), where c = 0.5 by default
  • The effective random penalty matrix is lam .* W
  • Diagonals unpenalized
  1. "fully-random" case: Accepts scalar or matrix lam value, agnostic to any specified lambda.
    • For all non-diagonal and unique (i,j) entries
    • Weight matrix W_{ij} \sim N(.25*max_lambda, .1*max_lambda).
    • Ensures that we don't have entries in W lesser than 0. Ensure W is symmetric or we get complex precision matrices as estimates.
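As a concrete illustration, here is a sketch of building the "random" case weight matrix above (illustrative only; it assumes c = 0.5 and an unpenalized diagonal, and is not the package's exact implementation):

import numpy as np

def random_weight_matrix(n_features, lam, c=0.5, seed=None):
    # off-diagonal weights are c or 1/c with probability 1/2 each;
    # the diagonal is left unpenalized (zero weight)
    prng = np.random.RandomState(seed)
    upper = np.triu_indices(n_features, k=1)
    coin = prng.binomial(1, 0.5, size=len(upper[0]))
    W = np.zeros((n_features, n_features))
    W[upper] = np.where(coin == 1, c, 1.0 / c)
    W = W + W.T  # symmetric, zero diagonal
    return lam * W  # effective random penalty: lam .* W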

Good things to check

  • Ensure that W will not destroy positive definiteness of Theta. To check this:
    • Create W as mentioned in 2/3
    • If W is psd, W .* Theta or W .* Sigma is also psd.
    • If necessary, play around with
      New_W = inv(sqrt(diag(D))) * W * inv(sqrt(diag(D))) to make New_W diagonally dominant using D = sum of abs off-diagonal entries.
  • Ensure that W does not destroy realness of Theta

References

[1]: "Stability Selection" N. Meinhausen and P. Buehlmann, JRSSB, 2010
[2]: "Random lasso" by .... & Ji Zhu, AOAS
[3]: "Mixed effects models for resampled network statistics improves statistical power to find differences in multi-subject functional connectivity" M. Narayan and G. Allen, 2016

Related to implementation of #29

Weighted Penalties

Overview

Many provably optimal sparse inverse covariance estimators are multi-stage:
a) Get an initial estimate of the inverse covariance.
b) Use the initial estimate to build an adaptive estimator of the inverse covariance.

Constraining Models

Sometimes, a) involves getting the zero/non-zero pattern using regression or CLIME estimators which have better exact recovery properties and then using these in b) as constraints in the second stage graphical lasso.

  • Initial estimate a) from sparse regression for each node.
  • Initial estimate a) from DtraceLasso
  • Initial estimate a) from GraphLasso

Sparse non-convex penalties

Non-convex penalties that encourage sparsity (SCAD, MCPlus) are often implemented using locally linear approximations. This involves obtaining an initial estimate of the inverse covariance in step a) and then applying a weighted graphical lasso in step b), where the weights are obtained from the non-convex penalty. There is an additional step c) that involves repeating a-b until convergence.

  • Adaptive Graphical Lasso using two-step LLA approximations [1][2]. Very useful class of penalties. Initial estimates could come from Graphical Lasso itself or Neighborhood Selection (GELATO).
  • Hierarchical Group-Sparse Penalty of Guo can be implemented using a weighted graph-lasso. Very popular in enforcing shared zeros across multiple graphs.

Ensemble/Model Averaging/Random Glasso variants

Weights can also be used to implement ensemble model averaging of graphical lasso estimates.

  • Weights determined by Random Penalization [3][4]
  • Random Adaptive Penalization variants [4]

Thus a new two-stage graph lasso function that acts as a wrapper for the choice of weights and potentially goes back and forth over a-b would be extremely useful.

Initial examples for choice of weights

[1]: Folded Concave Penalties
[2]: GELATO from Zhou-Buhlmann
[3]: Stability Selection
[4]: Random Adaptive Penalization

TypeError: expected bytes, str found

Hi,
I was trying to run convergence_comparison.py from the examples and ran across this issue.

Traceback (most recent call last):
File "convergence_comparison.py", line 28, in
vals = quic(Shat, .004, mode='default', tol=1e-6, max_iter=1000, Theta0=np.eye(Shat.shape[0]), Sigma0=np.eye(Shat.shape[0]), path=None, msg=1)
File "/home/tianpei/anaconda3/lib/python3.5/site-packages/skggm-0.2.6-py3.5-linux-x86_64.egg/inverse_covariance/quic_graph_lasso.py", line 121, in quic
Theta, Sigma, opt, cputime, iters, dGap)
File "inverse_covariance/pyquic/pyquic.pyx", line 12, in pyquic.pyquic.quic (inverse_covariance/pyquic/pyquic.cpp:1644)
TypeError: expected bytes, str found

What is wrong with it? I was using Python 3.5; it seems to be a Python 3 problem.

Dtrace Estimators

This uses the D-trace loss (which is akin to a least-squares type loss) instead of the log-likelihood loss to estimate the sparse inverse covariance.

It most closely resembles the graphical lasso (produces a positive definite, symmetric, and sparse inverse covariance estimate) but has good & different incoherence/irrepresentability conditions for exact recovery (possibly but not yet provably weaker).

Probably not a priority for the next scikitquic milestone, but personally very useful/critical to me, as I need a variant of this for work. Also no good implementations exist yet in python for this.

Model Selection via BIC and EBIC

Given sparse regularized inverse covariance estimates over a grid of regularization parameters, a popular criterion for choosing the optimal penalty and corresponding estimate is the Bayesian (or Schwarz) Information Criterion (BIC), and the Extended BIC (EBIC) in high-dimensional regimes.

The BIC criterion is defined as
BIC(lam) = -2 * Loglikelihood(Sigma, Theta(lam)) + (log n) * (# of non-zeros in Theta(lam))

The EBIC criterion is defined as
EBIC(lam) = - 2 * Loglikelihood(Sigma, Theta(lam)) + (log n) * (# of non-zeros in Theta(lam)) + 4 * (# of non-zeros in Theta(lam)) * (log p) * gam

Here

  • n is n_samples and p is n_features
  • Sigma is the sample covariance/correlation in self.covariance_ and Theta(lam) comes from self.precision_
  • gam is an additional parameter for EBIC that takes values in [0,1]. I recommend setting gam = .1. Setting gam=0 gives back traditional BIC.
  • lam is an element in the grid of lambda penalty parameters.

The goal is to implement model selection using the above criteria as an alternative to cross-validation.
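A naive sketch of these criteria as plain functions (following the formulas above; the constant terms of the Gaussian log-likelihood are dropped and the non-zero count is taken over the off-diagonal upper triangle, which is an assumption and may differ from the eventual implementation):

import numpy as np

def ebic(covariance, precision, n_samples, n_features, gamma=0.1):
    # Gaussian log-likelihood of Theta(lam) given Sigma (constants dropped)
    loglik = (n_samples / 2.0) * (
        np.linalg.slogdet(precision)[1] - np.sum(covariance * precision)
    )
    # number of non-zeros in Theta(lam), counted on the off-diagonal upper triangle
    nonzeros = np.count_nonzero(np.triu(precision, k=1))
    bic = -2.0 * loglik + np.log(n_samples) * nonzeros
    # gamma = 0 recovers the ordinary BIC
    return bic + 4.0 * nonzeros * np.log(n_features) * gamma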

References:
BIC in sparse inverse covariance estimation
EBIC in sparse inverse covariance estimation

Spark Example: Model Average

Building a spark-compatible version of this estimator seems like it would be the most immediately valuable (and easy to do naively).

Compatibility with sklearn model_selection module

Older cross validation module has been deprecated

DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection 
module into which all the refactored classes and functions are moved. Also note that the interface of
the new CV iterators are different from that of this module. This module will be removed in 0.20.

It might make sense to make all our model selection variants (EBIC, CV, and more to come) compatible with new interface.

QUIC alternative for adaptive graphical lasso (esp. with weights enforcing zero-edges)

Optimization algorithm (alternative to QUIC) that significantly speeds up any adaptive variant of graphical lasso where some entries of the precision matrix $\Theta$ are enforced to be zero. It takes these zero constraints and converts them into constraints on sets of pathways that make solving the graphical lasso criterion much faster.

Open question: Can all zero constraints be transformed into a pathway formulation? Need to work this out.

A cython/python implementation available here
https://github.com/maximsch2/pglasso

References

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4497513/
Path coding penalties: http://lear.inrialpes.fr/people/mairal/resources/pdf/path-flow.pdf

Examples to highlight metrics, model-selection and convergence

What we now have compared to sklearn.covariance

  1. Additional metrics (quadratic loss, KL loss); should we also add a model selection loss (TP+FP)?
  2. Additional methods for model selection/tuning parameters (ebic/bic and others coming)
  3. Better convergence guarantees (no figures here, just examples that break GraphLasso)
  4. What about timing benefits? (quic is supposed to be faster, big-quic definitely so)
  5. Would love some examples that take advantage of the meta-estimators, even if we just reproduce something from the papers (if possible).
  • Example 1: Comparison of model selection procedures under different metrics
  • Example 2: Comparison of timing benefits for a range of (n,p) sizes
  • Example 3: Compare number of iterations for convergence + convergence failure in ill-conditioned regimes

Create InverseCovariance Subclass of EmpiricalCovariance

The QUIC algorithm estimates the inverse covariance using the sample covariance estimate $S$ as an input:
$\hat{\Theta} = \arg\max_{\Theta \succ 0} \; \mathrm{logdet}(\Theta) - \mathrm{Trace}(\Theta S) - \lambda \|\Theta\|_1$

As a result it makes sense for the sample covariance for QUIC to be computed using the methods inherited from the EmpiricalCovariance class.

Additionally, the log-likelihood and error metrics for the inverse covariance will differ from that of the covariance matrix. Thus corresponding methods in EmpiricalCovariance will need to be overridden where relevant. We need the ones below

  • Negative Log likelihood for That is: Trace(That*S) - logdet(That).
  • KL-Loss for (T, That): Trace(That^{-1}T) - logdet(That^{-1}T) - p
  • Quadratic Loss: Trace((That*Sigma - I)^2)
  • Frobenius loss (remains the same, but computed in the precision rather than covariance)
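For reference, a naive numpy rendering of these metrics (my own sketch of the formulas above, not the package's implementation; T is the true precision, That the estimate, S the sample covariance, Sigma the true covariance):

import numpy as np

def negative_log_likelihood(S, That):
    # Trace(That * S) - logdet(That)
    return np.sum(That * S) - np.linalg.slogdet(That)[1]

def kl_loss(T, That):
    # Trace(That^{-1} T) - logdet(That^{-1} T) - p
    p = T.shape[0]
    M = np.linalg.solve(That, T)  # That^{-1} T without forming the inverse
    return np.trace(M) - np.linalg.slogdet(M)[1] - p

def quadratic_loss(Sigma, That):
    # Trace((That * Sigma - I)^2)
    p = Sigma.shape[0]
    D = That.dot(Sigma) - np.eye(p)
    return np.trace(D.dot(D))

def frobenius_loss(T, That):
    # Frobenius norm of the error, computed on the precision matrices
    return np.linalg.norm(T - That, ord="fro")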

Memory issue with trace_plot

Currently inverse_covariance.plot_util.trace_plot plots the regularization path of all edges. This does not offer as much insight in larger matrices and also consumes too much memory. It would be more useful to

  • By default: Only plot the top k edge coefficients where k might max out at 25 or 50,
  • Let user input edge indices,
  • Let user specify a ground truth precision matrix. Then the top 10 true positives and the top 10 (strongest) false positives can be plotted.

Documentation for inverse_covariance.py

Our inverse covariance class differs from sklearn in a few good but important ways.
We will have to make the API clear for others to use.

Possible sources of confusion:

  • What is self.covariance_ versus self.sample_covariance_?
    We distinguish covariance estimates from the glasso algorithm (self.covariance_) from the sample covariance, which is an input.
  • What is lam_scale_? How is it used?
  • Make examples compatible with running on binder http://mybinder.org/

Should add more to this list after comparing to sklearn.covariance

Fix examples/plot_functional_brain_networks.py

I've updated all the skggm interface elements, but I get this error when I try to run it:

Traceback (most recent call last):
  File "plot_functional_brain_networks.py", line 85, in <module>
    node_size=20)
  File "/usr/local/Cellar/python/2.7.8_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nilearn-0.2.4-py2.7.egg/nilearn/plotting/img_plotting.py", line 1152, in plot_connectome
    colorbar=colorbar)
  File "/usr/local/Cellar/python/2.7.8_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nilearn-0.2.4-py2.7.egg/nilearn/plotting/displays.py", line 1239, in add_graph
    .format(adjacency_matrix_shape, node_coords_shape))
ValueError: Shape mismatch between 'adjacency_matrix' and 'node_coords''adjacency_matrix' shape is (196, 196), 'node_coords' shape is (264, 3)

We also need the ABIDE example to be updated and to make sure it can be run.

Rank based alternatives for sample covariance/correlation matrix

Description

Enables implementation of a simple nonparametric variant of the graphical lasso or any other future estimators for the precision matrix.

Overview of nonparametric alternatives

The nonparanormal (monotone transformations + Gaussian graphical models) and transelliptical (monotone transformations + elliptical graphical models) classes are covered by the use of:

  • spearman
  • kendall's tau
  • winsorized or trim-mean correlation estimates
  • symmetric rank covariances, including Hoeffding's D and the Bergsma-Dassios sign covariance
  • k-root of sample covariance substitutes sample covariance or correlation with k-root. Use k=2 to mirror benefits of the sqrt-lasso. Cannot support negative eigenvalues.

Implementation

The task involves creating alternative functions for the rank-based sample correlation/covariance and offering this as an alternative to the current empirical covariance or correlation options.

Key Steps:

  1. Take rows of observations and transform into ranks
  2. Unbiased Spearman's rank correlation (eq. 8 and eq. 9) for all pairs of features


The Kendall's tau concordance variant is another alternative that has nicer options for handling ties and for weighting higher ranks differently from lower ranks. However, scipy.stats.kendalltau does not support 2D arrays, so it is likely to be slow in large dimensions.

Furthermore, the bias correction for correlation via kendalltau amounts to using sin(pi / 2 * kendalltau).

Note 1: These rank correlation estimates can violate positive semi-definite requirements (although the biased Spearman and Kendall are always P.S.D.). Thus, not all graphical model estimators will be compatible with these rank correlation estimates. However, regularized Dantzig/CLIME, CONCORD, and some pseudolikelihood-type estimators should be able to handle rank correlation matrices with negative eigenvalues.

Note 2: Take advantage of RobustScaler and RankScaler if possible, as well as other BaseEstimators from sklearn.
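A rough sketch of the rank-based substitutes discussed above (spearmanr handles 2D arrays directly while kendalltau must be looped over pairs; the sin(pi/2 * tau) Kendall correction is the one noted above, while the 2*sin(pi/6 * rho) Spearman adjustment is the usual nonparanormal correction and is my assumption):

import numpy as np
from scipy.stats import spearmanr, kendalltau

def spearman_correlation(X, bias_correct=True):
    # X is (n_samples, n_features) with more than two features,
    # so spearmanr returns a full correlation matrix
    rho, _ = spearmanr(X)
    if bias_correct:
        rho = 2.0 * np.sin(np.pi / 6.0 * rho)
    np.fill_diagonal(rho, 1.0)
    return rho

def kendall_correlation(X):
    # pairwise loop; slow in large dimensions, as noted above
    n_features = X.shape[1]
    C = np.eye(n_features)
    for i in range(n_features):
        for j in range(i + 1, n_features):
            tau, _ = kendalltau(X[:, i], X[:, j])
            C[i, j] = C[j, i] = np.sin(np.pi / 2.0 * tau)
    return C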

References

Liu, Han, John Lafferty, and Larry Wasserman.
"The nonparanormal: Semiparametric estimation of high dimensional undirected graphs."
Journal of Machine Learning Research 10.Oct (2009): 2295-2328.

Wilcox, R. R. (1993), Some results on a Winsorized correlation coefficient. British Journal of Mathematical and Statistical Psychology, 46: 339–349.
doi:10.1111/j.2044-8317.1993.tb01020.x

Xue, Lingzhou; Zou, Hui.
Regularized rank-based estimation of high-dimensional nonparanormal graphical models.
Ann. Statist. 40 (2012), no. 5, 2541--2571. doi:10.1214/12-AOS1041.
http://projecteuclid.org/download/pdfview_1/euclid.aos/1359987530

Liu, Han; Han, Fang; Yuan, Ming; Lafferty, John; Wasserman, Larry.
High-dimensional semiparametric Gaussian copula graphical models.
Ann. Statist. 40 (2012), no. 4, 2293--2326. doi:10.1214/12-AOS1037 https://projecteuclid.org/euclid.aos/1358951383

Rina Foygel Barber, Mladen Kolar
"ROCKET: Robust Confidence Intervals via Kendall's Tau for Transelliptical Graphical Models"
https://arxiv.org/abs/1502.07641

Vahe Avagyan, Andrés M. Alonso & Francisco J. Nogales (2017)
"Improving the Graphical Lasso Estimation for the Precision Matrix Through Roots of the Sample Covariance Matrix", Journal of Computational and Graphical Statistics, 26:4, 865-872, DOI: 10.1080/10618600.2017.1340890

Two Stage or Adaptive Estimator Class

There are several methods to estimate the inverse covariance matrix (MLE, pseudolikelihood, D-trace, CLIME, ...). However, the MLE produces a positive semi-definite, symmetric estimate with the standard minimum-variance benefits (potentially asymptotically efficient?) and is thus appropriate as a final estimator.

Therefore we want an adaptive estimator class that

  • Takes an initial estimate as an input
  • Takes a few different strategies to create adaptive weights and refits the MLE with a weighted penalty matrix.
    • These could take the form of 1/|coefficient| or 1/|coefficient|^2 (classic adaptive glasso)
    • Or these could be incorporated into a non-convex SCAD or MCP penalty.

See adaptivity/two-stage example in the GELATO (ftp://ess.r-project.org/Manuscripts/buhlmann/gelato.pdf) estimator (Section 3.3). Here the weights are very naive and basically just binary.

Ensemble/Model Averaging

Ensemble/Model Averaging/Random Glasso variants

Weights can also be used to implement ensemble model averaging of graphical lasso estimates.

Pseudo-algorithm:
Subsample rows/observations in the data matrix X.
For each subsampled data matrix X*:

  1. Create a random penalty matrix W according to one of the schemes below (either adaptive or non-adaptive)
  2. Call graphlasso using X* and lam = W to give Theta*
  3. Repeat B=100 times
  4. Aggregate a confidence/stability score by checking whether the entries of each Theta* are zero or non-zero. This creates a single proportion matrix Pi with entries that vary from 0 to B. This is the model-averaged estimate.
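A bare-bones rendering of this pseudo-algorithm (illustrative only; the random penalty draw here is a simple uniform placeholder rather than one of the specific schemes, and the estimator interface follows the README):

import numpy as np
from inverse_covariance import QuicGraphicalLasso

def model_average_support(X, lam=0.5, B=20, subsample=0.75, seed=0):
    prng = np.random.RandomState(seed)
    n, p = X.shape
    counts = np.zeros((p, p))
    for _ in range(B):
        # 1. subsample rows/observations of X
        rows = prng.choice(n, size=int(subsample * n), replace=False)
        # 2. random symmetric penalty matrix W with an unpenalized diagonal
        W = lam * np.triu(prng.uniform(0.5, 2.0, size=(p, p)), k=1)
        W = W + W.T
        # 3. fit the graphical lasso with lam = W
        theta = QuicGraphicalLasso(lam=W).fit(X[rows]).precision_
        # 4. accumulate the non-zero pattern
        counts += (np.abs(theta) > 1e-8).astype(float)
    # entries of counts vary from 0 to B; divide by B for a proportion matrix Pi
    return counts / B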

How to determine random weights.

Weights determined by Random Penalization [1]
Random Adaptive Penalization variants [2]

Initial examples for choice of weights
[1]: Stability Selection (https://arxiv.org/abs/0809.2932)
[2]: Random Adaptive Penalization (http://biorxiv.org/content/early/2016/03/14/027516)

Create more examples using nilearn

Since our estimators are sklearn compatible, we can enable those who use nilearn for graphical model estimation to use skggm estimators as well.

This would involve using ConnectivityMeasure from nilearn as a base estimator class (which is designed for dealing with multiple graphical models) and then creating an skggm relevant implementation under the hood. Would be useful for application examples, where a similar procedure would need to be applied in any case.

EBIC Issues

Using my new network simulations seems to throw EBIC completely off.

metric='log_likelihood';
n_features=15;
covariance, precision, adjacency = new_graph(n_features,.05,adj_type='banded',random_sign=False,seed=1)    
prng = np.random.RandomState(2)
X = mvn(75*n_features,n_features,covariance,random_state=prng)
gamma = 0 # gamma = 0 (BIC), gamma=1 (Greater Sparsity)
ebic_model = QuicGraphLassoEBIC(
    lam=1.0,
    init_method='corrcoef',
    gamma = gamma)
ebic_model.fit(X)
ebic_precision_ = ebic_model.precision_
print 'QuicGraphLassoEBIC with:'
print '   len(path lams): {}'.format(len(ebic_model.path_))
print '   lam_scale_: {}'.format(ebic_model.lam_scale_)
print '   lam_: {}'.format(ebic_model.lam_)

E.g. for alpha = .05, banded type. Super easy scenario fails.

QuicGraphLassoCV with:
   metric: log_likelihood
   len(cv_lams): 25
   lam_scale_: 1.0
   lam_: 0.0023908922265
QuicGraphLassoEBIC with:
   len(path lams): 100
   lam_scale_: 1.0
   lam_: 0.0231012970008

Firstly, the results have too many random edges. But it is also likely that the lambda values are not going low enough, so it might be an issue of choosing the grid of parameters.

seg fault in statistical power?

When fitting the StatisticalPower simulation object using
model_selection_estimator = QUICGraphLassoEBIC()
the kernel dies in the IPython notebook due to some unknown error. This will need further investigation.

Move functions in simulation_example.py to data generation class

The simplest data generation class needs the following functionality:

  • Adjacency generation according to erdos-renyi, or banded network structure types
  • Converting an adjacency matrix into a proper precision and covariance matrix
  • Simulating from a multivariate probability distribution (initially normal, but later on maybe t-variate and others as well).

https://github.com/skggm/skggm/blob/feature/add-simulation-examples/examples/simulate_networks.py

quic versus graph_lasso example

One of the benefits of the quic optimization solvers is better convergence guarantees.

There are many known instances where graph_lasso fails to converge. An analysis of this problem was provided in this paper

This could make for a good example, with convergence comparisons after 100 iterations. @jasonlaska

from sklearn.covariance import graph_lasso
from inverse_covariance import InverseCovariance, quic
import numpy as np

# Example 1 from Mazumder & Hastie "Glasso, New Insights". graph_lasso fails to converge at lam=.009*np.max(np.abs(Shat))
X = np.loadtxt('examples/Mazumder_example1', delimiter=",")
Shat = np.cov(X, rowvar=0)
graph_lasso(Shat, alpha=.004)
Theta0, Sigma0 = quic(Shat, .004)[:2]  # quic returns (Theta, Sigma, ...); keep the first two

# Example 2 from Mazumder & Hastie "Glasso, New Insights". graph_lasso fails to converge at lam=.009*np.max(np.abs(Shat))
X = np.loadtxt('examples/Mazumder_example2', delimiter=",")
Shat = np.cov(X, rowvar=0)
graph_lasso(Shat, alpha=.02)
Theta0, Sigma0 = quic(Shat, .02)[:2]

Mazumder_example1.txt
Mazumder_example2.txt

Pseudolikelihood estimators

sklearn.linear_model has all the ingredients to implement the "pseudolikelihood" regression version of inverse covariance estimation that doesn't have the wrapper block coordinate descent. It is literally p uncoupled high-dimensional regressions with some extra steps to enforce symmetry and positive definiteness using Cholesky decompositions.

It would be easy for us to implement a pseudolikelihood estimator that does this properly using sklearn.linear_model functions as the internal. This can serve as a comparison to loglikelihood type estimators that QUIC-dirty gives us for mixed norm penalties

To be fleshed out more
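A crude sketch of the uncoupled-regressions idea (neighborhood-selection style, my own illustration using sklearn's Lasso; it symmetrizes by simple averaging rather than the Cholesky-based correction mentioned above):

import numpy as np
from sklearn.linear_model import Lasso

def pseudolikelihood_precision(X, alpha=0.1):
    # p uncoupled lasso regressions: regress each feature on all the others
    n, p = X.shape
    Theta = np.zeros((p, p))
    for j in range(p):
        others = [k for k in range(p) if k != j]
        lasso = Lasso(alpha=alpha).fit(X[:, others], X[:, j])
        resid = X[:, j] - lasso.predict(X[:, others])
        sigma2 = np.var(resid) + 1e-12  # residual variance of node j
        Theta[j, j] = 1.0 / sigma2
        Theta[j, others] = -lasso.coef_ / sigma2
    # naive symmetrization; a real implementation would also enforce
    # positive definiteness
    return 0.5 * (Theta + Theta.T)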

References:

QuicGraphLassoCV bug

When the input parameter to the function is a tuple such as cv=(3,3), the function fails.

Line 541 is the culprit: it tries to reference a local variable cv that is not yet defined, instead of self.cv.

Make InverseCovarianceCV

Follow a similar pattern to GraphLassoCV, where the CV is integrated with the fact that the algorithm spits out a path of lambdas; when using path mode, this should be more efficient than the naive CV example.

Change control parameters for Montecarlo Simulations

Pick more intelligent combinations of sample sizes, sparsity and dimensionality.
For example, vary the following fraction by varying 1 parameter and holding others fixed

Error metric versus n/[10*degree* log(p)]

Fixing some error metrics in profiling and network simulations

  • The Frobenius norm sometimes counts diagonal info (like in Average Power). Make sure this is changed across the package.
error_fro = np.linalg.norm(np.triu(adj - new_estimator.precision_,1), ord='fro')
  • Numerical issues in model selection error with banded networks.
    Why do these give inconsistent numbers, particularly exact_support versus count_support_diff
print 'Exact Recovery (T/F): {}, (TPR,FPR): {}, Count Diff: {}'.format(
    exact_support(adjacency,prec_hat),
    approx_support(adjacency,prec_hat,prob=0.1)[1:2],
    count_support_diff(adjacency,prec_hat),
)
  • Issues in approx_support(). To be fixed by re-using other functions from AveragePower

Fix binary case of AdaptiveGraphLasso

If I use estimator=QuicGraphLasso(lam=0.2) as the initial estimator and I choose the method=binary option, I expect the default behavior for the second stage estimate to not use any regularization on the non-zero support. But this does not seem to be possible.

One option is to change the following line so that it reads self.estimator_ = QuicGraphLasso(lam=self.lam_, ...):

self.estimator_ = QuicGraphLasso(lam=self.lam_ * self.estimator.lam_,

Alternatively, we could add a flag to decide whether regularization parameter from initial estimator is carried into second stage.

Docstring cleanup

Carefully comb through docstrings, make sure they are complete and correct; look for any copy pasta.

Create simple example

To get started

  • Use a sample dataset from the nilearn project.
  • Plot simple visualizations of the estimated precision matrices
  • A future second example could be an HCP-dataset-inspired power simulation.

References:
An example from nilearn

Add a statistical power method to the class

This function will take in a dataset with some ground truth inverse covariance matrix and plot probability of perfect model selection as a function of n/p.

Add an example to draw comparisons between say the naive and adaptive methods.

Repeated k-fold cross validation

Currently cross-validation either

  • computes a metric over k-folds and reports the result of a single cross validation
  • creates multiple training/test splits (multiple sample splits) and averages over these (but each split might not be exactly independent of the other splits)

In fact, it is preferable to have repeated k-fold cross-validation to reduce variance in the final error metric.

For us, it is desirable to do only 2-fold or 3-folds for inverse covariance estimation. Repeating 2-fold or 3-fold many times and averaging over all scores or taking a mean of intersections will reduce its variance.

Create a new CV generator to implement repeated k-fold cross validation.
This involves calling https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/cross_validation.py#L266 multiple times with the shuffle option set to true
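With the newer sklearn.model_selection interface this could be as simple as the sketch below (RepeatedKFold already implements repeated shuffled k-fold; whether our estimators accept such a generator directly is an open question here):

import numpy as np
from sklearn.model_selection import RepeatedKFold

X = np.random.randn(60, 5)  # toy data
cv = RepeatedKFold(n_splits=3, n_repeats=10, random_state=0)

scores = []
for train_index, test_index in cv.split(X):
    # fit the estimator on X[train_index], evaluate on X[test_index],
    # and append the fold score; averaging over all 30 folds reduces
    # the variance of the final error metric
    scores.append(len(test_index))  # placeholder for a real score
print(np.mean(scores))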

Fork/copy pyquic bindings into this project

Yeah, after reviewing the __init__.py and EmpiricalCovariance classes, I think it makes sense to make the cython bindings part of this project and simplify/transfer the py_quic/__init__.py quic() routine into the main file. Also, rename the estimator to InverseCovariance.

Check for logdet and numpy issues in python3

In applying AdaptiveGraphLasso to some new data, I see a lot of these warnings

/share/sw/free/anaconda/4.2.0/python3.5/lib/python3.5/site-packages/
numpy/linalg/linalg.py:1757:
RuntimeWarning: invalid value encountered in slogdet
  sign, logdet = _umath_linalg.slogdet(a, signature=signature)
/path/to/home/.local/lib/python3.5/site-packages/skggm-0.2.6-py3.5-linux-x86_64.egg/inverse_covariance/metrics.py:25: 
RuntimeWarning: invalid value encountered in multiply
  fast_logdet(precision) - dim * np.log(2 * np.pi)
/path/to/home/.local/lib/python3.5/site-packages/
skggm-0.2.6-py3.5-linux-x86_64.egg/inverse_covariance/metrics.py:116: 
RuntimeWarning: invalid value encountered in greater
  mask = np.abs(precision.flat) > np.finfo(precision.dtype).eps
/path/to/home/.local/lib/python3.5/site-packages/
skggm-0.2.6-py3.5-linux-x86_64.egg/inverse_covariance/metrics.py:113:
RuntimeWarning: invalid value encountered in multiply
  l_theta = -np.sum(covariance * precision) + fast_logdet(precision)
/path/to/home/.local/lib/python3.5/site-packages/
skggm-0.2.6-py3.5-linux-x86_64.egg/inverse_covariance/metrics.py:114: 
RuntimeWarning: overflow encountered in double_scalars
  l_theta *= n_features / 2.
/path/to/home/.local/lib/python3.5/site-packages/
skggm-0.2.6-py3.5-linux-x86_64.egg/inverse_covariance/inverse_covariance.py:331: 
RuntimeWarning: invalid value encountered in less
  min_indices = np.where(np.abs(ebic_scores - ebic_scores.min()) < 1e-10)
  File "/path/to/home/.local/lib/python3.5/site-packages/
skggm-0.2.6-py3.5-linux-x86_64.egg/inverse_covariance/quic_graph_lasso.py", line 888, in fit
    best_lam_idx = self.ebic_select(gamma=self.gamma)
  File "/path/to/home/.local/lib/python3.5/site-packages/
skggm-0.2.6-py3.5-linux-x86_64.egg/inverse_covariance/inverse_covariance.py", line 332, in ebic_select
    return np.max(min_indices)
  File "/share/sw/free/anaconda/4.2.0/python3.5/lib/python3.5/
site-packages/numpy/core/fromnumeric.py", line 2252, in amax
    out=out, **kwargs)
  File "/share/sw/free/anaconda/4.2.0/python3.5/lib/python3.5/
site-packages/numpy/core/_methods.py", line 26, in _amax
    return umr_maximum(a, axis, None, out, keepdims)
ValueError: zero-size array to reduction operation maximum which has no identity

Encountered this when using the latest develop branch, e16ff8c4d10, and calling skggm with Python 3.5.
