riken-aip / pyHSICLasso
Versatile Nonlinear Feature Selection Algorithm for High-dimensional Data
License: MIT License
A bug occurs when there are only a few explanatory variables. Please fix it!
When I used block Lasso to select 77 features (from 770), I got only 57. The block size was a divisor of the number of data instances. However, when I set the block size to zero, I got exactly 77 features. Is it normal for block Lasso to return fewer features? This happened when I used the permutation parameter M with the value one.
The other difference is that when I use vanilla Lasso, I get the following warning:
C:\Program Files\Python37\lib\site-packages\pyHSICLasso\nlars.py:77: RuntimeWarning: divide by zero encountered in true_divide gamma1 = (C - c[I]) / (XtXw[A[0]] - XtXw[I])
Block lasso had no warnings.
Then I tried block Lasso with M=2. I got 77 features, but also the following warnings:
C:\Program Files\Python37\lib\site-packages\pyHSICLasso\nlars.py:77: RuntimeWarning: invalid value encountered in true_divide gamma1 = (C - c[I]) / (XtXw[A[0]] - XtXw[I])
C:\Program Files\Python37\lib\site-packages\pyHSICLasso\nlars.py:83: RuntimeWarning: invalid value encountered in less_equal gamma[gamma <= 1e-9] = np.inf
C:\Program Files\Python37\lib\site-packages\pyHSICLasso\nlars.py:85: RuntimeWarning: invalid value encountered in less mu = min(gamma)
C:\Program Files\Python37\lib\site-packages\pyHSICLasso\nlars.py:77: RuntimeWarning: divide by zero encountered in true_divide gamma1 = (C - c[I]) / (XtXw[A[0]] - XtXw[I])
Finally, I tried M=3, again got 77 features, and saw the same warning as with vanilla Lasso.
I have two questions. Should I use M=1 with no warnings and fewer features, or M=3 with the same warning vanilla Lasso had? Are these warnings of any importance, or are they within normal, expected behavior?
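For what it's worth, warnings like these typically come from degenerate step sizes that the LARS loop discards anyway. A minimal numpy sketch of the mechanism (the arrays here are made up for illustration, not taken from nlars.py):

```python
import numpy as np
import warnings

# The step-size computation divides by a difference of correlations;
# when two entries coincide, the denominator is zero.
C = np.array([1.0, 1.0])
denom = np.array([0.0, 2.0])   # first entry reproduces the degenerate case

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    gamma = (C - 0.5) / denom  # RuntimeWarning: divide by zero
print(gamma)                   # [inf 0.25]

# Non-finite / non-positive step sizes are then mapped to inf before
# taking the minimum, so the degenerate entry is simply never chosen:
gamma[~np.isfinite(gamma)] = np.inf
print(min(gamma))              # 0.25
```

So a divide-by-zero warning on this line does not by itself mean the selected features are wrong; the inf entries are filtered out before the minimum step is taken.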
UPDATE
Now I tried to select 9200 features from 92000 with block Lasso (B=19, M=3), but I got even fewer features than before: only 33. Should I scale M with the number of features?
After HSIC Lasso (regression) has finished executing, we have the beta values for every feature in the training dataset. Is there a way to determine the predicted value for a given instance? I am trying to evaluate the model fit via mean squared error, as done in the original paper (High-Dimensional Feature Selection by Feature-Wise Kernelized Lasso, Section 4.3.2).
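The beta values HSIC Lasso returns are feature importances, not regression coefficients, so getting predictions usually means refitting a predictor on the selected features. A minimal sketch with a plain least-squares refit on synthetic data (`selected` is a stand-in for the indices that `hsic.get_index()` would return; the paper itself uses kernel regression for this step):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))                     # stand-in training matrix
y = 2.0 * X[:, 3] + rng.normal(scale=0.1, size=100)

selected = [3, 7]                                  # e.g. from hsic.get_index()
Xs = np.c_[X[:, selected], np.ones(len(X))]        # selected columns + intercept
coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)      # refit on selected features

y_pred = Xs @ coef                                 # predictions for instances
mse = np.mean((y - y_pred) ** 2)
print(mse)                                         # small: noise variance ~0.01
```

Any regressor (e.g. kernel ridge or SVR) can replace the least-squares fit here; the point is that prediction happens in a second model trained on the HSIC-selected columns.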
Currently, HSIC Lasso can only handle univariate output, so this issue proposes extending it to multivariate output.
pyHSICLasso/pyHSICLasso/hsic_lasso.py
Line 23 in 0617219
Hi, I'm wondering if some clarification could be provided on this difference.
In addition, is it necessary that y_kernel and x_kernel be the same? My intuition is that they should be, but from what I can see in the code, that is not enforced. What is the rationale for allowing y and X to be projected into different spaces?
Accuracy decreased.
dataset: https://www.kaggle.com/artyomsalnikov/dataset-3
code: https://yadi.sk/d/xAsaL-TPGZe09A
Hello, I just tried this tool on some metabolomics data I have. Interestingly, HSIC Lasso selects just 76 metabolites out of the 2035 available. The R-squared score when I use these selected metabolites is just 0.18, compared to about 0.60 for Lasso on the original 2035 metabolites. My assumption is that the number of selected features is too small. I used SVR (kernel='rbf') from sklearn after feature selection with HSIC Lasso.
Is there a way to increase the number of features HSIC Lasso selects?
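If the limit is just the requested count, the first argument of `regression()` / `classification()` is the number of features to select, so asking for more is a one-line change. A hedged sketch (the path and the count 200 are placeholders; the import is guarded in case pyHSICLasso is not installed in this environment):

```python
try:
    from pyHSICLasso import HSICLasso
except ImportError:            # library not installed in this environment
    HSICLasso = None

def select_more_features(csv_path, num_feat=200):
    """Request num_feat features; num_feat is the first positional argument."""
    hsic = HSICLasso()
    hsic.input(csv_path)       # or hsic.input(X, Y) with numpy arrays
    hsic.regression(num_feat)  # e.g. 200 instead of the small default
    return hsic.get_index()    # indices of the selected features
```

Note that, as the block-Lasso discussion above suggests, the solver can still return fewer features than requested when the regularization path terminates early.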
Hey, that's awesome, and I'm trying to use it in my thesis. May I ask how to use it as a classifier? I have looked through the whole code, but how do I fit it on different subsets and get an overall precision score?
What does Y represent when a numpy array is the input? How do I use it? I'm a little confused.
Hi,
When trying to install the package in Anaconda, either through pip or directly from setup.py, it throws the following error:
ImportError: cannot import name 'PackageFinder' from 'pip._internal.index'
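This particular ImportError usually points to a pip-version mismatch rather than anything in pyHSICLasso itself: an older pip's internal module layout (`pip._internal.index`) no longer matches what the build tooling expects. Upgrading the packaging stack first is the usual fix (a sketch, assuming a standard CPython environment):

```shell
# Refresh pip/setuptools before installing the package itself.
python -m pip install --upgrade pip setuptools
python -m pip install pyHSICLasso
```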
Hi,
I've been extending this HSIC Lasso implementation to use specific types of distance-based kernels for microbiome data, and I'd like to verify that my understanding of the implementation and purpose of the "block" HSIC variant is correct. First, my understanding is that the "block" part of block HSIC Lasso is an optimization to speed up kernel computation, correct? Second, in this code I have noticed that the block HSIC Lasso kernel computation constructs essentially "mini" kernels on subsets of samples for single features (over the range of d features). If I am reading this correctly, each kernel is constructed for a single dimension only, which misses modeling the combinatorial effects of multiple features, and that is hardly ideal when such combinatorial effects are present in the data. Perhaps I am missing something or not seeing the full picture. Could someone please elaborate on this? Thank you!
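For reference, per-feature (univariate) kernels are the defining design of HSIC Lasso itself (the "feature-wise kernelized" part of the paper's title); the block variant only changes how those kernels are assembled, trading one n×n Gram matrix per feature for a set of B×B matrices on disjoint sample blocks. A small numpy sketch of that layout (Gaussian kernel, made-up data, not the library's actual code):

```python
import numpy as np

def gauss_kernel_1d(x, sigma=1.0):
    """Gram matrix of a Gaussian kernel for a single feature (1-D vector)."""
    d = x[:, None] - x[None, :]
    return np.exp(-d ** 2 / (2 * sigma ** 2))

rng = np.random.default_rng(0)
n, d, B = 12, 4, 3                 # n samples, d features, block size B
X = rng.normal(size=(d, n))
perm = rng.permutation(n)          # one random partition of the samples

# Block HSIC Lasso: for each feature, build B x B kernels on disjoint
# sample blocks instead of one n x n kernel (memory O(n*B) vs O(n^2)).
block_kernels = [
    [gauss_kernel_1d(X[k, perm[i:i + B]]) for i in range(0, n, B)]
    for k in range(d)
]
print(len(block_kernels), len(block_kernels[0]), block_kernels[0][0].shape)
# 4 4 (3, 3)  -- d features, n/B blocks each, one B x B Gram matrix per block
```

Interactions between features then enter only through the lasso objective over these univariate kernels, not through joint multi-feature kernels, in both the block and the vanilla formulation.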