
pyhsiclasso's People

Contributors

hclimente, inktoyou, myamada0321, suecharo

pyhsiclasso's Issues

Block Lasso selects fewer features than vanilla algorithm

When I used block Lasso with a threshold of 77 features (out of 770), I got only 57 features. The block size was a divisor of the number of data instances. However, when I set the block size to zero, I got exactly 77 features. Is it normal for block Lasso to return fewer features? This happened when I used the permutation parameter M with a value of one.

The other difference is that when I use vanilla Lasso I get the following warning:
C:\Program Files\Python37\lib\site-packages\pyHSICLasso\nlars.py:77: RuntimeWarning: divide by zero encountered in true_divide
  gamma1 = (C - c[I]) / (XtXw[A[0]] - XtXw[I])

Block lasso had no warnings.

Then I tried block Lasso with M=2. I got 77 features, but also the following warnings:
C:\Program Files\Python37\lib\site-packages\pyHSICLasso\nlars.py:77: RuntimeWarning: invalid value encountered in true_divide
  gamma1 = (C - c[I]) / (XtXw[A[0]] - XtXw[I])
C:\Program Files\Python37\lib\site-packages\pyHSICLasso\nlars.py:83: RuntimeWarning: invalid value encountered in less_equal
  gamma[gamma <= 1e-9] = np.inf
C:\Program Files\Python37\lib\site-packages\pyHSICLasso\nlars.py:85: RuntimeWarning: invalid value encountered in less
  mu = min(gamma)
C:\Program Files\Python37\lib\site-packages\pyHSICLasso\nlars.py:77: RuntimeWarning: divide by zero encountered in true_divide
  gamma1 = (C - c[I]) / (XtXw[A[0]] - XtXw[I])

Finally, I tried M=3 and also got 77 features, with the same warning as with vanilla Lasso.

I have two questions. Should I use M=1, with no warnings but fewer features, or M=3, with the same warning that vanilla Lasso produced? Are these warnings important, or are they within normal expected behavior?
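For readers unfamiliar with these messages, here is a minimal NumPy sketch of where they come from. The arrays are toy stand-ins for the numerator `(C - c[I])` and denominator `(XtXw[A[0]] - XtXw[I])` in the quoted line, not values from the library, and the cleanup step only loosely mirrors the `gamma[gamma <= 1e-9] = np.inf` line in the traceback. It shows that a nonzero value divided by zero gives `inf` ("divide by zero") while zero divided by zero gives `nan` ("invalid value"). It does not establish whether the warnings are harmless inside pyHSICLasso.

```python
import warnings
import numpy as np

# Toy stand-ins (assumed values) for the numerator and denominator.
numerator = np.array([1.0, 0.0, 2.0])
denominator = np.array([0.0, 0.0, 4.0])

# Record the warnings NumPy raises for the element-wise division.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    gamma = numerator / denominator  # 1/0 -> inf, 0/0 -> nan, 2/4 -> 0.5
messages = [str(w.message) for w in caught]

# Suppress the warnings explicitly and treat non-finite or tiny steps as "no step".
with np.errstate(divide="ignore", invalid="ignore"):
    gamma_quiet = numerator / denominator
    usable = np.isfinite(gamma_quiet) & (gamma_quiet > 1e-9)
gamma_clean = np.where(usable, gamma_quiet, np.inf)
step = gamma_clean.min()  # smallest usable step size
```

Printing `messages` shows the text of whichever warnings your NumPy version emitted for the division.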

UPDATE
I have now tried to select 9200 features from 92000 with block Lasso, using B=19 and M=3, but I got even fewer features than before: only 33. Should I scale M with the number of features?

Is there a way to extract the predicted value of the trained HSIC Lasso (Regression)?

After HSIC Lasso (Regression) has finished executing, we have the beta values for every feature in the training dataset. Is there a way to determine the predicted value for a given instance? I am trying to evaluate the model fit via mean squared error, as done in the original paper (High-Dimensional Feature Selection by Feature-Wise Kernelized Lasso, Section 4.3.2).
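One possible way to get a mean squared error, sketched under assumptions: treat the feature selector as a preprocessing step and fit a separate regressor on the selected columns only. The example below uses a small hand-written Gaussian kernel ridge regressor on synthetic data. The `selected` list is a hypothetical placeholder for whatever indices the selector returned, and the kernel width and regularization strength are arbitrary choices. This is not a prediction method of pyHSICLasso itself.

```python
import numpy as np

def gaussian_gram(A, B, sigma=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

# Synthetic data: only features 0 and 1 influence the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + 0.1 * rng.normal(size=300)

selected = [0, 1]  # hypothetical: indices returned by a feature selector
X_train, X_test = X[:200, selected], X[200:, selected]
y_train, y_test = y[:200], y[200:]

# Kernel ridge regression: solve (K + lam * I) alpha = y on the training split.
lam = 0.1  # assumed regularization strength
K = gaussian_gram(X_train, X_train)
alpha = np.linalg.solve(K + lam * np.eye(len(K)), y_train)

# Predict held-out instances and compute the mean squared error.
y_pred = gaussian_gram(X_test, X_train) @ alpha
mse = float(np.mean((y_test - y_pred) ** 2))
```

Any other regressor could take the place of the kernel ridge step; the point is only that the evaluation happens on the reduced feature matrix.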

Multi-variate output support

Currently, HSIC Lasso can only handle uni-variate output. It should therefore be extended to support multi-variate output.

Clarification on the difference between an input vs. output kernel

y_kernel We employ the Gaussian kernel for inputs. For output kernels,

Hi, I'm wondering if some clarification could be provided on this difference.

In addition, is it necessary that the y_kernel and x_kernel are the same? My intuition is that they should be, but from what I can see in the code, that is not enforced. What is the rationale for allowing y and X to be projected into different spaces?
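As general background on HSIC (not a description of this library's internals), the statistic is computed from two Gram matrices, one per variable, and nothing in its definition requires them to come from the same kernel. The sketch below uses the standard biased estimator tr(KHLH)/(n-1)^2 with a Gaussian kernel on a continuous input and a plain indicator (delta) kernel on class labels. The function names and the toy data are assumptions made for illustration.

```python
import numpy as np

def gaussian_kernel(x, sigma=1.0):
    """Gram matrix of a Gaussian kernel on a 1-D continuous variable."""
    d = x[:, None] - x[None, :]
    return np.exp(-d**2 / (2.0 * sigma**2))

def delta_kernel(labels):
    """Gram matrix that is 1 when two labels match and 0 otherwise."""
    return (labels[:, None] == labels[None, :]).astype(float)

def hsic(K, L):
    """Biased empirical HSIC estimate from two n x n Gram matrices."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return float(np.trace(K @ H @ L @ H)) / (n - 1) ** 2

rng = np.random.default_rng(0)
x = rng.normal(size=200)
labels = (x > 0).astype(int)        # labels that depend on x
shuffled = rng.permutation(labels)  # same labels, dependence broken

dependent = hsic(gaussian_kernel(x), delta_kernel(labels))
independent = hsic(gaussian_kernel(x), delta_kernel(shuffled))
```

Here `dependent` comes out larger than `independent`, even though the two variables are embedded with different kernels.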

Number of selected features

Hello, I just tried this tool on a metabolomics dataset I have. Interestingly, HSIC Lasso selects just 76 metabolites out of the 2035 available. The R-squared score when I use these selected metabolites is just 0.18, compared to about 0.60 for Lasso on the original 2035 metabolites. My assumption is that the number of selected features is too small. I used SVR (kernel='rbf') from sklearn after feature selection with HSIC.
Is there a way to increase the number of features HSIC Lasso selects?

As a predictor?

Hey, this is awesome and I'm trying to use it in my thesis, but may I ask how to use it as a classifier? I have looked through the whole code, but how do I fit it to different subsets and get an overall precision score?

input

What does Y represent when the input is a numpy array? How do I use it? I'm a little confused.

Modeling combinatorial effects of features?

Hi,

I've been extending this HSIC-LASSO implementation to use specific types of distance-based kernels for microbiome data. I'd like to verify that my understanding of the implementation and purpose of the "block" HSIC implementation is correct.

First off, my understanding is that the "block" part of the HSIC-LASSO is an optimization to speed up kernel computation time, correct?

Second, in this code here, I have noticed that the "block" HSIC LASSO kernel computation constructs essentially "mini" (subsets of) kernels based on subsets of samples for single features (over the range of d features). If I am reading this correctly, then this means that the kernel computation is constructed for a single dimension only, which misses modeling the combinatorial effects of multiple features. Of course, this function is ideal when combinatorial effects are present in the data. Perhaps I am missing something or not looking at the full picture. Could someone please elaborate on this? Thank you!
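To make the structure described in this question concrete, here is a simplified sketch of feature-wise block kernels as the post describes them: for each single feature, a centered Gaussian Gram matrix is built on each block of B samples, and the flattened blocks are concatenated. The function names are hypothetical and this is not the library's code; details such as feature standardization, the kernel width, and the M permutations of the sample order are left out.

```python
import numpy as np

def centered_gaussian_gram(x, sigma=1.0):
    """Centered Gaussian Gram matrix for one feature on one block of samples."""
    n = x.shape[0]
    K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2.0 * sigma**2))
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return H @ K @ H

def block_feature_kernels(X, B):
    """Row k holds the flattened per-block Gram matrices of feature k alone."""
    n, d = X.shape
    if n % B != 0:
        raise ValueError("B must divide the number of samples")
    blocks = [np.arange(s, s + B) for s in range(0, n, B)]
    out = np.empty((d, len(blocks) * B * B))
    for k in range(d):
        parts = [centered_gaussian_gram(X[idx, k]).ravel() for idx in blocks]
        out[k] = np.concatenate(parts)
    return out

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
full = block_feature_kernels(X, B=20)    # one block of 20 samples per feature
blocked = block_feature_kernels(X, B=5)  # four blocks of 5 samples per feature
```

With n samples, each feature's row shrinks from n * n entries to n * B entries, which is where the memory and time savings of the block variant would come from in this sketch. Each row is still computed from one feature at a time, which is the property the question asks about.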
