
Comments (4)

nbokulich commented on August 24, 2024

Using such a formula is not supported by the regression methods in scikit-learn. Am I missing something, @mortonjt?

Instead, all feature data are used to build the model. Only a single metadata category can be predicted at once. Multilabel prediction is not supported (in a useful way) by scikit-learn.

from q2-sample-classifier.

mortonjt commented on August 24, 2024

That's pretty easy to do -- we can enable it by using patsy. Formula-based model specification has not traditionally been supported in scikit-learn, but it is supported in statsmodels and is standard practice when analyzing datasets in R.

The multilabel prediction is actually supported in scikit-learn (I know! I was surprised too).
See the input types accepted by lasso regression and random forest regression.

This could be a huge improvement in usability over what scikit-learn currently offers, and could also seriously open the door to building complex models.


nbokulich commented on August 24, 2024

Sorry, I meant multioutput prediction. scikit-learn does support basic multioutput prediction, but this merely trains multiple independent regressors and does not model the relationships among targets (i.e., metadata categories). There is a little more discussion of this in #15.
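A minimal sketch of the "independent regressors" point, assuming synthetic data in place of a real feature table and metadata (the array shapes and the `alpha` value are arbitrary illustrations). It fits a lasso on a two-column target directly, then uses scikit-learn's `MultiOutputRegressor` wrapper, which is documented as fitting one regressor per target, and compares the wrapper's coefficients with those from fitting each target on its own.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))  # stand-in feature table: 50 samples x 8 features
Y = rng.normal(size=(50, 2))  # two stand-in numeric metadata targets

# A 2-D target is accepted directly; one coefficient row is returned per target.
native = Lasso(alpha=0.05).fit(X, Y)

# MultiOutputRegressor makes the strategy explicit: one regressor per target.
wrapped = MultiOutputRegressor(Lasso(alpha=0.05)).fit(X, Y)

# Fit each target column separately and compare with the wrapper's estimators.
per_target = [Lasso(alpha=0.05).fit(X, Y[:, k]).coef_ for k in range(Y.shape[1])]
matches = [
    np.allclose(est.coef_, coef)
    for est, coef in zip(wrapped.estimators_, per_target)
]
print(native.coef_.shape, matches)
```

Nothing in either fit ties one target to another, which is why this kind of multioutput support says nothing about relationships among metadata categories.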

I like the suggestion to use patsy for building regression formulae, but I don't think this is feasible here. The intended features (independent variables) come from a feature table, which would most likely contain a very large number of features, NOT metadata. Metadata categories are the targets (dependent variables). Building a formula with hundreds of features would be arduous. I am familiar with the use of such formulae in R and patsy, but only where metadata are the independent variables and features/observed data are the dependent variables. This is the reverse of what q2-sample-classifier is meant to do. @mortonjt, could you please clarify how you would imagine these formulae being used?


mortonjt commented on August 24, 2024

Sorry -- let me try to clarify.

I totally agree, you don't want to use regression formulas for the features (i.e. OTUs). That will get disgusting very quickly. But you can also use the regression formulas to model the interactions between the outputs.

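Here is an example. Say that we wanted to use lasso regression. We could use something like the following:

# Proposed interface, not an existing scikit-learn API:
# fit() would take a formula describing the metadata targets.
res = lasso()
res.fit('age * sex + disease', Y=metadata, X=otu_table)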

where age, sex and disease are metadata variables. The formula will allow you to create a new, modified output matrix that explicitly tests for the interaction effect due to age. It will also allow you to test for multiple categories simultaneously. In this particular case it will test how well the following can be predicted:

  1. age
  2. sex
  3. age * sex interaction
  4. disease
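One way the four targets listed above could be built with today's libraries is sketched below. This is an assumption-laden illustration, not the proposed q2-sample-classifier interface: the `otu_table` and `metadata` frames are synthetic, the column names and `alpha` value are made up, and since scikit-learn's `Lasso.fit` takes no formula, patsy's `dmatrix` expands the formula into an output matrix first.

```python
import numpy as np
import pandas as pd
from patsy import dmatrix
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n_samples, n_otus = 40, 12

# Hypothetical feature table (samples x OTUs) and sample metadata.
otu_table = pd.DataFrame(
    rng.poisson(5, size=(n_samples, n_otus)),
    columns=[f"OTU{i}" for i in range(n_otus)],
)
metadata = pd.DataFrame({
    "age": rng.uniform(20, 70, size=n_samples),
    "sex": np.tile(["F", "M"], n_samples // 2),
    "disease": np.repeat(["healthy", "sick"], n_samples // 2),
})

# Expand the formula into columns: age, sex, the age:sex interaction, disease.
# Categorical variables are dummy-coded by patsy (e.g. "sex[T.M]").
design = dmatrix("age * sex + disease", metadata, return_type="dataframe")

# The intercept column is constant, so it is dropped from the targets.
Y = design.drop(columns="Intercept")

# Fit one multi-target lasso: OTU counts as predictors, formula terms as targets.
model = Lasso(alpha=0.1).fit(otu_table, Y)
print(list(Y.columns))
print(model.coef_.shape)  # (number of targets, number of OTUs)
```

Note that the categorical terms end up as 0/1 dummy columns treated as regression targets, so a real implementation would need to decide how to handle them (and how to scale `age` relative to them).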

