
R/sl3: modern Super Learning with pipelines

Badges: Travis-CI and AppVeyor build status, test coverage, project status (active: stable, usable, and actively developed), GPL-3 license, DOI.

A modern implementation of the Super Learner ensemble learning algorithm

Authors: Jeremy Coyle, Nima Hejazi, Ivana Malenica, Oleg Sofrygin


What’s sl3?

sl3 is a modern implementation of the Super Learner algorithm of van der Laan, Polley, and Hubbard (2007). The Super Learner algorithm performs ensemble learning in one of two fashions:

  1. The discrete Super Learner can be used to select the best prediction algorithm from among a supplied library of machine learning algorithms ("learners" in the sl3 nomenclature) – that is, the discrete Super Learner is the single learning algorithm that minimizes the cross-validated risk with respect to an appropriate loss function.
  2. The ensemble Super Learner can be used to assign weights to a set of specified learning algorithms (from a user-supplied library of such algorithms) so as to create a combination of these learners that minimizes the cross-validated risk with respect to an appropriate loss function. This notion of weighted combinations has also been referred to as stacked regression (Breiman 1996) and stacked generalization (Wolpert 1992).
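The selection step in (1) can be sketched in a few lines of base R. This is an illustrative toy, not sl3 code: the data, the two "learners," and the fold scheme are all made up, but it shows concretely what minimizing the cross-validated risk means for the discrete Super Learner.

```r
# Toy discrete Super Learner: pick the learner with lowest CV risk.
set.seed(1)
n <- 200
dat <- data.frame(x = rnorm(n))
dat$y <- 2 * dat$x + rnorm(n)

folds <- sample(rep(1:5, length.out = n))  # 5-fold CV assignments

# Cross-validated MSE (risk under squared-error loss) for one learner,
# specified by a fitting function and a prediction function
cv_risk <- function(fit_fun, pred_fun) {
  sq_err <- numeric(n)
  for (v in 1:5) {
    train <- dat[folds != v, ]
    valid <- dat[folds == v, ]
    fit <- fit_fun(train)
    sq_err[folds == v] <- (valid$y - pred_fun(fit, valid))^2
  }
  mean(sq_err)
}

# Two toy learners: the overall mean and a simple linear model
risks <- c(
  mean = cv_risk(function(d) mean(d$y), function(f, d) rep(f, nrow(d))),
  glm  = cv_risk(function(d) lm(y ~ x, data = d), function(f, d) predict(f, d))
)
names(which.min(risks))  # the discrete Super Learner's selection
```

Since the simulated outcome is truly linear in `x`, the linear model attains the lower cross-validated risk and is selected.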

Installation

Install the most recent version from the master branch on GitHub via remotes:

remotes::install_github("tlverse/sl3")

Past stable releases may be located via the releases page on GitHub and may be installed by including the appropriate major version tag. For example,

remotes::install_github("tlverse/sl3@v1.3.7")

To contribute, check out the devel branch and consider submitting a pull request.


Issues

If you encounter any bugs or have any specific feature requests, please file an issue.


Examples

sl3 makes the process of applying screening algorithms, learning algorithms, combining both types of algorithms into a stacked regression model, and cross-validating this whole process essentially trivial. The best way to understand this is to see the sl3 package in action:

set.seed(49753)
library(tidyverse)
library(data.table)
library(SuperLearner)
library(origami)
library(sl3)

# load example data set
data(cpp)
cpp <- cpp %>%
  dplyr::filter(!is.na(haz)) %>%
  mutate_all(~ replace(., is.na(.), 0))

# use covariates of interest and the outcome to build a task object
covars <- c("apgar1", "apgar5", "parity", "gagebrth", "mage", "meducyrs",
            "sexn")
task <- sl3_Task$new(cpp, covariates = covars, outcome = "haz")

# set up screeners and learners via built-in functions and pipelines
slscreener <- Lrnr_pkg_SuperLearner_screener$new("screen.glmnet")
glm_learner <- Lrnr_glm$new()
screen_and_glm <- Pipeline$new(slscreener, glm_learner)
SL.glmnet_learner <- Lrnr_pkg_SuperLearner$new(SL_wrapper = "SL.glmnet")

# stack learners into a model (including screeners and pipelines)
learner_stack <- Stack$new(SL.glmnet_learner, glm_learner, screen_and_glm)
stack_fit <- learner_stack$train(task)
preds <- stack_fit$predict()
head(preds)
#>    Lrnr_pkg_SuperLearner_SL.glmnet Lrnr_glm_TRUE
#> 1:                      0.35618966    0.36298498
#> 2:                      0.35618966    0.36298498
#> 3:                      0.24964615    0.25993072
#> 4:                      0.24964615    0.25993072
#> 5:                      0.24964615    0.25993072
#> 6:                      0.03776486    0.05680264
#>    Pipeline(Lrnr_pkg_SuperLearner_screener_screen.glmnet->Lrnr_glm_TRUE)
#> 1:                                                            0.36228209
#> 2:                                                            0.36228209
#> 3:                                                            0.25870995
#> 4:                                                            0.25870995
#> 5:                                                            0.25870995
#> 6:                                                            0.05600958
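Stacked predictions like those above are exactly the inputs a metalearner combines in the ensemble Super Learner. The weighting step can be sketched conceptually in base R; this is a toy with made-up prediction columns standing in for cross-validated learner predictions (in sl3 itself, this role is played by a metalearner such as Lrnr_nnls), using `optim` with a non-negativity bound as a stand-in for non-negative least squares.

```r
# Toy metalearner: non-negative weights minimizing squared-error risk
set.seed(2)
n <- 100
y <- rnorm(n)
# Columns mimic cross-validated predictions from three learners of
# varying quality (the third is pure noise)
Z <- cbind(
  pred_a = y + rnorm(n, sd = 0.5),
  pred_b = y + rnorm(n, sd = 1.5),
  pred_c = rnorm(n)
)

risk <- function(w) mean((y - Z %*% w)^2)
fit <- optim(rep(1 / 3, 3), risk, method = "L-BFGS-B", lower = 0)
weights <- fit$par / sum(fit$par)  # normalize to a convex combination
round(weights, 2)  # most weight goes to the most accurate learner
```

The resulting convex combination up-weights the low-noise predictor, which is the sense in which the ensemble Super Learner can outperform any single learner in its library.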

Learner Properties

Properties supported by sl3 learners are presented in the following table:

| Learner | binomial | categorical | continuous | cv | density | ids | multivariate_outcome | offset | preprocessing | sampling | timeseries | weights | wrapper |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Lrnr_arima | x | x | √ | x | x | x | x | x | x | x | √ | x | x |
| Lrnr_bartMachine | √ | √ | √ | x | x | x | x | x | x | x | x | √ | x |
| Lrnr_bilstm | x | x | √ | x | x | x | x | x | x | x | √ | x | x |
| Lrnr_bound | √ | √ | √ | x | x | x | x | x | x | x | x | √ | x |
| Lrnr_caret | √ | √ | √ | x | x | x | x | x | x | x | x | x | x |
| Lrnr_condensier | x | x | √ | x | √ | x | x | x | x | x | x | √ | x |
| Lrnr_cv | x | x | x | √ | x | x | x | x | x | x | x | x | √ |
| Lrnr_cv_selector | √ | √ | √ | x | x | x | x | x | x | x | x | √ | x |
| Lrnr_dbarts | √ | √ | √ | x | x | x | x | x | x | x | x | √ | x |
| Lrnr_define_interactions | x | x | x | x | x | x | x | x | √ | x | x | x | x |
| Lrnr_density_discretize | x | x | x | x | √ | x | x | x | x | x | x | x | x |
| Lrnr_density_hse | x | x | x | x | √ | x | x | x | x | x | x | x | x |
| Lrnr_density_semiparametric | x | x | x | x | √ | x | x | x | x | √ | x | x | x |
| Lrnr_earth | √ | x | √ | x | x | x | x | x | x | x | x | x | x |
| Lrnr_expSmooth | x | x | √ | x | x | x | x | x | x | x | √ | x | x |
| Lrnr_gam | √ | √ | √ | x | x | x | x | x | x | x | x | x | x |
| Lrnr_gbm | √ | x | √ | x | x | x | x | x | x | x | x | x | x |
| Lrnr_glm | √ | x | √ | x | x | x | x | √ | x | x | x | √ | x |
| Lrnr_glm_fast | √ | x | √ | x | x | x | x | √ | x | x | x | √ | x |
| Lrnr_glmnet | √ | √ | √ | x | x | x | x | x | x | x | x | √ | x |
| Lrnr_grf | √ | √ | √ | x | x | x | x | x | x | x | x | √ | x |
| Lrnr_h2o_glm | √ | √ | √ | x | x | x | x | √ | x | x | x | √ | x |
| Lrnr_h2o_grid | √ | √ | √ | x | x | x | x | √ | x | x | x | √ | x |
| Lrnr_hal9001 | √ | x | √ | x | x | x | x | x | x | x | x | x | x |
| Lrnr_haldensify | x | x | x | x | √ | x | x | x | x | x | x | x | x |
| Lrnr_HarmonicReg | x | x | √ | x | x | x | x | x | x | x | √ | x | x |
| Lrnr_independent_binomial | x | √ | x | x | x | x | x | x | x | x | x | x | x |
| Lrnr_lstm | x | x | √ | x | x | x | x | x | x | x | √ | x | x |
| Lrnr_mean | √ | √ | √ | x | x | x | x | √ | x | x | x | √ | x |
| Lrnr_multivariate | x | √ | x | x | x | x | x | x | x | x | x | x | x |
| Lrnr_nnls | x | x | √ | x | x | x | x | x | x | x | x | x | x |
| Lrnr_optim | √ | √ | √ | x | x | x | x | √ | x | x | x | √ | x |
| Lrnr_pca | x | x | x | x | x | x | x | x | √ | x | x | x | x |
| Lrnr_pkg_SuperLearner | √ | x | √ | x | x | √ | x | x | x | x | x | √ | x |
| Lrnr_pkg_SuperLearner_method | √ | x | √ | x | x | x | x | x | x | x | x | √ | x |
| Lrnr_pkg_SuperLearner_screener | √ | x | √ | x | x | √ | x | x | x | x | x | √ | x |
| Lrnr_polspline | √ | √ | √ | x | x | x | x | x | x | x | x | x | x |
| Lrnr_pooled_hazards | x | √ | x | x | x | x | x | x | x | x | x | x | x |
| Lrnr_randomForest | √ | √ | √ | x | x | x | x | x | x | x | x | x | x |
| Lrnr_ranger | √ | √ | √ | x | x | x | x | x | x | x | x | x | x |
| Lrnr_revere_task | x | x | x | √ | x | x | x | x | x | x | x | x | √ |
| Lrnr_rfcde | x | x | x | x | √ | x | x | x | x | x | x | x | x |
| Lrnr_rpart | √ | √ | √ | x | x | x | x | x | x | x | x | √ | x |
| Lrnr_rugarch | x | x | √ | x | x | x | x | x | x | x | √ | x | x |
| Lrnr_screener_corP | √ | √ | √ | x | x | x | x | x | x | x | x | x | x |
| Lrnr_screener_corRank | √ | √ | √ | x | x | x | x | x | x | x | x | x | x |
| Lrnr_screener_randomForest | √ | √ | √ | x | x | x | x | x | x | x | x | x | x |
| Lrnr_sl | x | x | x | √ | x | x | x | x | x | x | x | x | √ |
| Lrnr_solnp | √ | √ | √ | x | x | x | x | √ | x | x | x | √ | x |
| Lrnr_solnp_density | x | x | x | x | √ | x | x | x | x | x | x | x | x |
| Lrnr_stratified | √ | x | √ | x | x | x | x | x | x | x | x | x | x |
| Lrnr_subset_covariates | x | x | x | x | x | x | x | x | x | x | x | x | x |
| Lrnr_svm | √ | √ | √ | x | x | x | x | x | x | x | x | x | x |
| Lrnr_tsDyn | x | x | √ | x | x | x | √ | x | x | x | √ | x | x |
| Lrnr_xgboost | √ | √ | √ | x | x | x | x | √ | x | x | x | √ | x |

Contributions

Contributions are very welcome. Interested contributors should consult our contribution guidelines prior to submitting a pull request.


Citation

After using the sl3 R package, please cite the following:

 @manual{coyle2020sl3,
      author = {Coyle, Jeremy R and Hejazi, Nima S and Malenica, Ivana and
        Sofrygin, Oleg},
      title = {{sl3}: Modern Pipelines for Machine Learning and {Super
        Learning}},
      year = {2020},
      howpublished = {\url{https://github.com/tlverse/sl3}},
      note = {{R} package version 1.3.7},
      url = {https://doi.org/10.5281/zenodo.1342293},
      doi = {10.5281/zenodo.1342293}
    }

License

© 2017-2020 Jeremy R. Coyle, Nima S. Hejazi, Ivana Malenica, Oleg Sofrygin

The contents of this repository are distributed under the GPL-3 license. See file LICENSE for details.


References

Breiman, Leo. 1996. "Stacked Regressions." Machine Learning 24 (1): 49–64.

van der Laan, Mark J., Eric C. Polley, and Alan E. Hubbard. 2007. "Super Learner." Statistical Applications in Genetics and Molecular Biology 6 (1).

Wolpert, David H. 1992. "Stacked Generalization." Neural Networks 5 (2): 241–59.
