
mlr3learners's Introduction

mlr3learners

Package website: release | dev


This package provides essential learners for mlr3, maintained by the mlr-org team. Additional learners can be found in the mlr3extralearners package on GitHub; please request additional learners over there.

👉 Table of all learners

Installation

# CRAN version:
install.packages("mlr3learners")

# Development version:
remotes::install_github("mlr-org/mlr3learners")

If you also want to install all packages of the connected learners, set dependencies = TRUE:

# CRAN version:
install.packages("mlr3learners", dependencies = TRUE)

# Development version:
remotes::install_github("mlr-org/mlr3learners", dependencies = TRUE)

Classification Learners

ID Learner Package
classif.cv_glmnet Penalized Logistic Regression glmnet
classif.glmnet Penalized Logistic Regression glmnet
classif.kknn k-Nearest Neighbors kknn
classif.lda LDA MASS
classif.log_reg Logistic Regression stats
classif.multinom Multinomial log-linear model nnet
classif.naive_bayes Naive Bayes e1071
classif.nnet Single Layer Neural Network nnet
classif.qda QDA MASS
classif.ranger Random Forest ranger
classif.svm SVM e1071
classif.xgboost Gradient Boosting xgboost

Regression Learners

ID Learner Package
regr.cv_glmnet Penalized Linear Regression glmnet
regr.glmnet Penalized Linear Regression glmnet
regr.kknn k-Nearest Neighbors kknn
regr.km Kriging DiceKriging
regr.lm Linear Regression stats
regr.nnet Single Layer Neural Network nnet
regr.ranger Random Forest ranger
regr.svm SVM e1071
regr.xgboost Gradient Boosting xgboost
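
All of these learners are constructed via mlr3's lrn() shortcut and follow the usual train/predict workflow; a minimal sketch:

library(mlr3)
library(mlr3learners)

# construct a learner by its ID, train it, and predict on a built-in task
learner = lrn("classif.ranger", num.trees = 100)
learner$train(tsk("sonar"))
learner$predict(tsk("sonar"))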


mlr3learners's Issues

Deal with option "contrasts"

Some models rely on the global option "contrasts" (lm, glm, maybe more). This renders the fitting process irreproducible.

We should set the option to the default (c(ordered = "contr.poly", unordered = "contr.treatment")) and document this properly.
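
A minimal sketch of what that could look like around a fit, using R's documented defaults for the option:

# pin "contrasts" to R's documented defaults so the fit is reproducible
old = options(contrasts = c(unordered = "contr.treatment", ordered = "contr.poly"))
fit = lm(mpg ~ cyl + factor(gear), data = mtcars)
options(old)  # restore whatever the user had set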

Essential Learners

Here is a list of essential learners and their respective implementations, for discussion:

  • Featureless. In mlr3.
  • Classification and regression trees: rpart (in mlr3). I'd like to keep this in mlr3 as rpart is shipped with R and I need it to run some basic tests and examples.
  • Linear / logistic regression: lm(), glm()
  • Penalized regression: glmnet
  • kNN: kknn.
  • Naive Bayes: e1071.
  • SVM: e1071
  • Random Forest: ranger.
  • Boosting: xgboost.
  • Kriging: DiceKriging.
  • Neural Network: ?

Please share your thoughts on the implementations and what is missing from this list.

@berndbischl @jakob-r @ja-thomas @larskotthoff @pat-s @Coorsaa @florianfendt @giuseppec @mb706 @zzawadz

Connect lightgbm learner

Hello,

I have experimented a little during the last weeks, and maybe this is of interest to someone:

I was able to wrap the lightgbm Python module in the R package lightgbm.py and use this "base implementation" as a dependency for the R package mlr3learners.lightgbm.

It works so far, and I have added vignettes for binary and multiclass classification examples as well as a regression task.

The base-implementation uses reticulate as an R interface to the python module.

The mlr3 extension currently does not pass the run_autotests test, so some debugging needs to be done. However, if this approach is of interest to someone, it could be a way of bringing the great LightGBM to the mlr3 framework.

How do you filter tests for a specific learner?

The general testing is done by test_classif_all; it doesn't make sense to implement the same tests for every learner again (I suppose test_classif_ranger.R will be deleted, correct?).

But then I can only test all learners at once, not single ones.
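
One way to do this (assuming the per-learner test files are kept) is testthat's file filter:

# run only tests whose file name matches the filter, e.g. the ranger tests
devtools::test(filter = "classif_ranger")

# or run a single test file directly
testthat::test_file("tests/testthat/test_classif_ranger.R")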

Design document for the tests

Is it defined anywhere how tests for the learners should look?

E.g.

  • General stuff is tested in test_[type]_all.R
  • Special bugs we discover for learners go in a seperate test file test_classif_[lrn].R

Learner fields are not documented

Learner fields like "properties" or "feature_types" are not documented in their respective help pages.
This is because they are not explicitly listed as fields in their class description.

These fields should be documented in the respective super classes.

LearnerRegrLm not working

Probably because something in mlr3 changed.

> LearnerRegrLm$new()$train(mlr_tasks$get("bh"))
Error: object of type 'closure' is not subsettable

Enter a frame number, or 0 to exit   

 1: LearnerRegrLm$new()$train(mlr_tasks$get("bh"))
 2: invoke(stats::lm, formula = task$formula, data = task$data(), .args = pars)
 3: eval.parent(expr, n = 1)
 4: eval(expr, p)
 5: eval(expr, p)
 6: stats::lm(formula = task$formula, data = task$data())
 7: eval(mf, parent.frame())
 8: eval(mf, parent.frame())
 9: stats::model.frame(formula = task$formula, data = task$data(), drop.unused.
10: model.frame.default(formula = task$formula, data = task$data(), drop.unused
11: as.formula(formula)
12: formula(object, env = baseenv())
13: formula.default(object, env = baseenv())
14: notnull(x$formula)

How can we support continued training?

This probably rather belongs in mlr3, but there needs to be custom code for learners that support that as well.

For a lot of algorithms (e.g. boosting, gradient descent based algos), training can be continued and models can be updated.

This was often requested by mlr users (especially in combination with early stopping). Is there a nice way to support this generically for algorithms that allow it?
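
For illustration, xgboost already supports this at the package level: xgb.train() accepts a previously fitted booster via xgb_model and continues training from it. A sketch:

library(xgboost)
data(agaricus.train, package = "xgboost")
dtrain = xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)

# train 10 rounds, then continue the same booster for 10 more rounds
bst  = xgb.train(list(objective = "binary:logistic"), dtrain, nrounds = 10)
bst2 = xgb.train(list(objective = "binary:logistic"), dtrain, nrounds = 10,
                 xgb_model = bst)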

glmnet test fails sporadically

When running rtest, I noticed this error, which was not present the next time I ran rtest.
After ~10 more runs, this happened again. So this is a sporadic error.

test_classif_glmnet.R:6: failure: autotest
result isn't true.
[train()] learner 'classif.glmnet:response' on task 'feat_all_binary' failed: train log has errors: Ersetzung hat Länge 0 (German: "replacement has length 0")

classif.kknn predicts probabilities even when predict_type is response

The learner should only predict responses if predict_type is "response", and only probabilities if predict_type is "prob".

> lrn("classif.kknn", predict_type = "response")$train(tsk("iris"))$predict(tsk("iris"))
<PredictionClassif> for 150 observations:
    row_id     truth  response prob.setosa prob.versicolor prob.virginica
         1    setosa    setosa           1       0.0000000      0.0000000
[...]

Avoid formula if possible

The formula interface is known to cause problems for large data (it goes through R's model-frame machinery). We should try to avoid it where possible.

ranger has an alternative interface (see the sketch below); not sure about the others.
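
For ranger, the alternative looks like this (a sketch):

library(ranger)

# formula interface: goes through model.matrix machinery
rf1 = ranger(Species ~ ., data = iris)

# alternative interface: name the target column instead of passing a formula
rf2 = ranger(dependent.variable.name = "Species", data = iris)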

s param in glmnet

glmnet fits a whole regularization path and predicts a matrix with one column per value of s. Hence one needs to fix a single value of s to get a prediction vector. In mlr, and currently here, we set s = 0.01 and discard all ~100 values computed along the path. How should we deal with this? Switch to cv.glmnet?
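
To make the trade-off concrete, a sketch of both options (standard glmnet API):

library(glmnet)
x = as.matrix(mtcars[, -1])
y = mtcars$mpg

# glmnet fits a whole path: predict() returns one column per lambda value
fit = glmnet(x, y)
dim(predict(fit, x))

# fixing s yields a single prediction vector -- currently we hard-code s = 0.01
predict(fit, x, s = 0.01)[1:3]

# cv.glmnet instead selects s by cross-validation
cvfit = cv.glmnet(x, y)
predict(cvfit, x, s = "lambda.min")[1:3]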

Expose learner plot functions

Many learners bring their own plot functions. As we now maintain only a few learners, we could expose selected plots.

Examples: rpart has a built-in tree plot, xgboost ships several plot helpers, etc.
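
For illustration, the kind of plots meant here, calling the underlying packages directly (not a proposed mlr3 API):

library(rpart)
fit = rpart(Species ~ ., data = iris)
plot(fit); text(fit)  # rpart's built-in tree plot

library(xgboost)
data(agaricus.train, package = "xgboost")
dtrain = xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
bst = xgb.train(list(objective = "binary:logistic"), dtrain, nrounds = 5)
xgb.plot.importance(xgb.importance(model = bst))  # feature importance plot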

xgboost learner hyper parameter (lambda and alpha) clarification

Hi,

Another week, another issue from me. Hopefully you guys are OK with that?

To the issue.

I ran some xgboost tuning where I tried to tune the lambda and alpha parameters for the gbtree booster.

library(mlr3) 
library(mlr3learners)
library(mlr3tuning)
library(paradox)

lrn_xgboost <- lrn("classif.xgboost")

lrn_xgboost$predict_type <- "prob"

cv5 <- rsmp("cv", folds = 5)
tsk <- mlr_tasks$get("sonar")

xgb_ps <- ParamSet$new(list(
  ParamFct$new("booster", levels = c("gbtree")),
  ParamDbl$new("eta", lower = 0.003, upper = 0.3),
  ParamDbl$new("gamma", lower = 0, upper = 10),
  ParamInt$new("max_depth", lower = 3, upper = 20),
  ParamDbl$new("colsample_bytree", lower = 0.5, upper = 1),
  ParamDbl$new("colsample_bylevel", lower = 0.5, upper = 1),
  ParamDbl$new("lambda", lower = 0, upper = 10),
  ParamDbl$new("alpha", lower = 0, upper = 10),
  ParamDbl$new("subsample", lower = 0.5, upper = 1),
  ParamInt$new("nrounds", lower = 20, upper = 100)
))

instance <- TuningInstance$new(
  task = tsk,
  learner = lrn_xgboost,
  resampling = cv5,
  measures = msr("classif.auc"),
  param_set = xgb_ps,
  terminator = term("evals", n_evals = 20)
)
tuner <- TunerRandomSearch$new()
tuner$tune(instance)

This results in the error:

INFO  [11:56:14.498] Starting to tune 10 parameters with '<TunerRandomSearch>' and '<TerminatorEvals>' 
INFO  [11:56:14.499] Terminator settings: n_evals=20 
INFO  [11:56:14.534] Evaluating 1 configurations 
INFO  [11:56:14.536]  booster        eta     gamma max_depth colsample_bytree colsample_bylevel   lambda    alpha subsample nrounds 
INFO  [11:56:14.536]   gbtree 0.03936435 0.2437546         5        0.6446646         0.7196681 6.194857 9.013528  0.961457      87 
Error in (function (xs)  : 
  Assertion on 'xs' failed: Condition for 'lambda' not ok: booster equal gblinear; instead: booster=gbtree.

Clearly, the lambda and alpha parameters are reserved for the linear booster by mlr3. This is probably because the xgboost function help lists them as parameters of the linear booster:

2.2. Parameters for Linear Booster
• lambda: L2 regularization term on weights. Default: 0
• lambda_bias: L2 regularization term on bias. Default: 0
• alpha: L1 regularization term on weights (there is no L1 reg on bias because it is not important). Default: 0

However, this source: https://xgboost.readthedocs.io/en/latest/parameter.html#parameters-for-tree-booster lists them as parameters for the tree and dart boosters as well.

When a test is run in R to see whether these parameters have an effect on xgboost "gbtree" models:

library(xgboost)
data(agaricus.train, package='xgboost')
data(agaricus.test, package='xgboost')

dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
dtest <- xgb.DMatrix(agaricus.test$data, label = agaricus.test$label)
watchlist <- list(train = dtrain, eval = dtest)
param <- list(booster = "gbtree",
              max_depth = 2,
              eta = 1,
              verbose = 0,
              objective = "binary:logistic",
              eval_metric = "auc")
set.seed(1)
bst <- xgb.train(param,
                 dtrain,
                 nrounds = 2,
                 watchlist)
[1]	train-auc:0.958228	eval-auc:0.960373 
[2]	train-auc:0.981413	eval-auc:0.979930 
param2 <- list(booster = "gbtree",
              max_depth = 2,
              eta = 1,
              verbose = 0,
              objective = "binary:logistic",
              eval_metric = "auc",
              alpha = 100)

set.seed(1)
bst2 <- xgb.train(param2,
                 dtrain,
                 nrounds = 2,
                 watchlist)
[1]	train-auc:0.979337	eval-auc:0.980196 
[2]	train-auc:0.996274	eval-auc:0.995977 
param3 <- list(booster = "gbtree",
               max_depth = 2,
               eta = 1,
               verbose = 0,
               objective = "binary:logistic",
               eval_metric = "auc",
               lambda = 1000)

set.seed(1)
bst3 <- xgb.train(param3,
                  dtrain,
                  nrounds = 2,
                  watchlist)
[1]	train-auc:0.957067	eval-auc:0.958731 
[2]	train-auc:0.986000	eval-auc:0.986332 

It can be observed that they do have an effect on the trained models.

Could you change the parameter dependencies for the xgboost learner so that the lambda and alpha parameters can be tuned regardless of the booster?

For instance autoxgboost has no such constraints.

Kind regards,

Milan

xgboost learner inverts labels

label = match(as.character(as.matrix(task$data(cols = task$target_names))), lvls) - 1

The match line for extracting labels from the task inverts the labels, which messes with measures on binary tasks. This causes issues when supplying a watchlist to an xgboost task for early stopping.

# positive class comes first
lvls = c('1', '0')
labels = c('0', '1', '0')
new_labels = match(labels, lvls) - 1
new_labels  # 1 0 1 -- inverted relative to labels

Suggested:

label = length(lvls) - match(as.character(as.matrix(task$data(cols = task$target_names))), lvls)
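
A quick check of the suggested line on the toy example above:

lvls = c('1', '0')                  # positive class first
labels = c('0', '1', '0')
length(lvls) - match(labels, lvls)  # 0 1 0 -- matches the original labels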

glmnet with a single feature

glmnet does not allow training if the dataset has only a single feature.

See here in the glmnet code

    np = dim(x)
    if (is.null(np) | (np[2] <= 1)) 
        stop("x should be a matrix with 2 or more columns")

How should we handle this?
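
One conceivable workaround (a sketch, not an endorsed fix) is to duplicate the single column so that glmnet's ncol(x) >= 2 check passes; for the default lasso penalty, the fitted values should match those of a single-column model, with the coefficient split across the two identical copies:

x = matrix(mtcars$wt, ncol = 1)
x2 = cbind(x, x)  # pad to two (identical) columns
fit = glmnet::glmnet(x2, mtcars$mpg)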

automatic tests for all learners

One (!) test for all learners, i.e. new learners should be tested automatically:

  • learner properties
  • does prediction work?
  • tests should be fast
  • test predictive performance with simple examples, e.g. better than mmce of 0.3

helper_learners_all could be a starting point

Connect learner multinom from package nnet

mlr3learners currently does not provide simple tuning-free softmax regression for multiclass tasks.
I think it should, as a versatile and interpretable baseline against more complicated tunable algorithms (glmnet, trees, svm).
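
The proposed learner would wrap nnet::multinom(); for reference:

library(nnet)

# softmax (multinomial logistic) regression on a multiclass task
fit = multinom(Species ~ ., data = iris, trace = FALSE)
predict(fit, iris[1:3, ], type = "probs")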

give glmnet the $importance slot

Because then it could be used in combination with FilterEmbedded in mlr3featsel for feature selection in order of L1 inclusion. Importance could be the (approximate) lambda value at which a feature is first included, which can easily be computed from the fitted model.
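
A sketch of that computation on a fitted glmnet model (the exact importance definition is up for discussion):

library(glmnet)
x = as.matrix(mtcars[, -1])
y = mtcars$mpg
fit = glmnet(x, y)

# for each feature: the largest lambda at which its coefficient is nonzero,
# i.e. the point on the path where the feature first enters the model
first_nonzero = apply(as.matrix(fit$beta) != 0, 1, function(nz)
  if (any(nz)) fit$lambda[which(nz)[1]] else NA_real_)
sort(first_nonzero, decreasing = TRUE)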

gamboost learner does not properly specify that it can only do binary classif

ll = lrn("classif.gamboost")
tt = tsk("iris")
rr = resample(tt, ll, rsmp("cv", folds = 2))

--> results in

INFO [21:26:36.247] Applying learner 'classif.gamboost' on task 'iris' (iter 1/2)
Error in family@check_y(y) :
response is not a factor at two levels but 'family = Binomial()'

Shouldn't there be an autotest that makes sure that things like this do not happen?

Properly assert learner properties

similar to #56

The following should error as "log_reg" does not support "multiclass" tasks by definition.
There might be more learners with missing assertions on their properties.

library(mlr3verse, quietly = TRUE)

lrn = lrn("classif.log_reg")
lrn$properties
#> [1] "twoclass" "weights"
tsk = tsk("iris")
tsk$properties
#> [1] "multiclass"

# works - but should error
lrn("classif.log_reg")$train(tsk("iris"))
#> Warning: glm.fit: algorithm did not converge
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

Created on 2020-03-13 by the reprex package (v0.3.0)
