
mlr3learners's Introduction

mlr3learners

Package website: release | dev


This package provides essential learners for mlr3, maintained by the mlr-org team. Additional learners can be found in the mlr3extralearners package on GitHub; please request additional learners over there.

👉 Table of all learners

Installation

# CRAN version:
install.packages("mlr3learners")

# Development version:
remotes::install_github("mlr-org/mlr3learners")

If you also want to install all packages of the connected learners, set dependencies = TRUE:

# CRAN version:
install.packages("mlr3learners", dependencies = TRUE)

# Development version:
remotes::install_github("mlr-org/mlr3learners", dependencies = TRUE)

Classification Learners

ID Learner Package
classif.cv_glmnet Penalized Logistic Regression glmnet
classif.glmnet Penalized Logistic Regression glmnet
classif.kknn k-Nearest Neighbors kknn
classif.lda LDA MASS
classif.log_reg Logistic Regression stats
classif.multinom Multinomial log-linear model nnet
classif.naive_bayes Naive Bayes e1071
classif.nnet Single Layer Neural Network nnet
classif.qda QDA MASS
classif.ranger Random Forest ranger
classif.svm SVM e1071
classif.xgboost Gradient Boosting xgboost

Regression Learners

ID Learner Package
regr.cv_glmnet Penalized Linear Regression glmnet
regr.glmnet Penalized Linear Regression glmnet
regr.kknn k-Nearest Neighbors kknn
regr.km Kriging DiceKriging
regr.lm Linear Regression stats
regr.nnet Single Layer Neural Network nnet
regr.ranger Random Forest ranger
regr.svm SVM e1071
regr.xgboost Gradient Boosting xgboost
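
All of these learners are constructed via mlr3's lrn() shortcut and follow the usual train/predict workflow; a minimal sketch:

library(mlr3)
library(mlr3learners)

# construct a learner by its ID, train it, and predict on a built-in task
learner = lrn("classif.ranger", num.trees = 100)
learner$train(tsk("sonar"))
learner$predict(tsk("sonar"))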


mlr3learners's Issues

Deal with option "contrasts"

Some models rely on the global option "contrasts" (lm, glm, maybe more). This renders the fitting process irreproducible.

We should set the option to the default (c(ordered = "contr.poly", unordered = "contr.treatment")) and document this properly.
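
A minimal sketch of what that could look like around a fit, using R's documented defaults for the option:

# pin "contrasts" to R's documented defaults so the fit is reproducible
old = options(contrasts = c(unordered = "contr.treatment", ordered = "contr.poly"))
fit = lm(mpg ~ cyl + factor(gear), data = mtcars)
options(old)  # restore whatever the user had set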

Essential Learners

Here is a list of essential learners and their respective implementations, for discussion:

  • Featureless. In mlr3.
  • Classification and regression trees: rpart (in mlr3). I'd like to keep this in mlr3 as rpart is shipped with R and I need it to run some basic tests and examples.
  • Linear / logistic regression: lm(), glm()
  • Penalized regression: glmnet
  • kNN: kknn.
  • Naive Bayes: e1071.
  • SVM: e1071
  • Random Forest: ranger.
  • Boosting: xgboost.
  • Kriging: DiceKriging.
  • Neural Network: ?

Please share your thoughts on the implementations and what is missing from this list.

@berndbischl @jakob-r @ja-thomas @larskotthoff @pat-s @Coorsaa @florianfendt @giuseppec @mb706 @zzawadz

Connect lightgbm learner

Hello,

I have experimented a little during the last weeks, and maybe this is of interest to someone:

I was able to wrap the lightgbm Python module in the R package lightgbm.py and use this "base implementation" as a dependency for the R package mlr3learners.lightgbm.

It works so far, and I have added vignettes for binary and multiclass classification examples as well as a regression task.

The base-implementation uses reticulate as an R interface to the python module.

The mlr3 extension currently does not pass the run_autotests test, so some debugging needs to be done. However, if this approach is of interest to someone, it could be a way of bringing the great LightGBM to the mlr3 framework.

How do you filter tests for a specific learner?

The general testing is done by test_classif_all; it doesn't make sense to implement the same tests for every learner again (I suppose test_classif_ranger.R will be deleted, correct?).

But then I can only test all learners at once, not single ones.
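
One way to do this (assuming the per-learner test files are kept) is testthat's file filter:

# run only tests whose file name matches the filter, e.g. the ranger tests
devtools::test(filter = "classif_ranger")

# or run a single test file directly
testthat::test_file("tests/testthat/test_classif_ranger.R")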

Design document for the tests

Is it defined anywhere how tests for the learners should look?

E.g.

  • General stuff is tested in test_[type]_all.R
  • Special bugs we discover for learners go in a seperate test file test_classif_[lrn].R

Learner fields are not documented

Learner fields like "properties" or "feature_types" are not documented in their respective help pages.
This is because they are not explicitly listed as fields in their class description.

These fields should be documented in the respective super classes.

LearnerRegrLm not working

Probably because something in mlr3 changed.

> LearnerRegrLm$new()$train(mlr_tasks$get("bh"))
Error: object of type 'closure' is not subsettable

Enter a frame number, or 0 to exit   

 1: LearnerRegrLm$new()$train(mlr_tasks$get("bh"))
 2: invoke(stats::lm, formula = task$formula, data = task$data(), .args = pars)
 3: eval.parent(expr, n = 1)
 4: eval(expr, p)
 5: eval(expr, p)
 6: stats::lm(formula = task$formula, data = task$data())
 7: eval(mf, parent.frame())
 8: eval(mf, parent.frame())
 9: stats::model.frame(formula = task$formula, data = task$data(), drop.unused.
10: model.frame.default(formula = task$formula, data = task$data(), drop.unused
11: as.formula(formula)
12: formula(object, env = baseenv())
13: formula.default(object, env = baseenv())
14: notnull(x$formula)

How can we support continued training?

This probably rather belongs in mlr3, but there needs to be custom code for learners that support that as well.

For a lot of algorithms (e.g. boosting, gradient descent based algos), training can be continued and models can be updated.

This was often requested by mlr users (especially in combination with early stopping). Is there a nice way to support this generically for algorithms that allow it?
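
For illustration, xgboost already supports this at the package level: xgb.train() accepts a previously fitted booster via xgb_model and continues training from it. A sketch:

library(xgboost)
data(agaricus.train, package = "xgboost")
dtrain = xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)

# train 10 rounds, then continue the same booster for 10 more rounds
bst  = xgb.train(list(objective = "binary:logistic"), dtrain, nrounds = 10)
bst2 = xgb.train(list(objective = "binary:logistic"), dtrain, nrounds = 10,
                 xgb_model = bst)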

glmnet test fails sporadically

When running rtest, I noticed this error, which was not present the next time I ran rtest.
After ~10 more runs, this happened again. So this is a sporadic error.

test_classif_glmnet.R:6: failure: autotest
result isn't true.
[train()] learner 'classif.glmnet:response' on task 'feat_all_binary' failed: train log has errors: Ersetzung hat Länge 0 (German: "replacement has length 0")

classif.kknn predicts probabilities even when predict_type is response

The learner should only predict responses if predict_type is "response", and only probabilities if predict_type is "prob".

> lrn("classif.kknn", predict_type = "response")$train(tsk("iris"))$predict(tsk("iris"))
<PredictionClassif> for 150 observations:
    row_id     truth  response prob.setosa prob.versicolor prob.virginica
         1    setosa    setosa           1       0.0000000      0.0000000
[...]

Avoid formula if possible

The formula interface is known to cause problems for large data (it goes through R's model-frame machinery). We should try to avoid it where possible.

ranger has an alternative interface (see the sketch below); not sure about the others.
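
For ranger, the alternative looks like this (a sketch):

library(ranger)

# formula interface: goes through model.matrix machinery
rf1 = ranger(Species ~ ., data = iris)

# alternative interface: name the target column instead of passing a formula
rf2 = ranger(dependent.variable.name = "Species", data = iris)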

s param in glmnet

glmnet fits a whole regularization path and predicts a matrix with one column per value of s. Hence one needs to fix a single value of s to get a prediction vector. In mlr, and currently here, we set s = 0.01 and discard all ~100 values computed along the path. How should we deal with this? Switch to cv.glmnet?
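
To make the trade-off concrete, a sketch of both options (standard glmnet API):

library(glmnet)
x = as.matrix(mtcars[, -1])
y = mtcars$mpg

# glmnet fits a whole path: predict() returns one column per lambda value
fit = glmnet(x, y)
dim(predict(fit, x))

# fixing s yields a single prediction vector -- currently we hard-code s = 0.01
predict(fit, x, s = 0.01)[1:3]

# cv.glmnet instead selects s by cross-validation
cvfit = cv.glmnet(x, y)
predict(cvfit, x, s = "lambda.min")[1:3]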

Expose learner plot functions

Many learners bring their own plot functions. As we now maintain only a few learners, we could expose selected plots.

Examples: rpart has a built-in tree plot, xgboost ships several plot helpers, etc.
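
For illustration, the kind of plots meant here, calling the underlying packages directly (not a proposed mlr3 API):

library(rpart)
fit = rpart(Species ~ ., data = iris)
plot(fit); text(fit)  # rpart's built-in tree plot

library(xgboost)
data(agaricus.train, package = "xgboost")
dtrain = xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
bst = xgb.train(list(objective = "binary:logistic"), dtrain, nrounds = 5)
xgb.plot.importance(xgb.importance(model = bst))  # feature importance plot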

xgboost learner hyper parameter (lambda and alpha) clarification

Hi,

Another week, another issue from me. Hopefully you guys are OK with that?

To the issue.

I ran some xgboost tuning where I tried to tune the lambda and alpha parameters for the gbtree booster.

library(mlr3) 
library(mlr3learners)
library(mlr3tuning)
library(paradox)

lrn_xgboost <- lrn("classif.xgboost")

lrn_xgboost$predict_type <- "prob"

cv5 <- rsmp("cv", folds = 5)
tsk <- mlr_tasks$get("sonar")

xgb_ps <- ParamSet$new(list(
  ParamFct$new("booster", levels = c("gbtree")),
  ParamDbl$new("eta", lower = 0.003, upper = 0.3),
  ParamDbl$new("gamma", lower = 0, upper = 10),
  ParamInt$new("max_depth", lower = 3, upper = 20),
  ParamDbl$new("colsample_bytree", lower = 0.5, upper = 1),
  ParamDbl$new("colsample_bylevel", lower = 0.5, upper = 1),
  ParamDbl$new("lambda", lower = 0, upper = 10),
  ParamDbl$new("alpha", lower = 0, upper = 10),
  ParamDbl$new("subsample", lower = 0.5, upper = 1),
  ParamInt$new("nrounds", lower = 20, upper = 100)
))

instance <- TuningInstance$new(
  task = tsk,
  learner = lrn_xgboost,
  resampling = cv5,
  measures = msr("classif.auc"),
  param_set = xgb_ps,
  terminator = term("evals", n_evals = 20)
)
tuner <- TunerRandomSearch$new()
tuner$tune(instance)

This results in the error:

INFO  [11:56:14.498] Starting to tune 10 parameters with '<TunerRandomSearch>' and '<TerminatorEvals>' 
INFO  [11:56:14.499] Terminator settings: n_evals=20 
INFO  [11:56:14.534] Evaluating 1 configurations 
INFO  [11:56:14.536]  booster        eta     gamma max_depth colsample_bytree colsample_bylevel   lambda    alpha subsample nrounds 
INFO  [11:56:14.536]   gbtree 0.03936435 0.2437546         5        0.6446646         0.7196681 6.194857 9.013528  0.961457      87 
Error in (function (xs)  : 
  Assertion on 'xs' failed: Condition for 'lambda' not ok: booster equal gblinear; instead: booster=gbtree.

Clearly, the lambda and alpha parameters are reserved for the linear booster by mlr3. This is probably because the xgboost function help lists them as parameters of the linear booster:

2.2. Parameters for Linear Booster
• lambda: L2 regularization term on weights. Default: 0
• lambda_bias: L2 regularization term on bias. Default: 0
• alpha: L1 regularization term on weights (there is no L1 reg on bias because it is not important). Default: 0

However, this source: https://xgboost.readthedocs.io/en/latest/parameter.html#parameters-for-tree-booster lists them as parameters for the tree and dart boosters as well.

When a test is run in R to see whether these parameters have an effect on xgboost "gbtree" models:

library(xgboost)
data(agaricus.train, package='xgboost')
data(agaricus.test, package='xgboost')

dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
dtest <- xgb.DMatrix(agaricus.test$data, label = agaricus.test$label)
watchlist <- list(train = dtrain, eval = dtest)
param <- list(booster = "gbtree",
              max_depth = 2,
              eta = 1,
              verbose = 0,
              objective = "binary:logistic",
              eval_metric = "auc")
set.seed(1)
bst <- xgb.train(param,
                 dtrain,
                 nrounds = 2,
                 watchlist)
[1]	train-auc:0.958228	eval-auc:0.960373 
[2]	train-auc:0.981413	eval-auc:0.979930 
param2 <- list(booster = "gbtree",
              max_depth = 2,
              eta = 1,
              verbose = 0,
              objective = "binary:logistic",
              eval_metric = "auc",
              alpha = 100)

set.seed(1)
bst2 <- xgb.train(param2,
                 dtrain,
                 nrounds = 2,
                 watchlist)
[1]	train-auc:0.979337	eval-auc:0.980196 
[2]	train-auc:0.996274	eval-auc:0.995977 
param3 <- list(booster = "gbtree",
               max_depth = 2,
               eta = 1,
               verbose = 0,
               objective = "binary:logistic",
               eval_metric = "auc",
               lambda = 1000)

set.seed(1)
bst3 <- xgb.train(param3,
                  dtrain,
                  nrounds = 2,
                  watchlist)
[1]	train-auc:0.957067	eval-auc:0.958731 
[2]	train-auc:0.986000	eval-auc:0.986332 

It can be observed that they do have an effect on the trained models.

Could you change the parameter dependencies for the xgboost learner so that the lambda and alpha parameters can be tuned regardless of the booster?

For instance autoxgboost has no such constraints.

Kind regards,

Milan

xgboost learner inverts labels

label = match(as.character(as.matrix(task$data(cols = task$target_names))), lvls) - 1

The match line for extracting labels from the task inverts the labels, which messes with measures on binary tasks. This causes issues when supplying a watchlist to an xgboost task for early stopping.

# positive class comes first
lvls = c('1', '0')
labels = c('0', '1', '0')
new_labels = match(labels, lvls) - 1
new_labels  # 1 0 1 -- inverted relative to labels

Suggested:

label = length(lvls) - match(as.character(as.matrix(task$data(cols = task$target_names))), lvls)
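
A quick check of the suggested line on the toy example above:

lvls = c('1', '0')                  # positive class first
labels = c('0', '1', '0')
length(lvls) - match(labels, lvls)  # 0 1 0 -- matches the original labels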

glmnet with a single feature

glmnet does not allow training if the dataset has only a single feature.

See here in the glmnet code

    np = dim(x)
    if (is.null(np) | (np[2] <= 1)) 
        stop("x should be a matrix with 2 or more columns")

How should we handle this?
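
One conceivable workaround (a sketch, not an endorsed fix) is to duplicate the single column so that glmnet's ncol(x) >= 2 check passes; for the default lasso penalty, the fitted values should match those of a single-column model, with the coefficient split across the two identical copies:

x = matrix(mtcars$wt, ncol = 1)
x2 = cbind(x, x)  # pad to two (identical) columns
fit = glmnet::glmnet(x2, mtcars$mpg)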

automatic tests for all learners

One (!) test for all learners, i.e. new learners should be tested automatically:

  • learner properties
  • does prediction work?
  • tests should be fast
  • test predictive performance with simple examples, e.g. better than mmce of 0.3

helper_learners_all could be a starting point

Connect learner multinom from package nnet

mlr3learners currently does not provide simple tuning-free softmax regression for multiclass tasks.
I think it should, as a versatile and interpretable baseline against more complicated tunable algorithms (glmnet, trees, svm).
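
The proposed learner would wrap nnet::multinom(); for reference:

library(nnet)

# softmax (multinomial logistic) regression on a multiclass task
fit = multinom(Species ~ ., data = iris, trace = FALSE)
predict(fit, iris[1:3, ], type = "probs")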

give glmnet the $importance slot

Because then it could be used in combination with FilterEmbedded in mlr3featsel for feature selection in order of L1 inclusion. Importance could be the (approximate) lambda value at which a feature is first included, which can easily be computed from the fitted model.
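
A sketch of that computation on a fitted glmnet model (the exact importance definition is up for discussion):

library(glmnet)
x = as.matrix(mtcars[, -1])
y = mtcars$mpg
fit = glmnet(x, y)

# for each feature: the largest lambda at which its coefficient is nonzero,
# i.e. the point on the path where the feature first enters the model
first_nonzero = apply(as.matrix(fit$beta) != 0, 1, function(nz)
  if (any(nz)) fit$lambda[which(nz)[1]] else NA_real_)
sort(first_nonzero, decreasing = TRUE)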

gamboost learner does not properly specify that it can only do binary classif

ll = lrn("classif.gamboost")
tt = tsk("iris")
rr = resample(tt, ll, rsmp("cv", folds = 2))

--> results in

INFO [21:26:36.247] Applying learner 'classif.gamboost' on task 'iris' (iter 1/2)
Error in family@check_y(y) :
response is not a factor at two levels but 'family = Binomial()'

Shouldn't there be an autotest that makes sure that things like this do not happen?

Properly assert learner properties

similar to #56

The following should error as "log_reg" does not support "multiclass" tasks by definition.
There might be more learners with missing assertions on their properties.

library(mlr3verse, quietly = TRUE)

lrn = lrn("classif.log_reg")
lrn$properties
#> [1] "twoclass" "weights"
tsk = tsk("iris")
tsk$properties
#> [1] "multiclass"

# works - but should error
lrn("classif.log_reg")$train(tsk("iris"))
#> Warning: glm.fit: algorithm did not converge
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

Created on 2020-03-13 by the reprex package (v0.3.0)
