bbotk's Introduction

bbotk - Black-Box Optimization Toolkit

Package website: release | dev


bbotk is a black-box optimization framework for R. It features highly configurable search spaces via the paradox package and optimizes arbitrary user-defined objective functions. The package includes several optimization algorithms, e.g. Random Search, Grid Search, Iterated Racing, Bayesian Optimization (in mlr3mbo) and Hyperband (in mlr3hyperband). bbotk is the base package of mlr3tuning, mlr3fselect and miesmuschel.

Resources

There are several sections about black-box optimization in the mlr3book. Often the sections about tuning are also relevant for general black-box optimization.

Installation

Install the latest release from CRAN.

install.packages("bbotk")

Install the development version from GitHub.

pak::pkg_install("mlr-org/bbotk")

Example

# define the objective function
fun = function(xs) {
  - (xs[[1]] - 2)^2 - (xs[[2]] + 3)^2 + 10
}

# set domain
domain = ps(
  x1 = p_dbl(-10, 10),
  x2 = p_dbl(-5, 5)
)

# set codomain
codomain = ps(
  y = p_dbl(tags = "maximize")
)

# create objective
objective = ObjectiveRFun$new(
  fun = fun,
  domain = domain,
  codomain = codomain,
  properties = "deterministic"
)

# initialize instance
instance = oi(
  objective = objective,
  terminator = trm("evals", n_evals = 20)
)

# load optimizer
optimizer = opt("gensa")

# trigger optimization
optimizer$optimize(instance)
##    x1 x2  x_domain  y
## 1:  2 -3 <list[2]> 10
# best performing configuration
instance$result
##    x1 x2  x_domain  y
## 1:  2 -3 <list[2]> 10
# all evaluated configurations
as.data.table(instance$archive)
##            x1        x2          y           timestamp batch_nr x_domain_x1 x_domain_x2
##  1: -4.689827 -1.278761 -37.716445 2024-08-13 17:52:54        1   -4.689827   -1.278761
##  2: -5.930364 -4.400474 -54.851999 2024-08-13 17:52:54        2   -5.930364   -4.400474
##  3:  7.170817 -1.519948 -18.927907 2024-08-13 17:52:54        3    7.170817   -1.519948
##  4:  2.045200 -1.519948   7.807403 2024-08-13 17:52:54        4    2.045200   -1.519948
##  5:  2.045200 -2.064742   9.123250 2024-08-13 17:52:54        5    2.045200   -2.064742
## ---                                                                                    
## 16:  2.000000 -3.000000  10.000000 2024-08-13 17:52:54       16    2.000000   -3.000000
## 17:  2.000001 -3.000000  10.000000 2024-08-13 17:52:54       17    2.000001   -3.000000
## 18:  1.999999 -3.000000  10.000000 2024-08-13 17:52:54       18    1.999999   -3.000000
## 19:  2.000000 -2.999999  10.000000 2024-08-13 17:52:54       19    2.000000   -2.999999
## 20:  2.000000 -3.000001  10.000000 2024-08-13 17:52:54       20    2.000000   -3.000001

bbotk's People

Contributors

be-marc, berndbischl, github-actions[bot], jakob-r, jemus42, lionel-, mb706, michaelchirico, mllg, pat-s, sebffischer, sumny


bbotk's Issues

make domain of objective optional

Sometimes we only know the search_space (the param_set of the OptimInstance) but not the domain of the objective (the search space after the trafo has been applied). We should not always be obliged to define a domain because it is not used anyway.

Therefore it should be optional to define the domain.
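A minimal sketch of the status quo for a trafo-free search space, where the domain merely duplicates the search space (the names are illustrative, not part of this proposal):

library(bbotk)
library(paradox)

# the space the optimizer searches over
search_space = ps(
  x1 = p_dbl(-10, 10),
  x2 = p_dbl(-5, 5)
)

# today the objective still needs a domain, even when it is identical to the
# search space; making it optional would remove this duplication
objective = ObjectiveRFun$new(
  fun = function(xs) list(y = xs$x1^2 + xs$x2^2),
  domain = search_space$clone(),
  codomain = ps(y = p_dbl(tags = "minimize"))
)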

Terminator stagnation and an occasionally failing learner return an error during tuning

Hi,

I stumbled on this by chance, not sure if it can be classified as bug but I thought you should know about it.

When combined with a learner that fails occasionally, the stagnation terminator returns the error:

Error in if (self$terminator$is_terminated(self)) { : 
  missing value where TRUE/FALSE needed

Example:

library(mlr3)
library(mlr3pipelines)
library(mlr3filters)
library(mlr3tuning)
library(paradox)

lrn_rpart <- lrn("classif.rpart")
ig <- po("filter", flt("information_gain"))

ps <- ParamSet$new(list(
  ParamDbl$new("classif.rpart.cp", lower = 0, upper = 0.05),
  ParamInt$new("information_gain.filter.nfeat", lower = 20L, upper = 60L),
  ParamFct$new("information_gain.type",
               levels = c("infogain", "gainratio")) # I know gainratio does not work well with Sonar
))

glrn <- ig %>>% lrn_rpart
glrn <- GraphLearner$new(glrn)

# encapsulate so that individual failures do not abort the tuning
glrn$encapsulate <- c(train = "evaluate", predict = "evaluate")

cv5 <- rsmp("cv", folds = 5)
tsk <- mlr_tasks$get("sonar")

instance <- TuningInstance$new(
  task = tsk,
  learner = glrn,
  resampling = cv5,
  measures = msr("classif.ce"),
  param_set = ps,
  terminator = term("stagnation", iters = 5, threshold = 0)
)

tuner <- TunerRandomSearch$new()
set.seed(123)
tuner$tune(instance)

After 6 evaluated configurations the error occurs; I suspect it is due to the NaN in the performance measure:

instance$archive()
   nr batch_nr  resample_result task_id                     learner_id resampling_id iters params tune_x warnings errors classif.ce
1:  1        1 <ResampleResult>   sonar information_gain.classif.rpart            cv     5 <list> <list>        0      0  0.2648084
2:  2        2 <ResampleResult>   sonar information_gain.classif.rpart            cv     5 <list> <list>        0      0  0.2596980
3:  3        3 <ResampleResult>   sonar information_gain.classif.rpart            cv     5 <list> <list>        0      5        NaN
4:  4        4 <ResampleResult>   sonar information_gain.classif.rpart            cv     5 <list> <list>        0      0  0.2454123
5:  5        5 <ResampleResult>   sonar information_gain.classif.rpart            cv     5 <list> <list>        0      0  0.2501742
6:  6        6 <ResampleResult>   sonar information_gain.classif.rpart            cv     5 <list> <list>        0      0  0.2737515

All the best,

Milan

eval_batch(xdt): xdt should be able to contain more than x cols

If we call OptimInstance$eval_batch(xdt) from inside an optimizer, we might have more information than just the x values (e.g. the AcqFunction value that led to this x value in MBO).

Either we allow more information in xdt, or we allow adding more info afterwards (which would be a bit cumbersome).
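A hedged sketch of what the first option could look like from an optimizer's point of view; the acq_ei column is illustrative and not an existing feature, and instance is assumed to be an existing OptimInstance:

library(data.table)

# proposed points plus the acquisition value that produced them;
# currently eval_batch() only accepts the search space columns
xdt = data.table(
  x1 = c(1.2, -0.4),
  x2 = c(0.3,  2.1),
  acq_ei = c(0.81, 0.45)  # extra optimizer information to be archived
)

instance$eval_batch(xdt)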

RS as part of bbotk

It would make sense to have Random Search as a reference optimizer implementation in bbotk.
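For reference, a random search shipped with bbotk can serve exactly this purpose today; a minimal sketch, reusing the instance from the README example above:

library(bbotk)

optimizer = opt("random_search", batch_size = 10)
optimizer$optimize(instance)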

Write Tutorial/Vignette

  • include a basic example
  • include MOO (multi-objective optimization)
  • include parallelization
  • include how to handle y + "extra" returns from the objective (see the sketch after this list)
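A minimal sketch of the last point: an objective whose function returns the target value y plus extra information; values whose names are not in the codomain end up as extras in the archive (the runtime name is illustrative):

library(bbotk)
library(paradox)

objective = ObjectiveRFun$new(
  fun = function(xs) {
    start = Sys.time()
    y = -(xs$x1 - 2)^2
    # "y" matches the codomain; "runtime" is an extra stored alongside it
    list(y = y, runtime = as.numeric(Sys.time() - start))
  },
  domain = ps(x1 = p_dbl(-10, 10)),
  codomain = ps(y = p_dbl(tags = "maximize"))
)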

mlr3pipelines: Optimizers for threshold tuning

I implemented two Optimizers for threshold tuning: OptimizerNloptr and OptimizerGenSA.
You can find them here.

In general, I would need to separate the threshold tuning logic from the optimization logic, but this should be trivial.

It would be cool if they could be added to bbotk or mlr3tuning.

See also: mlr-org/mlr3tuning#231

Rename param_set of the OptimInstance

  • param_set is used in many objects to control how that object behaves
  • here, the param_set is actually the search space
  • therefore: name it search_space?

Other suggestions welcome.

Structure of instance$result unclear

The result structure should be defined properly, as should the signature of assign_result.

Suggestion:

list(
  xdt,  # data.table with one or multiple rows,
        # a subset of the archive, meaning xdt is a subset of the search_space
  y,    # numerical vector for single-crit, data.table for multi-crit
  x_opt # list (of lists) with one or multiple elements,
        # transformed x values, a subset of the domain of the objective
)

Alternative: the result is just a data.table in the exact same format as archive$data, basically a subset of archive$data; in case the optimizer returns a result that was not previously evaluated, such a data.table has to be constructed.

Actually I would prefer the alternative suggestion because it feels simpler and more coherent.
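A sketch of what the alternative would mean in practice, assuming the result simply mirrors rows of the archive (the instance and column names are reused from the README example; archive$best() and assign_result() are existing bbotk API):

# the best row(s) of the archive, same columns as archive$data
best = instance$archive$best()

# single-crit signature: search space columns plus the named outcome
instance$assign_result(
  xdt = best[, c("x1", "x2"), with = FALSE],
  y = c(y = best$y)
)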

OptimInstance should have an optional sampler slot

That would be the right place to put a ParamSet-specific sampler that could be used for Hyperband, RandomSearch, etc.

Alternatively we could have a subclass, but I don't see a big need for that here.

Problem: the user does not directly see whether the sampler is actually used, i.e. some tuners like GridSearch would just use the ParamSet directly and ignore the sampler.
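A hedged sketch of how such a slot might be used; the $sampler field is hypothetical, while SamplerUnif, $sample() and $data are existing paradox API:

library(paradox)

search_space = ps(
  x1 = p_dbl(-10, 10),
  x2 = p_dbl(-5, 5)
)

# hypothetical: attach a sampler to the instance so that Hyperband,
# RandomSearch, etc. could draw candidate points from it
# instance$sampler = SamplerUnif$new(search_space)

# what an optimizer could then do internally
design = SamplerUnif$new(search_space)$sample(5)
design$data  # data.table with 5 sampled configurations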

plot for tuning instance?

It would be nice to have a quick plot function that shows the tuning curve and, for 1d and 2d problems, a response surface with simple interpolation.

Would that be part of mlr3viz?

Add terminator stagnation batch

For sequential feature selection, a terminator that stops when the performance does not improve by more than a threshold over the last batch would be useful.
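For comparison, the existing evaluation-based stagnation terminator; the batch-based variant proposed here would behave analogously but measure improvement per batch (the "stagnation_batch" key and its parameters are the proposed addition, not guaranteed API):

library(bbotk)

# existing: stop when the best y does not improve by more than
# `threshold` within the last `iters` evaluations
trm("stagnation", iters = 10, threshold = 0)

# proposed: same idea, but measured over the last batch
# trm("stagnation_batch", n = 1, threshold = 0)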

Implement basic optimizers

FIXME: we could add some basic, simple optimizers from R here. Connecting them here would enable them for many tasks in optimization, not only mlr3tuning. Think then about how mlr3mbo extends this system / registers itself.

I think this goes beyond bbotk?
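A rough sketch of the extension mechanism this would rely on; the mlr_optimizers dictionary and opt() are existing bbotk API, while the "optim_nm" key and OptimizerOptimNM class are hypothetical:

library(bbotk)

# optimizers live in a dictionary; packages like mlr3mbo extend bbotk
# by registering their own implementations in it
as.data.table(mlr_optimizers)  # list the currently registered optimizers

# hypothetical registration of a wrapper around a base-R optimizer
# mlr_optimizers$add("optim_nm", OptimizerOptimNM)
# opt("optim_nm")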

annoying "redundancies" with mlr3: future and encapsulation

Not sure how to handle this:

The package should allow for the following: if multiple points are evaluated, this should be parallelized (by future) and encapsulated (by callr). Now it seems reasonable to copy over / do something similar as in mlr3. NB: I have no problem with copy-pasting that code, that's not the issue here!

If I do that, and bbotk is used in mlr3tuning, we now have these features twice. That seems confusing to the user? Example: I could now switch on the parallel option for bbotk, but I could also switch it on for mlr3. The same goes for encapsulation.

What would be the best way out here? @mllg

Check what happens with non-persistent extras

At the moment we only actively support extras going into the instance and coming out of the objective to be stored in the archive if they always have the same names. What happens if some extras are only added for some evaluations?
