
Sequential model-based optimization with a `scipy.optimize` interface

Home Page: https://scikit-optimize.github.io

License: BSD 3-Clause "New" or "Revised" License


scikit-optimize's Introduction


Scikit-Optimize

Scikit-Optimize, or skopt, is a simple and efficient library to minimize (very) expensive and noisy black-box functions. It implements several methods for sequential model-based optimization. skopt aims to be accessible and easy to use in many contexts.

The library is built on top of NumPy, SciPy and Scikit-Learn.

We do not perform gradient-based optimization. For gradient-based optimization algorithms, look at scipy.optimize.

Approximated objective function after 50 iterations of gp_minimize. Plot made using skopt.plots.plot_objective.

Install

scikit-optimize requires

  • Python (>= 3.6)
  • NumPy (>= 1.13.3)
  • SciPy (>= 0.19.1)
  • joblib (>= 0.11)
  • scikit-learn (>= 0.20)
  • matplotlib (>= 2.0.0)

You can install the latest release with:

pip install scikit-optimize

This installs a minimal version of scikit-optimize. To install scikit-optimize with plotting functionality, you can instead do:

pip install 'scikit-optimize[plots]'

This will install matplotlib along with scikit-optimize.

In addition, there is a conda-forge package of scikit-optimize:

conda install -c conda-forge scikit-optimize

Using conda-forge is probably the easiest way to install scikit-optimize on Windows.

Getting started

Find the minimum of the noisy function f(x) over the range -2 < x < 2 with skopt:

import numpy as np
from skopt import gp_minimize

def f(x):
    return (np.sin(5 * x[0]) * (1 - np.tanh(x[0] ** 2)) +
            np.random.randn() * 0.1)

res = gp_minimize(f, [(-2.0, 2.0)])

For more control over the optimization loop you can use the skopt.Optimizer class:

from skopt import Optimizer

opt = Optimizer([(-2.0, 2.0)])

for i in range(20):
    suggested = opt.ask()
    y = f(suggested)
    opt.tell(suggested, y)
    print('iteration:', i, suggested, y)

Read our introduction to Bayesian optimization and the other examples.

Development

The library is still experimental and under heavy development. Check out the next milestone for the plans for the next release, or look at some easy issues to get started contributing.

The development version can be installed through:

git clone https://github.com/scikit-optimize/scikit-optimize.git
cd scikit-optimize
pip install -e .

Run all tests by executing pytest in the top level directory.

To only run the subset of tests with short run time, you can use pytest -m 'fast_test' (pytest -m 'slow_test' is also possible). To exclude all slow running tests try pytest -m 'not slow_test'.

This is implemented using pytest attributes. If a test runs longer than 1 second, it is marked as slow; otherwise it is marked as fast.
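A sketch of how such markers could be declared, with the `fast_test`/`slow_test` marker names taken from the commands above:

```python
import pytest

# Tests with short run time are marked fast, long-running ones slow;
# `pytest -m 'fast_test'` then selects only the fast subset.
@pytest.mark.fast_test
def test_addition():
    assert 1 + 1 == 2

@pytest.mark.slow_test
def test_large_sum():
    assert sum(range(1000)) == 499500
```

Registering the marker names (e.g. in setup.cfg or conftest.py) avoids pytest's unknown-marker warnings.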

All contributors are welcome!

Making a Release

The release procedure is almost completely automated. When a new release is tagged, Travis builds all required packages and pushes them to PyPI. To make a release, create a new issue and work through the following checklist:

  • update the version tag in __init__.py
  • update the version tag mentioned in the README
  • check if the dependencies in setup.py are valid or need unpinning
  • check that the doc/whats_new/v0.X.rst is up to date
  • did the last build of master succeed?
  • create a new release
  • ping conda-forge

Before making a release we usually create a release candidate. If the next release is v0.X then the release candidate should be tagged v0.Xrc1 in __init__.py. Mark a release candidate as a "pre-release" on GitHub when you tag it.

Commercial support

Feel free to get in touch if you need commercial support or would like to sponsor development. Resources go towards paying for additional work by seasoned engineers and researchers.

Made possible by

The scikit-optimize project was made possible with the support of

Wild Tree Tech

NYU Center for Data Science

NSF

Northrop Grumman

If your employer allows you to work on scikit-optimize during the day and would like recognition, feel free to add them to the "Made possible by" list.

scikit-optimize's People

Contributors

bacoknight, betatim, carlosdanielcsantos, chschroeder, cmmalone, cschell, glouppe, guillaumesimo, holgern, hvass-labs, iaroslav-ai, jkleint, jusjusjus, kartikayyer, kejiashi, kernc, liebscher, lucasplagwitz, mechcoder, mirca, mp4096, nel215, nfcampos, scott-graham-bose, shakesb33r, stefanocereda, thomasjpfan, tupui, xmatthias, yngtodd


scikit-optimize's Issues

ExtraTrees returns NaN for std

yield (check_minimize, minimizer, bench1, 0., [(-2.0, 2.0)], 0.05, 75) with et_minimize produces

scikit-optimize/skopt/acquisition.py:165: RuntimeWarning: invalid value encountered in greater
  mask = std > 0

and std is:

(Pdb) print(std)
[  0.00000000e+00   3.44874701e-01   4.35236492e-01              nan
   5.35666028e-01   3.76289149e-01   0.00000000e+00   3.44874701e-01
   3.03596891e-01   2.84929167e-01              nan              nan
   1.11601649e-01   3.44874701e-01              nan              nan
   2.98023224e-08   6.69582536e-01   1.68631973e-01              nan]

Doc: generate examples gallery

Would be nice to generate examples upon deployment to build a nice gallery. This would require some changes to ci_scripts/deploy.sh and to the templates, but nothing impossible.

API for non continuous inputs

At the moment, input values are assumed to live within a bounded continuous range. We should think about an API on how to specify integer and symbolic values as well, and what would be the consequences for the algorithms we implemented so far.
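For illustration only, a mixed space might be declared with plain Python types and its dimension kinds inferred from the representation (the declaration style and the helper below are purely hypothetical, not a proposed API):

```python
# A hypothetical mixed search space: one tuple per dimension.
space = [
    (-2.0, 2.0),         # continuous: bounded real interval
    (1, 10),             # integer: inclusive integer range
    ("1", "2", "3"),     # symbolic/categorical: explicit set of values
]

def kind(dim):
    """Crude inference of a dimension's type from its Python representation."""
    if all(isinstance(v, str) for v in dim):
        return "categorical"
    if len(dim) == 2 and all(isinstance(v, int) for v in dim):
        return "integer"
    return "real"
```

One consequence for the algorithms: integer and categorical dimensions need an explicit transform (rounding, one-hot encoding, ...) before they can be fed to a GP or similar surrogate.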

Support for categorical parameters seems to be broken

def bench1(x):
    return np.asscalar(np.asarray(x))

def bench2(x):
    return np.asscalar(np.asarray(x, dtype=np.int))

>>> bench1([1])
1
>>> bench2(["1"])
1

from skopt import forest_minimize, gp_minimize
# Works
forest_minimize(bench1, ((1.0, 4.0),))
# Fails
forest_minimize(bench2, (("1", "2", "3", "4"),))

# Works
gp_minimize(bench1, ((1.0, 4.0),), maxiter=5)
# Fails
gp_minimize(bench2, (("1", "2", "3", "4"),))

Move GBRT in a `learning` submodule

I propose creating a learning submodule, for basically everything which is a modification of a ML algorithm. The wrapper around Gradient Boosting should be moved there.

Expected input/output shape

When exploring #37 and #38, I noticed that we are not very consistent with respect to the input/output shape. We should enforce one and only one way to do things.

I would suggest the following conventions:

  • func: 1d array-like as input, scalar as output (as in scipy.minimize)
  • acquisition functions: 2d array-like as input, 1d array as output.

Everything else raises an error.
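The proposed conventions could be enforced by small validators along these lines (the helper names are hypothetical, chosen only for this sketch):

```python
import numpy as np

def check_func_io(func, x):
    """Objective: 1-D array-like in, scalar out (as in scipy.minimize)."""
    x = np.asarray(x)
    assert x.ndim == 1, "objective expects a 1-D point"
    y = func(x)
    assert np.ndim(y) == 0, "objective must return a scalar"
    return float(y)

def check_acq_io(acq, X):
    """Acquisition function: 2-D array-like in, 1-D array out."""
    X = np.asarray(X)
    assert X.ndim == 2, "acquisition expects a 2-D batch of points"
    values = np.asarray(acq(X))
    assert values.shape == (X.shape[0],), "acquisition must return a 1-D array"
    return values
```

Anything with the wrong number of dimensions fails the assertion instead of being silently reshaped.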

Definition of terms related to uncertainty

Some thoughts on "uncertainty". This issue was inspired by @MechCoder's comment in #9. The first part of this issue tries to correctly define various terms that often get used interchangeably and are easy to confuse (I confidently predict that I will make at least one error in this post). Once we have defined the terms, we can decide which of them we need in order to evaluate various acquisition functions.

Standard deviation (\sigma): this is the square root of the variance. Can be calculated for any sample no matter what distribution the samples come from.

Standard error (of the mean): \sigma / \sqrt{N}, a measure of the uncertainty associated with the estimated value of the mean.

Confidence interval (CI): The N% confidence interval will contain the measured value N% of the time. Alice wants to estimate the value of a parameter t, so she constructs an estimator t̂ as well as a CI. The 68% CI (around t̂) will contain the true value t in 68% of experiments (that is, we clone Alice and repeat what she did many times).

N% quantile: The N% quantile is the point x such that the integral of the p.d.f. from negative infinity to x equals N%.

If t̂ is distributed according to a normal distribution, then the 68% CI is [t̂ - sigma, t̂ + sigma].

For a normal distribution, mu - sigma is the 16% quantile.

For our purposes we have a surrogate model (a GP or what have you) for the true, expensive function f. At a given point x our best estimate of the true value of f is the mean mu(x) of our surrogate model.


Now my understanding runs out -> need help.

What is the band we get from a GP and then feed into EI and friends? Is it the "standard error on the mean" or "68% confidence interval" or "68% credible interval" or something else?
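The definitions above can be checked numerically on a Gaussian sample (a sketch using NumPy only; the quantile value 0.1587 is the normal CDF at -1):

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(loc=0.0, scale=1.0, size=10_000)

mu = samples.mean()
sigma = samples.std(ddof=1)            # standard deviation
sem = sigma / np.sqrt(len(samples))    # standard error of the mean

# For a normal distribution, mu - sigma sits at the ~16% quantile,
# so [mu - sigma, mu + sigma] covers ~68% of the probability mass.
q16 = np.quantile(samples, 0.1587)
```

On this sample, q16 and mu - sigma agree up to sampling noise, while the standard error is a hundred times smaller than sigma (sqrt of 10,000), illustrating that the two quantities answer different questions.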

Tests are slow! (episode II)

The current build takes more than 15 minutes, which is very long given that we don't have that much code yet... We should really try to trim some of the tests.

Add scikit-learn compatible BayesSearchCV

Hey.
Do you intend to provide a GridSearchCV drop-in replacement or only the optimizer?
The thing is that it might take a while to get that into scikit-learn, and it would be nice if people had access to it.

Cheers,
Andy

Implement RF based model selection

The computed variance for each RandomForest is given in http://arxiv.org/pdf/1211.0906v2.pdf in section 4.3.2 (This will involve wrapping sklearn's DecisionTrees to return the standard deviation of each leaf)

The ExpectedImprovement makes the same assumption about the predictions being Gaussian, except there is a minor modification given in Section 5.2 of https://www.cs.ubc.ca/~murphyk/Papers/gecco09.pdf

There is a change from sklearn's RF implementation in computing the split point described in 4.3.2 in http://arxiv.org/pdf/1211.0906v2.pdf but we can try without that modification.
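As a stopgap before implementing the per-leaf variance from the paper, the spread of the individual trees' predictions gives a crude uncertainty estimate. This is a sketch of that simpler idea, not the method from section 4.3.2:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 1))
y = np.sin(3 * X[:, 0]) + rng.normal(scale=0.1, size=200)

rf = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

X_test = np.linspace(-2, 2, 5).reshape(-1, 1)
# Spread of the individual trees' predictions as a crude std estimate;
# the per-leaf variance from the paper would be more faithful.
all_preds = np.stack([tree.predict(X_test) for tree in rf.estimators_])
mean, std = all_preds.mean(axis=0), all_preds.std(axis=0)
```

The mean/std pair can then be fed to the same acquisition functions that consume a GP's posterior.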

Extend test suite

Before implementing any more things, we should really extend the test suite with more thorough tests. At the moment, I can't even minimize a 1D parabola with the default parameters of gp_minimize...

(and I don't even understand why it fails... so many things to adjust :/)

We might want to look at other packages for good defaults.

Slow tests

The current three tests take 17 minutes to run on Travis, while the entire sklearn test suite runs in 10 minutes.

Add random search

For API checks and baseline purposes, I think it would be nice to have dummy random search method.
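Such a baseline could be as small as the sketch below (the function name and signature are illustrative, not an existing API):

```python
import numpy as np

def dummy_minimize(func, bounds, n_calls=100, random_state=None):
    """Baseline: sample points uniformly at random and keep the best."""
    rng = np.random.default_rng(random_state)
    lows, highs = zip(*bounds)
    X = rng.uniform(lows, highs, size=(n_calls, len(bounds)))
    ys = [func(x) for x in X]
    best = int(np.argmin(ys))
    return X[best], ys[best]

x_best, y_best = dummy_minimize(lambda x: (x[0] - 1.0) ** 2, [(-2.0, 2.0)],
                                n_calls=200, random_state=0)
```

Because it shares the objective/bounds calling convention of the model-based minimizers, it doubles as an API smoke test and a sanity baseline they should beat.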

Incompatibility with Python 2.7.x?

I noticed that when run with 2.7.11, there is a syntax error:

in space.py
def __init__(self, *categories, prior=None):
SyntaxError: invalid syntax

The regular argument cannot come after the * argument; simply reversing these parameters causes other issues in space.py.
This is in accordance with an accepted Python enhancement proposal (keyword-only arguments, which are Python 3-only syntax).

Do the devs plan on making skopt compatible with 2.7.x?

Matern kernel returning a zeroed out covariance matrix

I have been playing around with the code for some time and it doesn't seem to work, at least for the test example (or only seems to by chance):

from math import cos, pi
from skopt import gp_minimize

a = 1
b = 5.1 / (4 * pi**2)
c = 5.0 / pi
r = 6
s = 10
t = 1 / (8*pi)

def branin(x):
    x1 = x[0]
    x2 = x[1]
    return a * (x2 - b * x1**2 + c * x1 - r)**2 + s * (1 - t) * cos(x1) + s

bounds = [[-5, 10], [0, 15]]
res = gp_minimize(
    branin, bounds, search='sampling', maxiter=2, random_state=0,
    acq='UCB')

More specifically this line https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/gaussian_process/gpr.py#L282 returns a matrix of zeros.

This is because the optimized scale parameter of the Matern kernel is 1e-5, which sets the covariance between all the samples to be zero.

Should we try a different approach, other than scaling the parameters down to 0 and 1?

@glouppe @betatim What are your thoughts on this?

Ask-and-tell interface?

Hi, I just discovered this project. I wonder whether it is really the goal to provide only a scipy-like interface or whether you think it would be possible to provide an ask-and-tell interface, too. That would be much more convenient for use cases in which the optimization process is controlled actually by the objective function.

Local-search based technique to optimize the acquisition function of trees and friends

We cannot optimize the acquisition function of trees using conventional gradient / second-order methods. SMAC does it in the way described on page 13 of http://www.cs.ubc.ca/~hutter/papers/10-TR-SMAC.pdf

Some terminology.

  1. If we have p parameters and a parameter configuration, a one-exchange neighbourhood is defined as a parameter configuration that is different in exactly one parameter.
  2. For a parameter (say X) that is continuous, this neighbour is sampled from a Gaussian centered at X with std 0.2, keeping all other parameters constant.
  3. For a parameter (say Y) that is categorical, this neighbour takes any other category of Y, keeping all other parameters constant.

Seems like they do a multi-start local search with 10 points. For each local search:

  1. Initialize a random point p
  2. Check the acquisition values at "4X + Y" neighbours.
  3. If none of the neighbours has a lower acquisition value than p, then terminate.
    Else, reassign p to the neighbour with the minimum acquisition value.

Then return the minimum of all the 10 local searches.
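The procedure can be sketched for purely continuous parameters (categorical neighbours and SMAC's exact neighbour counts omitted; all names below are illustrative):

```python
import numpy as np

def local_search(acq, x0, bounds, rng, n_neighbours=4, max_steps=100):
    """Greedy descent over Gaussian one-exchange neighbours (std 0.2)."""
    x = np.array(x0, dtype=float)
    fx = acq(x)
    for _ in range(max_steps):
        neighbours = []
        for dim in range(len(x)):            # perturb one parameter at a time
            for _ in range(n_neighbours):
                n = x.copy()
                n[dim] = np.clip(rng.normal(x[dim], 0.2), *bounds[dim])
                neighbours.append(n)
        values = [acq(n) for n in neighbours]
        best = int(np.argmin(values))
        if values[best] >= fx:               # no neighbour improves: terminate
            break
        x, fx = neighbours[best], values[best]
    return x, fx

def multistart_local_search(acq, bounds, n_starts=10, seed=0):
    """Run the local search from several random starts, return the best."""
    rng = np.random.default_rng(seed)
    lows, highs = zip(*bounds)
    starts = rng.uniform(lows, highs, size=(n_starts, len(bounds)))
    results = [local_search(acq, s, bounds, rng) for s in starts]
    return min(results, key=lambda r: r[1])

x_best, value = multistart_local_search(lambda x: x[0] ** 2, [(-2.0, 2.0)])
```

The multi-start is what makes this usable on the piecewise-constant acquisition surfaces trees produce, since any single greedy run can get stuck on a flat plateau.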

Explicit is better than implicit

This may sound incredibly formal, bureaucratic and heavy; try to read to the end before panicking.

I think one of the first things we should do is make sure we are all on the same page on how the project will work. I suggest the following:

  • all changes by PR
  • a PR solves one problem (don't mix problems together in one PR) with the minimal set of changes
  • describe why you are proposing the changes you are proposing
  • try to not rush changes (definition of rush depends on how big your changes are)
  • someone else has to merge your PR
  • new code needs to come with a test
  • no merging if travis is red

I don't see this as rules to be enforced by 🚓 but as guidelines.

I think it is important to briefly write down these kinds of "obvious" things if you want to start a project that is long term (not just a hackathon hack) with people you haven't worked with much. Basically: explicit is better than implicit 😀

Refactor minimize functions to make use of sampling API

Now that #75 has been merged, we should refactor all *_minimize functions in order to make use of the new API.

We may need to make a few internal changes, since sample_points returns values in the original space, while we will need to feed the transformed values to the optimizer instead.

I would expect something along the following lines:

  1. Make _check_grid a public util returning the corresponding list of Distribution objects.
  2. sample_grid(grid, n_samples)
  3. warp(grid, samples): from original to warped space
  4. unwarp(grid, samples): from warped to original space
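Under the proposal above, warp/unwarp could look as follows for linearly bounded dimensions (a sketch; log-scaled and categorical dimensions omitted, and only the function names are taken from the list above):

```python
import numpy as np

def warp(grid, samples):
    """Map samples from the original space to [0, 1] per dimension."""
    lows = np.array([lo for lo, hi in grid], dtype=float)
    highs = np.array([hi for lo, hi in grid], dtype=float)
    return (np.asarray(samples, dtype=float) - lows) / (highs - lows)

def unwarp(grid, samples):
    """Inverse of warp: map [0, 1] values back to the original space."""
    lows = np.array([lo for lo, hi in grid], dtype=float)
    highs = np.array([hi for lo, hi in grid], dtype=float)
    return lows + np.asarray(samples, dtype=float) * (highs - lows)

grid = [(-2.0, 2.0), (0.0, 10.0)]
warped = warp(grid, [[0.0, 5.0]])
```

Keeping warp and unwarp as exact inverses is what lets the optimizer work entirely in the unit cube while users keep seeing values in their original units.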

0.1 release

I would like to get the 0.1 release out before school starts again (i.e. September). This is just a parent issue to track the blockers.

  • Consistent and backward-compatible API. Addressed by #75
  • SMAC #57
  • (Local search technique that performs better than random sampling on piecewise constant predict functions (#74), postponed till we have a conclusion in #109)
  • Examples (#65)
  • Support for Python 2.7 (#87)
  • Consistent return types #86
  • Name collision #76 (punting for now)
  • Need a logo #107 (code speaks louder than images, no logo required)
  • release mechanics #133
  • better defaults #166
  • merge #145
  • merge #169
  • maybe merge #162 (nice to have but don't hold the 🚄 for it)
  • stop this list from getting ever longer 📋

Is there anything else?

Run examples as part of the CI

Avoid broken examples like what happened in #29 by running them as part of Travis. Not sure if there is anything more useful we can do than check that they exit with code 0.

GBRT based minimization

GBRT now returns the quantiles. We can get a naive approximation to the std by subtracting the 50th quantile from the 68th quantile and feeding it to the acquisition functions.

Diagnostic plots 📈📊📉

We should add some convenience functions that make plots similar to what is in the examples for "generic" problems, to help people debug why things aren't converging, why they converge to the value they do, etc.

Maybe use something in the style of https://github.com/dfm/corner.py to show N>2 spaces, where the samples are, what the acquisition function looks like, ...

Project name

If we plan on getting serious with this, we should think of a better project name.

One that I like would be scikit-optimize, abbreviated as skopt.

CC: @MechCoder @betatim
