Giter VIP home page Giter VIP logo

simple's Introduction

Simple(x) Global Optimization

Quick rundown:

Simple is a radically more scalable alternative to Bayesian Optimization. Like Bayesian Optimization, it is highly sample-efficient, converging to the global optimum in as few samples as possible. Unlike Bayesian Optimization, it has a runtime performance of O(log(n)) instead of O(n^3) (or O(n^2) with approximations), as well as a constant factor that is roughly three orders of magnitude smaller. Simple's runtime performance, combined with its superior sample efficiency in high dimensions, allows the algorithm to easily scale to problems featuring large numbers of design variables.

For typical optimization workloads, the CPU time consumed by Bayesian Optimization is measured in minutes, while the CPU time used by Simple is measured in milliseconds. See for yourself:

Simple vs Bayesian Optimization (Also you can find a comparison with CMA-ES here.)

How does it work?

Like Bayesian Optimization, Simple operates by constructing an internal surrogate model of the objective function's behavior. Both algorithms sample from points that their surrogate models predict to have a high objective value, as well as a large amount of information gain in order to make more accurate predictions in future iterations. Bayesian Optimization uses Gaussian processes to model the objective function, which, while they are very statistically rigorous, are also very computationally expensive.

Simple constructs its surrogate model by breaking up the optimization domain into a set of discrete, local interpolations that are independent from one another. Each of these local interpolations takes the form of a simplex, which is the mathematical term for the geometric concept of a triangle, but extended to higher dimensions. By testing a single point inside each simplex, the algorithm can split the parent interpolation into dim+1 smaller, more accurate child interpolations. A somewhat similar method for optimization using local simplicial interpolations and barycentric subdivision was first proposed by Wu, Ozdamar, and Kumar in 2005. Simple belongs to this same class of "space partitioning algorithms."

Simple achieves such an enormous speedup over Bayesian Optimization by transforming the global optimization problem into a dynamic programming problem. Since each interpolation is based on purely local information, individual samples can be taken without the need to recompute the updated surrogate model for the entire search space. At each step, Simple takes the local interpolation with the highest acquisition function value (the highest combined interpolated value and information gain) from its priority queue and evaluates the objective function at this interpolation's test point. The algorithm then uses the new sample information to subdivide the parent interpolation into smaller child interpolations, which it proceeds to add back to the priority queue. Here is a view of the algorithm in action:

Algorithm demonstration

Each simplex has a candidate test location that is determined geometrically to help guarantee convergence and inhibit the creation of less accurate high aspect ratio interpolations. In more detail, the acquisition function is given by acquisition = interpolated_value + exploration_preference * opportunity_cost. The exploration preference is an adjustable hyperparameter that is used to inform the algorithm how much importance it should give to exploring the optimization domain, as opposed to pursuing the first local optimum that it finds.

Inverse distance weighting is used to interpolate the objective value across each simplex from its vertices. The opportunity cost is a relative measure of how much each region of space containing a candidate test point has already been explored. This is determined by opportunity_cost = (best_sample - worst_sample) * log(parent_area_fraction * child_area_coordinate) where log is the dim+1 base logarithm. When subdividing a regular simplex of dimensiion dim, it will be split into dim+1 simplexes each with content that is 1/(dim+1) of the parent simplex. Thus the opportunity cost calculation is represetative of how many times a regularly shaped simplex of equivalent content would have already been sampled.

Usage

First, pip install numpy matplotlib, then place Simple.py in the same working directory as your code. Once that is done you can try the following:

from Simple import SimpleTuner

objective_function = lambda vector: -((vector[0] - 0.2) ** 2.0 + (vector[1] - 0.3) ** 2.0) ** 0.5
optimization_domain_vertices = [[0.0, 0.0], [0, 1.0], [1.0, 0.0]]
number_of_iterations = 30
exploration = 0.05 # optional, default 0.15

tuner = SimpleTuner(optimization_domain_vertices, objective_function, exploration_preference=exploration)
tuner.optimize(number_of_iterations)
best_objective_value, best_coords = tuner.get_best()

print('Best objective value ', best_objective_value)
print('Found at sample coords ', best_coords)
tuner.plot() # only works in 2D

The resulting output is then:

Best objective value  -0.00823447695587
Found at sample coords  [ 0.19285289  0.29591033]

Caveats

There are three minor caveats to this algorithm:

  1. The optimization domain must be defined in terms of a simplex
  2. Linear startup cost wrt. the number of dimensions (not an issue in practice, see below)
  3. It's still experimental

Simple works by constructing local interpolations between sets of sample points, which implies that there must be a sample taken at every corner point of the optimization domain. A prism-shaped optimization domain has 2^dim corner points, while a simplex-shaped domain possesses only dim+1 corner points. Therefore, a simplex-shaped optimization domain is the most sample-efficient choice for this algorithm, and allows it to efficiently optimize highly dimensional objective functions.

So while Simple does possess a hard requirement of needing to sample dim+1 corner points before optimization can proceed, this is actually an improvement when compared to the typical behavior of Bayesian Optimization. Bayesian Optimization usually operates on prism-shaped domains, and in practice has an overwhelming preference to sample the 2^dim corner points of its optimization domain since these tend to represent the most unexplored regions of the search space. In 10D, Simple only needs to test 11 corner points while Bayesian Optimization needs to sample 1024.

Future work

A few features got left out of this proof of concept release, mostly concerning guarantees for convergence with objective functions that plateau near their global optimum. These include enforcing a peak opportunity cost penalty, as well as transforming the values recieved from the objective function. Simple can also be extended to solve MINLP problems.

If there is enough interest, I plan to do a proper rewrite in C with strong support for parallel objective function evaluations, suitable for large-scale optimization in HPC applications.

P.S.

If you like this work, I am seeking employment.

simple's People

Contributors

chrisstroemel avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

simple's Issues

Rust port

Thank you for publishing this great algorithm (terrible name for online visibility but great idea)!

I wrote a Rust port in order to experiment with it, it currently features a mapping from the unit simplex to the unit hypercube in order to take an hypercube as parameter and a reformulation of the exploration_preference into an exploration_depth parameter which, I hope, has a clearer semantic.

A couple questions

Thanks for this, it appears to be nearly as effective as the bayesian optimization method I was using for my problem yet with much less overhead. I had a few "theoretical " questions before I start trying to look deeper and expand on this:

  • Do you see any issue with limiting some parameters to integer values?
  • You say: "Simple does possess a hard requirement of needing to sample dim+1 corner points before optimization can proceed,"
    -- To do Bayesian Optimization you could hotstart using a small random search and/or hand picked values. Would it make sense to incorporate a step like this into simple at some point of the process? If so, when? What about other schemes to avoid local minima like checking a random parameter set every n steps?

Simplex initialization

Hi! I'm doing some ~10--20 dimensional parameter optimization for an astrophysical model and came across your package whilst researching efficient algorithms. My model takes ~2 mins to run for each call and I do not have access to gradient information for the underlying PDF, so Simple appears like it might be perfect for my needs.

Whilst I'd love to try this package, I am having some trouble working out how to initialize a suitable nd-simplex that covers the entire hyper-rectangle corresponding to my allowed parameter space.

I'm imagining that this would be a common requirement for a lot of people wishing to make use of Simple, and so I was wondering if you had any plans to add a utility function for generating an initial nd-simplex with constraints?

Does simple support noisy functions?

Thank you for the great tool! Baysian optimization really is very slow.

One thing I'm unsure about from the readme is whether Simple assumes that calling f(x) once gives the true value of the function, or if it still works if f(x) is heavily noisy, say returns a random value 95% of the time?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.