
Comments (3)

HeSchatz commented on May 23, 2024

Hey there. I gave implementing the zoom line search algorithm a try, as I find it to be a generally reliable method, at least for some of the problems I have tackled in the past.

The main obstacle (apart from getting used to the JAX control flow for nested if-expressions) is how to retrieve the descent information needed to evaluate the decrease and curvature conditions and to interpolate the step sizes. In the current implementation of the search and descent interfaces there seems to be no general way to retrieve the descent direction after the descent step has been made. Also, the descent methods currently have no common naming scheme for the descent direction, if it is even calculated explicitly (I get that this may not always be necessary), which further complicates things. My understanding is that one would have to divide the current step by the step size to (re-)evaluate the descent direction, which is generally a bad idea for a quantity that is supposed to shrink as the solver converges. Or am I missing something obvious?

If not, my proposal would be to modify the interface of the descent methods so that the current descent direction can be retrieved in a unified way. I also think the abstract line search step method should take an additional argument for the descent direction, or even the descent state, as complex line search methods will most likely need this in one way or another.

On a side note, I'm also working on implementing the L-BFGS algorithm (kind of finished, at least from a methodological point of view), which also relies on common interfaces for descents.

Looking forward to discussing this topic further!


packquickly commented on May 23, 2024

Hello! Thanks for the interest in implementing some new line searches.

Just to be clear, for a minimisation problem of a function $f\colon \mathbb{R}^n \to \mathbb{R}$, an Optimistix descent $\delta\colon \mathbb{R}_+ \to \mathbb{R}^n$ is a curve in $\mathbb{R}^n$ parameterised by the step-size. Line searches handle the inner optimisation loop of choosing $\alpha$ such that $f(y_k + \delta(\alpha))$ is approximately minimised with respect to $\alpha$. We only do the classical line-search algorithm $f(y_k + \alpha p_k)$ for a fixed descent direction $p_k \in \mathbb{R}^n$ when using certain descents, such as NewtonDescent and SteepestDescent. We don't have a common name/interface for the descent direction $p_k$ because most of the descents don't compute a quantity like $p_k$ at all, so it's intentionally not supported. It also shouldn't be needed for the Wolfe conditions (or related conditions).
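As a rough sketch of the distinction (plain JAX, not the actual optimistix interfaces): the classical line search fixes a direction $p_k$ and scales it, which is just the simplest possible descent curve.

```python
import jax
import jax.numpy as jnp

# Toy objective, used only for illustration.
def f(y):
    return jnp.sum((y - 1.0) ** 2) + 0.1 * jnp.sum(y**4)

y_k = jnp.array([2.0, -1.0, 0.5])
p_k = -jax.grad(f)(y_k)  # a fixed descent direction, as in a classical line search

def delta(alpha):
    # The simplest descent curve: a straight line through y_k, delta(alpha) = alpha * p_k.
    # A general descent may trace any curve in alpha, not just a line.
    return alpha * p_k

def phi(alpha):
    # The scalar function the line search approximately minimises over alpha.
    return f(y_k + delta(alpha))

print([float(phi(a)) for a in (0.01, 0.1, 0.5)])
```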

$\delta(\alpha)$ is computed with the step method on any descent (see the abstract base class here). The way this gets passed to the line search is via the y_eval argument in the line search step method: y_eval is the proposed next step for a given step-size $\alpha_k$. For a step-size $\alpha_k$ the actual diff $\delta(\alpha_k)$ can then be computed by subtracting y from y_eval; see the BacktrackingArmijo line search for an example.
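In sketch form (the exact optimistix signatures differ, so treat this only as the pattern), the line search reconstructs the diff from the two pytrees it is handed:

```python
import jax.numpy as jnp
import jax.tree_util as jtu

# `y` is the current iterate and `y_eval` the proposed iterate y + delta(alpha_k);
# both are arbitrary pytrees with the same structure.
y = {"a": jnp.array([1.0, 2.0]), "b": jnp.array(0.5)}
y_eval = {"a": jnp.array([0.8, 1.7]), "b": jnp.array(0.4)}

# Recover the diff delta(alpha_k) = y_eval - y leaf by leaf.
diff = jtu.tree_map(lambda new, old: new - old, y_eval, y)
```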

As for technical details of implementing the Wolfe conditions, there should be no issues here. The classic Wolfe conditions for current iterate $y_k \in \mathbb{R}^n$, proposed step-size $\alpha_k \in \mathbb{R}$ and descent direction $p_k \in \mathbb{R}^n$ are the sufficient decrease condition
$$f(y_k + \alpha_k p_k) < f(y_k) + c_1 \alpha_k p_k^T \nabla f(y_k)$$
and the curvature condition
$$\nabla f(y_k + \alpha_k p_k)^T p_k \geq c_2 \nabla f(y_k)^T p_k$$
for some constants $c_1, c_2$. We can replace the traditional line-search step $\alpha_k p_k$ with the generalised descent $\delta(\alpha_k)$, giving the sufficient decrease condition
$$f(y_k + \delta(\alpha_k)) < f(y_k) + c_1 \delta(\alpha_k)^T \nabla f(y_k)$$
and the curvature condition
$$\nabla f(y_k + \delta(\alpha_k))^T \delta(\alpha_k) \geq c_2 \nabla f(y_k)^T \delta(\alpha_k)$$
without changing the meaning of either of the conditions. When $\delta(\alpha) = \alpha p_k$ these are equivalent to the classical Wolfe conditions. Different relaxations of the Wolfe conditions should be generalisable in roughly the same way.
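As a sketch of how the generalised conditions could be checked (plain arrays and a hypothetical helper, not the optimistix implementation):

```python
import jax
import jax.numpy as jnp

def wolfe_conditions(f, y, diff, c1=1e-4, c2=0.9):
    """Check the generalised Wolfe conditions for a proposed step y -> y + diff.

    `diff` plays the role of delta(alpha_k); when delta(alpha) = alpha * p_k this
    reduces to the classical Wolfe conditions.
    """
    grad_y = jax.grad(f)(y)
    grad_y_eval = jax.grad(f)(y + diff)
    slope = jnp.dot(diff, grad_y)                      # delta(alpha_k)^T grad f(y_k)
    sufficient_decrease = f(y + diff) < f(y) + c1 * slope
    curvature = jnp.dot(grad_y_eval, diff) >= c2 * slope
    return sufficient_decrease, curvature
```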

Does this clear things up?


HeSchatz commented on May 23, 2024

Thank you for taking the time to write down the general line search problem and the current implementation in such detail. This helps a lot in clarifying the problem I'm currently facing.

First, you are totally right about the Wolfe conditions. I understand now that the descent information is computed by subtracting y from the proposed iterate y_eval, which gives

$\delta(\alpha_k) = \alpha_k p_k $

Evaluating the sufficient decrease condition with $\delta(\alpha_k)$ is straightforward, but I had not noticed that multiplying the gradient by $p_k$ or by $\delta(\alpha_k)$ on both sides of the curvature condition gives equivalent conditions.
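Writing that out for my own benefit: since $\delta(\alpha_k) = \alpha_k p_k$ with $\alpha_k > 0$, the two forms of the curvature condition differ only by the positive factor $\alpha_k$,

$$\nabla f(y_k + \delta(\alpha_k))^T \delta(\alpha_k) \geq c_2 \nabla f(y_k)^T \delta(\alpha_k) \;\iff\; \alpha_k \left[ \nabla f(y_k + \alpha_k p_k)^T p_k \right] \geq \alpha_k \left[ c_2 \nabla f(y_k)^T p_k \right],$$

and dividing by $\alpha_k$ recovers the classical curvature condition.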

The remaining problem is the computation of new trial step sizes $\alpha_{k+1}$ in the zoom phase of the line search algorithm suggested in Nocedal & Wright (2006), Numerical Optimization, 2nd ed. (Algorithms 3.6 and 3.5, respectively).

They propose computing new trial step sizes $\alpha_{k+1}$ by minimizing the cubic function interpolating $\phi(\alpha_{k-1})$, $\phi'(\alpha_{k-1})$, $\phi(\alpha_k)$, and $\phi'(\alpha_k)$. The new step size can then be computed as:

$\alpha_{k+1} = \alpha_k - (\alpha_k - \alpha_{k-1}) \left[ \frac{\phi'(\alpha_k) + d_2 - d_1}{\phi'(\alpha_k) - \phi'(\alpha_{k - 1}) + 2 d_2} \right]$

with

$d_1 = \phi'(\alpha_{k-1}) + \phi'(\alpha_k) - 3 \frac{\phi(\alpha_{k-1}) - \phi(\alpha_k)}{\alpha_{k-1} - \alpha_k}$

$d_2 = \left[ d_1^2 - \phi'(\alpha_{k-1}) \phi'(\alpha_k) \right]^{1/2}$

This requires explicit computation of the derivative

$\phi'(\alpha) = \nabla f(y_k + \alpha p_k)^T p_k = \nabla f(y_k + \delta(\alpha))^T \delta'(\alpha)$

of

$\phi(\alpha) = f(y_k + \alpha p_k) = f(y_k + \delta(\alpha))$

which requires computation of the descent direction $p_k$. At least I could not come up with a way to evaluate the minimizer without doing so.
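For concreteness, here is a rough sketch of that interpolation update (plain JAX, my own function name; the $\phi$ and $\phi'$ values are assumed to have been computed already):

```python
import jax.numpy as jnp

def cubic_interp_step(a_prev, a_curr, phi_prev, dphi_prev, phi_curr, dphi_curr):
    # One cubic-interpolation update following the formula quoted above:
    # a_prev, a_curr are alpha_{k-1}, alpha_k, and phi_* / dphi_* the
    # corresponding values of phi and phi'.
    d1 = dphi_prev + dphi_curr - 3.0 * (phi_prev - phi_curr) / (a_prev - a_curr)
    # Guard against a slightly negative radicand caused by round-off; the
    # textbook formula assumes it is non-negative.
    d2 = jnp.sqrt(jnp.maximum(d1**2 - dphi_prev * dphi_curr, 0.0))
    return a_curr - (a_curr - a_prev) * (dphi_curr + d2 - d1) / (
        dphi_curr - dphi_prev + 2.0 * d2
    )
```

The sticking point is exactly the `dphi_prev` and `dphi_curr` inputs, i.e. $\phi'(\alpha_{k-1})$ and $\phi'(\alpha_k)$, which is where the descent direction $p_k$ (or, more generally, $\delta'(\alpha)$) enters.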

Please let me know if you have any suggestions. Maybe I'm once again misled by the usual notation.

Edit: fixed some typos in equations

