This is more of an Optim issue than a Pathfinder one, but I'm saving this example here for future reference.
Nocedal and Wright recommend pairing L-BFGS with the More-Thuente line search method. However, for this particular log-density, More-Thuente fails while Hager-Zhang succeeds:
using ForwardDiff, Pathfinder, Random
using Pathfinder.Optim
using Pathfinder.Optim.LineSearches
Random.seed!(42)
z = [-0.32, -1.3, -0.79, 0.88, 0.13, -0.72, -0.31, -0.56, 0.28, 0.7, -0.43, -1.68, -0.91, 1.16, 1.04, -1.35, -0.5, 0.21, -0.08, -1.26]
y = [8.96, 11.48, 10.9, 7.11, 10.84, 12.91, 9.55, 11.04, 8.84, 8.91, 10.44, 12.75, 10.39, 8.27, 9.3, 10.55, 10.74, 10.14, 12.13, 9.77]
# α ~ Flat(), β ~ Normal(0, 1), σ ~ HalfNormal(1), y ~ MvNormal(z + α + β, σ*I)
function logp(x)
    α, β, logσ = x
    σ = exp(logσ)
    return (β^2 + σ^2 + sum(abs2, (z .- y .+ α .+ β) ./ σ)) / -2 + logσ
end
∇logp(x) = ForwardDiff.gradient(logp, x)
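For reference, the objective being maximized is (up to additive constants, transcribed directly from `logp` above)

$$\log p(\alpha, \beta, \log\sigma) = -\frac{1}{2}\left(\beta^2 + \sigma^2 + \sum_{i=1}^{20} \frac{(z_i - y_i + \alpha + \beta)^2}{\sigma^2}\right) + \log\sigma, \qquad \sigma = e^{\log\sigma},$$

where the trailing $\log\sigma$ is the Jacobian term from optimizing over the unconstrained $\log\sigma$ instead of $\sigma$.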
julia> θ₀ = rand(3) .* 4 .- 2
3-element Vector{Float64}:
0.13273206417544525
-0.18388345765143033
-1.9292526931401426
julia> q, θ = pathfinder(logp, ∇logp, θ₀, 10; optimizer=LBFGS(m=5, linesearch=MoreThuente()));
[ Info: Optimized for 3 iterations. Maximum ELBO of -Inf reached at iteration 1.
julia> q.μ
3-element Vector{Float64}:
17.292063592972053
0.0
-1.6625205206909156e162
The culprit is the gradient, which explodes during the L-BFGS optimization:
julia> θs, logpθs, ∇logpθs = Pathfinder.maximize_with_trace(logp, ∇logp, θ₀, LBFGS(m=5, linesearch=MoreThuente()));
julia> ∇logpθs
4-element Vector{Vector{Float64}}:
[10040.660277421557, 10040.844160879209, 110412.25151284959]
[-2.8542596163821086e-160, -16.975762325101126, -1.6625205206909156e162]
[10040.660277421557, 10040.844160879209, 110412.25151284956]
[10040.660277421557, 10040.844160879209, 110412.25151284956]
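This blow-up is consistent with the $e^{-2\log\sigma}$ scaling of the residual term in the gradient: differentiating `logp` by hand with respect to `logσ` gives $1 - \sigma^2 + S/\sigma^2$, where $S$ is the residual sum of squares, so a single overly long line-search step in the negative `logσ` direction makes the gradient astronomical. A quick sketch, with an illustrative $S$ of roughly the scale this data gives near θ₀:

```julia
# ∂logp/∂logσ = 1 - σ² + S/σ², derived by hand from logp above,
# where S = Σᵢ (zᵢ - yᵢ + α + β)² is the residual sum of squares.
S = 2000.0  # illustrative value, roughly the scale of this data at θ₀
for logσ in (-1.0, -5.0, -10.0)
    σ = exp(logσ)
    # the S/σ² = S·exp(-2logσ) term dominates and grows exponentially
    println("logσ = ", logσ, " => ∂logp/∂logσ ≈ ", 1 - σ^2 + S / σ^2)
end
```

By `logσ = -10` the partial derivative is already on the order of 10¹², which matches the magnitudes in the trace above.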
However, if we use the HagerZhang line search method, then we get reasonable results:
julia> q, θ = pathfinder(logp, ∇logp, θ₀, 10; optimizer=LBFGS(m=5, linesearch=HagerZhang()));
[ Info: Optimized for 17 iterations. Maximum ELBO of -7.35 reached at iteration 15.
julia> q.μ
3-element Vector{Float64}:
10.5415000553547
-4.1972214879016894e-8
1.1392597435574199
The More-Thuente line search by itself doesn't seem to be the issue, though. If we use it with ConjugateGradient, then we likewise get a reasonable answer:
julia> q, θ = pathfinder(logp, ∇logp, θ₀, 10; optimizer=ConjugateGradient(linesearch=MoreThuente()));
[ Info: Optimized for 27 iterations. Maximum ELBO of -7.44 reached at iteration 11.
julia> q.μ
3-element Vector{Float64}:
10.491138291213138
0.026184553956604018
1.1552195552341622