
pathfinder.jl's Introduction

Pathfinder.jl: Parallel quasi-Newton variational inference


An implementation of Pathfinder,¹ a variational method for initializing Markov chain Monte Carlo (MCMC) methods.

Footnotes

  1. Lu Zhang, Bob Carpenter, Andrew Gelman, Aki Vehtari (2021). Pathfinder: Parallel quasi-Newton variational inference. arXiv: 2108.03782 [stat.ML]. Code

pathfinder.jl's People

Contributors

burtonjosh, dkarrasch, github-actions[bot], sethaxen

pathfinder.jl's Issues

More-Thuente line search fails for posterior

This is more of an Optim issue than a Pathfinder one, but I'm saving this example here for future reference.

Nocedal and Wright recommend using the More-Thuente line search method with L-BFGS. However, for this particular log-density, More-Thuente fails while Hager-Zhang succeeds:

using ForwardDiff, Pathfinder, Random
using Pathfinder.Optim
using Pathfinder.Optim.LineSearches
Random.seed!(42)

z = [-0.32, -1.3, -0.79, 0.88, 0.13, -0.72, -0.31, -0.56, 0.28, 0.7, -0.43, -1.68, -0.91, 1.16, 1.04, -1.35, -0.5, 0.21, -0.08, -1.26]
y = [8.96, 11.48, 10.9, 7.11, 10.84, 12.91, 9.55, 11.04, 8.84, 8.91, 10.44, 12.75, 10.39, 8.27, 9.3, 10.55, 10.74, 10.14, 12.13, 9.77]

# α ~ Flat(), β ~ Normal(0, 1), σ ~ HalfNormal(1), y ~ MvNormal(z + α + β, σ*I)
function logp(x)
    α, β, logσ = x
    σ = exp(logσ)
    return (β^2 + σ^2 + sum(abs2, (z .- y .+ α .+ β) ./ σ)) / -2 + logσ
end
∇logp(x) = ForwardDiff.gradient(logp, x)
julia> θ₀ = rand(3) .* 4 .- 2
3-element Vector{Float64}:
  0.13273206417544525
 -0.18388345765143033
 -1.9292526931401426

julia> q, θ = pathfinder(logp, ∇logp, θ₀, 10; optimizer=LBFGS(m=5, linesearch=MoreThuente()));
[ Info: Optimized for 3 iterations. Maximum ELBO of -Inf reached at iteration 1.

julia> q.μ
3-element Vector{Float64}:
 17.292063592972053
  0.0
 -1.6625205206909156e162

The culprit is the gradient, which explodes during L-BFGS computation:

julia> θs, logpθs, ∇logpθs = Pathfinder.maximize_with_trace(logp, ∇logp, θ₀, LBFGS(m=5, linesearch=MoreThuente()));

julia> ∇logpθs
4-element Vector{Vector{Float64}}:
 [10040.660277421557, 10040.844160879209, 110412.25151284959]
 [-2.8542596163821086e-160, -16.975762325101126, -1.6625205206909156e162]
 [10040.660277421557, 10040.844160879209, 110412.25151284956]
 [10040.660277421557, 10040.844160879209, 110412.25151284956]

However, if we use the HagerZhang line search method, then we get reasonable results.

julia> q, θ = pathfinder(logp, ∇logp, θ₀, 10; optimizer=LBFGS(m=5, linesearch=HagerZhang()));
[ Info: Optimized for 17 iterations. Maximum ELBO of -7.35 reached at iteration 15.

julia> q.μ
3-element Vector{Float64}:
 10.5415000553547
 -4.1972214879016894e-8
  1.1392597435574199

The More-Thuente line search by itself doesn't seem to be the issue though. If we use it with ConjugateGradient, then we likewise get a reasonable answer:

julia> q, θ = pathfinder(logp, ∇logp, θ₀, 10; optimizer=ConjugateGradient(linesearch=MoreThuente()));
[ Info: Optimized for 27 iterations. Maximum ELBO of -7.44 reached at iteration 11.

julia> q.μ
3-element Vector{Float64}:
 10.491138291213138
  0.026184553956604018
  1.1552195552341622

NaNs introduced with 3 posteriordb models

Nikolas Siccha on Slack shared:

I've tried [Pathfinder.jl] for a few (3) models from posteriordb, and pathfinder seems to run into nans relatively often, without being able to recover.

It was the dogs-dogs, kilpisjarvi_mod-kilpisjarvi and diamonds-diamonds posteriors from posteriordb, brought to Julia using bridgestan.

Options for approximation?

Since the MAP step seems to run to convergence anyway, it seems like it would be straightforward to also construct a Laplace approximation. In some settings, it could be useful to have this instead of or in addition to the variational fit. Maybe this could be a configuration option?
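For illustration, one way such an option could work: take the final optimizer state as the MAP estimate and build the normal approximation from the Hessian of the log density there. A minimal sketch, assuming a twice-differentiable logp and a MAP estimate x_map; the helper name is hypothetical, not an existing Pathfinder option:

using Distributions, ForwardDiff, LinearAlgebra

# Hypothetical helper: Laplace approximation centered at a MAP estimate x_map,
# with covariance equal to the inverse of the negative Hessian of logp there.
function laplace_approximation(logp, x_map)
    H = ForwardDiff.hessian(logp, x_map)
    Σ = Symmetric(inv(-H))  # assumes -H is positive definite at the mode
    return MvNormal(x_map, Σ)
end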

Load time is doubled on the latest Julia beta

Julia 1.7.2

julia> @time using Pathfinder
  2.264230 seconds (7.80 M allocations: 538.022 MiB, 5.56% gc time, 33.56% compilation time)

Julia 1.8.0-beta3

julia> @time using Pathfinder
  4.412077 seconds (11.26 M allocations: 765.483 MiB, 7.98% gc time, 55.01% compilation time)

Nothing to do yet, just tracking this for future reference.

Making Optimization (formerly GalacticOptim) optional

On the 1.8 beta, loading GalacticOptim and GalacticOptimJL together makes up over half the load time of Pathfinder.

julia> @time_imports using Pathfinder
    ...
   1172.3 ms    ┌ GalacticOptim
   1237.2 ms  ┌ GalacticOptimJL
    ...
   4552.7 ms  Pathfinder

This is not ideal, since we use these to support experimentation with other optimizers, not for normal usage. Currently we build everything around GalacticOptim, but perhaps we should make this conditionally loaded and default to doing everything with Optim types.

Relates #36
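
For reference, a rough sketch of what conditional loading with Requires.jl could look like; the UUID below is a placeholder and the included file is hypothetical:

using Requires

function __init__()
    # Placeholder UUID; a real entry would use GalacticOptimJL's registered UUID.
    @require GalacticOptimJL = "00000000-0000-0000-0000-000000000000" begin
        include("integrations/galacticoptimjl.jl")  # hypothetical glue code
    end
end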

Benchmark with posteriordb models

We should benchmark the method similarly to the Pathfinder paper by running Pathfinder on many posteriordb models using code adapted from #94 (comment). We shouldn't do this in CI though, because the benchmarks would probably take a long time to run, but it would be nice to show the results in the docs and periodically re-run them.

We should look at how other packages like some in the SciML ecosystem do this.

Using Optimization.MaxSense

Currently we need to negate the user-provided function, gradients, and Hessian to maximize the log-density with Optimization.jl, which by default assumes the optimization problem should be minimized. However, OptimizationProblem can take a sense=Optimization.MaxSense keyword to treat the optimization problem as a maximization one.

This is only worth doing if we can make sure that whether we use Optim.jl or Optimization.jl, the downstream code can be the same.
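
Concretely, using the keyword would let us pass the log density without negating it; a minimal sketch with a stand-in objective (not Pathfinder's current code):

using Optimization, OptimizationOptimJL

logp(x, p) = -sum(abs2, x) / 2  # stand-in log density to be maximized
f = OptimizationFunction(logp, Optimization.AutoForwardDiff())
prob = OptimizationProblem(f, randn(3); sense=Optimization.MaxSense)
sol = solve(prob, LBFGS())  # maximizes logp instead of minimizing it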

Return all intermediates in a custom struct

Currently pathfinder only returns the ELBO-maximizing distribution as well as the requested samples, but for debugging and visualization, it would be better to return a PathfinderResult object that also contains the trace, gradients, and all Laplace approximations.

multipathfinder could return a MultiPathfinderResult object that contains the vector of single-path results, all draws used for importance sampling, and the Pareto k parameter returned by PSIS.
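
A rough sketch of what such a container might hold; the field names are illustrative, not a committed interface:

# Hypothetical result container for a single Pathfinder run.
struct PathfinderResult{D,T<:Real}
    fit_distribution::D             # ELBO-maximizing normal approximation
    draws::Matrix{T}                # requested draws from fit_distribution
    optim_trace::Vector{Vector{T}}  # iterates of the L-BFGS optimization
    gradients::Vector{Vector{T}}    # gradient of the log density at each iterate
    fit_distributions::Vector{D}    # normal approximation at each iterate
    elbo_estimates::Vector{T}       # ELBO estimate for each approximation
end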

Turing integration

We should include the following methods:

  • pathfinder(::DynamicPPL.Model, ndraws; init=DynamicPPL.SampleFromUniform(), kwargs...) -> MCMCChains.Chains
  • multipathfinder(::DynamicPPL.Model, nruns, ndraws; init=DynamicPPL.SampleFromUniform(), kwargs...) -> MCMCChains.Chains

Here the convenience method takes care of

  1. getting an initial point
  2. getting the log-density for the unconstrained form of the model
  3. running pathfinder and getting draws
  4. transforming the draws back to the constrained space
  5. packing the draws into a familiar Chains object, whose info contains the Pathfinder results

This would make it more straightforward for Turing users to incorporate Pathfinder into their workflows.
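
If these methods existed, a Turing user's workflow might look roughly like the following sketch; the model is a stand-in, and the calls mirror the proposed signatures above rather than an existing API:

using Pathfinder, Turing

@model function toy_model(y)
    μ ~ Normal(0, 1)
    σ ~ truncated(Normal(0, 1), 0, Inf)
    for i in eachindex(y)
        y[i] ~ Normal(μ, σ)
    end
end

model = toy_model(randn(100))
chains = pathfinder(model, 200)                # hypothetical single-path method
chains_multi = multipathfinder(model, 8, 200)  # hypothetical multi-path method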

Fixing type-noninferrability from Optimization.solve

Optimization v3.13.1 implemented SciMLBase.init for OptimizationProblem, which caused a check for type-inferrability in our test suite to begin failing (see first time it failed here).

It's not yet clear to me which change broke type-inference, or exactly how broken type-inference is, but perhaps we can fix it by using SciMLBase.init ourselves, followed by SciMLBase.solve!.
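
The suggested workaround would look roughly like the following sketch, which uses only the generic SciMLBase problem interface; the objective here is a stand-in:

using Optimization, OptimizationOptimJL, SciMLBase

f = OptimizationFunction((x, p) -> sum(abs2, x), Optimization.AutoForwardDiff())
prob = OptimizationProblem(f, randn(3))
cache = SciMLBase.init(prob, LBFGS())  # build the solver cache explicitly
sol = SciMLBase.solve!(cache)          # then run the optimization on it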

Choosing AD backend in Turing integration

Does Pathfinder interact with the AD backend chosen in Turing, i.e., Turing.setadbackend? As far as I can tell from the source, there is no way to pass a specific AD backend when using the Turing integration - it seems like ForwardDiff is used as the default.

P.S.
Thanks for developing Pathfinder.jl ! Great to have Pathfinder in the Julia ecosystem.

multipathfinder - no method matching iterate

Dear Seth,

Cool package! I have some trouble running multi-path Pathfinder with the most recent release. The actual computation works fine, but it seems like an iterator is not working when collecting the draws (line 196 in multipath.jl). MWE:

using Distributions #Distributions v0.25.59
using Pathfinder #Pathfinder v0.4.5
using ForwardDiff


function targetclosure(data)
    function target(θ)
        μ = sum(θ)
        return sum( logpdf.(Normal(μ, 1.0), data) )
    end
end

target = targetclosure(randn(1000))
θ = randn(2)
∇logp(θ) = ForwardDiff.gradient(target, θ)

#using Pathfinder works -> posterior mean of θ == 0
result = pathfinder(target, ∇logp; init = θ)

#using multipathfinder
Nruns = 10
θ_initial = [zeros(2) for _ in Base.OneTo(Nruns)]
for iter in eachindex(θ_initial)
    θ_initial[iter] = randn(2)
end

q, ϕ, component_ids = multipathfinder(
    target,
    ∇logp,
    Nruns;
    init = θ_initial
) #MethodError: no method matching iterate(::MultiPathfinderResult{Tuple{var"#target#8"{Vector{Float64}}, typeof(∇logp)}

Best regards,
Patrick

Support alternative ways of choosing normal approximations

Given an optimization trace, Pathfinder proposes the multivariate normal approximation constructed from the trace that maximizes the ELBO. It does this by approximating the ELBO at each point.

The discussion notes that instead of exhaustively approximating the ELBO at each point, Bayesian optimization could be used to optimize over (or even between) the points. More generally, we could allow objective functions other than the ELBO and allow any discrete optimizer to be provided. While we could interpolate means between points, we'd need to think a bit about how to interpolate covariances.
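
For instance, the selection step could be written generically so that the ELBO estimator is just the default scoring function; a sketch with hypothetical names:

# Hypothetical sketch: pick the best candidate fit by an arbitrary objective.
# `candidate_dists` would be the normal approximation at each trace point, and
# `objective` any function returning a score to maximize (e.g. an ELBO estimate).
function best_fit(objective, candidate_dists)
    scores = map(objective, candidate_dists)
    best_idx = argmax(scores)
    return candidate_dists[best_idx], scores[best_idx], best_idx
end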

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

Making subpackages for compatibility with HMC packages

Currently we use Requires to conditionally load DynamicHMC or AdvancedHMC and implement functionality for compatibility. We should consider instead making subpackages PathfinderDynamicHMC and PathfinderAdvancedHMC for better dependency management.

Switching to Hager-Zhang line search

Optim.LBFGS defaults to LineSearches.HagerZhang for its line search, while we currently use LineSearches.MoreThuente in keeping with the Pathfinder paper.

However, in informal tests, I tend to encounter more numerical issues with MoreThuente, and Pathfinder needs to retry much more often. Moreover, note the different performance between the two methods on a multivariate normal log density:

julia> using Pathfinder, LinearAlgebra, Optim

julia> Σ = [
           2.71   0.5    0.19   0.07   1.04
           0.5    1.11  -0.08  -0.17  -0.08
           0.19  -0.08   0.26   0.07  -0.7
           0.07  -0.17   0.07   0.11  -0.21
           1.04  -0.08  -0.7   -0.21   8.65
       ];

julia> μ = [-0.55, 0.49, -0.76, 0.25, 0.94];

julia> P = inv(Symmetric(Σ));

julia> function logp_mvnormal(x)
           z = x - μ
           return -dot(z, P, z) / 2
       end;

julia> result = pathfinder(logp_mvnormal; dim=5, init_scale=4)
Single-path Pathfinder result
  tries: 1
  draws: 5
  fit iteration: 25 (total: 27)
  fit ELBO: 3.98 ± 0.18
  fit distribution: MvNormal{Float64, Pathfinder.WoodburyPDMat{Float64, Diagonal{Float64, Vector{Float64}}, Matrix{Float64}, Matrix{Float64}, Diagonal{Float64, Vector{Float64}}, LinearAlgebra.QRCompactWYQ{Float64, Matrix{Float64}}, UpperTriangular{Float64, Matrix{Float64}}}, Vector{Float64}}(
dim: 5
μ: [-0.5499999997063411, 0.4899999984855008, -0.7599999997859143, 0.2500000006537195, 0.9399999998847138]
Σ: [3.033268135838942 0.4958177031900541 … 0.2454020867450207 1.1825514484977466; 0.4958177031900541 1.074411231505785 … -0.16076485670008017 -0.0911342669944741; … ; 0.2454020867450207 -0.16076485670008017 … 0.20144627493353065 -0.12991587816970984; 1.1825514484977466 -0.0911342669944741 … -0.12991587816970984 8.704692432462565]
)


julia> Σ - result.fit_distribution.Σ
5×5 Matrix{Float64}:
 -0.323268    0.0041823    0.0499336   -0.175402    -0.142551
  0.0041823   0.0355888   -0.00722606  -0.00923514   0.0111343
  0.0499336  -0.00722606  -0.00640033   0.0291832    0.019554
 -0.175402   -0.00923514   0.0291832   -0.0914463   -0.0800841
 -0.142551    0.0111343    0.019554    -0.0800841   -0.0546924

julia> result2 = pathfinder(logp_mvnormal; dim=5, init_scale=4, optimizer=LBFGS(m=6))
Single-path Pathfinder result
  tries: 1
  draws: 5
  fit iteration: 5 (total: 5)
  fit ELBO: 3.84 ± 0.0
  fit distribution: MvNormal{Float64, Pathfinder.WoodburyPDMat{Float64, Diagonal{Float64, Vector{Float64}}, Matrix{Float64}, Matrix{Float64}, Diagonal{Float64, Vector{Float64}}, LinearAlgebra.QRCompactWYQ{Float64, Matrix{Float64}}, UpperTriangular{Float64, Matrix{Float64}}}, Vector{Float64}}(
dim: 5
μ: [-0.55, 0.49, -0.76, 0.25, 0.94]
Σ: [2.709999999999999 0.49999999999999983 … 0.07000000000000031 1.0399999999999991; 0.49999999999999983 1.1099999999999994 … -0.17000000000000087 -0.08000000000000052; … ; 0.07000000000000031 -0.17000000000000087 … 0.11000000000000121 -0.21000000000000002; 1.0399999999999991 -0.08000000000000052 … -0.21000000000000002 8.65]
)

julia> Σ - result2.fit_distribution.Σ
5×5 Matrix{Float64}:
  8.88178e-16   5.55112e-16  -1.94289e-16  -1.52656e-16  -2.22045e-16
 -3.33067e-16  -2.22045e-16   8.18789e-16   3.05311e-16   6.245e-16
 -8.32667e-17   8.46545e-16   2.22045e-16   7.21645e-16  -2.22045e-16
 -2.22045e-16   6.93889e-16   5.13478e-16  -1.20737e-15   1.38778e-16
  6.66134e-16   5.13478e-16  -2.22045e-16   0.0          -1.77636e-15

So Hager-Zhang exactly fits the target distribution, while Moré-Thuente only gets close. This result is reproducible. It's not clear whether this is due to some instability/bug in MoreThuente or if this is expected.

If this trend continues for more models, we should consider switching to HagerZhang as the default linesearch.

Multi-threaded multi-path Pathfinder broken with recent Transducers versions

Transducers v0.4.76 has caused our tests to fail (see first failure here).

Some not-necessarily-relevant notes:

┌ Warning: `reduce(rf, itr::Foldable; kw...)` is deprecated, use `foldxt(rf, itr; kw...)` instead.
│   caller = multipathfinder(optim_fun::OptimizationFunction{true, SciMLBase.NoAD, Pathfinder.var"#f#5"{LogDensityFunctionWithGradHess{var"#logp#44"{FullNormal}, var"#∇f#40"{var"#logp#44"{FullNormal}}, var"#Hf#41"{var"#logp#44"{FullNormal}}}}, Pathfinder.var"#grad#6"{LogDensityFunctionWithGradHess{var"#logp#44"{FullNormal}, var"#∇f#40"{var"#logp#44"{FullNormal}}, var"#Hf#41"{var"#logp#44"{FullNormal}}}}, Pathfinder.var"#hess#7"{LogDensityFunctionWithGradHess{var"#logp#44"{FullNormal}, var"#∇f#40"{var"#logp#44"{FullNormal}}, var"#Hf#41"{var"#logp#44"{FullNormal}}}}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED_NO_TIME), Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing}, ndraws::Int64; init::Nothing, input::LogDensityFunctionWithGradHess{var"#logp#44"{FullNormal}, var"#∇f#40"{var"#logp#44"{FullNormal}}, var"#Hf#41"{var"#logp#44"{FullNormal}}}, nruns::Int64, ndraws_elbo::Int64, ndraws_per_run::Int64, rng::MersenneTwister, history_length::Int64, optimizer::LBFGS{Nothing, LineSearches.InitialStatic{Float64}, LineSearches.MoreThuente{Float64}, Optim.var"#20#22"}, executor::SequentialEx{NamedTuple{(), Tuple{}}}, executor_per_run::SequentialEx{NamedTuple{(), Tuple{}}}, importance::Bool, kwargs::Base.Pairs{Symbol, Int64, Tuple{Symbol}, NamedTuple{(:dim,), Tuple{Int64}}}) at multipath.jl:182
└ @ Pathfinder ~/work/Pathfinder.jl/Pathfinder.jl/src/multipath.jl:182
rng = TaskLocalRNG(): Error During Test at /home/runner/work/Pathfinder.jl/Pathfinder.jl/test/multipath.jl:31

Supporting inputs implementing the LogDensityProblems interface

Both DynamicHMC and now also AdvancedHMC (TuringLang/AdvancedHMC.jl#301) accept objects that implement the LogDensityProblems interface as inputs. It makes sense for Pathfinder to accept such objects as well. We might even consider switching to using LogDensityProblems internally.

I don't think we should switch to using LogDensityProblems internally, mainly because currently the LogDensityProblems API only supports computing the log density and its gradient, while we want to also support optimizers that compute the Hessian.

Turing is starting to use LogDensityProblems under the hood, but to keep support for Hessians, we should probably continue to use its OptimizationFunction.

It might be worth pushing to have Hessian abilities in the LogDensityProblems API.
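
Accepting such inputs would amount to a thin adapter around the documented LogDensityProblems accessors, roughly like this sketch; the toy problem and the adapter are illustrative only:

using LogDensityProblems

# Toy object implementing the LogDensityProblems interface.
struct StdNormalProblem end
LogDensityProblems.logdensity(::StdNormalProblem, x) = -sum(abs2, x) / 2
LogDensityProblems.dimension(::StdNormalProblem) = 3
LogDensityProblems.capabilities(::Type{StdNormalProblem}) =
    LogDensityProblems.LogDensityOrder{0}()

ℓ = StdNormalProblem()
# Hypothetical adapter to the callable form pathfinder currently accepts:
logp(x) = LogDensityProblems.logdensity(ℓ, x)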

Move WoodburyPDMat code to PDMatsExtras

The WoodburyPDMat code here is more general than that in PDMatsExtras.jl. To support the more general case, we use a different set of decompositions. If our version is not much less efficient than PDMatsExtras on the same matrices it supports, then this code should be moved out to PDMatsExtras.
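
For context, the structure in question is (as I understand it) the positive definite Woodbury form W = A + B D B'; a dense sanity check of that form with plain LinearAlgebra, independent of either package's WoodburyPDMat type:

using LinearAlgebra

# Dense check of the Woodbury form W = A + B * D * B' for a small example.
n, k = 5, 2
A = Diagonal(rand(n) .+ 1)  # positive definite diagonal term
B = randn(n, k)             # low-rank factor
D = Diagonal(rand(k) .+ 1)  # positive definite core
W = Symmetric(Matrix(A) + B * Matrix(D) * B')
@assert isposdef(W)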

Add more examples to docs

We should add at least the following examples to the docs:

  • banana distribution (#68); a sketch of one possible log density appears after this list
  • a multi-modal distribution
  • donut
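
As a starting point for the first item, one possible banana-shaped target is a Rosenbrock-like log density; a minimal sketch (parameterization chosen for illustration):

using Pathfinder

# Rosenbrock-like banana-shaped log density.
function logp_banana(x)
    a, b = 1.0, 10.0
    return -((a - x[1])^2 + b * (x[2] - x[1]^2)^2)
end

result = pathfinder(logp_banana; dim=2, init_scale=4)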
