TuringLang / Bijectors.jl
Implementation of normalising flows and constrained random variable transformations
Home Page: https://turinglang.org/Bijectors.jl/
License: MIT License
I'm currently trying to build a model in Turing that utilizes the newly added LKJ distribution from Distributions.jl. I ran into issues when I attempted to sample the model:
ERROR: MethodError: no method matching bijector(::LKJ{Float64,Int64})
Closest candidates are:
bijector(::KSOneSided) at /Users/joshualeond/.julia/packages/Bijectors/bHaf6/src/transformed_distribution.jl:58
bijector(::Product{Discrete,T,V} where V<:AbstractArray{T,1} where T<:Distribution{Univariate,Discrete}) at /Users/joshualeond/.julia/packages/Bijectors/bHaf6/src/transformed_distribution.jl:39
bijector(::Product{Continuous,T,Tdists} where Tdists<:(FillArrays.Fill{T,1,Axes} where Axes) where T<:Distribution{Univariate,Continuous}) at /Users/joshualeond/.julia/packages/Bijectors/bHaf6/src/compat/distributionsad.jl:16
It appears that the LKJ distribution may need to be added here in Bijectors:
Line 202 in 1f3b581
However, after I simply added LKJ to this line I hit the following error:
ERROR: MethodError: no method matching getlogp(::LKJ{Float64,Int64}, ::Cholesky{Float64,Array{Float64,2}}, ::Array{Float64,2})
Closest candidates are:
getlogp(::MatrixBeta, ::Any, ::Any) at C:\Users\dunjos0\.julia\dev\Bijectors\src\Bijectors.jl:231
getlogp(::Wishart, ::Any, ::Any) at C:\Users\dunjos0\.julia\dev\Bijectors\src\Bijectors.jl:236
getlogp(::InverseWishart, ::Any, ::Any) at C:\Users\dunjos0\.julia\dev\Bijectors\src\Bijectors.jl:239
So it looks like a new getlogp method needs to be added in Bijectors for the LKJ distribution. I can see that these getlogp methods closely follow the logkernel methods in Distributions.jl, but I am unsure what needs to be adjusted to make sure it works nicely with Turing.
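For reference, here is a rough sketch of what such a method might look like, assuming it should mirror Distributions.jl's logkernel for the LKJ density; the exact signature and normalisation used by Bijectors may well differ:
using Distributions, LinearAlgebra

# Hypothetical sketch only: the LKJ log-kernel is (η - 1) * logdet(X), and
# logdet of the Cholesky factorisation equals logdet of the original matrix.
function getlogp(d::LKJ, Xcf::Cholesky, X::AbstractMatrix)
    _, η = Distributions.params(d)
    return (η - 1) * logdet(Xcf)
end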
Right now, when Roots.find_zero() fails (particularly with PlanarLayer and RadialLayer), the behavior is to stop everything. It would be nice to have a helper message on what to correct/improve to make the model more stable, or to have a fallback which makes approximations but is more stable!
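A minimal sketch of what such a helper could look like; find_zero_with_hint is hypothetical and not part of Bijectors.jl:
using Roots

# Wrap the root solve used when inverting a layer and, on failure, point the user
# towards likely fixes instead of simply aborting.
function find_zero_with_hint(f, x0)
    try
        return Roots.find_zero(f, x0)
    catch err
        error("Inverting the flow layer failed during root-finding ($err). " *
              "Consider re-initialising the layer parameters or taking smaller steps.")
    end
end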
This issue is used to trigger TagBot; feel free to unsubscribe.
If you haven't already, you should update your TagBot.yml
to include issue comment triggers.
Please see this post on Discourse for instructions and more details.
If you'd like for me to do this for you, comment TagBot fix
on this issue.
I'll open a PR within a few hours, please be patient!
It seems the @debug statements, when expanded, are causing a type instability and a performance hit. ref: #17
Currently the Bijector interface doesn't "officially" support batch computations. By this I mean: what do you do when a SimplexBijector receives an AbstractMatrix rather than the expected AbstractVector? In some cases handling this is fairly straightforward, e.g. the SimplexBijector only makes sense when applied to a vector, so when we receive a matrix we know it's a batch and we can just do the right thing. In other cases what to do with a Real vs. an AbstractVector{<:Real} is not so clear, e.g. Exp applied elementwise is a well-defined bijection on Real, AbstractVector, etc., so what do we do in logabsdetjac(b::Exp, x) when x isa AbstractArray? Is x a batch of inputs, so we should return [y[i] for i = 1:length(x)], or is it intended as a vector, so we should return sum(y) as is needed for the distribution MvLogNormal?
To drive the point home, what do we do in this situation:
julia> struct Scale{T}
a::T
end
julia> (b::Scale)(x) = b.a .* x
julia> b = Scale(2.0)
Scale{Float64}(2.0)
julia> x = randn()
0.2414826706733002
julia> b(x)
0.4829653413466004
julia> logabsdetjac(b::Scale, x) = log(abs(b.a))
logabsdetjac (generic function with 1 method)
julia> logabsdetjac(b, x)
0.6931471805599453
julia> x = rand(2) # suppose we want to scale vectors elementwise instead
2-element Array{Float64,1}:
0.7910530695591607
0.03333662406026572
julia> b(x) # transform still works nicely
2-element Array{Float64,1}:
1.5821061391183213
0.06667324812053144
julia> logabsdetjac(b, x)
0.6931471805599453
julia> x = rand(2) # batch of 2
2-element Array{Float64,1}:
0.546411899713563
0.16872866221657734
julia> b(x) # transform works on batch as expected
2-element Array{Float64,1}:
1.092823799427126
0.3374573244331547
julia> logabsdetjac(b, x) # NO! Want this to return [logabsdetjac(b, x[1]), logabsdetjac(b, x[2])]!
0.6931471805599453
In this example we could use the parametric type T of Scale{T} to decide what to do, but what do we do when T <: AbstractVector and logabsdetjac is given an AbstractMatrix? Is Scale intended to act on a matrix by scaling each column elementwise, or is it intended to act on a vector and this matrix is actually a batch?
At the time of writing, some of the bijectors support batch computation by "accident": for SimplexBijector, say, there is only one sensible thing to do when given an AbstractMatrix. But it's crucial that all bijectors support this feature for it to be of any use; otherwise a lot of incompatibilities will arise when users compose two bijectors where one supports batch computation and the other does not, and you end up with size mismatches in the logabsdetjac field.
Univariate, Multivariate, etc.
We could introduce something similar to Distributions.jl's Univariate, Multivariate, etc. The example of Scale would then instead be
struct Scale{O, T}
a::T
end
(b::Scale)(x) = b.a .* x
logabsdetjac(b::Scale{Univariate}, x::Real) = log(abs(b.a))
logabsdetjac(b::Scale{Univariate}, x::AbstractVector) = log(abs(b.a)) .* ones(length(x))
logabsdetjac(b::Scale{Multivariate}, x::AbstractVector{<:Real}) = sum(log(abs(b.a)))
logabsdetjac(b::Scale{Multivariate}, x::AbstractMatrix{<:Real}) = sum(log(abs(b.a))) .* ones(size(x, 2))
and so on.
As suggested by @xukai92, this can also be used to specify the device location if we just use aliases like Univariate = Real, etc. This way one could have specific implementations for CuArray, etc.
The main issue I see is that we'd then have to force this upon all bijectors, for some of which it is redundant information since they are only well-defined on a single specific space.
A super-simple solution is to force each bijector to specify the dimensions of its output upon construction. I personally don't like it, as it can be very inflexible, and in most cases the bijection is well-defined on a particular space, so the dimension would be quite redundant.
Running ] test raises:
PlanarLayer: Test Failed at /home/yiyuezhuo/.julia/packages/Bijectors/66bcj/test/norm_flows.jl:30
Expression: ≈((inv(flow))(flow(z)), z, rtol = 0.2)
Evaluated: [2.2075135426840555 -0.18182425891670018 … 1.3994120484164432 0.9248803865772497; -1.5719067963233282 -0.1836889984451685 … 0.7022959819078269 0.27282017311337303] ≈ [1.9716204287413468 -0.15707698104964826 … 1.3606109850208474 0.4787377009574211; -1.3961992590989196 -0.20212227601885754 … 0.7311973736313007 0.6051343777576946] (rtol=0.2)
Stacktrace:
[1] top-level scope at /home/yiyuezhuo/.julia/packages/Bijectors/66bcj/test/norm_flows.jl:30
[2] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1115
[3] top-level scope at /home/yiyuezhuo/.julia/packages/Bijectors/66bcj/test/norm_flows.jl:23
PlanarLayer: Test Failed at /home/yiyuezhuo/.julia/packages/Bijectors/66bcj/test/norm_flows.jl:31
Expression: ≈((inv(flow) ∘ flow)(z), z, rtol = 0.2)
Evaluated: [2.2075135426840555 -0.18182425891670018 … 1.3994120484164432 0.9248803865772497; -1.5719067963233282 -0.1836889984451685 … 0.7022959819078269 0.27282017311337303] ≈ [1.9716204287413468 -0.15707698104964826 … 1.3606109850208474 0.4787377009574211; -1.3961992590989196 -0.20212227601885754 … 0.7311973736313007 0.6051343777576946] (rtol=0.2)
Stacktrace:
[1] top-level scope at /home/yiyuezhuo/.julia/packages/Bijectors/66bcj/test/norm_flows.jl:31
[2] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1115
[3] top-level scope at /home/yiyuezhuo/.julia/packages/Bijectors/66bcj/test/norm_flows.jl:23
Test Summary: | Pass Fail Total
PlanarLayer | 11 2 13
Strangely, this test passed in Julia 1.4.1.
I guess it should be attributed to the seed (seed!(1) before the tests), since even Julia 1.4.1, which passes the test, raises similar errors when the corresponding code is run multiple times or with a different seed (e.g. seed!(3)). Is it valid to specify a new seed to make this test pass? But this would introduce subtle version-compatibility problems.
With SampleFromPrior and DefaultContext:
julia> @model function bernoulli_mixture(x)
w ~ Dirichlet(2, 1.0)
p ~ DiscreteNonParametric([0.3, 0.7], w)
x ~ Bernoulli(p)
end
julia> vi = VarInfo();
julia> bernoulli_mixture(false)(vi, SampleFromPrior(), DefaultContext())
false
julia> bernoulli_mixture(false)(vi, SampleFromPrior(), DefaultContext())
ERROR: MethodError: no method matching eps(::Type{Real})
Closest candidates are:
eps(::Dates.Time) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.3/Dates/src/types.jl:387
eps(::Dates.Date) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.3/Dates/src/types.jl:386
eps(::Dates.DateTime) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.3/Dates/src/types.jl:385
...
Stacktrace:
[1] logpdf_with_trans(::Dirichlet{Float64}, ::Array{Real,1}, ::Bool) at /home/philipp/.julia/packages/Bijectors/bHaf6/src/Bijectors.jl:124
[2] assume(::Random._GLOBAL_RNG, ::SampleFromPrior, ::Dirichlet{Float64}, ::VarName{:w,Tuple{}}, ::VarInfo{DynamicPPL.Metadata{Dict{VarName,Int64},Array{Distribution,1},Array{VarName,1},Array{Real,1},Array{Set{DynamicPPL.Selector},1}},Float64}) at /home/philipp/.julia/packages/DynamicPPL/QgcLg/src/context_implementations.jl:142
[3] _tilde at /home/philipp/.julia/packages/DynamicPPL/QgcLg/src/context_implementations.jl:59 [inlined]
[4] tilde at /home/philipp/.julia/packages/DynamicPPL/QgcLg/src/context_implementations.jl:23 [inlined]
[5] tilde_assume(::Random._GLOBAL_RNG, ::DefaultContext, ::SampleFromPrior, ::Dirichlet{Float64}, ::VarName{:w,Tuple{}}, ::Tuple{}, ::VarInfo{DynamicPPL.Metadata{Dict{VarName,Int64},Array{Distribution,1},Array{VarName,1},Array{Real,1},Array{Set{DynamicPPL.Selector},1}},Float64}) at /home/philipp/.julia/packages/DynamicPPL/QgcLg/src/context_implementations.jl:52
[6] macro expansion at ./REPL[23]:2 [inlined]
[7] ##evaluator#453(::Random._GLOBAL_RNG, ::Model{var"###evaluator#453",(:x,),Tuple{Bool},(),ModelGen{var"###generator#454",(:x,),(),Tuple{}}}, ::VarInfo{DynamicPPL.Metadata{Dict{VarName,Int64},Array{Distribution,1},Array{VarName,1},Array{Real,1},Array{Set{DynamicPPL.Selector},1}},Float64}, ::SampleFromPrior, ::DefaultContext) at /home/philipp/.julia/packages/DynamicPPL/QgcLg/src/compiler.jl:356
[8] evaluate_threadunsafe at /home/philipp/.julia/packages/DynamicPPL/QgcLg/src/model.jl:157 [inlined]
[9] (::Model{var"###evaluator#453",(:x,),Tuple{Bool},(),ModelGen{var"###generator#454",(:x,),(),Tuple{}}})(::Random._GLOBAL_RNG, ::VarInfo{DynamicPPL.Metadata{Dict{VarName,Int64},Array{Distribution,1},Array{VarName,1},Array{Real,1},Array{Set{DynamicPPL.Selector},1}},Float64}, ::SampleFromPrior, ::DefaultContext) at /home/philipp/.julia/packages/DynamicPPL/QgcLg/src/model.jl:136
[10] (::Model{var"###evaluator#453",(:x,),Tuple{Bool},(),ModelGen{var"###generator#454",(:x,),(),Tuple{}}})(::VarInfo{DynamicPPL.Metadata{Dict{VarName,Int64},Array{Distribution,1},Array{VarName,1},Array{Real,1},Array{Set{DynamicPPL.Selector},1}},Float64}, ::SampleFromPrior, ::Vararg{Any,N} where N) at /home/philipp/.julia/packages/DynamicPPL/QgcLg/src/model.jl:126
[11] top-level scope at REPL[38]:1
[12] eval(::Module, ::Any) at ./boot.jl:330
[13] eval_user_input(::Any, ::REPL.REPLBackend) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.3/REPL/src/REPL.jl:86
[14] run_backend(::REPL.REPLBackend) at /home/philipp/.julia/packages/Revise/AMRie/src/Revise.jl:1023
[15] top-level scope at REPL[2]:0
The same thing does not happen with LikelihoodContext:
julia> vi = VarInfo();
julia> bernoulli_mixture(false)(vi, SampleFromPrior(), LikelihoodContext())
false
julia> bernoulli_mixture(false)(vi, SampleFromPrior(), LikelihoodContext())
false
We should add tests which test against the use of Flux.Tracker for AD. While playing around with the transformations I encountered some issues when using Flux.Tracker with simplex distributions.
Currently, these distributions are broken due to FluxML/Zygote.jl#873. As discussed in #155, the corresponding tests are marked as broken. The issue will be fixed upstream.
See also: TuringLang/DistributionsAD.jl#145
Hi,
The Normalizing flow example uses Tracker, a discontinued AD package.
I am trying to fit a NF using Zygote, but I have some problems.
Example:
@model function gen()
x ~ Exponential(0.5)
y ~ Normal(0, x)
end
s = sample(gen(), NUTS(), 500)
train_data = hcat(s[:x].data, s[:y].data)' |> Array
b = PlanarLayer(2) ∘ LeakyReLU{Float64, 1}(0.05) ∘ RadialLayer(2) ∘ LeakyReLU{Float64, 1}(0.05) ∘ PlanarLayer(2)
d = MvNormal(zeros(2), ones(2))
tb = transformed(d, b);
loss(tb :: Bijectors.TransformedDistribution, x :: Matrix{Float64}) = begin
return sum(-logpdf(tb, x))
end
function nf_train(tb, x, opt, ps, epochs)
@showprogress for i ∈ 1:epochs
train_loss, back = Zygote.pullback(() -> loss(tb, x), ps)
gs = back(one(train_loss))
Flux.update!(opt, ps, gs)
end
end
nf_train(tb, train_data, ADAM(), Zygote.Params(Flux.params(b)), 10)
I get the error:
Mutating arrays is not supported
on
gs = back(one(train_loss))
Any way I can make this work?
Thanks
Having some trouble taking the gradient with respect to the parameters of a Bijector using Zygote.jl when the computation involves an inversion. MVEs:
julia> using Zygote
julia> using Bijectors
julia> using Bijectors: Shift
julia> Zygote.gradient(θ -> sum(Shift(θ[1:1])(zeros(1))), randn(1)) # <= works
([1.0],)
julia> Zygote.gradient(θ -> sum(inv(Shift(θ[1:1]))(zeros(1))), randn(1)) # <= fails
ERROR: MethodError: no method matching adjoint(::Shift{Array{Float64,1},1})
Closest candidates are:
adjoint(::Missing) at missing.jl:100
adjoint(::IRTools.Inner.CFG) at /var/home/tef30/.julia/packages/IRTools/YplhY/src/passes/passes.jl:29
adjoint(::Number) at number.jl:193
...
Stacktrace:
[1] (::Zygote.var"#1325#1326"{Shift{Array{Float64,1},1}})(::NamedTuple{(:a,),Tuple{FillArrays.Fill{Float64,1,Tuple{Base.OneTo{Int64}}}}}) at /var/home/tef30/.julia/packages/Zygote/S1EoU/src/lib/array.jl:354
[2] (::Zygote.var"#3416#back#1327"{Zygote.var"#1325#1326"{Shift{Array{Float64,1},1}}})(::NamedTuple{(:a,),Tuple{FillArrays.Fill{Float64,1,Tuple{Base.OneTo{Int64}}}}}) at /var/home/tef30/.julia/packages/ZygoteRules/6nssF/src/adjoint.jl:49
[3] (::typeof(∂(#5)))(::Float64) at ./REPL[5]:1
[4] (::Zygote.var"#38#39"{typeof(∂(#5))})(::Float64) at /var/home/tef30/.julia/packages/Zygote/S1EoU/src/compiler/interface.jl:36
[5] gradient(::Function, ::Array{Float64,1}) at /var/home/tef30/.julia/packages/Zygote/S1EoU/src/compiler/interface.jl:45
[6] top-level scope at REPL[5]:1
This is also an issue in the case where we use Inverse
:
julia> Zygote.gradient(θ -> Bijectors.Logit(θ[1], θ[2])(0.5), [0.0, 1.0])
([-2.0, -2.0],)
julia> Zygote.gradient(θ -> inv(Bijectors.Logit(θ[1], θ[2]))(0.5), [0.0, 1.0])
ERROR: MethodError: no method matching adjoint(::Inverse{Bijectors.Logit{Float64},0})
Closest candidates are:
adjoint(::Missing) at missing.jl:100
adjoint(::IRTools.Inner.CFG) at /var/home/tef30/.julia/packages/IRTools/YplhY/src/passes/passes.jl:29
adjoint(::Number) at number.jl:193
...
Stacktrace:
[1] (::Zygote.var"#1325#1326"{Inverse{Bijectors.Logit{Float64},0}})(::NamedTuple{(:orig,),Tuple{NamedTuple{(:a, :b),Tuple{Float64,Float64}}}}) at /var/home/tef30/.julia/packages/Zygote/S1EoU/src/lib/array.jl:354
[2] (::Zygote.var"#3416#back#1327"{Zygote.var"#1325#1326"{Inverse{Bijectors.Logit{Float64},0}}})(::NamedTuple{(:orig,),Tuple{NamedTuple{(:a, :b),Tuple{Float64,Float64}}}}) at /var/home/tef30/.julia/packages/ZygoteRules/6nssF/src/adjoint.jl:49
[3] #9 at ./REPL[21]:1 [inlined]
[4] (::typeof(∂(#9)))(::Float64) at /var/home/tef30/.julia/packages/Zygote/S1EoU/src/compiler/interface2.jl:0
[5] (::Zygote.var"#38#39"{typeof(∂(#9))})(::Float64) at /var/home/tef30/.julia/packages/Zygote/S1EoU/src/compiler/interface.jl:36
[6] gradient(::Function, ::Array{Float64,1}) at /var/home/tef30/.julia/packages/Zygote/S1EoU/src/compiler/interface.jl:45
[7] top-level scope at REPL[21]:1
Hi,
I've played around with the invlink
function for simplex distributions and checked the correctness of the current implementation against the code used in Stan. (https://github.com/stan-dev/math/blob/502487c511594ccb93eb979df5b8fe163becb417/stan/math/rev/mat/fun/simplex_constrain.hpp)
The reimplementation I used is:
using StatsFuns: logistic  # logistic is the inverse-logit function used below

function invlink_simplex(y::AbstractVector{T}) where {T<:Real}
N = length(y)
stick = one(T)
x = zeros(T, length(y))
for k in 1:(N-1)
lk = map(T, log(N - k))
z = logistic(y[k] - lk)
x[k] = stick * z
stick -= x[k]
end
x[N] = stick
return x
end
and I found quite a bit of speed improvement with this implementation:
using BenchmarkTools, Distributions, Bijectors  # for @btime, Dirichlet, and invlink

K = 1000
y = rand(K)
d = Dirichlet(K, 1.0)
@btime invlink_simplex(y);
@btime invlink(d, y);
77.077 μs (1 allocation: 7.94 KiB)
369.305 μs (1 allocation: 7.94 KiB)
Should I make a PR for this or do we think there are stability issues with the code?
This repo needs Travis and AppVeyor files.
See:
and comments on:
I have opened a PR here to add support for callable structs:
If ReverseDiff supports callable structs, it's possible to define the adjoint/gradient on a callable struct, like (ib::Inverse{<:CorrBijector})(y::AbstractMatrix{<:Real}), directly. Some "internal" functions used to support AD, such as _inv_link_chol_lkj and _link_chol_lkj, could then be removed to simplify the code.
Unlike Scale, Exp or Log, Logit is currently defined as a subtype of Bijector{0}:
struct Logit{T<:Real} <: Bijector{0}
a::T
b::T
end
This prevents it from being applied to vectors and matrices easily. We should extend it to Bijector{N} in a similar way to the other bijectors.
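As a rough sketch of what this could look like, here is a hypothetical dimension-parameterised variant (LogitN is purely illustrative and not part of Bijectors.jl):
using Bijectors
using StatsFuns: logit

# A Logit carrying a dimensionality parameter N, in the style of Scale/Shift,
# so it can be declared as acting elementwise on vectors or matrices.
struct LogitN{N, T<:Real} <: Bijectors.Bijector{N}
    a::T
    b::T
end

(b::LogitN{0})(x::Real) = logit((x - b.a) / (b.b - b.a))
(b::LogitN)(x::AbstractArray{<:Real}) = logit.((x .- b.a) ./ (b.b .- b.a))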
Was testing these against the UnivariateDistribution fallback in Expectations.jl, but it fails because these methods aren't defined.
Perhaps we can add documentation on the main APIs in the readme.md file, in a way similar to MCMCChain.
Here are some related julia packages:
Other related packages
Related issue:
logpdf(t::Transform, l::Distribution, x)
Some things that probably can be improved as a follow-up to the initial implementation (but might require changes in other packages such as AD backends):
At the moment, if you call forward(b, x) you get back a named tuple (rv = b(x), logabsdetjac = logabsdetjac(b, x)). This was introduced in the first PR for the new interface.
When I started out working on this, using rv made sense as the transform was always related to some Distribution. Now a Bijector can be used as a standalone transform to do more than just transform random variables, e.g. constrained-to-unconstrained transformations in optimization. Therefore I suggest we rename rv.
A couple of suggestions (in no particular order):
- y, since throughout the entire package it's a recurring theme that x is "un-transformed" and y denotes the transformed value. Also, in forward, x is the input. This also corresponds with the (x = x, y = b(x), ...) returned by calling forward(d::TransformedDistribution), though that could also be up for discussion in this issue.
- val or value
- res or result
Any suggestions?
Consider the following code:
using Bijectors
d0 = (Beta(), MvNormal(2, 1.0))
d1 = (MvNormal(2, 1.0),)
sb0 = Stacked(bijector.(d0), [1:1, 2:3])
sb1 = Stacked(bijector.(d1), [1:2])
logabsdetjac(sb0, [.5, 1, 2]) # <- does not throw
logabsdetjac(sb1, [.5, 1]) # <- throws
The last line throws an error:
ERROR: ArgumentError: reducing over an empty collection is not allowed
Stacktrace:
[1] _empty_reduce_error() at ./reduce.jl:299
[2] mapreduce_empty(::Function, ::Function, ::Type{T} where T) at ./reduce.jl:342
[3] reduce_empty(::Base.MappingRF{Bijectors.var"#26#27"{Stacked{Tuple{Identity{1}},1},Array{Float64,1}},typeof(Base.add_sum)}, ::Type{Int64}) at ./reduce.jl:329
[4] reduce_empty_iter at ./reduce.jl:355 [inlined]
[5] mapreduce_empty_iter(::Function, ::Function, ::UnitRange{Int64}, ::Base.HasEltype) at ./reduce.jl:351
[6] _mapreduce(::Bijectors.var"#26#27"{Stacked{Tuple{Identity{1}},1},Array{Float64,1}}, ::typeof(Base.add_sum), ::IndexLinear, ::UnitRange{Int64}) at ./reduce.jl:400
[7] _mapreduce_dim at ./reducedim.jl:318 [inlined]
[8] #mapreduce#620 at ./reducedim.jl:310 [inlined]
[9] mapreduce at ./reducedim.jl:310 [inlined]
[10] _sum at ./reducedim.jl:724 [inlined]
[11] #sum#628 at ./reducedim.jl:720 [inlined]
[12] sum at ./reducedim.jl:720 [inlined]
[13] logabsdetjac(::Stacked{Tuple{Identity{1}},1}, ::Array{Float64,1}) at .../.julia/packages/Bijectors/98OWP/src/bijectors/stacked.jl:102
[14] top-level scope at REPL[16]:1
[15] include_string(::Function, ::Module, ::String, ::String) at ./loading.jl:1088
The cause seems to be here:
Bijectors.jl/src/bijectors/stacked.jl
Lines 96 to 105 in dfca913
because typeof(sb1) is Stacked{Tuple{Identity{1}},1}, so N = 1 and 2:1 is empty.
A possible solution is:
function logabsdetjac(
b::Stacked{<:Any, N},
x::AbstractVector{<:Real}
) where {N}
init = sum(logabsdetjac(b.bs[1], x[b.ranges[1]]))
mapreduce(+, 2:N; init = init) do i
sum(logabsdetjac(b.bs[i], x[b.ranges[i]]))
end
end
Then mapreduce doesn't throw, because an explicit init is given.
The current function name logpdf_with_trans is a bit unclear. Can we rename it to something more self-explanatory, e.g. logpdf_with_jacobian?
Currently we've done nothing to improve the "visual inspection" of objects. At the moment it can be very difficult for the user to inspect a Bijector, and with PR #44 we're going to also have dimensionality in the type, which makes it even more important to be able to easily inspect a Bijector.
Sooo let's have another poll! Which of the following styles do you prefer? Option (1) shows a Dims=1 annotation for "container" types, e.g. Composed and Stacked:
julia> Bijectors.Exp()
Exp{0}()
julia> Bijectors.Logit(0.0, 1.0)
Logit{Float64}(a=0.0, b=1.0)
julia> Bijectors.composel([PlanarLayer(10) for i = 1:3]...)
Composed{..., Dims=1}(ts=(PlanarLayer{Array{Float64,2},Array{Float64,1}}(
w: [-1.10143; 1.33204; … ; -0.725184; -0.279187]
u: [-0.755879; 1.6483; … ; 1.15966; -0.164425]
b: [0.09642]
)
, PlanarLayer{Array{Float64,2},Array{Float64,1}}(
w: [-0.913726; -2.03212; … ; 0.0719403; -0.0622856]
u: [-1.28574; -0.0194101; … ; -0.760258; 0.555373]
b: [-0.951921]
)
, PlanarLayer{Array{Float64,2},Array{Float64,1}}(
w: [1.14896; 0.710024; … ; -2.24741; -0.904142]
u: [0.917471; 1.07336; … ; -1.01782; 0.230862]
b: [0.535096]
)
))
julia> Bijectors.Exp()
Exp{0}()
julia> Bijectors.Logit(0.0, 1.0)
Logit{Float64}(a=0.0, b=1.0)
julia> Bijectors.composel([PlanarLayer(10) for i = 1:3]...)
Composed(ts=(PlanarLayer{Array{Float64,2},Array{Float64,1}}(
w: [1.67265; -0.888749; … ; -1.3403; -0.0615207]
u: [0.741016; 0.355929; … ; -1.22854; 0.5851]
b: [1.00029]
)
, PlanarLayer{Array{Float64,2},Array{Float64,1}}(
w: [0.401001; -0.586282; … ; 0.359551; 0.736369]
u: [1.26738; -0.194372; … ; 1.13786; -0.0398834]
b: [1.08065]
)
, PlanarLayer{Array{Float64,2},Array{Float64,1}}(
w: [-1.01392; -0.70209; … ; -1.8464; 0.212027]
u: [-0.726696; -2.39618; … ; -0.169901; -0.526239]
b: [1.02857]
)
))
julia> Bijectors.Exp()
Exp{0}()
julia> Bijectors.Logit(0.0, 1.0)
Logit{Float64}(0.0, 1.0)
julia> Bijectors.composel([PlanarLayer(10) for i = 1:3]...)
Composed{Tuple{PlanarLayer{Array{Float64,2},Array{Float64,1}},PlanarLayer{Array{Float64,2},Array{Float64,1}},PlanarLayer{Array{Float64,2},Array{Float64,1}}},1}((PlanarLayer{Array{Float64,2},Array{Float64,1}}([0.700402; -0.489015; … ; -1.31095; -1.94269], [-0.199911; 1.38871; … ; -0.464837; 0.853309], [0.0885244]), PlanarLayer{Array{Float64,2},Array{Float64,1}}([-0.238407; -0.00701318; … ; 0.921103; 1.26528], [-0.17988; -0.0505573; … ; -0.886066; 1.18275], [0.467128]), PlanarLayer{Array{Float64,2},Array{Float64,1}}([0.231724; 0.39539; … ; -0.418577; -1.75923], [-0.055946; 1.37152; … ; 1.26864; -1.2355], [-0.636524])))
To vote: 👍 for (1), 😕 for (2), and 👎 for (3).
Other suggestions welcome!
Seems to be an issue with Bijectors.jl's compat/distributionsad.jl:
julia> using DistributionsAD, Bijectors
julia> dists = [Gamma(2, 3) for i = 1:2]
2-element Array{Gamma{Float64},1}:
Gamma{Float64}(α=2.0, θ=3.0)
Gamma{Float64}(α=2.0, θ=3.0)
julia> dist = product_distribution(dists)
Product{Continuous,Gamma{Float64},Array{Gamma{Float64},1}}(v=Gamma{Float64}[Gamma{Float64}(α=2.0, θ=3.0), Gamma{Float64}(α=2.0, θ=3.0)])
julia> x = rand(dist)
2-element Array{Float64,1}:
3.3962613472010568
4.504150811063049
julia> Bijectors.link(dist, x)
2-element Array{Float64,1}:
1.222675222849948
1.5049993740830023
julia> Bijectors.invlink(dist, Bijectors.link(dist, x))
2-element Array{Float64,1}:
1.222675222849948
1.5049993740830023
while on a project with Bijectors.jl only:
julia> using Bijectors
julia> dists = [Gamma(2, 3) for i = 1:2]
2-element Array{Gamma{Float64},1}:
Gamma{Float64}(α=2.0, θ=3.0)
Gamma{Float64}(α=2.0, θ=3.0)
julia> dist = product_distribution(dists)
Product{Continuous,Gamma{Float64},Array{Gamma{Float64},1}}(v=Gamma{Float64}[Gamma{Float64}(α=2.0, θ=3.0), Gamma{Float64}(α=2.0, θ=3.0)])
julia> x = rand(dist)
2-element Array{Float64,1}:
11.597241830334823
19.19914746393061
julia> Bijectors.link(dist, x)
2-element Array{Float64,1}:
11.597241830334823
19.19914746393061
julia> Bijectors.invlink(dist, Bijectors.link(dist, x))
2-element Array{Float64,1}:
11.597241830334823
19.19914746393061
(TestingBijectors) pkg> st
Project TestingBijectors v0.1.0
Status `/tmp/TestingBijectors/Project.toml`
[76274a88] Bijectors v0.6.6
Behavior is caused by the following when used with DistributionsAD.jl:
julia> @which Bijectors.link(dist, x)
link(dist::Product, x::AbstractArray{#s99,1} where #s99<:Real) in Bijectors at /home/tor/.julia/packages/Bijectors/to1z3/src/compat/distributionsad.jl:249
julia> @which Bijectors.invlink(dist, x)
invlink(d::Distribution{Multivariate,S} where S<:ValueSupport, y::Union{AbstractArray{#s25,1}, AbstractArray{#s25,2}} where #s25<:Real) in Bijectors at /home/tor/.julia/packages/Bijectors/to1z3/src/Bijectors.jl:427
where src/compat/distributionsad.jl:249 calls link for each of the Univariate components of the Product dist, while src/Bijectors.jl:427 calls identity, treating dist as Multivariate.
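A minimal sketch of the component-wise behaviour one would expect for both directions; invlink_product is hypothetical and not the current implementation:
using Distributions, Bijectors

# Apply each univariate component's invlink to the corresponding coordinate,
# mirroring the component-wise link in compat/distributionsad.jl.
function invlink_product(d::Product, y::AbstractVector{<:Real})
    return map((di, yi) -> Bijectors.invlink(di, yi), d.v, y)
end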
Hi guys,
I am using the following package versions (updated to the current DistributionsAD module after last closed issue):
[76274a88] Bijectors v0.8.0
[31c24e10] Distributions v0.23.4
[ced4e74d] DistributionsAD v0.6.1
[f6369f11] ForwardDiff v0.10.10
[37e2e3b7] ReverseDiff v1.2.0
[e88e6eb3] Zygote v0.4.22
I have a problem using invlink and reverse-mode AD. When I only import ForwardDiff and ReverseDiff, there is a method error in the logpdf_with_trans evaluation for the InverseWishart distribution (all other distributions that I tested worked). MWE is here:
using Distributions, DistributionsAD, Bijectors
using ForwardDiff, ReverseDiff#, Zygote
function get_logpost(distr::Distributions.Distribution)
function logposterior(θₜ::AbstractVector{<:Real})
θ = invlink(distr, reshape(θₜ, (2,2) ) )
lp = logpdf_with_trans(distr, θ, true) #method error here when using ReverseDiff
ll = logpdf(MvNormal([0., .0], θ), [2., 2.])
return ll + lp
end
end
theta_transformed = randn(4)
lposterior = get_logpost( InverseWishart(10, [0.15 0.05 ; 0.05 .15]) )
lposterior(theta_transformed)
ForwardDiff.gradient(lposterior, theta_transformed)
ReverseDiff.gradient(lposterior, theta_transformed) #error - MethodError: no method matching pullback(::DistributionsAD.ReverseDiff....
Now, if I additionally import Zygote, ReverseDiff gradient evaluation suddenly works. However, I get different solutions for ForwardDiff and ReverseDiff/Zygote unfortunately. I suspect the ForwardDiff solution to be correct, MWE here:
using Distributions, DistributionsAD, Bijectors
using ForwardDiff, ReverseDiff, Zygote
function get_logpost2(distr1::Distributions.Distribution, distr2::Distributions.Distribution)
function logposterior2(θₜ::AbstractVector{<:Real})
mu = invlink(distr1, θₜ[1:2] )
sigma = invlink(distr2, reshape(θₜ[3:6], (2,2) ) )
lp = logpdf_with_trans(distr1, mu, true)
lp += logpdf_with_trans(distr2, sigma, true)
ll = logpdf(MvNormal(mu, sigma), [2., 2.])
return ll + lp
end
end
theta_transformed = randn(6)
lposterior2 = get_logpost2( MvNormal([.1, .1]), InverseWishart(10, [0.15 0.05 ; 0.05 .15]) )
lposterior2(theta_transformed)
ForwardDiff.gradient(lposterior2, theta_transformed)
ReverseDiff.gradient(lposterior2, theta_transformed) #works now, but first 2 parameter with MvNormal prior have different solution.
Zygote.gradient(lposterior2, theta_transformed) #same solution as ReverseDiff
A Project.toml
file is required to get registered with the new Julia package management system.
Running the following code
using Turing
@model mdemo(d, N) = begin
Θ = Vector(undef, N)
for n=1:N
Θ[n] ~ d
end
end
# This leads to many numerical issues in gradients
dim = 5 # This works fine
dim2 = 50 # This breaks
d = Dirichlet(ones(dim2) / dim2)
Turing.setadbackend(:forward_diff)
chain = sample(mdemo(d, 1), HMC(5000, 0.1, 5))
Turing.setadbackend(:reverse_diff)
chain = sample(mdemo(d, 1), HMC(5000, 0.1, 5))
produces
julia> chain = sample(mdemo(d, 1), HMC(5000, 0.1, 5))
[ Info: Assume - `Θ` is a parameter
[HMC] Sampling... 0% ETA: 3:23:51
ϵ: 0.1
α: 0.9992054011165935
pre_cond: [1.0, 1.0, 1.0, 1.0, 1.0, 1.0,...┌ Warning: Numerical error has been found in gradients.
└ @ Turing ~/.julia/dev/Turing/src/core/ad.jl:114
┌ Warning: grad = Real[-Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, NaN]
└ @ Turing ~/.julia/dev/Turing/src/core/ad.jl:115
┌ Warning: Numerical error has been found in gradients.
└ @ Turing ~/.julia/dev/Turing/src/core/ad.jl:114
┌ Warning: grad = Real[-Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, NaN]
└ @ Turing ~/.julia/dev/Turing/src/core/ad.jl:115
┌ Warning: Numerical error has been found in gradients.
└ @ Turing ~/.julia/dev/Turing/src/core/ad.jl:114
┌ Warning: grad = Real[-Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, NaN]
└ @ Turing ~/.julia/dev/Turing/src/core/ad.jl:115
There's an approach I've been thinking about for MeasureTheory.jl, and it could be nice to have it in Bijectors and avoid the type piracy :)
Say you have
julia> dist = MvNormal(zeros(2), ones(2))
DiagNormal(
dim: 2
μ: [0.0, 0.0]
Σ: [1.0 0.0; 0.0 1.0]
)
julia> f = PlanarLayer(2)
PlanarLayer{Array{Float64,1},Array{Float64,1}}([0.8187830879660829, -1.280004857469378], [-0.12761939163655306, -0.7795079813636753], [-0.025642876005072607])
As a Function
, f
takes arguments in R^2:
julia> f(zeros(2))
2-element Array{Float64,1}:
0.009244220896803523
0.010647769052268647
To get the pushforward, we need to do
julia> transformed(dist, f)
Bijectors.TransformedDistribution{MvNormal{Float64,PDMats.PDiagMat{Float64,Array{Float64,1}},Array{Float64,1}},PlanarLayer{Array{Float64,1},Array{Float64,1}},Multivariate}(
dist: DiagNormal(
dim: 2
μ: [0.0, 0.0]
Σ: [1.0 0.0; 0.0 1.0]
)
transform: PlanarLayer{Array{Float64,1},Array{Float64,1}}([0.8187830879660829, -1.280004857469378], [-0.12761939163655306, -0.7795079813636753], [-0.025642876005072607])
)
But conceptually, a distribution or measure is a kind of container (since it's a monad). This makes me wonder about using broadcasting for this. In that case we could just write
f.(dist)
I'm not sure if there might be unintended consequences of doing it this way, but it wouldn't be the first case of broadcasting off the beaten path - see https://github.com/JuliaApproximation/QuasiArrays.jl
Another option would be to just add a method
(f::Bijector)(dist::Distribution) = transformed(dist, f)
But if it works, the broadcasting approach seems more natural, from a mathematical perspective. What do you think?
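A minimal sketch of the broadcasting idea, assuming that materialising a non-Broadcasted value is a no-op; this is not part of Bijectors.jl:
using Bijectors, Distributions

# Treat the distribution as a "container": `f.(dist)` then builds the pushforward,
# i.e. `f.(dist)` becomes equivalent to `transformed(dist, f)`.
Base.broadcasted(f::Bijectors.Bijector, d::Distribution) = transformed(d, f)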
I've noticed a numeric issue when using this package with random walk algorithm. The code below is the minimal example to produce the bug:
using Distributions, Bijectors
dim = 10
dist = Dirichlet(dim, 2.0)
stepsize = 1e10
# Below gives DomainError with high probability
[logpdf_with_trans(dist, invlink(dist, link(dist, rand(dist)) .+ randn(dim) .* stepsize), true) for _ in 1:1_000]
I don't know if a stepsize of 1e10 is too high or not, but it seems possible to encounter values of this kind during adaptation.
It seems reasonable to fix some seed when testing with random data input. This can give some consistency across platforms. I have seen this done in packages like IterativeSolvers.jl. Any reason not to?
The current Readme section on Functions is a bit confusing. In 1. we get a value of x = 0.7472542331020509, but in 2. the inverse gives z = 0.6543406780096065 and claims x == z.
I find that depending on your luck (or precision), x will not always equal z:
julia> using Random; Random.seed!(42);
julia> x = rand(dist)
0.36888689965963756
julia> y = link(dist, x)
-0.5369949942509267
julia> z = invlink(dist, y)
0.36888689965963756
julia> x==z
true
julia> using Random; Random.seed!(43);
julia> x = rand(dist)
0.3885220158396991
julia> y = link(dist, x)
-0.45352911444046184
julia> z = invlink(dist, y)
0.38852201583969914
julia> x == z
false
Maybe use \approx to show x ≈ z?
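For illustration, with the values from the second seed above (this snippet is mine, not from the README), isapprox holds even though == does not:
julia> x ≈ z   # isapprox(x, z)
true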
I wish I could be more specific about the problems in the issue description, but I'm having a hard time codifying them. I've gotten singular exceptions, non-PD exceptions, and non-Hermitian matrix exceptions, none of which seem to have an obvious cause to me.
MWE:
using Turing, LinearAlgebra
@model function ex(data)
# Set up covariance matrix
sigma ~ filldist(InverseGamma(2, 3), size(data, 2))
omega ~ LKJ(size(data, 2), 1.0)
Σ = Symmetric(diagm(sigma) * omega * diagm(sigma))
# Observations
for t in 1:size(data, 1)
data[t,:] ~ MvNormal(zeros(size(data, 2)), Σ)
end
end
# Parameters
dim = 10
n_obs = 1000
# Variances
sigma = rand(InverseGamma(2, 3), dim)
# Correlation matrix
omega = rand(LKJ(dim, 2.0))
Sigma = Symmetric(omega)
# Data
data = rand(MvNormal(zeros(dim), Sigma), n_obs)'
# Construct model
model = ex(data)
chain = sample(model, NUTS(), 100)
Running the example above gets me
ERROR: LoadError: PosDefException: matrix is not positive definite; Cholesky factorization failed.
Stacktrace:
[1] checkpositivedefinite at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/factorization.jl:18 [inlined]
[2] cholesky!(::Symmetric{ForwardDiff.Dual{ForwardDiff.Tag{Turing.Core.var"#f#7"{DynamicPPL.VarInfo{NamedTuple{(:sigma, :omega),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:sigma,Tuple{}},Int64},Array{Product{Continuous,InverseGamma{Float64},FillArrays.Fill{InverseGamma{Float6
4},1,Tuple{Base.OneTo{Int64}}}},1},Array{DynamicPPL.VarName{:sigma,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}},DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:omega,Tuple{}},Int64},Array{LKJ{Float64,Int64},1},Array{DynamicPPL.VarName{:omega,Tuple{}},1},Array{Float64,
1},Array{Set{DynamicPPL.Selector},1}}}},Float64},DynamicPPL.Model{var"#13#14",(:data,),(),(),Tuple{Adjoint{Float64,Array{Float64,2}}},Tuple{}},DynamicPPL.Sampler{NUTS{Turing.Core.ForwardDiffAD{40},(),AdvancedHMC.DiagEuclideanMetric},Turing.Inference.SamplerState{DynamicPPL.VarInfo{Na
medTuple{(:sigma, :omega),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:sigma,Tuple{}},Int64},Array{Product{Continuous,InverseGamma{Float64},FillArrays.Fill{InverseGamma{Float64},1,Tuple{Base.OneTo{Int64}}}},1},Array{DynamicPPL.VarName{:sigma,Tuple{}},1},Array{Float64,1},Array{S
et{DynamicPPL.Selector},1}},DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:omega,Tuple{}},Int64},Array{LKJ{Float64,Int64},1},Array{DynamicPPL.VarName{:omega,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}}}},Float64}}},DynamicPPL.DefaultContext},Float64},Float64,10},Arra
y{ForwardDiff.Dual{ForwardDiff.Tag{Turing.Core.var"#f#7"{DynamicPPL.VarInfo{NamedTuple{(:sigma, :omega),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:sigma,Tuple{}},Int64},Array{Product{Continuous,InverseGamma{Float64},FillArrays.Fill{InverseGamma{Float64},1,Tuple{Base.OneTo{Int
64}}}},1},Array{DynamicPPL.VarName{:sigma,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}},DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:omega,Tuple{}},Int64},Array{LKJ{Float64,Int64},1},Array{DynamicPPL.VarName{:omega,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.S
elector},1}}}},Float64},DynamicPPL.Model{var"#13#14",(:data,),(),(),Tuple{Adjoint{Float64,Array{Float64,2}}},Tuple{}},DynamicPPL.Sampler{NUTS{Turing.Core.ForwardDiffAD{40},(),AdvancedHMC.DiagEuclideanMetric},Turing.Inference.SamplerState{DynamicPPL.VarInfo{NamedTuple{(:sigma, :omega)
,Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:sigma,Tuple{}},Int64},Array{Product{Continuous,InverseGamma{Float64},FillArrays.Fill{InverseGamma{Float64},1,Tuple{Base.OneTo{Int64}}}},1},Array{DynamicPPL.VarName{:sigma,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1
}},DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:omega,Tuple{}},Int64},Array{LKJ{Float64,Int64},1},Array{DynamicPPL.VarName{:omega,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}}}},Float64}}},DynamicPPL.DefaultContext},Float64},Float64,10},2}}, ::Val{false}; check::Boo
l) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/cholesky.jl:219
[3] #cholesky#130 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/cholesky.jl:344 [inlined]
[4] cholesky at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/cholesky.jl:344 [inlined] (repeats 2 times)
[5] (::Bijectors.CorrBijector)(::Symmetric{ForwardDiff.Dual{ForwardDiff.Tag{Turing.Core.var"#f#7"{DynamicPPL.VarInfo{NamedTuple{(:sigma, :omega),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:sigma,Tuple{}},Int64},Array{Product{Continuous,InverseGamma{Float64},FillArrays.Fill{In
verseGamma{Float64},1,Tuple{Base.OneTo{Int64}}}},1},Array{DynamicPPL.VarName{:sigma,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}},DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:omega,Tuple{}},Int64},Array{LKJ{Float64,Int64},1},Array{DynamicPPL.VarName{:omega,Tuple{}},
1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}}}},Float64},DynamicPPL.Model{var"#13#14",(:data,),(),(),Tuple{Adjoint{Float64,Array{Float64,2}}},Tuple{}},DynamicPPL.Sampler{NUTS{Turing.Core.ForwardDiffAD{40},(),AdvancedHMC.DiagEuclideanMetric},Turing.Inference.SamplerState{Dyna
micPPL.VarInfo{NamedTuple{(:sigma, :omega),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:sigma,Tuple{}},Int64},Array{Product{Continuous,InverseGamma{Float64},FillArrays.Fill{InverseGamma{Float64},1,Tuple{Base.OneTo{Int64}}}},1},Array{DynamicPPL.VarName{:sigma,Tuple{}},1},Array{F
loat64,1},Array{Set{DynamicPPL.Selector},1}},DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:omega,Tuple{}},Int64},Array{LKJ{Float64,Int64},1},Array{DynamicPPL.VarName{:omega,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}}}},Float64}}},DynamicPPL.DefaultContext},Float64}
,Float64,10},Array{ForwardDiff.Dual{ForwardDiff.Tag{Turing.Core.var"#f#7"{DynamicPPL.VarInfo{NamedTuple{(:sigma, :omega),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:sigma,Tuple{}},Int64},Array{Product{Continuous,InverseGamma{Float64},FillArrays.Fill{InverseGamma{Float64},1,Tup
le{Base.OneTo{Int64}}}},1},Array{DynamicPPL.VarName{:sigma,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}},DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:omega,Tuple{}},Int64},Array{LKJ{Float64,Int64},1},Array{DynamicPPL.VarName{:omega,Tuple{}},1},Array{Float64,1},Array
{Set{DynamicPPL.Selector},1}}}},Float64},DynamicPPL.Model{var"#13#14",(:data,),(),(),Tuple{Adjoint{Float64,Array{Float64,2}}},Tuple{}},DynamicPPL.Sampler{NUTS{Turing.Core.ForwardDiffAD{40},(),AdvancedHMC.DiagEuclideanMetric},Turing.Inference.SamplerState{DynamicPPL.VarInfo{NamedTuple
{(:sigma, :omega),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:sigma,Tuple{}},Int64},Array{Product{Continuous,InverseGamma{Float64},FillArrays.Fill{InverseGamma{Float64},1,Tuple{Base.OneTo{Int64}}}},1},Array{DynamicPPL.VarName{:sigma,Tuple{}},1},Array{Float64,1},Array{Set{Dynam
icPPL.Selector},1}},DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:omega,Tuple{}},Int64},Array{LKJ{Float64,Int64},1},Array{DynamicPPL.VarName{:omega,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}}}},Float64}}},DynamicPPL.DefaultContext},Float64},Float64,10},2}}) at /hom
e/cameron/.julia/dev/Bijectors/src/bijectors/corr.jl:68
[6] logabsdetjac at /home/cameron/.julia/packages/Bijectors/Fye2h/src/bijectors/corr.jl:101 [inlined]
[7] logpdf_with_trans(::LKJ{Float64,Int64}, ::Symmetric{ForwardDiff.Dual{ForwardDiff.Tag{Turing.Core.var"#f#7"{DynamicPPL.VarInfo{NamedTuple{(:sigma, :omega),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:sigma,Tuple{}},Int64},Array{Product{Continuous,InverseGamma{Float64},FillA
rrays.Fill{InverseGamma{Float64},1,Tuple{Base.OneTo{Int64}}}},1},Array{DynamicPPL.VarName{:sigma,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}},DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:omega,Tuple{}},Int64},Array{LKJ{Float64,Int64},1},Array{DynamicPPL.VarName{:om
ega,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}}}},Float64},DynamicPPL.Model{var"#13#14",(:data,),(),(),Tuple{Adjoint{Float64,Array{Float64,2}}},Tuple{}},DynamicPPL.Sampler{NUTS{Turing.Core.ForwardDiffAD{40},(),AdvancedHMC.DiagEuclideanMetric},Turing.Inference.Samp
lerState{DynamicPPL.VarInfo{NamedTuple{(:sigma, :omega),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:sigma,Tuple{}},Int64},Array{Product{Continuous,InverseGamma{Float64},FillArrays.Fill{InverseGamma{Float64},1,Tuple{Base.OneTo{Int64}}}},1},Array{DynamicPPL.VarName{:sigma,Tuple{
}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}},DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:omega,Tuple{}},Int64},Array{LKJ{Float64,Int64},1},Array{DynamicPPL.VarName{:omega,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}}}},Float64}}},DynamicPPL.DefaultCont
ext},Float64},Float64,10},Array{ForwardDiff.Dual{ForwardDiff.Tag{Turing.Core.var"#f#7"{DynamicPPL.VarInfo{NamedTuple{(:sigma, :omega),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:sigma,Tuple{}},Int64},Array{Product{Continuous,InverseGamma{Float64},FillArrays.Fill{InverseGamma{F
loat64},1,Tuple{Base.OneTo{Int64}}}},1},Array{DynamicPPL.VarName{:sigma,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}},DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:omega,Tuple{}},Int64},Array{LKJ{Float64,Int64},1},Array{DynamicPPL.VarName{:omega,Tuple{}},1},Array{Flo
at64,1},Array{Set{DynamicPPL.Selector},1}}}},Float64},DynamicPPL.Model{var"#13#14",(:data,),(),(),Tuple{Adjoint{Float64,Array{Float64,2}}},Tuple{}},DynamicPPL.Sampler{NUTS{Turing.Core.ForwardDiffAD{40},(),AdvancedHMC.DiagEuclideanMetric},Turing.Inference.SamplerState{DynamicPPL.VarIn
fo{NamedTuple{(:sigma, :omega),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:sigma,Tuple{}},Int64},Array{Product{Continuous,InverseGamma{Float64},FillArrays.Fill{InverseGamma{Float64},1,Tuple{Base.OneTo{Int64}}}},1},Array{DynamicPPL.VarName{:sigma,Tuple{}},1},Array{Float64,1},Ar
ray{Set{DynamicPPL.Selector},1}},DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:omega,Tuple{}},Int64},Array{LKJ{Float64,Int64},1},Array{DynamicPPL.VarName{:omega,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}}}},Float64}}},DynamicPPL.DefaultContext},Float64},Float64,10}
,2}}, ::Bool) at /home/cameron/.julia/dev/Bijectors/src/Bijectors.jl:130
If I reduce dims to 3, sampling basically just hangs forever. Occasionally it'll throw a singular exception, but I haven't been able to recreate it and so I don't have a callstack.
Any thoughts here?
Minor thing I noticed:
Bijectors.jl/src/transformed_distribution.jl
Line 134 in a3ece91
should be:
res = hcat([td.transform(rand(rng, td.dist)) for i = 1:num_samples]...)
Currently, if b isa Scale then inv(b) results in Scale(inv(b.a)). After @willtebbutt educated me on the performance and stability of matrix inversion, it seems like it would be a better idea to do:
(ib::Inversed{<:Scale})(y) = Scale(inv(ib.orig.a))(y)
(ib::Inversed{<:Scale{<:AbstractVector}})(y) = Scale(inv.(ib.orig.a))(y)
# in the case where we're scaling vectors using matrices => use `\` instead
function (ib::Inversed{<:Scale{<:AbstractMatrix, 1}})(y::AbstractVecOrMat)
return ib.orig.a \ y
end
For reference, the current implementation is
Bijectors.jl/src/bijectors/scale.jl
Lines 13 to 14 in dc736c8
This will also make Scale compatible with GPUs, since CuArrays.jl does not yet have inv implemented (see https://github.com/JuliaGPU/CuArrays.jl/issues/440).
Hi, it seems that the parameter b in PlanarLayer is not tracked. I have tried to reproduce the example in the documentation and I get
using Bijectors
using Tracker
d = MvNormal(zeros(2), ones(2));
b = PlanarLayer(2, param);
flow = transformed(d, b)
Bijectors.TransformedDistribution{MvNormal{Float64,PDMats.PDiagMat{Float64,Array{Float64,1}},Array{Float64,1}},PlanarLayer{TrackedArray{…,Array{Float64,1}},Float64},Multivariate}(
dist: DiagNormal(
dim: 2
μ: [0.0, 0.0]
Σ: [1.0 0.0; 0.0 1.0]
)
transform: PlanarLayer{TrackedArray{…,Array{Float64,1}},Float64}([1.1142736114526957, -1.8696189914068417] (tracked), [0.6892713124675772, 0.53047763560658] (tracked), -1.0403736727795068)
)
Bijectors is v0.8.1
Seems like there's an issue with type inference when it comes to replace_diag:
julia> using ReverseDiff, DistributionsAD, Bijectors
julia> using LinearAlgebra
julia> X = [0.3239552320287513 0.07451752964111555 -0.010516169325024586; 0.07451752964111555 0.8201186717610907 -0.16101202850282761; -0.010516169325024586 -0.16101202850282761 0.8397115540645331]
3×3 Array{Float64,2}:
0.323955 0.0745175 -0.0105162
0.0745175 0.820119 -0.161012
-0.0105162 -0.161012 0.839712
julia> ReverseDiff.gradient(X) do x
L = cholesky(x).L
A = Bijectors.replace_diag(exp, L)
return sum(LowerTriangular(A) * LowerTriangular(A)')
end
ERROR: MethodError: no method matching valtype(::Type{ReverseDiff.TrackedReal{Float64,Float64,O} where O})
Closest candidates are:
valtype(::ReverseDiff.TrackedReal{V,D,O} where O where D<:Real) where V at /home/tor/.julia/packages/ReverseDiff/uy0uk/src/tracked.jl:106
valtype(::Type{ReverseDiff.TrackedReal{V,D,O}}) where {V, D, O} at /home/tor/.julia/packages/ReverseDiff/uy0uk/src/tracked.jl:107
valtype(::ReverseDiff.TrackedArray{V,D,N,VA,DA} where DA where VA where N where D) where V at /home/tor/.julia/packages/ReverseDiff/uy0uk/src/tracked.jl:108
...
Stacktrace:
[1] convert(::Type{ReverseDiff.TrackedReal{Float64,Float64,O} where O}, ::Int64) at /home/tor/.julia/packages/ReverseDiff/uy0uk/src/tracked.jl:252
[2] zero(::Type{ReverseDiff.TrackedReal{Float64,Float64,O} where O}) at ./number.jl:242
[3] *(::LowerTriangular{ReverseDiff.TrackedReal{Float64,Float64,O} where O,Array{ReverseDiff.TrackedReal{Float64,Float64,O} where O,2}}, ::Adjoint{ReverseDiff.TrackedReal{Float64,Float64,O} where O,LowerTriangular{ReverseDiff.TrackedReal{Float64,Float64,O} where O,Array{ReverseDiff.TrackedReal{Float64,Float64,O} where O,2}}}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/LinearAlgebra/src/triangular.jl:1947
[4] (::var"#3#4")(::ReverseDiff.TrackedArray{Float64,Float64,2,Array{Float64,2},Array{Float64,2}}) at ./REPL[6]:5
[5] ReverseDiff.GradientTape(::var"#3#4", ::Array{Float64,2}, ::ReverseDiff.GradientConfig{ReverseDiff.TrackedArray{Float64,Float64,2,Array{Float64,2},Array{Float64,2}}}) at /home/tor/.julia/packages/ReverseDiff/uy0uk/src/api/tape.jl:199
[6] gradient(::Function, ::Array{Float64,2}, ::ReverseDiff.GradientConfig{ReverseDiff.TrackedArray{Float64,Float64,2,Array{Float64,2},Array{Float64,2}}}) at /home/tor/.julia/packages/ReverseDiff/uy0uk/src/api/gradients.jl:22 (repeats 2 times)
[7] top-level scope at REPL[6]:1
julia> ReverseDiff.gradient(X) do x
L = cholesky(x).L
# Convert to array because apparently `Diagonal(L)` isn't okay.
# Results in `Array{<:TrackedReal}`.
A = Array(L)
A = A + exp.(Diagonal(A)) - Diagonal(A)
return sum(LowerTriangular(A) * LowerTriangular(A)')
end
3×3 Array{Float64,2}:
11.1536 9.38521 11.7767
0.0 9.93215 9.95776
0.0 0.0 6.71535
NOTE: DistributionsAD.jl needs to be imported since it contains the implementation of cholesky used here.
Distributions.logpdf is only defined for single samples (and not multiple samples), whereas logpdf_with_trans is defined for sets of samples as well. To be consistent with the behaviour in Distributions, one should remove the definitions of logpdf_with_trans for multiple samples and add a loglikelihood_with_trans function that computes the sum of logpdf_with_trans applied to multiple samples. This would also avoid all the unnecessary allocations in https://github.com/TuringLang/DynamicPPL.jl/blob/275ccc8791be7d5dc4eb74b6e0c19de247f56d74/src/context_implementations.jl#L264 and https://github.com/TuringLang/DynamicPPL.jl/blob/275ccc8791be7d5dc4eb74b6e0c19de247f56d74/src/context_implementations.jl#L278.
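A minimal sketch of what the proposed function could look like; the name and signature are only suggestions:
using Distributions, Bijectors

# Sum the single-sample logpdf_with_trans over the columns of a sample matrix.
function loglikelihood_with_trans(d::MultivariateDistribution, xs::AbstractMatrix{<:Real}, transform::Bool)
    return sum(logpdf_with_trans(d, view(xs, :, i), transform) for i in axes(xs, 2))
end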
@JuliaRegistrator register()
Migrated from TuringLang/Turing.jl#542
#543 now resolves most of these issues, but there remain a couple of outstanding issues:
A few distributions (Dirichlet, Wishart, InverseWishart) don't have tests to ensure that their link functions are consistent with their logpdf_with_trans implementation. Due to the way that we transform between the constrained and unconstrained representations the Jacobian of the transformation is singular. This isn't a fundamental issue, but I'm not completely sure what the best way to resolve it is beyond directly hacking away at stuff.
The logpdf of the Kolmogorov distribution returns -Inf for x close to zero. I assume this is a numerical stability issue, so I've disabled testing for that for now as it messes everything up, and is probably on the Distributions.jl end.
I've made an effort to make things in the interface type-stable, but there's one bijector(d::Distribution) call that is currently not type-stable: Truncated.
For bijector(d::Truncated) we need to decide which bijectors to compose depending on whether the distribution is only lower-bounded, only upper-bounded, or both.
Also, at the moment, some other univariate distributions (all the ones defined in the union TransformDistributions) also have unstable bijector calls because they use the same implementation as Truncated, e.g. Arcsine, but these we can just implement on a case-by-case basis to get type-stability in all cases except Truncated.
In practice, I don't think type-stability of bijector(d::Distribution) is that important, as one will usually construct the distributions at the very beginning of the computation. Even so, for the cases where this is not the case it would be nice to have type-stability.
Also, in the case of Truncated{<:Distribution} the union is of size 3, which can possibly be dealt with by union-splitting since the default split size is 4.
julia> for d in uni_dists
c = @code_typed bijector(d)
if (c.second isa Union) || (c.second == Any)
println("-----------------")
println(d, ": ", c.second)
println("(", minimum(d), ", ", maximum(d), ")")
println(bijector(d))
end
end
-----------------
Arcsine{Float64}(a=2.0, b=4.0): Union{Identity, Logit{Float64}, Composed}
(2.0, 4.0)
Bijectors.Logit{Float64}(2.0, 4.0)
-----------------
Biweight{Float64}(μ=0.0, σ=1.0): Union{Identity, Logit{Float64}, Composed}
(-1.0, 1.0)
Bijectors.Logit{Float64}(-1.0, 1.0)
-----------------
Cosine{Float64}(μ=0.0, σ=1.0): Union{Identity, Logit{Float64}, Composed}
(-1.0, 1.0)
Bijectors.Logit{Float64}(-1.0, 1.0)
-----------------
Epanechnikov{Float64}(μ=0.0, σ=1.0): Union{Identity, Logit{Float64}, Composed}
(-1.0, 1.0)
Bijectors.Logit{Float64}(-1.0, 1.0)
-----------------
Levy{Float64}(μ=0.0, σ=1.0): Union{Identity, Composed{Tuple{Shift{Float64},Log}}}
(0.0, Inf)
Composed{Tuple{Bijectors.Shift{Float64},Bijectors.Log}}((Bijectors.Shift{Float64}(-0.0), Bijectors.Log()))
-----------------
Pareto{Float64}(α=1.0, θ=1.0): Union{Identity, Composed{Tuple{Shift{Float64},Log}}}
(1.0, Inf)
Composed{Tuple{Bijectors.Shift{Float64},Bijectors.Log}}((Bijectors.Shift{Float64}(-1.0), Bijectors.Log()))
-----------------
Truncated(Normal{Float64}(μ=0.0, σ=1.0), range=(-Inf, 2.0)): Union{Identity, Logit{Float64}, Composed}
(-Inf, 2.0)
Composed{Tuple{Bijectors.Scale{Float64},Composed{Tuple{Bijectors.Shift{Float64},Bijectors.Log}}}}((Bijectors.Scale{Float64}(-1.0), Composed{Tuple{Bijectors.Shift{Float64},Bijectors.Log}}((Bijectors.Shift{Float64}(2.0), Bijectors.Log()))))
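To illustrate where the instability comes from, here is a rough sketch of the kind of branching involved; bijector_truncated is a hypothetical stand-in, not the actual implementation, and each branch returns a different type, so the inferred return type is a Union:
using Bijectors

function bijector_truncated(d)
    lb, ub = minimum(d), maximum(d)
    if isfinite(lb) && isfinite(ub)
        return Bijectors.Logit(lb, ub)                                        # (lb, ub) -> ℝ
    elseif isfinite(lb)
        return Bijectors.Log() ∘ Bijectors.Shift(-lb)                         # x -> log(x - lb)
    elseif isfinite(ub)
        return Bijectors.Log() ∘ Bijectors.Shift(ub) ∘ Bijectors.Scale(-1.0)  # x -> log(ub - x)
    else
        return Bijectors.Identity{0}()
    end
end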
As implemented, link is not bijective for Dirichlet distributions:
julia> dist = Dirichlet([1,1])
Dirichlet{Float64}(alpha=[1.0, 1.0])
julia> link(dist, [0.5,0.5])
2-element Array{Float64,1}:
0.0
0.0
julia> invlink(dist,[0.0,0.0])
2-element Array{Float64,1}:
0.5
0.5
julia> invlink(dist,[0.0,1.0])
2-element Array{Float64,1}:
0.5
0.5
Because values must add to one, a Dirichlet with n values only has n-1 degrees of freedom, so the bijection needs to be to ℝⁿ⁻¹. There's an implementation in TransformVariables.jl that seems to work correctly.
It seems likely this is behind the troubles described in TuringLang/Turing.jl#935
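For reference, a rough sketch of an (n-1)-dimensional link following the Stan-style stick-breaking construction; link_simplex_minus1 is a hypothetical helper, not the package implementation:
using StatsFuns: logit

function link_simplex_minus1(x::AbstractVector{T}) where {T<:Real}
    N = length(x)
    y = similar(x, N - 1)
    stick = one(T)
    for k in 1:(N - 1)
        z = x[k] / stick            # proportion of the remaining stick
        y[k] = logit(z) + log(T(N - k))
        stick -= x[k]
    end
    return y
end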
Currently there are a couple of annoyances when implementing a new Bijector:
- (b::Bijector)(x::T) cannot be implemented for abstract types on Julia <1.3. This means that we have to implement batch computations on a case-by-case basis, which is both annoying and sometimes difficult to do in an AD-friendly and type-stable way (we have a bunch of mapvcat and eachcolmaphcat methods to do this, which is an unnecessary complication for a newcomer).
- We could have used a transform(b, x) method as the "evaluation" method, as this would allow us to have more generic implementations for batching, etc. But we decided not to do that, as it also felt clunky.
- forward(b::Bijector, x) is supposed to allow the user to share computation between the evaluation, i.e. (b::MyBijector)(x), and logabsdetjac(b, x). Buuut it's annoying to have to first implement (b::MyBijector)(x) and logabsdetjac(b, x), which are mandatory, and then have to go through these methods to figure out what is shared and then copy-paste certain parts to a transform method, etc.
Since we're in Julia, my first idea is of course to throw a macro at the problem! I'm thinking of introducing transform but making it super-easy for the user to define everything in one go, i.e. something along the lines of:
struct MyBijector <: Bijector{0} end
@bijector function forward(b::MyBijector, x::Real)
# Shared computation
z = exp(x)
rv = begin
# `(b::Bijector)(x)` goes here
z
end
logabsdetjac = begin
# `logabsdetjac(b, x)` goes here
log(z)
end
end
which is then transformed into something along the lines of
quote
function (Bijectors).transform(b::MyBijector, x::Real)
z = exp(x)
return z
end
(b::MyBijector)(x) = (Bijectors).transform(b::MyBijector, x::Real)
function (Bijectors).logabsdetjac(b::MyBijector, x::Real)
z = exp(x)
return log(z)
end
function (Bijectors).forward(b::MyBijector, x::Real)
z = exp(x)
rv = z
logabsdetjac = log(z)
return (rv = rv, logabsdetjac = logabsdetjac)
end
end
Then the only thing that is left for the user to implement is the inverse evaluation.
Also, I do have a somewhat "dirty" implementation ready (from which the above output was generated + MacroTools.prettify): https://gist.github.com/torfjelde/8675bba686afdf693476ae1c70f516d3.
This would then allow us to easily transition to transform, thus ensuring compatibility with Julia <1.3 while still using more generic methods, e.g. transform(b::Bijector{0}, x::AbstractVector) = b.(x). It would make it super-easy to share computation in forward. Finally, we could start thinking about adding complementary in-place methods, e.g. transform!(b::Bijector, x, out), logabsdetjac!(b::Bijector, x, out), etc., as a next step.
The only question is: are we overcomplicating things here? Is there an easier way of achieving what we want?
If someone has some time, a softplus
Bijector would be cool to have.
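For anyone picking this up, here is a minimal sketch under the Bijector{0} interface discussed above; the names Softplus/InvSoftplus and the exact method set are assumptions, not the final API (a production version would also guard log1p(exp(x)) against overflow for large x).
using Bijectors
import Bijectors: logabsdetjac

struct Softplus <: Bijector{0} end
(b::Softplus)(x::Real) = log1p(exp(x))                # y = log(1 + eˣ) maps ℝ onto (0, ∞)
logabsdetjac(b::Softplus, x::Real) = -log1p(exp(-x))  # dy/dx = logistic(x)

struct InvSoftplus <: Bijector{0} end
(b::InvSoftplus)(y::Real) = y + log1p(-exp(-y))       # x = log(eʸ - 1), overflow-safe form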
@JuliaRegistrator register()
@JuliaRegistrator register()
I've revisited some ideas I had earlier on but didn't quite have the knowledge or time to explore. As a result I think I've come up with some really nice improvements to the Composed bijector.
The changes suggested here already exist on a local branch of mine and tests are passing without issues. So if people seem happy with this I'll make a PR asap.
We're currently not doing anything to "flatten" the compositions, e.g.
julia> using Bijectors
julia> b = Bijectors.Exp()
Bijectors.Exp()
julia> map(typeof, (b ∘ b ∘ b).ts)
(Bijectors.Exp, Composed{Tuple{Bijectors.Exp,Bijectors.Exp}})
Notice that we get nested compositions here.
Personally I'd like ∘
to encourage a flat structure, so we instead get:
julia> map(typeof, (b ∘ b ∘ b).ts)
(Bijectors.Exp, Bijectors.Exp, Bijectors.Exp)
The above was accomplished by simply introducing the following lines:
∘(b1::Bijector, b2::Bijector) = composel(b2, b1) # <= already exists
∘(b1::Composed, b2::Bijector) = composel(b2, b1.ts...) # new
∘(b1::Bijector, b2::Composed) = composel(b2.ts..., b1) # new
∘(b1::Composed, b2::Composed) = composel(b2.ts..., b1.ts...) # new
If the user really wants to use nested compositions rather than this flat structure, they can call composel and/or composer explicitly. I struggle to see a case where you'd like to group bijectors together but wouldn't already have access to this grouping outside of the composition, but I'd still like to allow both possibilities. This solution does exactly that.
Other than just being "nice", it has two advantages over nesting:
- When implementing methods for Composed we only need to handle the flat Composed{Tuple{B1, B2, B3}} rather than all combinations of nestedness.
- A flat structure plays much better with the @generated unrolling described below.
Currently we're using a recursive implementation for the following methods for Composed:
(cb::Composed)(x)
forward(cb::Composed, x)
logabsdetjac(cb::Composed, x)
For small compositions this isn't really an issue, but it's still sub-optimal, especially in forward and logabsdetjac, where we need intermediate values to split apart the forward result. This can't be done recursively without allocating memory for these intermediate values.
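For reference, the recursive style can be sketched roughly like this (a simplified illustration with a made-up name, not the actual source): each level builds an intermediate result that the caller then has to split apart.
using Bijectors: forward

forward_recursive(ts::Tuple{}, x) = (rv = x, logabsdetjac = zero(float(eltype(x))))
function forward_recursive(ts::Tuple, x)
    res = forward(first(ts), x)                       # intermediate (rv, logabsdetjac) for this layer
    rest = forward_recursive(Base.tail(ts), res.rv)   # recurse on the remaining bijectors
    return (rv = rest.rv, logabsdetjac = res.logabsdetjac + rest.logabsdetjac)
end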
One alternative, which is further improved by encouraging a flat structure as described earlier, is to unroll the recursion using @generated
:
@generated function forward_generated(cb::Composed{T}, x) where {T<:Tuple}
expr = Expr(:block)
push!(expr.args, :((y, logjac) = forward(cb.ts[1], x)))
for i = 2:length(T.parameters)
push!(expr.args, :(res = forward(cb.ts[$i], y)))
push!(expr.args, :(y = res.rv))
push!(expr.args, :(logjac += res.logabsdetjac))
end
push!(expr.args, :(return (rv = y, logabsdetjac = logjac)))
return expr
end
For the code
b = Bijectors.Exp()
forward(b ∘ b ∘ b, randn())
we get the generated code
quote
(y, logjac) = forward(cb.ts[1], x)
res = forward(cb.ts[2], y)
y = res.rv
logjac += res.logabsdetjac
res = forward(cb.ts[3], y)
y = res.rv
logjac += res.logabsdetjac
return (rv = y, logabsdetjac = logjac)
end
Comparing the current recursive implementation to the new forward_generated:
julia> using Revise, Bijectors, BenchmarkTools
julia> using Bijectors: Log, Exp
julia> b = Bijectors.Exp()
Exp()
julia> cb = inv(b) ∘ b
Composed{Tuple{Exp,Log}}((Exp(), Log()))
julia> x = randn()
1.59358111703819
julia> @btime Bijectors.forward($cb, $x)
28.426 ns (0 allocations: 0 bytes)
(rv = 1.59358111703819, logabsdetjac = 0.0)
julia> @btime Bijectors.forward_gen($cb, $x)
28.788 ns (0 allocations: 0 bytes)
(rv = 1.59358111703819, logabsdetjac = 0.0)
So not much gained here. But if we consider deeply nested compositions the difference becomes quite significant:
julia> cb = foldl(∘, [inv(b) ∘ b for i = 1:50])
Composed{Tuple{Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log}}((Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log()))
julia> @btime Bijectors.forward($cb, $x)
431.207 μs (201 allocations: 6.25 KiB)
(rv = 1.59358111703819, logabsdetjac = 0.0)
julia> @btime Bijectors.forward_gen($cb, $x)
1.934 μs (0 allocations: 0 bytes)
(rv = 1.59358111703819, logabsdetjac = 0.0)
I've also considered @inline for the recursive implementation, but for large cases compile times are quite bad. So I think it's pretty clear which is the winner :)
# For extra neat-ness for a lot of bijectors:)
∘(b1::B, b2::Inversed{B}) where {B<:Bijector} = Identity()
∘(b1::Inversed{B}, b2::B) where {B<:Bijector} = Identity()
# Can also implement on case-by-case basis where we don't use `Inversed`
macro inverses(B1, B2)
expr = Expr(:block)
push!(expr.args, :(Base.:∘(b1::$B1, b2::$B2) = Identity()))
push!(expr.args, :(Base.:∘(b1::$B2, b2::$B1) = Identity()))
return expr
end
@inverses Log Exp # super-easy to declare the inverses
# And these
∘(::Identity, ::Identity) = Identity()
∘(::Identity, b::Bijector) = b
∘(b::Bijector, ::Identity) = b
then
julia> using Bijectors: Log, Exp
julia> Log() ∘ Exp()
Bijectors.Identity()
EDIT: forgot to use @btime forward($cb, $x)
, etc. as @mohamed82008 pointed out.
Directional statistics deals with unit vectors and periodic variables. Distributions.jl includes two such distributions: VonMises
(angular) and VonMisesFisher
(spherical). I'm planning to implement more (see https://discourse.julialang.org/t/rfc-taking-directional-orientational-statistics-seriously/31951). For these, we need a SphereBijector
and an AngleBijector
.
SphereBijector
transforms an n-dimensional vector into an n-dimensional unit vector under the Euclidean norm. It's not really a bijector, since it only has a right inverse (the inclusion function), so its Jacobian has a determinant of 0. However, we can still give a logabsdetjac
term that produces a uniform measure (using a standard multivariate normal kernel). See the Stan manual for details. I also have implementations of this transformation at https://github.com/salilab/HMCUtilities.jl/blob/c4602ac/src/constraint.jl#L469-L527 and tpapp/TransformVariables.jl#67.
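A rough sketch of the idea (hypothetical names; the density term follows the Stan manual's unit-vector construction rather than being an actual log-Jacobian determinant):
using LinearAlgebra: norm

struct SphereBijector end
(b::SphereBijector)(x::AbstractVector{<:Real}) = x ./ norm(x)        # ℝⁿ → unit sphere

# Standard multivariate normal kernel; pushing a standard normal through the map above
# then induces the uniform measure on the sphere.
sphere_logdensity_term(x::AbstractVector{<:Real}) = -sum(abs2, x) / 2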
AngleBijector
simply converts cartesian coordinates of a 1-sphere (circle) to an angle using atan
. When composed with SphereBijector
and shift
, it provides the necessary transformation for VonMises
. Also composing it with scale
lets one transform any periodic quantity.
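And the angle step could be as simple as (again a hypothetical name, just to illustrate the atan conversion):
# Point on the unit circle (e.g. the output of the sphere map above for n = 2) to an angle in (-π, π].
circle_to_angle(y::AbstractVector{<:Real}) = atan(y[2], y[1])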
I'm happy to implement these. But will you take non-bijective functions in Bijectors.jl?
Maybe we should add a page in our docs website (https://turing.ml) to document features in Bijectors.jl
?
Hi,
Really well done package! My post is not really code related, but I was wondering if it might be possible to show (in the container?) the input dimension and the actual dimensions that you work with for the transformers.
Afaik, there are a fair number of transformers that are fully described in a lower dimension than the actual Bijectors output (e.g. I think you use the Cholesky transform and then take the log of the diagonal for covariance matrices, but keep the whole matrix as output).
As an example, for a 3×3 covariance matrix, would something like Bijector{9,6} be possible?
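For reference, assuming the log-Cholesky parameterisation described above, the number of free parameters of an n×n covariance matrix is n(n+1)/2 (the lower triangle including the diagonal), e.g.:
free_params(n::Integer) = n * (n + 1) ÷ 2
free_params(3)  # = 6, matching the Bijector{9,6} idea: 9 = 3×3 matrix entries, 6 free parameters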