TuringLang / Bijectors.jl
Implementation of normalising flows and constrained random variable transformations
Home Page: https://turinglang.org/Bijectors.jl/
License: MIT License
I'm currently trying to build a model in Turing that utilizes the newly added LKJ distribution from Distributions.jl. I ran into issues when I attempted to sample the model:
ERROR: MethodError: no method matching bijector(::LKJ{Float64,Int64})
Closest candidates are:
bijector(::KSOneSided) at /Users/joshualeond/.julia/packages/Bijectors/bHaf6/src/transformed_distribution.jl:58
bijector(::Product{Discrete,T,V} where V<:AbstractArray{T,1} where T<:Distribution{Univariate,Discrete}) at /Users/joshualeond/.julia/packages/Bijectors/bHaf6/src/transformed_distribution.jl:39
bijector(::Product{Continuous,T,Tdists} where Tdists<:(FillArrays.Fill{T,1,Axes} where Axes) where T<:Distribution{Univariate,Continuous}) at /Users/joshualeond/.julia/packages/Bijectors/bHaf6/src/compat/distributionsad.jl:16
It appears that the LKJ distribution may need to be added here in Bijectors:
Line 202 in 1f3b581
However, after I simply added LKJ to this line I hit the following error:
ERROR: MethodError: no method matching getlogp(::LKJ{Float64,Int64}, ::Cholesky{Float64,Array{Float64,2}}, ::Array{Float64,2})
Closest candidates are:
getlogp(::MatrixBeta, ::Any, ::Any) at C:\Users\dunjos0\.julia\dev\Bijectors\src\Bijectors.jl:231
getlogp(::Wishart, ::Any, ::Any) at C:\Users\dunjos0\.julia\dev\Bijectors\src\Bijectors.jl:236
getlogp(::InverseWishart, ::Any, ::Any) at C:\Users\dunjos0\.julia\dev\Bijectors\src\Bijectors.jl:239
So it looks like a new getlogp method needs to be added in Bijectors for the LKJ distribution. I can see that these getlogp methods closely follow the logkernel methods in Distributions.jl, but I am unsure what needs to be adjusted to make sure it works nicely with Turing.
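For reference, here is a rough sketch of what such a method might look like, assuming it should mirror Distributions.jl's logkernel for the LKJ density; the exact signature and normalisation used by Bijectors may well differ:
using Distributions, LinearAlgebra

# Hypothetical sketch only: the LKJ log-kernel is (η - 1) * logdet(X), and
# logdet of the Cholesky factorisation equals logdet of the original matrix.
function getlogp(d::LKJ, Xcf::Cholesky, X::AbstractMatrix)
    _, η = Distributions.params(d)
    return (η - 1) * logdet(Xcf)
end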
Right now, when Roots.find_zero() fails (particularly with PlanarLayer and RadialLayer), the behavior is to stop everything. It would be nice to have a helper message on what to correct/improve to make the model more stable, or to have a fallback which makes approximations but is more stable!
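A minimal sketch of what such a helper could look like; find_zero_with_hint is hypothetical and not part of Bijectors.jl:
using Roots

# Wrap the root solve used when inverting a layer and, on failure, point the user
# towards likely fixes instead of simply aborting.
function find_zero_with_hint(f, x0)
    try
        return Roots.find_zero(f, x0)
    catch err
        error("Inverting the flow layer failed during root-finding ($err). " *
              "Consider re-initialising the layer parameters or taking smaller steps.")
    end
end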
This issue is used to trigger TagBot; feel free to unsubscribe.
If you haven't already, you should update your TagBot.yml
to include issue comment triggers.
Please see this post on Discourse for instructions and more details.
If you'd like for me to do this for you, comment TagBot fix
on this issue.
I'll open a PR within a few hours, please be patient!
It seems the @debug statements, when expanded, are causing a type instability and a performance hit. ref: #17
Currently the Bijector interface doesn't "officially" support batch computations. By this I mean: what do you do when a SimplexBijector receives an AbstractMatrix rather than the expected AbstractVector? In some cases handling this is fairly straightforward, e.g. the SimplexBijector only makes sense when applied to a vector, so when we receive a matrix we know it's a batch and we can just do the right thing. In other cases what to do with a Real vs. an AbstractVector{<:Real} is not so clear, e.g. Exp applied elementwise is a well-defined bijection on Real, AbstractVector, etc., so what do we do in logabsdetjac(b::Exp, x) when x isa AbstractArray? Is x a batch of inputs, so we should return [y[i] for i = 1:length(x)], or is it intended as a vector, so we should return sum(y) as is needed for the distribution MvLogNormal?
To drive the point home, what do we do in this situation:
julia> struct Scale{T}
a::T
end
julia> (b::Scale)(x) = b.a .* x
julia> b = Scale(2.0)
Scale{Float64}(2.0)
julia> x = randn()
0.2414826706733002
julia> b(x)
0.4829653413466004
julia> logabsdetjac(b::Scale, x) = log(abs(b.a))
logabsdetjac (generic function with 1 method)
julia> logabsdetjac(b, x)
0.6931471805599453
julia> x = rand(2) # suppose we want to scale vectors elementwise instead
2-element Array{Float64,1}:
0.7910530695591607
0.03333662406026572
julia> b(x) # transform still works nicely
2-element Array{Float64,1}:
1.5821061391183213
0.06667324812053144
julia> logabsdetjac(b, x)
0.6931471805599453
julia> x = rand(2) # batch of 2
2-element Array{Float64,1}:
0.546411899713563
0.16872866221657734
julia> b(x) # transform works on batch as expected
2-element Array{Float64,1}:
1.092823799427126
0.3374573244331547
julia> logabsdetjac(b, x) # NO! Want this to return [logabsdetjac(b, x[1]), logabsdetjac(b, x[2])]!
0.6931471805599453
In this example we could use the parametric type T of Scale{T} to decide what to do, but what do we do when T <: AbstractVector and logabsdetjac is given an AbstractMatrix? Is Scale intended to act on a matrix by scaling each column elementwise, or is it intended to act on a vector and this matrix is actually a batch?
At the time of writing, some of the bijectors support batch computation by "accident": for SimplexBijector, say, there is only one sensible thing to do when given an AbstractMatrix. But it's crucial that all bijectors support this feature for it to be of any use; otherwise a lot of incompatibilities will arise when users compose two bijectors where one supports batch computation and the other does not, and you end up with size mismatches in the logabsdetjac field.
Univariate, Multivariate, etc.
We could introduce something similar to Distributions.jl's Univariate, Multivariate, etc. The example of Scale would then instead be
struct Scale{O, T}
a::T
end
(b::Scale)(x) = b.a .* x
logabsdetjac(b::Scale{Univariate}, x::Real) = log(abs(b.a))
logabsdetjac(b::Scale{Univariate}, x::AbstractVector) = log(abs(b.a)) .* ones(length(x))
logabsdetjac(b::Scale{Multivariate}, x::AbstractVector{<:Real}) = sum(log(abs(b.a)))
logabsdetjac(b::Scale{Multivariate}, x::AbstractMatrix{<:Real}) = sum(log(abs(b.a))) .* ones(size(x, 2))
and so on.
As suggested by @xukai92, this can also be used to specify the device location if we just use aliases like Univariate = Real, etc. This way one could have specific implementations for CuArray, etc.
The main issue I see is that we'd then have to force this upon all bijectors, for some of which it is redundant information since they are only well-defined on a single specific space.
A super-simple solution is to force each bijector to specify the dimensions of its output upon construction. I personally don't like it, as it can be very inflexible, and in most cases the bijection is well-defined on a particular space, so the dimension would be quite redundant.
Running ] test raises:
PlanarLayer: Test Failed at /home/yiyuezhuo/.julia/packages/Bijectors/66bcj/test/norm_flows.jl:30
Expression: ≈((inv(flow))(flow(z)), z, rtol = 0.2)
Evaluated: [2.2075135426840555 -0.18182425891670018 … 1.3994120484164432 0.9248803865772497; -1.5719067963233282 -0.1836889984451685 … 0.7022959819078269 0.27282017311337303] ≈ [1.9716204287413468 -0.15707698104964826 … 1.3606109850208474 0.4787377009574211; -1.3961992590989196 -0.20212227601885754 … 0.7311973736313007 0.6051343777576946] (rtol=0.2)
Stacktrace:
[1] top-level scope at /home/yiyuezhuo/.julia/packages/Bijectors/66bcj/test/norm_flows.jl:30
[2] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1115
[3] top-level scope at /home/yiyuezhuo/.julia/packages/Bijectors/66bcj/test/norm_flows.jl:23
PlanarLayer: Test Failed at /home/yiyuezhuo/.julia/packages/Bijectors/66bcj/test/norm_flows.jl:31
Expression: ≈((inv(flow) ∘ flow)(z), z, rtol = 0.2)
Evaluated: [2.2075135426840555 -0.18182425891670018 … 1.3994120484164432 0.9248803865772497; -1.5719067963233282 -0.1836889984451685 … 0.7022959819078269 0.27282017311337303] ≈ [1.9716204287413468 -0.15707698104964826 … 1.3606109850208474 0.4787377009574211; -1.3961992590989196 -0.20212227601885754 … 0.7311973736313007 0.6051343777576946] (rtol=0.2)
Stacktrace:
[1] top-level scope at /home/yiyuezhuo/.julia/packages/Bijectors/66bcj/test/norm_flows.jl:31
[2] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1115
[3] top-level scope at /home/yiyuezhuo/.julia/packages/Bijectors/66bcj/test/norm_flows.jl:23
Test Summary: | Pass Fail Total
PlanarLayer | 11 2 13
Strangely, this test passed in Julia 1.4.1.
I guess it should be attributed to the seed (seed!(1) before the tests), since even Julia 1.4.1, which passes the test, raises similar errors when the corresponding code is run multiple times or with a different seed (e.g. seed!(3)). Is it valid to specify a new seed to make this test pass? But this would introduce subtle version-compatibility problems.
With SampleFromPrior and DefaultContext:
julia> @model function bernoulli_mixture(x)
w ~ Dirichlet(2, 1.0)
p ~ DiscreteNonParametric([0.3, 0.7], w)
x ~ Bernoulli(p)
end
julia> vi = VarInfo();
julia> bernoulli_mixture(false)(vi, SampleFromPrior(), DefaultContext())
false
julia> bernoulli_mixture(false)(vi, SampleFromPrior(), DefaultContext())
ERROR: MethodError: no method matching eps(::Type{Real})
Closest candidates are:
eps(::Dates.Time) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.3/Dates/src/types.jl:387
eps(::Dates.Date) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.3/Dates/src/types.jl:386
eps(::Dates.DateTime) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.3/Dates/src/types.jl:385
...
Stacktrace:
[1] logpdf_with_trans(::Dirichlet{Float64}, ::Array{Real,1}, ::Bool) at /home/philipp/.julia/packages/Bijectors/bHaf6/src/Bijectors.jl:124
[2] assume(::Random._GLOBAL_RNG, ::SampleFromPrior, ::Dirichlet{Float64}, ::VarName{:w,Tuple{}}, ::VarInfo{DynamicPPL.Metadata{Dict{VarName,Int64},Array{Distribution,1},Array{VarName,1},Array{Real,1},Array{Set{DynamicPPL.Selector},1}},Float64}) at /home/philipp/.julia/packages/DynamicPPL/QgcLg/src/context_implementations.jl:142
[3] _tilde at /home/philipp/.julia/packages/DynamicPPL/QgcLg/src/context_implementations.jl:59 [inlined]
[4] tilde at /home/philipp/.julia/packages/DynamicPPL/QgcLg/src/context_implementations.jl:23 [inlined]
[5] tilde_assume(::Random._GLOBAL_RNG, ::DefaultContext, ::SampleFromPrior, ::Dirichlet{Float64}, ::VarName{:w,Tuple{}}, ::Tuple{}, ::VarInfo{DynamicPPL.Metadata{Dict{VarName,Int64},Array{Distribution,1},Array{VarName,1},Array{Real,1},Array{Set{DynamicPPL.Selector},1}},Float64}) at /home/philipp/.julia/packages/DynamicPPL/QgcLg/src/context_implementations.jl:52
[6] macro expansion at ./REPL[23]:2 [inlined]
[7] ##evaluator#453(::Random._GLOBAL_RNG, ::Model{var"###evaluator#453",(:x,),Tuple{Bool},(),ModelGen{var"###generator#454",(:x,),(),Tuple{}}}, ::VarInfo{DynamicPPL.Metadata{Dict{VarName,Int64},Array{Distribution,1},Array{VarName,1},Array{Real,1},Array{Set{DynamicPPL.Selector},1}},Float64}, ::SampleFromPrior, ::DefaultContext) at /home/philipp/.julia/packages/DynamicPPL/QgcLg/src/compiler.jl:356
[8] evaluate_threadunsafe at /home/philipp/.julia/packages/DynamicPPL/QgcLg/src/model.jl:157 [inlined]
[9] (::Model{var"###evaluator#453",(:x,),Tuple{Bool},(),ModelGen{var"###generator#454",(:x,),(),Tuple{}}})(::Random._GLOBAL_RNG, ::VarInfo{DynamicPPL.Metadata{Dict{VarName,Int64},Array{Distribution,1},Array{VarName,1},Array{Real,1},Array{Set{DynamicPPL.Selector},1}},Float64}, ::SampleFromPrior, ::DefaultContext) at /home/philipp/.julia/packages/DynamicPPL/QgcLg/src/model.jl:136
[10] (::Model{var"###evaluator#453",(:x,),Tuple{Bool},(),ModelGen{var"###generator#454",(:x,),(),Tuple{}}})(::VarInfo{DynamicPPL.Metadata{Dict{VarName,Int64},Array{Distribution,1},Array{VarName,1},Array{Real,1},Array{Set{DynamicPPL.Selector},1}},Float64}, ::SampleFromPrior, ::Vararg{Any,N} where N) at /home/philipp/.julia/packages/DynamicPPL/QgcLg/src/model.jl:126
[11] top-level scope at REPL[38]:1
[12] eval(::Module, ::Any) at ./boot.jl:330
[13] eval_user_input(::Any, ::REPL.REPLBackend) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.3/REPL/src/REPL.jl:86
[14] run_backend(::REPL.REPLBackend) at /home/philipp/.julia/packages/Revise/AMRie/src/Revise.jl:1023
[15] top-level scope at REPL[2]:0
The same thing does not happen with LikelihoodContext:
julia> vi = VarInfo();
julia> bernoulli_mixture(false)(vi, SampleFromPrior(), LikelihoodContext())
false
julia> bernoulli_mixture(false)(vi, SampleFromPrior(), LikelihoodContext())
false
We should add tests which test against the use of Flux.Tracker for AD. While playing around with the transformations I encountered some issues when using Flux.Tracker with simplex distributions.
Currently, these distributions are broken due to FluxML/Zygote.jl#873. As discussed in #155, the corresponding tests are marked as broken. The issue will be fixed upstream.
See also: TuringLang/DistributionsAD.jl#145
Hi,
The Normalizing flow example uses Tracker, a discontinued AD package.
I am trying to fit a NF using Zygote, but I have some problems.
Example:
@model function gen()
x ~ Exponential(0.5)
y ~ Normal(0, x)
end
s = sample(gen(), NUTS(), 500)
train_data = hcat(s[:x].data, s[:y].data)' |> Array
b = PlanarLayer(2) ∘ LeakyReLU{Float64, 1}(0.05) ∘ RadialLayer(2) ∘ LeakyReLU{Float64, 1}(0.05) ∘ PlanarLayer(2)
d = MvNormal(zeros(2), ones(2))
tb = transformed(d, b);
loss(tb :: Bijectors.TransformedDistribution, x :: Matrix{Float64}) = begin
return sum(-logpdf(tb, x))
end
function nf_train(tb, x, opt, ps, epochs)
@showprogress for i ∈ 1:epochs
train_loss, back = Zygote.pullback(() -> loss(tb, x), ps)
gs = back(one(train_loss))
Flux.update!(opt, ps, gs)
end
end
nf_train(tb, train_data, ADAM(), Zygote.Params(Flux.params(b)), 10)
I get the error:
Mutating arrays is not supported
on
gs = back(one(train_loss))
Any way I can make this work?
Thanks
Having some trouble taking the gradient with respect to the parameters of a Bijector using Zygote.jl when the computation involves an inversion. MVEs:
julia> using Zygote
julia> using Bijectors
julia> using Bijectors: Shift
julia> Zygote.gradient(θ -> sum(Shift(θ[1:1])(zeros(1))), randn(1)) # <= works
([1.0],)
julia> Zygote.gradient(θ -> sum(inv(Shift(θ[1:1]))(zeros(1))), randn(1)) # <= fails
ERROR: MethodError: no method matching adjoint(::Shift{Array{Float64,1},1})
Closest candidates are:
adjoint(::Missing) at missing.jl:100
adjoint(::IRTools.Inner.CFG) at /var/home/tef30/.julia/packages/IRTools/YplhY/src/passes/passes.jl:29
adjoint(::Number) at number.jl:193
...
Stacktrace:
[1] (::Zygote.var"#1325#1326"{Shift{Array{Float64,1},1}})(::NamedTuple{(:a,),Tuple{FillArrays.Fill{Float64,1,Tuple{Base.OneTo{Int64}}}}}) at /var/home/tef30/.julia/packages/Zygote/S1EoU/src/lib/array.jl:354
[2] (::Zygote.var"#3416#back#1327"{Zygote.var"#1325#1326"{Shift{Array{Float64,1},1}}})(::NamedTuple{(:a,),Tuple{FillArrays.Fill{Float64,1,Tuple{Base.OneTo{Int64}}}}}) at /var/home/tef30/.julia/packages/ZygoteRules/6nssF/src/adjoint.jl:49
[3] (::typeof(∂(#5)))(::Float64) at ./REPL[5]:1
[4] (::Zygote.var"#38#39"{typeof(∂(#5))})(::Float64) at /var/home/tef30/.julia/packages/Zygote/S1EoU/src/compiler/interface.jl:36
[5] gradient(::Function, ::Array{Float64,1}) at /var/home/tef30/.julia/packages/Zygote/S1EoU/src/compiler/interface.jl:45
[6] top-level scope at REPL[5]:1
This is also an issue in the case where we use Inverse
:
julia> Zygote.gradient(θ -> Bijectors.Logit(θ[1], θ[2])(0.5), [0.0, 1.0])
([-2.0, -2.0],)
julia> Zygote.gradient(θ -> inv(Bijectors.Logit(θ[1], θ[2]))(0.5), [0.0, 1.0])
ERROR: MethodError: no method matching adjoint(::Inverse{Bijectors.Logit{Float64},0})
Closest candidates are:
adjoint(::Missing) at missing.jl:100
adjoint(::IRTools.Inner.CFG) at /var/home/tef30/.julia/packages/IRTools/YplhY/src/passes/passes.jl:29
adjoint(::Number) at number.jl:193
...
Stacktrace:
[1] (::Zygote.var"#1325#1326"{Inverse{Bijectors.Logit{Float64},0}})(::NamedTuple{(:orig,),Tuple{NamedTuple{(:a, :b),Tuple{Float64,Float64}}}}) at /var/home/tef30/.julia/packages/Zygote/S1EoU/src/lib/array.jl:354
[2] (::Zygote.var"#3416#back#1327"{Zygote.var"#1325#1326"{Inverse{Bijectors.Logit{Float64},0}}})(::NamedTuple{(:orig,),Tuple{NamedTuple{(:a, :b),Tuple{Float64,Float64}}}}) at /var/home/tef30/.julia/packages/ZygoteRules/6nssF/src/adjoint.jl:49
[3] #9 at ./REPL[21]:1 [inlined]
[4] (::typeof(∂(#9)))(::Float64) at /var/home/tef30/.julia/packages/Zygote/S1EoU/src/compiler/interface2.jl:0
[5] (::Zygote.var"#38#39"{typeof(∂(#9))})(::Float64) at /var/home/tef30/.julia/packages/Zygote/S1EoU/src/compiler/interface.jl:36
[6] gradient(::Function, ::Array{Float64,1}) at /var/home/tef30/.julia/packages/Zygote/S1EoU/src/compiler/interface.jl:45
[7] top-level scope at REPL[21]:1
Hi,
I've played around with the invlink
function for simplex distributions and checked the correctness of the current implementation against the code used in Stan. (https://github.com/stan-dev/math/blob/502487c511594ccb93eb979df5b8fe163becb417/stan/math/rev/mat/fun/simplex_constrain.hpp)
The reimplementation I used is:
using StatsFuns: logistic  # logistic is the inverse-logit function used below

function invlink_simplex(y::AbstractVector{T}) where {T<:Real}
N = length(y)
stick = one(T)
x = zeros(T, length(y))
for k in 1:(N-1)
lk = map(T, log(N - k))
z = logistic(y[k] - lk)
x[k] = stick * z
stick -= x[k]
end
x[N] = stick
return x
end
and I found quite a bit of speed improvement with this implementation:
using BenchmarkTools, Distributions, Bijectors  # for @btime, Dirichlet, and invlink

K = 1000
y = rand(K)
d = Dirichlet(K, 1.0)
@btime invlink_simplex(y);
@btime invlink(d, y);
77.077 μs (1 allocation: 7.94 KiB)
369.305 μs (1 allocation: 7.94 KiB)
Should I make a PR for this or do we think there are stability issues with the code?
This repo needs Travis and AppVeyor files.
See:
and comments on:
I have opened a PR here to add support for callable structs:
If ReverseDiff supports callable structs, it's possible to define the adjoint/gradient on a callable struct, like (ib::Inverse{<:CorrBijector})(y::AbstractMatrix{<:Real}), directly. Some "internal" functions used to support AD, such as _inv_link_chol_lkj and _link_chol_lkj, could then be removed to simplify the code.
Unlike Scale, Exp or Log, Logit is currently defined as a subtype of Bijector{0}:
struct Logit{T<:Real} <: Bijector{0}
a::T
b::T
end
This prevents it from being applied to vectors and matrices easily. We should extend it to Bijector{N} in a similar way to the other bijectors.
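As a rough sketch of what this could look like, here is a hypothetical dimension-parameterised variant (LogitN is purely illustrative and not part of Bijectors.jl):
using Bijectors
using StatsFuns: logit

# A Logit carrying a dimensionality parameter N, in the style of Scale/Shift,
# so it can be declared as acting elementwise on vectors or matrices.
struct LogitN{N, T<:Real} <: Bijectors.Bijector{N}
    a::T
    b::T
end

(b::LogitN{0})(x::Real) = logit((x - b.a) / (b.b - b.a))
(b::LogitN)(x::AbstractArray{<:Real}) = logit.((x .- b.a) ./ (b.b .- b.a))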
Was testing these against the UnivariateDistribution fallback in Expectations.jl, but it fails because these methods aren't defined.
Perhaps we can add documentation on the main APIs in the readme.md file, in a way similar to MCMCChain.
Here are some related julia packages:
Other related packages
Related issue:
logpdf(t::Transform, l::Distribution, x)
Some things that probably can be improved as a follow-up to the initial implementation (but might require changes in other packages such as AD backends):
At the moment, if you call forward(b, x) you get back a named tuple (rv = b(x), logabsdetjac = logabsdetjac(b, x)). This was introduced in the first PR for the new interface.
When I started out working on this, using rv made sense as the transform was always related to some Distribution. Now a Bijector can be used as a standalone transform to do more than just transform random variables, e.g. constrained-to-unconstrained transformations in optimization. Therefore I suggest we rename rv.
A couple of suggestions (in no particular order):
- y, since throughout the entire package it's a recurring theme that x is "un-transformed" and y denotes the transformed value. Also, in forward, x is the input. This also corresponds with the (x = x, y = b(x), ...) returned by calling forward(d::TransformedDistribution), though that could also be up for discussion in this issue.
- val or value
- res or result
Any suggestions?
Consider the following code:
using Bijectors
d0 = (Beta(), MvNormal(2, 1.0))
d1 = (MvNormal(2, 1.0),)
sb0 = Stacked(bijector.(d0), [1:1, 2:3])
sb1 = Stacked(bijector.(d1), [1:2])
logabsdetjac(sb0, [.5, 1, 2]) # <- does not throw
logabsdetjac(sb1, [.5, 1]) # <- throws
The last line throws an error:
ERROR: ArgumentError: reducing over an empty collection is not allowed
Stacktrace:
[1] _empty_reduce_error() at ./reduce.jl:299
[2] mapreduce_empty(::Function, ::Function, ::Type{T} where T) at ./reduce.jl:342
[3] reduce_empty(::Base.MappingRF{Bijectors.var"#26#27"{Stacked{Tuple{Identity{1}},1},Array{Float64,1}},typeof(Base.add_sum)}, ::Type{Int64}) at ./reduce.jl:329
[4] reduce_empty_iter at ./reduce.jl:355 [inlined]
[5] mapreduce_empty_iter(::Function, ::Function, ::UnitRange{Int64}, ::Base.HasEltype) at ./reduce.jl:351
[6] _mapreduce(::Bijectors.var"#26#27"{Stacked{Tuple{Identity{1}},1},Array{Float64,1}}, ::typeof(Base.add_sum), ::IndexLinear, ::UnitRange{Int64}) at ./reduce.jl:400
[7] _mapreduce_dim at ./reducedim.jl:318 [inlined]
[8] #mapreduce#620 at ./reducedim.jl:310 [inlined]
[9] mapreduce at ./reducedim.jl:310 [inlined]
[10] _sum at ./reducedim.jl:724 [inlined]
[11] #sum#628 at ./reducedim.jl:720 [inlined]
[12] sum at ./reducedim.jl:720 [inlined]
[13] logabsdetjac(::Stacked{Tuple{Identity{1}},1}, ::Array{Float64,1}) at .../.julia/packages/Bijectors/98OWP/src/bijectors/stacked.jl:102
[14] top-level scope at REPL[16]:1
[15] include_string(::Function, ::Module, ::String, ::String) at ./loading.jl:1088
The cause seems to be here:
Bijectors.jl/src/bijectors/stacked.jl
Lines 96 to 105 in dfca913
because typeof(sb1) is Stacked{Tuple{Identity{1}},1}, so N = 1 and 2:1 is empty.
A possible solution is:
function logabsdetjac(
b::Stacked{<:Any, N},
x::AbstractVector{<:Real}
) where {N}
init = sum(logabsdetjac(b.bs[1], x[b.ranges[1]]))
mapreduce(+, 2:N; init = init) do i
sum(logabsdetjac(b.bs[i], x[b.ranges[i]]))
end
end
Then mapreduce doesn't throw, because an explicit init is given.
The current function name logpdf_with_trans is a bit unclear. Can we rename it to something more self-explanatory, e.g. logpdf_with_jacobian?
Currently we've done nothing to improve the "visual inspection" of objects. At the moment it can be very difficult for the user to inspect a Bijector, and with PR #44 we're going to also have dimensionality in the type, which makes it even more important to be able to easily inspect a Bijector.
Sooo let's have another poll! Which of the following styles do you prefer? Option (1) shows a Dims=1 annotation for "container" types, e.g. Composed and Stacked:
julia> Bijectors.Exp()
Exp{0}()
julia> Bijectors.Logit(0.0, 1.0)
Logit{Float64}(a=0.0, b=1.0)
julia> Bijectors.composel([PlanarLayer(10) for i = 1:3]...)
Composed{..., Dims=1}(ts=(PlanarLayer{Array{Float64,2},Array{Float64,1}}(
w: [-1.10143; 1.33204; … ; -0.725184; -0.279187]
u: [-0.755879; 1.6483; … ; 1.15966; -0.164425]
b: [0.09642]
)
, PlanarLayer{Array{Float64,2},Array{Float64,1}}(
w: [-0.913726; -2.03212; … ; 0.0719403; -0.0622856]
u: [-1.28574; -0.0194101; … ; -0.760258; 0.555373]
b: [-0.951921]
)
, PlanarLayer{Array{Float64,2},Array{Float64,1}}(
w: [1.14896; 0.710024; … ; -2.24741; -0.904142]
u: [0.917471; 1.07336; … ; -1.01782; 0.230862]
b: [0.535096]
)
))
julia> Bijectors.Exp()
Exp{0}()
julia> Bijectors.Logit(0.0, 1.0)
Logit{Float64}(a=0.0, b=1.0)
julia> Bijectors.composel([PlanarLayer(10) for i = 1:3]...)
Composed(ts=(PlanarLayer{Array{Float64,2},Array{Float64,1}}(
w: [1.67265; -0.888749; … ; -1.3403; -0.0615207]
u: [0.741016; 0.355929; … ; -1.22854; 0.5851]
b: [1.00029]
)
, PlanarLayer{Array{Float64,2},Array{Float64,1}}(
w: [0.401001; -0.586282; … ; 0.359551; 0.736369]
u: [1.26738; -0.194372; … ; 1.13786; -0.0398834]
b: [1.08065]
)
, PlanarLayer{Array{Float64,2},Array{Float64,1}}(
w: [-1.01392; -0.70209; … ; -1.8464; 0.212027]
u: [-0.726696; -2.39618; … ; -0.169901; -0.526239]
b: [1.02857]
)
))
julia> Bijectors.Exp()
Exp{0}()
julia> Bijectors.Logit(0.0, 1.0)
Logit{Float64}(0.0, 1.0)
julia> Bijectors.composel([PlanarLayer(10) for i = 1:3]...)
Composed{Tuple{PlanarLayer{Array{Float64,2},Array{Float64,1}},PlanarLayer{Array{Float64,2},Array{Float64,1}},PlanarLayer{Array{Float64,2},Array{Float64,1}}},1}((PlanarLayer{Array{Float64,2},Array{Float64,1}}([0.700402; -0.489015; … ; -1.31095; -1.94269], [-0.199911; 1.38871; … ; -0.464837; 0.853309], [0.0885244]), PlanarLayer{Array{Float64,2},Array{Float64,1}}([-0.238407; -0.00701318; … ; 0.921103; 1.26528], [-0.17988; -0.0505573; … ; -0.886066; 1.18275], [0.467128]), PlanarLayer{Array{Float64,2},Array{Float64,1}}([0.231724; 0.39539; … ; -0.418577; -1.75923], [-0.055946; 1.37152; … ; 1.26864; -1.2355], [-0.636524])))
To vote: 👍 for (1), 😕 for (2), and 👎 for (3).
Other suggestions welcome!
Seems to be an issue with Bijectors.jl's compat/distributionsad.jl:
julia> using DistributionsAD, Bijectors
julia> dists = [Gamma(2, 3) for i = 1:2]
2-element Array{Gamma{Float64},1}:
Gamma{Float64}(α=2.0, θ=3.0)
Gamma{Float64}(α=2.0, θ=3.0)
julia> dist = product_distribution(dists)
Product{Continuous,Gamma{Float64},Array{Gamma{Float64},1}}(v=Gamma{Float64}[Gamma{Float64}(α=2.0, θ=3.0), Gamma{Float64}(α=2.0, θ=3.0)])
julia> x = rand(dist)
2-element Array{Float64,1}:
3.3962613472010568
4.504150811063049
julia> Bijectors.link(dist, x)
2-element Array{Float64,1}:
1.222675222849948
1.5049993740830023
julia> Bijectors.invlink(dist, Bijectors.link(dist, x))
2-element Array{Float64,1}:
1.222675222849948
1.5049993740830023
while on a project with Bijectors.jl only:
julia> using Bijectors
julia> dists = [Gamma(2, 3) for i = 1:2]
2-element Array{Gamma{Float64},1}:
Gamma{Float64}(α=2.0, θ=3.0)
Gamma{Float64}(α=2.0, θ=3.0)
julia> dist = product_distribution(dists)
Product{Continuous,Gamma{Float64},Array{Gamma{Float64},1}}(v=Gamma{Float64}[Gamma{Float64}(α=2.0, θ=3.0), Gamma{Float64}(α=2.0, θ=3.0)])
julia> x = rand(dist)
2-element Array{Float64,1}:
11.597241830334823
19.19914746393061
julia> Bijectors.link(dist, x)
2-element Array{Float64,1}:
11.597241830334823
19.19914746393061
julia> Bijectors.invlink(dist, Bijectors.link(dist, x))
2-element Array{Float64,1}:
11.597241830334823
19.19914746393061
(TestingBijectors) pkg> st
Project TestingBijectors v0.1.0
Status `/tmp/TestingBijectors/Project.toml`
[76274a88] Bijectors v0.6.6
Behavior is caused by the following when used with DistributionsAD.jl:
julia> @which Bijectors.link(dist, x)
link(dist::Product, x::AbstractArray{#s99,1} where #s99<:Real) in Bijectors at /home/tor/.julia/packages/Bijectors/to1z3/src/compat/distributionsad.jl:249
julia> @which Bijectors.invlink(dist, x)
invlink(d::Distribution{Multivariate,S} where S<:ValueSupport, y::Union{AbstractArray{#s25,1}, AbstractArray{#s25,2}} where #s25<:Real) in Bijectors at /home/tor/.julia/packages/Bijectors/to1z3/src/Bijectors.jl:427
where src/compat/distributionsad.jl:249 calls link for each of the Univariate components of the Product dist, while src/Bijectors.jl:427 calls identity, treating dist as Multivariate.
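A minimal sketch of the component-wise behaviour one would expect for both directions; invlink_product is hypothetical and not the current implementation:
using Distributions, Bijectors

# Apply each univariate component's invlink to the corresponding coordinate,
# mirroring the component-wise link in compat/distributionsad.jl.
function invlink_product(d::Product, y::AbstractVector{<:Real})
    return map((di, yi) -> Bijectors.invlink(di, yi), d.v, y)
end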
Hi guys,
I am using the following package versions (updated to the current DistributionsAD module after last closed issue):
[76274a88] Bijectors v0.8.0
[31c24e10] Distributions v0.23.4
[ced4e74d] DistributionsAD v0.6.1
[f6369f11] ForwardDiff v0.10.10
[37e2e3b7] ReverseDiff v1.2.0
[e88e6eb3] Zygote v0.4.22
I have a problem using invlink and reverse-mode AD. When I only import ForwardDiff and ReverseDiff, there is a method error in the logpdf_with_trans evaluation for the InverseWishart distribution (all other distributions that I tested worked). MWE is here:
using Distributions, DistributionsAD, Bijectors
using ForwardDiff, ReverseDiff#, Zygote
function get_logpost(distr::Distributions.Distribution)
function logposterior(θₜ::AbstractVector{<:Real})
θ = invlink(distr, reshape(θₜ, (2,2) ) )
lp = logpdf_with_trans(distr, θ, true) #method error here when using ReverseDiff
ll = logpdf(MvNormal([0., .0], θ), [2., 2.])
return ll + lp
end
end
theta_transformed = randn(4)
lposterior = get_logpost( InverseWishart(10, [0.15 0.05 ; 0.05 .15]) )
lposterior(theta_transformed)
ForwardDiff.gradient(lposterior, theta_transformed)
ReverseDiff.gradient(lposterior, theta_transformed) #error - MethodError: no method matching pullback(::DistributionsAD.ReverseDiff....
Now, if I additionally import Zygote, ReverseDiff gradient evaluation suddenly works. However, I get different solutions for ForwardDiff and ReverseDiff/Zygote unfortunately. I suspect the ForwardDiff solution to be correct, MWE here:
using Distributions, DistributionsAD, Bijectors
using ForwardDiff, ReverseDiff, Zygote
function get_logpost2(distr1::Distributions.Distribution, distr2::Distributions.Distribution)
function logposterior2(θₜ::AbstractVector{<:Real})
mu = invlink(distr1, θₜ[1:2] )
sigma = invlink(distr2, reshape(θₜ[3:6], (2,2) ) )
lp = logpdf_with_trans(distr1, mu, true)
lp += logpdf_with_trans(distr2, sigma, true)
ll = logpdf(MvNormal(mu, sigma), [2., 2.])
return ll + lp
end
end
theta_transformed = randn(6)
lposterior2 = get_logpost2( MvNormal([.1, .1]), InverseWishart(10, [0.15 0.05 ; 0.05 .15]) )
lposterior2(theta_transformed)
ForwardDiff.gradient(lposterior2, theta_transformed)
ReverseDiff.gradient(lposterior2, theta_transformed) #works now, but first 2 parameter with MvNormal prior have different solution.
Zygote.gradient(lposterior2, theta_transformed) #same solution as ReverseDiff
A Project.toml
file is required to get registered with the new Julia package management system.
Running the following code
using Turing
@model mdemo(d, N) = begin
Θ = Vector(undef, N)
for n=1:N
Θ[n] ~ d
end
end
# This leads to many numerical issues in gradients
dim = 5 # This works fine
dim2 = 50 # This breaks
d = Dirichlet(ones(dim2) / dim2)
Turing.setadbackend(:forward_diff)
chain = sample(mdemo(d, 1), HMC(5000, 0.1, 5))
Turing.setadbackend(:reverse_diff)
chain = sample(mdemo(d, 1), HMC(5000, 0.1, 5))
produces
julia> chain = sample(mdemo(d, 1), HMC(5000, 0.1, 5))
[ Info: Assume - `Θ` is a parameter
[HMC] Sampling... 0% ETA: 3:23:51
ϵ: 0.1
α: 0.9992054011165935
pre_cond: [1.0, 1.0, 1.0, 1.0, 1.0, 1.0,...┌ Warning: Numerical error has been found in gradients.
└ @ Turing ~/.julia/dev/Turing/src/core/ad.jl:114
┌ Warning: grad = Real[-Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, NaN]
└ @ Turing ~/.julia/dev/Turing/src/core/ad.jl:115
┌ Warning: Numerical error has been found in gradients.
└ @ Turing ~/.julia/dev/Turing/src/core/ad.jl:114
┌ Warning: grad = Real[-Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, NaN]
└ @ Turing ~/.julia/dev/Turing/src/core/ad.jl:115
┌ Warning: Numerical error has been found in gradients.
└ @ Turing ~/.julia/dev/Turing/src/core/ad.jl:114
┌ Warning: grad = Real[-Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, NaN]
└ @ Turing ~/.julia/dev/Turing/src/core/ad.jl:115
There's an approach I've been thinking about for MeasureTheory.jl, and it could be nice to have it in Bijectors and avoid the type piracy :)
Say you have
julia> dist = MvNormal(zeros(2), ones(2))
DiagNormal(
dim: 2
μ: [0.0, 0.0]
Σ: [1.0 0.0; 0.0 1.0]
)
julia> f = PlanarLayer(2)
PlanarLayer{Array{Float64,1},Array{Float64,1}}([0.8187830879660829, -1.280004857469378], [-0.12761939163655306, -0.7795079813636753], [-0.025642876005072607])
As a Function
, f
takes arguments in R^2:
julia> f(zeros(2))
2-element Array{Float64,1}:
0.009244220896803523
0.010647769052268647
To get the pushforward, we need to do
julia> transformed(dist, f)
Bijectors.TransformedDistribution{MvNormal{Float64,PDMats.PDiagMat{Float64,Array{Float64,1}},Array{Float64,1}},PlanarLayer{Array{Float64,1},Array{Float64,1}},Multivariate}(
dist: DiagNormal(
dim: 2
μ: [0.0, 0.0]
Σ: [1.0 0.0; 0.0 1.0]
)
transform: PlanarLayer{Array{Float64,1},Array{Float64,1}}([0.8187830879660829, -1.280004857469378], [-0.12761939163655306, -0.7795079813636753], [-0.025642876005072607])
)
But conceptually, a distribution or measure is a kind of container (since it's a monad). This makes me wonder about using broadcasting for this. In that case we could just write
f.(dist)
I'm not sure if there might be unintended consequences of doing it this way, but it wouldn't be the first case of broadcasting off the beaten path - see https://github.com/JuliaApproximation/QuasiArrays.jl
Another option would be to just add a method
(f::Bijector)(dist::Distribution) = transformed(dist, f)
But if it works, the broadcasting approach seems more natural, from a mathematical perspective. What do you think?
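A minimal sketch of the broadcasting idea, assuming that materialising a non-Broadcasted value is a no-op; this is not part of Bijectors.jl:
using Bijectors, Distributions

# Treat the distribution as a "container": `f.(dist)` then builds the pushforward,
# i.e. `f.(dist)` becomes equivalent to `transformed(dist, f)`.
Base.broadcasted(f::Bijectors.Bijector, d::Distribution) = transformed(d, f)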
I've noticed a numeric issue when using this package with random walk algorithm. The code below is the minimal example to produce the bug:
using Distributions, Bijectors
dim = 10
dist = Dirichlet(dim, 2.0)
stepsize = 1e10
# Below gives DomainError with high probability
[logpdf_with_trans(dist, invlink(dist, link(dist, rand(dist)) .+ randn(dim) .* stepsize), true) for _ in 1:1_000]
I don't know if a stepsize of 1e10 is too high or not, but it seems possible to encounter values of this kind during adaptation.
It seems reasonable to fix some seed when testing with random data input. This can give some consistency across platforms. I have seen this done in packages like IterativeSolvers.jl. Any reason not to?
The current Readme section on Functions is a bit confusing. In 1. we get a value of x = 0.7472542331020509, but in 2. the inverse gives z = 0.6543406780096065 and claims x == z.
I find that depending on your luck (or precision), x will not always equal z:
julia> using Random; Random.seed!(42);
julia> x = rand(dist)
0.36888689965963756
julia> y = link(dist, x)
-0.5369949942509267
julia> z = invlink(dist, y)
0.36888689965963756
julia> x==z
true
julia> using Random; Random.seed!(43);
julia> x = rand(dist)
0.3885220158396991
julia> y = link(dist, x)
-0.45352911444046184
julia> z = invlink(dist, y)
0.38852201583969914
julia> x == z
false
Maybe use \approx to show x ≈ z?
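For illustration, with the values from the second seed above (this snippet is mine, not from the README), isapprox holds even though == does not:
julia> x ≈ z   # isapprox(x, z)
true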
I wish I could be more specific about the problems in the issue description, but I'm having a hard time codifying them. I've gotten singular exceptions, non-PD exceptions, and non-Hermitian matrix exceptions, none of which seem to have an obvious cause to me.
MWE:
using Turing, LinearAlgebra
@model function ex(data)
# Set up covariance matrix
sigma ~ filldist(InverseGamma(2, 3), size(data, 2))
omega ~ LKJ(size(data, 2), 1.0)
Σ = Symmetric(diagm(sigma) * omega * diagm(sigma))
# Observations
for t in 1:size(data, 1)
data[t,:] ~ MvNormal(zeros(size(data, 2)), Σ)
end
end
# Parameters
dim = 10
n_obs = 1000
# Variances
sigma = rand(InverseGamma(2, 3), dim)
# Correlation matrix
omega = rand(LKJ(dim, 2.0))
Sigma = Symmetric(omega)
# Data
data = rand(MvNormal(zeros(dim), Sigma), n_obs)'
# Construct model
model = ex(data)
chain = sample(model, NUTS(), 100)
Running the example above gets me
ERROR: LoadError: PosDefException: matrix is not positive definite; Cholesky factorization failed.
Stacktrace:
[1] checkpositivedefinite at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/factorization.jl:18 [inlined]
[2] cholesky!(::Symmetric{ForwardDiff.Dual{ForwardDiff.Tag{Turing.Core.var"#f#7"{DynamicPPL.VarInfo{NamedTuple{(:sigma, :omega),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:sigma,Tuple{}},Int64},Array{Product{Continuous,InverseGamma{Float64},FillArrays.Fill{InverseGamma{Float6
4},1,Tuple{Base.OneTo{Int64}}}},1},Array{DynamicPPL.VarName{:sigma,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}},DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:omega,Tuple{}},Int64},Array{LKJ{Float64,Int64},1},Array{DynamicPPL.VarName{:omega,Tuple{}},1},Array{Float64,
1},Array{Set{DynamicPPL.Selector},1}}}},Float64},DynamicPPL.Model{var"#13#14",(:data,),(),(),Tuple{Adjoint{Float64,Array{Float64,2}}},Tuple{}},DynamicPPL.Sampler{NUTS{Turing.Core.ForwardDiffAD{40},(),AdvancedHMC.DiagEuclideanMetric},Turing.Inference.SamplerState{DynamicPPL.VarInfo{Na
medTuple{(:sigma, :omega),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:sigma,Tuple{}},Int64},Array{Product{Continuous,InverseGamma{Float64},FillArrays.Fill{InverseGamma{Float64},1,Tuple{Base.OneTo{Int64}}}},1},Array{DynamicPPL.VarName{:sigma,Tuple{}},1},Array{Float64,1},Array{S
et{DynamicPPL.Selector},1}},DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:omega,Tuple{}},Int64},Array{LKJ{Float64,Int64},1},Array{DynamicPPL.VarName{:omega,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}}}},Float64}}},DynamicPPL.DefaultContext},Float64},Float64,10},Arra
y{ForwardDiff.Dual{ForwardDiff.Tag{Turing.Core.var"#f#7"{DynamicPPL.VarInfo{NamedTuple{(:sigma, :omega),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:sigma,Tuple{}},Int64},Array{Product{Continuous,InverseGamma{Float64},FillArrays.Fill{InverseGamma{Float64},1,Tuple{Base.OneTo{Int
64}}}},1},Array{DynamicPPL.VarName{:sigma,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}},DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:omega,Tuple{}},Int64},Array{LKJ{Float64,Int64},1},Array{DynamicPPL.VarName{:omega,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.S
elector},1}}}},Float64},DynamicPPL.Model{var"#13#14",(:data,),(),(),Tuple{Adjoint{Float64,Array{Float64,2}}},Tuple{}},DynamicPPL.Sampler{NUTS{Turing.Core.ForwardDiffAD{40},(),AdvancedHMC.DiagEuclideanMetric},Turing.Inference.SamplerState{DynamicPPL.VarInfo{NamedTuple{(:sigma, :omega)
,Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:sigma,Tuple{}},Int64},Array{Product{Continuous,InverseGamma{Float64},FillArrays.Fill{InverseGamma{Float64},1,Tuple{Base.OneTo{Int64}}}},1},Array{DynamicPPL.VarName{:sigma,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1
}},DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:omega,Tuple{}},Int64},Array{LKJ{Float64,Int64},1},Array{DynamicPPL.VarName{:omega,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}}}},Float64}}},DynamicPPL.DefaultContext},Float64},Float64,10},2}}, ::Val{false}; check::Boo
l) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/cholesky.jl:219
[3] #cholesky#130 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/cholesky.jl:344 [inlined]
[4] cholesky at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/cholesky.jl:344 [inlined] (repeats 2 times)
[5] (::Bijectors.CorrBijector)(::Symmetric{ForwardDiff.Dual{ForwardDiff.Tag{Turing.Core.var"#f#7"{DynamicPPL.VarInfo{NamedTuple{(:sigma, :omega),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:sigma,Tuple{}},Int64},Array{Product{Continuous,InverseGamma{Float64},FillArrays.Fill{In
verseGamma{Float64},1,Tuple{Base.OneTo{Int64}}}},1},Array{DynamicPPL.VarName{:sigma,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}},DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:omega,Tuple{}},Int64},Array{LKJ{Float64,Int64},1},Array{DynamicPPL.VarName{:omega,Tuple{}},
1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}}}},Float64},DynamicPPL.Model{var"#13#14",(:data,),(),(),Tuple{Adjoint{Float64,Array{Float64,2}}},Tuple{}},DynamicPPL.Sampler{NUTS{Turing.Core.ForwardDiffAD{40},(),AdvancedHMC.DiagEuclideanMetric},Turing.Inference.SamplerState{Dyna
micPPL.VarInfo{NamedTuple{(:sigma, :omega),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:sigma,Tuple{}},Int64},Array{Product{Continuous,InverseGamma{Float64},FillArrays.Fill{InverseGamma{Float64},1,Tuple{Base.OneTo{Int64}}}},1},Array{DynamicPPL.VarName{:sigma,Tuple{}},1},Array{F
loat64,1},Array{Set{DynamicPPL.Selector},1}},DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:omega,Tuple{}},Int64},Array{LKJ{Float64,Int64},1},Array{DynamicPPL.VarName{:omega,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}}}},Float64}}},DynamicPPL.DefaultContext},Float64}
,Float64,10},Array{ForwardDiff.Dual{ForwardDiff.Tag{Turing.Core.var"#f#7"{DynamicPPL.VarInfo{NamedTuple{(:sigma, :omega),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:sigma,Tuple{}},Int64},Array{Product{Continuous,InverseGamma{Float64},FillArrays.Fill{InverseGamma{Float64},1,Tup
le{Base.OneTo{Int64}}}},1},Array{DynamicPPL.VarName{:sigma,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}},DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:omega,Tuple{}},Int64},Array{LKJ{Float64,Int64},1},Array{DynamicPPL.VarName{:omega,Tuple{}},1},Array{Float64,1},Array
{Set{DynamicPPL.Selector},1}}}},Float64},DynamicPPL.Model{var"#13#14",(:data,),(),(),Tuple{Adjoint{Float64,Array{Float64,2}}},Tuple{}},DynamicPPL.Sampler{NUTS{Turing.Core.ForwardDiffAD{40},(),AdvancedHMC.DiagEuclideanMetric},Turing.Inference.SamplerState{DynamicPPL.VarInfo{NamedTuple
{(:sigma, :omega),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:sigma,Tuple{}},Int64},Array{Product{Continuous,InverseGamma{Float64},FillArrays.Fill{InverseGamma{Float64},1,Tuple{Base.OneTo{Int64}}}},1},Array{DynamicPPL.VarName{:sigma,Tuple{}},1},Array{Float64,1},Array{Set{Dynam
icPPL.Selector},1}},DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:omega,Tuple{}},Int64},Array{LKJ{Float64,Int64},1},Array{DynamicPPL.VarName{:omega,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}}}},Float64}}},DynamicPPL.DefaultContext},Float64},Float64,10},2}}) at /hom
e/cameron/.julia/dev/Bijectors/src/bijectors/corr.jl:68
[6] logabsdetjac at /home/cameron/.julia/packages/Bijectors/Fye2h/src/bijectors/corr.jl:101 [inlined]
[7] logpdf_with_trans(::LKJ{Float64,Int64}, ::Symmetric{ForwardDiff.Dual{ForwardDiff.Tag{Turing.Core.var"#f#7"{DynamicPPL.VarInfo{NamedTuple{(:sigma, :omega),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:sigma,Tuple{}},Int64},Array{Product{Continuous,InverseGamma{Float64},FillA
rrays.Fill{InverseGamma{Float64},1,Tuple{Base.OneTo{Int64}}}},1},Array{DynamicPPL.VarName{:sigma,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}},DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:omega,Tuple{}},Int64},Array{LKJ{Float64,Int64},1},Array{DynamicPPL.VarName{:om
ega,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}}}},Float64},DynamicPPL.Model{var"#13#14",(:data,),(),(),Tuple{Adjoint{Float64,Array{Float64,2}}},Tuple{}},DynamicPPL.Sampler{NUTS{Turing.Core.ForwardDiffAD{40},(),AdvancedHMC.DiagEuclideanMetric},Turing.Inference.Samp
lerState{DynamicPPL.VarInfo{NamedTuple{(:sigma, :omega),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:sigma,Tuple{}},Int64},Array{Product{Continuous,InverseGamma{Float64},FillArrays.Fill{InverseGamma{Float64},1,Tuple{Base.OneTo{Int64}}}},1},Array{DynamicPPL.VarName{:sigma,Tuple{
}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}},DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:omega,Tuple{}},Int64},Array{LKJ{Float64,Int64},1},Array{DynamicPPL.VarName{:omega,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}}}},Float64}}},DynamicPPL.DefaultCont
ext},Float64},Float64,10},Array{ForwardDiff.Dual{ForwardDiff.Tag{Turing.Core.var"#f#7"{DynamicPPL.VarInfo{NamedTuple{(:sigma, :omega),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:sigma,Tuple{}},Int64},Array{Product{Continuous,InverseGamma{Float64},FillArrays.Fill{InverseGamma{F
loat64},1,Tuple{Base.OneTo{Int64}}}},1},Array{DynamicPPL.VarName{:sigma,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}},DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:omega,Tuple{}},Int64},Array{LKJ{Float64,Int64},1},Array{DynamicPPL.VarName{:omega,Tuple{}},1},Array{Flo
at64,1},Array{Set{DynamicPPL.Selector},1}}}},Float64},DynamicPPL.Model{var"#13#14",(:data,),(),(),Tuple{Adjoint{Float64,Array{Float64,2}}},Tuple{}},DynamicPPL.Sampler{NUTS{Turing.Core.ForwardDiffAD{40},(),AdvancedHMC.DiagEuclideanMetric},Turing.Inference.SamplerState{DynamicPPL.VarIn
fo{NamedTuple{(:sigma, :omega),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:sigma,Tuple{}},Int64},Array{Product{Continuous,InverseGamma{Float64},FillArrays.Fill{InverseGamma{Float64},1,Tuple{Base.OneTo{Int64}}}},1},Array{DynamicPPL.VarName{:sigma,Tuple{}},1},Array{Float64,1},Ar
ray{Set{DynamicPPL.Selector},1}},DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:omega,Tuple{}},Int64},Array{LKJ{Float64,Int64},1},Array{DynamicPPL.VarName{:omega,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}}}},Float64}}},DynamicPPL.DefaultContext},Float64},Float64,10}
,2}}, ::Bool) at /home/cameron/.julia/dev/Bijectors/src/Bijectors.jl:130
If I reduce dims to 3, sampling basically just hangs forever. Occasionally it'll throw a singular exception, but I haven't been able to recreate it and so I don't have a callstack.
Any thoughts here?
Minor thing I noticed:
Bijectors.jl/src/transformed_distribution.jl
Line 134 in a3ece91
should be:
res = hcat([td.transform(rand(rng, td.dist)) for i = 1:num_samples]...)
Currently, if b isa Scale then inv(b) results in Scale(inv(b.a)). After @willtebbutt educated me on the performance and stability of matrix inversion, it seems like it would be a better idea to do:
(ib::Inversed{<:Scale})(y) = Scale(inv(ib.orig.a))(y)
(ib::Inversed{<:Scale{<:AbstractVector}})(y) = Scale(inv.(ib.orig.a))(y)
# in the case where we're scaling vectors using matrices => use `\` instead
function (ib::Inversed{<:Scale{<:AbstractMatrix, 1}})(y::AbstractVecOrMat)
return ib.orig.a \ y
end
For reference, the current implementation is
Bijectors.jl/src/bijectors/scale.jl
Lines 13 to 14 in dc736c8
This will also make Scale compatible with GPUs, since CuArrays.jl does not yet have inv implemented (see https://github.com/JuliaGPU/CuArrays.jl/issues/440).
Hi, it seems that the parameter b in PlanarLayer is not tracked. I have tried to reproduce the example in the documentation and I get
using Bijectors
using Tracker
d = MvNormal(zeros(2), ones(2));
b = PlanarLayer(2, param);
flow = transformed(d, b)
Bijectors.TransformedDistribution{MvNormal{Float64,PDMats.PDiagMat{Float64,Array{Float64,1}},Array{Float64,1}},PlanarLayer{TrackedArray{…,Array{Float64,1}},Float64},Multivariate}(
dist: DiagNormal(
dim: 2
μ: [0.0, 0.0]
Σ: [1.0 0.0; 0.0 1.0]
)
transform: PlanarLayer{TrackedArray{…,Array{Float64,1}},Float64}([1.1142736114526957, -1.8696189914068417] (tracked), [0.6892713124675772, 0.53047763560658] (tracked), -1.0403736727795068)
)
Bijectors is v0.8.1
Seems like there's an issue with type inference when it comes to replace_diag:
julia> using ReverseDiff, DistributionsAD, Bijectors
julia> using LinearAlgebra
julia> X = [0.3239552320287513 0.07451752964111555 -0.010516169325024586; 0.07451752964111555 0.8201186717610907 -0.16101202850282761; -0.010516169325024586 -0.16101202850282761 0.8397115540645331]
3×3 Array{Float64,2}:
0.323955 0.0745175 -0.0105162
0.0745175 0.820119 -0.161012
-0.0105162 -0.161012 0.839712
julia> ReverseDiff.gradient(X) do x
L = cholesky(x).L
A = Bijectors.replace_diag(exp, L)
return sum(LowerTriangular(A) * LowerTriangular(A)')
end
ERROR: MethodError: no method matching valtype(::Type{ReverseDiff.TrackedReal{Float64,Float64,O} where O})
Closest candidates are:
valtype(::ReverseDiff.TrackedReal{V,D,O} where O where D<:Real) where V at /home/tor/.julia/packages/ReverseDiff/uy0uk/src/tracked.jl:106
valtype(::Type{ReverseDiff.TrackedReal{V,D,O}}) where {V, D, O} at /home/tor/.julia/packages/ReverseDiff/uy0uk/src/tracked.jl:107
valtype(::ReverseDiff.TrackedArray{V,D,N,VA,DA} where DA where VA where N where D) where V at /home/tor/.julia/packages/ReverseDiff/uy0uk/src/tracked.jl:108
...
Stacktrace:
[1] convert(::Type{ReverseDiff.TrackedReal{Float64,Float64,O} where O}, ::Int64) at /home/tor/.julia/packages/ReverseDiff/uy0uk/src/tracked.jl:252
[2] zero(::Type{ReverseDiff.TrackedReal{Float64,Float64,O} where O}) at ./number.jl:242
[3] *(::LowerTriangular{ReverseDiff.TrackedReal{Float64,Float64,O} where O,Array{ReverseDiff.TrackedReal{Float64,Float64,O} where O,2}}, ::Adjoint{ReverseDiff.TrackedReal{Float64,Float64,O} where O,LowerTriangular{ReverseDiff.TrackedReal{Float64,Float64,O} where O,Array{ReverseDiff.TrackedReal{Float64,Float64,O} where O,2}}}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/LinearAlgebra/src/triangular.jl:1947
[4] (::var"#3#4")(::ReverseDiff.TrackedArray{Float64,Float64,2,Array{Float64,2},Array{Float64,2}}) at ./REPL[6]:5
[5] ReverseDiff.GradientTape(::var"#3#4", ::Array{Float64,2}, ::ReverseDiff.GradientConfig{ReverseDiff.TrackedArray{Float64,Float64,2,Array{Float64,2},Array{Float64,2}}}) at /home/tor/.julia/packages/ReverseDiff/uy0uk/src/api/tape.jl:199
[6] gradient(::Function, ::Array{Float64,2}, ::ReverseDiff.GradientConfig{ReverseDiff.TrackedArray{Float64,Float64,2,Array{Float64,2},Array{Float64,2}}}) at /home/tor/.julia/packages/ReverseDiff/uy0uk/src/api/gradients.jl:22 (repeats 2 times)
[7] top-level scope at REPL[6]:1
julia> ReverseDiff.gradient(X) do x
L = cholesky(x).L
# Convert to array because apparently `Diagonal(L)` isn't okay.
# Results in `Array{<:TrackedReal}`.
A = Array(L)
A = A + exp.(Diagonal(A)) - Diagonal(A)
return sum(LowerTriangular(A) * LowerTriangular(A)')
end
3×3 Array{Float64,2}:
11.1536 9.38521 11.7767
0.0 9.93215 9.95776
0.0 0.0 6.71535
NOTE: DistributionsAD.jl needs to be imported since it contains the implementation of cholesky used here.
Distributions.logpdf is only defined for single samples (and not multiple samples), whereas logpdf_with_trans is defined for sets of samples as well. To be consistent with the behaviour in Distributions, one should remove the definitions of logpdf_with_trans for multiple samples and add a loglikelihood_with_trans function that computes the sum of logpdf_with_trans applied to multiple samples. This would also avoid all the unnecessary allocations in https://github.com/TuringLang/DynamicPPL.jl/blob/275ccc8791be7d5dc4eb74b6e0c19de247f56d74/src/context_implementations.jl#L264 and https://github.com/TuringLang/DynamicPPL.jl/blob/275ccc8791be7d5dc4eb74b6e0c19de247f56d74/src/context_implementations.jl#L278.
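A minimal sketch of what the proposed function could look like; the name and signature are only suggestions:
using Distributions, Bijectors

# Sum the single-sample logpdf_with_trans over the columns of a sample matrix.
function loglikelihood_with_trans(d::MultivariateDistribution, xs::AbstractMatrix{<:Real}, transform::Bool)
    return sum(logpdf_with_trans(d, view(xs, :, i), transform) for i in axes(xs, 2))
end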
@JuliaRegistrator register()
Migrated from TuringLang/Turing.jl#542
#543 now resolves most of these issues, but there remain a couple of outstanding issues:
A few distributions (Dirichlet, Wishart, InverseWishart) don't have tests to ensure that their link functions are consistent with their logpdf_with_trans implementation. Due to the way that we transform between the constrained and unconstrained representations the Jacobian of the transformation is singular. This isn't a fundamental issue, but I'm not completely sure what the best way to resolve it is beyond directly hacking away at stuff.
The logpdf of the Kolmogorov distribution returns -Inf for x close to zero. I assume this is a numerical stability issue, so I've disabled testing for that for now as it messes everything up, and is probably on the Distributions.jl end.
I've made an effort to make things in the interface type-stable, but there's one bijector(d::Distribution) call that is currently not type-stable: Truncated.
For bijector(d::Truncated) we need to decide which bijectors to compose depending on whether the distribution is only lower-bounded, only upper-bounded, or both.
Also, at the moment, some other univariate distributions (all the ones defined in the union TransformDistributions) also have unstable bijector calls because they use the same implementation as Truncated, e.g. Arcsine, but these we can just implement on a case-by-case basis to get type-stability in all cases except Truncated.
In practice, I don't think type-stability of bijector(d::Distribution) is that important, as one will usually construct the distributions at the very beginning of the computation. Even so, for the cases where this is not the case it would be nice to have type-stability.
Also, in the case of Truncated{<:Distribution} the union is of size 3, which can possibly be dealt with by union-splitting since the default split size is 4.
julia> for d in uni_dists
c = @code_typed bijector(d)
if (c.second isa Union) || (c.second == Any)
println("-----------------")
println(d, ": ", c.second)
println("(", minimum(d), ", ", maximum(d), ")")
println(bijector(d))
end
end
-----------------
Arcsine{Float64}(a=2.0, b=4.0): Union{Identity, Logit{Float64}, Composed}
(2.0, 4.0)
Bijectors.Logit{Float64}(2.0, 4.0)
-----------------
Biweight{Float64}(μ=0.0, σ=1.0): Union{Identity, Logit{Float64}, Composed}
(-1.0, 1.0)
Bijectors.Logit{Float64}(-1.0, 1.0)
-----------------
Cosine{Float64}(μ=0.0, σ=1.0): Union{Identity, Logit{Float64}, Composed}
(-1.0, 1.0)
Bijectors.Logit{Float64}(-1.0, 1.0)
-----------------
Epanechnikov{Float64}(μ=0.0, σ=1.0): Union{Identity, Logit{Float64}, Composed}
(-1.0, 1.0)
Bijectors.Logit{Float64}(-1.0, 1.0)
-----------------
Levy{Float64}(μ=0.0, σ=1.0): Union{Identity, Composed{Tuple{Shift{Float64},Log}}}
(0.0, Inf)
Composed{Tuple{Bijectors.Shift{Float64},Bijectors.Log}}((Bijectors.Shift{Float64}(-0.0), Bijectors.Log()))
-----------------
Pareto{Float64}(α=1.0, θ=1.0): Union{Identity, Composed{Tuple{Shift{Float64},Log}}}
(1.0, Inf)
Composed{Tuple{Bijectors.Shift{Float64},Bijectors.Log}}((Bijectors.Shift{Float64}(-1.0), Bijectors.Log()))
-----------------
Truncated(Normal{Float64}(μ=0.0, σ=1.0), range=(-Inf, 2.0)): Union{Identity, Logit{Float64}, Composed}
(-Inf, 2.0)
Composed{Tuple{Bijectors.Scale{Float64},Composed{Tuple{Bijectors.Shift{Float64},Bijectors.Log}}}}((Bijectors.Scale{Float64}(-1.0), Composed{Tuple{Bijectors.Shift{Float64},Bijectors.Log}}((Bijectors.Shift{Float64}(2.0), Bijectors.Log()))))
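To illustrate where the instability comes from, here is a rough sketch of the kind of branching involved; bijector_truncated is a hypothetical stand-in, not the actual implementation, and each branch returns a different type, so the inferred return type is a Union:
using Bijectors

function bijector_truncated(d)
    lb, ub = minimum(d), maximum(d)
    if isfinite(lb) && isfinite(ub)
        return Bijectors.Logit(lb, ub)                                        # (lb, ub) -> ℝ
    elseif isfinite(lb)
        return Bijectors.Log() ∘ Bijectors.Shift(-lb)                         # x -> log(x - lb)
    elseif isfinite(ub)
        return Bijectors.Log() ∘ Bijectors.Shift(ub) ∘ Bijectors.Scale(-1.0)  # x -> log(ub - x)
    else
        return Bijectors.Identity{0}()
    end
end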
As implemented, link is not bijective for Dirichlet distributions:
julia> dist = Dirichlet([1,1])
Dirichlet{Float64}(alpha=[1.0, 1.0])
julia> link(dist, [0.5,0.5])
2-element Array{Float64,1}:
0.0
0.0
julia> invlink(dist,[0.0,0.0])
2-element Array{Float64,1}:
0.5
0.5
julia> invlink(dist,[0.0,1.0])
2-element Array{Float64,1}:
0.5
0.5
Because values must add to one, a Dirichlet with n values only has n-1 degrees of freedom, so the bijection needs to be to ℝⁿ⁻¹. There's an implementation in TransformVariables.jl that seems to work correctly.
It seems likely this is behind the troubles described in TuringLang/Turing.jl#935
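For reference, a rough sketch of an (n-1)-dimensional link following the Stan-style stick-breaking construction; link_simplex_minus1 is a hypothetical helper, not the package implementation:
using StatsFuns: logit

function link_simplex_minus1(x::AbstractVector{T}) where {T<:Real}
    N = length(x)
    y = similar(x, N - 1)
    stick = one(T)
    for k in 1:(N - 1)
        z = x[k] / stick            # proportion of the remaining stick
        y[k] = logit(z) + log(T(N - k))
        stick -= x[k]
    end
    return y
end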
Currently there are a couple of annoyances when implementing a new Bijector:
- (b::Bijector)(x::T) cannot be implemented for abstract types on Julia <1.3. This means that we have to implement batch computations on a case-by-case basis, which is both annoying and sometimes difficult to do in an AD-friendly and type-stable way (we have a bunch of mapvcat and eachcolmaphcat methods to do this, which is an unnecessary complication for a newcomer).
- We could have used a transform(b, x) method as the "evaluation" method, as this would allow us to have more generic implementations for batching, etc. But we decided not to do that, as it also felt clunky.
- forward(b::Bijector, x) is supposed to allow the user to share computation between the evaluation, i.e. (b::MyBijector)(x), and logabsdetjac(b, x). Buuut it's annoying to have to first implement (b::MyBijector)(x) and logabsdetjac(b, x), which are mandatory, and then have to go through these methods to figure out what is shared and then copy-paste certain parts to a transform method, etc.
Since we're in Julia, my first idea is of course to throw a macro at the problem! I'm thinking of introducing transform but making it super-easy for the user to define everything in one go, i.e. something along the lines of:
struct MyBijector <: Bijector{0} end
@bijector function forward(b::MyBijector, x::Real)
# Shared computation
z = exp(x)
rv = begin
# `(b::Bijector)(x)` goes here
z
end
logabsdetjac = begin
# `logabsdetjac(b, x)` goes here
log(z)
end
end
which is then transformed into something along the lines of
quote
function (Bijectors).transform(b::MyBijector, x::Real)
z = exp(x)
return z
end
(b::MyBijector)(x) = (Bijectors).transform(b::MyBijector, x::Real)
function (Bijectors).logabsdetjac(b::MyBijector, x::Real)
z = exp(x)
return log(z)
end
function (Bijectors).forward(b::MyBijector, x::Real)
z = exp(x)
rv = z
logabsdetjac = log(z)
return (rv = rv, logabsdetjac = logabsdetjac)
end
end
Then the only thing that is left for the user to implement is the inverse evaluation.
Also, I do have a somewhat "dirty" implementation ready (from which the above output was generated + MacroTools.prettify): https://gist.github.com/torfjelde/8675bba686afdf693476ae1c70f516d3.
This would then allow us to easily transition to transform, thus ensuring compatibility with Julia <1.3 while still using more generic methods, e.g. transform(b::Bijector{0}, x::AbstractVector) = b.(x). It would make it super-easy to share computation in forward. Finally, we could start thinking about adding complementary in-place methods, e.g. transform!(b::Bijector, x, out), logabsdetjac!(b::Bijector, x, out), etc., as a next step.
The only question is: are we overcomplicating things here? Is there an easier way of achieving what we want?
If someone has some time, a softplus
Bijector would be cool to have.
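For anyone picking this up, here is a minimal sketch under the Bijector{0} interface discussed above; the names Softplus/InvSoftplus and the exact method set are assumptions, not the final API (a production version would also guard log1p(exp(x)) against overflow for large x).
using Bijectors
import Bijectors: logabsdetjac

struct Softplus <: Bijector{0} end
(b::Softplus)(x::Real) = log1p(exp(x))                # y = log(1 + eˣ) maps ℝ onto (0, ∞)
logabsdetjac(b::Softplus, x::Real) = -log1p(exp(-x))  # dy/dx = logistic(x)

struct InvSoftplus <: Bijector{0} end
(b::InvSoftplus)(y::Real) = y + log1p(-exp(-y))       # x = log(eʸ - 1), overflow-safe form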
@JuliaRegistrator register()
@JuliaRegistrator register()
I've revisited some ideas I had earlier on but didn't quite have the knowledge or time to explore. As a result I think I've come up with some really nice improvements to the Composed bijector.
The changes suggested here already exist on a local branch of mine and tests are passing without issues. So if people seem happy with this I'll make a PR asap.
We're currently not doing anything to "flatten" the compositions, e.g.
julia> using Bijectors
julia> b = Bijectors.Exp()
Bijectors.Exp()
julia> map(typeof, (b ∘ b ∘ b).ts)
(Bijectors.Exp, Composed{Tuple{Bijectors.Exp,Bijectors.Exp}})
Notice that we get nested compositions here.
Personally I'd like ∘
to encourage a flat structure, so we instead get:
julia> map(typeof, (b ∘ b ∘ b).ts)
(Bijectors.Exp, Bijectors.Exp, Bijectors.Exp)
The above was accomplished by simply introducing the following lines:
∘(b1::Bijector, b2::Bijector) = composel(b2, b1) # <= already exists
∘(b1::Composed, b2::Bijector) = composel(b2, b1.ts...) # new
∘(b1::Bijector, b2::Composed) = composel(b2.ts..., b1) # new
∘(b1::Composed, b2::Composed) = composel(b2.ts..., b1.ts...) # new
If the user really wants to use nested compositions rather than this flat structure, they can call composel and/or composer explicitly. I struggle to see a case where you'd like to group bijectors together but wouldn't already have access to this grouping outside of the composition, but I'd still like to allow both possibilities. This solution does exactly that.
Other than just being "nice", it has two advantages over nesting:
- When implementing methods for Composed we only need to handle the flat Composed{Tuple{B1, B2, B3}} rather than all combinations of nestedness.
- A flat structure plays much better with the @generated unrolling described below.
Currently we're using a recursive implementation for the following methods for Composed:
(cb::Composed)(x)
forward(cb::Composed, x)
logabsdetjac(cb::Composed, x)
For small compositions this isn't really an issue, but it's still sub-optimal, especially in forward and logabsdetjac, where we need intermediate values to split apart the forward result. This can't be done recursively without allocating memory for these intermediate values.
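For reference, the recursive style can be sketched roughly like this (a simplified illustration with a made-up name, not the actual source): each level builds an intermediate result that the caller then has to split apart.
using Bijectors: forward

forward_recursive(ts::Tuple{}, x) = (rv = x, logabsdetjac = zero(float(eltype(x))))
function forward_recursive(ts::Tuple, x)
    res = forward(first(ts), x)                       # intermediate (rv, logabsdetjac) for this layer
    rest = forward_recursive(Base.tail(ts), res.rv)   # recurse on the remaining bijectors
    return (rv = rest.rv, logabsdetjac = res.logabsdetjac + rest.logabsdetjac)
end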
One alternative, which is further improved by encouraging a flat structure as described earlier, is to unroll the recursion using @generated
:
@generated function forward_generated(cb::Composed{T}, x) where {T<:Tuple}
expr = Expr(:block)
push!(expr.args, :((y, logjac) = forward(cb.ts[1], x)))
for i = 2:length(T.parameters)
push!(expr.args, :(res = forward(cb.ts[$i], y)))
push!(expr.args, :(y = res.rv))
push!(expr.args, :(logjac += res.logabsdetjac))
end
push!(expr.args, :(return (rv = y, logabsdetjac = logjac)))
return expr
end
For the code
b = Bijectors.Exp()
forward(b ∘ b ∘ b, randn())
we get the generated code
quote
(y, logjac) = forward(cb.ts[1], x)
res = forward(cb.ts[2], y)
y = res.rv
logjac += res.logabsdetjac
res = forward(cb.ts[3], y)
y = res.rv
logjac += res.logabsdetjac
return (rv = y, logabsdetjac = logjac)
end
Comparing the current recursive implementation to the new forward_generated:
julia> using Revise, Bijectors, BenchmarkTools
julia> using Bijectors: Log, Exp
julia> b = Bijectors.Exp()
Exp()
julia> cb = inv(b) ∘ b
Composed{Tuple{Exp,Log}}((Exp(), Log()))
julia> x = randn()
1.59358111703819
julia> @btime Bijectors.forward($cb, $x)
28.426 ns (0 allocations: 0 bytes)
(rv = 1.59358111703819, logabsdetjac = 0.0)
julia> @btime Bijectors.forward_gen($cb, $x)
28.788 ns (0 allocations: 0 bytes)
(rv = 1.59358111703819, logabsdetjac = 0.0)
So not much gained here. But if we consider deeply nested compositions the difference becomes quite significant:
julia> cb = foldl(∘, [inv(b) ∘ b for i = 1:50])
Composed{Tuple{Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log}}((Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log()))
julia> @btime Bijectors.forward($cb, $x)
431.207 μs (201 allocations: 6.25 KiB)
(rv = 1.59358111703819, logabsdetjac = 0.0)
julia> @btime Bijectors.forward_gen($cb, $x)
1.934 μs (0 allocations: 0 bytes)
(rv = 1.59358111703819, logabsdetjac = 0.0)
I've also considered @inline for the recursive implementation, but for large cases compile times are quite bad. So I think it's pretty clear which is the winner :)
# For extra neat-ness for a lot of bijectors:)
∘(b1::B, b2::Inversed{B}) where {B<:Bijector} = Identity()
∘(b1::Inversed{B}, b2::B) where {B<:Bijector} = Identity()
# Can also implement on case-by-case basis where we don't use `Inversed`
macro inverses(B1, B2)
expr = Expr(:block)
push!(expr.args, :(Base.:∘(b1::$B1, b2::$B2) = Identity()))
push!(expr.args, :(Base.:∘(b1::$B2, b2::$B1) = Identity()))
return expr
end
@inverses Log Exp # super-easy to declare the inverses
# And these
∘(::Identity, ::Identity) = Identity()
∘(::Identity, b::Bijector) = b
∘(b::Bijector, ::Identity) = b
then
julia> using Bijectors: Log, Exp
julia> Log() ∘ Exp()
Bijectors.Identity()
EDIT: forgot to use @btime forward($cb, $x)
, etc. as @mohamed82008 pointed out.
Directional statistics deals with unit vectors and periodic variables. Distributions.jl includes two such distributions: VonMises
(angular) and VonMisesFisher
(spherical). I'm planning to implement more (see https://discourse.julialang.org/t/rfc-taking-directional-orientational-statistics-seriously/31951). For these, we need a SphereBijector
and an AngleBijector
.
SphereBijector
transforms an n-dimensional vector into an n-dimensional unit vector under the Euclidean norm. It's not really a bijector, since it only has a right inverse (the inclusion function), so its Jacobian has a determinant of 0. However, we can still give a logabsdetjac
term that produces a uniform measure (using a standard multivariate normal kernel). See the Stan manual for details. I also have implementations of this transformation at https://github.com/salilab/HMCUtilities.jl/blob/c4602ac/src/constraint.jl#L469-L527 and tpapp/TransformVariables.jl#67.
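A rough sketch of the idea (hypothetical names; the density term follows the Stan manual's unit-vector construction rather than being an actual log-Jacobian determinant):
using LinearAlgebra: norm

struct SphereBijector end
(b::SphereBijector)(x::AbstractVector{<:Real}) = x ./ norm(x)        # ℝⁿ → unit sphere

# Standard multivariate normal kernel; pushing a standard normal through the map above
# then induces the uniform measure on the sphere.
sphere_logdensity_term(x::AbstractVector{<:Real}) = -sum(abs2, x) / 2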
AngleBijector
simply converts cartesian coordinates of a 1-sphere (circle) to an angle using atan
. When composed with SphereBijector
and shift
, it provides the necessary transformation for VonMises
. Also composing it with scale
lets one transform any periodic quantity.
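And the angle step could be as simple as (again a hypothetical name, just to illustrate the atan conversion):
# Point on the unit circle (e.g. the output of the sphere map above for n = 2) to an angle in (-π, π].
circle_to_angle(y::AbstractVector{<:Real}) = atan(y[2], y[1])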
I'm happy to implement these. But will you take non-bijective functions in Bijectors.jl?
Maybe we should add a page in our docs website (https://turing.ml) to document features in Bijectors.jl
?
Hi,
Really well done package! My post is not really code related, but I was wondering if it might be possible to show (in the container?) the input dimension and the actual dimensions that you work with for the transformers.
Afaik, there are a fair number of transformers that are fully described in a lower dimension than the actual Bijectors output (e.g. I think you use the Cholesky transform and then take the log of the diagonal for covariance matrices, but keep the whole matrix as output).
As an example, for a 3×3 covariance matrix, would something like Bijector{9,6} be possible?
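For reference, assuming the log-Cholesky parameterisation described above, the number of free parameters of an n×n covariance matrix is n(n+1)/2 (the lower triangle including the diagonal), e.g.:
free_params(n::Integer) = n * (n + 1) ÷ 2
free_params(3)  # = 6, matching the Bijector{9,6} idea: 9 = 3×3 matrix entries, 6 free parameters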