Comments (12)
> What's the current status of this?
I've started working on making the scope of Bijectors.jl a bit wider in #183 by:
- Allowing non-bijective transformations
- Improving support for non-real inputs.
So there's progress but it's not quite there yet. This issue is one of the motivations for that PR.
I've recently been reading about manifold-valued continuous normalizing flows and it appears that they work well without the need to have non-bijective bijectors, see for example here: https://arxiv.org/abs/2006.10254 . They use dynamic trivializations instead. The problem with it is that now instead of a single bijector we have a whole family of them that we need to pick from -- but maybe that's fine?
First off, personally I would be very happy to accommodate whatever is required to implement bijectors for manifolds:) I welcome excuses to look at differential geometry, so if there is cool stuff we could do with Bijectors.jl + Manifolds.jl, I'm very keen!
I just briefly skimmed parts of the paper and looked at the referenced paper that introduces dynamic trivializations, and AFAIK it seems like this could be defined by (p, X) ↦ ϕₚ(X) with ϕₚ: TₚM → M, e.g. the exponential map, right? This could probably already be implemented using a `NamedCoupling`, which essentially uses one or more components of the input (p in this case) to produce the parameters of the bijector, which is then applied to some other part of the input (X in this case). Then you could use a sequence of these + ODE solvers to define MODE? I.e. this could be done already? Happy to explain in more detail wrt. implementation if I have understood correctly.
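For a concrete picture of the (p, X) ↦ ϕₚ(X) idea, here is a minimal standalone sketch (plain Julia, not Bijectors.jl or Manifolds.jl API; `sphere_exp` is a hypothetical name) using the unit sphere, where the exponential map has the closed form exp_p(X) = cos(‖X‖)p + sin(‖X‖)X/‖X‖ for a tangent vector X ⟂ p:

```julia
using LinearAlgebra

# Exponential map on the unit sphere Sⁿ⁻¹ ⊂ ℝⁿ: ϕₚ = exp_p sends a tangent
# vector X at p (with X ⟂ p) to a point on the sphere.
function sphere_exp(p::AbstractVector, X::AbstractVector)
    θ = norm(X)
    iszero(θ) && return copy(p)                 # exp_p(0) = p
    return cos(θ) .* p .+ sin(θ) .* (X ./ θ)
end

p = [0.0, 0.0, 1.0]       # north pole of S²
X = [π/2, 0.0, 0.0]       # tangent vector at p
q = sphere_exp(p, X)      # ≈ [1.0, 0.0, 0.0], a point on the equator
```

In the `NamedCoupling` framing above, p would be the component that parametrizes the map and X the component it is applied to.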
Though I'll admit I'm still a bit confused about how one can use retractions in place of the exponential map. I guess one "issue" here is that retractions aren't generally bijective, e.g. the retraction from R^n to S^(n - 1) mentioned in this issue, hence we'd need support for non-bijective transformations to support the use of retractions in place of exp.
EDIT: Nvm, just read the definition of retraction. I knew about the standard topological definition, but couldn't quite see how to define this for TₚM to M. Makes more sense now:)
from bijectors.jl.
We already have a similar non-bijector to the ones you are proposing in Bijectors.jl, for transforming an n-dimensional point to an (n-1)-dimensional simplex. But the way we define it is as a bijector + a projection on the simplex (or a truncation). If the ones you propose are just pure projection operators, perhaps another package might be more appropriate. This might be relevant: https://github.com/JuliaNLSolvers/ManifoldProjections.jl.
I also wonder why we need to add these functions to Bijectors.jl to use them with the bijectors defined in Bijectors.jl. Can't we just mix them using anonymous functions? If it's `logabsdetjac` you are after, perhaps we can define a fallback method for `logabsdetjac` for arbitrary functions that makes use of Tracker. We can also provide a way using traits to have `logabsdetjac` drop the zero eigenvalues of the Jacobian before calculating the determinant. If the user defines a method `dropzeros(::typeof(f)) = true` for some user-defined `f`, then `logabsdetjac` drops the zero eigenvalues of the Jacobian of `f` before computing the `logdet`. @torfjelde does this sound reasonable or doable?
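To make the proposal concrete, here is a hedged sketch of what such a fallback could look like (hypothetical names, plain Julia only: a crude central-difference Jacobian stands in for an AD backend such as Tracker, and zero *singular* values are dropped via an SVD, which coincides with dropping zero eigenvalue directions for the rank-deficient case):

```julia
using LinearAlgebra

# Hypothetical trait: opt-in to dropping zero singular values of the Jacobian.
dropzeros(f) = false

# Crude central-difference Jacobian, standing in for an AD backend.
function jacobian_fd(f, x::AbstractVector; h=1e-6)
    y = f(x)
    J = zeros(length(y), length(x))
    for j in eachindex(x)
        e = zeros(length(x)); e[j] = h
        J[:, j] = (f(x .+ e) .- f(x .- e)) ./ (2h)
    end
    return J
end

# Hypothetical fallback `logabsdetjac` for an arbitrary function `f` at `x`.
function logabsdetjac_fallback(f, x::AbstractVector; tol=1e-8)
    J = jacobian_fd(f, x)
    dropzeros(f) && return sum(log, filter(s -> s > tol, svdvals(J)))
    return first(logabsdet(J))
end

x = [0.3, -1.2]
val = logabsdetjac_fallback(v -> exp.(v), x)   # elementwise exp: ≈ sum(x)

g(v) = [v[1] + v[2], v[1] + v[2]]              # rank-1 Jacobian [1 1; 1 1]
dropzeros(::typeof(g)) = true
val2 = logabsdetjac_fallback(g, x)             # zero singular value dropped: ≈ log(2)
```

Note that this works for anonymous functions too (`dropzeros` just falls back to `false` for them).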
Fair point. My ultimate goal is to use directional/angular distributions in Turing, at least the two in Distributions, but ideally arbitrary ones. If there's a better way to do that than to add them to Bijectors, I'm happy to do it.
> Can't we just mix them using anonymous functions?
Hmm, unclear to me how to do this 😕 Do you have an idea of how to implement this?
> If it's `logabsdetjac` you are after, perhaps we can define a fallback method for `logabsdetjac` for arbitrary functions that makes use of Tracker.
Isn't this just `ADBijector` with forward-evaluation defined? Though I understand you want it to work with anonymous functions.
> We can also provide a way using traits to have `logabsdetjac` drop the zero eigenvalues of the Jacobian before calculating the determinant. If the user defines a method `dropzeros(::typeof(f)) = true` for some user-defined `f`, then `logabsdetjac` drops the zero eigenvalues of the Jacobian of `f` before computing the `logdet`.
Hmm, so I kind of like this idea, but again it's unclear to me how to do it. It seems like just dropping all zeros can do a lot of unintended stuff, e.g. in the `SimplexBijector` you don't want to drop all zeros, you just want to drop the zeros in the last row and column. And if we can't find a way to use anonymous functions, it seems kind of redundant, since then this could just be hard-coded into the `logabsdetjac` method for that particular `Bijector`?
Now, to what you're talking about @sethaxen: this is something I've been thinking about whether or not we should commit to, but I hadn't gotten around to writing the issue (so thank you!).
For the `SimplexBijector` we made a slight exception due to its prevalence. I think to handle this properly for many-to-one (and thus non-invertible) mappings f, e.g. f = tan, we need to integrate over the mass of the preimage of the function. Say you have y = f(x) and { xᵢ } = f⁻¹(y); then you can define the pdf as p(y) = ∑ᵢ p(xᵢ) / |det 𝒥(f, xᵢ)|. Another example is defining something like an `AbsBijector`.
The big issue is what to do when you compose these many-to-one and one-to-one mappings. I think we'd have to implement the inverse of the many-to-one mapping as returning a `Tuple` of values, and then for the subsequent bijections we'd actually have to invert all the values in these tuples. We could do all this by adding an abstract type `ManyToOneBijector <: AbstractBijector` and implementing the interactions with `<:Bijector`.
I'm definitely not opposed to this, but it seems like there's quite a bit of work that needs to be done to do this properly 😕
Well, it all depends on what we want the transformation for. My assumption, due to my need, is that we are seeking a transformation f: x ↦ y for x ∈ X and y ∈ Y for the purposes of enhancing sampling efficiency for a variable y ∼ π. To do this we need a distribution ϕ on X such that if x ∼ ϕ, then y = f(x) ∼ π.
If f is surjective, then many ϕ may satisfy this requirement; we only need one. If f is bijective, then there's exactly one ϕ, and we can obtain its log density using the area formula (it's a factor of the square-root metric density, which for a square Jacobian is the `logabsdetjac` term).
If the transformation is surjective-only, then we need some other way of finding ϕ. A general rule is that if we can find a measure on X that pushes forward to the uniform measure on Y, then we can augment the log density of π with the log measure to obtain ϕ. Note that we can't solve this with integration; it's the opposite problem.
This is exactly the motivation for the spherical case above. Normalization is surjective. We know we can draw uniform samples on a sphere by sampling from a multivariate normal distribution and then normalizing. We can actually draw from any multivariate normal, but we choose the standard case as a matter of convenience. Consequently, the normal kernel on Euclidean space provides the necessary term to build ϕ on X.
Because f is surjective, it does not have a left inverse. It does, however, have a right inverse, which is only necessary to get a valid initial point x from a valid point y = f(x). If g: Y → Z is surjective, then g ∘ f is also surjective and thus right-invertible by composing the right inverses, and we can still do all of the above by adding the log measure terms.
If the goal is instead to generate truly invertible transformations for some other purpose, then none of the above applies. However, the above describes the requirements for the kinds of distributions used by PPLs like Stan, and (I think) it would be useful in Turing.
Addendum: some other things we can do with these transformations. We can sample within a bounded volume in ℝⁿ by sampling in free ℝⁿ⁺¹, normalizing to the unit sphere in ℝⁿ⁺¹, projecting to the unit ball in ℝⁿ, and deforming the ball to the bounded volume. Two of those three composed transformations are surjective, and they give us a powerful set of transformations for certain variables, particularly those that appear in physical models.
Apologies if I've made any notational/terminology errors.
I don't think it's terribly relevant, but on the off-chance it's useful, here's some early work on adding maps to Manifolds.jl, ultimately with the goal of pushing forward/pulling back objects such as differential forms and distributions: JuliaManifolds/Manifolds.jl#28
TL;DR: All in all, I agree with you: I don't think we should enforce transformations to be invertible. Now the question is whether we should just add an `isinvertible` trait or add an abstract type `ManyToOneBijector <: AbstractBijector`. What do you guys think? I personally prefer the abstract type, since it will allow us to have different behaviour when composing an invertible and a non-invertible transformation, etc.
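To make the two options concrete, here is a hedged sketch (hypothetical names, not the actual Bijectors.jl hierarchy) of the trait and abstract-type versions side by side:

```julia
# Option 1: an abstract supertype distinguishing non-invertible transformations
abstract type AbstractBijector end
abstract type Bijector <: AbstractBijector end            # invertible
abstract type ManyToOneBijector <: AbstractBijector end   # non-invertible

# Option 2: an `isinvertible` trait, queried e.g. when composing
isinvertible(::Bijector) = true
isinvertible(::ManyToOneBijector) = false

struct MyExp <: Bijector end              # x ↦ exp(x), invertible
struct MyAbs <: ManyToOneBijector end     # x ↦ |x|, many-to-one

# A composition is invertible only if every component is
composition_isinvertible(bs...) = all(isinvertible, bs)
```

With the abstract type, composition behaviour can additionally be specialized via dispatch on the component types; the trait version instead branches at runtime.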
> My assumption due to my need is that we are seeking a transformation f: x ↦ y for x ∈ X and y ∈ Y for the purposes of enhancing sampling efficiency for a variable y ∼ π. To do this we need a distribution ϕ on X such that if x ∼ ϕ, then y = f(x) ∼ π.
I can see this use-case, but it doesn't seem to me like this is a push-forward of a distribution by f? I.e. it doesn't put the same amount of mass on the preimage of f as it does on its image. Because of this, it seems to me like this wouldn't be a case where the change of variables should be used?
> (it's a factor of the square root metric density, which for a square Jacobian is the `logabsdetjac` term.)
👍 for bringing differential geometry into the discussion:)
> A general rule is that if we can find a measure on X that pushes forward to the uniform measure on Y, then we can augment the log density of π with the log measure to obtain ϕ. Note that we can't solve this with integration; it's the opposite problem.
Sorry, what are you referring to when you say "log measure"? In particular, I don't quite understand what "we can augment the log density of π with the log measure to obtain ϕ" means.
I understand the overall idea though, and this shouldn't be a problem in the current Bijectors.jl. This would just be a matter of overloading the `logpdf` for a `TransformedDistribution` for these combinations of distributions and bijectors. It doesn't seem like you could make this something "general", since the notion of pushing forward to the uniform measure only works for particular combinations. I think what I outlined before with summing over different subsets can be used for non-invertible maps in more general cases (and, as I note below, it seems to me that the Gaussian → uniform on sphere is a particular case of this).
> We know we can draw uniform samples on a sphere by sampling from a multivariate normal distribution and then normalizing.
So this is kind of what I'm referring to, though, no? To find the probability of a particular point on the sphere, you can compute the probability of the corresponding line in R^d wrt. the Gaussian distribution. This is sort of what I'm referring to above, though in this case we're "summing" over an infinite set of points, i.e. integrating.
> I don't think it's terribly relevant, but on the off-chance it's useful, here's some early work on adding maps to Manifolds.jl, ultimately with the goal of pushing forward/pulling back objects such as differential forms and distributions:
That is great! Earlier I was considering whether or not we should try to accommodate potential cases where you are mapping between manifolds rather than just real spaces. I didn't end up doing it, as I was afraid it would be a bit too general for the purpose of Bijectors.jl. Also, we don't need to worry about pushing forward forms and tangents:) I absolutely love the idea though; definitely going to follow your work on this!:)
Response to comments in #168.
> It might make sense to have a `Transformation{InSpace, OutSpace}` and then have `Bijector{0} <: Transformation{0, 0}`.
This is something along the lines of what I had in mind to ease the transition.
> We can also expand the definition of space/domain similar to TransformVariables.jl domains.
Wait, what does TransformVariables.jl do? Are you referring to the `as(Real, ...)` and so on?
> I am wondering if the type parameter N can be dropped and the batch computation issue solved in a different way, e.g. by wrapping inputs similar to `ColVecs` and `RowVecs` in KernelFunctions.
This can be done, yes. There are a couple of downsides though:
- It's a bit annoying for the end-user.
- We still have to make assumptions about the underlying storage structure.
- We need a more general implementation than just `ColVecs` and `RowVecs`, as we can be working with arbitrary dimensionalities and whatnot.
An added benefit to it though, is that we could potentially remove the dimensionality from the definitions of the bijectors, e.g.:
```julia
struct Batched{T}
    val::T
end

value(x) = x
value(x::Batched) = x.val
# TODO: implement iterator interface

# General impl
# DOWNSIDE: Means that definitions such as `transform(b::CustomBijector, x)` will be ambiguous.
# (Also, `transform` doesn't exist as of right now, but imagine this to be `(b::Bijector)(x)` for the moment.)
function transform(b::Bijector, xs::Batched)
    return Batched(map(b, value(xs)))
end

# Specify further on a per-bijector basis, e.g. `Exp` without dimensionality:
struct Exp end
transform(::Exp, x::AbstractArray) = exp.(x)
transform(::Exp, x::Batched{<:AbstractArray{<:Real}}) = Batched(exp.(value(x)))

logabsdetjac(::Exp, x::Real) = x  # log|d/dx exp(x)| = x
logabsdetjac(b::Exp, xs::AbstractArray) = sum(logabsdetjac.(b, xs))
function logabsdetjac(b::Exp, xs::Batched{<:AbstractArray{<:Real, N}}) where {N}
    # Assume last dimension is batch-dimension
    return sum(logabsdetjac.(b, value(xs)); dims=N)
end
```
> I still think that the restriction that the output is of the same type as the input is strange and limiting (e.g. it is problematic for the improvements of the LKJ bijector mentioned in #134 (comment)).
Are you referring to the current approach or the potential "new" approach? At the moment we make no restrictions on the input and output types beyond the dimensionality; this is why we went with only putting the dimensionality in the bijector, not specific types. It's also why, if we decide to add actual representations of input- and output-spaces beyond just dimensionality, we need to be really careful, as it can easily make life way more difficult by introducing a bunch of type-instability and whatnot.
> A bijective function requires that the mathematical dimension of the in- and outputs are the same, but this dimension is not the same as N in `AbstractArray{N}`, and in particular the in- and outputs may not be of the same structure or size.
100%. But just for the record, I don't think anyone in this discussion thinks that:) This was done because it works really well in most standard cases. The idea was always to move away from it at some point; it was just a matter of time.
> it will have almost no composability with other bijectors because compositions require the spaces to be working on the same dimensionality.
>
> Similar to regular Julia functions, one could just drop this requirement and stop trying to reason about outputs/inputs when composing bijectors and throw an error at runtime if they are not compatible.
We might be able to avoid this with a redo of how we handle batching, but the reason we did this is that we ran into a bunch of bugs which didn't throw errors. E.g. if you mistakenly construct a bijector with the wrong dimensionality and call `logabsdetjac` on it, you'll see that the output is of the wrong size and you'll catch it. But if it's inside a composition, you might never see it, because we might use `.+` to add the outputs from `logabsdetjac` for two different bijectors. By forcing the dimensionality to be the same, this will never happen.
> Wait, what does TransformVariables.jl do? Are you referring to the `as(Real, ...)` and so on?
Yes, but I don't like the whole `as` thing. I like the ability to specify domains though. I don't think even TransformVariables fully embraces the concept of domains.
> Yes, but I don't like the whole `as` thing. I like the ability to specify domains though. I don't think even TransformVariables fully embraces the concept of domains.
Yeah, that's why I asked. IMO TransformVariables.jl is more of a "neat thing" rather than something that's integral to the design of the package, e.g. it doesn't have anything to do with compositions of transformations and whatnot.
> What's the current status of this?
>
> I've started working on making the scope of Bijectors.jl a bit wider in #183 by:
>
> - Allowing non-bijective transformations
> - Improving support for non-real inputs.
>
> So there's progress but it's not quite there yet. This issue is one of the motivations for that PR.
>
> I've recently been reading about manifold-valued continuous normalizing flows and it appears that they work well without the need to have non-bijective bijectors, see for example here: https://arxiv.org/abs/2006.10254 . They use dynamic trivializations instead. The problem with it is that now instead of a single bijector we have a whole family of them that we need to pick from -- but maybe that's fine?
>
> First off, personally I would be very happy to accommodate whatever is required to implement bijectors for manifolds:) I welcome excuses to look at differential geometry, so if there is cool stuff we could do with Bijectors.jl + Manifolds.jl, I'm very keen!
Oh, nice! I'm going to try something along the lines of MODE soon. Manifolds.jl already has parametrizations, so I think the main missing piece is calculation of `logabsdetjac`. It would be nice if I could just use AD for that, but one has to be very careful to handle AD on manifolds correctly.
> I just briefly skimmed parts of the paper and looked at the referenced paper that introduces dynamic trivializations, and AFAIK it seems like this could be defined by (p, X) ↦ ϕₚ(X) with ϕₚ: TₚM → M, e.g. the exponential map, right? This could probably already be implemented using a `NamedCoupling`, which essentially uses one or more components of the input (p in this case) to produce the parameters of the bijector, which is then applied to some other part of the input (X in this case). Then you could use a sequence of these + ODE solvers to define MODE? I.e. this could be done already? Happy to explain in more detail wrt. implementation if I have understood correctly.
Yes, the idea is to just compute in charts and then switch charts when appropriate. I'm not sure what the right design is though -- each part separately is simple enough, but they can be put together in many different ways. I'll very likely have some questions when I start implementing this 😉. I'm wondering a bit how useful these chart bijectors would be in general. We could have them as `NamedCoupling`s, but their parameters would have to be determined dynamically during a forward or reverse pass -- and, to my limited knowledge, that's not something one expects from a bijector/normalizing flow layer.
> Though I'll admit I'm still a bit confused about how one can use retractions in place of the exponential map. I guess one "issue" here is that retractions aren't generally bijective, e.g. the retraction from R^n to S^(n - 1) mentioned in this issue, hence we'd need support for non-bijective transformations to support the use of retractions in place of exp.
> EDIT: Nvm, just read the definition of retraction. I knew about the standard topological definition, but couldn't quite see how to define this for TₚM to M. Makes more sense now:)
Yes, retractions in (a large part of) Riemannian optimization are different from topological retractions, and it often causes confusion. Retractions can be bijective between R^n (or an open subset of R^n) and an open neighborhood of a point.