Comments (10)
Sorry for the very late reply!
> @sethaxen, thanks for the clarification. So do I understand this correctly: when estimating a model with a (log) posterior `l(x)`, where `x` is somehow constrained (eg unit sphere), I could sample in some unconstrained space `Z` with a function `x = g(z)` and then use `l(g(z)) + correction for transform`?
Yes! This is correct.
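To make the pattern concrete, here is a minimal sketch in plain Python (the names `latent_logdensity`, `g`, and `correction` are illustrative placeholders, not this package's API); a trivial bijective `exp` example is used just to show the shape of the computation:

```python
import math

# Generic shape of the scheme: sample z in unconstrained space, push it
# through a (not necessarily bijective) map g, and add a log-density
# correction for the transform. All names here are illustrative placeholders.

def latent_logdensity(l, g, correction, z):
    """Evaluate l(g(z)) + correction(z)."""
    return l(g(z)) + correction(z)

# Trivial bijective example: x = exp(z) for a positive parameter, where the
# correction is the log-Jacobian log|dx/dz| = z.
lp = latent_logdensity(lambda x: -x, math.exp, lambda z: z, 0.0)  # -1.0
```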
> I am OK to give up transformations being bijections in this package, but I want to understand it first, so suggestions for reading materials are welcome. In particular,
>
> 1. strictly speaking, this is an identification issue, and some samplers don't like that (don't know about NUTS though),
It's only an identification issue if the chosen `g` and correction induce non-identifiability in `Z`, and yes, that would then be a problem for NUTS. This is not a problem for points on a sphere, but it is for points on a hemisphere. See below.
> 2. usual convergence diagnostics (eg Rhat) on the raw `z` would be nonsensical.
I don't think this is any more nonsensical for this `z` than when working with bijectors. Rhat on `z` checks for consistency of between- and within-chain variance on `Z`, which is useful as a check, but ultimately we care more about convergence in terms of `x` anyway. R-hat on a parameter constrained to some manifold can be a bit strange to interpret and perhaps nonsensical (e.g. R-hat of the zero triangle of a Cholesky factor will be `NaN`), but that's just how it is. It probably makes more sense to check R-hat for transforms of manifold-valued parameters, e.g. if one wants to report the angle from a unit vector to some reference, R-hat of that angle would probably be more useful than R-hat of any of the coordinates of the unit vector.
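For instance, computing the angle of each unit-vector draw relative to a reference direction could look like this (a sketch in Python for illustration; `angle_to_reference` is a hypothetical helper, not part of any package):

```python
import math

# Convert 2-d unit-vector draws into signed angles relative to a reference
# direction; R-hat on these angles is more interpretable than R-hat on the
# raw coordinates. `angle_to_reference` is a hypothetical helper.

def angle_to_reference(x, ref=(1.0, 0.0)):
    """Signed angle in (-pi, pi] between unit vector x and ref."""
    cross = ref[0] * x[1] - ref[1] * x[0]
    dot = ref[0] * x[0] + ref[1] * x[1]
    return math.atan2(cross, dot)

draws = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)]
angles = [angle_to_reference(x) for x in draws]
```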
This particular approach is, I believe, a direct consequence of the co-area formula in geometric measure theory, but unfortunately I haven't seen any very accessible explanations of it for this use. So here's a more intuitive explanation in terms of familiar operations. Suppose we have a density
We know that discarding coordinates in MCMC is equivalent to marginalizing out those coordinates in the target distribution. Similarly, we can augment our distribution. So let
Now, suppose we have a bijective map
In practice, we end up using a map
I have a few ideas for under what circumstances this approach is likely to be useful, but I've never seen a paper that discussed this approach in general terms.
Now for a few examples
The unit sphere
Let
which is just a standard multivariate normal on
In this particular case, for a `VonMisesFisher([0, 1], 1)` distribution, this is the induced distribution in `z` space:
I've noticed that with low-dimensional unit vectors, it's much more likely (due to the curse of dimensionality) to get low `w` values, so the geometry has high curvature when `w` is low.
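My reading of the sphere parameterization above, sketched in plain Python for illustration (function names are mine, not this package's API): map `z` in R^n to `x = z / ||z||` and use the standard-normal term `-||z||^2 / 2` as the correction, so the discarded radius factorizes out:

```python
import math

# Sketch of the sphere transform described above (my reading, in Python for
# illustration): x = z / ||z||, with log-density correction -||z||^2 / 2.
# Under this correction the radius is independent of x and integrates out.

def to_sphere(z):
    """Normalize unconstrained z onto the unit sphere."""
    r = math.sqrt(sum(zi ** 2 for zi in z))
    return [zi / r for zi in z]

def latent_logdensity(l_sphere, z):
    """l(z / ||z||) plus the standard-normal radial correction."""
    return l_sphere(to_sphere(z)) - 0.5 * sum(zi ** 2 for zi in z)

x = to_sphere([3.0, 0.0, 4.0])                          # [0.6, 0.0, 0.8]
lp = latent_logdensity(lambda x: 0.0, [3.0, 0.0, 4.0])  # -12.5
```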
The unit hemisphere
Let the unit hemisphere be the unit sphere but with the constraint that
EDIT: a much better approach is to first transform the first coordinate with `exp` and then apply the same transformation as with the unit sphere, i.e. let
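My reading of that EDIT, sketched in Python for illustration (the additional log-Jacobian term contributed by the `exp` step is omitted here and would still need to be worked out):

```python
import math

# Hemisphere sketch per the EDIT above (my reading): exponentiate the first
# coordinate so it is strictly positive, then normalize as for the sphere.
# The extra log-Jacobian contribution of the exp step is not included here.

def to_hemisphere(z):
    y = [math.exp(z[0])] + list(z[1:])
    r = math.sqrt(sum(yi ** 2 for yi in y))
    return [yi / r for yi in y]

x = to_hemisphere([0.0, 1.0])  # first coordinate positive, unit norm
```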
from transformvariables.jl.
Wait, @sethaxen has a very elegant fix for this in #67 (which I was stupidly ignoring at the time, apologies), waiting for his permission to port the code.
It is explained in the Stan manual.
Dear @tpapp, the closest I get in the Stan manual is the chapter on Unit Vectors, but I don't understand how that explains the implementation. The specific issue I struggle to understand is that the image of the `UnitVector(n)` transform is only a half-sphere, as the last dimension of the transformed vector is always positive.
I'm trying to handle angles in an inference problem as described in the Stan user's guide (https://mc-stan.org/docs/2_18/stan-users-guide/unit-vectors-and-rotations.html), i.e. I was hoping to do something like

```julia
t = UnitVector(2)
cos_θ, sin_θ = transform(t, [1.234])
θ = atan(sin_θ, cos_θ)
```

But because of the half-plane issue, the range of `θ` is `[0, π]` and not `[-π, π]`.
Is this intended behavior? If so, do you have any suggestions on how to handle angles?
best
Jon Eriksen
Sorry for closing it too hastily, and thanks for persisting. I can replicate the bug (I think the range in 2d is `(-π/2, π/2)`, though). I think that a constant is off in the calculations, and we should map from `r = (-1, 1)` and use its absolute value and the sign. However, I need to check the algebra. If you get to it first, don't hesitate to make a PR, or just send me notes and I will code it up.
(Incidentally, I think Stan just uses Marsaglia's method, with an extra df, so it is not much help if we want a bijection).
This is actually a dup of #66, but I am not closing either in favor of the other; I will think about a solution and close them at the same time.
I did some reading about this and doing it "uniformly" seems to be a hard problem. However, that is not needed for our purposes; we merely need a bijection. That said, having nice numerical properties is useful.
#67 is what Stan uses, but it is not a bijection.
I will test out the quick fix I mention above, and if that does not work try spherical coordinates.
> I did some reading about this and doing it "uniformly" seems to be a hard problem. However, that is not needed for our purposes; we merely need a bijection. That said, having nice numerical properties is useful.
>
> #67 is what Stan uses, but it is not a bijection.
Strictly speaking, one does not need a bijection. All one needs is to draw samples in an unconstrained latent space, with a transformation to the constrained space and a log-density correction, so that the resulting transformed samples target the correct distribution. For bijective functions, that log-density correction is a `logabsdetjac` (more generally, `logdetsqrtmetric`), but there are corrections for non-bijective transformations, which is what Stan uses here. The caveat is that with a non-bijective transformation you can only define a right-inverse, so the latent unconstrained space must be the ground truth. I.e. instead of mapping from `x` in constrained space to `z` in latent space to draw a sample, then mapping back to `x` and back to `z` again, one samples `z` in latent space and maps from `z` to `x` only when computing the log-density or when returning a draw to the user, keeping the original `z` as the starting point for the next transition.
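To illustrate the right-inverse point with the normalization map (a toy sketch in Python; `g` and `h` are placeholder names):

```python
import math

# For the non-bijective map g(z) = z / ||z||, an h with g(h(x)) = x is only a
# right-inverse: h(g(z)) does not recover z, so the latent z must be kept as
# the ground truth between transitions.

def g(z):
    r = math.hypot(z[0], z[1])
    return (z[0] / r, z[1] / r)

def h(x):
    # One valid right-inverse: embed the unit vector as-is.
    return x

z = (3.0, 4.0)
x = g(z)
roundtrip_x = g(h(x))  # equals x (up to rounding)
roundtrip_z = h(g(z))  # (0.6, 0.8), not the original (3.0, 4.0)
```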
There are ample other cases where it makes sense to have non-bijective transformations. e.g. a user wants to sample a point in a disk. One way to do this is to sample a point on a sphere, with a non-bijective projection that discards one of the axes. The resulting distribution is non-uniform on the disk, so there's a log-density correction that makes it uniform.
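A sketch of that disk example (my own derivation from the surface-area element, so treat the exact constant with care): projecting the uniform sphere in R^3 onto the first two coordinates gives a density proportional to `1 / sqrt(1 - x1^2 - x2^2)` on the disk, so adding `0.5 * log(1 - x1^2 - x2^2)` to the log-density flattens it:

```python
import math

# Correction that makes the dropped-axis projection of a uniform sphere
# uniform on the disk (my own derivation from the surface-area element):
# the projected density is proportional to 1 / sqrt(1 - r^2), so we add
# its negative log, which down-weights points near the rim.

def disk_correction(x1, x2):
    return 0.5 * math.log(1.0 - x1 ** 2 - x2 ** 2)

center = disk_correction(0.0, 0.0)  # 0.0: no reweighting at the center
edge = disk_correction(0.6, 0.0)    # negative: down-weight near the rim
```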
> I will test out the quick fix I mention above, and if that does not work try spherical coordinates.
There is no chart on the sphere that completely covers it. Every chart has singularities, and if the typical set is localized near a singularity, this will cause divergences. This is, I believe, why Stan chooses a non-bijective transformation here: the geometry then has no singularities and is well-behaved. It's actually least well-behaved, I think, for low-dimensional vectors, where in the latent space one can move a short distance away from the origin and suddenly a different step size is needed to step the same distance on the surface of the sphere. But due to concentration of measure, for a high-dimensional multivariate normal the samples concentrate near the surface of a hypersphere anyway, so this parameterization actually produces a really nice geometry for sampling.
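The concentration-of-measure claim is easy to check numerically (plain Python, illustrative only):

```python
import math
import random

# For z ~ N(0, I_n) with large n, ||z|| / sqrt(n) concentrates near 1, so
# standard-normal draws live close to the surface of a hypersphere.

random.seed(0)
n = 1000
scaled_norms = []
for _ in range(200):
    z = [random.gauss(0.0, 1.0) for _ in range(n)]
    scaled_norms.append(math.sqrt(sum(zi ** 2 for zi in z) / n))
```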
@sethaxen, thanks for the clarification. So do I understand this correctly: when estimating a model with a (log) posterior `l(x)`, where `x` is somehow constrained (eg unit sphere), I could sample in some unconstrained space `Z` with a function `x = g(z)` and then use `l(g(z)) + correction for transform`?
I am OK to give up transformations being bijections in this package, but I want to understand it first, so suggestions for reading materials are welcome. In particular,
- strictly speaking, this is an identification issue, and some samplers don't like that (don't know about NUTS though),
- usual convergence diagnostics (eg Rhat) on the raw `z` would be nonsensical.
@sethaxen, thanks for the detailed answer (sorry to see that MathJax is kind of broken now, hopefully it gets fixed). And sorry for the late reply, I am still digesting this. What I still do not understand is

> change-of-variables formula to compute the logdetjac (with a tweak since the Jacobian is now non-square)

ie where the