Comments (8)
I will have a look at the notebooks.
Apparently Edward has SGHMC implemented?
https://github.com/blei-lab/edward/blob/master/edward/inferences/sghmc.py
and there is also a custom TF implementation
https://github.com/gergely-flamich/BVAE/blob/b6be2223062f6e71ec4657ab631092d62bfe353f/code/sghmc.py
What was the acceptance rate on latent sampling?
I agree that cost-wise it should be the same (except maybe memory).
My hesitation was only based on the dimensionality of the problem, especially with satellites (where the satellites in the same halo might also be more correlated?), but I might just be skeptical for no good reason.
from dhod.
Are you referring to Fig 4 of the paper and saying that the stochasticity in the PS at the same HOD params is the problem? Then the traditional MH way it's done right now (i.e. emcee) will suffer from the same issue, right?
Sampling the latent parameters, i.e. the "activation" of every galaxy, would blow up your parameter space from 5 to N_halos*31.
Unless I am missing something, sampling this, especially in a hierarchical problem, will not be easy.
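To make that bookkeeping concrete, here is a minimal NumPy sketch (illustrative only: a Zheng07-style central occupation with made-up parameter values, a hypothetical catalog size, and the 31-satellite-slots-per-halo count from above):

```python
import numpy as np
from math import erf

def ncen_mean(log_m, log_mmin=12.0, sigma=0.3):
    # Zheng07-style mean central occupation; parameter values are made up
    return 0.5 * (1.0 + np.array([erf(x) for x in (log_m - log_mmin) / sigma]))

rng = np.random.default_rng(0)
n_halos = 100_000                         # hypothetical catalog size
log_mass = rng.uniform(11.0, 14.0, n_halos)

p_cen = ncen_mean(log_mass)               # occupation probability per halo
activation = rng.random(n_halos) < p_cen  # one latent Bernoulli per central

# tracking every satellite slot as a latent variable as well:
n_latent = n_halos * 31
print(n_latent)  # 3100000 latent variables vs 5 HOD parameters
```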
Taking the same analogy as yours - why has HMC not been able to sample initial density + cosmo parameters? :)
For now, it might be worth thinking about whether it's easier/computationally less expensive to sample in this high-dimensional space vs. just sampling 10 catalog realizations at every step and taking their mean.
For starters, one can simply take the average of 2 and see how much it reduces the variance. Or use some clever trick like anti-correlated phases for reducing cosmic variance.
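The averaging-over-realizations idea is easy to check on a toy (pure NumPy sketch; `noisy_summary` is a made-up stand-in for the stochastic power spectrum at fixed HOD params, not anything from the repo):

```python
import numpy as np

rng = np.random.default_rng(1)

def noisy_summary(theta):
    # stand-in for a stochastic summary statistic at fixed HOD parameters
    return theta + rng.normal()

theta = 0.0
reps = 20_000
variances = {}
for r in (1, 2, 10):  # number of catalog realizations averaged per step
    means = np.array([np.mean([noisy_summary(theta) for _ in range(r)])
                      for _ in range(reps)])
    variances[r] = means.var()
    print(r, round(variances[r], 3))  # variance shrinks like 1/r
```

So averaging just 2 realizations already halves the variance of the likelihood evaluation.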
On a different note, for this particular problem, what about VI?
These are HOD parameters. At the end of the day, I think people only care about the mean values of these parameters to generate catalogs, and not the full posterior/variance? Chang or Simone might be able to comment on this better, but I don't think I have seen people sample HOD parameters and then take the variance into account when generating mock catalogs.
Right, so, I agree that the "conventional" approach is, I think, also flawed unless you run several sims to obtain the "mean" for the likelihood. We discussed this in the past and kind of agreed it was OK, or at least what people are doing, so not a misrepresentation of a "reference point".
But what worries me a little bit with HMC is that I'm not sure the algorithm is guaranteed to work correctly if the gradients and likelihood are noisy (in fact, it doesn't converge to the target distribution unless you introduce a friction term: https://arxiv.org/abs/1402.4102).
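For reference, the friction fix from that paper (Chen, Fox & Guestrin 2014) is only a few lines. Here is a naive 1D sketch of the update rule on a standard-normal target with an artificially noisy gradient, assuming the gradient-noise estimate B̂ ≈ 0 (this is my own toy, not the Edward implementation):

```python
import numpy as np

def sghmc(grad_log_prob, theta0=0.0, n_samples=20_000,
          eps=0.05, friction=1.0, seed=0):
    # Naive SGHMC a la Chen, Fox & Guestrin (2014), no MH correction.
    # The -eps*friction*v term bleeds off the energy that the noisy
    # gradient keeps injecting; the noise scale assumes B_hat ~ 0.
    rng = np.random.default_rng(seed)
    theta, v = theta0, 0.0
    noise = np.sqrt(2.0 * friction * eps)
    out = np.empty(n_samples)
    for i in range(n_samples):
        theta += eps * v
        v += eps * grad_log_prob(theta) - eps * friction * v + noise * rng.normal()
        out[i] = theta
    return out

# stochastic gradient of log N(0,1): exact -theta plus artificial noise
grng = np.random.default_rng(1)
samples = sghmc(lambda th: -th + 0.5 * grng.normal())
print(samples[2000:].mean(), samples[2000:].var())  # roughly 0 and 1
```

Despite the noisy gradient, the chain stays near the right target; without the friction term, the injected noise would steadily heat the momentum instead.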
I also think sampling over 1 million params or more is not that big a deal here, because they should be mostly uncorrelated variables in the posterior: the activation of one particular central is not strongly correlated with the activation of another one.
And in the example above, where I only do the sampling of the centrals, it looks like it's working fine.
I should try to do the same thing with the "conventional" approach to compare, I guess...
Yep, so I went ahead and quickly coded up the equivalent of our "traditional" method on this toy model where I only sample the centrals. The comparison between the two approaches should be fair: same forward model, just that in one case the "likelihood" is stochastic, and in the other it is not, because we track all latent variables.
And surprise surprise... the stochastic HMC sampling is not working well; it's very hard to get a good acceptance rate. This is the best result I've achieved so far :-/
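A toy illustration of why a noisy likelihood hurts the acceptance rate (random-walk Metropolis rather than HMC, but the mechanism is the same; all the numbers and the Gaussian target here are made up): the sampler caches the log-probability at the current point, so after accepting a "lucky" noise draw it rejects almost everything afterwards.

```python
import numpy as np

def mh_acceptance(log_prob, n=20_000, step=0.5, seed=0):
    # Random-walk Metropolis on a 1D target, caching log_prob at the
    # current point as any real sampler would.
    rng = np.random.default_rng(seed)
    x, lp_x, acc = 0.0, log_prob(0.0), 0
    for _ in range(n):
        y = x + step * rng.normal()
        lp_y = log_prob(y)
        if np.log(rng.random()) < lp_y - lp_x:
            x, lp_x, acc = y, lp_y, acc + 1
    return acc / n

noise_rng = np.random.default_rng(1)
exact = lambda x: -0.5 * x * x                                # log N(0,1)
noisy = lambda x: -0.5 * x * x + noise_rng.normal(scale=2.0)  # stochastic version

a_exact = mh_acceptance(exact)
a_noisy = mh_acceptance(noisy)
print(a_exact)  # high acceptance on the exact target
print(a_noisy)  # collapses: the chain sticks at lucky noise draws
```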
Here are my two twin test notebooks:
@modichirag or @bhorowitz if you want to take a look and check whether I'm doing something obviously wrong with the "conventional" sampling, be my guest :-)
And otherwise, if you think about it, tracking all latent variables comes at pretty much no extra cost. The gradients are already computed almost all the way up to the stochastic variables. The only extra cost is storing the "state vector" of the HMC, but that's just a small factor times the size of the catalog, nothing super expensive.
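Back-of-envelope version of that cost (hypothetical catalog size; the 31 satellite slots per halo from the count earlier in the thread):

```python
n_halos = 100_000           # hypothetical catalog size
latents_per_halo = 1 + 31   # one central activation + up to 31 satellite slots
bytes_per_float = 8         # float64
# HMC keeps position + momentum for every latent variable
state_mb = 2 * n_halos * latents_per_halo * bytes_per_float / 1e6
print(state_mb)  # ~51 MB: a small constant factor on the catalog itself
```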
OK, well... after more fine-tuning... I guess the stochastic HMC doesn't look that bad:
But it takes a lot more sample points; I couldn't get the acceptance rate above ~10%.
I'll try to upgrade my toy model to also sample satellite positions and activations, and we'll see what happens.
Regarding SGHMC: yeah, it was apparently in Edward ^^' but not in Edward2 as far as I can see. I found a few implementations in TF online but didn't get a chance to try them. If you want to give it a go @modichirag, let me know; maybe it would outperform the current solution in the stochastic HMC. Also, I haven't tried to average over several realisations in the stochastic HMC; I guess this should help.