Comments (5)

stratisMarkou commented on May 21, 2024

Hi @CBird210, and thanks for bringing this up here. From looking at the code, it seems that for MC dropout the get_weight_samples method does not sample the weights but instead returns the raw weight values without turning any of them off. For bayes-by-backprop, the weights are in fact sampled. Any ideas on what's happening in MC dropout, @JavierAntoran?

JavierAntoran commented on May 21, 2024

Hi @CBird210 @stratisMarkou,

As @stratisMarkou said, for MC dropout we return the raw weight values. The MC dropout posterior is a mixture of delta functions at the learned parameter values and delta functions at 0, so sampling the weights would randomly return some weight values and some zeros.
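
For illustration, sampling from that posterior would look roughly like this (a sketch, not code from the repo; in practice dropout masks whole units, so entire rows or columns of a weight matrix are zeroed together):

import torch

def sample_mc_dropout_weights(weight, p=0.5):
    # A draw from the MC dropout posterior: each weight keeps its
    # learned value with probability 1 - p and is set to 0 with
    # probability p (a mixture of two delta functions per weight).
    mask = torch.bernoulli(torch.full_like(weight, 1.0 - p))
    return weight * mask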

The get_weight_samples function was written to give insight into approximate inference behaviour by letting us plot a histogram of weight values (see the top right plot in https://javierantoran.github.io/assets/poster_advml.pdf). For bayes-by-backprop we actually sample the weights, as this allows us to represent the weight posterior variance in that histogram. For MC dropout, sampling would not tell us much about the range of the learned weights, since dropout probabilities are fixed rather than learned. Perhaps get_weight_samples is a poor naming choice; I chose it because all of the other approximate inference methods have a function with that exact name, which allows easy plug-in replacement of approximate inference methods in experiments.
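
For reference, bayes-by-backprop weight sampling is roughly the following reparameterisation (a sketch; the parameter names mu and rho are assumptions, see sample_weights in the repo for the exact code):

import torch

def sample_bbp_weight(mu, rho):
    # Reparameterisation trick: w = mu + sigma * eps with eps ~ N(0, I)
    # and sigma = log(1 + exp(rho)) to keep the std. deviation positive.
    # Each call returns a different draw, so repeated calls reflect the
    # posterior variance in the histogram.
    sigma = torch.log1p(torch.exp(rho))
    eps = torch.randn_like(mu)
    return mu + sigma * eps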

@CBird210 if you call the all_sample_eval function, specifying the parameter "Nsamples", you will get a vector of Nsamples different predictions from the model.
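
The call might look like the following (a sketch: x and y are a batch of test inputs and targets, net is a trained model wrapper; check the exact signature in the repo):

# One prediction per Monte Carlo weight draw.
preds = net.all_sample_eval(x, y, Nsamples=100)
# The spread of the Nsamples predictions reflects the approximate
# posterior predictive for this batch.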

CBird210 commented on May 21, 2024

Hi,

Thank you so much for getting back to me so quickly!

I noticed that get_weight_samples also seems to give me the exact same numbers if I train the same network twice using Bayes By Backprop. This confuses me because it uses the sample_weights function, which looks like it should give different answers each time. I'm sorry if this is a mistake on my end; could you help me with some clarification?

all_sample_eval looks like it is doing exactly what I needed. However, I noticed that when I use all_sample_eval and just specify Nsamples, the MC Dropout code gives me results over a group of 16 MNIST examples, while Bayes By Backprop gives me results over a group of 100. Do you have an idea of how I could get results from all_sample_eval for the two methods on the same group of data? (I'm trying to do a direct comparison of the posterior predictive distributions computed by both.)

Also, when trying to draw parallels between the code and the source material, I'm having a little trouble with parts of the Bayes by Backprop paper. Could you point me to where in the code steps 4-7 of their algorithm (section 3.2) take place? Once again, I'm new to Python, so apologies if this is really obvious.

Thanks again for your help!

JavierAntoran commented on May 21, 2024

Hi,

I noticed that get_weight_samples also seems to give me the exact same numbers if I train the same network twice using Bayes By Backprop.

This should not happen. You have probably fixed a random seed somewhere in your code, or you may be mistakenly loading the same saved model for both runs.
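
For example, if something like the following runs at the top of your script, two training runs will produce identical numbers:

import random
import numpy as np
import torch

# Fixing all the relevant seeds makes training runs reproducible.
seed = 0
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)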

I noticed that when I use all_sample_eval and just specify Nsamples, the MC Dropout code gives me results over a group of 16 MNIST examples, while Bayes By Backprop gives me results over a group of 100.

Nsamples controls how many Monte Carlo samples are drawn when approximating the posterior predictive. To control which data are being evaluated, you need to ensure that your inputs (x, y) are the same. From your comment, it sounds like you are running with different batch sizes.
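
A sketch of how you could evaluate both methods on the exact same batch (test_set, mc_net and bbp_net are placeholder names):

from torch.utils.data import DataLoader

# Fix the batch size and disable shuffling so both models see the
# same (x, y) batch.
loader = DataLoader(test_set, batch_size=100, shuffle=False)
x, y = next(iter(loader))
mc_preds = mc_net.all_sample_eval(x, y, Nsamples=100)
bbp_preds = bbp_net.all_sample_eval(x, y, Nsamples=100)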

Could you point me to where in the code steps 4-7 of their algorithm (section 3.2) take place?

Sure. Note that step 4 is written in a somewhat strange way in the paper; for me, that step is clearer in equation 8. In our code, it occurs in lines 198-208:

mlpdw_cum = 0  # accumulates the negative log-likelihood term
Edkl_cum = 0   # accumulates the minibatch-scaled KL term

for i in range(samples):
    # Draw a fresh set of weights; the model also returns
    # log q(w|theta) and log p(w) for the sampled weights.
    out, tlqw, tlpw = self.model(x, sample=True)
    # Negative log-likelihood of the batch under the sampled weights
    mlpdw_i = F.cross_entropy(out, y, reduction='sum')
    # KL contribution, scaled by the number of minibatches per epoch
    Edkl_i = (tlqw - tlpw) / self.Nbatches
    mlpdw_cum = mlpdw_cum + mlpdw_i
    Edkl_cum = Edkl_cum + Edkl_i

# Average the Monte Carlo estimates over the drawn samples
mlpdw = mlpdw_cum / samples
Edkl = Edkl_cum / samples

loss = Edkl + mlpdw

Note that there is a sign discrepancy between their algorithm and our optimisation, as PyTorch minimises a loss rather than maximising a value function.
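
Written out, the loss above is a Monte Carlo estimate of the minibatch-scaled negative ELBO, with S = samples and M = Nbatches:

\mathcal{L}(\theta) = \frac{1}{S} \sum_{i=1}^{S} \left[ \frac{\log q(\mathbf{w}^{(i)} \mid \theta) - \log p(\mathbf{w}^{(i)})}{M} - \log p(\mathbf{y} \mid \mathbf{x}, \mathbf{w}^{(i)}) \right], \quad \mathbf{w}^{(i)} \sim q(\mathbf{w} \mid \theta)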

Steps 5-7 occur through automatic differentiation with:

loss.backward()        # steps 5-6: gradients w.r.t. the variational parameters
self.optimizer.step()  # step 7: gradient update of mu and rho

Hope this helps!
Javier

CBird210 commented on May 21, 2024

Sorry for the late reply! This was very useful - thank you!
