
opinion_pooling's Introduction

Opinion pooling and log-linear mixtures (logarithmic pooling)

Bits and pieces on (logarithmic) opinion pooling.

Bayesian inference for the weights in logarithmic pooling

Luiz Max de Carvalho1, Daniel Villela2, Flavio Coelho1 and Leonardo Bastos2

1 School of Applied Mathematics, Getulio Vargas Foundation (FGV), Rio de Janeiro -- RJ, Brazil.

2 Program for Scientific Computing, Oswaldo Cruz Foundation, Rio de Janeiro -- RJ, Brazil.

Abstract

Combining distributions is an important issue in decision theory and Bayesian inference. Logarithmic pooling is a popular method to aggregate expert opinions by using a set of weights that reflect the reliability of each information source. However, the resulting pooled distribution depends heavily on the set of weights given to each opinion/prior, and thus careful consideration must be given to the choice of weights. In this paper we review and extend the statistical theory of logarithmic pooling, focusing on the assignment of the weights using a hierarchical prior distribution. We explore several statistical applications, such as the estimation of survival probabilities, meta-analysis and Bayesian melding of deterministic models of population growth and epidemics. We show that it is possible to learn the weights from data, although identifiability issues may arise for some configurations of priors and data. Furthermore, we show how the hierarchical approach leads to posterior distributions that are able to accommodate prior-data conflict in complex models. [arXiv]
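
For reference, the standard definition of the logarithmic pool of expert densities $f_1(\theta), \ldots, f_K(\theta)$ with weights $\boldsymbol\alpha = (\alpha_1, \ldots, \alpha_K)$, $\alpha_i \geq 0$, $\sum_{i=1}^{K} \alpha_i = 1$, is

$$ \pi(\theta \mid \boldsymbol\alpha) = t(\boldsymbol\alpha) \prod_{i=1}^{K} f_i(\theta)^{\alpha_i}, $$

where $t(\boldsymbol\alpha)$ is the normalising constant that makes $\pi$ integrate to one.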

See logPoolR for the core R routines employed here.

opinion_pooling's People

Contributors

dvmath, fccoelho, lsbastos, maxbiostat


opinion_pooling's Issues

How to compare the priors?

Professor Raftery suggested using integrated likelihoods. The problem is: once we calculate them, what are we going to do with them? Calculate prior odds?

@lsbastos and @fccoelho, the power is yours!

Signed, Captain Planet

background

Define LP (logarithmic pooling) before the section on its properties.

check exponential family calculations are correct

Given in the appendix.
They seem correct, but it'd be nice to work out a simple example, like Beta or Gaussian, to see whether the formulae lead to the right answer. We should do this for both entropy and KL.
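
A minimal numerical check (illustrative only, not part of the paper or logPoolR) for the Beta case: log-pooling Beta(a_i, b_i) densities with weights summing to one should give Beta(sum(alpha*a), sum(alpha*b)), whose entropy can then be computed numerically.

```r
## Check the Beta case numerically: log-pool of Betas is again a Beta
## with linearly pooled parameters (weights sum to one).
a <- c(2, 5, 1.5); b <- c(4, 3, 6); alpha <- c(0.2, 0.5, 0.3)

x <- seq(1e-4, 1 - 1e-4, length.out = 1e4); dx <- x[2] - x[1]
log_mat <- sapply(seq_along(a), function(i) dbeta(x, a[i], b[i], log = TRUE))
pool_un <- exp(log_mat %*% alpha)        ## unnormalised pool on the grid
pool    <- pool_un / sum(pool_un * dx)   ## normalise numerically
closed  <- dbeta(x, sum(alpha * a), sum(alpha * b))
max(abs(pool - closed))                  ## should be ~ 0

## entropy of the pooled Beta, computed numerically
-sum(closed * log(closed) * dx)
```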

Move Appendix to a separate file

After the writing is concluded, just move the Appendix to a different file [as per BA guidelines], taking care of preserving the cross-referencing.

describe SpIR for `alpha`

We extended the algorithm in Poole & Raftery to include a variable alpha. Need to describe that in the manuscript.

Examples

  • Meta-analysis;
  • Savchuk;
  • Simulated Savchuk: (i) simulated (ii) more data (iii) higher variance/lower variance;
  • Bowhead: (i) modified SpIR (ii) simulation study.

KL order is wrong

We talk about KL(pi || f_i) everywhere but we need KL(f_i || pi).

MCMC sampler

If we have any hope for the hierarchical prior to be applicable in the real-world, we will need to devise a sampler that can incorporate the alphas and sample the model at the same time.

Currently we have working implementations in Stan, but that is just because the pooled prior happens to have a closed-form solution. The sampler has to be able to sample from the pooled prior even when it does not have a closed-form solution [which is the vast majority of interesting cases].

This article contains some useful information on a potential cog in this engine, the IMIS algorithm, which could be used to construct the pooled prior [or posterior, thanks to external Bayesianity].
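
For concreteness, here is a minimal R sketch (illustrative only, not the sampler discussed above and not part of logPoolR) of what evaluating the pooled prior involves in one dimension, with the normalising constant t(alpha) obtained by brute-force numerical integration. The names and the grid-based normalisation are assumptions for the sketch; the whole difficulty is that a general sampler cannot rely on such a closed-form or grid-based constant.

```r
## Evaluate the log of the logarithmic pool of arbitrary 1-D expert log-densities.
log_pool_1d <- function(theta, log_dens_list, alpha,
                        grid = seq(-50, 50, length.out = 1e5)) {
  stopifnot(abs(sum(alpha) - 1) < 1e-8)
  unnorm <- function(x) Reduce(`+`, Map(function(f, a) a * f(x), log_dens_list, alpha))
  dx <- grid[2] - grid[1]
  log_t <- -log(sum(exp(unnorm(grid))) * dx)  ## log t(alpha) by numerical integration
  unnorm(theta) + log_t
}

## Example with two hypothetical Gaussian "experts"
experts <- list(function(x) dnorm(x, 0, 1, log = TRUE),
                function(x) dnorm(x, 2, 2, log = TRUE))
log_pool_1d(0.5, experts, alpha = c(0.6, 0.4))
```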

Format

@maxbiostat, what is the size limit of this abstract? Is it an extended abstract or a short paper?

KL divergence needs to be re-written...

...and re-implemented.

Reason: Rufo et al. (2012) propose to minimise KL(pool || expert_i), which I think makes more sense than what we currently propose, namely KL(expert_i || pool).
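
Since the two directions generally give different answers, the choice matters. A minimal R sketch (illustrative only, with hypothetical Beta densities standing in for the pool and one expert):

```r
x <- seq(1e-4, 1 - 1e-4, length.out = 1e4); dx <- x[2] - x[1]
pool   <- dbeta(x, 3, 5)   ## hypothetical pooled density
expert <- dbeta(x, 2, 2)   ## hypothetical expert density

kl <- function(p, q) sum(p * (log(p) - log(q))) * dx  ## numerical KL(p || q)
kl(pool, expert)   ## KL(pool || expert_i), the direction in Rufo et al.
kl(expert, pool)   ## KL(expert_i || pool), the direction currently in the paper
```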

change family to set

When we say 'family' we may be implying some sort of connection between the experts' opinions, where none need exist. So the right name for $F(\boldsymbol\theta)$ is 'set of distributions'.

study frequentist properties of the posterior of weights

Following the suggestions of Eduardo Mendes (FGV), we should improve the experiments in the simulated example section to include draws from the generating process.

Claudio Struchiner (FGV/Fiocruz) also suggested doing something similar for the melding examples, especially the boarding school one.

Pooling or stacking?

For background, see this discussion. I proposed LP could be a way of combining many posteriors. Aki Vehtari points to his paper on stacking distributions.

My question is: should someone look into choosing the weights in LP to minimise, say, the KL distance to the true model, a bit like Rufo, Martin & Perez did? My idea would be to take the examples from Aki's paper and directly compare LP and stacking.

Not something for now, just maybe a project to keep in mind for the future, @fccoelho @lsbastos @DVMath

fix "normal" examples

Currently some of the normal (Gaussian) examples are just copy-paste from the binomial ones. Gotta fix that.

Dominance analysis

For all of the examples considered so far (Beta, Gamma, Normal) the entropy (and sometimes also the KL) optimisation problem is quite unstable, tending to one of the "corners" of the simplex. This happens because the entropy function is "dominated" by a single parameter, so any solution that assigns alpha_j = 1 for the j such that parameter_j = max(parameters) will be the best one.

To see this, consider the normal example. Since the entropy depends only on the variance, the distribution with the largest variance will tend to dominate the optimisation problem, leading to the trivial solution (0, ..., 1, 0, ...), where the j-th position is 1 and every other expert gets weight 0.
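
A minimal R sketch of this effect in the Gaussian case (illustrative only): the log-pool of N(m_i, v_i) has variance 1 / sum(alpha / v), so its entropy, 0.5 * log(2 * pi * e / sum(alpha / v)), is maximised by putting all the weight on the expert with the largest variance.

```r
v <- c(1, 4, 9)  ## hypothetical expert variances
pool_entropy <- function(alpha) 0.5 * log(2 * pi * exp(1) / sum(alpha / v))

## entropy at the vertices of the simplex vs. at its centre:
## the maximum sits at the vertex of the largest-variance expert (v = 9)
sapply(1:3, function(j) pool_entropy(replace(rep(0, 3), j, 1)))
pool_entropy(rep(1/3, 3))
```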

It would thus be interesting to analyse the entropy functions for the distributions considered to get insight into how much a parameter dominates the entropy.

The final solution, I think, is to propose a constrained optimisation problem where, for example, one would obtain the maximum entropy distribution with a given mean.

reparametrisation of the log-pool

As previously noted, neither the maximum entropy nor the KL(pool || expert_i) approach leads to a unique solution.
In a future study, we could look into some of the ideas here (about linear mixtures) to create a unique mapping between the component distributions and the weights.

interpret results

We need to look at the two tables and the figure and tell a consistent story about the differences/similarities between them.
@DVMath and @fccoelho are invited to give their two cents as well.

fix references

See Professor Genest's corrections to LC note. We're missing volume and edition numbers, stuff like that.

dynamic model example

@fccoelho , @lsbastos and @DVMath :

I propose to use the model in Carvalho, Bastos & Struchiner (2015).

It has three levels:

    1. The hyperparameters of the thermodynamic functions;
    2. The thermodynamic functions r(T) and K(T);
    3. The population size P(t, T).

Problems:
Figure out a sensible propagation strategy: it probably doesn't make any sense to propagate the uncertainty on the hyperparameters caused by uncertainty on P(t,T), although it is certainly desirable to study how the uncertainty about the hyperparameters influences the uncertainty on P(t,T).

An initial idea, thus, is to study only the prior on P(t,T) obtained by combining:

(i) the induced prior on P(t,T) from the hyperparameters, (ii) the induced prior on P(t,T) from the priors on the functions r(T) and K(T), and (iii) the actual prior on P(t,T).
This would be nice because we would be combining uncertainty from three levels and also because we could, in principle, apply our automatic methods to choose the weights (alpha).

As a bonus, we could look into the combined prior for r(T) and K(T) combining only:
(i) the induced priors from the hyperparameters and (ii) the actual priors on r(T) and K(T).
