stan-dev / bayesplot

bayesplot R package for plotting Bayesian models

Home Page: https://mc-stan.org/bayesplot

License: GNU General Public License v3.0

R 100.00%

pandoc r-package statistical-graphics stan bayesian ggplot2 visualization mcmc

bayesplot's Issues

pandoc/pandoc-citeproc dependencies

I would suggest changing README.md to,

If you are not using RStudio and you get an error related to "pandoc" you will either need to install pandoc (e.g. brew install pandoc) and pandoc-citeproc (e.g. brew install pandoc-citeproc), or remove the argument build_vignettes=TRUE to avoid building the package vignettes.

or something that suggests you also need pandoc-citeproc; otherwise R CMD build fails at the command line. The pandoc documentation was unclear that pandoc-citeproc requires a separate installation (i.e. it's not bundled with brew install pandoc). (It's probably not that big of a deal if you don't mention it, since I'm guessing most people are building packages via RStudio, but at least someone dealing with this issue might find their way here.)

Finalize documentation

Documentation to-do list:

make sure all functions and arguments are sufficiently documented
proofread all help pages
finish/improve/replace vignettes

Histograms (y-label and grouping)

1 - For histograms (e.g., in ppc_stat and ppc_hist) it would be good to have a y-label that shows the counts (I'm not sure what is plotted currently, counts or density). This is especially useful in ppc_stat_group, because it gives you extra information about the size of each group.

2 - It would be good to have a ppc_hist_group function for comparing the distributions at a finer level.

Is GGally worth it?

Is GGally's pair plot useful enough to warrant using it for mcmc_pairs and keeping GGally in Suggests? Currently this is what mcmc_pairs uses, but it might be more trouble than it's worth. Working with ggpairs is pretty cumbersome, and although the plot looks ok, it isn't flexible enough to be easily extended to include functionality similar to rstan::pairs.stanfit, which is what really makes pairs plots very useful.

Define more helper functions to prepare input of mcmc functions

Right now, you repeat more or less the same code at the start of many mcmc_* functions. I think it makes sense to define some helper functions to do this.

Check that all plot text is correct (e.g. titles, axis labels, etc.)

Also add any necessary or useful text that's missing

Residuals by time plots

It would be very useful to have residuals by time plots, similar to ppc_ts_grouped plots.

figures don't show in vignettes posted on CRAN

For ppc_*_grouped functions that use facet wrap, allow a second group via facet_grid

Already in progress on group2 branch.

allow for additional plotting options

It would be great to let users customize the plots from bayesplot further, to make them ready for presentations or other non-screen usage.

It would be great to have functionality like rstan_gg_options, which allows specifying options that go beyond ggplot theme options and are passed directly to the geoms used.

Allow to plot residuals in ppc_scatter*

Right now, the x-axis of the ppc_scatter* plots displays y, but users may also want to see the residuals (y - yrep), since this resembles the standard residual plots generated, for instance, by plot(<lm object>) (except that y is plotted on the y-axis there). Do you think it makes sense to add an option to show residuals instead of y itself?
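The residuals mentioned above can be computed directly from y and yrep. The following is a minimal base-R sketch using simulated stand-in data; the object names and dimensions are assumptions for illustration, not bayesplot internals.

```r
set.seed(1)
y    <- rnorm(5)                        # observed outcomes (length N = 5)
yrep <- matrix(rnorm(3 * 5), nrow = 3)  # S = 3 simulated draws by N = 5 observations

# One residual per draw and observation: resid[s, n] = y[n] - yrep[s, n]
resid <- sweep(-yrep, 2, y, FUN = "+")

# Average residual per observation, a natural candidate for one plot axis
avg_resid <- colMeans(resid)
```

Plotting avg_resid against the average of yrep would give something close to the familiar residuals-versus-fitted plot for lm objects.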

mcmc combination plots (e.g. trace + histogram/density)

for ppc_stat allow functions (not just names of functions)

if function name supplied then use that for labels, otherwise use 'T()' or something
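One way this could work is sketched below. resolve_stat is a hypothetical helper, not an existing bayesplot function, and the label formats are only placeholders.

```r
# Hypothetical helper: accept either the name of a function or a function
# object, and return the function together with a label for the plot.
resolve_stat <- function(stat) {
  if (is.character(stat) && length(stat) == 1) {
    # A name was supplied, so it can be reused in the label
    list(fun = match.fun(stat), label = paste0("T = ", stat))
  } else if (is.function(stat)) {
    # No name available, so fall back to a generic label
    list(fun = stat, label = "T()")
  } else {
    stop("'stat' must be a function or the name of a function.")
  }
}

by_name <- resolve_stat("mean")
by_fun  <- resolve_stat(function(x) quantile(x, 0.9))
```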

PPCs for time series

support list of matrices (chains) as input for mcmc plots

see title

Default ggplot theme

Decide what the default ggplot theme should be. See if Andrew has any comments / strong preferences.

Change global theme of plots?

This is more of a question: Is it possible to change the overall theme of the plots produced by bayesplot?

Problems when calling mcmc plots with only one parameter

Minimal example:

mcmc_trace(data.frame(b = rnorm(100)))

Another error occurs when adding a chain column:

mcmc_trace(data.frame(b = rnorm(100), chain = 1))

add a color scheme sensitive to color blindness

Recycle transformations in mcmc_intervals

I was trying out the new stan_betareg and plotted my coefficients with mcmc_intervals.

I'd like to be able to apply a single transformation (e.g. arm::invlogit) to all parameters. It would be great if one could do:

mcmc_intervals(fit, regex_pars = "search_for_coefs_here",
               transformations = arm::invlogit)

without the need of putting together a named list.
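The recycling could be handled by a small preprocessing step. recycle_transformations below is a hypothetical helper (not part of bayesplot), and the parameter names are made up; plogis from base R stands in for arm::invlogit.

```r
# Hypothetical helper: if a single function is supplied, expand it into a
# named list with one entry per parameter; otherwise return the input as is.
recycle_transformations <- function(transformations, pars) {
  if (is.function(transformations)) {
    transformations <- rep(list(transformations), length(pars))
    names(transformations) <- pars
  }
  transformations
}

pars <- c("(Intercept)", "x1", "x2")   # illustrative parameter names
tr   <- recycle_transformations(plogis, pars)
names(tr)
```

A named list passed by the user goes through unchanged, so the existing interface would keep working.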

Dependence on dplyr?

Currently the use of dplyr functions in the bayesplot source code is inconsistent. I think it's best to either use it as much as possible or not at all. If the former, then this can wait until after the initial release, but if dplyr is to be removed I'd rather not include it as a dependency in the initial release.

mcmc_recover_intervals extensions - bias + coverage

Hi!

I just discovered the intervals function, which looks great! I know that I should put all of my parameters on the unit scale, but in practice I sometimes don't do that (even though I should, I know). For these circumstances it would be nice to plot things as bias. So instead of showing the true values along with the intervals, I would like to see an option that would allow me to plot the bias.

Of course, the concept of bias is shaky in a Bayesian world, but as long as I can be sure that my prior is weakly-informative, I would like to be able to do that.

Another very useful extension (I am happy to open another issue) would be a plot of the coverage when I replicate things many times.

BTW, these tools look awesome to me!

plots to facilitate comparing mcmc estimates to "true" parameter values

After fitting a model to simulated data (using fixed parameter values) plot a comparison of the MCMC estimates and the "true" parameter values to check that they are (approximately) recovered.
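The quantities behind such a recovery plot are easy to compute before any plotting. This is a rough base-R sketch: the draws are simulated stand-ins for real posterior draws, and the 90% interval and the use of the median for bias are arbitrary choices.

```r
set.seed(123)
true_vals <- c(alpha = 1, beta = -0.5)   # values used to simulate the data

# Stand-in for posterior draws (draws x parameters); in practice these
# would come from the fitted model.
draws <- cbind(alpha = rnorm(4000, 1.05, 0.1),
               beta  = rnorm(4000, -0.45, 0.1))

# 90% central interval and median per parameter
summ <- t(apply(draws, 2, quantile, probs = c(0.05, 0.5, 0.95)))
colnames(summ) <- c("lower", "median", "upper")

recovery <- data.frame(
  parameter = rownames(summ),
  true      = true_vals[rownames(summ)],
  summ,
  bias      = summ[, "median"] - true_vals[rownames(summ)],
  covered   = summ[, "lower"] <= true_vals & true_vals <= summ[, "upper"],
  row.names = NULL
)
recovery
```

Repeating this over many simulated data sets and averaging the covered column would give the empirical coverage.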

function for easily juxtaposing plots and enforcing common axis limits

When comparing plots (e.g. PPCs for two different models for y) it can often be important to make sure the plots use the same axis limits. It would be nice to have a function that takes plot objects and a single x/y axis limit specification and then displays all the plots using that same x/y axis specification.

remove facet labels for the ppc plots

They really don't provide any useful information at all.

Visual Predictive Check

It would be great to have a so-called visual predictive check. To exemplify it I include an example in the form of a simple R script.

This plot is very useful for models with continuous regressors that are fixed by the design of the experiment at the same values for all subjects in a data set. For example, imagine I have many subjects in a clinical study and whatever is of interest is measured at pre-defined time-points for all patients. The plot then allows comparing the raw quantiles of the data at each time-point with what the model predicts for these quantiles (with its uncertainty).

vpc_example_R.txt

vpc_mtcars.pdf
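The attached script is not reproduced here. As a rough illustration of the idea only, the sketch below computes the numbers such a plot would display, using simulated data in place of a real study and random noise around a known curve in place of real posterior predictive draws. All names and settings are assumptions.

```r
set.seed(42)
n_subj  <- 30
times   <- c(0, 1, 2, 4, 8)             # pre-defined measurement times
n_draws <- 200
time    <- rep(times, each = n_subj)
mu      <- 10 * exp(-0.2 * time)

y    <- mu + rnorm(length(time))        # "observed" data
# Stand-in for posterior predictive draws (draws x observations)
yrep <- matrix(rep(mu, each = n_draws), nrow = n_draws) +
  rnorm(n_draws * length(time))

probs <- c(0.1, 0.5, 0.9)

# Observed quantiles at each time-point: quantiles x time-points
obs_q <- sapply(split(y, time), quantile, probs = probs)

# The same quantiles for every predictive draw: quantiles x time-points x draws
rep_q <- apply(yrep, 1, function(yr) {
  sapply(split(yr, time), quantile, probs = probs)
})
dim(rep_q) <- c(length(probs), length(times), n_draws)

# 90% predictive band for each quantile at each time-point
band <- apply(rep_q, c(1, 2), quantile, probs = c(0.05, 0.95))
```

The plot would then overlay obs_q as lines on top of the bands in band, one band per quantile.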

object of type 'closure' is not subsettable

I get the error below when I try to plot posterior draws from an HMM.

posterior_array <- as.array(my_fit)
dimnames(posterior_array)
mcmc_areas(posterior_array, pars = c("mu[1]", "mu[2]"))

$iterations
NULL

$chains
 [1] "chain:1"  "chain:2"  "chain:3"  "chain:4"  "chain:5"  "chain:6"  "chain:7" 
 [8] "chain:8"  "chain:9"  "chain:10"

$parameters
 [1] "mu[1]"      "mu[2]"      "mu[3]"      "mu[4]"      "mu[5]"      "mu[6]"     
 [7] "mu[7]"      "mu[8]"      "mu[9]"      "sigma[1]"   "sigma[2]"   "sigma[3]"

Error in theme_get()[newitem_names] : object of type 'closure' is not subsettable

I tried extracting the stan fit object as a data frame but still get the error. Any idea what's going on?

Allow a column to separate chains in mcmc plots

It could make sense to add another argument naming a column in the data.frame / matrix (defaulting to NULL or so) that, if present, would allow separating chains. This way one would not be forced to use 3D arrays.
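Internally, such an argument could be handled by converting the input to the 3-D layout before plotting. df_to_array below is a hypothetical helper written for illustration; it assumes every chain has the same number of iterations and that rows within a chain are already in iteration order.

```r
# Hypothetical helper: turn a data frame with a chain column into an
# iterations x chains x parameters array.
df_to_array <- function(x, chain_col = "chain") {
  stopifnot(is.data.frame(x), chain_col %in% names(x))
  pars     <- setdiff(names(x), chain_col)
  by_chain <- split(x[, pars, drop = FALSE], x[[chain_col]])
  n_iter   <- unique(vapply(by_chain, nrow, integer(1)))
  if (length(n_iter) != 1) {
    stop("All chains must have the same number of iterations.")
  }
  arr <- array(
    NA_real_,
    dim = c(n_iter, length(by_chain), length(pars)),
    dimnames = list(iterations = NULL,
                    chains     = paste0("chain:", names(by_chain)),
                    parameters = pars)
  )
  # Fill one chain at a time
  for (k in seq_along(by_chain)) {
    arr[, k, ] <- as.matrix(by_chain[[k]])
  }
  arr
}

df  <- data.frame(b = rnorm(6), sigma = rexp(6), chain = rep(1:2, each = 3))
arr <- df_to_array(df)
dim(arr)
```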

Improve input checking for arguments 'y' and 'yrep'

Right now, only vectors are accepted for y and matrices for yrep. From my point of view this may be too restrictive for a user-facing function. For instance, one-dimensional arrays are not allowed as input for y, although they look the same as vectors on the surface, and users will wonder why their input is invalid.
Maybe one could try to internally coerce y and yrep to a vector / matrix, respectively, thus allowing more flexible input. However, I also understand if you want to keep the input more restrictive.

Also, it may be good to check that both y and yrep are really numeric before passing them to ggplot2. For instance, when I call ppc_resid with a character vector for y, I get the error message

Error in FUN(x, aperm(array(STATS, dims[perm]), order(perm)), ...) :
  non-numeric argument to binary operator

which points in the right direction but isn't optimal from my point of view.
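A sketch of what more forgiving checks might look like follows. validate_y and validate_yrep here are illustrative stand-ins written for this example, not the package's actual internals, and the error messages are placeholders.

```r
# Illustrative check for y: accept vectors and 1-D arrays, insist on numeric.
validate_y <- function(y) {
  if (is.array(y) && length(dim(y)) == 1) {
    y <- as.vector(y)   # drop the single dimension
  }
  if (!is.numeric(y)) stop("'y' must be numeric.")
  if (!is.null(dim(y))) stop("'y' must be a vector or 1-D array.")
  if (anyNA(y)) stop("NAs not allowed in 'y'.")
  y
}

# Illustrative check for yrep: accept matrices and data frames.
validate_yrep <- function(yrep, y) {
  if (is.data.frame(yrep)) yrep <- as.matrix(yrep)
  if (!is.matrix(yrep) || !is.numeric(yrep)) {
    stop("'yrep' must be a numeric matrix.")
  }
  if (ncol(yrep) != length(y)) stop("ncol(yrep) must equal length(y).")
  yrep
}

y    <- validate_y(array(c(1.5, 2, 3.2)))             # 1-D array is coerced
yrep <- validate_yrep(matrix(rnorm(6), nrow = 2), y)
```

With checks like these, a character y fails early with "'y' must be numeric." instead of the binary-operator error quoted above.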

ggplot + grid implementation of rstan's pairs plot

This is basically done on the mcmc_pairs branch but still needs to be cleaned up and tested.

color scheme for trace plots lines

right now ggplot default discrete color scale is used

PPCs by group

Allow stratification by group, e.g. to do PPCs for each group in multilevel model

mcmc_trace: start iteration counting after warmup

This requires a column in the data.frame / matrix that enumerates the iterations (e.g. beginning with 1001, if 1000 warmup samples were used).

ppc_vs_x_grouped cannot handle argument 'time'

When you accidentally pass the time argument to ppc_vs_x_grouped, it is passed on to ppc_ts_grouped, although x is already used as the time argument there. This leads to duplicate assignment of that argument, causing an error. Is it possible to just ignore the time argument in ppc_vs_x_grouped instead of passing it on?

ppc_hist_group

suggested in #39

Allow multivariate transformations of parameters in mcmc plots

Currently the transformations argument to the mcmc_* functions only handles univariate transformations (i.e., transformations of scalar parameters).

vignette images not rendering properly in versions on CRAN website

Corresponding issue for existing PR #50

More functionality for residual plots

1 - It would be useful to have grouped residuals. For instance, ppc_resid_hist_group. This can tell us about the distribution of residuals at a finer level which can provide some insight into the fit of the model.

2 - Currently, there is no way to change the x-axis in residual plots. For discrete variables on the x-axis, we might want to have a box-plot for the residuals for each point.

Fix typo in mcmc_combo default arguments: dense -> dens

see title

Check that plots look ok with all colors schemes

divergent transitions on the bottom of traceplots

Is there a way to add to stan_trace() the ability to display divergent transitions on the bottom of traceplots?

Thank you!

boxplots for posterior predictive checks

I think we need an option for pp_check that does boxplots in addition to the options for histograms and overlaid densities. So, a boxplot of the data on the left and then to its right ten or so boxplots of the posterior predictive realizations.

intervals plot

Add mcmc_intervals() function (see rstan::stan_plot) with options like

probability mass included in interval
point estimate (none, mean, median)
show densities
color by R-hat or effective sample size

prepare for ggplot2 update

https://blog.rstudio.org/2016/09/30/ggplot2-2-2-0-coming-soon/

Add a generic 'ppc' (or similar named) function

Developers of other packages will likely want to introduce a convenience function that generates the yreps and then calls the ppc_* functions. As this will likely happen via S3 methods, it might be a good idea to put the corresponding generic in the ppcheck package to make sure that all packages use the same method name (to minimize confusion of users) and to avoid unnecessary function masking.
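The S3 pattern described above might look roughly like this. The generic name ppc, the toy_fit class, and the nreps argument are all made up for illustration; a real method would generate yrep from the fitted model and hand it to one of the plotting functions.

```r
# Generic that a plotting package could export once for everyone
ppc <- function(object, ...) UseMethod("ppc")

ppc.default <- function(object, ...) {
  stop("No ppc method for objects of class '", class(object)[1], "'.")
}

# Method that a modelling package would supply for its own (toy) class
ppc.toy_fit <- function(object, nreps = 10, ...) {
  y <- object$y
  # Stand-in for posterior predictive simulation
  yrep <- matrix(rnorm(nreps * length(y), mean(y), sd(y)), nrow = nreps)
  # A real method would now call a plotting function with y and yrep
  list(y = y, yrep = yrep)
}

fit <- structure(list(y = c(1.2, 0.7, 2.1, 1.5)), class = "toy_fit")
out <- ppc(fit, nreps = 5)
dim(out$yrep)
```

Because each package only adds a method, users see a single function name and nothing is masked when several modelling packages are loaded.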

build ggplot objects in tests

In most of the tests ggplot objects are created but not "built". Some issues will not be detected otherwise.

Clean up PPC vignette

LOO predictive checks

Summary:

Add LOO predictive checks

LOO probability integral transformation (PIT) predictive check
plot of LOO predictive intervals vs. observations

Description:

Calibration of the marginal predictions can be checked with probability integral transformation (PIT) checks. LOO improves the check by avoiding the double use of data. See Marginal predictive checks in BDA3 p. 152-153. In addition visual predictive checking can be made by plotting LOO predictive intervals and the observations.

Example code for LOO-PIT (I assume visual predictive checking is close to current ppc plot)

# Packages assumed to provide the functions used below
library(rstanarm)     # stan_lmer(), log_lik(), posterior_predict(), radon data
library(loo)          # psislw()
library(matrixStats)  # logSumExp()

data(radon)
y <- radon$log_radon
# Fit the first model
modelA <- stan_lmer(
    log_radon ~ floor + log_uranium + floor:log_uranium + (1 + floor | county),
    data = radon,
    cores = 4,
    iter = 2000,
    chains = 4)

# probability integral transformation (PIT)
# this would be more accurate using conditional cdf's N(y_i | mu^{(s)}, sigma^{(s)})
log_likA <- log_lik(modelA, parameter_name = "log_lik")
psisA <- psislw(-log_likA)
predsA <- posterior_predict(modelA)
pitA <- numeric(ncol(predsA))
for (i in seq_len(ncol(predsA))) {
    # sum the smoothed LOO weights of the draws with yrep <= y for observation i
    pitA[i] <- exp(logSumExp(psisA$lw_smooth[predsA[, i] <= y[i], i]))
}

# LOO-PITs should have uniform distribution
par(mfrow = c(1, 2))
qqplot(pitA, runif(10000))

Rootograms to assess model fit of count data models

I recently stumbled upon a blog post about rootograms (http://www.fromthebottomoftheheap.net/2016/06/07/rootograms/), which help in assessing the fit of count data models. Given that ppc_dens_overlay is not ideal for visualizing discrete distributions, rootograms may be a nice alternative. What do you think?
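For reference, the numbers behind a rootogram can be derived from y and yrep alone. The sketch below uses simulated Poisson data as a stand-in and follows the hanging variant as I understand it, where bars of height sqrt(observed) hang from sqrt(expected); treat the details as assumptions rather than a settled design.

```r
set.seed(7)
y    <- rpois(200, lambda = 3)                            # observed counts
yrep <- matrix(rpois(500 * 200, lambda = 3), nrow = 500)  # draws x observations

counts <- 0:max(y, yrep)
tabulate_counts <- function(x) as.vector(table(factor(x, levels = counts)))

observed <- tabulate_counts(y)
# Expected frequency of each count, averaged over the predictive draws
expected <- rowMeans(apply(yrep, 1, tabulate_counts))

rootogram <- data.frame(
  count         = counts,
  sqrt_observed = sqrt(observed),
  sqrt_expected = sqrt(expected),
  # Bottom of each hanging bar; values near zero indicate good fit
  bar_bottom    = sqrt(expected) - sqrt(observed)
)
head(rootogram)
```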

Exchange axes in ppc_scatter* plots

This is of course just a matter of style, but personally I prefer / intuitively expect y to be on the y-axis. Feel free to close this issue, if you prefer the current use of the axes.
