stan-dev / bayesplot
bayesplot: R package for plotting Bayesian models
Home Page: https://mc-stan.org/bayesplot
License: GNU General Public License v3.0
I would suggest changing README.md to:
If you are not using RStudio and you get an error related to "pandoc" you will either need to install pandoc (e.g. brew install pandoc) and pandoc-citeproc (e.g. brew install pandoc-citeproc), or remove the argument build_vignettes=TRUE to avoid building the package vignettes.
or something else that makes clear you also need pandoc-citeproc, since otherwise R CMD build fails at the command line. The pandoc documentation was unclear that pandoc-citeproc requires a separate installation (i.e. it's not bundled in with brew install pandoc). (It's probably not that big of a deal if you don't mention it, since I'm guessing most people are building packages via RStudio, but at least someone dealing with this issue might find their way here.)
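One way to find out in advance whether a command-line build would hit this problem is to look for both executables before asking for vignettes. The sketch below uses only base R; the helper name can_build_vignettes is hypothetical and is not part of bayesplot or devtools.

```r
# Hypothetical helper: TRUE only if every required tool is found on the PATH.
can_build_vignettes <- function(tools = c("pandoc", "pandoc-citeproc")) {
  paths <- Sys.which(tools)              # typically "" for tools that are not found
  missing <- names(paths)[!nzchar(paths)]
  if (length(missing) > 0) {
    message("Not found on PATH: ", paste(missing, collapse = ", "))
  }
  length(missing) == 0
}

# Use the result to decide whether to request vignettes when installing.
build_vignettes <- can_build_vignettes()
```

The value of build_vignettes could then be passed on as the build_vignettes argument mentioned above instead of hard-coding TRUE.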
Documentation to-do list:
1 - For histograms (e.g., in ppc_stat and ppc_hist) it would be good to have a y-label which shows the counts (not sure what is plotted currently, counts or density). This is especially useful in ppc_stat_group, because it gives you extra information about the size of each group.
2 - It would be good to have a ppc_hist_group function for comparing the distributions at a finer level.
Is GGally's pair plot useful enough to warrant using it for mcmc_pairs and keeping GGally in Suggests? Currently this is what mcmc_pairs uses, but it might be more trouble than it's worth. Working with ggpairs is pretty cumbersome, and although the plot looks ok, it isn't flexible enough to be easily extended to include functionality similar to rstan::pairs.stanfit, which is what really makes pairs plots very useful.
Right now, you repeat more or less the same code at the start of many mcmc_* functions. I think it makes sense to define some helper functions to do this.
Also add any necessary or useful text that's missing
It would be very useful to have residuals by time plots, similar to ppc_ts_grouped plots.
Already in progress on group2 branch.
It would be great to let users customize the plots from bayesplot further, to make them ready for presentations or other non-screen usage.
It would be great to have functionality like rstan_gg_options that allows specifying options which go beyond ggplot theme options and are passed directly to the geoms used.
Right now, the x-axis of the pp_scatter* plots displays y, but users may also want to see the residuals (y - yrep), since this resembles the standard residual plots generated for instance by plot(<lm object>) (except that y is plotted on the y-axis there). Do you think it makes sense to add an option to show residuals instead of y itself?
if function name supplied then use that for labels, otherwise use 'T()' or something
see title
Decide what the default ggplot theme should be. See if Andrew has any comments / strong preferences.
This is more of a question: Is it possible to change the overall theme of the plots produced by bayesplot?
Minimal example:
mcmc_trace(data.frame(b = rnorm(100)))
Another error occurs when adding a chain column:
mcmc_trace(data.frame(b = rnorm(100), chain = 1))
I was trying out the new stan_betareg and plotted my coefficients with mcmc_intervals.
I'd like to be able to apply a single transformation (e.g. arm::invlogit) to all parameters. It would be great if one could do:
mcmc_intervals(fit, regex_pars = "search_for_coefs_here",
transformations = arm::invlogit)
without the need of putting together a named list.
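Until something like this is supported, the named list can be generated rather than typed out. This is a minimal sketch in base R; repeat_transformation is a hypothetical helper (not a bayesplot function), the parameter names are made up, and plogis stands in for arm::invlogit because it is the inverse logit that ships with base R.

```r
# Hypothetical helper: turn one function into a named list with one entry
# per parameter, which is the form a named-list argument expects.
repeat_transformation <- function(fun, pars) {
  fun <- match.fun(fun)
  stats::setNames(rep(list(fun), length(pars)), pars)
}

pars <- c("(Intercept)", "x1", "x2")          # made-up parameter names
trans <- repeat_transformation(plogis, pars)  # plogis is the inverse logit

names(trans)       # "(Intercept)" "x1" "x2"
trans[["x1"]](0)   # 0.5
```

The resulting list could then be supplied as the transformations argument for the parameters selected by regex_pars.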
Currently the use of dplyr functions in the bayesplot source code is inconsistent. I think it's best to either use it as much as possible or not at all. If the former, then this can wait until after the initial release, but if dplyr is to be removed I'd rather not include it as a dependency in the initial release.
Hi!
I just discovered the intervals function, which looks great! I know that I should put all of my parameters on the unit scale, but in practice I sometimes don't do that (even though I should, I know). For these circumstances it would be nice to plot things as bias. So instead of showing the true values along with the intervals, I would like an option that lets me plot the bias.
Of course, the concept of bias is shaky in a Bayesian world, but as long as I can be sure that my prior is weakly-informative, I would like to be able to do that.
Another very useful extension (I am happy to open another issue) would be a plot of the coverage when I replicate things many times.
BTW, these tools look awesome to me!
After fitting a model to simulated data (using fixed parameter values) plot a comparison of the MCMC estimates and the "true" parameter values to check that they are (approximately) recovered.
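The numbers behind such a recovery plot can be sketched in a few lines of base R. Everything here is illustrative: the draws are simulated stand-ins for real posterior draws, and the 90% interval and the "interval contains zero" rule are assumptions rather than anything bayesplot prescribes.

```r
set.seed(1)
true_values <- c(alpha = 1, beta = -0.5)   # values used to simulate the data

# Stand-in for posterior draws (rows = draws, columns = parameters);
# in practice this would come from as.matrix() on a fitted model.
draws <- cbind(
  alpha = rnorm(4000, mean = 1.05, sd = 0.10),
  beta  = rnorm(4000, mean = -0.45, sd = 0.10)
)

# Subtract the true value from every draw so that all parameters
# are shown on a common "bias" scale.
bias <- sweep(draws, 2, true_values[colnames(draws)])
summary_tab <- t(apply(bias, 2, quantile, probs = c(0.05, 0.5, 0.95)))

# A parameter counts as recovered if the 90% interval of the bias contains zero.
recovered <- summary_tab[, "5%"] <= 0 & summary_tab[, "95%"] >= 0
cbind(summary_tab, recovered)
```

Plotting bias rather than raw values keeps parameters on very different scales comparable in a single panel.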
When comparing plots (e.g. PPCs for two different models for y) it can often be important to make sure the plots use the same axis limits. It would be nice to have a function that takes plot objects and a single x/y axis limit specification and then displays all the plots using that same x/y axis specification.
They really don't provide any useful information at all.
It would be great to have a so-called visual predictive check. To exemplify it I include an example in the form of a simple R script.
This plot is very useful for models with continuous regressors that the design of the experiment fixes at the same values for all subjects in a data set. For example, imagine I have many subjects in a clinical study and whatever is of interest is measured at pre-defined time points for all patients. The plot then allows comparing the raw quantiles of the data at each time point with what the model predicts for these quantiles (with its uncertainty).
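The quantities such a plot would display can be computed as follows. This is only a sketch on simulated data, not the R script referred to above; the array layout (draws x subjects x time points), the chosen quantiles and the 90% interval are all assumptions.

```r
set.seed(2)
n_subj <- 50
times <- c(0, 1, 2, 4, 8)
n_draws <- 200

# Made-up observed data: one row per subject, one column per time point.
y <- sapply(times, function(t) rnorm(n_subj, mean = 10 * exp(-0.2 * t), sd = 1))

# Made-up replicated data: draws x subjects x time points.
yrep <- array(
  rnorm(n_draws * n_subj * length(times),
        mean = rep(10 * exp(-0.2 * times), each = n_draws * n_subj), sd = 1),
  dim = c(n_draws, n_subj, length(times))
)

probs <- c(0.1, 0.5, 0.9)
# Observed quantiles at each time point (rows = quantiles, columns = times).
obs_q <- apply(y, 2, quantile, probs = probs)

# The same quantiles within each replicated data set: quantiles x draws x times.
rep_q <- apply(yrep, c(1, 3), quantile, probs = probs)

# 90% predictive interval for each quantile at each time point.
lower <- apply(rep_q, c(1, 3), quantile, probs = 0.05)
upper <- apply(rep_q, c(1, 3), quantile, probs = 0.95)

# Share of observed quantiles that fall inside the model's interval.
mean(obs_q >= lower & obs_q <= upper)
```

In the plot itself, lower and upper would form a ribbon over time for each quantile, with obs_q drawn as lines or points on top.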
I get the error below when I try to plot posterior draws from an HMM.
posterior_array <- as.array(my_fit)
dimnames(posterior_array)
mcmc_areas(posterior_array, pars = c("mu[1]", "mu[2]"))
$iterations
NULL
$chains
[1] "chain:1" "chain:2" "chain:3" "chain:4" "chain:5" "chain:6" "chain:7"
[8] "chain:8" "chain:9" "chain:10"
$parameters
[1] "mu[1]" "mu[2]" "mu[3]" "mu[4]" "mu[5]" "mu[6]"
[7] "mu[7]" "mu[8]" "mu[9]" "sigma[1]" "sigma[2]" "sigma[3]"
Error in theme_get()[newitem_names] : object of type 'closure' is not subsettable
I tried extracting the stan fit object as a data frame but still get the error. Any idea what's going on?
It could make sense to add another argument naming a column in the data.frame / matrix (defaulting to NULL or so) that -- if present -- would allow separating chains. This way one would not be forced to use 3D arrays.
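A rough idea of what such an argument could do internally is sketched below: split the rows by the chain column and stack the pieces into an iterations x chains x parameters array. The function df_to_array and its chain_col argument are hypothetical, and the sketch assumes rows are already in iteration order within each chain.

```r
# Hypothetical helper: build a 3-D array of draws from a data frame
# that identifies the chain of each row in one column.
df_to_array <- function(x, chain_col = "chain") {
  stopifnot(is.data.frame(x), chain_col %in% names(x))
  pars <- setdiff(names(x), chain_col)
  by_chain <- split(x[pars], x[[chain_col]])
  n_iter <- unique(vapply(by_chain, nrow, integer(1)))
  if (length(n_iter) != 1) {
    stop("All chains must have the same number of draws.")
  }
  out <- array(
    NA_real_,
    dim = c(n_iter, length(by_chain), length(pars)),
    dimnames = list(iterations = NULL, chains = names(by_chain), parameters = pars)
  )
  for (k in seq_along(by_chain)) {
    out[, k, ] <- as.matrix(by_chain[[k]])
  }
  out
}

draws <- data.frame(b = rnorm(200), s = rexp(200), chain = rep(1:2, each = 100))
arr <- df_to_array(draws)
dim(arr)   # 100 2 2
```

Failing loudly on chains of unequal length seems safer here than silently padding or truncating.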
Right now, only vectors are accepted for y and matrices for yrep. From my point of view this may be too restrictive for a user-facing function. For instance, one-dimensional arrays are not allowed as input for y although they look the same as vectors on the surface, and users will wonder why their input is invalid.
Maybe one could try to internally coerce y and yrep to a vector / matrix respectively, thus allowing more flexible input. However, I also understand if you want to keep the input more restrictive.
Also, it may be good to check that both y and yrep are really numeric before passing them to ggplot2. For instance, when I call ppc_resid with a character vector for y, I get the error message
Error in FUN(x, aperm(array(STATS, dims[perm]), order(perm)), ...) :
non-numeric argument to binary operator
which points in the right direction but isn't optimal from my point of view.
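Both requests (gentler coercion and an earlier, clearer error) could be handled by small validation helpers along these lines. The function names and messages are hypothetical and are not taken from the package source.

```r
# Hypothetical validators: coerce where it is safe, fail early otherwise.
validate_y <- function(y) {
  if (!is.numeric(y)) {
    stop("'y' must be numeric, not ", class(y)[1], ".")
  }
  if (is.array(y) && length(dim(y)) > 1) {
    stop("'y' must be a vector or a one-dimensional array.")
  }
  as.vector(y)   # drops the dim attribute of 1-D arrays
}

validate_yrep <- function(yrep, y) {
  if (is.data.frame(yrep)) yrep <- as.matrix(yrep)
  if (!is.numeric(yrep) || !is.matrix(yrep)) {
    stop("'yrep' must be a numeric matrix.")
  }
  if (ncol(yrep) != length(y)) {
    stop("ncol(yrep) must be equal to length(y).")
  }
  unname(yrep)
}

y <- validate_y(array(c(1.2, 0.7, 3.1)))            # a 1-D array is accepted
yrep <- validate_yrep(matrix(rnorm(12), nrow = 4), y)
```

With checks like these, a character y would stop with a message that names the offending argument instead of failing inside the residual computation.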
This is basically done on the mcmc_pairs branch but still needs to be cleaned up and tested.
right now ggplot default discrete color scale is used
Allow stratification by group, e.g. to do PPCs for each group in multilevel model
This requires a column in the data.frame / matrix that enumerates the iterations (e.g. beginning with 1001, if 1000 warmup samples were used).
When you accidentally pass the time argument to ppc_vs_x_grouped it is passed on to ppc_ts_grouped, although x is already used as the time argument there. This leads to duplicated assignments of the latter argument, causing an error. Is it possible to just ignore the time argument in ppc_vs_x_grouped instead of passing it on?
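The general pattern for this is to remove the clashing name from the dots before forwarding them. The sketch below uses two stand-in functions, inner_plot and outer_plot, which are hypothetical and only mimic the relationship between the two ppc functions described above.

```r
# Stand-in for the inner function, which has a `time` argument of its own.
inner_plot <- function(y, time, ...) {
  list(y = y, time = time)
}

# Stand-in for the wrapper: `x` plays the role of `time`, so a `time`
# passed through `...` is dropped (with a warning) before forwarding.
outer_plot <- function(y, x, ...) {
  dots <- list(...)
  if ("time" %in% names(dots)) {
    warning("Ignoring 'time'; 'x' is used instead.", call. = FALSE)
    dots[["time"]] <- NULL
  }
  do.call(inner_plot, c(list(y = y, time = x), dots))
}

res <- suppressWarnings(outer_plot(y = 1:3, x = 4:6, time = 7:9))
res$time   # 4 5 6
```

Warning rather than silently discarding the argument tells the user that their time input had no effect.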
suggested in #39
Currently the transformations argument to the mcmc_* functions only handles univariate transformations (i.e., transformations of scalar parameters).
Corresponding issue for existing PR #50
1 - It would be useful to have grouped residuals. For instance, ppc_resid_hist_group. This can tell us about the distribution of residuals at a finer level which can provide some insight into the fit of the model.
2 - Currently, there is no way to change the x-axis in residual plots. For discrete variables on the x-axis, we might want to have a box-plot for the residuals for each point.
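As a sketch of what both points would need underneath, the residuals can be computed for every draw and then summarised or box-plotted by a discrete variable. The data and the grouping variable are simulated, and averaging residuals over draws is just one of several reasonable summaries.

```r
set.seed(3)
y <- rnorm(30)
group <- rep(c("a", "b", "c"), each = 10)
yrep <- matrix(rnorm(100 * 30), nrow = 100)   # made-up draws x observations

# Residuals y - yrep for every draw: sweep() gives yrep - y, so negate it.
resid <- -sweep(yrep, 2, y)

# Average residual per observation, then summarised within each group.
mean_resid <- colMeans(resid)
tapply(mean_resid, group, summary)

# Statistics for one box per group; plot = FALSE returns them without drawing.
bp <- boxplot(mean_resid ~ group, plot = FALSE)
bp$names   # "a" "b" "c"
```

Replacing group with any discrete x variable gives the per-value box plots mentioned in the second point.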
see title
Is there a way to add to stan_trace() the ability to display divergent transitions on the bottom of traceplots?
Thank you!
I think we need an option for pp_check that does boxplots in addition to the options for histograms and overlaid densities. So, a boxplot of the data on the left and then to its right ten or so boxplots of the posterior predictive realizations.
Add mcmc_intervals() function (see rstan::stan_plot) with options like
Developers of other packages will likely want to introduce a convenience function that generates the yreps and then calls the ppc_* functions. As this will likely happen via S3 methods, it might be a good idea to put the corresponding generic in the ppcheck package, so that all packages use the same method name (to minimize user confusion) and to avoid unnecessary function masking.
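The arrangement could look roughly like this: one package exports the generic and each modelling package registers a method for its own class. The class myfit, its fields and the method body are hypothetical; the generic is called pp_check here only because that name is already used elsewhere in these issues.

```r
# Generic that a shared package could export.
pp_check <- function(object, ...) {
  UseMethod("pp_check")
}

# Method that a modelling package might supply for its own (hypothetical) class.
pp_check.myfit <- function(object, nreps = 10, ...) {
  # A real method would draw from the posterior predictive distribution;
  # here the replications are simulated from a stored mean and sd.
  yrep <- t(replicate(nreps, rnorm(length(object$y), object$mu, object$sigma)))
  list(y = object$y, yrep = yrep)   # stand-in for a call to a ppc_* function
}

fit <- structure(list(y = rnorm(20), mu = 0, sigma = 1), class = "myfit")
out <- pp_check(fit, nreps = 5)
dim(out$yrep)   # 5 20
```

Because dispatch happens on the class of object, several packages can provide methods without masking each other's functions.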
In most of the tests ggplot objects are created but not "built". Some issues will not be detected otherwise.
Add LOO predictive checks
Calibration of the marginal predictions can be checked with probability integral transformation (PIT) checks. LOO improves the check by avoiding the double use of data. See Marginal predictive checks in BDA3 p. 152-153. In addition visual predictive checking can be made by plotting LOO predictive intervals and the observations.
Example code for LOO-PIT (I assume visual predictive checking is close to current ppc plot)
library(rstanarm)     # stan_lmer, log_lik, posterior_predict, radon data
library(loo)          # psislw
library(matrixStats)  # logSumExp
data(radon)
y <- radon$log_radon
# Fit the first model
modelA <- stan_lmer(
  log_radon ~ floor + log_uranium + floor:log_uranium + (1 + floor | county),
  data = radon,
  cores = 4,
  iter = 2000,
  chains = 4)
# probability integral transformation (PIT)
# this would be more accurate using conditional cdf's N(y_i|mu^{(s)},sigma^{(s)})
log_likA <- log_lik(modelA, parameter_name = "log_lik")
psisA <- psislw(-log_likA)
predsA <- posterior_predict(modelA)
pitA <- array(0, ncol(predsA))
for (i in seq_len(ncol(predsA))) {
  # sum the smoothed LOO weights of the draws whose prediction is <= y[i]
  pitA[i] <- exp(logSumExp(psisA$lw_smooth[(predsA[, i] <= y[i]), i]))
}
# LOO-PITs should have uniform distribution
par(mfrow = c(1, 2))
qqplot(pitA, runif(10000))
I recently stumbled upon a blog post about rootograms (http://www.fromthebottomoftheheap.net/2016/06/07/rootograms/), which help in assessing model fit of count data models. Given that ppc_dens_overlay is not ideal for visualizing discrete distributions, rootograms may be a nice alternative. What do you think?
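For reference, the numbers behind a hanging rootogram are easy to compute by hand. The sketch below uses simulated counts and a Poisson distribution with the sample mean as a stand-in for a fitted model; in a posterior predictive version the expected frequencies would instead come from yrep.

```r
set.seed(4)
y <- rpois(200, lambda = 3)    # made-up count data
lambda_hat <- mean(y)          # stand-in for a fitted model

counts <- 0:max(y)
observed <- as.vector(table(factor(y, levels = counts)))
expected <- length(y) * dpois(counts, lambda_hat)

# Hanging rootogram: bars hang from sqrt(expected), so a bar that does not
# end at zero marks a count the model over- or under-predicts.
rootogram <- data.frame(
  count = counts,
  top = sqrt(expected),
  bottom = sqrt(expected) - sqrt(observed)
)
head(rootogram)
```

The square-root scale keeps small frequencies visible, which is where count models often misfit.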
This is of course just a matter of style, but personally I prefer / intuitively expect y to be on the y-axis. Feel free to close this issue, if you prefer the current use of the axes.