Giter VIP home page Giter VIP logo

see's Introduction

easystats: An R Framework for Easy Statistical Modeling, Visualization, and Reporting

downloads total status lifecycle

What is easystats?

easystats is a collection of R packages, which aims to provide a unifying and consistent framework to tame, discipline, and harness the scary R statistics and their pesky models.

However, there is not (yet) an unique “easystats” way of doing data analysis. Instead, start with one package and, when you’ll face a new challenge, do check if there is an easystats answer for it in other packages. You will slowly uncover how using them together facilitates your life. And, who knows, you might even end up using them all.

Installation

CRAN_Status_Badge insight status badge R-CMD-check

Type Source Command
Release CRAN install.packages("easystats")
Development r-universe install.packages("easystats", repos = "https://easystats.r-universe.dev")
Development GitHub remotes::install_github("easystats/easystats")

Finally, as easystats sometimes depends on some additional packages for specific functions that are not downloaded by default. If you want to benefit from the full easystats experience without any hiccups, simply run the following:

easystats::install_suggested()

Citation

To cite the package, run the following command:

citation("easystats")
To cite easystats in publications use:

  Lüdecke, Patil, Ben-Shachar, Wiernik, Bacher, Thériault, & Makowski
  (2022). easystats: Framework for Easy Statistical Modeling,
  Visualization, and Reporting. CRAN. Available from
  https://easystats.github.io/easystats/

A BibTeX entry for LaTeX users is

  @Article{,
    title = {easystats: Framework for Easy Statistical Modeling, Visualization, and Reporting},
    author = {Daniel Lüdecke and Mattan S. Ben-Shachar and Indrajeet Patil and Brenton M. Wiernik and Etienne Bacher and Rémi Thériault and Dominique Makowski},
    journal = {CRAN},
    year = {2022},
    note = {R package},
    url = {https://easystats.github.io/easystats/},
  }

If you want to do this only for certain packages in the ecosystem, have a look at this article on how you can do so! https://easystats.github.io/easystats/articles/citation.html

Getting started

Each easystats package has a different scope and purpose. This means your best way to start is to explore and pick the one(s) that you feel might be useful to you. However, as they are built with a “bigger picture” in mind, you will realize that using more of them creates a smooth workflow, as these packages are meant to work together. Ideally, these packages work in unison to cover all aspects of statistical analysis and data visualization.

  • report: 📜 🎉 Automated statistical reporting of objects in R
  • correlation: 🔗 Your all-in-one package to run correlations
  • modelbased: 📈 Estimate effects, group averages and contrasts between groups based on statistical models
  • bayestestR: 👻 Great for beginners or experts of Bayesian statistics
  • effectsize: 🐉 Compute, convert, interpret and work with indices of effect size and standardized parameters
  • see: 🎨 The plotting companion to create beautiful results visualizations
  • parameters: 📊 Obtain a table containing all information about the parameters of your models
  • performance: 💪 Models’ quality and performance metrics (R2, ICC, LOO, AIC, BF, …)
  • insight: 🔮 For developers, a package to help you work with different models and packages
  • datawizard: 🧙 Magic potions to clean and transform your data

Frequently Asked Questions

How is easystats different from the tidyverse?

You’ve probably already heard about the tidyverse, another very popular collection of packages (ggplot, dplyr, tidyr, …) that also makes using R easier. So, should you pick the tidyverse or easystats? Pick both!

Indeed, these two ecosystems have been designed with very different goals in mind. The tidyverse packages are primarily made to create a new R experience, where data manipulation and exploration is intuitive and consistent. On the other hand, easystats focuses more on the final stretch of the analysis: understanding and interpreting your results and reporting them in a manuscript or a report, while following best practices. You can definitely use the easystats functions in a tidyverse workflow!

easystats + tidyverse = ❤️

Can easystats be useful to advanced users and/or developers?

Yes, definitely! easystats is built in terms of modules that are general enough to be used inside other packages. For instance, the insight package is made to easily implement support for post-processing of pretty much all regression model packages under the sun. We use it in all the easystats packages, but it is also used in other non-easystats packages, such as ggstatsplot, modelsummary, ggeffects, and more.

So why not in yours?

Moreover, the easystats packages are very lightweight, with a minimal set of dependencies, which again makes it great if you want to rely on them.

Documentation

Websites

Each easystats package has a dedicated website.

For example, website for parameters is https://easystats.github.io/parameters/.

Blog

In addition to the websites containing documentation for these packages, you can also read posts from easystats blog: https://easystats.github.io/blog/posts/.

Other learning resources

In addition to these websites and blog posts, you can also check out the following presentations and talks to learn more about this ecosystem:

https://easystats.github.io/easystats/articles/resources.html

Dependencies

easystats packages are designed to be lightweight, i.e., they don’t have any third-party hard dependencies, other than base-R packages or other easystats packages! If you develop R packages, this means that you can safely use easystats packages as dependencies in your own packages, without the risk of entering the dependency hell.

library(deepdep)

plot_dependencies("easystats", depth = 2, show_stamp = FALSE)

As we can see, the only exception is the {see} package, which is responsible for plotting and creating figures and relies on {ggplot2}, which does have a substantial number of dependencies.

Usage

Total downloads

Total insight datawizard parameters performance bayestestR effectsize correlation see modelbased report easystats
20,699,223 6,148,653 3,620,905 2,532,597 2,440,979 2,418,739 1,900,285 619,397 515,647 291,109 164,490 46,422

Trend

Contributing

We are happy to receive bug reports, suggestions, questions, and (most of all) contributions to fix problems and add features. Pull Requests for contributions are encouraged.

Here are some simple ways in which you can contribute (in the increasing order of commitment):

  • Read and correct any inconsistencies in the documentation
  • Raise issues about bugs or wanted features
  • Review code
  • Add new functionality

Code of Conduct

Please note that the ‘easystats’ project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

see's People

Contributors

bwiernik avatar dominiquemakowski avatar etiennebacher avatar github-actions[bot] avatar indrajeetpatil avatar jdtrat avatar jeffreyrstevens avatar mattansb avatar pdwaggoner avatar rempsyc avatar strengejacke avatar tolgakurtuluss avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

see's Issues

Axis Label Rotating Option

Per the other issue, here is my simple solution, at least to get the ball rolling on this option. Happy to update address (or abandon) accordingly, depending on your preferences, @DominiqueMakowski and @strengejacke :

## Note the change in the function call as well as in the code at the end of the function:

theme_modern <- function(plot.title.size=15, plot.title.face="plain", plot.title.space=20, legend.position = "right", axis.title.space = 20, legend.title.size = 13, legend.text.size = 12, axis.title.size = 13, axis.title.face = "plain", axis.text.size = 12, tags.size=15, tags.face="bold", rotate=TRUE) {
  
  
  # Remove legend title if necessary
  if (is.null(plot.title.size)) {
    plot.title.size <- element_text(size = plot.title.size, face = plot.title.face, margin=margin(0,0,plot.title.space,0))
  } else if (plot.title.size == "none") {
    plot.title.size <- element_blank()
  } else {
    plot.title.size <- element_text(size = plot.title.size, face = plot.title.face, margin=margin(0,0,plot.title.space,0))
  }
  
  # Remove legend title if necessary
  if (is.null(legend.title.size)) {
    legend.title.size <- element_text(size = legend.title.size)
  } else if (legend.title.size == "none") {
    legend.title.size <- element_blank()
  } else {
    legend.title.size <- element_text(size = legend.title.size)
  }
  
  # Remove axis title if necessary
  if (is.null(axis.title.size)) {
    axis.title.size <- element_text(size = axis.title.size, face = axis.title.face)
  } else if (axis.title.size == "none") {
    axis.title.size <- element_blank()
  } else {
    axis.title.size <- element_text(size = axis.title.size, face = axis.title.face)
  }
  
  # Remove axis text if necessary
  if (is.null(axis.text.size)) {
    axis.text.size <- element_text(size = axis.text.size)
  } else if (axis.text.size == "none") {
    axis.text.size <- element_blank()
  } else {
    axis.text.size <- element_text(size = axis.text.size)
  }
  
  # rotate x axis labels or not?
  if (rotate == TRUE) {
    axis.text.x <- element_text(angle=45, vjust=0.5, size=9)
  } else {
    axis.text.x <- element_text(angle=0, vjust=0.5, size=9)
  }
  
  theme_classic() +
    theme(
      plot.title = plot.title.size,
      legend.position = legend.position,
      legend.text = element_text(size = legend.text.size),
      legend.title = legend.title.size,
      legend.key = element_blank(),
      legend.spacing.x = unit(2, "pt"),
      axis.title = axis.title.size,
      axis.title.y = element_text(margin = margin(t = 0, r = axis.title.space, b = 0, l = 0)),
      axis.title.x = element_text(margin = margin(t = axis.title.space, r = 0, b = 0, l = 0)),
      axis.text = axis.text.size,
      axis.ticks = element_blank(),
      plot.tag = element_text(size = tags.size, face = tags.face),
      strip.background = element_blank(),
      strip.text = element_text(face="bold"),
      axis.text.x = axis.text.x
    )
}


## WORKING EXAMPLE
library(ggplot2)

p1.rotated.default <- ggplot(iris, aes(x = Species, y = Sepal.Length, fill = Species)) + 
  geom_boxplot() + theme_modern() 
p1.rotated.default

p2.not.rotated <- ggplot(iris, aes(x = Species, y = Sepal.Length, fill = Species)) + 
  geom_boxplot() + theme_modern(rotate=FALSE) 
p2.not.rotated

Use of within

@mattansb @DominiqueMakowski

Currently, plot.see_bayesfactor_models() uses within() to create variables. The help-file, however, states:

For programming however, i.e., in one's functions, more care is needed, and typically one should refrain from using with(), as, e.g., variables in data may accidentally override local variables, see the reference.

Should we re-write the code in a less beatufiul, non-piped way to avoid with() / within()?

heatmap-plot

Hi @strengejacke - apologies for my delay here. Yeah, I have some MWE for each. First, for the ePCP (expected proportion correctly predicted), we are looking for a higher value, bounded between 0 and 1, where 1 = a model that perfectly predicted all actual observations. Here is some sample code using mtcars:

library(OOmisc) # for ePCP
library(ggplot2)

logitmod <- glm(vs ~ mpg, data = mtcars); summary(logitmod)
logitmod2 <- glm(vs ~ mpg + cyl, data = mtcars); summary(logitmod2)

y <- mtcars$vs
pred1 <- predict(logitmod, type="response")
pred2 <- predict(logitmod2, type="response")

# ePCP, expected proportion of correct prediction
epcp1 <- ePCP(pred1, y, alpha = 0.05) # define observed values and obtain predicted values
epcp2 <- ePCP(pred2, y, alpha = 0.05)

epcpdata <- data.frame(rbind(epcp1, epcp2))
epcpdata$model <- c(1,2)
epcpdata$count <- factor(c(1,2), label = c("Model1","Model2"))

# Now the plot
ggplot(epcpdata, aes(x=model,y=ePCP,colour=count)) +
  geom_bar(position=position_dodge(), stat="identity", fill = "darkgray") +
  geom_errorbar(aes(ymin=lower, ymax=upper),
                width=.2,
                position=position_dodge(.9)) +
  labs(x="Model Specification",
       y="Expected Proportion of Correct Prediction",
       title = "Comparing ePCP between Model 1 and Model 2",
       colour="Model Specification") +
  theme_bw()

And next, for the heatmapFit, which is a little less flexible, the goal is to compare model predictions with smoothed empirical predictions. The 45-degree line references a perfect fit. Any deviance from that line suggests a loss in goodness of fit. The “p-value” legend shows if any deviance is statistically significant (dark color means statistical significance). Using the same models as an example from mtcars, it would look something like:

## heatmapfit
library(heatmapFit) # for heat maps

# quick rescale to bound predictions between 0 and 1
range <- function(x){
  (x-min(x))/(max(x)-min(x))
  }

pred1 <- range(pred1)
pred2 <- range(pred2)

heatmap.fit(y, pred1, reps = 1000, legend = TRUE)
heatmap.fit(y, pred2, reps = 1000, legend = TRUE)

** You can copy and paste this code to see the figures. Pretty clean, efficient renderings of model fit.

** And of course, ROC curves are always great, simple model fit checks. Let me know thoughts if you have them. Hope this helps a bit!

Originally posted by @pdwaggoner in easystats/performance#38 (comment)

scope

Based on this discussion.

@strengejacke feel free to change the name etc., I've decided to push the creation of this package to add in a few things that I need for a paper I am writing. Hope it's ok with you 😉

pairwise contrasts visualisation: an idea (lighthouse plot)

Context

Plotting pairwise contrasts is tricky because the "significance" of the contrasts is often quite poorly reflected by the CIs of the estimated means.

Russ (from emmeans) suggested the following:

  • The way SAS does it is to plot each pair of means in a 2D plot, along with the identity line and CIs (scaled by sqrt(1/2)) for the differences of the means going perpendicular to the identity line, centered at each point. A difference is statistically significant if the interval does not cross the identity line. I programmed that up, but I find it very hard to label the plot effectively and don't care much for it.
  • You can try emmeans::plot(emm, comparisons = TRUE) where emm is the result of an emmeans() call. This adds red arrows to the plot which indicate significant differences when the arrows don't overlap. This uses an ad hoc algorithm, and it is not guaranteed to be possible, especially when the SEs of the differences vary widely. Feedback from users has not been all that enthusiastic, and I'm not sure I like it so much either, but to me, when it works, it's better than the SAS approach.
  • If you do emmeans::CLD(emm), it adds a column to the summary of emm showing a compact letter display. Two means that share the same letter are significantly different. This works, but it is very black-and-white: with alpha = .05, P values slightly above or below .05 make a difference, but there's no difference between a P value of .051 and one of .987, or between .049 and .00001 [note that recently Russ decided not to support this method anymore]

Here's how emmeans is currently doing it:

image

The blue bars are confidence intervals for the EMMs, and the red arrows are for the comparisons among them. If an arrow from one mean overlaps an arrow from another group, the difference is not significant, based on the adjust setting (which defaults to "tukey"). (Note: Don’t ever use confidence intervals for EMMs to perform comparisons; they can be very misleading.)

Proposition

However, I am not entirely satisfied with these ways of representing contrasts, which I found quite unintuitive or complex. I feel like there is a gap in the available ways of representing estimated contrasts. As I am often using them, I tried to imagine different ways of representing the differences between factor levels.

I currently have an idea in mind, but I can't even say if it's a good solution, as I have troubles making the plot I have in mind in the first place. Get ready for some paint 🎨

The idea would be to represent at the same time means and intervals (via pointrange). Then, each pairwise contrast would be shown as a "ray of light" (a triangle basically) starting from the lefthand level, and spanning over the CI of the difference + the mean of the left level.

For the contrast between levels A and B, there would be a triangle with one corner located at the mean of A, and it would "project" over the interval of the CI of the difference at B.

image

In dark red are the estimated means and their CIs and in light green would be the CI_high and CI_low of the difference (added to the mean of A). The dark green line would be the "mean" difference.

This could be interpreted as: for one this given value of A, B would be in the green area. I.e., independently of the variability of the means, B would be always (in this case) bigger than A.

For multiple levels, I have the means:

library(tidyverse)
library(rstanarm)
library(estimate)
library(see)

model <- stan_glm(Sepal.Width ~ Species, data=iris)
contrasts <- estimate_contrasts(model)
means <- estimate_means(model)

means %>%
  ggplot() +
  geom_pointrange(data=means, aes(x=Species, y=Median, ymin=CI_low, ymax=CI_high))

Created on 2019-04-17 by the reprex package (v0.2.1)

But then I have no clear idea how to approach something like this:

image

(values are arbitrary)

I don't if the best way to achieve this is through geom_polygon or geom_ribbon.

Does this idea makes any sense?

Improve lighthouse plot

  • Add median/mean difference
  • Enable the plotting of individual draws as segments instead of CIs

pizza (and other BF) plots

In bayestestR I mentioned Wagenmakers' pizza analogy for the BFs.

It's not a secret that my favourite dish is pizza.

I thought there a package for everything in R. Thus I looked for some pizza themes for pie charts...

didn't find one

Then I thought, this is something for see!

@mattansb what do you say about plot methods for the BFs objects that would return pizza pies (pie charts with a theme_pizza)?? Wouldn't that be the coolest thing in the world?

CRAN or too soon?

Even though this is currently a relatively small and preliminary package, maybe we could think about releasing it on CRAN. Thus, we could start using it in vignettes etc. What do you think?

Plotting data from `emmGrid`

  • ci
  • equivalence_test
  • estimate_density
  • hdi
  • rope
  • p_direction

This would also mean that the plotting functions in see need to be updated to use as.data.frame(as.matrix(as.mcmc.emmGrid(x, names = FALSE))), right?

From here

Finalize draft for plot.point_estimate()

I like this idea. Maybe something like: which_estimates(c("All", "Median", "Mean", "MAP"))

We could derive this from bayestestR::point_estimate(), where this argement is named centrality. We could then save the option as attribute, and reading that attribute in the plot-method.

(0502ac0)

theme_solarized()

Something like ggthemes::theme_solarized(light = FALSE), but with a more contrastful color for axis labels, and maybe more beatiful legend?

Here's the ggthemes-theme:

library(easystats)
model <- rstanarm::stan_glm(mpg ~ wt + gear + cyl + disp, data = mtcars)
result <- equivalence_test(model)

plot(result) + ggthemes::theme_solarized(light = F) + scale_fill_material()

Created on 2019-05-16 by the reprex package (v0.2.1)

@DominiqueMakowski what would you say? Can you quickly copy/re-use the code from the dark-theme?

ncol, ncols, n_cols or n_columns?

Currently in plots() we use ncol, but in plot.bayestestR_hdi we use n_columns. We might wanna harmonize this. For what it's worth, I think ncol and nrow are the most commonly used in R.

A JDSSV Paper?

Hi @DominiqueMakowski and @strengejacke - There was a brand new journal just announced called "Journal of Data Science, Statistics, and Visualisation". I found out about it through the ASA listserv. They announced a call for the first round of papers to be published. There are no APCs and its open source! What do you guys think about preparing and submitting a paper covering an introduction and implementation of see? It could be a really cool thing to get easystats in on the ground level of a new journal related to what we all do.

Here is the homepage: https://jdssv.org/index.php/jdssv

Let me know if you're on board. If so, I would be happy to take the lead on the paper and start drafting. If not, no worries.

plotting functions not working

Describe the bug
can't reproduce many of the plots from: https://easystats.github.io/see/articles/bayestestR.html, eg, priors = TRUE doesn't work; p_significance not working either, yields: Error in plot.window(...) : need finite 'xlim' values

To Reproduce
Steps to reproduce the behaviour:

  1. Go to '...'
  2. Click on '....'
  3. Scroll down to '....'
  4. See error

Expected behaviour
I expected to be able to reproduce the [see?] plots but cannot

Screenshots
Screen Shot 2019-11-05 at 2 24 31 PM

Specifiations (please complete the following information):
everything is 100% up to date

geom_point2 and shapes

Hi,
thanks for the package, I especially like the color layout and plot themes.
However, I ran into problems using geom_point2

When I run this code


library(ggplot2)
library(see)

p1 <- ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length, color = Species)) + 
  geom_point() + theme_modern()

p2 <- ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length, color = Species)) + 
  geom_point2() + theme_modern()

p3 <- ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length, color = Species, shape = Species)) + 
  geom_point() + theme_modern()

p4 <- ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length, color = Species, shape = Species)) + 
  geom_point2() + theme_modern()


plots(p1, p2, p3, p4, ncol = 2, tags = T)

it produces the following plot

Rplot

You can see that plot B produces more diamond like points, compared with A.
Also, geom_point2 seems to not support shapes (plot D). Regulard geom_point works fine (plot C).


> sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17763)

Matrix products: default

locale:
[1] LC_COLLATE=English_Germany.1252  LC_CTYPE=English_Germany.1252    LC_MONETARY=English_Germany.1252 LC_NUMERIC=C                     LC_TIME=English_Germany.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] see_0.1.0     ggplot2_3.1.1

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.1       rstudioapi_0.10  magrittr_1.5     tidyselect_0.2.5 insight_0.3.0    munsell_0.5.0    colorspace_1.4-1 R6_2.4.0         rlang_0.3.4      plyr_1.8.4       dplyr_0.8.1     
[12] tools_3.5.2      grid_3.5.2       gtable_0.3.0     withr_2.1.2      digest_0.6.19    lazyeval_0.2.2   assertthat_0.2.1 tibble_2.1.1     crayon_1.3.4     gridExtra_2.3    purrr_0.3.2     
[23] ggridges_0.5.1   glue_1.3.1       labeling_0.3     compiler_3.5.2   pillar_1.4.0     scales_1.0.0     pkgconfig_2.0.2

Equivalence test plot: tails distortion

library(see)
library(bayestestR)
library(rstanarm)

junk <- capture.output(model <- rstanarm::stan_glm(mpg ~ wt + gear + cyl + disp, data = mtcars))
plot(equivalence_test(model, ci=0.5))
#> Warning: Possible multicollinearity between disp and wt (r = 0.63), disp
#> and cyl (r = 0.71). This might lead to inappropriate results. See 'Details'
#> in '?equivalence_test'.
#> Picking joint bandwidth of 0.0467
#> Warning: Removed 7996 rows containing non-finite values
#> (stat_density_ridges).

Created on 2019-05-17 by the reprex package (v0.2.1)

Currently, if I understand, we start by keeping the samples of the posterior within the CI. Then, these remaining samples are passed to density_ridges which computes the density and plots it.

Unfortunately, the density estimation of a subsample done by density_ridges gives a distorted vision (especially in the tails, which is usually of interest) of the actual distribution, even compared to a regular geom_density (which is already distorted):

library(see)
library(bayestestR)
library(rstanarm)
library(ggplot2)
library(ggridges)

df <- data.frame(x = c(rnorm(10000), rnorm(10000)), 
                y = rep(c("a", "b"), each=10000))

df <- df[df$x < 1.5 & df$x > -1.5, ] # Let's say this is my HDI bounds
       

ggplot(df, aes(x = x, y=y)) +
  geom_density_ridges()
#> Picking joint bandwidth of 0.109

ggplot(df[df$y == "a",], aes(x = x)) +
  geom_density()

Created on 2019-05-17 by the reprex package (v0.2.1)

One of the ways to address it is to do the density estimation beforehand, then select the HDI section, and pass that to geom_ridgeline:

library(see)
library(bayestestR)
library(rstanarm)
library(ggplot2)
library(ggridges)al
library(dplyr)


df <- data.frame(x = c(rnorm(10000), rnorm(10000)), 
                 y = rep(c("a", "b"), each=10000))

df2 <- rbind(
  bayestestR::estimate_density(df[df$y == "a", "x"]) %>% 
    dplyr::mutate(group="a"),
  bayestestR::estimate_density(df[df$y == "b", "x"]) %>% 
    dplyr::mutate(group="b")
)
df2 <- df2[df2$x < 1.5 & df2$x > -1.5, ] # Let's say this is my HDI bounds

ggplot(df2, aes(x = x, y=group, height=y)) +
  geom_ridgeline()

Created on 2019-05-17 by the reprex package (v0.2.1)

Plot Cooks distance

This is some great stuff - I really like the new check_* family. In this vein, I wrote a function to check for influential observations via cook's distance. Its in a similar flavor as your measures @strengejacke , and also includes options to print names of outliers, drop outliers (and write a new df), and plot cook's distance.

I would welcome any thoughts here or whether this is even a desired "check". Thanks all!

check_outliers <- function(model, print_outliers = TRUE, drop_outliers = FALSE, plot = TRUE) {
  data <- insight::get_data(model)
  n <- nrow(data)
  
  #calculate cook's distance
  cook <- stats::cooks.distance(model)
  o <- length(as.numeric(names(cook)[(cook > (4/n))]))
  outliers <- ifelse(o >= 1, 1, 0)
  
  # check for outliers via cook's distance
  if (outliers >= 1) {
    insight::print_color(sprintf("Warning: Outliers detected.\n"), 'red')
  } else insight::print_color(sprintf("OK: No outliers detected.\n"), 'green')
  
  #option to print names of outliers
  if (print_outliers == TRUE) {
    print((names(cook)[(cook > (4 / n))]))
  }
  
  # option to drop outliers and write a new data frame (- outliers)
  if (drop_outliers == TRUE) {
    data_dropped <- data[!(cook > (4 / n)), ]
    return(data_dropped)
  }
  
  # option to plot cook's distance values, with line denoting likely outliers
  if (plot == TRUE){
    plot(cook, pch = 19, 
         main = "Check for Influential Observations\n(Cook's Distance)",
         ylab = "Cook's Distance Value")
    abline(h = (4 / n), 
           col = "red",
           cex = 4)
    text(x = 1:length(cook) + 1, 
         y = cook, 
         labels = ifelse(cook > (4 / n), 
                         names(cook), ""), 
         col = "red")
  }
}

# example
mt1 <- mtcars[, c(1,3)]  # select only mpg and disp (continuous)
oo <- data.frame(mpg = c(37, 40), # create some fake outliers
                 disp = c(300, 400))
mt2 <- rbind(mt1, oo)  # attach outliers to main df
model <- lm(disp ~ mpg, data = mt2) # fit model with outliers

# call the function
check_outliers(model, print_outliers = TRUE, drop_outliers = FALSE, plot = TRUE)
#> Warning: Outliers detected.
#> [1] "1" "2"

Created on 2019-06-03 by the reprex package (v0.3.0)

Originally posted by @pdwaggoner in easystats/performance#31 (comment)

Vignette / docs / examples

I think some of the plot-functions can be described in more details. Even me as package maintainer is struggling with some of the arguments ;-) especially for plot.bayesfactor_models().

Improve plots()

  • Remove annoying tableGrobs textual output
  • Improve alignment (?)

DOI

What about creating a DOI for this so it can be properly cited in bayestestR's paper (see here)?

plot point_estimate

library(bayestestR)
library(see)

posterior <- distribution_beta(100, 0.4, 0.2)
plot(point_estimate(posterior))
#> Error in unit(rep(1, nrow), "null"): 'x' and 'units' must have length > 0

Created on 2020-01-03 by the reprex package (v0.3.0)

😢

plot for check_model()

what about this? Any suggestion for different themes / color scales? You need latest performance and see from GitHub...

library(lme4)
#> Loading required package: Matrix
library(see)
#> Registered S3 methods overwritten by 'ggplot2':
#>   method         from 
#>   [.quosures     rlang
#>   c.quosures     rlang
#>   print.quosures rlang
library(performance)

data(sleepstudy)
sleepstudy$mygrp <- sample(1:5, size = 180, replace = TRUE)
sleepstudy$mysubgrp <- NA
for (i in 1:5) {
  filter_group <- sleepstudy$mygrp == i
  sleepstudy$mysubgrp[filter_group] <-
    sample(1:30, size = sum(filter_group), replace = TRUE)
}

sleepstudy$x1 <- rnorm(nrow(sleepstudy))
sleepstudy$x2 <- runif(nrow(sleepstudy))

m <- lmer(
  Reaction ~ Days + x1 + x2 + (1 | mygrp / mysubgrp) + (1 | Subject),
  data = sleepstudy
)
#> boundary (singular) fit: see ?isSingular

check_model(m)

Created on 2019-06-03 by the reprex package (v0.3.0)

plots() and tags-argument

I would say tags and tags_labels is redundant. Why not directly setting labels via tags, and if tags = NULL. no tags are shown.

plot parameters of EFA fails with labels

It works like a charm without labels but fails with :(

library(see)
params <- parameters::model_parameters(psych::fa(mtcars, nfactors = 3, rotate = "none"))
plot(params)

params <- parameters::model_parameters(psych::fa(mtcars, nfactors = 3, rotate = "none"), labels = as.character(1:ncol(mtcars)))
plot(params)
#> Error in abs(.data$y): non-numeric argument to mathematical function

Created on 2019-11-18 by the reprex package (v0.3.0)

Option to add priors to posterior plots

Now that priors can easily be extracted as distributions using simulate_priors, I wonder if it would be tough to give the option to add them in the posterior plots (so you can see for all parameters, the prior and the posterior)...

Axis labels in README

@DominiqueMakowski and @strengejacke the X axis labels in figures 1 and 2 under Multiple Plots in the README look a bit off. Not sure where these are stored (presumably they are png or eps files...). Could be worth a look to correct for interested users.

plot fails for rope/hdi?

Not sure why this now happens, but when I run

library(rstanarm)
library(bayestestR)

model <- rstanarm::stan_glm(Sepal.Length ~ Petal.Width * Species, data = iris)
result <- rope(model, ci = c(0.9, 0.95))

plot(result, data = model, rope_color = "red") +
  theme_modern()

I get an error:

Error: Discrete value supplied to continuous scale

The y-value for the plot-data is not numeric, but "Posterior". I think the source of the error is somewhere here:
https://github.com/easystats/see/blob/master/R/plot.hdi.R#L10

@DominiqueMakowski any ideas?

check_model() does not work with updated check_outlier() function

library(performance)
library(see)

# select only mpg and disp (continuous)
mt1 <- mtcars[, c(1, 3, 4)]
# create some fake outliers and attach outliers to main df
mt2 <- rbind(mt1, data.frame(mpg = c(37, 40), disp = c(300, 400), hp = c(110, 120)))
# fit model with outliers
model <- lm(disp ~ mpg + hp, data = mt2)
check_model(model)

how_to_plot methods?

With the current structure, which makes the plotting function per se quite small and transparent (often consisting of just the ggplot chunk, as the bulk of the work is done in data_plot), it should be rather fairly easy to add a how_to_plot() method, that would actually show (print) the plot code. Then the user could copy this code and tweak it to his needs.

How does that sound?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.