easystats / performance Goto Github PK

View Code? Open in Web Editor NEW

948.0 25.0 88.0 645.99 MB

:muscle: Models' quality and performance metrics (R2, ICC, LOO, AIC, BF, ...)

Home Page: https://easystats.github.io/performance/

License: GNU General Public License v3.0

R 95.61% TeX 4.39%

r2 aic performance models loo r easystats mixed-models statistics hacktoberfest

performance's Introduction

performance

Test if your model is a good model!

A crucial aspect when building regression models is to evaluate the quality of modelfit. It is important to investigate how well models fit to the data and which fit indices to report. Functions to create diagnostic plots or to compute fit measures do exist, however, mostly spread over different packages. There is no unique and consistent approach to assess the model quality for different kind of models.

The primary goal of the performance package is to fill this gap and to provide utilities for computing indices of model quality and goodness of fit. These include measures like r-squared (R2), root mean squared error (RMSE) or intraclass correlation coefficient (ICC) , but also functions to check (mixed) models for overdispersion, zero-inflation, convergence or singularity.

Installation

The performance package is available on CRAN, while its latest development version is available on R-universe (from rOpenSci).

Type	Source	Command
Release	CRAN	`install.packages("performance")`
Development	R-universe	`install.packages("performance", repos = "https://easystats.r-universe.dev")`

Once you have downloaded the package, you can then load it using:

library("performance")

Tip

Instead of library(performance), use library(easystats). This will make all features of the easystats-ecosystem available.

To stay updated, use easystats::install_latest().

Citation

To cite performance in publications use:

citation("performance")
#> To cite package 'performance' in publications use:
#> 
#>   Lüdecke et al., (2021). performance: An R Package for Assessment, Comparison and
#>   Testing of Statistical Models. Journal of Open Source Software, 6(60), 3139.
#>   https://doi.org/10.21105/joss.03139
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Article{,
#>     title = {{performance}: An {R} Package for Assessment, Comparison and Testing of Statistical Models},
#>     author = {Daniel Lüdecke and Mattan S. Ben-Shachar and Indrajeet Patil and Philip Waggoner and Dominique Makowski},
#>     year = {2021},
#>     journal = {Journal of Open Source Software},
#>     volume = {6},
#>     number = {60},
#>     pages = {3139},
#>     doi = {10.21105/joss.03139},
#>   }

Documentation

There is a nice introduction into the package on youtube.

The performance workflow

Assessing model quality

R-squared

performance has a generic r2() function, which computes the r-squared for many different models, including mixed effects and Bayesian regression models.

r2() returns a list containing values related to the “most appropriate” r-squared for the given model.

model <- lm(mpg ~ wt + cyl, data = mtcars)
r2(model)
#> # R2 for Linear Regression
#>        R2: 0.830
#>   adj. R2: 0.819

model <- glm(am ~ wt + cyl, data = mtcars, family = binomial)
r2(model)
#> # R2 for Logistic Regression
#>   Tjur's R2: 0.705

library(MASS)
data(housing)
model <- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing)
r2(model)
#>   Nagelkerke's R2: 0.108

The different R-squared measures can also be accessed directly via functions like r2_bayes(), r2_coxsnell() or r2_nagelkerke() (see a full list of functions here).

For mixed models, the conditional and marginal R-squared are returned. The marginal R-squared considers only the variance of the fixed effects and indicates how much of the model’s variance is explained by the fixed effects part only. The conditional R-squared takes both the fixed and random effects into account and indicates how much of the model’s variance is explained by the “complete” model.

For frequentist mixed models, r2() (resp. r2_nakagawa()) computes the mean random effect variances, thus r2() is also appropriate for mixed models with more complex random effects structures, like random slopes or nested random effects (Johnson 2014; Nakagawa, Johnson, and Schielzeth 2017).

set.seed(123)
library(rstanarm)

model <- stan_glmer(
  Petal.Length ~ Petal.Width + (1 | Species),
  data = iris,
  cores = 4
)

r2(model)
#> # Bayesian R2 with Compatibility Interval
#> 
#>   Conditional R2: 0.953 (95% CI [0.941, 0.963])
#>      Marginal R2: 0.823 (95% CI [0.710, 0.898])

library(lme4)
model <- lmer(Reaction ~ Days + (1 + Days | Subject), data = sleepstudy)
r2(model)
#> # R2 for Mixed Models
#> 
#>   Conditional R2: 0.799
#>      Marginal R2: 0.279

Intraclass Correlation Coefficient (ICC)

Similar to R-squared, the ICC provides information on the explained variance and can be interpreted as “the proportion of the variance explained by the grouping structure in the population” (Hox 2010).

icc() calculates the ICC for various mixed model objects, including stanreg models.

library(lme4)
model <- lmer(Reaction ~ Days + (1 + Days | Subject), data = sleepstudy)
icc(model)
#> # Intraclass Correlation Coefficient
#> 
#>     Adjusted ICC: 0.722
#>   Unadjusted ICC: 0.521

…and models of class brmsfit.

library(brms)
set.seed(123)
model <- brm(mpg ~ wt + (1 | cyl) + (1 + wt | gear), data = mtcars)

icc(model)
#> # Intraclass Correlation Coefficient
#> 
#>     Adjusted ICC: 0.930
#>   Unadjusted ICC: 0.771

Model diagnostics

Check for overdispersion

Overdispersion occurs when the observed variance in the data is higher than the expected variance from the model assumption (for Poisson, variance roughly equals the mean of an outcome). check_overdispersion() checks if a count model (including mixed models) is overdispersed or not.

library(glmmTMB)
data(Salamanders)
model <- glm(count ~ spp + mined, family = poisson, data = Salamanders)
check_overdispersion(model)
#> # Overdispersion test
#> 
#>        dispersion ratio =    2.946
#>   Pearson's Chi-Squared = 1873.710
#>                 p-value =  < 0.001

Overdispersion can be fixed by either modelling the dispersion parameter (not possible with all packages), or by choosing a different distributional family (like Quasi-Poisson, or negative binomial, see (Gelman and Hill 2007)).

Check for zero-inflation

Zero-inflation (in (Quasi-)Poisson models) is indicated when the amount of observed zeros is larger than the amount of predicted zeros, so the model is underfitting zeros. In such cases, it is recommended to use negative binomial or zero-inflated models.

Use check_zeroinflation() to check if zero-inflation is present in the fitted model.

model <- glm(count ~ spp + mined, family = poisson, data = Salamanders)
check_zeroinflation(model)
#> # Check for zero-inflation
#> 
#>    Observed zeros: 387
#>   Predicted zeros: 298
#>             Ratio: 0.77

Check for singular model fits

A “singular” model fit means that some dimensions of the variance-covariance matrix have been estimated as exactly zero. This often occurs for mixed models with overly complex random effects structures.

check_singularity() checks mixed models (of class lme, merMod, glmmTMB or MixMod) for singularity, and returns TRUE if the model fit is singular.

library(lme4)
data(sleepstudy)

# prepare data
set.seed(123)
sleepstudy$mygrp <- sample(1:5, size = 180, replace = TRUE)
sleepstudy$mysubgrp <- NA
for (i in 1:5) {
  filter_group <- sleepstudy$mygrp == i
  sleepstudy$mysubgrp[filter_group] <-
    sample(1:30, size = sum(filter_group), replace = TRUE)
}

# fit strange model
model <- lmer(
  Reaction ~ Days + (1 | mygrp / mysubgrp) + (1 | Subject),
  data = sleepstudy
)

check_singularity(model)
#> [1] TRUE

Remedies to cure issues with singular fits can be found here.

Check for heteroskedasticity

Linear models assume constant error variance (homoskedasticity).

The check_heteroscedasticity() functions assess if this assumption has been violated:

data(cars)
model <- lm(dist ~ speed, data = cars)

check_heteroscedasticity(model)
#> Warning: Heteroscedasticity (non-constant error variance) detected (p = 0.031).

Comprehensive visualization of model checks

performance provides many functions to check model assumptions, like check_collinearity(), check_normality() or check_heteroscedasticity(). To get a comprehensive check, use check_model().

# defining a model
model <- lm(mpg ~ wt + am + gear + vs * cyl, data = mtcars)

# checking model assumptions
check_model(model)

Model performance summaries

model_performance() computes indices of model performance for regression models. Depending on the model object, typical indices might be r-squared, AIC, BIC, RMSE, ICC or LOOIC.

Linear model

m1 <- lm(mpg ~ wt + cyl, data = mtcars)
model_performance(m1)
#> # Indices of model performance
#> 
#> AIC     |    AICc |     BIC |    R2 | R2 (adj.) |  RMSE | Sigma
#> ---------------------------------------------------------------
#> 156.010 | 157.492 | 161.873 | 0.830 |     0.819 | 2.444 | 2.568

Logistic regression

m2 <- glm(vs ~ wt + mpg, data = mtcars, family = "binomial")
model_performance(m2)
#> # Indices of model performance
#> 
#> AIC    |   AICc |    BIC | Tjur's R2 |  RMSE | Sigma | Log_loss | Score_log | Score_spherical |   PCP
#> -----------------------------------------------------------------------------------------------------
#> 31.298 | 32.155 | 35.695 |     0.478 | 0.359 | 1.000 |    0.395 |   -14.903 |           0.095 | 0.743

Linear mixed model

library(lme4)
m3 <- lmer(Reaction ~ Days + (1 + Days | Subject), data = sleepstudy)
model_performance(m3)
#> # Indices of model performance
#> 
#> AIC      |     AICc |      BIC | R2 (cond.) | R2 (marg.) |   ICC |   RMSE |  Sigma
#> ----------------------------------------------------------------------------------
#> 1755.628 | 1756.114 | 1774.786 |      0.799 |      0.279 | 0.722 | 23.438 | 25.592

Models comparison

The compare_performance() function can be used to compare the performance and quality of several models (including models of different types).

counts <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome <- gl(3, 1, 9)
treatment <- gl(3, 3)
m4 <- glm(counts ~ outcome + treatment, family = poisson())

compare_performance(m1, m2, m3, m4, verbose = FALSE)
#> # Comparison of Model Performance Indices
#> 
#> Name |   Model |  AIC (weights) | AICc (weights) |  BIC (weights) |   RMSE |  Sigma | Score_log | Score_spherical |    R2 | R2 (adj.) | Tjur's R2 | Log_loss |   PCP | R2 (cond.) | R2 (marg.) |   ICC | Nagelkerke's R2
#> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
#> m1   |      lm |  156.0 (<.001) |  157.5 (<.001) |  161.9 (<.001) |  2.444 |  2.568 |           |                 | 0.830 |     0.819 |           |          |       |            |            |       |                
#> m2   |     glm |   31.3 (>.999) |   32.2 (>.999) |   35.7 (>.999) |  0.359 |  1.000 |   -14.903 |           0.095 |       |           |     0.478 |    0.395 | 0.743 |            |            |       |                
#> m3   | lmerMod | 1764.0 (<.001) | 1764.5 (<.001) | 1783.1 (<.001) | 23.438 | 25.592 |           |                 |       |           |           |          |       |      0.799 |      0.279 | 0.722 |                
#> m4   |     glm |   56.8 (<.001) |   76.8 (<.001) |   57.7 (<.001) |  3.043 |  1.000 |    -2.598 |           0.324 |       |           |           |          |       |            |            |       |           0.657

General index of model performance

One can also easily compute and a composite index of model performance and sort the models from the best one to the worse.

compare_performance(m1, m2, m3, m4, rank = TRUE, verbose = FALSE)
#> # Comparison of Model Performance Indices
#> 
#> Name |   Model |   RMSE |  Sigma | AIC weights | AICc weights | BIC weights | Performance-Score
#> -----------------------------------------------------------------------------------------------
#> m2   |     glm |  0.359 |  1.000 |       1.000 |        1.000 |       1.000 |           100.00%
#> m4   |     glm |  3.043 |  1.000 |    2.96e-06 |     2.06e-10 |    1.63e-05 |            37.67%
#> m1   |      lm |  2.444 |  2.568 |    8.30e-28 |     6.07e-28 |    3.99e-28 |            36.92%
#> m3   | lmerMod | 23.438 | 25.592 |    0.00e+00 |     0.00e+00 |    0.00e+00 |             0.00%

Visualisation of indices of models’ performance

Finally, we provide convenient visualisation (the see package must be installed).

plot(compare_performance(m1, m2, m4, rank = TRUE, verbose = FALSE))

Testing models

test_performance() (and test_bf, its Bayesian sister) carries out the most relevant and appropriate tests based on the input (for instance, whether the models are nested or not).

set.seed(123)
data(iris)

lm1 <- lm(Sepal.Length ~ Species, data = iris)
lm2 <- lm(Sepal.Length ~ Species + Petal.Length, data = iris)
lm3 <- lm(Sepal.Length ~ Species * Sepal.Width, data = iris)
lm4 <- lm(Sepal.Length ~ Species * Sepal.Width + Petal.Length + Petal.Width, data = iris)

test_performance(lm1, lm2, lm3, lm4)
#> Name | Model |     BF | Omega2 | p (Omega2) |    LR | p (LR)
#> ------------------------------------------------------------
#> lm1  |    lm |        |        |            |       |       
#> lm2  |    lm | > 1000 |   0.69 |     < .001 | -6.25 | < .001
#> lm3  |    lm | > 1000 |   0.36 |     < .001 | -3.44 | < .001
#> lm4  |    lm | > 1000 |   0.73 |     < .001 | -7.77 | < .001
#> Each model is compared to lm1.

test_bf(lm1, lm2, lm3, lm4)
#> Bayes Factors for Model Comparison
#> 
#>       Model                                                    BF
#> [lm2] Species + Petal.Length                             3.45e+26
#> [lm3] Species * Sepal.Width                              4.69e+07
#> [lm4] Species * Sepal.Width + Petal.Length + Petal.Width 7.58e+29
#> 
#> * Against Denominator: [lm1] Species
#> *   Bayes Factor Type: BIC approximation

Plotting Functions

Plotting functions are available through the see package.

Code of Conduct

Please note that the performance project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Contributing

We are happy to receive bug reports, suggestions, questions, and (most of all) contributions to fix problems and add features.

Please follow contributing guidelines mentioned here:

https://easystats.github.io/performance/CONTRIBUTING.html

References

Gelman, Andrew, and Jennifer Hill. 2007. Data Analysis Using Regression and Multilevel/Hierarchical Models. Analytical Methods for Social Research. Cambridge ; New York: Cambridge University Press.

Hox, J. J. 2010. Multilevel Analysis: Techniques and Applications. 2nd ed. Quantitative Methodology Series. New York: Routledge.

Johnson, Paul C. D. 2014. “Extension of Nakagawa & Schielzeth’s R2 GLMM to Random Slopes Models.” Edited by Robert B. O’Hara. Methods in Ecology and Evolution 5 (9): 944–46.

Nakagawa, Shinichi, Paul C. D. Johnson, and Holger Schielzeth. 2017. “The Coefficient of Determination R2 and Intra-Class Correlation Coefficient from Generalized Linear Mixed-Effects Models Revisited and Expanded.” Journal of The Royal Society Interface 14 (134): 20170213.

performance's People

Stargazers

Watchers

Forkers

nemochina2008 sbaram1 guhjy gravitytrope pratikbarjatya gentheo letaylor karthy257 tteo watsonwoo lixiaopi1985 yadevi nfultz sergejhorvat vincentarelbundock wangjiaxuan666 yangfanpeng chencaf tguillerme rmasiniexpert bwiernik shulp2211 genomicsnx sabangroo owain-s king8w demetrio92 andthewings pablobernabeu paplomatasp yanliangs norberello rohitpandey13 nanaakwasiabayieboateng arrendi jmgirard wardiam stichsiempre yonideep jeancarloszambrano carlosurielr statunizaga dedenistiawan sayanmitra jluchman fanzheng0352 rnaimehaom duydn viv-analytics krassowski restevesd lisheng-kuang adrianfz litzycastro nancymor ulysses-yx alm87-stat vechrischang rekyt sgisela945 vhteran pharmanuel carlosrabazo jsgro anfemosa filipeaosantos mmelendezb juandalopez117 cardel han-tun majid-soheili leungrhy indenkun carrico-lab pablo-alberto allenlile wangyq199 ebukin jenniferlopes alexandreokano gcostaneto spread0x kgrojasc bbolker toncri consultfiv fenggefeifei

performance's Issues

revise model_performance.lm()?

Since r2.lm() now returns DoF etc. as attributes, we might need to revise model_performance.lm().

CRAN submission 0.2.0

Anything to do? I think we have a nice set of new functions and features, I would say, if at all, we could add a vignette (#47).

Any bayestestR must be submitted before.

check_assumptions()

Maybe we should think of a new "class" of functions, prefixed with check_*, made for... checking stuff for our models. This could be a way of regrouping them together and making them more discoverable. They would become check_overdispersion(), check_zeroinflation() etc.

Based on recent questions that I had from students, other possible functions (not necesserily implemented in performance) come to my mind, such as check_normality and check_homogeneity (of variances), and other functions that would facilitates the checks to see if it's suited for parametric analyses... which would be very useful for students.

Originally posted by @DominiqueMakowski in #24 (comment)

Updating `model_performance()`

Now that we have mse(), rmse() and rse(), we might include one of these measures in model_performance() as well, at least the RMSE I would say.

Maybe error_rate() also for model_performace()-glm?

check_outlier

@pdwaggoner @strengejacke
That's a great idea, a awesome implementation here and in see 🎉 .

Few minor things that came through my mind:

check_outlier -> check_outliers? 🤔 (plural)
Add distance type as a method argument (allowing for other types of distances, e.g. Mahalanobis etc)
This function could be extended to dataframes and single vectors, for which it could return the outliers based on a traditional threshold of SD (e.g., 1.96 SD etc)
Make the threshold explicit and changeable with a threshold argument (currently is it for Cook's distance (4 / n) or 4, for example). We could provide a "default" mode that would pick a reasonable default depending on the distance method.
how should we deal with Bayesian models? since they usually have their own measure (the pareto k) should we wrap around this method?

I can't wait to have vignettes and posts presenting all of these cool functions ☺️

Consistent return values for R2-functions

We must think about the behaviour of some r-squared functions...

Do we want r2() to return different values for lm? Currently, DF etc. are also returned.
Do we want named vectors to be returned or unnamed?
Do we want a list to be returned for r2_*()?

I would say "yes" to 2 and 3, but not sure about 1?

Naming of functions

@DominiqueMakowski Since we have several functions that compute the model quality or performance, maybe we can find a common prefix for those as well? E.g.

performance_rmse(), performance_epcp(), performance_logloss() etc., or quality_rmse(), quality_epcp(), quality_logloss().

This would be in line with other "bundles" like r2_*(). item_*() and check_*(). What would you say?

ICC docs

Minor point, currently in the ICC docs it says:

can be interpreted as \dQuote{the proportion of the variance explained by the grouping structure in the population} \cite{(Hox 2010: 15)}

Here the definition is:

The intraclass correlation ρ can also be interpreted as the expected correlation between two randomly drawn units that are in the same group. (Hox, 2010, p. 15)

Should we also add this second definition (if correct) to the docs?

ranking of models and indices differences

This is an extension of #28

It would indeed be interesting to either have a separate function (e.g., rank_models), or an argument in compare_models (e.g., show_ranks), to "rank" the models, i.e., displaying "the best" at the top, and then the following, which indices could be displayed in terms of difference with the best model.

This presents several challenges:

encoding somewhere the directionality of each index (for instance, for R2 the bigger the better, whereas for AIC the smaller the better).
finding some sort of unique index of "best" model, for instance, the ratio of higher indices (as these are not always congruent). For instance, model 2 can have 4 out of 6 indices placing it as the best.
dealing with edge cases: what to do in the case of equality, etc.
some indices can be compared through NH testing, resulting in a p-value. What to do in these cases?

Revise readme for release

Since we have no vignettes yet, I would like to revise the readme a bit, starting with a short "about" paragraph, then maybe sections on

r2 / icc
check-functions
item-functions
model_performance()

Add additional information as attributes

For:

(closing issues #12 and #10 in favor of this one)

We need:

a helper function, like r2_details() or icc_details() or similar
a class-attribute for all r2_*() and icc() return values, so we know which details we can expect

Vignettes

I think we could start by adding two vignettes, one "Different R2s" discussing the differences and application of the R2 indices. We already have this information in the docs so it's mainly a reformatting. A second one could be "Comparing models" As I feel that this feature will need a bit of highlighting :)

I'll try making drafts

equivalence-test for frequentist models

When reading this article, that encourages to think of CI as compatible intervals, and that there's always a difference (even from NULL) etc., I though about implementing a frequentist-equivalent to the Bayesian equivalence_test() we have in bayestestR.

rope_range() would be identical, so we can call bayestestR::rope_range()
rope() will differ, because we have no posterior and no HDI, but just the CI. But we don't really need this function, just internally we need to calculate how much of the CI is inside the rope-range. We could probably just check the proportion of overlap of rope in CI.
equivalence_test() will get .lm, .lmer, .glm generics etc., that actually all do the same thing: 1) determine rope-range, 2) compute the xx%-CI, 3) check overlap of CI within rope-range.

This will of cource not completely overcome the dichotomia-problem of significant vs. non-significant, but it shows which predictors might be more under discussion, because their status is "undecided".

Package Maintainer

We have not defined a maintainer for this package yet...
I think we both have made substantial contributions to this package. Regarding the other packages, where I would in most cases see you as "main-maintainer", I can take maintainance over for this package, if you like. So we can share some "workload".

check_collinearity for interaction terms

Great! it works now!
I know I am at risk of going off-topic but may I ask why the VIF is calculated only for the main effects terms and not for the interaction term? For example:


> library(car)
> library(performance)
> beta1<- 0.5
> beta2<- -2
> pred1<- runif(n=1000,0,1)
> pred2<- runif(n=1000,0,1)
> Int<- 1
> logE.y<- Int + pred1*beta1 + pred2*beta2
> E.y<- exp(logE.y)
> resp<- rpois(n=1000,lambda=E.y)
> dati<- data.frame(resp=resp,pred1=pred1,pred2=pred2)
> 
> glm1<- glm(resp~pred1*pred2,data=dati,poisson)
> vif(glm1)
      pred1       pred2 pred1:pred2 
   2.791190    4.807468    6.259886 
> check_collinearity(glm1)
# Check for Multicollinearity

Low Correlation

 Predictor  VIF Increased SE
     pred1 2.79         1.67
     pred2 4.81         2.19

Originally posted by @LincolnPetersen in #55 (comment)

R2 MLM

See strengejacke/sjstats#65

r2 lmer error for singular fit

it seems that r2_nakagawa fails when singular fit. Better would be to warn and return NA? Unfortunately, don't have any reproducible example at the moment.

R2 bayes "fixed" -> "marginal"

Should we rename the current R2_Bayes_fixed to R2_Bayes_marginal so it is more consistent with the frequentist mixed models?

Indices

Possible indices to add:

Add tests for release

Add tests for

r2 fails for intercept only models

performance::r2(insight::download_model("lm_0"))

 Error in stats::pf(out$F, out$DoF, out$DoF_residual, lower.tail = FALSE) : 
  Non-numeric argument to mathematical function

Should we return NA instead of throwing error?

Rename functions

overdispersion() -> check_overdispersion()
zerocount() -> check_zeroinflation()

Install fails with Error "object ‘has_intercept’ is not exported by 'namespace:insight'"

I tried to install performance, but the install is currently failing with the following error:

Downloading GitHub repo easystats/performance@master
✔  checking for file ‘/private/var/folders/[...]/easystats-performance-122d3d0/DESCRIPTION’ ...
─  preparing ‘performance’:
✔  checking DESCRIPTION meta-information ...
─  installing the package to process help pages
         -----------------------------------
─  installing *source* package ‘performance’ ...
   ** using staged installation
   ** R
   ** byte-compile and prepare package for lazy loading
   Error: object ‘has_intercept’ is not exported by 'namespace:insight'
   Execution halted
   ERROR: lazy loading failed for package ‘performance’
─  removing ‘/private/var/folders/[...]/performance’

Possible to some bug got into the process recently?
(FYI, I am on macOS 10.14.4, running R 3.6.0)

ICC

Currently, icc() works for the packages lme4, * glmmTMB* and rstanarm. brms calculates the ICC in a different way, based on a variance-decomposition. For non-Gaussian models, this approach is also recommended by Stan team members. For non-Bayesian models (lme4/glmmTMB), the ICC for non-Gaussian models is based on some approximations of the variances (which is not perfect, but apparently the current state of the art).

Questions:

Do we want rstanarm models behave like lme4 or brms? The latter would mean that non-Gaussian models fitted with rstanarm would return a more "Bayesian correct" ICC.
For brms, we can't use the variance-calculation we use for the r2_nagakawa(). We could only calculate a comparable ICC for brms models when they are Gaussian and only have one random intercept. Do we want to cover this "edge case" by implementing extra code particularly for these types of models, or do we always want the variance decomposition for brms?

Let me give a comparison, only for the same brms-model

model4 <- brms::brm(Petal.Length ~ Petal.Width + (1 | Species), data = iris)

# Variance decomposition
performance::icc(model4)
#> $ICC_decomposed
#> [1] 0.7477364
#> 
#> $CI
#>      2.5%     97.5% 
#> 0.8543512 0.6156847 

# ICC only for linear models, with only random intercept
sjstats::icc(model4)
#> ## Species
#>           ICC: 0.94  HDI 89%: [0.86  1.00]

If you would use the current icc() for a mixed model from rstanarm, you get the ICC based on Nagakawa, as it would be returned for lme4 etc. as well:

performance::icc(model2)
#> $ICC_adjusted
#> [1] 0.8786005
#> 
#> $ICC_conditional
#> [1] 0.5603696

So, method 1) (Variance Decomposition) gives an ICC of .75, method 2) (simple ICC for linear models with random intercept only) gives ans ICC of .94 while method 3) (Nagakawa ICC) returns .88 and .56.

We could use method 1) or 3) for brms linear models with random intercept only, where method 1) is below (or between those two ICC values) and method 3) is above the values of method 2)...

What values should be returned? We have the CI for brms-models, but only one "ICC" (which is not exactly an ICC, but something comparable, the variance decomposition in a Bayesian way...)
How do we call what icc() returns for brms models? Still ICC, and explain this in the docs of icc()?

Further performance metrics to add

This would definitely fit.

Other things that come to my mind when I think of the scope:

Other indices used in the structural equation field (this bunch of guys). As they are computed by default by lavaan, it would mostly consist of extractors consistent with the easyverse.
Convenience methods for PCAs / Factor Analysis, returning the % of variance explained.
I think that in general, and in the future, the easyverse (that is a thing now :) methods could be useful in the machine learning world, where people struggle with different models/packages. Providing a unifying syntax for extracting, interpreting and understanding their models could be quite appreciated. Although this will probably wait for the help of a future contributor, expert in this kind of things. But providing the tools to bridge the regression world with the ML world could be quite cool.

Originally posted by @DominiqueMakowski in #14 (comment)

Pseudo R2 for pscl::zeroinfl() and pscl::hurdle()

Moved from strengejacke/sjstats#49

@tbx200 Due to re-organization of packages, all "model-performance" related stuff goes now into this package...

Unable to install from GitHub.

Hi there. I've been unable to install this package from GitHub. Code and errors from the install follows:

expand

> check_compiler()
[1] TRUE
> has_rtools()
[1] TRUE
> devtools::install_github("easystats/performance")
Downloading GitHub repo easystats/performance@master
√  checking for file 'C:\Users\Anthony\AppData\Local\Temp\Rtmp8mqayi\remotes3b2441315bf\easystats-performance-0d7ed43/DESCRIPTION' ... 
-  preparing 'performance': (376ms)
√  checking DESCRIPTION meta-information ... 
-  installing the package to process help pages
         -----------------------------------
-  installing *source* package 'performance' ...
   ** using staged installation
   ** R
   ** byte-compile and prepare package for lazy loading
   Error: object 'area_under_curve' is not exported by 'namespace:bayestestR'
   Execution halted
   ERROR: lazy loading failed for package 'performance'
-  removing 'C:/Users/Anthony/AppData/Local/Temp/RtmpUFphiY/Rinst2b743ec856ec/performance'
         -----------------------------------
   ERROR: package installation failed
Error in (function (command = NULL, args = character(), error_on_status = TRUE,  : 
  System command error

Output from Sys.getenv():

expand

> Sys.getenv()
__COMPAT_LAYER              RunAsAdmin
AGSDESKTOPJAVA              F:\Program Files (x86)\ArcGIS\Desktop10.6\
ALLUSERSPROFILE             C:\ProgramData
APPDATA                     C:\Users\Anthony\AppData\Roaming
BINPREF                     D:/Rtools/mingw_$(WIN)/bin/
CLICOLOR_FORCE              1
CommonProgramFiles          C:\Program Files\Common Files
CommonProgramFiles(x86)     C:\Program Files (x86)\Common Files
CommonProgramW6432          C:\Program Files\Common Files
COMPUTERNAME                UNICRON
ComSpec                     C:\WINDOWS\system32\cmd.exe
developer tools             D:\Working Folder\util\platform-tools
DISPLAY                     :0
DriverData                  C:\Windows\System32\Drivers\DriverData
FP_NO_HOST_CHECK            NO
GDAL_DATA                   D:/R/R-3.6.0/library/sf/gdal
GFORTRAN_STDERR_UNIT        -1
GFORTRAN_STDOUT_UNIT        -1
HOME                        D:/Users/Anthony/Documents
HOMEDRIVE                   C:
HOMEPATH                    \Users\Anthony
LOCALAPPDATA                C:\Users\Anthony\AppData\Local
LOGONSERVER                 \\UNICRON
MSMPI_BIN                   C:\Program Files\Microsoft MPI\Bin\
MSYS2_ENV_CONV_EXCL         R_ARCH
NUMBER_OF_PROCESSORS        8
NVIDIAWHITELISTED           0x01
OneDrive                    C:\Users\Anthony\OneDrive
OS                          Windows_NT
PATH                        D:\R\R-3.6.0\bin\x64;D:\Rtools\bin;C:\Program
                            Files\Microsoft MPI\Bin\;C:\Program Files (x86)\Common
                            Files\Oracle\Java\javapath;C:\ProgramData\Oracle\Java\javapath;C:\Windows\System32;C:\Windows;C:\Windows\System32\wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;D:\Program
                            Files\Calibre2\;C:\Windows\System32;C:\Windows;C:\Windows\System32\wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Program
                            Files (x86)\QuickTime\QTSystem\;C:\Program Files
                            (x86)\NVIDIA
                            Corporation\PhysX\Common;C:\Windows\System32;C:\Windows;C:\Windows\System32\wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Program
                            Files (x86)\Windows Kits\10\Windows Performance
                            Toolkit\;C:\Windows\System32\OpenSSH\;C:\Program
                            Files\Intel\WiFi\bin\;C:\Program Files\Common
                            Files\Intel\WirelessCommon\;D:\Program Files\Microsoft
                            VS Code\bin;D:\Program Files\PuTTY\;C:\Program
                            Files\NVIDIA Corporation\NVIDIA NvDLISR;D:\Program
                            Files\SASHome\SASFoundation\9.4\ets\sasexe;D:\Program
                            Files\SASHome\Secure\ccme4;D:\Program
                            Files\SASHome\x86\Secure\ccme4;C:\Program Files\MiKTeX
                            2.9\miktex\bin\x64\;D:\Presence;D:\Rtools\mingw_64\bin;C:\Users\Anthony\AppData\Local\Microsoft\WindowsApps;E:\TDM-GCC-64\bin
PATHEXT                     .COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH;.MSC
PROCESSOR_ARCHITECTURE      AMD64
PROCESSOR_IDENTIFIER        Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
PROCESSOR_LEVEL             6
PROCESSOR_REVISION          3c03
ProgramData                 C:\ProgramData
ProgramFiles                C:\Program Files
ProgramFiles(x86)           C:\Program Files (x86)
ProgramW6432                C:\Program Files
PROJ_LIB                    D:/R/R-3.6.0/library/sf/proj
PSModulePath                C:\WINDOWS\system32\WindowsPowerShell\v1.0\Modules\
PUBLIC                      C:\Users\Public
QT_D3DCREATE_MULTITHREADED
                            1
R_ARCH                      /x64
R_COMPILED_BY               gcc 4.9.3
R_DOC_DIR                   D:/R/R-3.6.0/doc
R_HOME                      D:/R/R-3.6.0
R_LIBS_USER                 D:/Users/Anthony/Documents/R/win-library/3.6
R_PACKRAT_DEFAULT_LIBPATHS
                            D:/R/R-3.6.0/library
R_PACKRAT_SYSTEM_LIBRARY    D:/R/R-3.6.0/library
R_USER                      D:/Users/Anthony/Documents
RMARKDOWN_MATHJAX_PATH      D:/Program Files/RStudio/resources/mathjax-26
RS_LOCAL_PEER               \\.\pipe\28163-rsession
RS_RPOSTBACK_PATH           D:/Program Files/RStudio/bin/rpostback
RS_SHARED_SECRET            63341846741
RSTUDIO                     1
RSTUDIO_CONSOLE_COLOR       256
RSTUDIO_CONSOLE_WIDTH       80
RSTUDIO_MSYS_SSH            D:/Program Files/RStudio/bin/msys-ssh-1000-18
RSTUDIO_PANDOC              D:/Program Files/RStudio/bin/pandoc
RSTUDIO_SESSION_PORT        28163
RSTUDIO_USER_IDENTITY       Tony Kroeger
RSTUDIO_WINUTILS            D:/Program Files/RStudio/bin/winutils
SHIM_MCCOMPAT               0x810000001
SystemDrive                 C:
SystemRoot                  C:\WINDOWS
TEMP                        C:\Users\Anthony\AppData\Local\Temp
TERM                        xterm-256color
TMP                         C:\Users\Anthony\AppData\Local\Temp
USERDOMAIN                  UNICRON
USERDOMAIN_ROAMINGPROFILE   UNICRON
USERNAME                    Tony Kroeger
USERPROFILE                 C:\Users\Anthony
windir                      C:\WINDOWS

I've been able to install other packages from github previously without issue. TIA!

did we really...

... name that function r2_coxnell(), instead of r2_coxsnell()?

d'oh!

@DominiqueMakowski (just pinging, closing now...)

model_performance for Stan models

Looking at the blog-post, two things came to my mind.

Should we remove the CI from the R2-values? It really bloats the output.
Should we use a lighter theme for the code-chunks? Dark mode was cool for some months when Apple introduced it, but Apple is on the skids and no longer trend setting. I think light themes are better (at least for the code, the sidebar looks good!)

Fix links in `r2()`

https://easystats.github.io/performance/reference/r2.html

Thanks

R2 bayes: default to robust?

I think it would make sense to set robust to TRUE by default, since it's consistent with the remaining defaults used by in parameters, report bayestestR and such?

CRAN 0.2.0

@pdwaggoner @DominiqueMakowski waiting for your OK, then I'll submit.

r2-differences

Moved from strengejacke/sjstats#67 over here...

@hauselin due to the re-organization of packages, all "model-performance" related stuff will now be implemented in the performance package.

@DominiqueMakowski What do you think, can we make r2() let accept multiple model objects, and when the user passes multiple models, we can make an "anova"-like output? I.e. the r-squared values for all models, and an extra column indicating the difference(s)?

Workaround is_hurdle

Since insight::model_info() does not yet return $is_hurdle, this needs to be retrieved manually for this update and to be removed after CRAN update of insight.

Performance for logistic regression

Related to #1

they say (on wikipedia 😀):

"The Hosmer-Lemeshow test is for overall calibration error, not for any particular lack of fit such as quadratic effects. It does not properly take overfitting into account, is arbitrary to choice of bins and method of computing quantiles, and often has power that is too low."

"For these reasons the Hosmer-Lemeshow test is no longer recommended. Hosmer et al have a better one d.f. omnibus test of fit, implemented in the R rms package residuals.lrm function."

Other alternatives have been developed to address the limitations of the Hosmer-Lemeshow test. These include the Osius-Rojek test and the Stukel test, available in the R script AllGOFTests.R: www.chrisbilder.com/categorical/Chapter5/AllGOFTests.R

I don't really have any opinion, never used any of those...

Performance fails on rstanarm non "sampling" models

To be fixed once algorithm detection is implemented.

Marginal vs. conditional R2 for Bayesian models

It would be a great feature to have something for that. Options are:

Expand R2_nakagawa to work with Bayesian models
Adapt R2 MLM to work with Bayesian models
Create a function based on rstanarm::bayes_R2

For the last option, this thread could be relevant:

bayes_R2(model, re.form = NULL) vs bayes_R2(model, re.form = NA) is fine, but it shouldn’t be used to “test” a null hypothesis that the group-specific intercept and coefficients are irrelevant.

PCA to parameters?

Looking at the functions, I wonder if pca should be moved to parameters. As it is not technically related to performance, but rather used eventually to decrease the number of parameters of the model. Maybe it would be more discoverable there. What do you think?

check_distribution

Add more distributions:

negbin
...

check_collinearity

OK, thanks, I have managed to install the last version of performance by first installing (via github) bayestestR. Now the function check_collinearity() is available. However, I get an error message:


> check_collinearity(glmCompois)
Error in .subset2(x, i, exact = exact) : 
  attempt to select less than one element in get1index

Originally posted by @LincolnPetersen in glmmTMB/glmmTMB#473 (comment)

Complete docs

empty section \examples in check_overdispersion()
empty section \examples in check_zeroinflation()
empty section \value in rmse()

compare_models disfunction

I looked at the other issues and I think that this has not been raised. The function "compare_performance" shuffles the names and factors (AIC and so forth) around, so that the model that is first named in the function always stands in front of the lowest AIC.

Example:
mpd_1 <- lm(log(super_data$naturalized_proc+1)~super_data$mpd)
mpd_2 <- lm(log(super_data$naturalized_proc+1)~super_data$MPD.obs)
mpd_3 <- lm(log(super_data$naturalized_proc+1)~super_data$MPD.ses)

compare_performance(mpd_1,mpd_2,mpd_3)

output
name class AIC BIC R2 R2_adjusted RMSE
1 mpd_1 lm 2114.482 2127.295 0.15494980 0.1533463 1.775275
2 mpd_2 lm 2122.197 2135.010 0.14253483 0.1409078 1.788268
3 mpd_3 lm 2156.999 2169.812 0.08422591 0.0824882 1.848070

compare_performance(mpd_2,mpd_1,mpd_3)

output
name class AIC BIC R2 R2_adjusted RMSE
1 mpd_2 lm 2114.482 2127.295 0.15494980 0.1533463 1.775275
2 mpd_1 lm 2122.197 2135.010 0.14253483 0.1409078 1.788268
3 mpd_3 lm 2156.999 2169.812 0.08422591 0.0824882 1.848070

WAIC / DiC for Stan models

For model_performance()

Performance for ordinal models

Because in both, brms and rstanarm, the R^2 metric is not available for ordinal models (in contrast with elpd, loo or waic), model_performance(ordinal_model ) does not work.

Clean up my mess...

@DominiqueMakowski Due to this unfortunate setup of Github, cloud sync and two PCs for working at packages I tried to commit via webinterface, which completely messed up the file structure (with R files put in the root folder).

I now cloned the repo here, cleaned up everything and commited via GitHub app again (hope that my sync understands that it doesn't have revert the sync from GitHub after syncing from cloud at home...).

Please check the spelling, some of the performance_* functions have lower case, other upper case R.

btw, the referred paper (https://www.cambridge.org/core/services/aop-cambridge-core/content/view/92B052AADD9756C8BCC00527749E029D/S1047198700008524a.pdf/postestimation_uncertainty_in_limited_dependent_variable_models.pdf) also shows a way to implement this for multinomial models, we should keep this in mind... I did not fully understand yet how to do it, but it doesn't look that complicated.

Originally posted by @strengejacke in #38 (comment)

BF in compare_models

I am not really sure how, but it might be relevant to make compare_models compatible with bayesfactor_models