easystats / insight

:crystal_ball: Easy access to model information for various model objects

Home Page: https://easystats.github.io/insight/

License: GNU General Public License v3.0

insight

Gain insight into your models!

When fitting any statistical model, there are many useful pieces of information that are simultaneously calculated and stored beyond coefficient estimates and general model fit statistics. Although there exist some generic functions to obtain model information and data, many package-specific modelling functions do not provide such methods to allow users to access such valuable information.

insight is an R-package that fills this important gap by providing a suite of functions to support almost any model (see the List of Supported Models by Class section below). The goal of insight, then, is to provide tools that give easy, intuitive, and consistent access to information contained in model objects. These tools aid applied researchers in virtually any field who fit, diagnose, and present statistical models by streamlining access to every aspect of many model objects via consistent syntax and output.

Installation

The insight package is available on CRAN, while its latest development version is available on R-universe (from rOpenSci) or GitHub.

Type	Source	Command
Release	CRAN	`install.packages("insight")`
Development	r-universe	`install.packages("insight", repos = "https://easystats.r-universe.dev")`
Development	GitHub	`remotes::install_github("easystats/insight")`

Once you have downloaded the package, you can then load it using:

library("insight")

Tip

Instead of library(insight), use library(easystats). This will make all features of the easystats-ecosystem available.

To stay updated, use easystats::install_latest().

Documentation

Built with non-programmers in mind, insight offers a broad toolbox for making model and data information easily accessible. While insight offers many useful functions for working with and understanding model objects (discussed below), we suggest users start with model_info(), as this function provides a clean and consistent overview of model objects (e.g., functional form of the model, the model family, link function, number of observations, variables included in the specification, etc.). With a clear understanding of the model introduced, users are able to adapt other functions for more nuanced exploration of and interaction with virtually any model object. Please visit https://easystats.github.io/insight/ for documentation.
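As a minimal sketch of that starting point, the snippet below calls model_info() on a logistic regression fitted to the built-in mtcars data. The model is purely illustrative, and the element names inspected here (is_binomial, is_linear, link_function) are assumed from recent insight releases; names(info) shows what your installed version actually returns.

```r
library(insight)

# Illustrative model: logistic regression on a built-in dataset
m <- glm(am ~ wt, data = mtcars, family = binomial())

# model_info() returns a named list describing the model
info <- model_info(m)

# Inspect which pieces of information are available
names(info)

# A few commonly used elements (names assumed from recent insight versions)
info$is_binomial
info$is_linear
info$link_function
```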

Definition of Model Components

The functions from insight address different components of a model. In an effort to avoid confusion about specific “targets” of each function, in this section we provide a short explanation of insight’s definitions of regression model components.

Data

The dataset used to fit the model.

Parameters

Values estimated or learned from data that capture the relationship between variables. In regression models, these are usually referred to as coefficients.

Response and Predictors

response: the outcome or response variable (dependent variable) of a regression model.
predictor: independent variables of (the fixed part of) a regression model. For mixed models, variables that are only in the random effects part (i.e. grouping factors) of the model are not returned as predictors by default. However, these can be included using additional arguments in the function call. Predictors are treated as “unique”: if a variable appears as both a fixed effect and a random slope, it is treated as one (the same) predictor.

Variables

Any unique variable names that appear in a regression model, e.g., response variable, predictors or random effects. A “variable” only relates to the unique occurrence of a term, or the term name. For instance, the expression x + poly(x, 2) has only the variable x.

Terms

Terms themselves consist of variable and factor names separated by operators, or involve arithmetic expressions. For instance, the expression x + poly(x, 2) has one variable x, but two terms x and poly(x, 2).
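The difference between variables and terms can be sketched on a small model. The formula below (mpg ~ hp + I(hp^2) on mtcars) is an illustrative stand-in for x + poly(x, 2); the base R calls show the same distinction independently of insight, and the insight calls are shown without expected output because their exact return format may vary between versions.

```r
library(insight)

# One variable (hp) that enters the model through two terms
f <- mpg ~ hp + I(hp^2)

# Base R: unique variable names vs. term labels
all.vars(f)
#> [1] "mpg" "hp"
attr(terms(f), "term.labels")
#> [1] "hp"      "I(hp^2)"

# insight: the same distinction on a fitted model object
m <- lm(f, data = mtcars)
find_variables(m)
find_terms(m)
```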

Random Effects

random slopes: variables that are specified as random slopes in a mixed effects model.
random or grouping factors: variables that are specified as grouping variables in a mixed effects model.
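A short sketch of these definitions, assuming the lme4 package and its sleepstudy data are available (neither is required by insight itself). The model is illustrative only; find_random_slopes() is assumed to be exported by your installed insight version.

```r
library(insight)
library(lme4)

# Random intercept and random slope for Days, grouped by Subject
m <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

find_random(m)          # grouping factor(s)
find_random_slopes(m)   # variable(s) specified as random slopes

# Days is both a fixed effect and a random slope; following the
# definition of predictors above, it should be listed only once
find_predictors(m, effects = "all", flatten = TRUE)
```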

Aren’t the predictors, terms and parameters the same thing?

In some cases, yes. But not in all cases. Find out more by clicking here to access the documentation.

Functions

The package revolves around two key prefixes: get_* and find_*. The get_* prefix extracts values (or data) associated with model-specific objects (e.g., parameters or variables), while the find_* prefix lists model-specific objects (e.g., priors or predictors). These are powerful families of functions allowing for great flexibility in use, whether at a high, descriptive level (find_*) or narrower level of statistical inspection and reporting (get_*).

In total, the insight package includes 16 core functions: get_data(), get_priors(), get_variance(), get_parameters(), get_predictors(), get_random(), get_response(), find_algorithm(), find_formula(), find_variables(), find_terms(), find_parameters(), find_predictors(), find_random(), find_response(), and model_info(). In all cases, users must supply, at a minimum, the name of the model fit object. Several functions have additional arguments that allow for more targeted returns of model information. For example, the find_terms() function’s effects argument allows for the extraction of “fixed effects” terms, “random effects” terms, or by default, “all” terms in the model object. We point users to the package documentation or the complementary package website, https://easystats.github.io/insight/, for a detailed list of the arguments associated with each function as well as the values returned by each function.
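To make the find_* versus get_* distinction concrete, here is a small sketch on a linear model fitted to the built-in iris data. The model is an illustrative choice; the pattern is the same for other supported classes.

```r
library(insight)

m <- lm(Sepal.Length ~ Species + Petal.Width + Sepal.Width, data = iris)

# find_*: names of model components
find_response(m)
find_predictors(m)

# get_*: the underlying values
head(get_response(m))
head(get_predictors(m))
```

In short, find_response() answers "what is the outcome called?", while get_response() returns the outcome values themselves.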

Examples of Use Cases in R

We now provide examples of use cases of the insight package. These examples probably do not cover typical real-world problems, but they illustrate the core idea of this package: a unified interface to access model information. insight should help both users and package developers reduce the hassle caused by the many exceptions across modelling packages when accessing model information.

Making Predictions at Specific Values of a Term of Interest

Say, the goal is to make predictions for a certain term, holding remaining co-variates constant. This is achieved by calling predict() and feeding the newdata-argument with the values of the term of interest as well as the “constant” values for remaining co-variates. The functions get_data() and find_predictors() are used to get this information, which then can be used in the call to predict().

In this example, we fit a simple linear model, but it could be replaced by (m)any other models, so this approach is “universal” and applies to many different model objects.

library(insight)
m <- lm(
  Sepal.Length ~ Species + Petal.Width + Sepal.Width,
  data = iris
)

dat <- get_data(m)
pred <- find_predictors(m, flatten = TRUE)

l <- lapply(pred, function(x) {
  if (is.numeric(dat[[x]])) {
    mean(dat[[x]])
  } else {
    unique(dat[[x]])
  }
})

names(l) <- pred
l <- as.data.frame(l)

cbind(l, predictions = predict(m, newdata = l))
#>      Species Petal.Width Sepal.Width predictions
#> 1     setosa         1.2         3.1         5.1
#> 2 versicolor         1.2         3.1         6.1
#> 3  virginica         1.2         3.1         6.3
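The same idea can be wrapped in a small helper. The function name typical_grid() is hypothetical and not part of insight. This sketch swaps as.data.frame() for expand.grid(), an assumption made here so that models with several categorical predictors of different lengths also work.

```r
library(insight)

# Hypothetical helper (not part of insight): one row per combination of
# categorical values, with numeric predictors held at their mean
typical_grid <- function(model) {
  dat <- get_data(model)
  pred <- find_predictors(model, flatten = TRUE)

  values <- lapply(pred, function(x) {
    if (is.numeric(dat[[x]])) mean(dat[[x]]) else unique(dat[[x]])
  })
  names(values) <- pred

  # expand.grid() crosses all values, so predictors may differ in length
  expand.grid(values, stringsAsFactors = FALSE)
}

# Two categorical predictors with 2 and 3 levels give 2 * 3 = 6 rows
m <- lm(breaks ~ wool + tension, data = warpbreaks)
grid <- typical_grid(m)
cbind(grid, predictions = predict(m, newdata = grid))
```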

Printing Model Coefficients

The next example should emphasize the possibilities to generalize functions to many different model objects using insight. The aim is simply to print coefficients in a complete, human readable sentence.

The first approach uses the functions that are available for some, but obviously not for all models, to access the information about model coefficients.

print_params <- function(model) {
  paste0(
    "My parameters are ",
    toString(row.names(summary(model)$coefficients)),
    ", thank you for your attention!"
  )
}

m1 <- lm(Sepal.Length ~ Petal.Width, data = iris)
print_params(m1)
#> [1] "My parameters are (Intercept), Petal.Width, thank you for your attention!"

# obviously, something is missing in the output
m2 <- mgcv::gam(Sepal.Length ~ Petal.Width + s(Petal.Length), data = iris)
print_params(m2)
#> [1] "My parameters are , thank you for your attention!"

As we can see, the function fails for gam-models. Because the way model information is accessed in the R ecosystem depends on the type of model, we would need to create specific functions for all model types. With insight, users can write a function without having to worry about the model type.

print_params <- function(model) {
  paste0(
    "My parameters are ",
    toString(insight::find_parameters(model, flatten = TRUE)),
    ", thank you for your attention!"
  )
}

m1 <- lm(Sepal.Length ~ Petal.Width, data = iris)
print_params(m1)
#> [1] "My parameters are (Intercept), Petal.Width, thank you for your attention!"

m2 <- mgcv::gam(Sepal.Length ~ Petal.Width + s(Petal.Length), data = iris)
print_params(m2)
#> [1] "My parameters are (Intercept), Petal.Width, s(Petal.Length), thank you for your attention!"

Contributing and Support

In case you want to file an issue or contribute in another way to the package, please follow this guide. For questions about the functionality, you may either contact us via email or also file an issue.

List of Supported Models by Class

Currently, 226 model classes are supported.

supported_models()
#>   [1] "aareg"                   "afex_aov"               
#>   [3] "AKP"                     "Anova.mlm"              
#>   [5] "anova.rms"               "aov"                    
#>   [7] "aovlist"                 "Arima"                  
#>   [9] "averaging"               "bamlss"                 
#>  [11] "bamlss.frame"            "bayesQR"                
#>  [13] "bayesx"                  "BBmm"                   
#>  [15] "BBreg"                   "bcplm"                  
#>  [17] "betamfx"                 "betaor"                 
#>  [19] "betareg"                 "BFBayesFactor"          
#>  [21] "bfsl"                    "BGGM"                   
#>  [23] "bife"                    "bifeAPEs"               
#>  [25] "bigglm"                  "biglm"                  
#>  [27] "blavaan"                 "blrm"                   
#>  [29] "bracl"                   "brglm"                  
#>  [31] "brmsfit"                 "brmultinom"             
#>  [33] "btergm"                  "censReg"                
#>  [35] "cgam"                    "cgamm"                  
#>  [37] "cglm"                    "clm"                    
#>  [39] "clm2"                    "clmm"                   
#>  [41] "clmm2"                   "clogit"                 
#>  [43] "coeftest"                "complmrob"              
#>  [45] "confusionMatrix"         "coxme"                  
#>  [47] "coxph"                   "coxph.penal"            
#>  [49] "coxr"                    "cpglm"                  
#>  [51] "cpglmm"                  "crch"                   
#>  [53] "crq"                     "crqs"                   
#>  [55] "crr"                     "dep.effect"             
#>  [57] "DirichletRegModel"       "draws"                  
#>  [59] "drc"                     "eglm"                   
#>  [61] "elm"                     "epi.2by2"               
#>  [63] "ergm"                    "feglm"                  
#>  [65] "feis"                    "felm"                   
#>  [67] "fitdistr"                "fixest"                 
#>  [69] "flac"                    "flexsurvreg"            
#>  [71] "flic"                    "gam"                    
#>  [73] "Gam"                     "gamlss"                 
#>  [75] "gamm"                    "gamm4"                  
#>  [77] "garch"                   "gbm"                    
#>  [79] "gee"                     "geeglm"                 
#>  [81] "glht"                    "glimML"                 
#>  [83] "glm"                     "Glm"                    
#>  [85] "glmm"                    "glmmadmb"               
#>  [87] "glmmPQL"                 "glmmTMB"                
#>  [89] "glmrob"                  "glmRob"                 
#>  [91] "glmx"                    "gls"                    
#>  [93] "gmnl"                    "hglm"                   
#>  [95] "HLfit"                   "htest"                  
#>  [97] "hurdle"                  "iv_robust"              
#>  [99] "ivFixed"                 "ivprobit"               
#> [101] "ivreg"                   "lavaan"                 
#> [103] "lm"                      "lm_robust"              
#> [105] "lme"                     "lmerMod"                
#> [107] "lmerModLmerTest"         "lmodel2"                
#> [109] "lmrob"                   "lmRob"                  
#> [111] "logistf"                 "logitmfx"               
#> [113] "logitor"                 "logitr"                 
#> [115] "LORgee"                  "lqm"                    
#> [117] "lqmm"                    "lrm"                    
#> [119] "manova"                  "MANOVA"                 
#> [121] "marginaleffects"         "marginaleffects.summary"
#> [123] "margins"                 "maxLik"                 
#> [125] "mblogit"                 "mclogit"                
#> [127] "mcmc"                    "mcmc.list"              
#> [129] "MCMCglmm"                "mcp1"                   
#> [131] "mcp12"                   "mcp2"                   
#> [133] "med1way"                 "mediate"                
#> [135] "merMod"                  "merModList"             
#> [137] "meta_bma"                "meta_fixed"             
#> [139] "meta_random"             "metaplus"               
#> [141] "mhurdle"                 "mipo"                   
#> [143] "mira"                    "mixed"                  
#> [145] "MixMod"                  "mixor"                  
#> [147] "mjoint"                  "mle"                    
#> [149] "mle2"                    "mlm"                    
#> [151] "mlogit"                  "mmclogit"               
#> [153] "mmlogit"                 "mmrm"                   
#> [155] "mmrm_fit"                "mmrm_tmb"               
#> [157] "model_fit"               "multinom"               
#> [159] "mvord"                   "negbinirr"              
#> [161] "negbinmfx"               "nestedLogit"            
#> [163] "ols"                     "onesampb"               
#> [165] "orm"                     "pgmm"                   
#> [167] "phyloglm"                "phylolm"                
#> [169] "plm"                     "PMCMR"                  
#> [171] "poissonirr"              "poissonmfx"             
#> [173] "polr"                    "probitmfx"              
#> [175] "psm"                     "Rchoice"                
#> [177] "ridgelm"                 "riskRegression"         
#> [179] "rjags"                   "rlm"                    
#> [181] "rlmerMod"                "RM"                     
#> [183] "rma"                     "rma.uni"                
#> [185] "robmixglm"               "robtab"                 
#> [187] "rq"                      "rqs"                    
#> [189] "rqss"                    "rvar"                   
#> [191] "Sarlm"                   "scam"                   
#> [193] "selection"               "sem"                    
#> [195] "SemiParBIV"              "semLm"                  
#> [197] "semLme"                  "serp"                   
#> [199] "slm"                     "speedglm"               
#> [201] "speedlm"                 "stanfit"                
#> [203] "stanmvreg"               "stanreg"                
#> [205] "summary.lm"              "survfit"                
#> [207] "survreg"                 "svy_vglm"               
#> [209] "svychisq"                "svyglm"                 
#> [211] "svyolr"                  "t1way"                  
#> [213] "tobit"                   "trimcibt"               
#> [215] "truncreg"                "vgam"                   
#> [217] "vglm"                    "wbgee"                  
#> [219] "wblm"                    "wbm"                    
#> [221] "wmcpAKP"                 "yuen"                   
#> [223] "yuend"                   "zcpglm"                 
#> [225] "zeroinfl"                "zerotrunc"

Didn’t find a model? File an issue and request additional model-support in insight!
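Before filing a request, it can help to check programmatically whether a class is already covered. The sketch below uses an illustrative lm fit; is_model_supported() is assumed to be available in your installed insight version.

```r
library(insight)

m <- lm(mpg ~ wt, data = mtcars)

# Class-based lookup against the vector returned by supported_models()
class(m)[1] %in% supported_models()

# Convenience check on the fitted object itself
is_model_supported(m)
```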

Citation

If this package helped you, please consider citing as follows:

Lüdecke D, Waggoner P, Makowski D. insight: A Unified Interface to Access Information from Model Objects in R. Journal of Open Source Software 2019;4:1412. doi: 10.21105/joss.01412

Code of Conduct

Please note that the insight project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

insight's Issues

Get correlation component from gls?

library(insight)
library(nlme)

m <- gls(
  follicles ~ sin(2*pi*Time) + 
    cos(2*pi*Time), Ovary, 
  correlation = corAR1(form = ~ 1 | Mare)
)

find_formula(m)
#> $conditional
#> follicles ~ sin(2 * pi * Time) + cos(2 * pi * Time)
#> <environment: 0x00000000195dffa0>

m$call$model
#> follicles ~ sin(2 * pi * Time) + cos(2 * pi * Time)
m$call$correlation
#> corAR1(form = ~1 | Mare)

Created on 2019-02-04 by the reprex package (v0.2.1)

Check documentation and DESCRIPTION file

We should check whether docs could be revised (wording, examples) and revise the DESCRIPTION file (especially the "Description" section).

How would you consider yourself in terms of contribution? I see myself as main package maintainer. You can add yourself as contributor.

Take random effects in gamm and gamm4 into account

Let me give an example:

Here's the model fit:

br <- gamm4(y ~ s(x0) + x1 + s(x2), data = dat, random = ~(1 | fac))

and here's the formula for the random part:

formula(br$mer)
#> y.0 ~ X - 1 + (1 | Xr) + (1 | Xr.0) + (1 | fac)

I'm not sure how to extract the information properly, and what exactly is needed here. Xr were not specified in the random effects formula.

For gamm4, we maybe could work with getME(br$mer, "flist").

link_inverse should return list only for mv models

To be consistent with other functions, link_inverse() should return a list for, say, stanmvreg models.

link_function() should return list

Related to this one:
#34

In general, we could get rid of the mv_response argument.

Name for function that returns "raw" terms?

I would like to add a function similar to find_terms(), however, terms should be returned "as is", i.e. as they were used in the formula. Do you have any clever name for this function?

library(insight)
m <- lm(mpg ~ I(cyl^2) + log(hp), data = mtcars)

find_terms(m)
#> $response
#> [1] "mpg"
#> 
#> $conditional
#> [1] "cyl" "hp"

find_???(m)
#> $response
#> [1] "mpg"
#> 
#> $conditional
#> [1] "I(cyl^2)" "log(hp)"

htest support

This is a follow up to discuss the htest case.

find_parameters and multivariate response models

Should we return lists for each formula? Current behaviour:

require("insight")
require("rstanarm")

data("pbcLong")
m1 <- stan_mvmer(
  formula = list(
    logBili ~ year + (1 | id),
    albumin ~ sex + year + (year | id)),
  data = pbcLong,
  chains = 1, cores = 1, seed = 12345, iter = 1000
)

find_parameters(m1)
#> $conditional
#> [1] "y1|(Intercept)" "y2|(Intercept)" "y1|year"        "y2|sexf"       
#> [5] "y2|year"        "y1|sigma"       "y2|sigma"      
#> 
#> $random
#>   [1] "b[y1|(Intercept) id:1]"  "b[y2|(Intercept) id:1]" 
#>   [3] "b[y2|year id:1]"         "b[y1|(Intercept) id:2]" 
#>   [5] "b[y2|(Intercept) id:2]"  "b[y2|year id:2]"        
#>   [7] "b[y1|(Intercept) id:3]"  "b[y2|(Intercept) id:3]" 
#>   [9] "b[y2|year id:3]"         "b[y1|(Intercept) id:4]" 
#>  [11] "b[y2|(Intercept) id:4]"  "b[y2|year id:4]"        
#>  [13] "b[y1|(Intercept) id:5]"  "b[y2|(Intercept) id:5]" 
#>  [15] "b[y2|year id:5]"         "b[y1|(Intercept) id:6]" 
#>  [17] "b[y2|(Intercept) id:6]"  "b[y2|year id:6]"        
#>  [19] "b[y1|(Intercept) id:7]"  "b[y2|(Intercept) id:7]" 
#>  [21] "b[y2|year id:7]"         "b[y1|(Intercept) id:8]" 
#>  [23] "b[y2|(Intercept) id:8]"  "b[y2|year id:8]"        
#>  [25] "b[y1|(Intercept) id:9]"  "b[y2|(Intercept) id:9]" 
#>  [27] "b[y2|year id:9]"         "b[y1|(Intercept) id:10]"
#>  [29] "b[y2|(Intercept) id:10]" "b[y2|year id:10]"       
#>  [31] "b[y1|(Intercept) id:11]" "b[y2|(Intercept) id:11]"
#>  [33] "b[y2|year id:11]"        "b[y1|(Intercept) id:12]"
#>  [35] "b[y2|(Intercept) id:12]" "b[y2|year id:12]"       
#>  [37] "b[y1|(Intercept) id:13]" "b[y2|(Intercept) id:13]"
#>  [39] "b[y2|year id:13]"        "b[y1|(Intercept) id:14]"
#>  [41] "b[y2|(Intercept) id:14]" "b[y2|year id:14]"       
#>  [43] "b[y1|(Intercept) id:15]" "b[y2|(Intercept) id:15]"
#>  [45] "b[y2|year id:15]"        "b[y1|(Intercept) id:16]"
#>  [47] "b[y2|(Intercept) id:16]" "b[y2|year id:16]"       
#>  [49] "b[y1|(Intercept) id:17]" "b[y2|(Intercept) id:17]"
#>  [51] "b[y2|year id:17]"        "b[y1|(Intercept) id:18]"
#>  [53] "b[y2|(Intercept) id:18]" "b[y2|year id:18]"       
#>  [55] "b[y1|(Intercept) id:19]" "b[y2|(Intercept) id:19]"
#>  [57] "b[y2|year id:19]"        "b[y1|(Intercept) id:20]"
#>  [59] "b[y2|(Intercept) id:20]" "b[y2|year id:20]"       
#>  [61] "b[y1|(Intercept) id:21]" "b[y2|(Intercept) id:21]"
#>  [63] "b[y2|year id:21]"        "b[y1|(Intercept) id:22]"
#>  [65] "b[y2|(Intercept) id:22]" "b[y2|year id:22]"       
#>  [67] "b[y1|(Intercept) id:23]" "b[y2|(Intercept) id:23]"
#>  [69] "b[y2|year id:23]"        "b[y1|(Intercept) id:24]"
#>  [71] "b[y2|(Intercept) id:24]" "b[y2|year id:24]"       
#>  [73] "b[y1|(Intercept) id:25]" "b[y2|(Intercept) id:25]"
#>  [75] "b[y2|year id:25]"        "b[y1|(Intercept) id:26]"
#>  [77] "b[y2|(Intercept) id:26]" "b[y2|year id:26]"       
#>  [79] "b[y1|(Intercept) id:27]" "b[y2|(Intercept) id:27]"
#>  [81] "b[y2|year id:27]"        "b[y1|(Intercept) id:28]"
#>  [83] "b[y2|(Intercept) id:28]" "b[y2|year id:28]"       
#>  [85] "b[y1|(Intercept) id:29]" "b[y2|(Intercept) id:29]"
#>  [87] "b[y2|year id:29]"        "b[y1|(Intercept) id:30]"
#>  [89] "b[y2|(Intercept) id:30]" "b[y2|year id:30]"       
#>  [91] "b[y1|(Intercept) id:31]" "b[y2|(Intercept) id:31]"
#>  [93] "b[y2|year id:31]"        "b[y1|(Intercept) id:32]"
#>  [95] "b[y2|(Intercept) id:32]" "b[y2|year id:32]"       
#>  [97] "b[y1|(Intercept) id:33]" "b[y2|(Intercept) id:33]"
#>  [99] "b[y2|year id:33]"        "b[y1|(Intercept) id:34]"
#> [101] "b[y2|(Intercept) id:34]" "b[y2|year id:34]"       
#> [103] "b[y1|(Intercept) id:35]" "b[y2|(Intercept) id:35]"
#> [105] "b[y2|year id:35]"        "b[y1|(Intercept) id:36]"
#> [107] "b[y2|(Intercept) id:36]" "b[y2|year id:36]"       
#> [109] "b[y1|(Intercept) id:37]" "b[y2|(Intercept) id:37]"
#> [111] "b[y2|year id:37]"        "b[y1|(Intercept) id:38]"
#> [113] "b[y2|(Intercept) id:38]" "b[y2|year id:38]"       
#> [115] "b[y1|(Intercept) id:39]" "b[y2|(Intercept) id:39]"
#> [117] "b[y2|year id:39]"        "b[y1|(Intercept) id:40]"
#> [119] "b[y2|(Intercept) id:40]" "b[y2|year id:40]"

We could split up this list into two lists, one for y1 and one for y2. The same would apply to multivariate response models from brms. Currently, everything works fine; it is rather a design decision about what we want.
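One possible way to do the split, sketched in base R on the fixed-effects names only. It assumes the response label is always the part before the first "|". The random-effects names (b[...]) follow a different pattern and would need separate handling.

```r
# Parameter names copied from the $conditional element above
params <- c(
  "y1|(Intercept)", "y2|(Intercept)", "y1|year", "y2|sexf",
  "y2|year", "y1|sigma", "y2|sigma"
)

# Response label: everything before the first "|"
response <- sub("\\|.*$", "", params)
# Parameter name: everything after the first "|"
parameter <- sub("^[^|]*\\|", "", params)

by_response <- split(parameter, response)
by_response
#> $y1
#> [1] "(Intercept)" "year"        "sigma"
#>
#> $y2
#> [1] "(Intercept)" "sexf"        "year"        "sigma"
```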

Check if functions work with brms

Since I have revised the functions quite a lot compared to their sjstats origin, it might be helpful to do intensive tests for brms-models, as brms offers the most flexibility (hence, many exceptions and pitfalls).

For tidy_stan(), I have set up various models, see here:
http://rpubs.com/sjPlot/tidy_stan

Would be great if insight functions correctly return the expected values for those models.

CRAN badges

now that it's on CRAN, we could add CRAN badges on the README.

I would say that the version badge and the downloads / months is sufficient?

replace zi by component

functions like model_predictors(), that have a zi-argument, could get a component-argument instead, so the user can either return values excluding zi-part, only zi part or from both conditional and zi-part of a model.

colour tools

print_colour(): should we switch the order of arguments to allow the use of piping ("I am blue, dabadi dabada" %>% print_colour("blue"))

model_info: nobs and link_fun

Started working on the formatting function for model type names. In model_info(), $nobs and $link_fun appeared as salient, as I would have expected $n_obs and $link_function. Do you want me to help to change it or do you think it's too much and we should leave it like that ^^?

License

Naive question; what motivated the choice of the license?

CRAN submission

OK, what would be the remaining goals before we initially submit to CRAN, besides some open issues like #4, #5, #17 and #19?

I would like to submit to CRAN soon, probably within this month, unless we have more major functionality to add, and then continue working on the next package, e.g. bayestestR or report. And I would start using insight in my other packages for the next release cycle...

passing CRAN checks
docs are fine

model_info: is_mixed

It would be convenient to have an is_mixed in the output. Should be easy to implement (test for a random part in the formula)?
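A rough sketch of such a test in base R. The helper name has_random_part() is hypothetical, and the check is deliberately simple: it looks for the bar operators on the right-hand side of the formula, which covers lme4-style syntax but not models that specify random effects through a separate argument.

```r
# Hypothetical helper, not insight's implementation
has_random_part <- function(f) {
  # Right-hand side is the last element of a one- or two-sided formula
  rhs <- f[[length(f)]]
  # "|" and "||" appear as function names in the parsed expression
  any(c("|", "||") %in% all.names(rhs))
}

has_random_part(Reaction ~ Days + (Days | Subject))
#> [1] TRUE
has_random_part(Reaction ~ Days)
#> [1] FALSE
```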

get_variance names

I might be (too) picky, but it would be maybe better from an autocompletion / function grouping perspective, to have: get_variance_intercept, get_variance_slope etc. instead of get_intercept_variance and get_slope_variance. I understand that these functions, although being exported, are mainly thought to be used by get_variances, but still :)

Also, get_variances in plural form (with the s), although making total sense, and although we have precedent functions using the plural (get_priors, get_parameters...) sounds a bit uncommon to my ear. Is it because we do say "get the variance of these components" without using the plural?

Request for h2o model support

This is a very helpful package--thanks for making it! I use the h2o package a lot for my machine learning models. Could the models associated with that package--especially h2o::h2o.automl--be integrated into insight? Thanks!

model_terms() should return a list

See #1 (comment)

And maybe get a flatten-argument to mimic current behaviour.

Datasets

Do you think we should keep these datasets in the package?

Missing tests

Following model-objects could be tested as well, but I'm not sure when we should add those tests (maybe later, 0.2 or so?)

logo

what about the hexsticker with an image representing some kind of unruly monster (with written "model" on it) that is "opened" by a knight or a weapon or something (with written "insight") on it? Or is this metaphor too far-fetched haha?

model information

I suggest we start with the lowest level packages. If I understood it, a package for accessing model information (name needed, insight or inside?) would encompass sjstats functions like model_frame() or pred_vars() etc?

Do we want to use those names or reimplement the functions from scratch, using the code base, but new names and reducing dependencies? I would say the latter, maybe we can come up with new names then...

readme

Is there a reason why we have two readme files? And I just saw that citation("insight") returns only my name. Is that OK for you? I thought you could probably add your name as well...

Coulour_tools.R in insight

BayestestR, report and I believe other packages might want to use these colouring tools. As such, instead of repeating them in all packages maybe we should add them here...?

function for type of model

I am thinking about a function that would return the model type in a clear and "human" way. For instance, "linear model", "logistic model", "probit model", "Bayesian mixed logistic model" etc.

There could be a variant (find_model(fit, short=FALSE)) that returns the "full" name in a consistent manner, e.g. "generalized linear model (poisson family with a log link)", "generalized linear mixed model (gamma family with an inverse link)" and a variant (short=TRUE) that returns the short names, i.e., "logistic model" instead of "generalized linear model (binomial family with a logit link)".

Besides its use in the future report package, it could be useful for users to add titles to plot, captions to table and so on. However, I think it has its place here, in the insight package. What do you think?
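To make the proposal concrete, here is a rough base R sketch. The function name describe_model_type() and the exact wording of the returned strings are hypothetical; it relies on stats::family() and therefore only covers models for which a family() method exists (e.g. lm and glm).

```r
# Hypothetical sketch, limited to models with a family() method
describe_model_type <- function(model, short = FALSE) {
  fam <- stats::family(model)

  if (short) {
    key <- paste(fam$family, fam$link)
    short_names <- c(
      "gaussian identity" = "linear model",
      "binomial logit" = "logistic model",
      "binomial probit" = "probit model"
    )
    if (key %in% names(short_names)) {
      return(unname(short_names[key]))
    }
    # Fallback when no short name is defined for this family/link pair
    return(paste(fam$family, "model"))
  }

  sprintf(
    "generalized linear model (%s family with a %s link)",
    fam$family, fam$link
  )
}

m1 <- lm(mpg ~ wt, data = mtcars)
m2 <- glm(carb ~ wt, data = mtcars, family = poisson())

describe_model_type(m1, short = TRUE)
#> [1] "linear model"
describe_model_type(m2)
#> [1] "generalized linear model (poisson family with a log link)"
```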

find_ (get_?) algorithm

Although the fitting algorithm plays an important role, it is often unreported/uncared about. Surprisingly, its access is not really straightforward.

What do you think about a function that does that?

Here's a draft:

#' @export
find_algorithm <- function(model, ...) {
  UseMethod("find_algorithm")
}


#' @export
find_algorithm.merMod <- function(model, ...) {
  # For lmer fits, a REML flag of 0 in the response module means ML
  if (model@resp$REML == 0) {
    algorithm <- "ML"
  } else {
    algorithm <- "REML"
  }

  out <- list(
    "algorithm" = algorithm,
    "optimizer" = as.character(model@optinfo$optimizer)
  )

  return(out)
}


#' @export
find_algorithm.stanreg <- function(model, ...) {
  # Sampler settings are stored in the underlying stanfit object
  info <- model$stanfit@sim

  out <- list(
    "algorithm" = model$algorithm,
    "chains" = info$chains,
    "iterations" = info$iter,
    "warmup" = info$warmup
  )

  return(out)
}

brms 2.8.0

"Introduce mvbind to eventually replace cbind in the formula syntax of multivariate models."

Might affect find_response() / get_response(), is_multivariate() etc.

Systematic testing

This night I was thinking about a more systematic and convenient way of testing our functions, and how it was tedious to come up with particular models, fitting them every time in tests and so on.

Then I had the idea of the circus package, basically a repo that would be used only for storing all varieties and particular cases of models. We could just download this package in testing, and run the tests on all the models to see if the functions behave as expected. This could also help with the time-related issues for testing Bayesian models, as the fitting would already be done. We could quickly cover all the models, fitted once and for all.

Do you think this would work?

Check for subset where we get data with eval

Wherever we try to get the data from the environment with eval(x$call...), we should also check whether there is an x$subset. If so, we need to subset the data returned by the eval-call.
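A minimal base R sketch of the idea for an lm object, where the subset expression is stored in the call (m$call$subset). Other model classes may store it elsewhere, so the accessor used here is an assumption that holds for this example only.

```r
m <- lm(mpg ~ wt, data = mtcars, subset = cyl == 4)

# Recover the full data from the environment of the model formula
env <- environment(formula(m))
dat <- eval(m$call$data, envir = env)

# Apply the subset expression from the call, if there is one
subset_expr <- m$call$subset
if (!is.null(subset_expr)) {
  keep <- eval(subset_expr, envir = dat, enclos = env)
  # Logical subsets are converted to row indices, which also drops NAs
  if (is.logical(keep)) keep <- which(keep)
  dat <- dat[keep, , drop = FALSE]
}

nrow(mtcars)
#> [1] 32
nrow(dat)
#> [1] 11
```

Without the subset step, the recovered data would have 32 rows although only 11 were used in the fit.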

Vignettes

I just discovered that adding new models needs indeed a bit more detailed description, so I would rather like to include vignettes in milestone 0.2

find_ and get_parameters

Re-writing some code in bayestestR, I realised it would be quite convenient to have another consistent stable extraction function, related to the parameters:

find_parameters <- function(model) {
  return(names(coef(model)))
}

get_parameters <- function(model) {
  return(as.data.frame(model)[find_parameters(model)])
}

For Bayesian models, it would return the posterior of the model's parameters. However, for frequentist models, I am still unsure whether it should return only the betas or other metrics (such as SE). I am leaning toward the first option.

One of the purposes of this function would be to unify the different coef, fixef options that work differently (or do not work at all) depending on the model. For instance, brms throws an error when coef is used, in favour of fixef. However, using these in turn requires importing packages (brms or lme4), which we would like to avoid...

PS: Also, side question, what are the reasons to remove the explicit return() at the end of functions?

Using ENHANCE in DESCRIPTION?

Since we enhance existing model classes with new functions, we could (additionally?) add the related packages to the Enhances field of the DESCRIPTION file, if we like. Would you say that this makes sense?

Check if find-functions need to clean variable names

Ropensci and JOSS

What about submitting this to JOSS or Ropensci + JOSS?

Run tests only locally?

For this package, I suggest using comprehensive tests to check the functions against the many different modelling packages. However, this would require adding those packages to the Suggests field, which runs counter to our approach of keeping dependencies to a minimum.

Should we add the tests to .Rbuildignore, so that they are only run locally, but not on CRAN?
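As an alternative to excluding the tests entirely, each test could guard itself and run only when the required package is installed. The helper below is a hypothetical base-R sketch of that idea (run_if_available is not an existing function); testthat offers skip_if_not_installed() and skip_on_cran() for the same purpose.

```r
# Hypothetical helper: evaluate `code` only if `pkg` is installed.
# `code` is lazily evaluated, so it is never run when the package is missing.
run_if_available <- function(pkg, code) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    message("Skipping: package '", pkg, "' is not installed.")
    return(invisible(FALSE))
  }
  force(code)
  invisible(TRUE)
}

# Runs, because 'stats' ships with R.
ran <- run_if_available("stats", stopifnot(is.function(stats::lm)))

# Skipped: the error inside the block is never triggered.
skipped <- run_if_available("aPackageThatDoesNotExist", stop("never reached"))
```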

anova support

Note for the future: we should think about aov/anova support.

Function behaviour

Please check this "demo":
http://rpubs.com/sjPlot/insight-glmmtmb

What should be the defaults? Lists or character vectors as return value?
What should be the defaults: All values, conditional model only etc. (so what would be useful defaults for effects and component)?

We have to check all functions carefully and make sure that no value is accidentally returned where none is to be expected (e.g. no "zi"-component for linear (mixed) models).
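One way to guarantee that no unexpected component shows up is to drop empty elements from the result before returning it. A small sketch, where drop_empty is a hypothetical helper name and the component names are only examples:

```r
# Hypothetical helper: remove NULL and zero-length elements from a list.
drop_empty <- function(x) {
  keep <- vapply(x, function(el) length(el) > 0, logical(1))
  x[keep]
}

# A linear model has no random or zero-inflation component.
res <- list(
  conditional = c("wt", "cyl"),
  random = NULL,
  zero_inflated = character(0)
)
names(drop_empty(res))
```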

gam parameters: what to include?

rstanarm, when fitting gamm4-like models, returns a number of "parameters" related to the smooth term (in the form s(smoothterm).1...9), and I am wondering whether to include these K components in the "parameters" / predictors list. This is also the case for mgcv, for which find_parameters returns the individual smooth parameters, whereas the summary reports a single "general" smooth term (more interesting for users).

> insight::find_parameters(mgcv::gam(Petal.Length ~ s(Petal.Width), data=iris))
$`conditional`
 [1] "(Intercept)"      "s(Petal.Width).1" "s(Petal.Width).2" "s(Petal.Width).3" "s(Petal.Width).4"
 [6] "s(Petal.Width).5" "s(Petal.Width).6" "s(Petal.Width).7" "s(Petal.Width).8" "s(Petal.Width).9"
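If the individual basis coefficients were to be collapsed into one entry per smooth term, a sketch could look like this. It works on the character vector shown above and assumes the naming pattern "s(term).<number>"; the function name and the returned element names are made up for illustration.

```r
# Hypothetical helper: separate parametric terms from smooth terms and
# collapse the numbered basis coefficients into one name per smooth.
split_smooth_terms <- function(params) {
  is_smooth <- grepl("^s\\(.*\\)\\.[0-9]+$", params)
  list(
    conditional = params[!is_smooth],
    smooth_terms = unique(sub("\\.[0-9]+$", "", params[is_smooth]))
  )
}

params <- c("(Intercept)", paste0("s(Petal.Width).", 1:9))
split_smooth_terms(params)
```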

R-version dependency

I just saw that I simply copied the dependency R >= 3.2 - but is this really necessary? Can't we just reduce that dependency even further?

Functions list

This list of functions is becoming neat and consistent ☺️

I was just wondering about these two options:

Rename link_fun to link_function
Add the info from find_terms to model_info(); this last function would be a shortcut for finding everything this package provides. Sauron's One Ring 💍

Tests

OK, here is a list of model functions or package names for which I think we still need tests (or more tests), just so we don't forget anything:

Any important models I forgot or you would like to add?

Check "component"-argument for hurdle and zeroinfl packages

return constants in terms?

library(insight)
library(nlme)

data(Ovary)
m1 <- gls(follicles ~ sin(2*pi*Time) + cos(2*pi*Time), Ovary,
          correlation = corAR1(form = ~ 1 | Mare))

find_terms(m1)
#> $response
#> [1] "follicles"
#> 
#> $conditional
#> [1] "pi"   "Time"

Should pi be returned or not?
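If constants should not be returned, one option is to keep only those names from the formula that are actual columns of the data. The sketch below uses a small made-up data frame instead of the Ovary data, and the helper name find_variables_in_data is hypothetical.

```r
# Hypothetical helper: keep only formula variables that exist in the data.
find_variables_in_data <- function(formula, data) {
  intersect(all.vars(formula), names(data))
}

f <- follicles ~ sin(2 * pi * Time) + cos(2 * pi * Time)
d <- data.frame(follicles = c(20, 15, 19), Time = c(-0.1, 0, 0.1))

all.vars(f)                   # includes the constant "pi"
find_variables_in_data(f, d)  # "pi" is dropped, it is not a column of d
```

A data column that happens to be called pi would still be kept, which is arguably the desired behaviour.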

get_priors

In the same vein, I have struggled in the past with clean priors extraction. A function to do that would be a nice and useful addition. I started drafting one for stan, but it has to be thoroughly tested and adjusted to potential edge-cases.

#' @export
get_priors <- function(model, ...) {
  UseMethod("get_priors")
}

#' @export
get_priors.stanreg <- function(model, ...) {
  # Prior information as stored in the fitted stanreg object
  info <- model$prior.info

  # Intercept
  df <- .priors_to_df(info$prior_intercept)
  df$parameter <- "(Intercept)"

  # Priors on the remaining coefficients (everything but the intercept)
  priors <- .priors_to_df(info$prior)
  priors$parameter <- tail(insight::find_parameters(model)$conditional, -1)
  df <- rbind(df, priors[names(priors) %in% names(df)])

  # Auxiliary parameter (e.g. sigma)
  aux <- .priors_to_df(info$prior_aux)
  aux$parameter <- aux$aux_name
  df <- rbind(df, aux[names(aux) %in% names(df)])

  # Put "parameter" first and use more readable column names
  df <- df[c("parameter", names(df)[names(df) != "parameter"])]
  names(df) <- gsub("dist", "distribution", names(df))
  names(df) <- gsub("df", "DoF", names(df))
  return(df)
}

#' @keywords internal
.priors_to_df <- function(priors){
  max_length <- max(sapply(priors, length))
  for(i in names(priors)){
    if(length(priors[[i]]) < max_length){
      if(is.null(priors[[i]])){
        priors[[i]] <- NA
      }
      priors[[i]] <- rep_len(priors[[i]], max_length)
    }
  }
  if(max_length == 1){
    priors <- as.data.frame(t(sapply(priors, c)))
  }else{
    priors <- as.data.frame(sapply(priors, c))
  }

  return(priors)
}

model <- rstanarm::stan_glm(Sepal.Width ~ Species * Petal.Length, data=iris)
get_priors(model)

parameter	distribution	location	scale	adjusted_scale	DoF
(Intercept)	normal	0	10	4.35866284936698	NA
Speciesversicolor	normal	0	2.5	1.08966571234175	NA
Speciesvirginica	normal	0	2.5	1.08966571234175	NA
Petal.Length	normal	0	2.5	0.617270040728345	NA
Speciesversicolor:Petal.Length	normal	0	2.5	0.536028320794122	NA
Speciesvirginica:Petal.Length	normal	0	2.5	0.411970489526739	NA
sigma	exponential	NA	NA	0.435866284936698	NA

get_variances()

The internals of r2() in the performance package could go into this package as an officially exported function. The different variance components are useful in different contexts, so they might fit into this package.

find_parameters() for Stan-models should remove more parameters by default

We could think about removing some more parameters by default for Stan-models:

grep("^(prior_|sd_|cor_|lp__|smooth_sd)", x$Parameter)
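To illustrate what this pattern would remove, here is a small example on a made-up vector of parameter names (the names are invented in the style of brms output, not taken from a fitted model):

```r
# Made-up parameter names, for illustration only.
params <- c(
  "b_Intercept", "b_wt", "sigma",
  "sd_group__Intercept", "cor_group__Intercept__wt",
  "prior_b", "lp__", "smooth_sd[1]"
)

# Drop everything that matches the proposed pattern.
drop <- grepl("^(prior_|sd_|cor_|lp__|smooth_sd)", params)
params[!drop]
```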

bug model_info.brmsfit

Seems like:

library(brms)

model <- brms::brm(mpg ~ wt + cyl, data = mtcars)
insight::model_info(model)

returns NULL 😕

return values for aovlist

@DominiqueMakowski Currently, insight returns the elements $within and $between for aovlist-objects (i.e. anova with error term).

I don't work with aov(), so I'm not sure if this makes sense or not, and it "breaks" current behaviour, where we return $conditional, $random etc. But from my perspective, this information from aovlist (within/between) seems useful, and I think we can make an exception for the names of the returned elements here.

model_info: is_probit

While I am thinking about model type description, an $is_probit shortcut could also be useful for those using probit models 🙊
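For models that expose a family object, such a flag could be derived from the link function. A minimal sketch, assuming stats::family() works for the model class; is_probit_sketch is a hypothetical name, not the package's implementation.

```r
# Hypothetical helper: TRUE if the model uses a probit link.
is_probit_sketch <- function(model) {
  fam <- tryCatch(stats::family(model), error = function(e) NULL)
  !is.null(fam) && identical(fam$link, "probit")
}

m_probit <- glm(am ~ wt, data = mtcars, family = binomial(link = "probit"))
m_logit <- glm(am ~ wt, data = mtcars, family = binomial(link = "logit"))

is_probit_sketch(m_probit)
is_probit_sketch(m_logit)
```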