
parameters


Describe and understand your model’s parameters!

parameters’ primary goal is to provide utilities for processing the parameters of various statistical models (see here for a list of supported models). Beyond computing p-values, CIs, Bayesian indices and other measures for a wide variety of models, this package implements features like bootstrapping of parameters and models, feature reduction (feature extraction and variable selection), and tools for data reduction, such as functions to perform cluster, factor or principal component analysis.

Another important goal of the parameters package is to facilitate and streamline the process of reporting results of statistical models, which includes the easy and intuitive calculation of standardized estimates as well as robust standard errors and p-values. parameters therefore offers a simple and unified syntax to process a large variety of (model) objects from many different packages.

Installation


Type        | Source     | Command
------------|------------|--------------------------------------------------------------------------
Release     | CRAN       | install.packages("parameters")
Development | r-universe | install.packages("parameters", repos = "https://easystats.r-universe.dev")
Development | GitHub     | remotes::install_github("easystats/parameters")

Tip

Instead of library(parameters), use library(easystats). This will make all features of the easystats ecosystem available.

To stay updated, use easystats::install_latest().

Documentation

Check out the package documentation, the easystats blog, and the package vignettes.

Contributing and Support

In case you want to file an issue or contribute in another way to the package, please follow this guide. For questions about the functionality, you may either contact us via email or file an issue.

Features

Model parameters description

The model_parameters() function (which can also be accessed via the parameters() shortcut) allows you to extract the parameters and their characteristics from various models in a consistent way. It can be considered a lightweight alternative to broom::tidy(), with some notable differences:

  • The column names of the returned data frame are specific to their content. For instance, the column containing the statistic is named after that statistic, i.e., t, z, etc., instead of a generic name such as statistic (however, you can get standardized (generic) column names using standardize_names(); see the sketch after this list).
  • It is able to compute or extract indices not available by default, such as p-values, CIs, etc.
  • It includes feature engineering capabilities, such as bootstrapping of parameters.
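
A minimal sketch of the renaming mentioned above, assuming standardize_names() is available in your session (it is re-exported from the insight package):

library(parameters)

model <- lm(Sepal.Width ~ Petal.Length, data = iris)
params <- model_parameters(model)
# Map the content-specific names (Coefficient, t, ...) back to broom-style
# generic names (estimate, statistic, ...)
standardize_names(params, style = "broom")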

Classical Regression Models

model <- lm(Sepal.Width ~ Petal.Length * Species + Petal.Width, data = iris)

# regular model parameters
model_parameters(model)
#> Parameter                           | Coefficient |   SE |         95% CI | t(143) |      p
#> -------------------------------------------------------------------------------------------
#> (Intercept)                         |        2.89 | 0.36 | [ 2.18,  3.60] |   8.01 | < .001
#> Petal Length                        |        0.26 | 0.25 | [-0.22,  0.75] |   1.07 | 0.287 
#> Species [versicolor]                |       -1.66 | 0.53 | [-2.71, -0.62] |  -3.14 | 0.002 
#> Species [virginica]                 |       -1.92 | 0.59 | [-3.08, -0.76] |  -3.28 | 0.001 
#> Petal Width                         |        0.62 | 0.14 | [ 0.34,  0.89] |   4.41 | < .001
#> Petal Length × Species [versicolor] |       -0.09 | 0.26 | [-0.61,  0.42] |  -0.36 | 0.721 
#> Petal Length × Species [virginica]  |       -0.13 | 0.26 | [-0.64,  0.38] |  -0.50 | 0.618

# standardized parameters
model_parameters(model, standardize = "refit")
#> Parameter                           | Coefficient |   SE |         95% CI | t(143) |      p
#> -------------------------------------------------------------------------------------------
#> (Intercept)                         |        3.59 | 1.30 | [ 1.01,  6.17] |   2.75 | 0.007 
#> Petal Length                        |        1.07 | 1.00 | [-0.91,  3.04] |   1.07 | 0.287 
#> Species [versicolor]                |       -4.62 | 1.31 | [-7.21, -2.03] |  -3.53 | < .001
#> Species [virginica]                 |       -5.51 | 1.38 | [-8.23, -2.79] |  -4.00 | < .001
#> Petal Width                         |        1.08 | 0.24 | [ 0.59,  1.56] |   4.41 | < .001
#> Petal Length × Species [versicolor] |       -0.38 | 1.06 | [-2.48,  1.72] |  -0.36 | 0.721 
#> Petal Length × Species [virginica]  |       -0.52 | 1.04 | [-2.58,  1.54] |  -0.50 | 0.618

# heteroscedasticity-consistent SE and CI
model_parameters(model, vcov = "HC3")
#> Parameter                           | Coefficient |   SE |         95% CI | t(143) |      p
#> -------------------------------------------------------------------------------------------
#> (Intercept)                         |        2.89 | 0.43 | [ 2.03,  3.75] |   6.66 | < .001
#> Petal Length                        |        0.26 | 0.29 | [-0.30,  0.83] |   0.92 | 0.357 
#> Species [versicolor]                |       -1.66 | 0.53 | [-2.70, -0.62] |  -3.16 | 0.002 
#> Species [virginica]                 |       -1.92 | 0.77 | [-3.43, -0.41] |  -2.51 | 0.013 
#> Petal Width                         |        0.62 | 0.12 | [ 0.38,  0.85] |   5.23 | < .001
#> Petal Length × Species [versicolor] |       -0.09 | 0.29 | [-0.67,  0.48] |  -0.32 | 0.748 
#> Petal Length × Species [virginica]  |       -0.13 | 0.31 | [-0.73,  0.48] |  -0.42 | 0.675

Mixed Models

library(lme4)
model <- lmer(Sepal.Width ~ Petal.Length + (1 | Species), data = iris)

# model parameters with CI, df and p-values based on Wald approximation
model_parameters(model)
#> # Fixed Effects
#> 
#> Parameter    | Coefficient |   SE |       95% CI | t(146) |      p
#> ------------------------------------------------------------------
#> (Intercept)  |        2.00 | 0.56 | [0.89, 3.11] |   3.56 | < .001
#> Petal Length |        0.28 | 0.06 | [0.16, 0.40] |   4.75 | < .001
#> 
#> # Random Effects
#> 
#> Parameter               | Coefficient |   SE |       95% CI
#> -----------------------------------------------------------
#> SD (Intercept: Species) |        0.89 | 0.46 | [0.33, 2.43]
#> SD (Residual)           |        0.32 | 0.02 | [0.28, 0.35]

# model parameters with CI, df and p-values based on Kenward-Roger approximation
model_parameters(model, ci_method = "kenward", effects = "fixed")
#> # Fixed Effects
#> 
#> Parameter    | Coefficient |   SE |       95% CI |    t |     df |      p
#> -------------------------------------------------------------------------
#> (Intercept)  |        2.00 | 0.57 | [0.07, 3.93] | 3.53 |   2.67 | 0.046 
#> Petal Length |        0.28 | 0.06 | [0.16, 0.40] | 4.58 | 140.98 | < .001

Structural Models

Besides the many supported regression models and packages, model_parameters() also works for other types of models, such as structural models (EFA, CFA, SEM, …).

library(psych)

model <- psych::fa(attitude, nfactors = 3)
model_parameters(model)
#> # Rotated loadings from Factor Analysis (oblimin-rotation)
#> 
#> Variable   |  MR1  |  MR2  |  MR3  | Complexity | Uniqueness
#> ------------------------------------------------------------
#> rating     | 0.90  | -0.07 | -0.05 |    1.02    |    0.23   
#> complaints | 0.97  | -0.06 | 0.04  |    1.01    |    0.10   
#> privileges | 0.44  | 0.25  | -0.05 |    1.64    |    0.65   
#> learning   | 0.47  | 0.54  | -0.28 |    2.51    |    0.24   
#> raises     | 0.55  | 0.43  | 0.25  |    2.35    |    0.23   
#> critical   | 0.16  | 0.17  | 0.48  |    1.46    |    0.67   
#> advance    | -0.11 | 0.91  | 0.07  |    1.04    |    0.22   
#> 
#> The 3 latent factors (oblimin rotation) accounted for 66.60% of the total variance of the original data (MR1 = 38.19%, MR2 = 22.69%, MR3 = 5.72%).

Variable and parameters selection

select_parameters() can help you quickly select and retain the most relevant predictors using methods tailored for the model type.

lm(disp ~ ., data = mtcars) |>
  select_parameters() |>
  model_parameters()
#> Parameter   | Coefficient |     SE |            95% CI | t(26) |      p
#> -----------------------------------------------------------------------
#> (Intercept) |      141.70 | 125.67 | [-116.62, 400.02] |  1.13 | 0.270 
#> cyl         |       13.14 |   7.90 | [  -3.10,  29.38] |  1.66 | 0.108 
#> hp          |        0.63 |   0.20 | [   0.22,   1.03] |  3.18 | 0.004 
#> wt          |       80.45 |  12.22 | [  55.33, 105.57] |  6.58 | < .001
#> qsec        |      -14.68 |   6.14 | [ -27.31,  -2.05] | -2.39 | 0.024 
#> carb        |      -28.75 |   5.60 | [ -40.28, -17.23] | -5.13 | < .001

Citation

In order to cite this package, please use the following command:

citation("parameters")
To cite package 'parameters' in publications use:

  Lüdecke D, Ben-Shachar M, Patil I, Makowski D (2020). "Extracting,
  Computing and Exploring the Parameters of Statistical Models using
  R." _Journal of Open Source Software_, *5*(53), 2445.
  doi:10.21105/joss.02445 <https://doi.org/10.21105/joss.02445>.

A BibTeX entry for LaTeX users is

  @Article{,
    title = {Extracting, Computing and Exploring the Parameters of Statistical Models using {R}.},
    volume = {5},
    doi = {10.21105/joss.02445},
    number = {53},
    journal = {Journal of Open Source Software},
    author = {Daniel Lüdecke and Mattan S. Ben-Shachar and Indrajeet Patil and Dominique Makowski},
    year = {2020},
    pages = {2445},
  }

Code of Conduct

Please note that the parameters project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.



parameters's Issues

describe_distribution

A wrapper around find_distribution() that would also return its confidence, as well as kurtosis, skewness and point estimates.
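
A hedged usage sketch of such a wrapper (in the current easystats ecosystem, describe_distribution() lives in the datawizard package and returns, among other indices, skewness and kurtosis):

library(datawizard)

# Point estimates plus distributional indices for a numeric vector
describe_distribution(rnorm(100))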

Remove dplyr deps

Why would we need standardize.grouped_df()? I think the results of standardizing won't differ whether or not the data frame is grouped. Or do you have any counter-examples?

parameters_reduction

One of the most popular features of psycho was (quite unexpectedly, TBH) n_factors, which helps in deciding how many factors to retain in FAs. As I plan in the (distant) future to recenter that package around psychology-related functions, I am thinking about reimplementing (and improving) this function in easystats.

Now that we have parameters_selection here, and that we moved the PCA function from performance to this package, I think it would make sense to expand the set of functions around PCA/FA (as well as supporting EFA/CFA/PCA objects from different packages via model_parameters).

These functions could later be used for a parameters_reduction function that would "reduce" the predictor space using different methods.

check_factorstructure: improve name and print

  • I am slowly adding a new set of features in parameters around dimensionality reduction (including support for PCA / FA etc.)
  • In this context, I added the two popular tests for checking whether the data are suitable for FA
  • I would like to have a master function that returns the output of these two checks "at once"
  • A function simply running these two checks gives an ugly collated output
  • How can we improve it? (A usage sketch follows below.)

Originally posted by @DominiqueMakowski in #52
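
A hedged usage sketch of such a master function, as it is exposed in current parameters (check_factorstructure() runs the KMO measure and Bartlett's test of sphericity together):

library(parameters)

# Both suitability checks at once, with a combined print output
check_factorstructure(mtcars)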

n_factors: improvements

Hi @SachaEpskamp ☺️

Some (long) time ago, upon your review of the psycho package, you mentioned adding other methods, such as EGA, to the n_factors function, which aims at aggregating a lot of methods together.

I recently updated, re-implemented and hopefully improved this function in this package (see a small demo here). I added the EGA methods using the EGAnet package.

The implementation is rather simple, but the package fails quite often (I didn't manage to run the bootEGA from the example). I am not sure whether it is a bug or related to the provided data.

Anyway, I was hoping you could tell us what you think of this function; does the implementation look reasonable to you? Are you aware of any other methods/procedures that we could add to it? Do not hesitate to share any thoughts and comments :) Thanks
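
A hedged usage sketch, assuming n_factors() selects the aggregated methods via a package argument that can include "EGAnet" (as in current parameters):

library(parameters)

# Aggregate methods from several packages, including the EGA-based ones
n_factors(mtcars, package = c("nFactors", "EGAnet"))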

Less dependencies

purrr is used in only one location, but my quick base solutions don't work: it seems they do not preserve the column types (for some reason, they transform factors into numerics).
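
A hedged sketch of a type-preserving base replacement: lapply() plus do.call(rbind, ...) keeps each column's class, whereas sapply()/apply() coerce everything to a common type, which is presumably what turned the factors into numerics.

# Fit one model per species and row-bind the results; the Species column
# stays a factor throughout
split_fits <- lapply(split(iris, iris$Species), function(d) {
  data.frame(
    Species = unique(d$Species),
    Slope   = coef(lm(Sepal.Width ~ Petal.Length, data = d))[2]
  )
})
do.call(rbind, split_fits)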

For broom, this will require a bit more work... we can maybe keep it at first (to support many models), and then, progressively, replace it here and there until complete removal.

There is still the dplyr case... I think we could keep it for now and discuss whether to keep it later, once the package is a bit more in shape.

parameters_selection for lme4: MuMIn's dredge()

Using MuMIn's dredge() could be interesting, BUT it throws unnecessary warnings and requires setting na.action globally or in the model fit, even though there are no NAs.

library(lme4)
library(MuMIn)

model <- lmer(Sepal.Width ~ Sepal.Length * Petal.Width * Petal.Length + (1 | Species), data = iris, na.action = na.fail)  # dredge() needs THE MODEL to be fitted with this na.action

best <- summary(MuMIn::get.models(MuMIn::dredge(model), 1)[[1]])

https://github.com/cran/MuMIn/blob/master/R/dredge.R

Maybe it could be reimplemented?

Different methods for model parameters standardization

Currently, model standardization does the full standardization, meaning that it directly standardizes the outcome and predictors (omitting factors and binary variables) and then refits the model.

While this is IMHO the most meaningful method to obtain standardized coefs (which, for numeric predictors, can be interpreted as how an increase of 1 SD in the predictor changes the outcome in SD units), it is also slow (as it requires refitting the model), which can be a problem, for instance in Bayesian analyses.

Although a posteriori standardization of parameters does not always make sense (especially for factors and interaction effects), it would be good to allow for different (and faster) standardization methods, such as the simple scaling of parameters based on the outcome's SD (see the sketch after the references).

I am not sure what design would be the best, and several different methods exist.

Some references:

  • Bring, J. (1994). How to standardize regression coefficients. The American Statistician, 48(3), 209-213.
  • Menard, S. (2004). Six approaches to calculating standardized logistic regression coefficients. The American Statistician, 58(3), 218-223.
  • Gelman, A. (2008). Scaling regression inputs by dividing by two standard deviations. Statistics in medicine, 27(15), 2865-2873.
  • Schielzeth, H. (2010). Simple means to improve the interpretability of regression coefficients. Methods in Ecology and Evolution, 1(2), 103-113.
  • Menard, S. (2011). Standards for standardized logistic regression coefficients. Social Forces, 89(4), 1409-1428.
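
A hedged sketch of the fast, refit-free approach mentioned above: rescale each slope by SD(x) / SD(y). This is only meaningful for numeric predictors, and it is just one of the several methods discussed in the references.

model <- lm(Sepal.Width ~ Petal.Length + Petal.Width, data = iris)

coefs <- coef(model)[-1]                 # drop the intercept
sd_x  <- sapply(iris[names(coefs)], sd)  # SD of each numeric predictor
sd_y  <- sd(iris$Sepal.Width)            # SD of the outcome
coefs * sd_x / sd_y                      # post-hoc standardized slopes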

Coloured coefs

The more I think about it, the more I believe it would be awesome to have a coloured parameters table, in particular green/red for the estimate depending on its direction... it would become much more readable.

@strengejacke you mentioned that such colour printing inspired by crayon was easy to implement... do you have any code snippet/example so I could experiment with it?
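
A hedged snippet of the kind of crayon-based colouring asked about (crayon::green() and crayon::red() are real; the wrapper itself is only an illustration to experiment with):

library(crayon)

# Print a coefficient in green if positive, in red if negative
colour_coef <- function(x) {
  txt <- formatC(x, format = "f", digits = 2)
  if (x >= 0) green(txt) else red(txt)
}
cat(colour_coef(0.62), colour_coef(-1.66), "\n")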

Remove find_distribution

After thinking about it, I think I'll remove find_distribution and the associated machine learning model.

First of all, because it's a bit out of scope. But the true reason is that I've been trying to wrap my head around it and it would probably need its own package to be done right. In fact, classifying distributions might be more interesting and relevant using a step-by-step process (or ensemble of models).

For instance, a first model could try finding the "number of peaks" and discriminating between uniform, unimodal and multimodal distributions. Then, if the distribution is unimodal, it would try discriminating between symmetric and non-symmetric (skewed) distributions. Depending on that, the next step would be to refine the classification (for symmetric distributions, segregating normal, Cauchy, t, etc.; for non-symmetric ones, beta, gamma, exponential and whatnot). For multimodal distributions, it could use some form of mixture modelling.

Anyway, doing it right is beyond my abilities and the scope of this package :)
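
A hedged, runnable toy sketch of the step-by-step idea described above (not parameters API, just an illustration of the decision tree):

classify_distribution <- function(x) {
  d <- density(x)
  # Local maxima of the estimated density serve as a crude "number of peaks"
  peaks <- sum(diff(sign(diff(d$y))) == -2)
  if (peaks > 1) return("multimodal (try mixture modelling)")
  skew <- mean((x - mean(x))^3) / sd(x)^3  # sample skewness
  if (abs(skew) < 0.5) {
    "unimodal, roughly symmetric (normal / Cauchy / t, ...)"
  } else {
    "unimodal, skewed (beta / gamma / exponential, ...)"
  }
}

classify_distribution(rgamma(500, shape = 2))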

Improve format_parameters for nested models

Factors within interactions are correctly captured and nicely formatted, but this fails for some levels in nested models (when a parameter of a component doesn't appear alone); here, Speciessetosa should be transformed to Species (setosa):

library(parameters)
model <- lm(Sepal.Length ~ Species * Petal.Length, data = iris)
format_parameters(model)
#> [1] "(Intercept)"                        
#> [2] "Species (versicolor)"               
#> [3] "Species (virginica)"                
#> [4] "Petal.Length"                       
#> [5] "Species (versicolor) * Petal.Length"
#> [6] "Species (virginica) * Petal.Length"

model <- lm(Sepal.Length ~ Species / Petal.Length, data = iris)
format_parameters(model)
#> [1] "(Intercept)"                        
#> [2] "Species (versicolor)"               
#> [3] "Species (virginica)"                
#> [4] "Speciessetosa * Petal.Length"       
#> [5] "Species (versicolor) * Petal.Length"
#> [6] "Species (virginica) * Petal.Length"

Created on 2019-08-03 by the reprex package (v0.3.0)

In general, it would be good to capture nested parameters to differentiate them from interactions (for instance, Petal.Length in Species (versicolor) or Species (versicolor): Petal.Length instead of Species (versicolor) * Petal.Length), although I am not sure how to do it, as the names of the parameters are similar (separated by :). We might have to understand the pattern underlying it...

consistent naming

not sure how we decide on this, but model_parameters() currently has following column names:

Parameter beta SE CI_low CI_high t p Std_beta

Do we want to harmonize this a bit?

CRAN v2 roadmap

Features

Plan

  • generalise p values
  • generalise ci
  • Check that it works for rstanarm and brms models
  • Check that it works for frequentist models

Optional (possibly for v2)

  • se for standard errors #29
  • Marginal coefs
  • Conditional equivalence

Other

  • testing
  • documentation

model_parameters for subcomponents (random, ...)

@strengejacke As you can see, it differs from broom by giving more flexibility, computing more things, and also being less "generic" (e.g., the column names directly refer to what they represent, i.e., Mean, Median, instead of "estimate"). This last point is a bad thing from a programming perspective (as you need to adapt to the input), but I believe it is a good thing for users (improved clarity and explicitness). In particular, it was designed this way for straightforward future compatibility with report (which will paste column names and values into a textual report). Hence the use of title case: although it is not conventional for programming, it is usually the way we format tables for reporting in papers...

What do you think of the current design direction?

model_parameters and p_value

we might (partially) extract the code from p_value() for the different models I added, to write the related model_parameters() methods, and then just call model_parameters() to get the p-values.

parameters_selection or variables_selection

Currently the function is named parameters_selection, for consistency with others (e.g. parameters_standardize) and because of the packagename_* convention that is sometimes used.

However, this aspect is, I think, better known as "variable selection" (or feature selection).

Same goes for the future parameters_reduction that I will soon start to think about.

Should we change these functions to variables_*? Keep both (variables_* as aliases)? Something else?

Warning: bad markup (extra space?)

I have these warnings during the checks:

Duration: 2m 12.9s

> checking whether package 'parameters' can be installed ... WARNING
  See below...

> checking Rd files ... WARNING
  prepare_Rd: bad markup (extra space?) at describe_distribution.Rd:13:131
  prepare_Rd: bad markup (extra space?) at model_parameters.stanreg.Rd:16:131
  prepare_Rd: bad markup (extra space?) at model_parameters.stanreg.Rd:24:135
  prepare_Rd: bad markup (extra space?) at model_parameters.stanreg.Rd:26:79
  prepare_Rd: bad markup (extra space?) at model_parameters.stanreg.Rd:30:161

0 errors v | 2 warnings x | 0 notes v
Error: R CMD check found WARNINGs
Execution halted

Asked on roxygen's repo: r-lib/roxygen2#866

print() method for diagnostic / describe posterior

describe_posterior() and diagnostic_posterior() are very similar to sjstats::tidy_stan(), but what is missing is a nice print method. Compare the current printing in bayestestR with sjstats::tidy_stan(); the latter would be nice to have in bayestestR as well.

library(bayestestR)
library(sjstats)

m2 <- insight::download_model("brms_zi_2")

# current
diagnostic_posterior(m2, effects = "all", component = "all")
#>                     Parameter  ESS      Rhat        MCSE
#> 1                    b_camper 2724 0.9999710 0.002876512
#> 2                     b_child 1089 1.0015550 0.004902069
#> 3                 b_Intercept  562 1.0091945 0.015863127
#> 4                   b_persons  382 1.0103310 0.004391830
#> 5                 b_zi_camper 2277 1.0005747 0.021860957
#> 6                  b_zi_child 2322 1.0011371 0.017525259
#> 7              b_zi_Intercept  845 1.0005454 0.031346016
#> 8      r_persons.1.Intercept.  572 1.0085952 0.004216279
#> 9      r_persons.2.Intercept.  691 1.0076532 0.003427134
#> 10     r_persons.3.Intercept.  340 1.0105217 0.003251703
#> 11     r_persons.4.Intercept.  287 1.0112452 0.009202861
#> 12 r_persons__zi.1.Intercept.  811 1.0011321 0.030257181
#> 13 r_persons__zi.2.Intercept.  759 1.0012496 0.029842693
#> 14 r_persons__zi.3.Intercept.  871 1.0009854 0.028159589
#> 15 r_persons__zi.4.Intercept.  912 0.9997436 0.028118818

# nice to have
tidy_stan(m2, type = "all")
#> 
#> # Summary Statistics of Stan-Model
#> 
#> ## Conditional Model: Fixed effects
#> 
#>            estimate std.error ci.lvl      HDI(89%) ratio rhat mcse
#>  Intercept    -0.84      0.28     89 [-1.44 -0.29]  0.14 1.01 0.02
#>  persons       0.84      0.09     89 [-1.29 -0.98]  0.10 1.01 0.01
#>  child        -1.15      0.09     89 [ 0.58  0.89]  0.27 1.00 0.00
#>  camper        0.73      0.09     89 [-1.93  0.52]  0.68 1.00 0.00
#> 
#> ## Conditional Model: Random effect (Intercept: persons)
#> 
#>            estimate std.error ci.lvl      HDI(89%) ratio rhat mcse
#>  persons.1    -0.01      0.10     89 [-0.38  0.28]  0.14 1.01 0.01
#>  persons.2     0.02      0.09     89 [-0.17  0.30]  0.17 1.01 0.01
#>  persons.3    -0.02      0.08     89 [-0.26  0.18]  0.08 1.01 0.01
#>  persons.4     0.00      0.09     89 [-0.32  0.33]  0.07 1.01 0.01
#> 
#> ## Zero-Inflated Model: Fixed effects
#> 
#>            estimate std.error ci.lvl      HDI(89%) ratio rhat mcse
#>  Intercept    -0.64      0.71     89 [ 0.66  1.06]  0.21    1 0.03
#>  child         1.88      0.32     89 [ 1.40  2.43]  0.58    1 0.01
#>  camper       -0.83      0.36     89 [-1.41 -0.24]  0.57    1 0.01
#> 
#> ## Zero-Inflated Model: Random effect (Intercept: persons)
#> 
#>            estimate std.error ci.lvl      HDI(89%) ratio rhat mcse
#>  persons.1     1.28      0.78     89 [ 0.08  2.70]  0.20    1 0.03
#>  persons.2     0.25      0.68     89 [-0.90  1.57]  0.19    1 0.03
#>  persons.3    -0.18      0.71     89 [-1.51  1.01]  0.22    1 0.03
#>  persons.4    -1.29      0.74     89 [-2.62 -0.01]  0.23    1 0.03

Created on 2019-05-19 by the reprex package (v0.2.1)

Discrepancy standardize_parameters()

I think these three methods should return the same results, however, standardize_parameters() differs (for factors).

library(lm.beta)
library(sjstats)
library(parameters)
#> 
#> Attaching package: 'parameters'
#> The following object is masked from 'package:sjstats':
#> 
#>     p_value

data(iris)
m <- lm(Sepal.Length ~ Species + Petal.Width, data = iris)

lm.beta(m)
#> 
#> Call:
#> lm(formula = Sepal.Length ~ Species + Petal.Width, data = iris)
#> 
#> Standardized Coefficients::
#>       (Intercept) Speciesversicolor  Speciesvirginica       Petal.Width 
#>        0.00000000       -0.03441674       -0.02860860        0.84401156

std_beta(m)
#>                term std.estimate std.error   conf.low conf.high
#> 1 Speciesversicolor  -0.03441674 0.1316090 -0.2923655 0.2235321
#> 2  Speciesvirginica  -0.02860860 0.2046162 -0.4296489 0.3724317
#> 3       Petal.Width   0.84401156 0.1784474  0.4942611 1.1937620

standardize_parameters(m, method = "full", robust = FALSE)
#>           Parameter Std_Estimate
#> 1       (Intercept)           NA
#> 2 Speciesversicolor  -0.07276516
#> 3  Speciesvirginica  -0.06048538
#> 4       Petal.Width   0.84401156

Created on 2019-05-01 by the reprex package (v0.2.1)

"Automatic" predictors selection

We could think about the possibility/sense/feasibility of a cross-model wrapper for variable selection, using projpred for Bayesian models or the new cAIC4 for frequentist mixed models.

Different p values for report() to_fulltabable() and model_parameters()

Hi guys,

I found different p-values for an lmer model for Time2 (in bold); which one should I rely on?

Cheers :)

Parameter       | Coefficient |   SE | CI_low | CI_high |     t |    p | Std_Coefficient |    Fit
-------------------------------------------------------------------------------------------------
(Intercept)     |        3.62 | 0.37 |  -3.47 |   10.71 |  9.75 | 0.00 |            0.27 |       
Condition_ASync |       -0.47 | 0.22 |   0.45 |   -1.39 | -2.13 | 0.03 |           -0.32 |       
**Time2**       |       -0.19 | 0.23 |   0.18 |   -0.55 | -0.83 | **0.41** |           -0.13 |       
                |             |      |        |         |       |      |                 |       
AIC             |             |      |        |         |       |      |                 | 192.40
BIC             |             |      |        |         |       |      |                 | 202.79
R2_conditional  |             |      |        |         |       |      |                 |   0.70
R2_marginal     |             |      |        |         |       |      |                 |   0.03
ICC             |             |      |        |         |       |      |                 |   0.69
RMSE            |             |      |        |         |       |      |                 |   0.72

> model_performance(fit1)
       AIC      BIC R2_conditional R2_marginal      ICC     RMSE
1 192.3982 202.7859      0.7017558  0.02693898 0.693499 0.716923
> model_parameters(fit1, effsize = "cohen1988")
        Parameter Coefficient        SE     CI_low    CI_high          t            p Std_Coefficient
1     (Intercept)   3.6170389 0.3708574 -3.4722271 10.7063049  9.7531787 1.787809e-22       0.2679454
2 Condition_ASync  -0.4695803 0.2200682  0.4507802 -1.3899408 -2.1337941 3.285964e-02      -0.3185324
3           **Time2**  -0.1868707 0.2264781  0.1793892 -0.5531306 -0.8251162 **4.093056e-01**      -0.1267608

Coefficient -> Beta

Since a CRAN release is close, I wanted to have your opinion about something:

We decided here to change from beta (lowercase) to Coefficient (for the column name of the parameters table).

However, since we usually try to stick to APA norms, which recommend using the beta symbol to report slopes, I wondered whether we should change it to Beta (title case, for consistency).

Thoughts?

Conditional Equivalence Testing (CET)

Follow up on here

Following equivalence_test() in bayestestR, we might think about implementing a similar method for the frequentist framework. See this paper: https://doi.org/10.1371/journal.pone.0195145

From the paper:

  • Step 1- Calculate a (1 − α1)% Confidence Interval for θ.
  • Step 2- If this C.I. excludes θ0, then declare a positive result. Otherwise, if θ0 is within the C.I., proceed to Step 3.
  • Step 3- Calculate a (1 − 2α2)% Confidence Interval for θ.
  • Step 4- If this C.I. is entirely within δ, declare a negative result. Otherwise, proceed to Step 5.
  • Step 5- Declare an inconclusive result. There is insufficient evidence to support any conclusion.
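
A hedged, runnable sketch of these steps for a single coefficient of a linear model, with θ0 = 0 (the alpha levels and the equivalence bound delta below are illustrative choices, not taken from the paper):

# Conditional equivalence test for one parameter of a fitted model
cet <- function(model, parameter, delta, alpha1 = 0.05, alpha2 = 0.05) {
  ci1 <- confint(model, parameter, level = 1 - alpha1)      # Step 1
  if (ci1[1] > 0 || ci1[2] < 0) return("positive")          # Step 2
  ci2 <- confint(model, parameter, level = 1 - 2 * alpha2)  # Step 3
  if (ci2[1] > -delta && ci2[2] < delta) return("negative") # Step 4
  "inconclusive"                                            # Step 5
}

m <- lm(Sepal.Width ~ Petal.Length, data = iris)
cet(m, "Petal.Length", delta = 0.1)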

Get standard errors from models

In many cases, we can get these from summary(), but not for all models. insight::get_parameters() might help to figure this out for different models.

But we should include a function like se() or maybe std_error() or whatever, because we then have all we need for:

  • coefficients
  • se
  • p-value
  • ci

and thus we have a lightweight alternative to broom.
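
A hedged sketch of the summary()-based case mentioned above; it works for models whose summary() exposes a coefficient matrix, while a general se()/std_error() would have to dispatch per model class:

m <- lm(Sepal.Width ~ Petal.Length, data = iris)

# The standard errors are the "Std. Error" column of the coefficient matrix
summary(m)$coefficients[, "Std. Error"]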

function names

seeing how things develop, I wonder if it would make sense to have all parameters functions starting by parameters (similarly to performance). For example, parameters_standardization(), parameters_selection(), parameters_bootstrap(), parameters_marginal() etc.

Print-method for Model-Summary

Ok, I have implemented the new insight::print_parameters() in sjstats::tidy_stan(). Here's an example. Usually, it's pretty straightforward to print, unless you need some special preparation; in this case, I print multiple HDIs side by side (something we could think about doing by default in bayestestR as well?):

And by the way, tidy_stan() might be something we could re-implement in bayestestR as well; any thoughts?

library(sjstats)
library(easystats)
m <- download_model("brms_zi_3")
tidy_stan(m, prob = c(.5, .89, .95))
#> 
#> Summary Statistics of Stan-Model
#> 
#> # Fixed effects (conditional) 
#> 
#>    Parameter Estimate Std.Error      HDI(50%)      HDI(89%)      HDI(95%) ESS Rhat MCSE
#>  (Intercept)     1.32      0.75 [ 1.19  2.11] [ 0.05  2.27] [-0.59  2.95]  78 1.00 0.10
#>        child    -1.16      0.11 [-1.25 -1.11] [-1.32 -0.98] [-1.40 -0.97] 172 1.00 0.01
#>       camper     0.73      0.09 [ 0.64  0.76] [ 0.59  0.86] [ 0.59  0.92] 233 1.00 0.01
#> 
#> 
#> # Fixed effects (zero-inflated) 
#> 
#>    Parameter Estimate Std.Error      HDI(50%)      HDI(89%)      HDI(95%) ESS Rhat MCSE
#>  (Intercept)    -0.78      0.63 [-1.39 -0.57] [-1.89  0.22] [-2.31  0.61]  92 1.00 0.08
#>        child     1.89      0.29 [ 1.72  2.10] [ 1.30  2.30] [ 1.25  2.48]  72 1.01 0.04
#>       camper    -0.84      0.30 [-1.05 -0.64] [-1.34 -0.23] [-1.50 -0.23] 182 1.00 0.04
#> 
#> 
#> # Random effects (conditional) Intercept: persons
#> 
#>  Parameter Estimate Std.Error      HDI(50%)      HDI(89%)      HDI(95%) ESS Rhat MCSE
#>  persons.1    -1.32      0.69 [-1.69 -0.78] [-2.55 -0.03] [-3.24  0.30]  80 1.00 0.06
#>  persons.2    -0.38      0.73 [-1.12 -0.17] [-1.45  1.01] [-1.90  1.59]  78 1.01 0.10
#>  persons.3     0.31      0.75 [-0.43  0.53] [-0.73  1.59] [-1.41  2.13]  77 1.00 0.06
#>  persons.4     1.21      0.74 [ 0.42  1.33] [ 0.29  2.54] [-0.69  2.87]  78 1.00 0.09
#> 
#> 
#> # Random effects (zero-inflated) Intercept: persons
#> 
#>  Parameter Estimate Std.Error      HDI(50%)      HDI(89%)      HDI(95%) ESS Rhat MCSE
#>  persons.1     1.35      0.73 [ 0.97  1.91] [ 0.37  2.66] [-0.09  2.81]  91 1.00 0.08
#>  persons.2     0.38      0.58 [-0.02  0.73] [-0.73  1.49] [-0.93  1.93]  99 1.00 0.07
#>  persons.3    -0.12      0.62 [-0.39  0.39] [-1.16  1.13] [-1.50  1.42]  94 1.00 0.07
#>  persons.4    -1.17      0.60 [-1.48 -0.71] [-2.46 -0.06] [-2.92  0.16] 113 1.00 0.07

Originally posted by @strengejacke in easystats/bayestestR#74 (comment)

Return value for p_value

I would suggest returning a data frame (one column "Parameter", one column "P" or so), which is more consistent with our other functions.
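
A hedged sketch of the suggested return shape (the values are made up for illustration):

# One row per model parameter, mirroring our other data-frame returns
data.frame(
  Parameter = c("(Intercept)", "Petal.Length"),
  p = c(0.001, 0.023)
)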

degrees of freedom: DoF or df?

I've seen a trend (also in other languages such as Julia) toward naming degrees of freedom dof instead of df (my guess is that it's to avoid confusion with dataframe). However, in APA tables it's usually reported as df.

Should we rename the column to df?

docs that need polishing

Improve find_distribution model

Increase performance:

  • Model comparison, try different models
  • Number of distribution peaks obtained through a gradient
  • Normalize density y axis, but standardize x axis

Decrease size:

  • Feature selection
  • Optimize the saving and reading methods (look for the possibility of saving only the coefs matrix)
