easystats / correlation
Methods for Correlation Analysis
Home Page: https://easystats.github.io/correlation/
License: Other
Since we have several different types of correlations for which p values are not straightforward, it would be nice to include a method for bootstrapped CIs. Any advice?
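One possible starting point is a plain percentile bootstrap over (x, y) pairs, sketched here in base R. This is only an illustration under simple assumptions (independent observations, percentile intervals), and `boot_cor_ci` is a hypothetical helper name, not something that exists in the package.

```r
# Percentile bootstrap CI for a correlation coefficient (base R only).
# `boot_cor_ci` is a hypothetical helper, not part of the correlation package.
boot_cor_ci <- function(x, y, method = "spearman", n_boot = 2000, ci = 0.95) {
  # keep only complete pairs
  cc <- stats::complete.cases(x, y)
  x <- x[cc]
  y <- y[cc]
  n <- length(x)

  # resample (x, y) pairs with replacement and recompute the coefficient
  boots <- replicate(n_boot, {
    idx <- sample.int(n, n, replace = TRUE)
    stats::cor(x[idx], y[idx], method = method)
  })

  # percentile interval from the bootstrap distribution
  alpha <- (1 - ci) / 2
  limits <- stats::quantile(boots, probs = c(alpha, 1 - alpha),
                            na.rm = TRUE, names = FALSE)

  c(estimate = stats::cor(x, y, method = method),
    CI_low = limits[1],
    CI_high = limits[2])
}

set.seed(123)
res <- boot_cor_ci(iris$Sepal.Length, iris$Petal.Length, method = "spearman")
res
```

Because the helper only relies on `stats::cor()`, the same resampling loop would work for any coefficient that can be computed from two vectors; other interval types (e.g. BCa) would need more machinery.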
I am trying to use correlation in my package and had to use it with purrr, and discovered this weird behavior. I am not sure what the source of this error is, though.
Works as expected.
# setup
set.seed(123)
library(tidyverse)
# creating a list of dataframes
df_ls1 <- iris %>%
split(x = ., f = .$Species, drop = TRUE)
# running function of interest
purrr::pmap(list(df_ls1), correlation::correlation)
#> $setosa
#> Parameter1 | Parameter2 | r | t | df | p | 95% CI | Method | n_Obs
#> -----------------------------------------------------------------------------------------
#> Sepal.Length | Sepal.Width | 0.74 | 7.68 | 48 | < .001 | [ 0.59, 0.85] | Pearson | 50
#> Sepal.Length | Petal.Length | 0.27 | 1.92 | 48 | 0.202 | [-0.01, 0.51] | Pearson | 50
#> Sepal.Length | Petal.Width | 0.28 | 2.01 | 48 | 0.202 | [ 0.00, 0.52] | Pearson | 50
#> Sepal.Width | Petal.Length | 0.18 | 1.25 | 48 | 0.217 | [-0.11, 0.43] | Pearson | 50
#> Sepal.Width | Petal.Width | 0.23 | 1.66 | 48 | 0.208 | [-0.05, 0.48] | Pearson | 50
#> Petal.Length | Petal.Width | 0.33 | 2.44 | 48 | 0.093 | [ 0.06, 0.56] | Pearson | 50
#>
#> $versicolor
#> Parameter1 | Parameter2 | r | t | df | p | 95% CI | Method | n_Obs
#> ----------------------------------------------------------------------------------------
#> Sepal.Length | Sepal.Width | 0.53 | 4.28 | 48 | < .001 | [0.29, 0.70] | Pearson | 50
#> Sepal.Length | Petal.Length | 0.75 | 7.95 | 48 | < .001 | [0.60, 0.85] | Pearson | 50
#> Sepal.Length | Petal.Width | 0.55 | 4.52 | 48 | < .001 | [0.32, 0.72] | Pearson | 50
#> Sepal.Width | Petal.Length | 0.56 | 4.69 | 48 | < .001 | [0.33, 0.73] | Pearson | 50
#> Sepal.Width | Petal.Width | 0.66 | 6.15 | 48 | < .001 | [0.47, 0.80] | Pearson | 50
#> Petal.Length | Petal.Width | 0.79 | 8.83 | 48 | < .001 | [0.65, 0.87] | Pearson | 50
#>
#> $virginica
#> Parameter1 | Parameter2 | r | t | df | p | 95% CI | Method | n_Obs
#> -----------------------------------------------------------------------------------------
#> Sepal.Length | Sepal.Width | 0.46 | 3.56 | 48 | 0.003 | [0.20, 0.65] | Pearson | 50
#> Sepal.Length | Petal.Length | 0.86 | 11.90 | 48 | < .001 | [0.77, 0.92] | Pearson | 50
#> Sepal.Length | Petal.Width | 0.28 | 2.03 | 48 | 0.048 | [0.00, 0.52] | Pearson | 50
#> Sepal.Width | Petal.Length | 0.40 | 3.03 | 48 | 0.012 | [0.14, 0.61] | Pearson | 50
#> Sepal.Width | Petal.Width | 0.54 | 4.42 | 48 | < .001 | [0.31, 0.71] | Pearson | 50
#> Petal.Length | Petal.Width | 0.32 | 2.36 | 48 | 0.045 | [0.05, 0.55] | Pearson | 50
# data with NAs
# creating a list of dataframes
df_ls2 <- ggplot2::msleep %>%
split(x = ., f = .$vore, drop = TRUE)
# running function of interest
purrr::pmap(list(df_ls2), correlation::correlation)
#> Error in rbind(deparse.level, ...): numbers of columns of arguments do not match
Created on 2020-03-20 by the reprex package (v0.3.0)
devtools::session_info()
#> - Session info ---------------------------------------------------------------
#> setting value
#> version R Under development (unstable) (2020-02-28 r77874)
#> os Windows 10 x64
#> system x86_64, mingw32
#> ui RTerm
#> language (EN)
#> collate English_United States.1252
#> ctype English_United States.1252
#> tz Europe/Berlin
#> date 2020-03-20
#>
#> - Packages -------------------------------------------------------------------
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.0)
#> backports 1.1.5 2019-10-02 [1] CRAN (R 4.0.0)
#> bayestestR 0.5.2.1 2020-03-16 [1] Github (easystats/bayestestR@6ee7e37)
#> broom 0.5.3.9000 2020-03-01 [1] Github (tidymodels/broom@3c922d5)
#> callr 3.4.2 2020-02-12 [1] CRAN (R 4.0.0)
#> cellranger 1.1.0 2016-07-27 [1] CRAN (R 4.0.0)
#> cli 2.0.2 2020-02-28 [1] CRAN (R 4.0.0)
#> colorspace 1.4-1 2019-03-18 [1] CRAN (R 4.0.0)
#> correlation 0.1.0 2020-03-17 [1] Github (easystats/correlation@c1c35b0)
#> crayon 1.3.4 2017-09-16 [1] CRAN (R 4.0.0)
#> DBI 1.1.0 2019-12-15 [1] CRAN (R 4.0.0)
#> dbplyr 1.4.2 2019-06-17 [1] CRAN (R 4.0.0)
#> desc 1.2.0 2018-05-01 [1] CRAN (R 4.0.0)
#> devtools 2.2.2 2020-02-17 [1] CRAN (R 4.0.0)
#> digest 0.6.25 2020-02-23 [1] CRAN (R 4.0.0)
#> dplyr * 0.8.5 2020-03-07 [1] CRAN (R 4.0.0)
#> effectsize 0.2.0.1 2020-03-06 [1] Github (easystats/effectsize@64bfbc3)
#> ellipsis 0.3.0 2019-09-20 [1] CRAN (R 4.0.0)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.0)
#> fansi 0.4.1 2020-01-08 [1] CRAN (R 4.0.0)
#> forcats * 0.5.0 2020-03-01 [1] CRAN (R 4.0.0)
#> fs 1.3.2 2020-03-05 [1] CRAN (R 4.0.0)
#> generics 0.0.2 2018-11-29 [1] CRAN (R 4.0.0)
#> ggplot2 * 3.3.0 2020-03-05 [1] CRAN (R 4.0.0)
#> glue 1.3.2 2020-03-12 [1] CRAN (R 4.0.0)
#> gtable 0.3.0 2019-03-25 [1] CRAN (R 4.0.0)
#> haven 2.2.0 2019-11-08 [1] CRAN (R 4.0.0)
#> highr 0.8 2019-03-20 [1] CRAN (R 4.0.0)
#> hms 0.5.3 2020-01-08 [1] CRAN (R 4.0.0)
#> htmltools 0.4.0 2019-10-04 [1] CRAN (R 4.0.0)
#> httr 1.4.1 2019-08-05 [1] CRAN (R 4.0.0)
#> insight 0.8.2.1 2020-03-16 [1] Github (easystats/insight@e0b229b)
#> jsonlite 1.6.1 2020-02-02 [1] CRAN (R 4.0.0)
#> knitr 1.28 2020-02-06 [1] CRAN (R 4.0.0)
#> lifecycle 0.2.0.9000 2020-03-16 [1] Github (r-lib/lifecycle@355dcba)
#> lubridate 1.7.4 2018-04-11 [1] CRAN (R 4.0.0)
#> magrittr 1.5 2014-11-22 [1] CRAN (R 4.0.0)
#> memoise 1.1.0 2017-04-21 [1] CRAN (R 4.0.0)
#> modelr 0.1.6 2020-02-22 [1] CRAN (R 4.0.0)
#> munsell 0.5.0 2018-06-12 [1] CRAN (R 4.0.0)
#> parameters 0.6.0 2020-03-12 [1] CRAN (R 4.0.0)
#> pillar 1.4.3 2019-12-20 [1] CRAN (R 4.0.0)
#> pkgbuild 1.0.6 2019-10-09 [1] CRAN (R 4.0.0)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.0.0)
#> pkgload 1.0.2 2018-10-29 [1] CRAN (R 4.0.0)
#> prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.0.0)
#> processx 3.4.2 2020-02-09 [1] CRAN (R 4.0.0)
#> ps 1.3.2 2020-02-13 [1] CRAN (R 4.0.0)
#> purrr * 0.3.3 2019-10-18 [1] CRAN (R 4.0.0)
#> R6 2.4.1 2019-11-12 [1] CRAN (R 4.0.0)
#> Rcpp 1.0.4 2020-03-17 [1] CRAN (R 4.0.0)
#> readr * 1.3.1 2018-12-21 [1] CRAN (R 4.0.0)
#> readxl 1.3.1 2019-03-13 [1] CRAN (R 4.0.0)
#> remotes 2.1.1 2020-02-15 [1] CRAN (R 4.0.0)
#> reprex 0.3.0 2019-05-16 [1] CRAN (R 4.0.0)
#> rlang 0.4.5 2020-03-01 [1] CRAN (R 4.0.0)
#> rmarkdown 2.1 2020-01-20 [1] CRAN (R 4.0.0)
#> rprojroot 1.3-2 2018-01-03 [1] CRAN (R 4.0.0)
#> rvest 0.3.5 2019-11-08 [1] CRAN (R 4.0.0)
#> scales 1.1.0 2019-11-18 [1] CRAN (R 4.0.0)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.0)
#> stringi 1.4.6 2020-02-17 [1] CRAN (R 4.0.0)
#> stringr * 1.4.0 2019-02-10 [1] CRAN (R 4.0.0)
#> testthat 2.3.2 2020-03-02 [1] CRAN (R 4.0.0)
#> tibble * 2.1.3 2019-06-06 [1] CRAN (R 4.0.0)
#> tidyr * 1.0.2 2020-01-24 [1] CRAN (R 4.0.0)
#> tidyselect 1.0.0 2020-01-27 [1] CRAN (R 4.0.0)
#> tidyverse * 1.3.0 2019-11-21 [1] CRAN (R 4.0.0)
#> usethis 1.5.1.9000 2020-03-18 [1] Github (r-lib/usethis@8c32c73)
#> vctrs 0.2.4 2020-03-10 [1] CRAN (R 4.0.0)
#> withr 2.1.2 2018-03-15 [1] CRAN (R 4.0.0)
#> xfun 0.12 2020-01-13 [1] CRAN (R 4.0.0)
#> xml2 1.2.5 2020-03-11 [1] CRAN (R 4.0.0)
#> yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.0)
#>
#> [1] C:/Users/inp099/Documents/R/win-library/4.0
#> [2] C:/Program Files/R/R-devel/library
I remember mentioning this somewhere, but I'll rephrase it here for future reference:
Sometimes we want to correlate x and y, where x is, for instance, the mean score of some measure for an individual (e.g., the average reaction time in condition A). This value is sometimes accompanied by some measure of variability (or uncertainty), for instance the SD. We might want to take this information into account in the correlation, so that observations of x that are more precise (with lower associated variability) have more weight than observations with large uncertainty.
One alternative is to use weighted correlation:
cov.wt(cor = TRUE)
psych::cor.wt
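As a rough sketch of the first option, `stats::cov.wt()` returns a weighted correlation matrix when called with `cor = TRUE`. The inverse-variance weighting used here is an assumption for illustration (other weighting schemes are possible), and the data and the `x_sd` variable are simulated.

```r
set.seed(123)
n <- 50
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)
x_sd <- runif(n, 0.1, 2)  # hypothetical per-observation uncertainty of x

# inverse-variance weights (an assumption; precise observations weigh more)
w <- 1 / x_sd^2
w <- w / sum(w)

# weighted covariance and correlation matrix of the two variables
weighted <- stats::cov.wt(cbind(x, y), wt = w, cor = TRUE)
r_weighted <- weighted$cor["x", "y"]

# unweighted Pearson correlation for comparison
r_unweighted <- stats::cor(x, y)

c(weighted = r_weighted, unweighted = r_unweighted)
```

With equal weights the weighted coefficient reduces to the ordinary Pearson correlation, which is a convenient sanity check.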
"JOSS submissions are suspended until at least 4th May 2020"
Another alternative could be JORS (never tried), which has publication fees (£400.00) but says "If you do not have funds to pay such fees, you will have an opportunity to waive each fee. We do not want fees to prevent the publication of worthy work." (bottom of here). That could apply here, as easystats (as the mother-project) has no funding. What do you say?
Here you are:
#' @importFrom stats na.omit
.factor_to_numeric <- function(x, lowest = NULL) {
if (is.numeric(x)) {
return(x)
}
if (anyNA(suppressWarnings(as.numeric(as.character(stats::na.omit(x)))))) {
if (is.character(x)) {
x <- as.factor(x)
}
levels(x) <- 1:nlevels(x)
}
out <- as.numeric(as.character(x))
if (!is.null(lowest)) {
difference <- min(out) - lowest
out <- out - difference
}
out
}
#' @importFrom stats dnorm qnorm complete.cases sd
own_biserial <- function(x, y) {
cc <- stats::complete.cases(x, y)
x <- x[cc]
y <- y[cc]
y <- .factor_to_numeric(y, lowest = 0)
m1 <- mean(x[y == 1])
m0 <- mean(x[y == 0])
sn <- stats::sd(x)
q <- mean(y)
p <- 1 - q
zp <- stats::dnorm(stats::qnorm(q))
(((m1 - m0) * (p * q / zp)) / sd(x))
}
set.seed(123)
y <- rbinom(100, 1, .3)
x <- rnorm(100)
own_biserial(x, y)
#> [1] 0.08155037
psych::biserial(x, y)
#> [,1]
#> [1,] 0.08155037
set.seed(456)
y <- rbinom(100, 1, .3)
x <- rnorm(100)
own_biserial(x, y)
#> [1] 0.02964972
psych::biserial(x, y)
#> [,1]
#> [1,] 0.02964972
Created on 2020-03-23 by the reprex package (v0.3.0)
Originally posted by @strengejacke in #55
This workflow used to work with the CRAN version of dplyr. But I updated to the development version of dplyr and it no longer seems to work. I can't seem to trace the source of this error.
correlation
set.seed(123)
library(tidyverse)
iris %>%
split(., .$Species) %>%
map(., ~broom::tidy(stats::lm(formula = Sepal.Length ~ Sepal.Width, data = .x))) %>%
purrr::map_dfr(., tibble::as_tibble)
#> # A tibble: 6 x 5
#> term estimate std.error statistic p.value
#> * <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 (Intercept) 2.64 0.310 8.51 3.74e-11
#> 2 Sepal.Width 0.690 0.0899 7.68 6.71e-10
#> 3 (Intercept) 3.54 0.563 6.29 9.07e- 8
#> 4 Sepal.Width 0.865 0.202 4.28 8.77e- 5
#> 5 (Intercept) 3.91 0.757 5.16 4.66e- 6
#> 6 Sepal.Width 0.902 0.253 3.56 8.43e- 4
correlation function
(ls <-
iris %>%
split(., .$Species) %>%
map(., correlation::correlation))
#> $setosa
#> Parameter1 | Parameter2 | r | 95% CI | t | df | p | Method | n_Obs
#> -----------------------------------------------------------------------------------------
#> Sepal.Length | Sepal.Width | 0.74 | [ 0.59, 0.85] | 7.68 | 48 | < .001 | Pearson | 50
#> Sepal.Length | Petal.Length | 0.27 | [-0.01, 0.51] | 1.92 | 48 | 0.202 | Pearson | 50
#> Sepal.Length | Petal.Width | 0.28 | [ 0.00, 0.52] | 2.01 | 48 | 0.202 | Pearson | 50
#> Sepal.Width | Petal.Length | 0.18 | [-0.11, 0.43] | 1.25 | 48 | 0.217 | Pearson | 50
#> Sepal.Width | Petal.Width | 0.23 | [-0.05, 0.48] | 1.66 | 48 | 0.208 | Pearson | 50
#> Petal.Length | Petal.Width | 0.33 | [ 0.06, 0.56] | 2.44 | 48 | 0.093 | Pearson | 50
#>
#> $versicolor
#> Parameter1 | Parameter2 | r | 95% CI | t | df | p | Method | n_Obs
#> ----------------------------------------------------------------------------------------
#> Sepal.Length | Sepal.Width | 0.53 | [0.29, 0.70] | 4.28 | 48 | < .001 | Pearson | 50
#> Sepal.Length | Petal.Length | 0.75 | [0.60, 0.85] | 7.95 | 48 | < .001 | Pearson | 50
#> Sepal.Length | Petal.Width | 0.55 | [0.32, 0.72] | 4.52 | 48 | < .001 | Pearson | 50
#> Sepal.Width | Petal.Length | 0.56 | [0.33, 0.73] | 4.69 | 48 | < .001 | Pearson | 50
#> Sepal.Width | Petal.Width | 0.66 | [0.47, 0.80] | 6.15 | 48 | < .001 | Pearson | 50
#> Petal.Length | Petal.Width | 0.79 | [0.65, 0.87] | 8.83 | 48 | < .001 | Pearson | 50
#>
#> $virginica
#> Parameter1 | Parameter2 | r | 95% CI | t | df | p | Method | n_Obs
#> -----------------------------------------------------------------------------------------
#> Sepal.Length | Sepal.Width | 0.46 | [0.20, 0.65] | 3.56 | 48 | 0.003 | Pearson | 50
#> Sepal.Length | Petal.Length | 0.86 | [0.77, 0.92] | 11.90 | 48 | < .001 | Pearson | 50
#> Sepal.Length | Petal.Width | 0.28 | [0.00, 0.52] | 2.03 | 48 | 0.048 | Pearson | 50
#> Sepal.Width | Petal.Length | 0.40 | [0.14, 0.61] | 3.03 | 48 | 0.012 | Pearson | 50
#> Sepal.Width | Petal.Width | 0.54 | [0.31, 0.71] | 4.42 | 48 | < .001 | Pearson | 50
#> Petal.Length | Petal.Width | 0.32 | [0.05, 0.55] | 2.36 | 48 | 0.045 | Pearson | 50
purrr::map_dfr(ls, tibble::as_tibble)
#> Error in (function (x = list(), n = NULL, ..., class = NULL) : formal argument "n" matched by multiple actual arguments
Created on 2020-03-24 by the reprex package (v0.3.0)
devtools::session_info()
#> - Session info ---------------------------------------------------------------
#> setting value
#> version R Under development (unstable) (2020-02-28 r77874)
#> os Windows 10 x64
#> system x86_64, mingw32
#> ui RTerm
#> language (EN)
#> collate English_United States.1252
#> ctype English_United States.1252
#> tz Europe/Berlin
#> date 2020-03-24
#>
#> - Packages -------------------------------------------------------------------
#> package * version date lib
#> assertthat 0.2.1 2019-03-21 [1]
#> backports 1.1.5 2019-10-02 [1]
#> bayestestR 0.5.2.1 2020-03-16 [1]
#> broom 0.5.3.9000 2020-03-01 [1]
#> callr 3.4.2 2020-02-12 [1]
#> cellranger 1.1.0 2016-07-27 [1]
#> cli 2.0.2 2020-02-28 [1]
#> colorspace 1.4-1 2019-03-18 [1]
#> correlation 0.1.1 2020-03-21 [1]
#> crayon 1.3.4 2017-09-16 [1]
#> DBI 1.1.0 2019-12-15 [1]
#> dbplyr 1.4.2 2019-06-17 [1]
#> desc 1.2.0 2018-05-01 [1]
#> devtools 2.2.2 2020-02-17 [1]
#> digest 0.6.25 2020-02-23 [1]
#> dplyr * 0.8.99.9002 2020-03-23 [1]
#> effectsize 0.3.0 2020-03-22 [1]
#> ellipsis 0.3.0 2019-09-20 [1]
#> evaluate 0.14 2019-05-28 [1]
#> fansi 0.4.1 2020-01-08 [1]
#> forcats * 0.5.0 2020-03-01 [1]
#> fs 1.3.2 2020-03-05 [1]
#> generics 0.0.2 2018-11-29 [1]
#> ggplot2 * 3.3.0 2020-03-05 [1]
#> glue 1.3.2 2020-03-12 [1]
#> gtable 0.3.0 2019-03-25 [1]
#> haven 2.2.0 2019-11-08 [1]
#> highr 0.8 2019-03-20 [1]
#> hms 0.5.3 2020-01-08 [1]
#> htmltools 0.4.0 2019-10-04 [1]
#> httr 1.4.1 2019-08-05 [1]
#> insight 0.8.2.1 2020-03-22 [1]
#> jsonlite 1.6.1 2020-02-02 [1]
#> knitr 1.28 2020-02-06 [1]
#> lifecycle 0.2.0.9000 2020-03-16 [1]
#> lubridate 1.7.4 2018-04-11 [1]
#> magrittr 1.5 2014-11-22 [1]
#> memoise 1.1.0 2017-04-21 [1]
#> modelr 0.1.6 2020-02-22 [1]
#> munsell 0.5.0 2018-06-12 [1]
#> parameters 0.6.0 2020-03-12 [1]
#> pillar 1.4.3 2019-12-20 [1]
#> pkgbuild 1.0.6 2019-10-09 [1]
#> pkgconfig 2.0.3 2019-09-22 [1]
#> pkgload 1.0.2 2018-10-29 [1]
#> prettyunits 1.1.1 2020-01-24 [1]
#> processx 3.4.2 2020-02-09 [1]
#> ps 1.3.2 2020-02-13 [1]
#> purrr * 0.3.3 2019-10-18 [1]
#> R6 2.4.1 2019-11-12 [1]
#> Rcpp 1.0.4 2020-03-17 [1]
#> readr * 1.3.1 2018-12-21 [1]
#> readxl 1.3.1 2019-03-13 [1]
#> remotes 2.1.1 2020-02-15 [1]
#> reprex 0.3.0 2019-05-16 [1]
#> rlang 0.4.5.9000 2020-03-23 [1]
#> rmarkdown 2.1 2020-01-20 [1]
#> rprojroot 1.3-2 2018-01-03 [1]
#> rvest 0.3.5 2019-11-08 [1]
#> scales 1.1.0 2019-11-18 [1]
#> sessioninfo 1.1.1 2018-11-05 [1]
#> stringi 1.4.6 2020-02-17 [1]
#> stringr * 1.4.0 2019-02-10 [1]
#> testthat 2.3.2 2020-03-02 [1]
#> tibble * 2.99.99.9014 2020-03-23 [1]
#> tidyr * 1.0.2 2020-01-24 [1]
#> tidyselect 1.0.0 2020-01-27 [1]
#> tidyverse * 1.3.0 2019-11-21 [1]
#> usethis 1.5.1.9000 2020-03-24 [1]
#> utf8 1.1.4 2018-05-24 [1]
#> vctrs 0.2.99.9010 2020-03-23 [1]
#> withr 2.1.2 2018-03-15 [1]
#> xfun 0.12 2020-01-13 [1]
#> xml2 1.2.5 2020-03-11 [1]
#> yaml 2.2.1 2020-02-01 [1]
#> source
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> Github (easystats/bayestestR@6ee7e37)
#> Github (tidymodels/broom@3c922d5)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> Github (easystats/correlation@1fe04b9)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> Github (tidyverse/dplyr@35d3ace)
#> Github (easystats/effectsize@6f4d5a3)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> Github (easystats/insight@b46a9eb)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> Github (r-lib/lifecycle@355dcba)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> Github (r-lib/rlang@a90b04b)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> Github (tidyverse/tibble@96af653)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> Github (r-lib/usethis@01dbd8f)
#> CRAN (R 4.0.0)
#> Github (r-lib/vctrs@3675fdf)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#>
#> [1] C:/Users/inp099/Documents/R/win-library/4.0
#> [2] C:/Program Files/R/R-devel/library
Hello,
Thank you for the package.
Your package is working as expected in local shiny apps. But when I try to upload it to a shinyapps.io server, I get the following error:
installed from sources; Packrat will assume this package is available from a CRAN-like repository during future restores Execution halted
I think some metadata is missing from the package, as described in this post:
https://community.rstudio.com/t/error-when-using-devtools-install-github-with-shiny-for-private-repository/39053/2?u=serdarbalci
Best wishes
As JOSS reopens soon, @mattansb @strengejacke @IndrajeetPatil, the kings of equations, Gondor calls for aid!
It would be cool to add a few equations (you know, because science and all) where possible below the description of the different correlation types (in the paper and the docstrings).
Would love to see an option to remove the stars from the correlation output (could even be the default) to correspond to the ASA's recommendations to remove stars and other references to statistical significance. It also makes sense pedagogically that there should be an option to just report descriptive measures of correlation without reference to inference.
There is again a small legacy dplyr usage in correlation (in the correlation.R file) for grouped data frames that needs to be removed. Master @strengejacke, is it as straightforward as in the other cases?
I'd say as.table() should save the name (rho, r) as an attribute.
Originally posted by @strengejacke in easystats/see#65 (comment)
library(correlation)
library(WRS2)
df <- dplyr::select(ggplot2::msleep, c(sleep_rem, awake:bodywt))
set.seed(123)
pball(df, beta = 0.1)
#> Call:
#> pball(x = df, beta = 0.1)
#>
#> Robust correlation matrix:
#> sleep_rem awake brainwt bodywt
#> sleep_rem 1.0000 -0.7669 -0.3956 -0.4226
#> awake -0.7669 1.0000 0.5697 0.5303
#> brainwt -0.3956 0.5697 1.0000 0.8680
#> bodywt -0.4226 0.5303 0.8680 1.0000
#>
#> p-values:
#> sleep_rem awake brainwt bodywt
#> sleep_rem NA 0 0.00538 0.00069
#> awake 0.00000 NA 0.00000 0.00000
#> brainwt 0.00538 0 NA 0.00000
#> bodywt 0.00069 0 0.00000 NA
#>
#>
#> Test statistic H: Inf, p-value = 0
set.seed(123)
pball(df, beta = 0.5)
#> Call:
#> pball(x = df, beta = 0.5)
#>
#> Robust correlation matrix:
#> sleep_rem awake brainwt bodywt
#> sleep_rem 1.0000 -0.6882 -0.4020 -0.4009
#> awake -0.6882 1.0000 0.4959 0.4673
#> brainwt -0.4020 0.4959 1.0000 0.9466
#> bodywt -0.4009 0.4673 0.9466 1.0000
#>
#> p-values:
#> sleep_rem awake brainwt bodywt
#> sleep_rem NA 0e+00 0.00463 0.00136
#> awake 0.00000 NA 0.00010 0.00001
#> brainwt 0.00463 1e-04 NA 0.00000
#> bodywt 0.00136 1e-05 0.00000 NA
#>
#>
#> Test statistic H: Inf, p-value = 0
Correlation coefficients are identical for different betas.
set.seed(123)
correlation(df, method = "percentage", beta = 0.1)
#> Parameter1 | Parameter2 | r | t | df | p | 95% CI | Method | n_Obs
#> ------------------------------------------------------------------------------------------------
#> sleep_rem | awake | -0.75 | -8.79 | 59 | < .001 | [-0.84, -0.62] | Percentage Bend | 61
#> sleep_rem | brainwt | -0.41 | -3.04 | 46 | 0.004 | [-0.62, -0.14] | Percentage Bend | 48
#> sleep_rem | bodywt | -0.40 | -3.38 | 59 | 0.003 | [-0.59, -0.17] | Percentage Bend | 61
#> awake | brainwt | 0.59 | 5.36 | 54 | < .001 | [ 0.39, 0.74] | Percentage Bend | 56
#> awake | bodywt | 0.51 | 5.31 | 81 | < .001 | [ 0.33, 0.65] | Percentage Bend | 83
#> brainwt | bodywt | 0.92 | 16.71 | 54 | < .001 | [ 0.86, 0.95] | Percentage Bend | 56
set.seed(123)
correlation(df, method = "percentage", beta = 0.5)
#> Parameter1 | Parameter2 | r | t | df | p | 95% CI | Method | n_Obs
#> ------------------------------------------------------------------------------------------------
#> sleep_rem | awake | -0.75 | -8.79 | 59 | < .001 | [-0.84, -0.62] | Percentage Bend | 61
#> sleep_rem | brainwt | -0.41 | -3.04 | 46 | 0.004 | [-0.62, -0.14] | Percentage Bend | 48
#> sleep_rem | bodywt | -0.40 | -3.38 | 59 | 0.003 | [-0.59, -0.17] | Percentage Bend | 61
#> awake | brainwt | 0.59 | 5.36 | 54 | < .001 | [ 0.39, 0.74] | Percentage Bend | 56
#> awake | bodywt | 0.51 | 5.31 | 81 | < .001 | [ 0.33, 0.65] | Percentage Bend | 83
#> brainwt | bodywt | 0.92 | 16.71 | 54 | < .001 | [ 0.86, 0.95] | Percentage Bend | 56
Created on 2020-03-20 by the reprex package (v0.3.0)
devtools::session_info()
#> - Session info ---------------------------------------------------------------
#> setting value
#> version R Under development (unstable) (2020-02-28 r77874)
#> os Windows 10 x64
#> system x86_64, mingw32
#> ui RTerm
#> language (EN)
#> collate English_United States.1252
#> ctype English_United States.1252
#> tz Europe/Berlin
#> date 2020-03-20
#>
#> - Packages -------------------------------------------------------------------
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.0)
#> backports 1.1.5 2019-10-02 [1] CRAN (R 4.0.0)
#> bayestestR 0.5.2.1 2020-03-16 [1] Github (easystats/bayestestR@6ee7e37)
#> callr 3.4.2 2020-02-12 [1] CRAN (R 4.0.0)
#> cli 2.0.2 2020-02-28 [1] CRAN (R 4.0.0)
#> colorspace 1.4-1 2019-03-18 [1] CRAN (R 4.0.0)
#> correlation * 0.1.0 2020-03-17 [1] Github (easystats/correlation@c1c35b0)
#> crayon 1.3.4 2017-09-16 [1] CRAN (R 4.0.0)
#> desc 1.2.0 2018-05-01 [1] CRAN (R 4.0.0)
#> devtools 2.2.2 2020-02-17 [1] CRAN (R 4.0.0)
#> digest 0.6.25 2020-02-23 [1] CRAN (R 4.0.0)
#> dplyr 0.8.5 2020-03-07 [1] CRAN (R 4.0.0)
#> effectsize 0.2.0.1 2020-03-06 [1] Github (easystats/effectsize@64bfbc3)
#> ellipsis 0.3.0 2019-09-20 [1] CRAN (R 4.0.0)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.0)
#> fansi 0.4.1 2020-01-08 [1] CRAN (R 4.0.0)
#> fs 1.3.2 2020-03-05 [1] CRAN (R 4.0.0)
#> ggplot2 3.3.0 2020-03-05 [1] CRAN (R 4.0.0)
#> glue 1.3.2 2020-03-12 [1] CRAN (R 4.0.0)
#> gtable 0.3.0 2019-03-25 [1] CRAN (R 4.0.0)
#> highr 0.8 2019-03-20 [1] CRAN (R 4.0.0)
#> htmltools 0.4.0 2019-10-04 [1] CRAN (R 4.0.0)
#> insight 0.8.2.1 2020-03-16 [1] Github (easystats/insight@e0b229b)
#> knitr 1.28 2020-02-06 [1] CRAN (R 4.0.0)
#> lifecycle 0.2.0.9000 2020-03-16 [1] Github (r-lib/lifecycle@355dcba)
#> magrittr 1.5 2014-11-22 [1] CRAN (R 4.0.0)
#> MASS 7.3-51.5 2019-12-20 [2] CRAN (R 4.0.0)
#> mc2d 0.1-18 2017-03-06 [1] CRAN (R 4.0.0)
#> memoise 1.1.0 2017-04-21 [1] CRAN (R 4.0.0)
#> munsell 0.5.0 2018-06-12 [1] CRAN (R 4.0.0)
#> mvtnorm 1.1-0 2020-02-24 [1] CRAN (R 4.0.0)
#> parameters 0.6.0 2020-03-12 [1] CRAN (R 4.0.0)
#> pillar 1.4.3 2019-12-20 [1] CRAN (R 4.0.0)
#> pkgbuild 1.0.6 2019-10-09 [1] CRAN (R 4.0.0)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.0.0)
#> pkgload 1.0.2 2018-10-29 [1] CRAN (R 4.0.0)
#> plyr 1.8.6 2020-03-03 [1] CRAN (R 4.0.0)
#> prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.0.0)
#> processx 3.4.2 2020-02-09 [1] CRAN (R 4.0.0)
#> ps 1.3.2 2020-02-13 [1] CRAN (R 4.0.0)
#> purrr 0.3.3 2019-10-18 [1] CRAN (R 4.0.0)
#> R6 2.4.1 2019-11-12 [1] CRAN (R 4.0.0)
#> Rcpp 1.0.4 2020-03-17 [1] CRAN (R 4.0.0)
#> remotes 2.1.1 2020-02-15 [1] CRAN (R 4.0.0)
#> reshape 0.8.8 2018-10-23 [1] CRAN (R 4.0.0)
#> rlang 0.4.5 2020-03-01 [1] CRAN (R 4.0.0)
#> rmarkdown 2.1 2020-01-20 [1] CRAN (R 4.0.0)
#> rprojroot 1.3-2 2018-01-03 [1] CRAN (R 4.0.0)
#> scales 1.1.0 2019-11-18 [1] CRAN (R 4.0.0)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.0)
#> stringi 1.4.6 2020-02-17 [1] CRAN (R 4.0.0)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.0.0)
#> testthat 2.3.2 2020-03-02 [1] CRAN (R 4.0.0)
#> tibble 2.1.3 2019-06-06 [1] CRAN (R 4.0.0)
#> tidyselect 1.0.0 2020-01-27 [1] CRAN (R 4.0.0)
#> usethis 1.5.1.9000 2020-03-18 [1] Github (r-lib/usethis@8c32c73)
#> vctrs 0.2.4 2020-03-10 [1] CRAN (R 4.0.0)
#> withr 2.1.2 2018-03-15 [1] CRAN (R 4.0.0)
#> WRS2 * 1.0-0 2019-06-06 [1] CRAN (R 4.0.0)
#> xfun 0.12 2020-01-13 [1] CRAN (R 4.0.0)
#> yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.0)
#>
#> [1] C:/Users/inp099/Documents/R/win-library/4.0
#> [2] C:/Program Files/R/R-devel/library
Should we add a plot method in see for a correlation matrix obtained via summary() or as.table(), similar to the one made in the README using ggcorrplot? Can we do it with raw ggplot to avoid having another (conditional) dependency?
Note the following examples:
res <- correlation::correlation(mtcars, partial = TRUE)
res[res$Parameter1=="mpg",]
#> Parameter1 | Parameter2 | r | t | df | p | 95% CI | Method | n_Obs
#> ---------------------------------------------------------------------------------------
#> mpg | cyl | -0.02 | -0.13 | 30 | 1.000 | [-0.37, 0.33] | Pearson | 32
#> mpg | disp | 0.16 | 0.89 | 30 | 1.000 | [-0.20, 0.48] | Pearson | 32
#> mpg | hp | -0.21 | -1.18 | 30 | 1.000 | [-0.52, 0.15] | Pearson | 32
#> mpg | drat | 0.10 | 0.58 | 30 | 1.000 | [-0.25, 0.44] | Pearson | 32
#> mpg | wt | -0.39 | -2.34 | 30 | 1.000 | [-0.65, -0.05] | Pearson | 32
#> mpg | qsec | 0.24 | 1.34 | 30 | 1.000 | [-0.12, 0.54] | Pearson | 32
#> mpg | vs | 0.03 | 0.18 | 30 | 1.000 | [-0.32, 0.38] | Pearson | 32
#> mpg | am | 0.26 | 1.46 | 30 | 1.000 | [-0.10, 0.56] | Pearson | 32
#> mpg | gear | 0.10 | 0.52 | 30 | 1.000 | [-0.26, 0.43] | Pearson | 32
#> mpg | carb | -0.05 | -0.29 | 30 | 1.000 | [-0.39, 0.30] | Pearson | 32
res <- ppcor::pcor(mtcars)
data.frame(r = res$estimate[-1,1],
t = res$statistic[-1,1],
p = res$p.value[-1,1])
#> r t p
#> cyl -0.02326429 -0.1066392 0.91608738
#> disp 0.16083460 0.7467585 0.46348865
#> hp -0.21052027 -0.9868407 0.33495531
#> drat 0.10445452 0.4813036 0.63527790
#> wt -0.39344938 -1.9611887 0.06325215
#> qsec 0.23809863 1.1234133 0.27394127
#> vs 0.03293117 0.1509915 0.88142347
#> am 0.25832849 1.2254035 0.23398971
#> gear 0.09534261 0.4389142 0.66520643
#> carb -0.05243662 -0.2406258 0.81217871
Created on 2020-04-06 by the reprex package (v0.3.0)
The resulting partial correlations are identical, but the t values are not (and, by extension, neither are the CIs or the unadjusted p values). Why?
Because correlation() computes partial correlations by residualizing the variables and then computing the correlations between the residuals. But the df of the residualizing process (that is, the degree of uncertainty in estimating the residuals) is not accounted for. (Note that this should be true for Bayesian partial correlations as well: the priors and likelihood of the residualizing process are not accounted for.)
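To make the df point concrete, here is a minimal base R sketch (not the package's actual implementation) that residualizes mpg and wt on the remaining mtcars variables and then computes t twice: once with n - 2 degrees of freedom, and once after also subtracting the number of partialled-out variables. The names `k`, `t_naive` and `t_adjusted` are illustrative only.

```r
data <- mtcars
n <- nrow(data)
k <- ncol(data) - 2  # number of variables partialled out of each pair

# residualize mpg and wt on all remaining variables (OLS)
others <- setdiff(names(data), c("mpg", "wt"))
res_mpg <- stats::residuals(
  stats::lm(stats::reformulate(others, response = "mpg"), data = data)
)
res_wt <- stats::residuals(
  stats::lm(stats::reformulate(others, response = "wt"), data = data)
)

# partial correlation = Pearson correlation of the residuals
r_partial <- stats::cor(res_mpg, res_wt)

# t statistic ignoring the residualizing step (df = n - 2)
t_naive <- r_partial * sqrt((n - 2) / (1 - r_partial^2))

# t statistic with df reduced by the number of controlled variables
df_adjusted <- n - 2 - k
t_adjusted <- r_partial * sqrt(df_adjusted / (1 - r_partial^2))
p_adjusted <- 2 * stats::pt(abs(t_adjusted), df = df_adjusted, lower.tail = FALSE)

round(c(r = r_partial, t_naive = t_naive,
        t_adjusted = t_adjusted, p_adjusted = p_adjusted), 3)
```

If the reasoning above holds, `t_naive` should line up with the correlation() row for mpg and wt, and `t_adjusted` and `p_adjusted` with the ppcor::pcor() row for wt.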
Solutions:
cor_test(iris, "Sepal.Length", "Sepal.Width", method = "spearman")
Error in max(unlist(lapply(stats::na.omit(round(CI_low, digits)), function(.i) nchar(as.character(.i))))) :
(converted from warning) no non-missing arguments to max; returning -Inf
8.
doWithOneRestart(return(expr), restart)
7.
withOneRestart(expr, restarts[[1L]])
6.
withRestarts({
.Internal(.signalCondition(simpleWarning(msg, call), msg,
call))
.Internal(.dfltWarn(msg, call)) ...
5.
.signalSimpleWarning("no non-missing arguments to max; returning -Inf",
base::quote(max(unlist(lapply(stats::na.omit(round(CI_low,
digits)), function(.i) nchar(as.character(.i))))))) at format_ci.R#28
4.
insight::format_ci(x[[ci_low[i]]], x[[ci_high[i]]], ci = NULL,
digits = ci_digits, width = "auto", brackets = TRUE) at parameters_table.R#73
3.
parameters_table(x, pretty_names = pretty_names, ...) at print.parameters_model.R#72
2.
print.parameters_model(x)
1.
(function (x, ...)
UseMethod("print"))(x)
@strengejacke do you think it's best to address it in insight or parameters?
correlation(iris[c("Species", "Petal.Width", "Petal.Length")], include_factors = TRUE, method = "auto")
Error in rbind(deparse.level, ...) :
numbers of columns of arguments do not match
I would like report to create traditional correlation matrices from the data provided by the new correlation package, which is in a long format.
For square matrices (i.e., all variables correlated with all variables), something like this could be a first step:
model <- correlation::correlation(iris)
cells <- model$r
m <- matrix(cells, nrow = as.integer(sqrt(length(cells))), ncol=as.integer(sqrt(length(cells))), byrow = TRUE)
However, colnames and rownames still need to be named appropriately. Moreover, this wouldn't work in the case of uneven matrices, such as:
model <- correlation::correlation(
select(iris, Sepal.Length),
select(iris, starts_with("Petal"))
)
@strengejacke do you have by any chance any intuition?
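One way that might handle both the square and the uneven case is to index the matrix by the Parameter1/Parameter2 names instead of relying on the row order. The sketch below uses base R only and builds a stand-in long table with the column names shown in the outputs above (`Parameter1`, `Parameter2`, `r`); `long_to_matrix` and its `symmetric` argument are hypothetical names for illustration.

```r
# Stand-in for the long-format output: one row per unique pair of variables
vars <- names(iris)[1:4]
pairs <- t(utils::combn(vars, 2))
long <- data.frame(
  Parameter1 = pairs[, 1],
  Parameter2 = pairs[, 2],
  r = apply(pairs, 1, function(p) stats::cor(iris[[p[1]]], iris[[p[2]]])),
  stringsAsFactors = FALSE
)

# Hypothetical helper: fill a named matrix by (row name, column name) lookup
long_to_matrix <- function(long, symmetric = TRUE) {
  if (symmetric) {
    rows <- cols <- unique(c(long$Parameter1, long$Parameter2))
  } else {
    rows <- unique(long$Parameter1)
    cols <- unique(long$Parameter2)
  }
  m <- matrix(NA_real_, nrow = length(rows), ncol = length(cols),
              dimnames = list(rows, cols))
  m[cbind(match(long$Parameter1, rows), match(long$Parameter2, cols))] <- long$r
  if (symmetric) {
    # mirror the values and set the diagonal for a square matrix
    m[cbind(match(long$Parameter2, rows), match(long$Parameter1, cols))] <- long$r
    diag(m) <- 1
  }
  m
}

# square case: all variables against all variables
m_square <- long_to_matrix(long)

# uneven case: one variable against two others
long_rect <- long[long$Parameter1 == "Sepal.Length" &
                    long$Parameter2 %in% c("Petal.Length", "Petal.Width"), ]
m_rect <- long_to_matrix(long_rect, symmetric = FALSE)
```

Matching on names also takes care of the row and column names, so no separate naming step is needed.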
library(correlation)
library(dplyr)
library(see)
cor <- correlation(iris)
cor %>%
"wuut, yet another useless brick?"
@strengejacke don't worry, this is a very small package with a very narrow focus, pretty much feature-complete, that just implements correlations (further to be displayed in nice tables through report).
I woke with this thing in mind, so I put it here for future reference.
I am currently working on some survey data, with a factor analysis part and exploring some psychometric networks.
Both of these are based, more or less, on some kind of correlation matrix. However, factor analysis requires, to my knowledge (?), a "regular" correlation matrix, whereas the second (see all the work by @SachaEpskamp) is often based on partial correlations (or regularized partial correlations obtained, for example, via LASSO regression).
Here's the thing. My data contain some factors, or grouping structure, which I'd like somehow to adjust the correlations for. However, to my knowledge, there is no package or function that gives "random-effects (partial) correlation matrices".
Our recent discussion on effectsize made salient the fact that you can extract partial correlations in a quite straightforward way from linear regression models. HENCE, I wonder if it would be possible to apply the same to linear mixed models to extract partial correlations "adjusted" for random effects?
Also, I wonder if there's a way to recover the full correlation matrix from such mixed model, which could be in turn useful for EFA.
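For the ordinary (fixed-effects) case, the extraction from a linear regression can be sketched in base R by converting each coefficient's t statistic with r = t / sqrt(t^2 + df_residual). Whether and how this carries over to mixed models is exactly the open question here; the sketch below only covers `lm()`.

```r
# Partial correlations of each predictor with the outcome, from an OLS fit
fit <- stats::lm(mpg ~ ., data = mtcars)
coefs <- summary(fit)$coefficients

# t statistics of the predictors (dropping the intercept)
t_vals <- coefs[-1, "t value"]
df_res <- stats::df.residual(fit)

# convert t to a partial correlation
r_partial <- t_vals / sqrt(t_vals^2 + df_res)
round(r_partial, 2)
```

Each value is the correlation between mpg and one predictor after all other predictors have been partialled out of both.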
I think it should be made more explicit (especially for the Bayesian methods) how Partial and Rank correlation are actually obtained.
X<->Z or Z<->Y), but Z is partialled out of X and Y using OLS, and then these residual scores are tested for a Pearson correlation.
library(correlation)
d <- data.frame(
x = as.ordered(sample(1:5, 20, TRUE)),
y = as.ordered(sample(letters[1:5], 20, TRUE))
)
correlation(d, method = "polychoric")
#> Error in .cor_test_polychoric(data, x, y, ci = ci, ...): Polychoric correlations can only be ran on ordinal factors.
Created on 2019-10-23 by the reprex package (v0.3.0)
I just randomly sampled a few methods here but the point I am trying to make is that it will be nice to be consistent across different methods in terms of what the output looks like.
library(tidyverse)
library(correlation)
# select only numeric variables
df <- purrr::keep(ggplot2::msleep, is_bare_numeric)
# pearson (95% CI? : Yes)
correlation::correlation(df)
#> Parameter1 | Parameter2 | r | t | df | p | 95% CI | Method
#> -------------------------------------------------------------------------------------
#> sleep_total | sleep_rem | 0.75 | 8.76 | 59 | < .001 | [ 0.62, 0.84] | Pearson
#> sleep_total | sleep_cycle | -0.47 | -2.95 | 30 | 0.049 | [-0.71, -0.15] | Pearson
#> sleep_total | awake | -1.00 | -5328.71 | 81 | < .001 | [-1.00, -1.00] | Pearson
#> sleep_total | brainwt | -0.36 | -2.84 | 54 | 0.049 | [-0.57, -0.11] | Pearson
#> sleep_total | bodywt | -0.31 | -2.96 | 81 | 0.041 | [-0.49, -0.10] | Pearson
#> sleep_rem | sleep_cycle | -0.34 | -1.97 | 30 | 0.117 | [-0.61, 0.01] | Pearson
#> sleep_rem | awake | -0.75 | -8.76 | 59 | < .001 | [-0.84, -0.62] | Pearson
#> sleep_rem | brainwt | -0.22 | -1.54 | 46 | 0.131 | [-0.48, 0.07] | Pearson
#> sleep_rem | bodywt | -0.33 | -2.66 | 59 | 0.049 | [-0.54, -0.08] | Pearson
#> sleep_cycle | awake | 0.47 | 2.95 | 30 | 0.049 | [ 0.15, 0.71] | Pearson
#> sleep_cycle | brainwt | 0.85 | 8.60 | 28 | < .001 | [ 0.71, 0.93] | Pearson
#> sleep_cycle | bodywt | 0.42 | 2.52 | 30 | 0.052 | [ 0.08, 0.67] | Pearson
#> awake | brainwt | 0.36 | 2.84 | 54 | 0.049 | [ 0.11, 0.57] | Pearson
#> awake | bodywt | 0.31 | 2.96 | 81 | 0.041 | [ 0.10, 0.49] | Pearson
#> brainwt | bodywt | 0.93 | 19.18 | 54 | < .001 | [ 0.89, 0.96] | Pearson
# percentage bend (95% CI? : Yes)
correlation::correlation(df, method = "percentage")
#> Parameter1 | Parameter2 | r | t | df | p | 95% CI | Method
#> ---------------------------------------------------------------------------------------------
#> sleep_total | sleep_rem | 0.75 | 8.79 | 59 | < .001 | [ 0.62, 0.84] | Percentage_Bend
#> sleep_total | sleep_cycle | -0.51 | -3.22 | 30 | 0.012 | [-0.73, -0.19] | Percentage_Bend
#> sleep_total | awake | -1.00 | -6525.30 | 81 | < .001 | [-1.00, -1.00] | Percentage_Bend
#> sleep_total | brainwt | -0.59 | -5.36 | 54 | < .001 | [-0.74, -0.39] | Percentage_Bend
#> sleep_total | bodywt | -0.51 | -5.31 | 81 | < .001 | [-0.65, -0.33] | Percentage_Bend
#> sleep_rem | sleep_cycle | -0.40 | -2.37 | 30 | 0.025 | [-0.65, -0.06] | Percentage_Bend
#> sleep_rem | awake | -0.75 | -8.79 | 59 | < .001 | [-0.84, -0.62] | Percentage_Bend
#> sleep_rem | brainwt | -0.41 | -3.04 | 46 | 0.012 | [-0.62, -0.14] | Percentage_Bend
#> sleep_rem | bodywt | -0.40 | -3.38 | 59 | 0.006 | [-0.59, -0.17] | Percentage_Bend
#> sleep_cycle | awake | 0.51 | 3.22 | 30 | 0.012 | [ 0.19, 0.73] | Percentage_Bend
#> sleep_cycle | brainwt | 0.89 | 10.45 | 28 | < .001 | [ 0.78, 0.95] | Percentage_Bend
#> sleep_cycle | bodywt | 0.77 | 6.62 | 30 | < .001 | [ 0.58, 0.88] | Percentage_Bend
#> awake | brainwt | 0.59 | 5.36 | 54 | < .001 | [ 0.39, 0.74] | Percentage_Bend
#> awake | bodywt | 0.51 | 5.31 | 81 | < .001 | [ 0.33, 0.65] | Percentage_Bend
#> brainwt | bodywt | 0.92 | 16.71 | 54 | < .001 | [ 0.86, 0.95] | Percentage_Bend
# spearman (95% CI? : No)
correlation::correlation(df, method = "spearman")
#> Parameter1 | Parameter2 | rho | S | p | Method
#> -------------------------------------------------------------------
#> sleep_total | sleep_rem | 0.76 | 8920.08 | < .001 | Spearman
#> sleep_total | sleep_cycle | -0.49 | 8122.87 | 0.014 | Spearman
#> sleep_total | awake | -1.00 | 1.90568e+05 | < .001 | Spearman
#> sleep_total | brainwt | -0.59 | 46627.12 | < .001 | Spearman
#> sleep_total | bodywt | -0.53 | 1.46223e+05 | < .001 | Spearman
#> sleep_rem | sleep_cycle | -0.33 | 7280.52 | 0.061 | Spearman
#> sleep_rem | awake | -0.76 | 66719.92 | < .001 | Spearman
#> sleep_rem | brainwt | -0.41 | 26049.73 | 0.014 | Spearman
#> sleep_rem | bodywt | -0.45 | 54903.63 | 0.001 | Spearman
#> sleep_cycle | awake | 0.49 | 2789.13 | 0.014 | Spearman
#> sleep_cycle | brainwt | 0.87 | 572.26 | < .001 | Spearman
#> sleep_cycle | bodywt | 0.85 | 837.92 | < .001 | Spearman
#> awake | brainwt | 0.59 | 11892.88 | < .001 | Spearman
#> awake | bodywt | 0.53 | 44345.02 | < .001 | Spearman
#> brainwt | bodywt | 0.96 | 1253.56 | < .001 | Spearman
# kendall (95% CI? : No)
correlation::correlation(df, method = "kendall")
#> Parameter1 | Parameter2 | tau | z | p | Method
#> -------------------------------------------------------------
#> sleep_total | sleep_rem | 0.59 | 6.64 | < .001 | Kendall
#> sleep_total | sleep_cycle | -0.35 | -2.75 | 0.024 | Kendall
#> sleep_total | awake | -1.00 | -13.30 | < .001 | Kendall
#> sleep_total | brainwt | -0.43 | -4.65 | < .001 | Kendall
#> sleep_total | bodywt | -0.39 | -5.14 | < .001 | Kendall
#> sleep_rem | sleep_cycle | -0.21 | -1.63 | 0.104 | Kendall
#> sleep_rem | awake | -0.59 | -6.64 | < .001 | Kendall
#> sleep_rem | brainwt | -0.26 | -2.61 | 0.024 | Kendall
#> sleep_rem | bodywt | -0.32 | -3.56 | 0.002 | Kendall
#> sleep_cycle | awake | 0.35 | 2.75 | 0.024 | Kendall
#> sleep_cycle | brainwt | 0.71 | 5.47 | < .001 | Kendall
#> sleep_cycle | bodywt | 0.65 | 5.21 | < .001 | Kendall
#> awake | brainwt | 0.43 | 4.65 | < .001 | Kendall
#> awake | bodywt | 0.39 | 5.14 | < .001 | Kendall
#> brainwt | bodywt | 0.84 | 9.11 | < .001 | Kendall
Created on 2020-02-10 by the reprex package (v0.3.0)
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 3.6.2 (2019-12-12)
#> os macOS Mojave 10.14.6
#> system x86_64, darwin15.6.0
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz Europe/Berlin
#> date 2020-02-10
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.6.0)
#> backports 1.1.5 2019-10-02 [1] CRAN (R 3.6.0)
#> bayestestR 0.5.1 2020-01-27 [1] CRAN (R 3.6.2)
#> broom 0.5.4 2020-01-27 [1] CRAN (R 3.6.2)
#> callr 3.4.1 2020-01-24 [1] CRAN (R 3.6.2)
#> cellranger 1.1.0 2016-07-27 [1] CRAN (R 3.6.0)
#> cli 2.0.1 2020-01-08 [1] CRAN (R 3.6.2)
#> colorspace 1.4-1 2019-03-18 [1] CRAN (R 3.6.0)
#> correlation * 0.1.0 2020-02-10 [1] Github (easystats/correlation@b80559a)
#> crayon 1.3.4 2017-09-16 [1] CRAN (R 3.6.0)
#> DBI 1.1.0 2019-12-15 [1] CRAN (R 3.6.2)
#> dbplyr 1.4.2 2019-06-17 [1] CRAN (R 3.6.0)
#> desc 1.2.0 2018-05-01 [1] CRAN (R 3.6.0)
#> devtools 2.2.1 2019-09-24 [1] CRAN (R 3.6.0)
#> digest 0.6.23 2019-11-23 [1] CRAN (R 3.6.0)
#> dplyr * 0.8.4 2020-01-31 [1] CRAN (R 3.6.0)
#> ellipsis 0.3.0 2019-09-20 [1] CRAN (R 3.6.0)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 3.6.0)
#> fansi 0.4.1 2020-01-08 [1] CRAN (R 3.6.2)
#> forcats * 0.4.0 2019-02-17 [1] CRAN (R 3.6.0)
#> fs 1.3.1 2019-05-06 [1] CRAN (R 3.6.0)
#> generics 0.0.2 2018-11-29 [1] CRAN (R 3.6.0)
#> ggplot2 * 3.2.1 2019-08-10 [1] CRAN (R 3.6.0)
#> glue 1.3.1 2019-03-12 [1] CRAN (R 3.6.0)
#> gtable 0.3.0 2019-03-25 [1] CRAN (R 3.6.0)
#> haven 2.2.0 2019-11-08 [1] CRAN (R 3.6.1)
#> highr 0.8 2019-03-20 [1] CRAN (R 3.6.0)
#> hms 0.5.3 2020-01-08 [1] CRAN (R 3.6.2)
#> htmltools 0.4.0 2019-10-04 [1] CRAN (R 3.6.0)
#> httr 1.4.1 2019-08-05 [1] CRAN (R 3.6.0)
#> insight 0.8.1 2020-02-02 [1] CRAN (R 3.6.2)
#> jsonlite 1.6.1 2020-02-02 [1] CRAN (R 3.6.2)
#> knitr 1.28 2020-02-06 [1] CRAN (R 3.6.2)
#> lattice 0.20-38 2018-11-04 [2] CRAN (R 3.6.2)
#> lazyeval 0.2.2 2019-03-15 [1] CRAN (R 3.6.0)
#> lifecycle 0.1.0 2019-08-01 [1] CRAN (R 3.6.0)
#> lubridate 1.7.4 2018-04-11 [1] CRAN (R 3.6.0)
#> magrittr 1.5 2014-11-22 [1] CRAN (R 3.6.0)
#> memoise 1.1.0 2017-04-21 [1] CRAN (R 3.6.0)
#> mnormt 1.5-6 2020-02-03 [1] CRAN (R 3.6.0)
#> modelr 0.1.5 2019-08-08 [1] CRAN (R 3.6.0)
#> munsell 0.5.0 2018-06-12 [1] CRAN (R 3.6.0)
#> nlme 3.1-142 2019-11-07 [2] CRAN (R 3.6.2)
#> parameters 0.5.0 2020-02-09 [1] CRAN (R 3.6.2)
#> pillar 1.4.3 2019-12-20 [1] CRAN (R 3.6.2)
#> pkgbuild 1.0.6 2019-10-09 [1] CRAN (R 3.6.0)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 3.6.0)
#> pkgload 1.0.2 2018-10-29 [1] CRAN (R 3.6.0)
#> prettyunits 1.1.1 2020-01-24 [1] CRAN (R 3.6.2)
#> processx 3.4.2 2020-02-09 [1] CRAN (R 3.6.2)
#> ps 1.3.0 2018-12-21 [1] CRAN (R 3.6.0)
#> psych * 1.9.12.31 2020-01-08 [1] CRAN (R 3.6.2)
#> purrr * 0.3.3 2019-10-18 [1] CRAN (R 3.6.0)
#> R6 2.4.1 2019-11-12 [1] CRAN (R 3.6.0)
#> Rcpp 1.0.3 2019-11-08 [1] CRAN (R 3.6.1)
#> readr * 1.3.1 2018-12-21 [1] CRAN (R 3.6.0)
#> readxl 1.3.1 2019-03-13 [1] CRAN (R 3.6.0)
#> remotes 2.1.0 2019-06-24 [1] CRAN (R 3.6.0)
#> reprex 0.3.0 2019-05-16 [1] CRAN (R 3.6.0)
#> rlang 0.4.4 2020-01-28 [1] CRAN (R 3.6.2)
#> rmarkdown 2.1 2020-01-20 [1] CRAN (R 3.6.2)
#> rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.6.0)
#> rvest 0.3.5 2019-11-08 [1] CRAN (R 3.6.0)
#> scales 1.1.0 2019-11-18 [1] CRAN (R 3.6.0)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.6.0)
#> stringi 1.4.5 2020-01-11 [1] CRAN (R 3.6.2)
#> stringr * 1.4.0 2019-02-10 [1] CRAN (R 3.6.0)
#> testthat 2.3.1 2019-12-01 [1] CRAN (R 3.6.0)
#> tibble * 2.1.3 2019-06-06 [1] CRAN (R 3.6.0)
#> tidyr * 1.0.2 2020-01-24 [1] CRAN (R 3.6.2)
#> tidyselect 1.0.0 2020-01-27 [1] CRAN (R 3.6.2)
#> tidyverse * 1.3.0 2019-11-21 [1] CRAN (R 3.6.0)
#> usethis 1.5.1 2019-07-04 [1] CRAN (R 3.6.0)
#> vctrs 0.2.2 2020-01-24 [1] CRAN (R 3.6.2)
#> withr 2.1.2 2018-03-15 [1] CRAN (R 3.6.0)
#> xfun 0.12 2020-01-13 [1] CRAN (R 3.6.2)
#> xml2 1.2.2 2019-08-09 [1] CRAN (R 3.6.0)
#> yaml 2.2.1 2020-02-01 [1] CRAN (R 3.6.0)
#>
#> [1] /Users/patil/Library/R/3.6/library
#> [2] /Library/Frameworks/R.framework/Versions/3.6/Resources/library
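On the missing CI for Spearman in the output above: one common approximation is a Fisher z interval with an inflated standard error. The sketch below is a base R illustration, not what the package does. The helper name spearman_ci is hypothetical, the standard error sqrt(1.06 / (n - 3)) is the Fieller et al. approximation, and mtcars stands in for msleep to keep the example self-contained.

```r
# Approximate CI for Spearman's rho via the Fisher z transformation.
# Assumption: SE of atanh(rho) is taken as sqrt(1.06 / (n - 3)).
spearman_ci <- function(x, y, level = 0.95) {
  ok <- complete.cases(x, y)            # pairwise-complete observations
  n <- sum(ok)
  rho <- cor(x[ok], y[ok], method = "spearman")
  se <- sqrt(1.06 / (n - 3))
  crit <- qnorm(1 - (1 - level) / 2)
  ci <- tanh(atanh(rho) + c(-1, 1) * crit * se)
  c(rho = rho, CI_low = ci[1], CI_high = ci[2], n_Obs = n)
}

res <- spearman_ci(mtcars$mpg, mtcars$wt)
res
```

A bootstrap interval would be an alternative that makes fewer distributional assumptions, at the cost of run time.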
correlation/vignettes/types.Rmd
Line 71 in 99ef225
I think it will be nice if the output also contains an n column that tracks the number of observations for each correlation test. This might seem redundant for datasets without any NAs, but will be a very handy feature to have when there is missing data. For example:
library(tidyverse)
library(psych)
library(correlation)
# select only numeric variables
df <- purrr::keep(ggplot2::msleep, is_bare_numeric)
# using `psych`
corr_obj <- psych::corr.test(df, method = "spearman")
# looking at sample sizes
corr_obj$n
#> sleep_total sleep_rem sleep_cycle awake brainwt bodywt
#> sleep_total 83 61 32 83 56 83
#> sleep_rem 61 61 32 61 48 61
#> sleep_cycle 32 32 32 32 30 32
#> awake 83 61 32 83 56 83
#> brainwt 56 48 30 56 56 56
#> bodywt 83 61 32 83 56 83
# correlation output (no info about sample sizes)
correlation::correlation(df, method = "spearman")
#> Parameter1 | Parameter2 | rho | S | p | Method
#> -------------------------------------------------------------------
#> sleep_total | sleep_rem | 0.76 | 8920.08 | < .001 | Spearman
#> sleep_total | sleep_cycle | -0.49 | 8122.87 | 0.014 | Spearman
#> sleep_total | awake | -1.00 | 1.90568e+05 | < .001 | Spearman
#> sleep_total | brainwt | -0.59 | 46627.12 | < .001 | Spearman
#> sleep_total | bodywt | -0.53 | 1.46223e+05 | < .001 | Spearman
#> sleep_rem | sleep_cycle | -0.33 | 7280.52 | 0.061 | Spearman
#> sleep_rem | awake | -0.76 | 66719.92 | < .001 | Spearman
#> sleep_rem | brainwt | -0.41 | 26049.73 | 0.014 | Spearman
#> sleep_rem | bodywt | -0.45 | 54903.63 | 0.001 | Spearman
#> sleep_cycle | awake | 0.49 | 2789.13 | 0.014 | Spearman
#> sleep_cycle | brainwt | 0.87 | 572.26 | < .001 | Spearman
#> sleep_cycle | bodywt | 0.85 | 837.92 | < .001 | Spearman
#> awake | brainwt | 0.59 | 11892.88 | < .001 | Spearman
#> awake | bodywt | 0.53 | 44345.02 | < .001 | Spearman
#> brainwt | bodywt | 0.96 | 1253.56 | < .001 | Spearman
Created on 2020-02-10 by the reprex package (v0.3.0)
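To make the request concrete, pairwise sample sizes can be counted in base R without psych. This is a minimal sketch on a small made-up data frame (the columns a, b, c are illustrative); the column name n_Obs in the long-format result is only a suggestion.

```r
# Toy data with missing values (illustrative only)
df <- data.frame(
  a = c(1, 2, NA, 4, 5),
  b = c(NA, 1, 2, 3, 4),
  c = c(5, 4, 3, 2, 1)
)

# 1 where a value is present, 0 where it is missing
present <- !is.na(as.matrix(df))

# Entry [i, j] counts the rows where both variable i and variable j are observed
n_pairwise <- crossprod(present * 1)

# Long format, one row per pair, matching the layout of the correlation output
pairs <- t(combn(colnames(df), 2))
n_long <- data.frame(
  Parameter1 = pairs[, 1],
  Parameter2 = pairs[, 2],
  n_Obs = n_pairwise[pairs]
)
n_long
```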
Would be nice to re-implement these methods to avoid depending on psych and polycor, and also to have a more flexible implementation to someday try implementing a Bayesian version.
This is in a way related to #12. If I wanted to create a correlation matrix visualization with ggcorrplot, I need both a matrix of correlations and p-values but as.matrix doesn't seem to work with the latter:
library(tidyverse)
library(correlation)
# formatting to respect current `ggcorrmat` defaults
df <-
correlation::correlation(
data = ggplot2::msleep,
ci = "default",
method = "pearson"
)
# create a matrix of correlation values
df %>%
select(Parameter1, Parameter2, r) %>%
as.matrix()
#> sleep_total sleep_rem sleep_cycle awake brainwt bodywt
#> sleep_total 1.0000000 0.7517550 -0.4737127 -0.9999986 -0.3604874 -0.3120106
#> sleep_rem 0.7517550 1.0000000 -0.3381235 -0.7517713 -0.2213348 -0.3276507
#> sleep_cycle -0.4737127 -0.3381235 1.0000000 0.4737127 0.8516203 0.4178029
#> awake -0.9999986 -0.7517713 0.4737127 1.0000000 0.3604874 0.3119801
#> brainwt -0.3604874 -0.2213348 0.8516203 0.3604874 1.0000000 0.9337822
#> bodywt -0.3120106 -0.3276507 0.4178029 0.3119801 0.9337822 1.0000000
# create a matrix of p-values
df %>%
select(Parameter1, Parameter2, p) %>%
as.matrix()
#> Error in frame[row, col] <- object[(object$Parameter1 == row & object$Parameter2 == : number of items to replace is not a multiple of replacement length
Created on 2020-02-27 by the reprex package (v0.3.0)
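Until as.matrix handles this, a square matrix can be built by hand from any long-format column. The sketch below is base R on a made-up long table (the values are toy numbers, not real results), and long_to_matrix is a hypothetical helper, not part of the package. Setting the diagonal of the p-value matrix to 0 is a choice made here for plotting; NA would be equally defensible.

```r
# Toy long-format result (made-up values, same column layout as above)
long <- data.frame(
  Parameter1 = c("x", "x", "y"),
  Parameter2 = c("y", "z", "z"),
  r = c(0.50, -0.20, 0.10),
  p = c(0.001, 0.200, 0.600),
  stringsAsFactors = FALSE
)

# Hypothetical helper: spread one column of the long table into a symmetric matrix
long_to_matrix <- function(long, value, diagonal) {
  vars <- unique(c(long$Parameter1, long$Parameter2))
  m <- matrix(NA_real_, length(vars), length(vars), dimnames = list(vars, vars))
  # Fill both triangles by indexing with (row name, column name) pairs
  m[cbind(long$Parameter1, long$Parameter2)] <- long[[value]]
  m[cbind(long$Parameter2, long$Parameter1)] <- long[[value]]
  diag(m) <- diagonal
  m
}

r_mat <- long_to_matrix(long, "r", diagonal = 1)
p_mat <- long_to_matrix(long, "p", diagonal = 0)
p_mat
```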
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 3.6.2 (2019-12-12)
#> os macOS Mojave 10.14.6
#> system x86_64, darwin15.6.0
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz Europe/Berlin
#> date 2020-02-27
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.6.0)
#> backports 1.1.5 2019-10-02 [1] CRAN (R 3.6.0)
#> bayestestR 0.5.2 2020-02-13 [1] Github (easystats/bayestestR@4350b4f)
#> broom 0.5.3.9000 2020-02-20 [1] Github (tidymodels/broom@3c922d5)
#> callr 3.4.2 2020-02-12 [1] CRAN (R 3.6.2)
#> cellranger 1.1.0 2016-07-27 [1] CRAN (R 3.6.0)
#> cli 2.0.1 2020-01-08 [1] CRAN (R 3.6.2)
#> colorspace 1.4-1 2019-03-18 [1] CRAN (R 3.6.0)
#> correlation * 0.1.0 2020-02-27 [1] Github (easystats/correlation@f0ec824)
#> crayon 1.3.4 2017-09-16 [1] CRAN (R 3.6.0)
#> DBI 1.1.0 2019-12-15 [1] CRAN (R 3.6.2)
#> dbplyr 1.4.2 2019-06-17 [1] CRAN (R 3.6.0)
#> desc 1.2.0 2018-05-01 [1] CRAN (R 3.6.0)
#> devtools 2.2.2 2020-02-17 [1] CRAN (R 3.6.2)
#> digest 0.6.25 2020-02-23 [1] CRAN (R 3.6.0)
#> dplyr * 0.8.4 2020-01-31 [1] CRAN (R 3.6.0)
#> effectsize 0.2.0 2020-02-25 [1] CRAN (R 3.6.2)
#> ellipsis 0.3.0 2019-09-20 [1] CRAN (R 3.6.0)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 3.6.0)
#> fansi 0.4.1 2020-01-08 [1] CRAN (R 3.6.2)
#> forcats * 0.4.0 2019-02-17 [1] CRAN (R 3.6.0)
#> fs 1.3.1 2019-05-06 [1] CRAN (R 3.6.0)
#> generics 0.0.2 2018-11-29 [1] CRAN (R 3.6.0)
#> ggplot2 * 3.2.1 2019-08-10 [1] CRAN (R 3.6.0)
#> glue 1.3.1 2019-03-12 [1] CRAN (R 3.6.0)
#> gtable 0.3.0 2019-03-25 [1] CRAN (R 3.6.0)
#> haven 2.2.0 2019-11-08 [1] CRAN (R 3.6.1)
#> highr 0.8 2019-03-20 [1] CRAN (R 3.6.0)
#> hms 0.5.3 2020-01-08 [1] CRAN (R 3.6.2)
#> htmltools 0.4.0 2019-10-04 [1] CRAN (R 3.6.0)
#> httr 1.4.1 2019-08-05 [1] CRAN (R 3.6.0)
#> insight 0.8.1.1 2020-02-20 [1] Github (easystats/insight@ff0c9a2)
#> jsonlite 1.6.1 2020-02-02 [1] CRAN (R 3.6.2)
#> knitr 1.28 2020-02-06 [1] CRAN (R 3.6.2)
#> lazyeval 0.2.2 2019-03-15 [1] CRAN (R 3.6.0)
#> lifecycle 0.1.0 2019-08-01 [1] CRAN (R 3.6.0)
#> lubridate 1.7.4 2018-04-11 [1] CRAN (R 3.6.0)
#> magrittr 1.5 2014-11-22 [1] CRAN (R 3.6.0)
#> memoise 1.1.0 2017-04-21 [1] CRAN (R 3.6.0)
#> modelr 0.1.6 2020-02-22 [1] CRAN (R 3.6.0)
#> munsell 0.5.0 2018-06-12 [1] CRAN (R 3.6.0)
#> parameters 0.5.0.1 2020-02-20 [1] Github (easystats/parameters@f62f3ea)
#> pillar 1.4.3 2019-12-20 [1] CRAN (R 3.6.2)
#> pkgbuild 1.0.6 2019-10-09 [1] CRAN (R 3.6.0)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 3.6.0)
#> pkgload 1.0.2 2018-10-29 [1] CRAN (R 3.6.0)
#> prettyunits 1.1.1 2020-01-24 [1] CRAN (R 3.6.2)
#> processx 3.4.2 2020-02-09 [1] CRAN (R 3.6.2)
#> ps 1.3.2 2020-02-13 [1] CRAN (R 3.6.2)
#> purrr * 0.3.3 2019-10-18 [1] CRAN (R 3.6.0)
#> R6 2.4.1 2019-11-12 [1] CRAN (R 3.6.0)
#> Rcpp 1.0.3 2019-11-08 [1] CRAN (R 3.6.1)
#> readr * 1.3.1 2018-12-21 [1] CRAN (R 3.6.0)
#> readxl 1.3.1 2019-03-13 [1] CRAN (R 3.6.0)
#> remotes 2.1.1 2020-02-15 [1] CRAN (R 3.6.0)
#> reprex 0.3.0 2019-05-16 [1] CRAN (R 3.6.0)
#> rlang 0.4.4 2020-01-28 [1] CRAN (R 3.6.2)
#> rmarkdown 2.1 2020-01-20 [1] CRAN (R 3.6.2)
#> rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.6.0)
#> rvest 0.3.5 2019-11-08 [1] CRAN (R 3.6.0)
#> scales 1.1.0 2019-11-18 [1] CRAN (R 3.6.0)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.6.0)
#> stringi 1.4.6 2020-02-17 [1] CRAN (R 3.6.2)
#> stringr * 1.4.0 2019-02-10 [1] CRAN (R 3.6.0)
#> testthat 2.3.1 2019-12-01 [1] CRAN (R 3.6.0)
#> tibble * 2.1.3 2019-06-06 [1] CRAN (R 3.6.0)
#> tidyr * 1.0.2 2020-01-24 [1] CRAN (R 3.6.2)
#> tidyselect 1.0.0 2020-01-27 [1] CRAN (R 3.6.2)
#> tidyverse * 1.3.0 2019-11-21 [1] CRAN (R 3.6.0)
#> usethis 1.5.1.9000 2020-02-18 [1] Github (r-lib/usethis@2a3d134)
#> vctrs 0.2.3 2020-02-20 [1] CRAN (R 3.6.2)
#> withr 2.1.2 2018-03-15 [1] CRAN (R 3.6.0)
#> xfun 0.12 2020-01-13 [1] CRAN (R 3.6.2)
#> xml2 1.2.2 2019-08-09 [1] CRAN (R 3.6.0)
#> yaml 2.2.1 2020-02-01 [1] CRAN (R 3.6.0)
#>
#> [1] /Users/patil/Library/R/3.6/library
#> [2] /Library/Frameworks/R.framework/Versions/3.6/Resources/library
Similar to cor_to_pcor, just need to pass the cov matrix.
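A minimal sketch of what this could look like, in base R. The name cov_to_pcor is hypothetical and this is not the package's implementation: it inverts the covariance matrix to get the precision matrix, rescales it, and flips the sign of the off-diagonal entries.

```r
# Hypothetical helper: partial correlations from a covariance matrix
cov_to_pcor <- function(S) {
  P <- solve(S)          # precision matrix
  pcor <- -cov2cor(P)    # rescale to unit diagonal and flip the sign
  diag(pcor) <- 1
  pcor
}

S <- cov(mtcars[, c("mpg", "wt", "hp")])
pc <- cov_to_pcor(S)
pc

# Cross-check one entry against the residual-based definition
r_check <- cor(resid(lm(mpg ~ hp, data = mtcars)), resid(lm(wt ~ hp, data = mtcars)))
```

This requires the covariance matrix to be invertible; a singular matrix would need a pseudo-inverse or some form of regularization.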
Pearson's correlation coefficient estimate is different depending on whether bayesian is set to TRUE or FALSE:
library(correlation)
set.seed(123)
tibble::as_tibble(correlation(iris, method = "pearson"))
#> # A tibble: 6 x 10
#> Parameter1 Parameter2 r t df p CI_low CI_high Method n_Obs
#> <chr> <chr> <dbl> <dbl> <int> <dbl> <dbl> <dbl> <chr> <int>
#> 1 Sepal.Leng~ Sepal.Wid~ -0.118 -1.44 148 1.52e- 1 -0.273 0.0435 Pears~ 150
#> 2 Sepal.Leng~ Petal.Len~ 0.872 21.6 148 5.19e-47 0.827 0.906 Pears~ 150
#> 3 Sepal.Leng~ Petal.Wid~ 0.818 17.3 148 9.30e-37 0.757 0.865 Pears~ 150
#> 4 Sepal.Width Petal.Len~ -0.428 -5.77 148 1.35e- 7 -0.551 -0.288 Pears~ 150
#> 5 Sepal.Width Petal.Wid~ -0.366 -4.79 148 8.15e- 6 -0.497 -0.219 Pears~ 150
#> 6 Petal.Leng~ Petal.Wid~ 0.963 43.4 148 2.81e-85 0.949 0.973 Pears~ 150
set.seed(123)
tibble::as_tibble(correlation(iris, method = "pearson", bayesian = TRUE))
#> Loading required namespace: BayesFactor
#> # A tibble: 6 x 12
#> Parameter1 Parameter2 rho CI_low CI_high pd ROPE_Percentage BF
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Sepal.Len~ Sepal.Wid~ -0.114 -0.236 0.0189 0.924 0.443 5.09e- 1
#> 2 Sepal.Len~ Petal.Len~ 0.863 0.827 0.895 1 0 2.14e+43
#> 3 Sepal.Len~ Petal.Wid~ 0.806 0.759 0.850 1 0 2.62e+33
#> 4 Sepal.Wid~ Petal.Len~ -0.415 -0.517 -0.306 1 0 3.49e+ 5
#> 5 Sepal.Wid~ Petal.Wid~ -0.349 -0.462 -0.240 1 0 5.29e+ 3
#> 6 Petal.Len~ Petal.Wid~ 0.959 0.949 0.969 1 0 1.24e+80
#> # ... with 4 more variables: Prior_Distribution <chr>, Prior_Location <dbl>,
#> # Prior_Scale <dbl>, n_Obs <int>
Created on 2020-03-19 by the reprex package (v0.3.0)
devtools::session_info()
#> - Session info ---------------------------------------------------------------
#> setting value
#> version R Under development (unstable) (2020-02-28 r77874)
#> os Windows 10 x64
#> system x86_64, mingw32
#> ui RTerm
#> language (EN)
#> collate English_United States.1252
#> ctype English_United States.1252
#> tz Europe/Berlin
#> date 2020-03-19
#>
#> - Packages -------------------------------------------------------------------
#> package * version date lib
#> assertthat 0.2.1 2019-03-21 [1]
#> backports 1.1.5 2019-10-02 [1]
#> BayesFactor 0.9.12-4.2 2018-05-19 [1]
#> bayestestR 0.5.2.1 2020-03-16 [1]
#> callr 3.4.2 2020-02-12 [1]
#> cli 2.0.2 2020-02-28 [1]
#> coda 0.19-3 2019-07-05 [1]
#> correlation * 0.1.0 2020-03-17 [1]
#> crayon 1.3.4 2017-09-16 [1]
#> desc 1.2.0 2018-05-01 [1]
#> devtools 2.2.2 2020-02-17 [1]
#> digest 0.6.25 2020-02-23 [1]
#> effectsize 0.2.0.1 2020-03-06 [1]
#> ellipsis 0.3.0 2019-09-20 [1]
#> evaluate 0.14 2019-05-28 [1]
#> fansi 0.4.1 2020-01-08 [1]
#> fs 1.3.2 2020-03-05 [1]
#> glue 1.3.2 2020-03-12 [1]
#> gtools 3.8.1 2018-06-26 [1]
#> highr 0.8 2019-03-20 [1]
#> htmltools 0.4.0 2019-10-04 [1]
#> insight 0.8.2.1 2020-03-16 [1]
#> knitr 1.28 2020-02-06 [1]
#> lattice 0.20-40 2020-02-19 [2]
#> magrittr 1.5 2014-11-22 [1]
#> Matrix 1.2-18 2019-11-27 [2]
#> MatrixModels 0.4-1 2015-08-22 [1]
#> memoise 1.1.0 2017-04-21 [1]
#> mvtnorm 1.1-0 2020-02-24 [1]
#> parameters 0.6.0 2020-03-12 [1]
#> pbapply 1.4-2 2019-08-31 [1]
#> pillar 1.4.3 2019-12-20 [1]
#> pkgbuild 1.0.6 2019-10-09 [1]
#> pkgconfig 2.0.3 2019-09-22 [1]
#> pkgload 1.0.2 2018-10-29 [1]
#> prettyunits 1.1.1 2020-01-24 [1]
#> processx 3.4.2 2020-02-09 [1]
#> ps 1.3.2 2020-02-13 [1]
#> R6 2.4.1 2019-11-12 [1]
#> Rcpp 1.0.4 2020-03-17 [1]
#> remotes 2.1.1 2020-02-15 [1]
#> rlang 0.4.5 2020-03-01 [1]
#> rmarkdown 2.1 2020-01-20 [1]
#> rprojroot 1.3-2 2018-01-03 [1]
#> sessioninfo 1.1.1 2018-11-05 [1]
#> stringi 1.4.6 2020-02-17 [1]
#> stringr 1.4.0 2019-02-10 [1]
#> testthat 2.3.2 2020-03-02 [1]
#> tibble 2.1.3 2019-06-06 [1]
#> usethis 1.5.1.9000 2020-03-18 [1]
#> utf8 1.1.4 2018-05-24 [1]
#> vctrs 0.2.4 2020-03-10 [1]
#> withr 2.1.2 2018-03-15 [1]
#> xfun 0.12 2020-01-13 [1]
#> yaml 2.2.1 2020-02-01 [1]
#> source
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> Github (easystats/bayestestR@6ee7e37)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> Github (easystats/correlation@c1c35b0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> Github (easystats/effectsize@64bfbc3)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> Github (easystats/insight@e0b229b)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> Github (r-lib/usethis@8c32c73)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#> CRAN (R 4.0.0)
#>
#> [1] C:/Users/inp099/Documents/R/win-library/4.0
#> [2] C:/Program Files/R/R-devel/library
This is unexpected. The estimate for the association shouldn't change even if the hypothesis testing framework changes.
For example, here is what JASP produces for the same analyses (the estimates are identical):
Am I missing something?
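One possible explanation, offered as an assumption rather than a description of what correlation or BayesFactor actually computes: a Bayesian point estimate is a summary of the posterior (for example its median), and any prior that is not centred on the sample r will pull it away from r. The sketch below is a toy grid approximation in base R, using a Fisher z approximation for the likelihood and a flat prior on rho. Both are stand-ins chosen for illustration. It uses r = 0.872 and n = 150 from the Sepal.Length / Petal.Length row above.

```r
# Observed values taken from the frequentist output above
r_obs <- 0.872
n <- 150

# Grid over rho; flat prior on rho (assumption, not the package's prior)
rho_grid <- seq(-0.999, 0.999, by = 0.001)
prior <- rep(1, length(rho_grid))

# Approximate likelihood: atanh(r) ~ Normal(atanh(rho), 1 / sqrt(n - 3))
loglik <- dnorm(atanh(r_obs), mean = atanh(rho_grid), sd = 1 / sqrt(n - 3), log = TRUE)

# Normalised posterior on the grid and its median
post <- exp(loglik - max(loglik)) * prior
post <- post / sum(post)
post_median <- rho_grid[which(cumsum(post) >= 0.5)[1]]

c(sample_r = r_obs, posterior_median = post_median)
```

Even with a flat prior the posterior median lands slightly below the sample r, so the two frameworks need not report the same number unless the Bayesian summary is defined to match it.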
Hi - I noticed what looks like a potential bug in the multilevel correlation function. Specifically, when there are rows with a value for the factor, but neither of the variables to be correlated, these rows still seem to be counted in the degrees of freedom. This also means that dropping (or arbitrarily adding) empty rows changes the inferential statistics.
MWE:
library(tidyverse)
library(correlation)
# Full data frame
df1 <- data.frame("id" = factor(rep(letters[1:10], each = 10)),
"V1" = rnorm(100, 0, 1),
"V2" = rnorm(100, 0, 1))
correlation(df1, multilevel = TRUE)
# Introduce missingness
df2 <- df1
df2[sample(1:100, 10), c("V1","V2")] <- NA
correlation(df2, multilevel = TRUE)
# Drop rows with missingness
df3 <- df2 %>%
drop_na()
correlation(df3, multilevel = TRUE)
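As a workaround while this is looked into, rows that are missing on the variables being correlated can be dropped before calling the function, so the row count matches the number of usable observations. A minimal base R sketch on the same kind of simulated data as above; the names vars and df_clean are only illustrative.

```r
set.seed(42)

# Simulated data with a grouping factor, as in the MWE
df <- data.frame(
  id = factor(rep(letters[1:10], each = 10)),
  V1 = rnorm(100),
  V2 = rnorm(100)
)

# Introduce missingness on both variables for 10 rows
df[sample(1:100, 10), c("V1", "V2")] <- NA

# Keep only rows that are complete on the variables to be correlated;
# the grouping factor alone does not make a row usable
vars <- c("V1", "V2")
df_clean <- df[complete.cases(df[, vars]), ]

c(rows_before = nrow(df), rows_after = nrow(df_clean))
```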
This seems problematic to me:
library(correlation)
cor <- correlation(iris)
class(as.table(cor))
#> [1] "easycormatrix" "data.frame"
(and not a table.)
I suggest changing as.matrix.easycorrelation in two ways:
redundant argument (default FALSE?).
print.easycormatrix to give the same printing as is currently given?
stars that when true returns a character matrix with the stars baked in?
It will be nice to have two columns here: p and p.adjusted, which would be identical only in case the p_adjust = "none". Maybe another column called p_value_adjustment containing details of the adjustment method will also be helpful (e.g., I do this for pairwise comparisons: https://indrajeetpatil.github.io/pairwiseComparisons/reference/pairwise_comparisons.html#examples).
Here initially I couldn't tell if the p-values were adjusted or not because they are so small, but having these new columns might avoid such confusion.
library(correlation)
correlation(iris, p_adjust = "none")
#> Parameter1 | Parameter2 | r | t | df | p | 95% CI | Method | n_Obs
#> ---------------------------------------------------------------------------------------------
#> Sepal.Length | Sepal.Width | -0.12 | -1.44 | 148 | 0.152 | [-0.27, 0.04] | Pearson | 150
#> Sepal.Length | Petal.Length | 0.87 | 21.65 | 148 | < .001 | [ 0.83, 0.91] | Pearson | 150
#> Sepal.Length | Petal.Width | 0.82 | 17.30 | 148 | < .001 | [ 0.76, 0.86] | Pearson | 150
#> Sepal.Width | Petal.Length | -0.43 | -5.77 | 148 | < .001 | [-0.55, -0.29] | Pearson | 150
#> Sepal.Width | Petal.Width | -0.37 | -4.79 | 148 | < .001 | [-0.50, -0.22] | Pearson | 150
#> Petal.Length | Petal.Width | 0.96 | 43.39 | 148 | < .001 | [ 0.95, 0.97] | Pearson | 150
correlation(iris, p_adjust = "holm")
#> Parameter1 | Parameter2 | r | t | df | p | 95% CI | Method | n_Obs
#> ---------------------------------------------------------------------------------------------
#> Sepal.Length | Sepal.Width | -0.12 | -1.44 | 148 | 0.152 | [-0.27, 0.04] | Pearson | 150
#> Sepal.Length | Petal.Length | 0.87 | 21.65 | 148 | < .001 | [ 0.83, 0.91] | Pearson | 150
#> Sepal.Length | Petal.Width | 0.82 | 17.30 | 148 | < .001 | [ 0.76, 0.86] | Pearson | 150
#> Sepal.Width | Petal.Length | -0.43 | -5.77 | 148 | < .001 | [-0.55, -0.29] | Pearson | 150
#> Sepal.Width | Petal.Width | -0.37 | -4.79 | 148 | < .001 | [-0.50, -0.22] | Pearson | 150
#> Petal.Length | Petal.Width | 0.96 | 43.39 | 148 | < .001 | [ 0.95, 0.97] | Pearson | 150
correlation(iris, p_adjust = "BH")
#> Parameter1 | Parameter2 | r | t | df | p | 95% CI | Method | n_Obs
#> ---------------------------------------------------------------------------------------------
#> Sepal.Length | Sepal.Width | -0.12 | -1.44 | 148 | 0.152 | [-0.27, 0.04] | Pearson | 150
#> Sepal.Length | Petal.Length | 0.87 | 21.65 | 148 | < .001 | [ 0.83, 0.91] | Pearson | 150
#> Sepal.Length | Petal.Width | 0.82 | 17.30 | 148 | < .001 | [ 0.76, 0.86] | Pearson | 150
#> Sepal.Width | Petal.Length | -0.43 | -5.77 | 148 | < .001 | [-0.55, -0.29] | Pearson | 150
#> Sepal.Width | Petal.Width | -0.37 | -4.79 | 148 | < .001 | [-0.50, -0.22] | Pearson | 150
#> Petal.Length | Petal.Width | 0.96 | 43.39 | 148 | < .001 | [ 0.95, 0.97] | Pearson | 150
Created on 2020-03-19 by the reprex package (v0.3.0)
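To illustrate the proposed columns, here is a minimal base R sketch that computes raw p-values with cor.test and adds p.adjusted and p_value_adjustment next to them. The helper name add_adjusted_p is hypothetical, and the column names simply follow the suggestion above.

```r
# Raw Pearson p-values for all pairs of numeric iris columns
num <- iris[, 1:4]
pairs <- t(combn(names(num), 2))
p_raw <- apply(pairs, 1, function(v) cor.test(num[[v[1]]], num[[v[2]]])$p.value)

res <- data.frame(
  Parameter1 = pairs[, 1],
  Parameter2 = pairs[, 2],
  p = p_raw
)

# Hypothetical helper: keep the raw p and add the adjusted p plus the method used
add_adjusted_p <- function(res, method = "holm") {
  res$p.adjusted <- p.adjust(res$p, method = method)
  res$p_value_adjustment <- method
  res
}

res_holm <- add_adjusted_p(res, method = "holm")
res_none <- add_adjusted_p(res, method = "none")
res_holm
```

With both columns side by side, it is visible at a glance whether an adjustment was applied, even when every value prints as < .001.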
devtools::session_info()
#> - Session info ---------------------------------------------------------------
#> setting value
#> version R Under development (unstable) (2020-02-28 r77874)
#> os Windows 10 x64
#> system x86_64, mingw32
#> ui RTerm
#> language (EN)
#> collate English_United States.1252
#> ctype English_United States.1252
#> tz Europe/Berlin
#> date 2020-03-19
#>
#> - Packages -------------------------------------------------------------------
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.0)
#> backports 1.1.5 2019-10-02 [1] CRAN (R 4.0.0)
#> bayestestR 0.5.2.1 2020-03-16 [1] Github (easystats/bayestestR@6ee7e37)
#> callr 3.4.2 2020-02-12 [1] CRAN (R 4.0.0)
#> cli 2.0.2 2020-02-28 [1] CRAN (R 4.0.0)
#> correlation * 0.1.0 2020-03-17 [1] Github (easystats/correlation@c1c35b0)
#> crayon 1.3.4 2017-09-16 [1] CRAN (R 4.0.0)
#> desc 1.2.0 2018-05-01 [1] CRAN (R 4.0.0)
#> devtools 2.2.2 2020-02-17 [1] CRAN (R 4.0.0)
#> digest 0.6.25 2020-02-23 [1] CRAN (R 4.0.0)
#> effectsize 0.2.0.1 2020-03-06 [1] Github (easystats/effectsize@64bfbc3)
#> ellipsis 0.3.0 2019-09-20 [1] CRAN (R 4.0.0)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.0)
#> fansi 0.4.1 2020-01-08 [1] CRAN (R 4.0.0)
#> fs 1.3.2 2020-03-05 [1] CRAN (R 4.0.0)
#> glue 1.3.2 2020-03-12 [1] CRAN (R 4.0.0)
#> highr 0.8 2019-03-20 [1] CRAN (R 4.0.0)
#> htmltools 0.4.0 2019-10-04 [1] CRAN (R 4.0.0)
#> insight 0.8.2.1 2020-03-16 [1] Github (easystats/insight@e0b229b)
#> knitr 1.28 2020-02-06 [1] CRAN (R 4.0.0)
#> magrittr 1.5 2014-11-22 [1] CRAN (R 4.0.0)
#> memoise 1.1.0 2017-04-21 [1] CRAN (R 4.0.0)
#> parameters 0.6.0 2020-03-12 [1] CRAN (R 4.0.0)
#> pkgbuild 1.0.6 2019-10-09 [1] CRAN (R 4.0.0)
#> pkgload 1.0.2 2018-10-29 [1] CRAN (R 4.0.0)
#> prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.0.0)
#> processx 3.4.2 2020-02-09 [1] CRAN (R 4.0.0)
#> ps 1.3.2 2020-02-13 [1] CRAN (R 4.0.0)
#> R6 2.4.1 2019-11-12 [1] CRAN (R 4.0.0)
#> Rcpp 1.0.4 2020-03-17 [1] CRAN (R 4.0.0)
#> remotes 2.1.1 2020-02-15 [1] CRAN (R 4.0.0)
#> rlang 0.4.5 2020-03-01 [1] CRAN (R 4.0.0)
#> rmarkdown 2.1 2020-01-20 [1] CRAN (R 4.0.0)
#> rprojroot 1.3-2 2018-01-03 [1] CRAN (R 4.0.0)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.0)
#> stringi 1.4.6 2020-02-17 [1] CRAN (R 4.0.0)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.0.0)
#> testthat 2.3.2 2020-03-02 [1] CRAN (R 4.0.0)
#> usethis 1.5.1.9000 2020-03-18 [1] Github (r-lib/usethis@8c32c73)
#> withr 2.1.2 2018-03-15 [1] CRAN (R 4.0.0)
#> xfun 0.12 2020-01-13 [1] CRAN (R 4.0.0)
#> yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.0)
#>
#> [1] C:/Users/inp099/Documents/R/win-library/4.0
#> [2] C:/Program Files/R/R-devel/library
Check similarity with rmcorr.
In the paper about Shepherd's pi correlation (#15), they say:
The Mahalanobis distance (in squared units) measures the distance in multivariate space taking into account the covariance structure of the data. Because a few extreme outliers can skew the covariance estimate, Dm is not robust. We therefore bootstrap the Mahalanobis distance by resampling n observations with replacement (i.e., allowing duplicates) and then calculating the Mahalanobis distance for each actual observation from the bivariate mean of the resampled data. The bootstrapped Mahalanobis distance, Ds, for each observation is the mean across the distances from all resamples.
I tried to add a bootstrapped Mahalanobis distance, but something is wrong (the estimates are roughly the same for all observations).
Does anyone have an idea? Here is the basis of the function (currently on dev):
.distance_mahalanobis <- function(data, indices = 1:nrow(data), ...) {
dat <- data[indices, ] # allows boot to select sample
row.names(dat) <- NULL
stats::mahalanobis(dat, center = colMeans(dat), cov = stats::cov(dat))
}
rez <- boot::boot(data = mtcars, statistic = .distance_mahalanobis, R = 1000, sim="permutation")
bayestestR::point_estimate(as.data.frame(rez$t), centrality="all")
#> # Point Estimates
#>
#> Parameter Median Mean MAP
#> V1 9.5 11 9.0
#> V2 9.9 11 9.0
#> V3 9.5 11 9.0
#> V4 9.5 11 9.0
#> V5 9.9 11 9.0
#> V6 9.5 11 9.0
#> V7 9.9 11 9.0
#> V8 9.9 11 9.0
#> V9 9.5 11 9.0
#> V10 9.5 10 9.0
#> V11 9.5 11 9.0
#> V12 9.7 11 9.0
#> V13 9.9 11 9.0
#> V14 9.9 11 9.0
#> V15 9.9 10 9.0
#> V16 9.5 10 9.0
#> V17 9.9 11 9.0
#> V18 9.5 11 9.0
#> V19 9.9 11 9.0
#> V20 9.9 11 9.0
#> V21 9.5 11 9.0
#> V22 9.5 11 8.9
#> V23 9.9 11 9.0
#> V24 9.9 11 8.9
#> V25 9.9 11 9.0
#> V26 9.5 11 9.0
#> V27 9.5 11 9.0
#> V28 9.5 10 9.0
#> V29 9.5 11 9.0
#> V30 9.5 11 9.0
#> V31 9.5 10 9.0
#> V32 9.9 11 9.0
Created on 2019-12-09 by the reprex package (v0.3.0)
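One possible reading of the quoted passage, given here as a hedged sketch rather than the package's implementation: the statistic above measures the resampled rows against their own centre, and sim = "permutation" only reorders the rows, so every column of rez$t ends up with the same distribution. The paper instead describes measuring each actual observation against the centre of each resample. The helper name bootstrap_mahalanobis and the use of the resample's covariance matrix are assumptions made for illustration.

```r
# Sketch: bootstrapped Mahalanobis distance (Ds) for every ORIGINAL observation.
# `bootstrap_mahalanobis` is a hypothetical helper, not part of any package.
bootstrap_mahalanobis <- function(data, R = 1000) {
  data <- as.matrix(data)
  n <- nrow(data)
  # Each replicate returns n distances: original rows vs. the resample's centre
  d <- replicate(R, {
    idx <- sample.int(n, n, replace = TRUE)  # resample rows with replacement
    resampled <- data[idx, , drop = FALSE]
    stats::mahalanobis(
      data,                                  # distances for the actual observations
      center = colMeans(resampled),          # centre of the resample
      cov = stats::cov(resampled)            # assumption: covariance of the resample
    )
  })
  rowMeans(d)                                # mean distance across all resamples
}

set.seed(123)
ds <- bootstrap_mahalanobis(mtcars[, c("mpg", "wt")], R = 500)
head(ds)
```

With this layout the result has one value per observation, and the values differ between observations because the rows being measured never change, only the reference centre does.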
As shown by @lindeloev, Spearman correlations can be estimated using a rank transformation. Hence, it might be interesting to develop a more flexible framework for robust Spearman-like correlations now that we have ranktransform() in effectsize, to make it work with all the correlation types (Bayesian, partial, etc.).
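The equivalence this idea relies on can be checked in base R. The sketch below uses rank() rather than ranktransform(), and the mtcars columns are picked only for illustration.

```r
# Spearman's rho equals Pearson's r computed on the ranks of each variable.
x <- mtcars$mpg
y <- mtcars$wt

rho_spearman <- cor(x, y, method = "spearman")
rho_ranked <- cor(rank(x), rank(y), method = "pearson")

# Both routes agree up to floating-point tolerance
all.equal(rho_spearman, rho_ranked)
```

Because the rank step is separate from the correlation step, any estimator that accepts numeric input (partial, Bayesian, and so on) could in principle be applied to the ranked data.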
Following @profandyfield's remarks, there seems to be a small discrepancy in the t and p values (the r estimates match):
exam_tib <- readr::read_csv("http://discoveringstatistics.com/repository/dsr2/exam_anxiety.csv")
#> Parsed with column specification:
#> cols(
#> id = col_double(),
#> revise = col_double(),
#> exam_grade = col_double(),
#> anxiety = col_double(),
#> sex = col_character()
#> )
data <- exam_tib[c("exam_grade", "revise", "anxiety")]
# With rounding
correlation::correlation(data, partial = TRUE)[c("r", "p", "t", "n_Obs", "Method")]
#> r | p | t | n_Obs | Method
#> ----------------------------------------
#> 0.13 | 0.182 | 1.35 | 103 | Pearson
#> -0.25 | 0.024 | -2.56 | 103 | Pearson
#> -0.65 | < .001 | -8.56 | 103 | Pearson
# Without
as.data.frame(correlation::correlation(data, partial = TRUE))[c("r", "p", "t", "n_Obs", "Method")]
#> r p t n_Obs Method
#> 1 0.1326783 1.815432e-01 1.345293 103 Pearson
#> 2 -0.2466658 2.402532e-02 -2.558002 103 Pearson
#> 3 -0.6485301 3.881594e-13 -8.562455 103 Pearson
ppcor::pcor.test(data$exam_grade, data$revise, data$anxiety)
#> estimate p.value statistic n gp Method
#> 1 0.1326783 0.1837308 1.338617 103 1 pearson
ppcor::pcor.test(data$exam_grade, data$anxiety, data$revise)
#> estimate p.value statistic n gp Method
#> 1 -0.2466658 0.01244581 -2.545307 103 1 pearson
ppcor::pcor.test(data$revise, data$anxiety, data$exam_grade)
#> estimate p.value statistic n gp Method
#> 1 -0.6485301 1.708019e-13 -8.519961 103 1 pearson
Created on 2020-04-15 by the reprex package (v0.3.0)
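A hedged sketch of where the gap in t may come from: both t statistics in the output above can be reproduced from the same r if one uses n - 2 degrees of freedom and the other n - 2 - k, where k is the number of variables partialled out. The helper t_from_r is hypothetical and the numbers are copied from the reprex. The remaining difference in p values may additionally reflect a multiple-comparison adjustment in correlation(), which is an assumption here; ppcor::pcor.test() reports unadjusted p values.

```r
# Hypothetical helper: t statistic of a correlation for a given df
t_from_r <- function(r, df) r * sqrt(df / (1 - r^2))

r <- 0.1326783  # exam_grade ~ revise, controlling for anxiety (from the output above)
n <- 103
k <- 1          # number of control variables

t_df_n2 <- t_from_r(r, n - 2)       # compare with the value printed by correlation()
t_df_n2k <- t_from_r(r, n - 2 - k)  # compare with the value printed by ppcor

c(correlation = t_df_n2, ppcor = t_df_n2k)
```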
I just installed the package and got an error when trying to replicate the example:
library(correlation)
correlation(iris)
Error in attributes(x)$ci * 100 : non-numeric argument to binary operator
With artificial data got the same error:
df = data.frame(a=rnorm(100,1,1), b=rnorm(100,2,3))
correlation(df)
Error in attributes(x)$ci * 100 : non-numeric argument to binary operator
Running on a clean R 3.6.1 session
x <- as.table(correlation(iris))
x
x[1:3, 1:3]
This happens because subsetting a data frame removes all custom attributes (which I didn't know). I am not sure if there are any easy workarounds.
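The behaviour can be reproduced without the package. The sketch below also shows one possible workaround; subset_keep_attrs is a hypothetical helper, and the "ci" attribute is used only as an example.

```r
# A data frame carrying a custom attribute
x <- data.frame(a = 1:3, b = 4:6, c = 7:9)
attr(x, "ci") <- 0.95

# `[` on a data frame keeps names, row.names and class, but drops "ci"
sub <- x[1:2, 1:2]
is.null(attr(sub, "ci"))

# Hypothetical workaround: subset, then copy the custom attributes back
subset_keep_attrs <- function(x, i, j) {
  out <- x[i, j, drop = FALSE]
  extra <- setdiff(names(attributes(x)), c("names", "row.names", "class"))
  for (a in extra) attr(out, a) <- attr(x, a)
  out
}

kept <- subset_keep_attrs(x, 1:2, 1:2)
attr(kept, "ci")
```

A class-specific `[` method that does the same copying would make this transparent to users, at the cost of having to decide which attributes remain valid after subsetting.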
In genomics, we usually use the biweight correlation from the WGCNA package.
More details can be found in
https://en.wikipedia.org/wiki/Biweight_midcorrelation
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3465711/
I understand this is outside your field, but adding this feature and making it tidy is something that no one has done before, to my knowledge. Perhaps it could be something to look into to push the package towards a larger group of users.
Originally posted by @JauntyJJS in #2 (comment)
@JauntyJJS I've started implementing but struggling a bit to get the formula right.
Assuming that:
set.seed(12345)
var_x <- rnorm(200)
var_y <- 0.5 * var_x + sqrt(1 - 0.5^2) * rnorm(200)
I have:
u <- (var_x - median(var_x)) / 9 * mad(var_x, constant = 1)
v <- (var_y - median(var_y)) / 9 * mad(var_y, constant = 1)
Then:
I_x <- ifelse((1 - abs(u)) > 0, 1, 0)
I_y <- ifelse((1 - abs(v)) > 0, 1, 0)
w_x <- (1 - u^2)^2 * (I_x * (1 - abs(u)))
w_y <- (1 - v^2)^2 * (I_y * (1 - abs(v)))
for
Finally:
denominator_x <- sqrt(sum((var_x - median(var_x)) * w_x)^2)
x_curly <- ((var_x - median(var_x)) * w_x) / denominator_x
denominator_y <- sqrt(sum((var_y - median(var_y)) * w_y)^2)
y_curly <- ((var_y - median(var_y)) * w_y) / denominator_y
r <- sum(x_curly * y_curly)
For
However, it seems that something is wrong, because the biweight correlation should be 0.5584808 and I have 7.70.
@mattansb @IndrajeetPatil @pdwaggoner @lindeloev and people who like equations ^^
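A hedged sketch of the formula as given on the Wikipedia page linked above, pointing at three places where the attempt seems to diverge from it: the scaled deviation divides by 9 * MAD (the expression / 9 * mad(...) divides by 9 and then multiplies by the MAD), the weight is (1 - u^2)^2 times the indicator only, and the normalising term is the square root of the sum of squared weighted deviations rather than the square root of a squared sum. The function name biweight_midcor is hypothetical.

```r
# Sketch of the biweight midcorrelation following the Wikipedia definition.
# `biweight_midcor` is a hypothetical helper, not a package function.
biweight_midcor <- function(x, y) {
  # Scaled deviations from the median; note the parentheses around 9 * MAD
  u <- (x - median(x)) / (9 * mad(x, constant = 1))
  v <- (y - median(y)) / (9 * mad(y, constant = 1))

  # Biweight weights: (1 - u^2)^2 inside the window, zero when |u| >= 1
  w_x <- (1 - u^2)^2 * (abs(u) < 1)
  w_y <- (1 - v^2)^2 * (abs(v) < 1)

  # Weighted deviations, normalised by the root of their sums of squares
  a <- (x - median(x)) * w_x
  b <- (y - median(y)) * w_y
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

set.seed(12345)
var_x <- rnorm(200)
var_y <- 0.5 * var_x + sqrt(1 - 0.5^2) * rnorm(200)

r_bicor <- biweight_midcor(var_x, var_y)
r_bicor
```

The result is bounded by -1 and 1 by construction, so it can be compared directly with the reference value quoted above or with WGCNA::bicor().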
When data2 is a grouped_df object, the output of correlation() doesn't seem right:
library(correlation)
df1 <- data.frame(x = rnorm(30),
y = rnorm(30),
g = rep_len(LETTERS[1:3], 30))
df2 <- data.frame(a = rnorm(30),
b = rnorm(30),
g = rep_len(LETTERS[1:3], 30))
correlation(dplyr::group_by(df1, g),
dplyr::group_by(df2, g))
## Error in `[.data.frame`(out, c("Group", names(out)[names(out) != "Group"])) :
## undefined columns selected
data(iris)
x <- iris[iris$Species != "versicolor", ]
correlation::cor_test(x, "Species", "Sepal.Length", method = "biseral")
#> Error in match.arg(tolower(method), c("pearson", "kendall", "spearman"), : 'arg' should be one of "pearson", "kendall", "spearman"
correlation:::.cor_test_biserial(x, "Species", "Sepal.Length", method = "biseral")
#> Warning in Ops.factor(x, 1): '%%' not meaningful for factors
#> Error in if (all(x%%1 == 0)) {: missing value where TRUE/FALSE needed
Created on 2020-03-24 by the reprex package (v0.3.0)
@strengejacke I think I might try adding a bit more tests, and then go ahead with submission. Although there are potential developments for this package (#2), I think it should be ok for an initial release. What do you think?
I see correlation 0.2.0 on CRAN, but this is not reflected in the NEWS documentation:
https://cran.r-project.org/web/packages/correlation/news/news.html
This is another option for implementing a robust correlation coefficient:
https://rdrr.io/cran/WRS2/man/pbcor.html
Code:
https://github.com/cran/WRS2/blob/master/R/pbcor.R
This paper presents some evidence for why percentage bend is more robust than Spearman:
https://www.frontiersin.org/articles/10.3389/fpsyg.2012.00606/full
Originally posted by @IndrajeetPatil in #15 (comment)
From strengejacke/sjPlot#31:
Would be nice if sjt.corr could display different correlation coefficients in the upper and lower triangle of the matrix. I think it is quite common to display Pearson and Spearman correlation coefficients and to do this in one matrix.
lower.tri(x, diag = FALSE) and upper.tri(x, diag = FALSE) might come in handy as mentioned here:
http://www.sthda.com/english/wiki/elegant-correlation-table-using-xtable-r-package
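A minimal base-R sketch of the idea, with Pearson coefficients in the lower triangle and Spearman coefficients in the upper one. The choice of mtcars columns is arbitrary, and this is not how sjt.corr is implemented.

```r
vars <- mtcars[, c("mpg", "wt", "hp")]

pearson <- cor(vars, method = "pearson")
spearman <- cor(vars, method = "spearman")

# Start from the Pearson matrix and overwrite the upper triangle with Spearman
combined <- pearson
upper <- upper.tri(combined, diag = FALSE)
combined[upper] <- spearman[upper]

round(combined, 2)
```

The resulting matrix is no longer symmetric, so any table built from it should label which triangle holds which coefficient.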
I have no idea of the maths behind these methods, but it would be great to be able to do it using a Bayesian estimation. No idea how hard/easy this would be
Following @profandyfield's suggestions, we need to propagate the digits argument all the way down to the printing method in parameters, if I'm not mistaken, @strengejacke?