Comments (49)
Note that emmeans has equivalence testing baked into the test() function via the delta argument, so it should be super easy to implement this in modelbased.
library(emmeans)
fit <- lm(Sepal.Length ~ Species, iris)
em_ <- emmeans(fit, ~ Species)
pair_test <- pairs(em_)
test(pair_test)
#> contrast estimate SE df t.ratio p.value
#> setosa - versicolor -0.930 0.103 147 -9.033 <.0001
#> setosa - virginica -1.582 0.103 147 -15.366 <.0001
#> versicolor - virginica -0.652 0.103 147 -6.333 <.0001
#>
#> P value adjustment: tukey method for comparing a family of 3 estimates
test(pair_test, delta = 1)
#> contrast estimate SE df t.ratio p.value
#> setosa - versicolor -0.930 0.103 147 -0.680 0.5761
#> setosa - virginica -1.582 0.103 147 5.653 1.0000
#> versicolor - virginica -0.652 0.103 147 -3.380 0.0014
#>
#> P value adjustment: sidak method for 3 tests
#> Statistics are tests of equivalence with a threshold of 1
#> P values are left-tailed
Created on 2020-04-14 by the reprex package (v0.3.0)
from parameters.
Daniellllllll STOP. BEING. BAYESIAN!
Ask what is not rejected!
A + C - Accept
B - Reject
D - Undecided
from parameters.
I'm drafting a function, cet(), and reading that paper again. I think it's very similar (if not the same) to Kruschke's equivalence testing. What I don't exactly understand is: how is delta calculated in this figure?
https://journals.plos.org/plosone/article/figure?id=10.1371/journal.pone.0195145.g001
The delta in this figure is the rope in equivalence_test(). The major difference I see is that with an approach similar to equivalence_test(), we would be more conservative than the suggestions in the CET paper, if I understand it right. But that's no disadvantage.
My impression is that the method the authors suggest is a bit complicated and less intuitive. See the third error bar from the top in the left panel here, which is inconclusive. The larger (blue) CI crosses zero, while the smaller CI (orange) does not. However, the 4th error bar in the group "negative" has the same properties. The only difference is that both upper CIs also lie within the deltas; that's apparently what makes it negative instead of inconclusive...
Edit: The rules seem to be:
- inconclusive: zero is crossed, but at least one of the smaller CIs (orange) lies outside the ROPE
- negative: zero is crossed, and the smaller CIs (orange) are inside the ROPE
- positive: zero is not crossed (which means: statistical significance)
We can follow this rule, but at the moment I will write this function in accordance with Kruschke's equi-testing.
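As a sketch of these rules (a hypothetical helper, not package code; the wide/narrow CI bounds and the ROPE are assumed to be given):

```r
# Hypothetical classifier for the figure's decision rules.
# ci_wide / ci_narrow are c(lower, upper); rope likewise.
classify_figure <- function(ci_wide, ci_narrow, rope = c(-0.6, 0.6)) {
  crosses_zero   <- ci_wide[1] < 0 && ci_wide[2] > 0
  narrow_in_rope <- ci_narrow[1] >= rope[1] && ci_narrow[2] <= rope[2]
  if (!crosses_zero) {
    "positive"       # statistically significant
  } else if (narrow_in_rope) {
    "negative"       # practically equivalent to zero
  } else {
    "inconclusive"
  }
}

classify_figure(ci_wide = c(-0.2, 0.9), ci_narrow = c(-0.1, 0.8))  # "inconclusive"
classify_figure(ci_wide = c(-0.2, 0.7), ci_narrow = c(-0.1, 0.5))  # "negative"
```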
from parameters.
I feel like the conditional ET paradigm is indeed a bit more convoluted than a straight ET, and after reading the paper I was not sure of the rationale and necessity for this conditional decision rule involving alpha and 2*alpha.
But yes, let's first add "regular" ET for frequentist models, and then we can add the conditional variant!
from parameters.
Test... Some differences look impressive, but given the small ranges of the CIs, the ratios show larger differences very quickly.
library(bayestestR)
library(parameters)
library(rstanarm)
m1 <- lm(mpg ~ gear + wt + cyl + hp, data = mtcars)
m2 <- stan_glm(mpg ~ gear + wt + cyl + hp, data = mtcars)
equivalence_test(m2, ci = .95)
#> # Test for Practical Equivalence
#>
#> ROPE: [-0.60 0.60]
#>
#> Parameter H0 inside ROPE 95% HDI
#> (Intercept) rejected 0.00 % [24.96 48.78]
#> gear undecided 43.86 % [-1.59 2.32]
#> wt rejected 0.00 % [-4.70 -1.29]
#> cyl undecided 37.38 % [-2.09 0.58]
#> hp accepted 100.00 % [-0.05 0.01]
cet(m1)
#> # Conditional Equivalence Test
#>
#> Region of equivalence: [-0.60 0.60]
#>
#> parameter H0 inside ROPE 95% CI
#> (Intercept) rejected 0.00 % [24.44 48.94]
#> gear undecided 29.37 % [-1.69 2.41]
#> wt rejected 0.00 % [-4.77 -1.28]
#> cyl undecided 42.29 % [-2.17 0.55]
#> hp accepted 100.00 % [-0.05 0.01]
Created on 2019-03-21 by the reprex package (v0.2.1)
from parameters.
But yes, let's first add "regular" ET for frequentist models, and then we can add the conditional variant!
Yeah, I agree. Yet I think we should probably find another name than cet()
, to avoid confusion? Or is it still ok?
(I can't commit from here, so I will commit later...)
from parameters.
That's interesting! Could the fact that it uses the HDI in the Bayesian case and the quantile method for the frequentist one add to the difference?
For the name, I think it makes sense to keep the equivalence_test() and rope() methods and expand them to support frequentist models. cet() (or equivalence_test(..., conditional = TRUE), or such) could be reserved for the method described by the paper. What do you think?
from parameters.
Yes, I was also thinking about equivalence_test(), but hesitating given the fact that its argument is named posterior. We should rename it into something more general, probably x? Since we have a numeric method and a method for models, model does not quite fit well... hm.
from parameters.
https://journals.sagepub.com/doi/10.1177/1948550617697177
from parameters.
I re-read the paper. Based on what @DominiqueMakowski wrote in his initial post, and the figure in my answer, it looks like the following:
Step 2 (conclusive) seems to be a simple statistically significant result (because CIs are within ROPE, but exclude 0). If the narrow CI is within the ROPE, it's negative, else inconclusive.
I'd say we stick to our rules. I have implemented functions for more frequentist models and for simulated model parameters (https://easystats.github.io/see/articles/parameters.html#equivalence-testing-1), maybe we can do this for bootstrapped parameters as well...
We might take a close look at the Lakens-paper to decide whether we want to investigate this further. I would rather not rely on the emmeans-function for now.
from parameters.
The Lakens paper is not related to regression models (but could maybe be interesting for the effectsize package).
from parameters.
@strengejacke Do you mean to use the CIs from effectsize as criteria?
This would mean labelling the output according to some "rules", no? 🤔
Or maybe something in between - have the CI bounds interpreted with the interpret_*() functions?
Sounds like it would work well with report (:
from parameters.
I'm re-opening this as I believe the implementation we provide is not the common one (see also easystats/effectsize#70).
Frequentist "logic"
In the frequentist NHST logic:
- If P(data | Hx) < α
- then reject Hx
Using CIs, this can be reframed as:
- If the (1 − α) CI does not contain the value expected by Hx
- then reject Hx
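Concretely, with a plain lm() (a minimal sketch of the CI reframing):

```r
fit <- lm(mpg ~ wt, data = mtcars)
ci <- confint(fit, "wt", level = 0.95)  # 95% CI for the wt coefficient
# H0: beta_wt = 0 is rejected if the CI excludes 0
ci[1] > 0 || ci[2] < 0
#> TRUE
```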
And so, with frequentist equivalence testing:
If the null was rejected, and the SESOI was not, then we conclude that H0 is rejected.
Our decision matrix then looks like this:
|  | H_alt rejected | H_alt not rejected |
|---|---|---|
| H_null rejected | Accept null* | Reject null |
| H_null not rejected | Accept null | Undecided |
* Even though our result is significantly larger than 0, it is also significantly smaller than SESOI. So we accept the null, as it is defined by the ROPE.
The rules
The rules then are:
1. If the CI does not contain the SESOI (or larger) - accept the null.
2. Else, if the CI does not contain 0 - reject the null.
3. Else, inconclusive.
It seems like the CET paper suggests switching the order of 1 and 2, so:
1. If the CI does not contain 0 - reject the null.
2. Else, if the CI does not contain the SESOI (or larger) - accept the null.
3. Else, inconclusive.
(sometimes the CI for testing the SESOI is smaller than that used for testing the 0.)
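A toy sketch of how the two orderings play out (hypothetical function names, not the package implementation; both checks read off the same CI here):

```r
# `ci` is c(lower, upper); `sesoi` defines the ROPE [-sesoi, sesoi].
classify_rule <- function(ci, sesoi, rule = c("classic", "cet")) {
  rule <- match.arg(rule)
  inside_rope   <- ci[1] > -sesoi && ci[2] < sesoi  # CI excludes the SESOI
  excludes_zero <- ci[1] > 0 || ci[2] < 0           # CI excludes 0
  if (rule == "classic") {
    if (inside_rope) "accepted" else if (excludes_zero) "rejected" else "undecided"
  } else {  # CET tests the point null first
    if (excludes_zero) "rejected" else if (inside_rope) "accepted" else "undecided"
  }
}

# A significant effect lying entirely inside the ROPE flips between the rules:
classify_rule(c(0.01, 0.19), sesoi = 0.2, rule = "classic")  # "accepted"
classify_rule(c(0.01, 0.19), sesoi = 0.2, rule = "cet")      # "rejected"
```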
In either case, the weird results that @strengejacke spotted are a "feature" of the frequentist framework, not of CET in particular.
Required changes
equivalence_test.lm should:
- remove the % in ROPE, which is very Bayesian and not appropriate here
- only have the labels, as dictated by the rules above.
from parameters.
ROPE is just a region of practical equivalence, and not specific to Bayesian analysis. Thus, I don't see any problems using the term here.
And looking at the rules from the CET paper (see #12 (comment)), the "positive" results are just results with p < 0.05, while all others are not statistically significant. So there's no difference to classic NHST. From Lakens' paper, it seems to me that there's the same issue, but I'm not sure anyway whether these conclusions can be transferred to regression models or not (t-tests and correlations are discussed).
Something is not practically equivalent to zero if the estimate and the uncertainty in estimation are outside a range that is considered practically equivalent to zero. That is what equivalence_test() does, and I think this concept is transferable from Bayesian to frequentist.
I would say we can make this explicit in the docs, but I would not change the function design.
from parameters.
The term ROPE is fine, but the calculation of the % in ROPE is not in this instance.
The idea of equivalence testing is transferable to any testing in the frequentist framework.
Something is not practically equivalent to zero, if the estimate and the uncertainty in estimation are outside a range that is considered as practically equivalent to zero. That is what equivalence_test() does
That is not what equivalence testing is in the frequentist framework - in which you cannot conclude anything from lack of significance, only from a significant result. Unlike the Bayesian framework, where you can talk about probabilities, and uncertainty in ranges and estimates...
But in the frequentist framework, all you have are two tests (from the original paper):
- Is |θ| > 0? (Can we reject the null: |θ| = 0?)
- Is |θ| < SESOI? (Can we reject the null: |θ| ≥ SESOI?)
- If both are significant, then θ is not 0, but it is at the very least small (accept).
- If only the first is significant, θ is not 0 (reject).
- If only the first is significant, θ is at the very least small (accept).
- If neither, then the results are inconclusive (undecided).
from parameters.
If only the first is significant, θ is not 0 (reject).
If only the first is significant, θ is at the very least small (accept).
These two conditions are the same.
from parameters.
Sorry, fixed:
- If both are significant, then θ is not 0, but it is at the very least small (accept).
- If only the first is significant, θ is not 0 (reject).
- If only the second is significant, θ is at the very least small (accept).
- If neither, then the results are inconclusive (undecided).
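The corrected decision logic, sketched as two independent CI checks (a hypothetical helper; both tests read off the same interval here, ignoring the narrow-CI subtlety for the second test):

```r
# `ci` is c(lower, upper); `sesoi` is the smallest effect size of interest.
decide <- function(ci, sesoi) {
  test1 <- ci[1] > 0 || ci[2] < 0           # reject H0: |theta| = 0
  test2 <- ci[1] > -sesoi && ci[2] < sesoi  # reject H0: |theta| >= SESOI
  if (test1 && test2) "accept (non-zero, but small)"
  else if (test1)     "reject"
  else if (test2)     "accept"
  else                "undecided"
}

decide(c(0.01, 0.19), sesoi = 0.2)  # "accept (non-zero, but small)"
decide(c(0.22, 0.40), sesoi = 0.2)  # "reject"
```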
I think the point of contention is this:
In Bayes, to reject the null you want to test whether |θ| > SESOI (is it outside the ROPE?).
But here the SESOI is not used to reject the null - the point null is used for that. The SESOI is used to accept the null by rejecting the hypothesis that |θ| > SESOI.
Is this dumb? Yes.
But is this what people have been doing? And more importantly, is it what users would expect from this function? Yes.
from parameters.
But what you're describing is Lakens' rule, not the one from the other CET paper, right? I don't see how these rules apply to the figure from my comment above.
from parameters.
To summarise: if a significant coefficient's confidence interval includes or is greater than the ROPE, it would be not practically equivalent to null?
from parameters.
From Lakens == Hauck and Anderson - yes.
What the CET paper does is say:
Only if the null that |θ| = 0 is not rejected, then test for equivalence.
So this changes the rules a little:
|  | H_alt rejected | H_alt not rejected |
|---|---|---|
| H_null rejected | CET: Reject null / ET: Accept null | Reject null |
| H_null not rejected | Accept null | Undecided |
To summarise: if a significant coefficient's confidence interval includes or is greater than the ROPE, it would be not practically equivalent to null?
Yes, precisely ^_^
from parameters.
2nd summary:
According to #12 (comment), relevant parameters are those where the coefficient is statistically significant.
This approach just adds an "inconclusive" category.
According to #12 (comment), relevant parameters are coefficients that are statistically significant, but also need to be either large enough (to be outside the ROPE), or have a very high uncertainty (i.e. a large CI), to range outside the ROPE?
This approach just distinguishes small effect sizes with low and high uncertainty, where small effect sizes with higher uncertainty are preferred.
My conclusion: I'll mention in the docs that we do something different and stick to the current implementation, as I don't like either of the above two methods... Maybe we can add an additional argument that mimics the two approaches above.
from parameters.
What do you mean by "relevant parameters"?
This approach just adds an "inconclusive" category.
No - because without ET, any non-significant result is inconclusive! ET adds the "accept" category! (This literal backwards thinking is such a mindf@#$...)
I think we should give the ET results, and maybe add an option for CET (if so, I will do the same in effectsize).
I think having a mix of Bayesian logic on frequentist methods is misleading - think if someone tried to publish anything based on this "mixed" method....
(and the % in ROPE does not belong here either - in the frequentist framework you cannot slice up CIs to get probabilities like that...).
Again, I totally agree that this logic has flaws (I mean, I am a developer of bayestestR after all!), but it is what it is... Instead we can add in the docs something like "If you want to truly accept the null, consider becoming a Bayesian <link to our awesome stuff>".
from parameters.
(Also I think the p-value feature is redundant when you switch to ET - the label "rejected" is enough...)
from parameters.
(Also I think the p-value feature is redundant when you switch to ET - the label "rejected" is enough...)
(True, but the p-value is adjusted for multiple testing and should at least "mimic" the possible problem of non-dependent parameters)
from parameters.
What do you mean by "relevant parameters"?
Statistically significant and not practically equivalent to zero.
from parameters.
Oh - so close, but you have to flip the thinking to some weird somersault 😅
In ET: relevant parameters are coefficients that are statistically significant, but also need to ~~be either large enough (to be outside the ROPE), or have a very high uncertainty (i.e. a large CI), to range outside the ROPE~~ not have their CI entirely in the ROPE (= not reject the hypothesis that |θ| ≥ SESOI).
from parameters.
But really both sig-tests can be viewed independently: both sig means non-0, but small.
from parameters.
rules:
- "classic"
- "cet" - conditional
- "bayes" - test non-equivalence
How's that?
from parameters.
Oh - so close, but you have to flip the thinking to some weird somersault 😅
In ET: relevant parameters are coefficients that are statistically significant, but also need to ~~be either large enough (to be outside the ROPE), or have a very high uncertainty (i.e. a large CI), to range outside the ROPE~~ not have their CI entirely in the ROPE (= not reject the hypothesis that |θ| ≥ SESOI).
That's what I meant by "large enough", i.e. exceeding the ROPE limits ;-)
from parameters.
Hows that?
Ok, and "classic" = Lakens?
from parameters.
yeah
from parameters.
That's what I meant by "large enough", i.e. exceeding the ROPE limits ;-)
But it's the other way around! It's not "large enough", it's "not too small"... 🤪🤪🤪🤪🤪
from parameters.
Anyway... :-D
What would be "accept", "reject" and "undecided"? We have four cases here...
from parameters.
I'd say: A, D = accept; B = reject; C = undecided (though D would be not equivalent, so from the "test perspective", it could also be "reject").
from parameters.
But it's the other way around! It's not "large enough", it's "not too small"...
We're approaching from different points, but we arrive at the same goal. ;-)
from parameters.
These are the 3 rules:
library(effectsize)
library(magrittr)
ds <- t_to_d(t = c(0.45, -0.65, -2.2, 2.25, 7),
             df_error = c(675, 525, 900, 1875, 2000))

ds %>%
  equivalence_test(range = 0.2) %>%
  plot()

ds %>%
  equivalence_test(range = 0.2, rule = "cet") %>%
  plot()

ds %>%
  equivalence_test(range = 0.2, rule = "bayes") %>%
  plot()
Created on 2020-04-23 by the reprex package (v0.3.0)
from parameters.
For the CET, remember that we have narrow CIs for the 2nd step.
from parameters.
I'm not using the TOST approach here, just whatever CI is passed (I have no way to recover the original data to do this without breaking the code) 😖
I will add in the docs that users should change the CI level if they want to use the TOST approach.
I've also added this info in the printing:
library(effectsize)
ds <- t_to_d(t = c(0.45, -0.65, -2.2, 2.25, 7),
             df_error = c(675, 525, 900, 1875, 2000),
             ci = 0.95)
equivalence_test(ds, range = 0.2)
#> # Test for Practical Equivalence
#>
#> ROPE: [-0.20 0.20]
#>
#> d | 95% CI | H0
#> ----------------------------------
#> 0.03 | [-0.12, 0.19] | Accepted
#> -0.06 | [-0.23, 0.11] | Undecided
#> -0.15 | [-0.28, -0.02] | Rejected
#> 0.10 | [ 0.01, 0.19] | Accepted
#> 0.31 | [ 0.22, 0.40] | Rejected
equivalence_test(ds, range = 0.2, rule = "cet")
#> # Conditional Test for Practical Equivalence
#>
#> ROPE: [-0.20 0.20]
#>
#> d | 95% CI | H0
#> ----------------------------------
#> 0.03 | [-0.12, 0.19] | Accepted
#> -0.06 | [-0.23, 0.11] | Undecided
#> -0.15 | [-0.28, -0.02] | Rejected
#> 0.10 | [ 0.01, 0.19] | Rejected
#> 0.31 | [ 0.22, 0.40] | Rejected
equivalence_test(ds, range = 0.2, rule = "bayes")
#> # Test for Practical Equivalence
#>
#> ROPE: [-0.20 0.20]
#>
#> d | 95% CI | H0
#> ----------------------------------
#> 0.03 | [-0.12, 0.19] | Accepted
#> -0.06 | [-0.23, 0.11] | Undecided
#> -0.15 | [-0.28, -0.02] | Undecided
#> 0.10 | [ 0.01, 0.19] | Accepted
#> 0.31 | [ 0.22, 0.40] | Rejected
#>
#> (Using Bayesian guidelines)
Created on 2020-04-23 by the reprex package (v0.3.0)
from parameters.
The TOST (using narrow CIs) isn't just for the CET, it is also for the "classic".
from parameters.
I'm not talking about TOST (Lakens), but about CET:
from parameters.
Not sure what you mean - TOST is the same as using a narrow CI (the orange CIs) for the equi-test... 🤔
from parameters.
Hm, ok. But for "classic" I'm just using the regular CI, and for "cet" the regular and narrow CIs. Should I use the narrow ones only for "classic"?
from parameters.
I think you should use the narrow one for both (maybe even all 3?)... Or for none, and set the default ci to 0.9? (As in: let the user control this and explain it in the docs, which is what I have over at effectsize.)
Daniel, I feel like we've been to war together after all of this... I'm tired and I want to go home! (Ignoring the fact that I've been home for nearly 8 weeks...)
from parameters.
TOST is the same as using a narrow CI
from parameters.
Daniel, I feel like we've been to war together after all of this... I'm tired and I want to go home! (Ignoring the fact that I've been home for nearly 8 weeks...)
Yes, yes! Take a break ;-) I'll use the narrow CI for cet and classic, and that's it, I'd say. We can think about changing this later...
from parameters.
Alright!
So I think... I think that's it?
Can it be?!!?!
from parameters.