Comments (49)
Note that emmeans has equivalence testing baked into the test() function via the delta argument, so it should be super easy to implement this in modelbased.
library(emmeans)
fit <- lm(Sepal.Length ~ Species, iris)
em_ <- emmeans(fit, ~ Species)
pair_test <- pairs(em_)
test(pair_test)
#> contrast estimate SE df t.ratio p.value
#> setosa - versicolor -0.930 0.103 147 -9.033 <.0001
#> setosa - virginica -1.582 0.103 147 -15.366 <.0001
#> versicolor - virginica -0.652 0.103 147 -6.333 <.0001
#>
#> P value adjustment: tukey method for comparing a family of 3 estimates
test(pair_test, delta = 1)
#> contrast estimate SE df t.ratio p.value
#> setosa - versicolor -0.930 0.103 147 -0.680 0.5761
#> setosa - virginica -1.582 0.103 147 5.653 1.0000
#> versicolor - virginica -0.652 0.103 147 -3.380 0.0014
#>
#> P value adjustment: sidak method for 3 tests
#> Statistics are tests of equivalence with a threshold of 1
#> P values are left-tailed
Created on 2020-04-14 by the reprex package (v0.3.0)
from parameters.
Daniellllllll STOP. BEING. BAYESIAN!
Ask what is not rejected!
A + C - Accept
B - Reject
D - Undecided
from parameters.
I'm drafting a function, cet(), and reading that paper again. I think it's very similar (if not the same) to Kruschke's equivalence testing. What I don't exactly understand is: how is delta calculated in this figure?
https://journals.plos.org/plosone/article/figure?id=10.1371/journal.pone.0195145.g001
The delta in this figure is the rope in equivalence_test(). The major difference I see is that with an approach similar to equivalence_test(), we would be more conservative than the suggestions in the CET paper, if I understand it right. But that's no disadvantage.
My impression is that the method the authors suggest is a bit complicated and less intuitive. See the third error bar from the top in the left panel here, which is inconclusive. The larger (blue) CI crosses zero, while the smaller CI (orange) does not. However, the 4th error bar in the group "negative" has the same properties. The only difference is that both upper CIs also lie within the deltas; that's apparently what makes it negative instead of inconclusive...
Edit: The rules seem to be:
- inconclusive: zero is crossed, but at least one of the smaller CIs (orange) lies outside the ROPE
- negative: zero is crossed, and the smaller CIs (orange) are inside the ROPE
- positive: zero is not crossed (which means: statistical significance)
We can follow this rule, but at the moment I will write this function in accordance with Kruschke's equi-testing.
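As a sketch of these rules (a hypothetical helper, not package code; the wide/narrow CI bounds and the ROPE are assumed to be given):

```r
# Hypothetical classifier for the figure's decision rules.
# ci_wide / ci_narrow are c(lower, upper); rope likewise.
classify_figure <- function(ci_wide, ci_narrow, rope = c(-0.6, 0.6)) {
  crosses_zero   <- ci_wide[1] < 0 && ci_wide[2] > 0
  narrow_in_rope <- ci_narrow[1] >= rope[1] && ci_narrow[2] <= rope[2]
  if (!crosses_zero) {
    "positive"       # statistically significant
  } else if (narrow_in_rope) {
    "negative"       # practically equivalent to zero
  } else {
    "inconclusive"
  }
}

classify_figure(ci_wide = c(-0.2, 0.9), ci_narrow = c(-0.1, 0.8))  # "inconclusive"
classify_figure(ci_wide = c(-0.2, 0.7), ci_narrow = c(-0.1, 0.5))  # "negative"
```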
from parameters.
I feel like the conditional ET paradigm is indeed a bit more convoluted than a straight ET, and after reading the paper I was not sure of the rationale and necessity for this conditional decision rule involving alpha and 2*alpha.
But yes, let's first add "regular" ET for frequentist models, and then we can add the conditional variant!
from parameters.
Test... Some differences look impressive, but given the small ranges of the CIs, the ratios show larger differences very quickly.
library(bayestestR)
library(parameters)
library(rstanarm)
m1 <- lm(mpg ~ gear + wt + cyl + hp, data = mtcars)
m2 <- stan_glm(mpg ~ gear + wt + cyl + hp, data = mtcars)
equivalence_test(m2, ci = .95)
#> # Test for Practical Equivalence
#>
#> ROPE: [-0.60 0.60]
#>
#> Parameter H0 inside ROPE 95% HDI
#> (Intercept) rejected 0.00 % [24.96 48.78]
#> gear undecided 43.86 % [-1.59 2.32]
#> wt rejected 0.00 % [-4.70 -1.29]
#> cyl undecided 37.38 % [-2.09 0.58]
#> hp accepted 100.00 % [-0.05 0.01]
cet(m1)
#> # Conditional Equivalence Test
#>
#> Region of equivalence: [-0.60 0.60]
#>
#> parameter H0 inside ROPE 95% CI
#> (Intercept) rejected 0.00 % [24.44 48.94]
#> gear undecided 29.37 % [-1.69 2.41]
#> wt rejected 0.00 % [-4.77 -1.28]
#> cyl undecided 42.29 % [-2.17 0.55]
#> hp accepted 100.00 % [-0.05 0.01]
Created on 2019-03-21 by the reprex package (v0.2.1)
from parameters.
But yes, let's first add "regular" ET for frequentist models, and then we can add the conditional variant!
Yeah, I agree. Yet I think we should probably find another name than cet()
, to avoid confusion? Or is it still ok?
(I can't commit from here, so I will commit later...)
from parameters.
That's interesting! Could the fact that it uses the HDI in the Bayesian case and the quantile method for the frequentist one add to the difference?
For the name, I think it makes sense to keep the equivalence_test() and rope() methods and expand them to support frequentist models. cet() (or equivalence_test(..., conditional = TRUE), or such) could be reserved for the method described by the paper. What do you think?
from parameters.
Yes, I was also thinking about equivalence_test(), but hesitating given the fact that its argument is named posterior. We should rename it into something more general, probably x? Since we have a numeric method and a method for models, model does not quite fit well... hm.
from parameters.
https://journals.sagepub.com/doi/10.1177/1948550617697177
from parameters.
I re-read the paper. Based on what @DominiqueMakowski wrote in his initial post, and the figure in my answer, it looks like the following:
Step 2 (conclusive) seems to be a simple statistically significant result (because CIs are within ROPE, but exclude 0). If the narrow CI is within the ROPE, it's negative, else inconclusive.
I'd say we stick to our rules. I have implemented functions for more frequentist models and for simulated model parameters (https://easystats.github.io/see/articles/parameters.html#equivalence-testing-1), maybe we can do this for bootstrapped parameters as well...
We might take a close look at the Lakens-paper to decide whether we want to investigate this further. I would rather not rely on the emmeans-function for now.
from parameters.
The Lakens paper is not related to regression models (but could maybe be interesting for the effectsize package).
from parameters.
@strengejacke Do you mean to use the CIs from effectsize as criteria?
This would mean labelling the output according to some "rules", no? 🤔
Or maybe something in between - have the CI bounds interpreted with the interpret_*() functions?
Sounds like it would work well with report (:
from parameters.
I'm re-opening this as I believe the implementation we provide is not the common one (see also easystats/effectsize#70).
Frequentist "logic"
In the frequentist NHST logic:
- If P(data | Hx) < α
- then reject Hx
Using CIs, this can be reframed as:
- If the (1 − α) CI does not contain the value expected by Hx
- then reject Hx
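Concretely, with a plain lm() (a minimal sketch of the CI reframing):

```r
fit <- lm(mpg ~ wt, data = mtcars)
ci <- confint(fit, "wt", level = 0.95)  # 95% CI for the wt coefficient
# H0: beta_wt = 0 is rejected if the CI excludes 0
ci[1] > 0 || ci[2] < 0
#> TRUE
```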
And so, with frequentist equivalence testing:
If the null was rejected, and the SESOI was not, then we conclude that H0 is rejected.
Our decision matrix then looks like this:
|  | H_alt rejected | H_alt not rejected |
|---|---|---|
| H_null rejected | Accept null* | Reject null |
| H_null not rejected | Accept null | Undecided |
* Even though our result is significantly larger than 0, it is also significantly smaller than SESOI. So we accept the null, as it is defined by the ROPE.
The rules
The rules then are:
1. If the CI does not contain the SESOI (or larger) - accept the null.
2. Else, if the CI does not contain 0 - reject the null.
3. Else, inconclusive.
It seems like the CET paper suggests switching the order of 1 and 2, so:
1. If the CI does not contain 0 - reject the null.
2. Else, if the CI does not contain the SESOI (or larger) - accept the null.
3. Else, inconclusive.
(sometimes the CI for testing the SESOI is smaller than that used for testing the 0.)
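A toy sketch of how the two orderings play out (hypothetical function names, not the package implementation; both checks read off the same CI here):

```r
# `ci` is c(lower, upper); `sesoi` defines the ROPE [-sesoi, sesoi].
classify_rule <- function(ci, sesoi, rule = c("classic", "cet")) {
  rule <- match.arg(rule)
  inside_rope   <- ci[1] > -sesoi && ci[2] < sesoi  # CI excludes the SESOI
  excludes_zero <- ci[1] > 0 || ci[2] < 0           # CI excludes 0
  if (rule == "classic") {
    if (inside_rope) "accepted" else if (excludes_zero) "rejected" else "undecided"
  } else {  # CET tests the point null first
    if (excludes_zero) "rejected" else if (inside_rope) "accepted" else "undecided"
  }
}

# A significant effect lying entirely inside the ROPE flips between the rules:
classify_rule(c(0.01, 0.19), sesoi = 0.2, rule = "classic")  # "accepted"
classify_rule(c(0.01, 0.19), sesoi = 0.2, rule = "cet")      # "rejected"
```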
In either case, the weird results that @strengejacke spotted are a "feature" of the frequentist framework, not of CET in particular.
Required changes
equivalence_test.lm should:
- remove the % in ROPE, which is very Bayesian and not appropriate here
- only have the labels, as dictated by the rules above.
from parameters.
ROPE is just a region of practical equivalence, and not specific to Bayesian analysis. Thus, I don't see any problems using the term here.
And looking at the rules from the CET paper (see #12 (comment)), the "positive" results are just results with p < 0.05, while all others are not statistically significant. So there's no difference to classic NHST. From Lakens' paper, it seems to me that there's the same issue, but I'm not sure anyway whether these conclusions can be transferred to regression models or not (t-tests and correlations are discussed).
Something is not practically equivalent to zero if the estimate and the uncertainty in estimation are outside a range that is considered practically equivalent to zero. That is what equivalence_test() does, and I think this concept is transferable from Bayesian to frequentist.
I would say we can make this explicit in the docs, but I would not change the function design.
from parameters.
The term ROPE is fine, but the calculation of the % in ROPE is not in this instance.
The idea of equivalence testing is transferable to any testing in the frequentist framework.
Something is not practically equivalent to zero, if the estimate and the uncertainty in estimation are outside a range that is considered as practically equivalent to zero. That is what equivalence_test() does
That is not what equivalence testing is in the frequentist framework - in which you cannot conclude anything from lack of significance, only from a significant result. Unlike the Bayesian framework, where you can talk about probabilities, and uncertainty in ranges and estimates...
But in the frequentist framework, all you have are two tests (from the original paper):
- Is |θ| > 0? (Can we reject the null: |θ| = 0?)
- Is |θ| < SESOI? (Can we reject the null: |θ| ≥ SESOI?)
- If both are significant, then θ is not 0, but it is at the very least small (accept).
- If only the first is significant, θ is not 0 (reject).
- If only the first is significant, θ is at the very least small (accept).
- If neither, then the results are inconclusive (undecided).
from parameters.
If only the first is significant, θ is not 0 (reject).
If only the first is significant, θ is at the very least small (accept).
These two conditions are the same.
from parameters.
Sorry, fixed:
- If both are significant, then θ is not 0, but it is at the very least small (accept).
- If only the first is significant, θ is not 0 (reject).
- If only the second is significant, θ is at the very least small (accept).
- If neither, then the results are inconclusive (undecided).
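The corrected decision logic, sketched as two independent CI checks (a hypothetical helper; both tests read off the same interval here, ignoring the narrow-CI subtlety for the second test):

```r
# `ci` is c(lower, upper); `sesoi` is the smallest effect size of interest.
decide <- function(ci, sesoi) {
  test1 <- ci[1] > 0 || ci[2] < 0           # reject H0: |theta| = 0
  test2 <- ci[1] > -sesoi && ci[2] < sesoi  # reject H0: |theta| >= SESOI
  if (test1 && test2) "accept (non-zero, but small)"
  else if (test1)     "reject"
  else if (test2)     "accept"
  else                "undecided"
}

decide(c(0.01, 0.19), sesoi = 0.2)  # "accept (non-zero, but small)"
decide(c(0.22, 0.40), sesoi = 0.2)  # "reject"
```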
I think the point of contention is this:
In Bayes, to reject the null you want to test whether |θ| > SESOI (is it outside the ROPE?).
But here the SESOI is not used to reject the null - the point null is used for that. The SESOI is used to accept the null by rejecting the hypothesis that |θ| > SESOI.
Is this dumb? Yes.
But is this what people have been doing? And more importantly, is it what users would expect from this function? Yes.
from parameters.
But what you're describing is Lakens' rule, not the one from the other CET paper, right? I don't see how these rules apply to the figure from my comment above.
from parameters.
To summarise: if a significant coefficient's confidence interval includes or is greater than the ROPE, it would be not practically equivalent to null?
from parameters.
From Lakens == Hauck and Anderson - yes.
What the CET paper does is say:
Only if the null that |θ| = 0 is not rejected, then test for equivalence.
So this changes the rules a little:
|  | H_alt rejected | H_alt not rejected |
|---|---|---|
| H_null rejected | CET: Reject null / ET: Accept null | Reject null |
| H_null not rejected | Accept null | Undecided |
To summarise: if a significant coefficient's confidence interval includes or is greater than the ROPE, it would be not practically equivalent to null?
Yes, precisely ^_^
from parameters.
2nd summary:
According to #12 (comment), relevant parameters are those where the coefficient is statistically significant.
This approach just adds an "inconclusive" category.
According to #12 (comment), relevant parameters are coefficients that are statistically significant, but also need to be either large enough (to be outside the ROPE), or have a very high uncertainty (i.e. a large CI), to range outside the ROPE?
This approach just distinguishes small effect sizes with low and high uncertainty, where small effect sizes with higher uncertainty are preferred.
My conclusion: I'll mention in the docs that we do something different and stick to the current implementation, as I don't like either of the above two methods... Maybe we can add an additional argument that mimics the two approaches above.
from parameters.
What do you mean by "relevant parameters"?
This approach just adds an "inconclusive" category.
No - because without ET, any non-significant result is inconclusive! ET adds the "accept" category! (This literal backwards thinking is such a mindf@#$...)
I think we should give the ET results, and maybe add an option for CET (if so, I will do the same in effectsize).
I think having a mix of Bayesian logic on frequentist methods is misleading - think if someone tried to publish anything based on this "mixed" method....
(and the % in ROPE does not belong here either - in the frequentist framework you cannot slice up CIs to get probabilities like that...).
Again, I totally agree that this logic has flaws (I mean, I am a developer of bayestestR after all!), but it is what it is... Instead we can add in the docs something like "If you want to truly accept the null, consider becoming a Bayesian <link to our awesome stuff>".
from parameters.
(Also I think the p-value feature is redundant when you switch to ET - the label "rejected" is enough...)
from parameters.
(Also I think the p-value feature is redundant when you switch to ET - the label "rejected" is enough...)
(True, but the p-value is adjusted for multiple testing and should at least "mimic" the possible problem of non-dependent parameters)
from parameters.
What do you mean by "relevant parameters"?
Statistically significant and not practically equivalent to zero.
from parameters.
Oh - so close, but you have to flip the thinking to some weird somersault 😅
In ET: relevant parameters are coefficients that are statistically significant, but also need to ~~be either large enough (to be outside the ROPE), or have a very high uncertainty (i.e. a large CI), to range outside the ROPE~~ not have their CI entirely in the ROPE (= not reject the hypothesis that |θ| ≥ SESOI).
from parameters.
But really both sig-tests can be viewed independently: both sig means non-0, but small.
from parameters.
rules:
- "classic"
- "cet" - conditional
- "bayes" - test non-equivalence
How's that?
from parameters.
Oh - so close, but you have to flip the thinking to some weird somersault 😅
In ET: relevant parameters are coefficients that are statistically significant, but also need to ~~be either large enough (to be outside the ROPE), or have a very high uncertainty (i.e. a large CI), to range outside the ROPE~~ not have their CI entirely in the ROPE (= not reject the hypothesis that |θ| ≥ SESOI).
That's what I meant by "large enough", i.e. exceeding the ROPE limits ;-)
from parameters.
Hows that?
Ok, and "classic" = Lakens?
from parameters.
yeah
from parameters.
That's what I meant by "large enough", i.e. exceeding the ROPE limits ;-)
But it's the other way around! It's not "large enough", it's "not too small"... 🤪🤪🤪🤪🤪
from parameters.
Anyway... :-D
What would be "accept", "reject" and "undecided"? We have four cases here...
from parameters.
I'd say: A, D = accept; B = reject; C = undecided (though D would be not equivalent, so from the "test perspective", it could also be "reject").
from parameters.
But it's the other way around! It's not "large enough", it's "not too small"...
We're approaching from different points, but we arrive at the same goal. ;-)
from parameters.
These are the 3 rules:
library(effectsize)
library(magrittr)
ds <- t_to_d(t = c(0.45, -0.65, -2.2, 2.25, 7),
             df_error = c(675, 525, 900, 1875, 2000))

ds %>%
  equivalence_test(range = 0.2) %>%
  plot()

ds %>%
  equivalence_test(range = 0.2, rule = "cet") %>%
  plot()

ds %>%
  equivalence_test(range = 0.2, rule = "bayes") %>%
  plot()
Created on 2020-04-23 by the reprex package (v0.3.0)
from parameters.
For the CET, remember that we have narrow CIs for the 2nd step.
from parameters.
I'm not using the TOST approach here, just whatever CI is passed (I have no way to recover the original data to do this without breaking the code) 😖
I will add in the docs that users should change the CI level if they want to use the TOST approach.
I've also added this info in the printing:
library(effectsize)
ds <- t_to_d(t = c(0.45, -0.65, -2.2, 2.25, 7),
             df_error = c(675, 525, 900, 1875, 2000),
             ci = 0.95)
equivalence_test(ds, range = 0.2)
#> # Test for Practical Equivalence
#>
#> ROPE: [-0.20 0.20]
#>
#> d | 95% CI | H0
#> ----------------------------------
#> 0.03 | [-0.12, 0.19] | Accepted
#> -0.06 | [-0.23, 0.11] | Undecided
#> -0.15 | [-0.28, -0.02] | Rejected
#> 0.10 | [ 0.01, 0.19] | Accepted
#> 0.31 | [ 0.22, 0.40] | Rejected
equivalence_test(ds, range = 0.2, rule = "cet")
#> # Conditional Test for Practical Equivalence
#>
#> ROPE: [-0.20 0.20]
#>
#> d | 95% CI | H0
#> ----------------------------------
#> 0.03 | [-0.12, 0.19] | Accepted
#> -0.06 | [-0.23, 0.11] | Undecided
#> -0.15 | [-0.28, -0.02] | Rejected
#> 0.10 | [ 0.01, 0.19] | Rejected
#> 0.31 | [ 0.22, 0.40] | Rejected
equivalence_test(ds, range = 0.2, rule = "bayes")
#> # Test for Practical Equivalence
#>
#> ROPE: [-0.20 0.20]
#>
#> d | 95% CI | H0
#> ----------------------------------
#> 0.03 | [-0.12, 0.19] | Accepted
#> -0.06 | [-0.23, 0.11] | Undecided
#> -0.15 | [-0.28, -0.02] | Undecided
#> 0.10 | [ 0.01, 0.19] | Accepted
#> 0.31 | [ 0.22, 0.40] | Rejected
#>
#> (Using Bayesian guidelines)
Created on 2020-04-23 by the reprex package (v0.3.0)
from parameters.
The TOST (using narrow CIs) isn't just for the CET, it is also for the "classic".
from parameters.
I'm not talking about TOST (Lakens), but about CET:
from parameters.
Not sure what you mean - TOST is the same as using a narrow CI (the orange CIs) for the equi-test... 🤔
from parameters.
Hm, ok. But for "classic" I'm just using the regular CI, and for "cet" the regular and narrow CIs. Should I use the narrow ones only for "classic"?
from parameters.
I think you should use the narrow one for both (maybe even all 3?)... Or for none, and set the default ci to 0.9? (As in: let the user control this and explain it in the docs, which is what I have over at effectsize.)
Daniel, I feel like we've been to war together after all of this... I'm tired and I want to go home! (Ignoring the fact that I've been home for nearly 8 weeks...)
from parameters.
TOST is the same as using a narrow CI
from parameters.
Daniel, I feel like we've been to war together after all of this... I'm tired and I want to go home! (Ignoring the fact that I've been home for nearly 8 weeks...)
Yes, yes! Take a break ;-) I'll use the narrow CI for cet and classic, and that's it, I'd say. We can think about changing this later...
from parameters.
Alright!
So I think... I think that's it?
Can it be?!!?!
from parameters.