Is there a way to calculate Cohen's d for paired data in DABEST? Currently DABEST appe


Add paired Cohen's d about dabest-python HOT 4 CLOSED

acclab commented on May 30, 2024

Add paired Cohen's d


Comments (4)

josesho commented on May 30, 2024

Hi @paul-hawkins,

"It seems like DABEST only allows paired tests between two sets of data, while paired comparisons can be carried out on three or more sets of data."

This is half-correct: paired comparisons can only be done on pairs of data. For instance, using the data you posted in #98,

import pandas as pd
import dabest

new_df = pd.read_csv('all_ringrmsd_data_only.txt', sep='\t')

# Need to have an ID column so DABEST knows which observations go together.
new_df.rename(columns={"Unnamed: 0": "id"}, inplace=True)

multi_paired = dabest.load(
                   new_df,
                   # Here, we assume OMEGA and MOE are a set of repeated measures,
                   # while Macromodel and Desmond are a second, unrelated set of repeated measures.
                   idx=(('OMEGA','MOE'),
                        ('Macromodel', 'Desmond')), 

                   id_col="id", paired=True)

multi_paired.cohens_d.plot();

DABEST v0.3.0
=============
             
Good afternoon!
The current time is Fri Apr 24 16:12:45 2020.

The paired Cohen's d between OMEGA and MOE is 0.182 [95%CI 0.0781, 0.285].
The p-value of the two-sided permutation t-test is 0.0014. 

The paired Cohen's d between Macromodel and Desmond is 0.314 [95%CI 0.179, 0.448].
The p-value of the two-sided permutation t-test is 0.0. 

5000 bootstrap samples were taken; the confidence interval is bias-corrected and accelerated.
The p-value(s) reported are the likelihood(s) of observing the effect size(s),
if the null hypothesis of zero difference is true.
For each p-value, 5000 reshuffles of the control and test labels were performed.

To get the results of all valid statistical tests, use `.cohens_d.statistical_tests`
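
To illustrate why pairing matters for the interval, here is a minimal sketch of a paired bootstrap: subjects are resampled as whole rows, so each pair of repeated measures stays together. It uses a plain percentile interval rather than the bias-corrected and accelerated interval that DABEST reports, made-up data, and a hypothetical helper name (`paired_bootstrap_ci`), so its numbers will not match DABEST's. The choice of standardizer is also an assumption.

```python
import numpy as np


def paired_bootstrap_ci(control, test, n_resamples=5000, ci=95, seed=12345):
    """Percentile bootstrap CI for Cohen's d, resampling subjects (rows)."""
    control = np.asarray(control, dtype=float)
    test = np.asarray(test, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(control)

    def d(c, t):
        # Assumed standardizer: root mean square of the two sample SDs.
        return (t.mean() - c.mean()) / np.sqrt((c.var(ddof=1) + t.var(ddof=1)) / 2)

    estimates = np.empty(n_resamples)
    for i in range(n_resamples):
        # Draw subject indices once and use them for both conditions,
        # which keeps every pair of measurements intact.
        rows = rng.integers(0, n, size=n)
        estimates[i] = d(control[rows], test[rows])

    alpha = (100 - ci) / 2
    low, high = np.percentile(estimates, [alpha, 100 - alpha])
    return d(control, test), low, high


# Made-up, correlated repeated measures for 40 subjects.
data_rng = np.random.default_rng(0)
before = data_rng.normal(10.0, 2.0, size=40)
after = before + data_rng.normal(0.5, 0.5, size=40)

d_hat, low, high = paired_bootstrap_ci(before, after)
print(f"d = {d_hat:.3f} [{low:.3f}, {high:.3f}]")
```

Resampling the two columns independently instead would discard the within-subject correlation and generally give a wider interval.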

DABEST's paired analysis design requires that no group appears more than once across the idx tuples.

multi_paired_neg = dabest.load(new_df,
                           idx=(('OMEGA','MOE'),
                                ('OMEGA', 'Desmond')), 
                           
                           id_col="id", paired=True)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-16-53d5efeaf2d7> in <module>()
      5                                 ('OMEGA', 'Desmond')), 
      6 
----> 7                            id_col="id", paired=True)

~/anaconda3/envs/dabest-dev-py3.6/lib/python3.6/site-packages/dabest/_api.py in load(data, idx, x, y, paired, id_col, ci, resamples, random_seed)
     63     from ._classes import Dabest
     64 
---> 65     return Dabest(data, idx, x, y, paired, id_col, ci, resamples, random_seed)

~/anaconda3/envs/dabest-dev-py3.6/lib/python3.6/site-packages/dabest/_classes.py in __init__(self, data, idx, x, y, paired, id_col, ci, resamples, random_seed)
     60                 err1 = ' or a tuple has repeated groups in it.'
     61                 err2 = ' Please remove any duplicates and try again.'
---> 62                 raise ValueError(err0 + err1 + err2)
     63 
     64         else: # mix of string and tuple?

ValueError: Groups are repeated across tuples, or a tuple has repeated groups in it. Please remove any duplicates and try again.

This is a deliberate design choice to reduce confusion. A paired comparison, by definition, has only a before measure and an after measure. Setting up the comparison as in multi_paired_neg implies that this is not a strict paired comparison.
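
The rule can be sketched as a standalone check. This is not DABEST's own code; `find_repeated_groups` is a hypothetical helper that only mirrors the condition described by the error message above.

```python
from collections import Counter
from itertools import chain


def find_repeated_groups(idx):
    """Return the group names that occur more than once in idx.

    idx is either a flat tuple of group names or a tuple of such tuples,
    loosely following the shape passed to dabest.load in this thread.
    """
    # Treat a flat tuple of strings as a single comparison tuple.
    if all(isinstance(item, str) for item in idx):
        idx = (idx,)
    counts = Counter(chain.from_iterable(idx))
    return sorted(name for name, n in counts.items() if n > 1)


ok_idx = (("OMEGA", "MOE"), ("Macromodel", "Desmond"))
bad_idx = (("OMEGA", "MOE"), ("OMEGA", "Desmond"))

print(find_repeated_groups(ok_idx))   # []
print(find_repeated_groups(bad_idx))  # ['OMEGA']
```

Running a check like this before calling dabest.load shows which group has to be moved into its own comparison.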

If you are doing a successive repeated-measures experiment (i.e., OMEGA is t=0, MOE is t=1, and then Macromodel is t=2), the way to do this is:

first = dabest.load(new_df,
                    idx=('OMEGA','MOE'),
                    id_col="id", paired=True)

second = dabest.load(new_df,
                    idx=('OMEGA', 'Macromodel'),
                    id_col="id", paired=True)

first.cohens_d

DABEST v0.3.0
=============
             
Good afternoon!
The current time is Fri Apr 24 16:34:10 2020.

The paired Cohen's d between OMEGA and MOE is 0.182 [95%CI 0.0781, 0.285].
The p-value of the two-sided permutation t-test is 0.0014. 

5000 bootstrap samples were taken; the confidence interval is bias-corrected and accelerated.
The p-value(s) reported are the likelihood(s) of observing the effect size(s),
if the null hypothesis of zero difference is true.
For each p-value, 5000 reshuffles of the control and test labels were performed.

To get the results of all valid statistical tests, use `.cohens_d.statistical_tests`

second.cohens_d

DABEST v0.3.0
=============
             
Good afternoon!
The current time is Fri Apr 24 16:34:12 2020.

The paired Cohen's d between OMEGA and Macromodel is 0.0295 [95%CI -0.0672, 0.122].
The p-value of the two-sided permutation t-test is 0.541. 

5000 bootstrap samples were taken; the confidence interval is bias-corrected and accelerated.
The p-value(s) reported are the likelihood(s) of observing the effect size(s),
if the null hypothesis of zero difference is true.
For each p-value, 5000 reshuffles of the control and test labels were performed.

To get the results of all valid statistical tests, use `.cohens_d.statistical_tests`

To plot them alongside each other:

import matplotlib.pyplot as plt
import seaborn as sns
# Jupyter/IPython magic for high-resolution inline figures; omit in a plain Python script.
%config InlineBackend.figure_format = 'retina'

sns.set(context="talk")
f, axx = plt.subplots(ncols=2, figsize=(10, 7),
                      # Adjust the width-wise spacing
                      gridspec_kw={"wspace":0.5})

plot_kwargs = dict(float_contrast=False, 
                    contrast_ylim=(0, 0.7))

first.cohens_d.plot(ax=axx[0], **plot_kwargs);
second.cohens_d.plot(ax=axx[1], **plot_kwargs);

You can read more here.

Hope this helps!


josesho commented on May 30, 2024

Hi @paul-hawkins,

Are you asking how to compute paired Cohen's d? Or are you saying that the paired Cohen's d returned by DABEST is not actually paired?

If your question is the first one, simply load an experiment as a paired experiment:

import pandas as pd
import dabest

# Load the iris dataset. Requires internet access.
iris = pd.read_csv("https://github.com/mwaskom/seaborn-data/raw/master/iris.csv")
iris.reset_index(inplace=True)

virginica = iris[iris.species=="virginica"].copy()

virginica_melted = pd.melt(virginica, 
                           id_vars="index", 
                           value_vars=["sepal_length", "petal_length"],
                           var_name="flower_part",
                           value_name="width")

virginica_paired = dabest.load(data=virginica_melted,  x="flower_part", y="width",
                                 paired=True, id_col="index",
                                 idx=("sepal_length", "petal_length"))

then produce the Cohen's d:

virginica_paired.cohens_d

DABEST v0.3.0
=============
             
Good evening!
The current time is Thu Apr 23 18:44:08 2020.

The paired Cohen's d between sepal_length and petal_length is -1.74 [95%CI -2.1, -1.37].
The p-value of the two-sided permutation t-test is 0.0. 

5000 bootstrap samples were taken; the confidence interval is bias-corrected and accelerated.
The p-value(s) reported are the likelihood(s) of observing the effect size(s),
if the null hypothesis of zero difference is true.
For each p-value, 5000 reshuffles of the control and test labels were performed.

To get the results of all valid statistical tests, use `.cohens_d.statistical_tests`

If you are saying the latter (i.e., the paired Cohen's d returned by DABEST is not actually paired), could you provide a dummy dataset with the expected accurate values, vis-a-vis what DABEST produces? Thanks!
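
One way to produce such expected values is to compute Cohen's d by hand with NumPy and compare the result against DABEST's output. The sketch below uses made-up numbers and assumes the mean difference is standardized by the root mean square of the two group standard deviations. Whether DABEST uses exactly this standardizer for paired data is an assumption to verify against its source; other definitions (for example, dividing by the SD of the differences) give different values.

```python
import numpy as np


def cohens_d_average_sd(control, test):
    """Mean difference (test - control) divided by sqrt((s_c^2 + s_t^2) / 2)."""
    control = np.asarray(control, dtype=float)
    test = np.asarray(test, dtype=float)
    average_sd = np.sqrt((control.var(ddof=1) + test.var(ddof=1)) / 2)
    return (test.mean() - control.mean()) / average_sd


def cohens_d_pooled_sd(control, test):
    """Unpaired Cohen's d using the pooled standard deviation."""
    control = np.asarray(control, dtype=float)
    test = np.asarray(test, dtype=float)
    n_c, n_t = len(control), len(test)
    pooled_var = ((n_c - 1) * control.var(ddof=1)
                  + (n_t - 1) * test.var(ddof=1)) / (n_c + n_t - 2)
    return (test.mean() - control.mean()) / np.sqrt(pooled_var)


# Made-up repeated measures: position i in both arrays is the same subject.
before = np.array([5.1, 4.8, 6.0, 5.5, 5.9, 4.7])
after = np.array([5.6, 5.0, 6.4, 5.4, 6.3, 5.2])

d_avg = cohens_d_average_sd(before, after)
d_pooled = cohens_d_pooled_sd(before, after)

# With equal group sizes the two standardizers coincide, so the point
# estimates agree; pairing then affects only resampling-based quantities.
print(np.isclose(d_avg, d_pooled))  # True
```

Under that assumption, this is consistent with the unpaired and paired runs elsewhere in this thread both reporting 0.182 for OMEGA and MOE while their confidence intervals differ.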


paul-hawkins commented on May 30, 2024

Joses,

I think this problem is just my inexperience with DABEST. Looking at the result of running the Cohen’s d function

s_control.cohens_d

DABEST v0.3.0
=============

Good morning!
The current time is Thu Apr 16 11:45:23 2020.

The unpaired Cohen's d between OMEGA and MOE is 0.182 [95%CI -0.0108, 0.367].
The p-value of the two-sided permutation t-test is 0.0704.

The unpaired Cohen's d between OMEGA and Macromodel is 0.0295 [95%CI -0.168, 0.215].
The p-value of the two-sided permutation t-test is 0.766.

The unpaired Cohen's d between OMEGA and Desmond is 0.37 [95%CI 0.162, 0.572].
The p-value of the two-sided permutation t-test is 0.0002.

The unpaired Cohen's d between OMEGA and RDKit is 0.593 [95%CI 0.36, 0.805].
The p-value of the two-sided permutation t-test is 0.0.

The unpaired Cohen's d between OMEGA and Prime is -0.0323 [95%CI -0.224, 0.162].
The p-value of the two-sided permutation t-test is 0.747.

5000 bootstrap samples were taken; the confidence interval is bias-corrected and accelerated.
The p-value(s) reported are the likelihood(s) of observing the effect size(s),
if the null hypothesis of zero difference is true.
For each p-value, 5000 reshuffles of the control and test labels were performed.

To get the results of all valid statistical tests, use `.cohens_d.statistical_tests`

DABEST says it is reporting unpaired Cohen’s d. When looking at the results of

s_control.cohens_d.statistical_tests

there is a column ‘is_paired’ which is set to False, so I thought that being able to set it to True would solve my problem, but I could not find a way to do that. However, the values returned by the ‘cohens_d’ function are more or less identical to the values I get for paired d, so this appears to be a problem that has solved itself.

However, if I run

s_control = db.load(new_df,idx=('OMEGA','MOE','Macromodel','Desmond','RDKit','Prime'),paired=True)

DABEST returns an error:

ValueError: `is_paired` is True, but some idx in ('OMEGA', 'MOE', 'Macromodel', 'Desmond', 'RDKit', 'Prime') does not consist only of two groups.

It seems like DABEST only allows paired tests between two sets of data, while paired comparisons can be carried out on three or more sets of data. As you can see, this is with DABEST 0.3.0.

Paul.


maiyishan commented on May 30, 2024

Hi @paul-hawkins,

I hope Joses sufficiently answered your question.

Just to let you know, we have just released a new version of DABEST, and you will have to use paired="baseline" or paired="sequential" for future paired comparisons. Please see the new documentation for details.
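
As a rough illustration of the difference between the two modes, the sketch below lists which pairs would be compared for a repeated-measures idx. It assumes that "baseline" compares every later group against the first one and that "sequential" compares each group with the one before it. `comparison_pairs` is a hypothetical helper, not part of DABEST, so check the new documentation for the actual behavior.

```python
def comparison_pairs(idx, paired):
    """List (control, test) pairs for a flat tuple of group names."""
    if paired == "baseline":
        # Every later group is compared against the first group.
        return [(idx[0], name) for name in idx[1:]]
    if paired == "sequential":
        # Each group is compared against its immediate predecessor.
        return list(zip(idx[:-1], idx[1:]))
    raise ValueError("paired must be 'baseline' or 'sequential'")


idx = ("OMEGA", "MOE", "Macromodel")

print(comparison_pairs(idx, "baseline"))
# [('OMEGA', 'MOE'), ('OMEGA', 'Macromodel')]
print(comparison_pairs(idx, "sequential"))
# [('OMEGA', 'MOE'), ('MOE', 'Macromodel')]
```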

I will now be closing this issue. Thank you!
