I (and many psychologists) frequently work with repeated-measure designs, where a give

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Thank you so much <a class="user-mention notranslate" data-hovercard-type="user" data-

Consider adding support for repeated-measures error bars/ CIs (following Morey, 2008) about seaborn HOT 6 CLOSED

henrymj commented on August 16, 2024

Consider adding support for repeated-measures error bars/ CIs (following Morey, 2008)

from seaborn.

Comments (6)

thuiop commented on August 16, 2024 2

Hm, indeed I have no obvious solution here (besides doing the statistical work outside of seaborn). I could probably whip up something for the objects interface though (which is probably the way to go; I believe that the changes to the old interface would need to be somewhat heavy for this to work).

from seaborn.

mwaskom commented on August 16, 2024 1

I've made plots like these before, IIRC the upstream data transform can be done in a pandas one-liner; I'm not sure why the linked stackoverflow question makes it seem so complicated.

But I am 👎 on adding this in seaborn; even though the data transform is straightforward, exposing a specification API that is sufficiently general for all experimental designs where you would want to use it is not.

Agree it could probably be supported through a plugin stat object.

Thanks for the suggestion.

from seaborn.

thuiop commented on August 16, 2024 1

@henrymj Ok, so I built this for the object interface (it is a bit ugly, sorry) :

import numpy as np
import pandas as pd
import seaborn.objects as so

from seaborn._stats.aggregation import Est

class CMEst(Est):
    def __call__(
        self, data, groupby, orient, scales,
    ):
        var = {"x": "y", "y": "x"}[orient]
        means_indiv = data.groupby([orient,"id_var"]).agg(np.mean)[var]
        means_all = data.groupby(orient).agg(np.mean)[var]
        num_cond = None
        def normalize_df(df):
            nonlocal num_cond
            if num_cond is None:
                num_cond = len(df)
            new_df = df.copy()
            i = df.reset_index().loc[0,[orient,"id_var"]]
            partial_mean = means_indiv[i[orient],i["id_var"]]
            full_mean = means_all[i[orient]]
            new_df[var] = new_df[var] - partial_mean  + full_mean
            return new_df
        data = data.groupby([orient,"id_var"],group_keys=False).apply(normalize_df)
        def adjust_variance(df):
            df[var] = df[var].mean() + np.sqrt(num_cond/(num_cond-1)) * (df[var] - df[var].mean())
            return df
        data = groupby.apply(data,adjust_variance)
        return super().__call__(data, groupby, orient, scales)

This is close to a drop-in replacement of so.Est(), except that you need to specify an id_var column which represents the individual you are considering. The example from your stackexchange post would look like :

df = pd.read_csv("DemoWS-30x2.csv")
# Busy work for converting to long-form
df["index"] = df.index
df = pd.wide_to_long(df,[f"activation.{i}" for i in range(1,31)],i="index",j="condition",sep=".").reset_index()
df["index"] = df.index
df = pd.wide_to_long(df,"activation",i="index",j="timepoints",sep=".").reset_index()
p = (
    so.Plot(data=df, x="timepoints", y="activation", color="condition")
    .add(so.Band(), CMEst(),id_var="id")
    #.add(so.Band(),so.Est())
    .add(so.Line(marker="o"), so.Agg())
    .scale(color=so.Nominal(["red","blue"]))
)
p.show()

giving (there are some differences with the provided example though, I believe it might be due to the way the CI is computed, but don't hesitate to check the code as I may have made a mistake)

from seaborn.

thuiop commented on August 16, 2024

This seems a bit specific to your domain. Is there anything preventing you from passing a callable to the errorbar parameter, which allows you to define how the error is computed ?

from seaborn.

henrymj commented on August 16, 2024

My impression is that this would be a beneficial feature to almost also social science domains (and any clinical work that compares post-treatments to baselines), but I might be wrong.

I looked into customizable error bar callables using the old API (I haven't migrated yet), but at least in the documentation it appears that the callables should only expect a 1D vector of data, and thus wouldn't be able to make adjustments that require richer vectors with information like a individual identifier. However, it's possible I've just been looking in the wrong place and/or haven't understood how flexible the callables can be.

Here's an example that illustrates the stroop example above - if someone were able to provide a flexible solution to it, I'd be very happy to withdraw the issue!

import numpy as np
import pandas as pd
import seaborn as sns

np.random.seed(0)

# getting group average RTs for the 2 conditions
congruent_rts = np.random.normal(250, 50, 30)
incongruent_shift = np.random.normal(25, 10, 30)
incongruent_rts = congruent_rts + incongruent_shift

df = pd.DataFrame({
    'Condition': ['Congruent'] * 30 + ['Incongruent'] * 30,
    'subject': np.repeat(range(30), 2),
    'RT': np.concatenate([congruent_rts, incongruent_rts])
})

_ = sns.barplot(x='Condition', y='RT', data=df)  # barplots will mask the difference between conditions

from seaborn.

henrymj commented on August 16, 2024

Thank you so much @thuiop! I'll test out this approach. I really appreciate you taking the time to generate this - hopefully other social scientists will also benefit from this example.

from seaborn.

Consider adding support for repeated-measures error bars/ CIs (following Morey, 2008) about seaborn HOT 6 CLOSED

Comments (6)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent