
cluster-experiments's People

Contributors

alexanderhensel, aureliolova, danielapinho, david26694, gabrielcidral1, ludovico-lanni, oaclavijo10, pablobd, pinsacco, victorbr92

cluster-experiments's Issues

Fix power users example

        df_power_users[self.treatment_col] = np.random.choice(
            [0, 1], size=len(df_power_users), p=[0.1, 0.9]
        )

The choice should be from the treatment labels "A" and "B", not from 0 and 1.
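A minimal sketch of the fix, assuming the only change needed is to draw the labels "A" and "B" with the same 10/90 split as the snippet above. The helper name assign_power_users and its parameters are hypothetical, and it uses NumPy's Generator API instead of the global np.random state.

```python
import numpy as np
import pandas as pd


def assign_power_users(df_power_users, treatment_col="treatment", p_treated=0.9, seed=None):
    """Hypothetical helper: label power users with "A"/"B" instead of 0/1."""
    rng = np.random.default_rng(seed)
    out = df_power_users.copy()
    # Draw variant labels, not integers, so they match the rest of the pipeline
    out[treatment_col] = rng.choice(
        ["A", "B"], size=len(out), p=[1 - p_treated, p_treated]
    )
    return out


df = pd.DataFrame({"user": range(1_000)})
print(assign_power_users(df, seed=42)["treatment"].value_counts())
```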

Variable-length washover

At some hours of the day we may need a longer washover than at others. Allow users to set different washover lengths depending on the time of day or the day of the week.
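One way to express this is to make the washover length a function of the switch time. The sketch below is not the cluster_experiments washover API: washover_length, apply_variable_washover and the peak-hour rule (30 minutes for switches between 12:00 and 13:59 or 19:00 and 20:59, 10 minutes otherwise) are hypothetical. Rows whose timestamp falls in [switch, switch + washover) are dropped; a day-of-week rule would read switch_time.dayofweek in the same way.

```python
import pandas as pd


def washover_length(switch_time: pd.Timestamp) -> pd.Timedelta:
    """Hypothetical rule: longer washover around lunch and dinner peaks."""
    # switch_time.dayofweek could be used the same way for day-of-week rules
    if switch_time.hour in (12, 13, 19, 20):
        return pd.Timedelta(minutes=30)
    return pd.Timedelta(minutes=10)


def apply_variable_washover(df, switch_times, length_fn, time_col="time"):
    """Drop rows within [switch, switch + length_fn(switch)) for every switch."""
    keep = pd.Series(True, index=df.index)
    for switch in switch_times:
        end = switch + length_fn(switch)
        keep &= ~((df[time_col] >= switch) & (df[time_col] < end))
    return df[keep]


df = pd.DataFrame(
    {"time": pd.date_range("2024-01-01 11:00", "2024-01-01 13:00", freq="5min")}
)
switches = [pd.Timestamp("2024-01-01 11:00"), pd.Timestamp("2024-01-01 12:00")]
washed = apply_variable_washover(df, switches, washover_length)
print(len(df), len(washed))  # 25 17
```

The 11:00 switch removes two rows (11:00, 11:05) and the 12:00 switch removes six (12:00 to 12:25), so the washover is longer exactly where the rule says it should be.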

Implement washover based on two events

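For each record in our experiment, we have the timestamps of two events, login_timestamp and logout_timestamp. We want to apply washover so that a record is washed over (excluded from the analysis) if the treatment changes between its login_timestamp and its logout_timestamp. In the example below, start_time and end_time play the roles of login_timestamp and logout_timestamp.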

Calendar:

treatment,time
A,10:00
B,11:00
B,12:00

Record data:

id,start_time,end_time
1,10:50,10:59
2,10:51,11:01
3,11:01,11:05
4,11:01,12:01
5,12:01,12:05

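In this case, we want to wash over record 2 (10:51 to 11:01), because it spans the switch from A to B at 11:00. Record 4 (11:01 to 12:01) also crosses a slot boundary at 12:00, but the treatment stays B, so it is kept.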
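A minimal pandas sketch of this rule on the calendar and records above. It assumes that a "switch" is a calendar slot whose treatment differs from the previous slot, and that a record is washed over when some switch time s satisfies start_time < s <= end_time (so a record starting exactly at a switch is kept). The function name flag_two_event_washover is hypothetical.

```python
import pandas as pd

calendar = pd.DataFrame(
    {"treatment": ["A", "B", "B"], "time": ["10:00", "11:00", "12:00"]}
)
records = pd.DataFrame(
    {
        "id": [1, 2, 3, 4, 5],
        "start_time": ["10:50", "10:51", "11:01", "11:01", "12:01"],
        "end_time": ["10:59", "11:01", "11:05", "12:01", "12:05"],
    }
)


def to_time(values):
    return pd.to_datetime(values, format="%H:%M")


def flag_two_event_washover(records, calendar, start_col="start_time", end_col="end_time"):
    cal = calendar.assign(time=to_time(calendar["time"])).sort_values("time")
    previous = cal["treatment"].shift()
    # A switch is a slot whose treatment differs from the previous slot
    is_switch = previous.notna() & cal["treatment"].ne(previous)
    switch_times = cal.loc[is_switch, "time"].to_numpy()

    start = to_time(records[start_col]).to_numpy()
    end = to_time(records[end_col]).to_numpy()
    # Compare every record with every switch: does any switch fall in (start, end]?
    crosses = (start[:, None] < switch_times[None, :]) & (
        switch_times[None, :] <= end[:, None]
    )
    return records.assign(washover=crosses.any(axis=1))


flagged = flag_two_event_washover(records, calendar)
print(flagged.loc[flagged["washover"], "id"].tolist())  # [2]
```

Checking every switch, rather than only comparing the treatment at login and logout, also catches records that span an A to B to A round trip.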

Binary metric: when the type is Boolean, the power analysis workflow throws errors

Description

Right now, if the metric is binary and of type Boolean (True, False), the power analysis throws a series of errors that cannot easily be traced back to this cause.

Fix Suggestions

We could either convert the metric in the code, or raise an error stating that the input metric should be of type integer (0, 1).
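Both suggestions can be sketched in one hypothetical helper, prepare_binary_target, that casts a Boolean metric to 0/1 integers by default or, with strict=True, raises an explicit error. Its name, its parameters and where it would be called inside the power analysis workflow are assumptions.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype


def prepare_binary_target(df, target_col="target", strict=False):
    """Hypothetical check for Boolean metrics before running power analysis."""
    if not is_bool_dtype(df[target_col]):
        return df
    if strict:
        raise TypeError(
            f"Column '{target_col}' is Boolean; convert it to integers (0, 1) "
            "before running the power analysis."
        )
    # Cast True/False to 1/0 so downstream estimators see a numeric metric
    return df.assign(**{target_col: df[target_col].astype(int)})


df = pd.DataFrame({"target": [True, False, True]})
print(prepare_binary_target(df)["target"].tolist())  # [1, 0, 1]
```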

Add synthetic control as analysis method

Add a wrapper around a synthetic control implementation that returns p-values. This would let us treat synthetic control as just another analysis method and check whether it has higher power than simpler methods.
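As a starting point for such a wrapper, here is a minimal sketch of one common way to get p-values from synthetic control: fit non-negative donor weights that sum to 1 on the pre-period, then run a placebo (permutation) test over units using the ratio of post- to pre-period RMSPE (root mean squared prediction error). This is a self-contained NumPy/SciPy illustration, not an existing cluster_experiments analysis class; the function names and the panel layout (periods in rows, units in columns, treatment starting at row n_pre) are assumptions. With N units the smallest attainable p-value is 1/N.

```python
import numpy as np
from scipy.optimize import minimize


def synthetic_weights(y_pre, donors_pre):
    """Non-negative donor weights summing to 1 that best match the pre-period."""
    n_donors = donors_pre.shape[1]
    result = minimize(
        lambda w: np.sum((y_pre - donors_pre @ w) ** 2),
        x0=np.full(n_donors, 1.0 / n_donors),
        method="SLSQP",
        bounds=[(0.0, 1.0)] * n_donors,
        constraints=[{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}],
    )
    return result.x


def rmspe_ratio(panel, unit, n_pre):
    """Post/pre RMSPE ratio of `unit` against its synthetic control."""
    donors = [j for j in range(panel.shape[1]) if j != unit]
    w = synthetic_weights(panel[:n_pre, unit], panel[:n_pre, donors])
    gap = panel[:, unit] - panel[:, donors] @ w
    pre = np.sqrt(np.mean(gap[:n_pre] ** 2))
    post = np.sqrt(np.mean(gap[n_pre:] ** 2))
    return post / max(pre, 1e-12)  # guard against a perfect pre-period fit


def synthetic_control_pvalue(panel, treated, n_pre):
    """Placebo-in-space p-value: share of units with a ratio at least as large."""
    ratios = np.array([rmspe_ratio(panel, j, n_pre) for j in range(panel.shape[1])])
    return np.mean(ratios >= ratios[treated])


rng = np.random.default_rng(0)
panel = rng.normal(size=(30, 10))  # 30 periods x 10 units
panel[20:, 0] += 10.0  # simulated effect on unit 0 from period 20 onwards
print(synthetic_control_pvalue(panel, treated=0, n_pre=20))
```

Wrapped this way, the p-value can be compared against alpha across simulated splits, just like the p-values of the other analysis methods.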

Create exact power calculation class

A new PowerAnalysis class needs to be set up, something like class ExactPowerAnalysis, which implements the power_analysis and power_line methods using the closed-form power formula for the linear model.

I have this script I used a long time ago:

import numpy as np
import pandas as pd
import statsmodels.api as sm
from cluster_experiments import PowerAnalysis
from scipy.stats import norm


def power_2_tails(df, splitter, alpha, ate, regressor):
    """Exact power of a two-tailed test on the OLS treatment coefficient

    Parameters
    ----------
    df : pd.DataFrame
        Data with a "target" column and the regressor column
    splitter : splitter object
        Used to assign treatment (A/B) to the rows of df
    alpha : float
        Significance level
    ate : float
        Average treatment effect
    regressor : str
        Name of the covariate included in the regression
    """
    # Assign treatment and encode B as 1, A as 0
    df_treated = splitter.assign_treatment_df(df).assign(
        treatment=lambda x: (x.treatment == "B").astype(int),
    )
    fitted_ols = sm.OLS.from_formula(
        f"target ~ treatment + {regressor}", data=df_treated
    ).fit()

    # Standard error of the treatment coefficient
    se = fitted_ols.bse["treatment"]

    # Two-tailed power: P(reject) when the true effect is ate
    z_alpha = norm.ppf(1 - alpha / 2)
    norm_cdf = norm.cdf(z_alpha - ate / se)
    norm_cdf_2 = norm.cdf(-z_alpha - ate / se)
    return 1 - norm_cdf + norm_cdf_2


# Create fake data
N = 2_000
alpha = 0.05
average_effect = 0.5
sigma = 1
df = pd.DataFrame(
    {
        "target": np.random.normal(0, sigma, size=2 * N),
        "regressor": np.random.normal(0, sigma, size=2 * N),
        "better_regressor": np.random.normal(0, sigma, size=2 * N),
    }
).assign(
    target=lambda x: x.target + x.regressor * 2 + x.better_regressor * 10
)


config = {
    "analysis": "ols_non_clustered",
    "perturbator": "uniform",
    "splitter": "non_clustered",
    "n_simulations": 1000,
    "covariates": ["regressor"],
    "alpha": alpha
}
pw = PowerAnalysis.from_dict(config)

pw_better = PowerAnalysis.from_dict({
    **config,
    "covariates": ["better_regressor"],
})


EFFECTS = [0, 0.1, 0.2, 0.3, 0.4, 0.5]


powers = pw.power_line(df, average_effects=EFFECTS)

powers_better = pw_better.power_line(df, average_effects=EFFECTS)

powers_exact = {}
powers_exact_better = {}
for average_effect in EFFECTS:
    powers_exact[average_effect] = power_2_tails(df, pw.splitter, alpha, average_effect, "regressor")
    powers_exact_better[average_effect] = power_2_tails(df, pw.splitter, alpha, average_effect, "better_regressor")



print(powers_better, powers)
print(powers_exact_better, powers_exact)


import matplotlib.pyplot as plt


plt.plot(EFFECTS, list(powers.values()), label="regressor")
plt.plot(EFFECTS, list(powers_better.values()), label="better regressor")
plt.plot(EFFECTS, list(powers_exact.values()), label="exact regressor")
plt.plot(EFFECTS, list(powers_exact_better.values()), label="exact better regressor")
plt.legend()
plt.xlabel("Average effect")
plt.ylabel("Power")
plt.title("Power of the test")
plt.show()

This script compares the exact power with the power obtained from simulation.
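A possible shape for the requested class, sketched without depending on cluster_experiments so that it runs on its own. The class name comes from the issue, but the constructor arguments, the 50/50 random split used to estimate the standard error and the use of the statsmodels formula API are assumptions. It reuses the two-tailed formula from power_2_tails, estimating the standard error once and reusing it for every effect in power_line.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy.stats import norm


class ExactPowerAnalysis:
    """Sketch: closed-form power for the OLS treatment coefficient."""

    def __init__(self, target_col="target", covariates=None, alpha=0.05, seed=None):
        self.target_col = target_col
        self.covariates = list(covariates or [])
        self.alpha = alpha
        self.seed = seed

    def standard_error(self, df):
        # Random 50/50 split, then OLS of target on treatment (+ covariates)
        rng = np.random.default_rng(self.seed)
        df_treated = df.assign(treatment=rng.integers(0, 2, size=len(df)))
        rhs = " + ".join(["treatment", *self.covariates])
        fit = smf.ols(f"{self.target_col} ~ {rhs}", data=df_treated).fit()
        return fit.bse["treatment"]

    def _power(self, average_effect, se):
        # Same two-tailed formula as power_2_tails
        z_alpha = norm.ppf(1 - self.alpha / 2)
        return (
            1
            - norm.cdf(z_alpha - average_effect / se)
            + norm.cdf(-z_alpha - average_effect / se)
        )

    def power_analysis(self, df, average_effect):
        return self._power(average_effect, self.standard_error(df))

    def power_line(self, df, average_effects):
        se = self.standard_error(df)
        return {effect: self._power(effect, se) for effect in average_effects}


rng = np.random.default_rng(0)
df = pd.DataFrame(
    {"target": rng.normal(size=4_000), "regressor": rng.normal(size=4_000)}
)
exact = ExactPowerAnalysis(covariates=["regressor"], seed=1)
print(exact.power_line(df, average_effects=[0, 0.05, 0.1]))
```

At an effect of 0 the formula returns alpha itself, which is a quick sanity check against the simulated power at zero effect.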

Create heterogeneous perturbator

A possible perturbator, given an ATE $a$:

  • $w_1$% of the treated get a uniform increase of $a_1$
  • $w_2$% of the treated get a uniform increase of $a_2$
  • ...
  • $w_n$% of the treated get a uniform increase of $a_n$

Such that $\sum_{i=1}^{n} \frac{w_i a_i}{100} = a$.
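A minimal sketch of this perturbator as a plain function, assuming treated rows carry the label "B" in the treatment column and that the weights $w_i$ are percentages summing to 100. Each treated row is assigned to group $i$ with probability $w_i/100$ and receives $a_i$, so the average effect equals $a$ in expectation rather than exactly. The function names and signature are hypothetical, not the cluster_experiments perturbator interface.

```python
import numpy as np
import pandas as pd


def heterogeneous_perturbate(df, weights, effects, treatment_col="treatment",
                             target_col="target", treated_label="B", seed=None):
    """Add effect a_i to a random w_i% of treated rows (hypothetical helper)."""
    weights = np.asarray(weights, dtype=float)
    effects = np.asarray(effects, dtype=float)
    if not np.isclose(weights.sum(), 100.0):
        raise ValueError("weights must sum to 100")
    rng = np.random.default_rng(seed)
    out = df.copy()
    out[target_col] = out[target_col].astype(float)
    is_treated = out[treatment_col] == treated_label
    # Draw a group index for every treated row, with P(group i) = w_i / 100
    groups = rng.choice(len(weights), size=int(is_treated.sum()), p=weights / 100)
    out.loc[is_treated, target_col] += effects[groups]
    return out


def implied_ate(weights, effects):
    """Average effect a = sum(w_i * a_i) / 100."""
    return float(np.dot(weights, effects) / 100)


df = pd.DataFrame({"treatment": ["A", "B"] * 5_000, "target": 0.0})
perturbed = heterogeneous_perturbate(
    df, weights=[50, 30, 20], effects=[0.0, 1.0, 2.5], seed=0
)
print(implied_ate([50, 30, 20], [0.0, 1.0, 2.5]))  # 0.8
print(perturbed.groupby("treatment")["target"].mean())
```

Drawing groups at random keeps the sketch short; assigning exact counts per group instead would make the realized average effect match $a$ exactly.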
