synthx's Introduction

synthx

SynthX: A Python Library for Advanced Synthetic Control Analysis

Algorithm Behind

Synthetic Control

Synthetic Control is a statistical method for estimating the causal effect of an intervention on a single treated unit by constructing a weighted combination of control units that closely resembles the treated unit in terms of pre-intervention characteristics. This method is particularly useful when randomized experiments are not feasible, and there is a limited number of control units.

The key idea behind Synthetic Control is to create a "synthetic" control unit that serves as a counterfactual for the treated unit. By comparing the post-intervention outcomes of the treated unit with the outcomes of the synthetic control unit, one can estimate the causal effect of the intervention.
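As a rough illustration of the idea above, the weights can be found by minimizing the pre-intervention gap between the treated unit and a weighted sum of control units, with the weights constrained to be non-negative and to sum to one. The sketch below is not SynthX's implementation: the helper names `project_to_simplex` and `fit_weights`, the toy data, and the choice of projected gradient descent are all assumptions made for illustration, and covariates are ignored.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * k > css - 1)[0][-1]
    theta = (css[rho] - 1) / (rho + 1)
    return np.maximum(v - theta, 0.0)


def fit_weights(y_treated_pre, y_controls_pre, n_iter=2000):
    """Projected gradient descent on ||y_treated_pre - y_controls_pre @ w||^2."""
    n_controls = y_controls_pre.shape[1]
    w = np.full(n_controls, 1.0 / n_controls)
    # Step size 1/L, where L is the Lipschitz constant of the gradient.
    step = 1.0 / (2.0 * np.linalg.norm(y_controls_pre, 2) ** 2)
    for _ in range(n_iter):
        grad = 2.0 * y_controls_pre.T @ (y_controls_pre @ w - y_treated_pre)
        w = project_to_simplex(w - step * grad)
    return w


# Toy data (hypothetical): 30 pre-intervention periods, 4 control units.
rng = np.random.default_rng(0)
controls_pre = rng.normal(size=(30, 4))
true_w = np.array([0.2, 0.5, 0.3, 0.0])
treated_pre = controls_pre @ true_w

w = fit_weights(treated_pre, controls_pre)

# Post-intervention: the treated unit follows the same mix plus an effect of +1.0.
controls_post = rng.normal(size=(10, 4))
treated_post = controls_post @ true_w + 1.0

# The gap between the treated unit and its synthetic counterpart estimates the effect.
gap = treated_post - controls_post @ w
print(w.round(3), gap.mean())
```

Here the estimated effect is taken as the mean post-intervention gap; this is one common choice and may differ from the quantity SynthX reports.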

Placebo Test

Placebo Test is a method for assessing the statistical significance of the estimated treatment effect in Synthetic Control. The idea is to apply the Synthetic Control method to control units that did not receive the intervention, pretending that they were treated at the same time as the actual treated unit. By comparing the estimated effect for the true treated unit with the distribution of placebo effects, one can determine whether the observed effect is likely due to chance or represents a genuine causal effect.

If the estimated effect for the true treated unit is larger than most of the placebo effects, it suggests that the intervention had a significant impact. On the other hand, if the true effect is similar in magnitude to the placebo effects, it indicates that the observed effect may be due to chance rather than the intervention.
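The comparison described above can be turned into a rank-based p-value: the share of units, treated unit included, whose absolute effect is at least as large as that of the treated unit. The function below is a minimal sketch of that idea under stated assumptions; `placebo_p_value` is a hypothetical name, and SynthX's own `calc_p_value()` may compute the p-value differently.

```python
def placebo_p_value(effect_treated, effects_placebo):
    """Share of units (treated included) with an effect at least as extreme."""
    extreme = sum(1 for e in effects_placebo if abs(e) >= abs(effect_treated))
    # The treated unit itself counts as one of the permutations.
    return (extreme + 1) / (len(effects_placebo) + 1)


# Hypothetical effects: one treated unit and four placebo units.
p = placebo_p_value(2.0, [0.1, -0.3, 0.5, 2.5])
print(p)
```

With only a handful of control units the smallest attainable p-value is 1 / (number of placebos + 1), so this kind of test needs a reasonably large donor pool to be informative.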

For more information

Read the paper: Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California's Tobacco Control Program

Set up

Install the latest SynthX version with

pip install synthx

Usage

Sample data generation

You can use your own data. For testing purposes, you can generate sample data with

import synthx as sx

df = sx.sample(
    n_units=20,
    n_time=50,
    n_observed_covariates=3,
    n_unobserved_covariates=1,
    intervention_units=1,
    intervention_time=40,
    intervention_effect=1.2,
    noise_effect=0.1,
    seed=42,
)

The sample() function generates a synthetic dataset with the specified number of units, time periods, observed and unobserved covariates, intervention units, intervention time, intervention effect, and noise effect. The seed parameter ensures reproducibility of the generated data.

>>> df.head()
┌──────┬──────┬─────────┬────────────┬────────────┬────────────┐
│ unittimeycovariate_1covariate_2covariate_3┆
│ ------------------        │
│ i64i64f64f64f64f64        │
╞══════╪══════╪═════════╪════════════╪════════════╪════════════╡
│ 112.3400960.9500880.1342980.794324   │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 122.3701350.9500880.1342980.794324   │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 132.7764340.9500880.1342980.794324   │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 143.1406310.9500880.1342980.794324   │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 153.4107970.9500880.1342980.794324   │
└──────┴──────┴─────────┴────────────┴────────────┴────────────┘

Dataset instance

Create a Dataset instance from the generated or your own data. The Dataset class is used to encapsulate the data and provide methods for data validation and visualization.

Note: all units should have the same timestamps.

dataset = sx.Dataset(
    df,
    unit_column='unit',
    time_column='time',
    y_column='y',
    covariate_columns=['covariate_1', 'covariate_2', 'covariate_3'],
    intervention_units=1,
    intervention_time=40,
)
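The requirement that all units share the same timestamps can be checked before building the Dataset. The helper below is a hypothetical sketch, not part of SynthX: `has_balanced_panel` is an assumed name, and it takes any two equal-length iterables of unit ids and timestamps, for instance the unit and time columns of your data.

```python
from collections import defaultdict


def has_balanced_panel(units, times):
    """Return True if every unit has exactly the same set of timestamps."""
    by_unit = defaultdict(set)
    for unit, time in zip(units, times):
        by_unit[unit].add(time)
    timestamp_sets = list(by_unit.values())
    return all(s == timestamp_sets[0] for s in timestamp_sets)


# Unit 2 is missing timestamp 2, so the panel is unbalanced.
balanced = has_balanced_panel([1, 1, 2, 2], [1, 2, 1, 2])
unbalanced = has_balanced_panel([1, 1, 2], [1, 2, 1])
print(balanced, unbalanced)
```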

You can plot the generated data using the plot() method of the Dataset instance.

>>> dataset.plot()

If there are too many units, you can specify the units to visualize.

>>> dataset.plot([1, 2, 3])

Synthetic Control

Perform Synthetic Control analysis on the Dataset instance using the synthetic_control() function.

sc = sx.synthetic_control(dataset)

You can plot the test and control units using the plot() method of the SyntheticControlResult instance returned by synthetic_control().

>>> sc.plot()

You can estimate the causal effect of the intervention using the estimate_effects() method.

>>> sc.estimate_effects()
[0.8398940970771678]

Placebo Test

Perform a Placebo Test to assess the statistical significance of the estimated treatment effect using the placebo_test() function.

effects_test, effects_placebo, sc_test, scs_placebo = sx.placebo_test(dataset)

Calculate the p-value of the estimated treatment effect using the calc_p_value() function from the stats module.

>>> sx.stats.calc_p_value(effects_test, effects_placebo)
0.03228841882463891

Sensitivity Check

Perform a sensitivity check on the synthetic control results using the placebo_sensitivity_check() function.

>>> effects_test, effects_placebo, sc_test, scs_placebo = sx.placebo_test(dataset)
>>> sx.placebo_sensitivity_check(dataset, effects_placebo)
1.05

This means that this setup can detect an intervention effect of more than 5% uplift.

Alternatively, you can use the ttest_sensitivity_check() function.

>>> sx.ttest_sensitivity_check(dataset)
1.05

Contributing

Please read the developer docs for information on how to contribute to the project.

synthx's People

Contributors

kenki931128

synthx's Issues

Validation Period uplift

Add functionality to compare the uplift during the validation period with the uplift during the synthetic period. The assumption is that no uplift should exist during the validation period.

Filtering by lift and moe

import pandas as pd
from tqdm import tqdm


def remove_anomalous_series(df: pd.DataFrame, unit_column: str, y_column: str) -> pd.DataFrame:
    """Drop units whose outcome series contains anomalous values.

    A unit is kept only if every observation is at most `lift_threshold` times
    the unit's mean and within `moe_threshold` standard deviations of it.
    """
    lift_threshold = 1.5
    moe_threshold = 2

    units = []
    for unit in tqdm(df[unit_column].unique()):
        df_unit = df[df[unit_column] == unit].reset_index(drop=True)

        mean = df_unit[y_column].mean()
        std = df_unit[y_column].std()

        # Ratio of each observation to the unit mean.
        lift = df_unit[y_column] / mean
        # Standardized deviation of each observation from the unit mean.
        moe = (df_unit[y_column] - mean) / std

        # The lower bound on lift (lift >= 1 / lift_threshold) is currently disabled.
        if (lift <= lift_threshold).all() and (moe.abs() <= moe_threshold).all():
            units.append(unit)

    return df[df[unit_column].isin(units)].reset_index(drop=True)
