Comments (10)
Sounds great, @nsalas24. Yes, it makes sense to start with simple cross-validation or permutation tests. Feel free to ping me if you have any questions as you work on the refuter.
from dowhy.
Thanks for starting this discussion @j-chou . These are important questions, I share my thoughts below.
- For double ML, ideally we would like to allow the user to choose models and parameters. However, given that econml already implements double ML, it might make the most sense to integrate with econml and call their implementation from within DoWhy. On cross-validation, that's a great point. The standard double ML method does not talk about cross-validation for the ML models, and I think that's a necessary step to be able to choose suitable predictors (although it does use cross-fitting for the final estimate for unbiasedness).
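To make the cross-fitting step concrete, here is a minimal numpy/scikit-learn sketch on synthetic linear data (this is an illustration of the idea, not DoWhy's or econml's actual API):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))                                       # observed confounders
T = X @ np.array([1.0, -0.5, 0.2]) + rng.normal(size=n)           # treatment
Y = 2.0 * T + X @ np.array([0.3, 0.3, 0.3]) + rng.normal(size=n)  # true effect = 2

# Cross-fitting: residualize T and Y using nuisance models trained on the
# other fold, then regress the outcome residuals on the treatment residuals.
t_res, y_res = np.empty(n), np.empty(n)
for train, test in KFold(n_splits=2, shuffle=True, random_state=0).split(X):
    t_res[test] = T[test] - LinearRegression().fit(X[train], T[train]).predict(X[test])
    y_res[test] = Y[test] - LinearRegression().fit(X[train], Y[train]).predict(X[test])

theta_hat = float(t_res @ y_res / (t_res @ t_res))  # final-stage estimate, close to 2
```

Cross-validation for the nuisance models would slot in naturally here, by model-selecting the two regressors inside each training fold.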
- I can think of a few refutations that are relevant whenever a causal inference method (CI method) is conditioning on high-dimensional confounders:
a) Identification: There is a tendency to include all known variables as confounders. What if one of the "confounders" is actually an instrument? This could be implemented by choosing a confounder variable, moving it in the causal graph to be an instrument, and rerunning the CI method. It can be especially useful if the user has a good guess about which variables might be candidates for being an instrument. Interestingly, this could be useful even for refuting the average treatment effect (it simply removes a confounder, the opposite of adding one).
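A rough sketch of what (a) could look like: compare the estimate obtained by treating a candidate variable as a confounder versus as an instrument (plain numpy on synthetic data; the variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
W = rng.normal(size=n)                # candidate variable: confounder or instrument?
T = W + rng.normal(size=n)
Y = 2.0 * T + W + rng.normal(size=n)  # here W is truly a confounder (true effect = 2)

# Role 1, confounder: regress Y on (T, W) and read off T's coefficient.
A = np.column_stack([T, W, np.ones(n)])
backdoor_est = float(np.linalg.lstsq(A, Y, rcond=None)[0][0])

# Role 2, instrument: Wald / 2SLS ratio cov(Y, W) / cov(T, W).
iv_est = float(np.cov(Y, W)[0, 1] / np.cov(T, W)[0, 1])

# A large disagreement between the two estimates flags that the variable's
# assumed role in the graph deserves scrutiny.
```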
b) Estimation: Many of these CI methods themselves depend on complex ML models. So perturbing the hyperparameters of these models, or even simply resetting the random seed, can be a useful way to check the sensitivity of the estimates. Of course, we might want to find an efficient way of rerunning these CI methods, because many of them can take a long time to execute.
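For (b), a minimal sketch of a random-seed sensitivity check, using a hand-rolled T-learner with scikit-learn random forests as a stand-in for the more complex CI methods:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
n = 1000
X = rng.normal(size=(n, 2))
T = (rng.random(n) < 0.5).astype(float)
Y = 1.5 * T + X[:, 0] + rng.normal(size=n)  # true ATE = 1.5

def t_learner_ate(seed):
    """Fit one outcome model per treatment arm; average the predicted difference."""
    m1 = RandomForestRegressor(random_state=seed).fit(X[T == 1], Y[T == 1])
    m0 = RandomForestRegressor(random_state=seed).fit(X[T == 0], Y[T == 0])
    return float(np.mean(m1.predict(X) - m0.predict(X)))

# Rerun the whole estimator under different seeds and look at the spread.
estimates = [t_learner_ate(seed) for seed in range(5)]
spread = max(estimates) - min(estimates)
# A spread that is large relative to the estimate itself signals instability.
```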
c) Another refutation could be through an independent average treatment effect estimator. Given a set of disjoint subsets on which heterogeneous treatment effects are estimated, their size-weighted combination can be used to derive an estimate of the ATE. Ideally, this estimate should match the one from an independent ATE method.
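Point (c) reduces to a size-weighted average; a tiny sketch with illustrative numbers:

```python
import numpy as np

# Subgroup CATE estimates and subgroup sizes (illustrative numbers).
cate = np.array([0.5, 1.0, 2.0])   # estimated effect within each disjoint subgroup
sizes = np.array([200, 300, 500])  # number of units per subgroup

# ATE implied by the heterogeneous estimates: size-weighted average of CATEs.
ate_from_cate = float(np.average(cate, weights=sizes))  # 1.4

# Refutation idea: compare against an independently obtained ATE estimate
# and flag a gap larger than some tolerance.
```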
d) In addition, for any given heterogeneous treatment effect, the existing refutation methods in DoWhy still apply. For example, we could take the subgroup on which a conditional ATE is estimated, artificially make the treatment random for that subgroup (forcing the true conditional ATE to zero), and then rerun the CI method for the conditional ATE.
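A minimal sketch of (d) on synthetic data: permute the treatment within the subgroup and check that the re-estimated conditional effect collapses toward zero (a simple difference in means stands in for the CI method):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4000
X = rng.normal(size=n)
T = (rng.random(n) < 0.5).astype(float)
Y = 2.0 * T * (X > 0) + X + rng.normal(size=n)  # effect exists only where X > 0

subgroup = X > 0

def subgroup_ate(t):
    """Difference in mean outcomes between arms, within the subgroup."""
    return float(Y[subgroup & (t == 1)].mean() - Y[subgroup & (t == 0)].mean())

original = subgroup_ate(T)  # recovers an effect near 2

# Placebo: reshuffle treatment within the subgroup; the conditional effect
# should now be indistinguishable from zero.
T_placebo = T.copy()
T_placebo[subgroup] = rng.permutation(T[subgroup])
placebo = subgroup_ate(T_placebo)
```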
These are some that I'm thinking about; I'm sure there are others that could also be important. Would love to hear any ideas you have.
Thanks for the response and sorry for the late reply!
- Sounds good. I'll look into calling econml's double ML implementation.
- I really like the general approach of perturbing the DAG for sensitivity analysis, as you suggest. Perhaps we could implement a set of functions for basic graph operations for a start:
- `confounder_to_iv(variable)`: takes a variable as input, converts it to an IV in the graph, and returns the 2SLS estimate of the ATE
- `add_confounder(variable_a, variable_b, coef_a, coef_b)`: adds a linear confounder between variables a and b with the specified strengths and returns a new regression estimate of the ATE
- `add_collider(variable_a, variable_b, coef_a, coef_b)`: adds a linear collider between variables a and b with the specified strengths and returns a new regression estimate of the ATE
Are there other graph operations you think would be good to include?
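To make the intended semantics concrete, here is a hypothetical sketch of what `add_confounder` might do on raw treatment/outcome arrays (plain numpy; the signature is simplified and not a proposed final API):

```python
import numpy as np

def add_confounder(T, Y, coef_t, coef_y, rng):
    """Hypothetical sketch of the proposed add_confounder operation: simulate
    a confounder U that influences both treatment and outcome with the given
    strengths, then re-estimate the effect by a simple regression."""
    U = rng.normal(size=len(T))
    T_new, Y_new = T + coef_t * U, Y + coef_y * U
    A = np.column_stack([T_new, np.ones(len(T))])
    return float(np.linalg.lstsq(A, Y_new, rcond=None)[0][0])

rng = np.random.default_rng(4)
n = 3000
T = rng.normal(size=n)
Y = 2.0 * T + rng.normal(size=n)  # unconfounded data with true effect 2

baseline = float(np.linalg.lstsq(np.column_stack([T, np.ones(n)]), Y, rcond=None)[0][0])
shifted = add_confounder(T, Y, 1.0, 1.0, rng)
# The gap between `shifted` and `baseline` shows how sensitive the estimate
# is to a simulated confounder of that strength.
```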
Thanks @j-chou. The three operations you suggest are great, and we should definitely try to add them to DoWhy. In addition, @emrekiciman and I have been discussing the broader issue of conditional effects (e.g. CATE) and how to incorporate them in the current DoWhy API. We would like to support different CATE methods in addition to double ML, while also ensuring a simple API that works for all such methods.
I am preparing a document that lays out the specs for a general (updated) API for DoWhy. I will include the three functions you suggest above, and also frame them within the general setup for CATE, ATE, and ATT, and how to refute them.
Would you like to help us arrive at the right specs for the API? One option is that I start a wiki article on GitHub that we can all comment on.
@amit-sharma Would love to help out with the API. A wiki article sounds great!
@j-chou thanks for your patience. I have added a wiki roadmap here: https://github.com/microsoft/dowhy/wiki/Roadmap
I would really appreciate your feedback on it.
First, thanks for open-sourcing this package; I've learned a lot from it!
To add to the discussion regarding conditional effects: https://github.com/uber/causalml appears promising in terms of (1) implementing a variety of meta-learners for estimating heterogeneous treatment effects and (2) flexibility in model parameterization. Perhaps inspiration can be drawn from it as well as from EconML.
As mentioned, I think cross-validation is vital for any would-be user of a meta-learner, and the authors of the R-learner implement it in a small R package (https://github.com/xnie/rlearner/tree/master).
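For reference, the model selection used with the R-learner boils down to scoring candidate CATE models with the R-loss on held-out residuals; a minimal numpy sketch (synthetic residuals, names illustrative, nuisance fitting assumed done):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 3000
X = rng.normal(size=n)
t_res = rng.normal(size=n)                     # treatment residuals (nuisances already fit)
tau_true = 1.0 + X                             # true heterogeneous effect
y_res = tau_true * t_res + rng.normal(size=n)  # outcome residuals

def r_loss(tau_hat):
    """R-learner objective: squared error of the residual-on-residual fit."""
    return float(np.mean((y_res - tau_hat * t_res) ** 2))

# Score two candidate CATE models; cross-validation would pick the lower loss.
constant_model = np.full(n, 1.0)  # ignores heterogeneity
flexible_model = 1.0 + X          # captures it
better = r_loss(flexible_model) < r_loss(constant_model)
```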
Something @amit-sharma brought up that I think would be a great refutation to add for these methods is perturbing the meta-learner's hyperparameters to measure the change in the distribution of CATEs, the change in the ATE, etc. I don't see this in the project roadmap; is this functionality worth adding?
Thanks for your comment @nsalas24, and sorry for the super late reply; I somehow missed responding here. I've just integrated the metalearners from econml into DoWhy, so you can directly call a metalearner as shown here: https://github.com/microsoft/dowhy/blob/master/docs/source/example_notebooks/dowhy-conditional-treatment-effects.ipynb
It would be great to add refutations based on cross-validation or parameter perturbation. Would you be interested in contributing?
Hey @amit-sharma,
On the surface, the integration with EconML looks great. It looks like they've introduced several new 'categories' of estimation methods, and DoWhy can now call upon them. So yes, I can start working on building a new refutation method to test the consistency of CATEs from some of these more complex estimation methods. I think it makes the most sense to start with simple cross-validation or random-seed permutation, since some of these methods actually invoke several supervised models (e.g. XLearner), making a hyperparameter space search a bit prohibitive.
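One possible shape for such a consistency refuter: train the CATE estimator on disjoint halves of the data and compare the two sets of predictions. Below, a hand-rolled S-learner with scikit-learn random forests stands in for the econml estimators; nothing here is DoWhy API:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(6)
n = 2000
X = rng.normal(size=(n, 2))
T = (rng.random(n) < 0.5).astype(float)
Y = (1.0 + 2.0 * X[:, 0]) * T + rng.normal(size=n)  # CATE = 1 + 2 * x0

def fit_cate(idx, seed):
    """S-learner sketch: one model on (X, T); CATE(x) = f(x, 1) - f(x, 0)."""
    model = RandomForestRegressor(random_state=seed)
    model.fit(np.column_stack([X[idx], T[idx]]), Y[idx])
    with_t = model.predict(np.column_stack([X, np.ones(n)]))
    without_t = model.predict(np.column_stack([X, np.zeros(n)]))
    return with_t - without_t

# Train on disjoint halves; a stable estimator should broadly agree with itself.
half = n // 2
cate_a = fit_cate(np.arange(half), seed=0)
cate_b = fit_cate(np.arange(half, n), seed=1)
agreement = float(np.corrcoef(cate_a, cate_b)[0, 1])
# Low agreement suggests the CATE estimates are too unstable to trust.
```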
Closing due to inactivity. @nsalas24, if you'd still like to contribute refuters for CATE estimators, let me know.