Comments (10)
Sounds great, @nsalas24. Yes, it makes sense to start with simple cross-validation or permutation tests. Feel free to ping me if you have any questions as you work on the refuter.
from dowhy.
Thanks for starting this discussion @j-chou . These are important questions, I share my thoughts below.
- For double ML, ideally we would like to allow the user to choose models and parameters. However, given that econml already implements double ML, it might make the most sense to integrate with econml and call their implementation from within DoWhy. On cross-validation, that's a great point. The standard double ML method does not talk about cross-validation for the ML models, and I think that's a necessary step to be able to choose suitable predictors (although it does use cross-fitting for the final estimate for unbiasedness).
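To make the cross-fitting step concrete, here is a minimal numpy/scikit-learn sketch on synthetic linear data (this is an illustration of the idea, not DoWhy's or econml's actual API):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))                                       # observed confounders
T = X @ np.array([1.0, -0.5, 0.2]) + rng.normal(size=n)           # treatment
Y = 2.0 * T + X @ np.array([0.3, 0.3, 0.3]) + rng.normal(size=n)  # true effect = 2

# Cross-fitting: residualize T and Y using nuisance models trained on the
# other fold, then regress the outcome residuals on the treatment residuals.
t_res, y_res = np.empty(n), np.empty(n)
for train, test in KFold(n_splits=2, shuffle=True, random_state=0).split(X):
    t_res[test] = T[test] - LinearRegression().fit(X[train], T[train]).predict(X[test])
    y_res[test] = Y[test] - LinearRegression().fit(X[train], Y[train]).predict(X[test])

theta_hat = float(t_res @ y_res / (t_res @ t_res))  # final-stage estimate, close to 2
```

Cross-validation for the nuisance models would slot in naturally here, by model-selecting the two regressors inside each training fold.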
- I can think of a few refutations that are relevant whenever a causal inference method (CI method) is conditioning on high-dimensional confounders:
a) Identification: There is a tendency to include all known variables as confounders. What if one of the "confounders" is actually an instrument? This could be implemented by choosing a confounder variable, moving it in the causal graph to be an instrument, and rerunning the CI method. It can be especially useful if the user has a good guess about which variables might be candidates for being an instrument. Interestingly, this could be useful even for refuting the average treatment effect (it simply removes a confounder, the opposite of adding one).
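A rough sketch of what (a) could look like: compare the estimate obtained by treating a candidate variable as a confounder versus as an instrument (plain numpy on synthetic data; the variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
W = rng.normal(size=n)                # candidate variable: confounder or instrument?
T = W + rng.normal(size=n)
Y = 2.0 * T + W + rng.normal(size=n)  # here W is truly a confounder (true effect = 2)

# Role 1, confounder: regress Y on (T, W) and read off T's coefficient.
A = np.column_stack([T, W, np.ones(n)])
backdoor_est = float(np.linalg.lstsq(A, Y, rcond=None)[0][0])

# Role 2, instrument: Wald / 2SLS ratio cov(Y, W) / cov(T, W).
iv_est = float(np.cov(Y, W)[0, 1] / np.cov(T, W)[0, 1])

# A large disagreement between the two estimates flags that the variable's
# assumed role in the graph deserves scrutiny.
```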
b) Estimation: Many of these CI methods themselves depend on complex ML models. So perturbing the hyperparameters of these models, or even simply resetting the random seed, can be a useful way to check the sensitivity of the estimates. Of course, we might want to find an efficient way of rerunning these CI methods, because many of them can take a long time to execute.
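For (b), a minimal sketch of a random-seed sensitivity check, using a hand-rolled T-learner with scikit-learn random forests as a stand-in for the more complex CI methods:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
n = 1000
X = rng.normal(size=(n, 2))
T = (rng.random(n) < 0.5).astype(float)
Y = 1.5 * T + X[:, 0] + rng.normal(size=n)  # true ATE = 1.5

def t_learner_ate(seed):
    """Fit one outcome model per treatment arm; average the predicted difference."""
    m1 = RandomForestRegressor(random_state=seed).fit(X[T == 1], Y[T == 1])
    m0 = RandomForestRegressor(random_state=seed).fit(X[T == 0], Y[T == 0])
    return float(np.mean(m1.predict(X) - m0.predict(X)))

# Rerun the whole estimator under different seeds and look at the spread.
estimates = [t_learner_ate(seed) for seed in range(5)]
spread = max(estimates) - min(estimates)
# A spread that is large relative to the estimate itself signals instability.
```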
c) Another refutation could be through an independent average treatment effect estimator. Given a set of disjoint subsets on which heterogeneous treatment effects are estimated, their size-weighted combination can be used to derive an estimate of the ATE. Ideally, this estimate should match the one from an independent ATE method.
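Point (c) reduces to a size-weighted average; a tiny sketch with illustrative numbers:

```python
import numpy as np

# Subgroup CATE estimates and subgroup sizes (illustrative numbers).
cate = np.array([0.5, 1.0, 2.0])   # estimated effect within each disjoint subgroup
sizes = np.array([200, 300, 500])  # number of units per subgroup

# ATE implied by the heterogeneous estimates: size-weighted average of CATEs.
ate_from_cate = float(np.average(cate, weights=sizes))  # 1.4

# Refutation idea: compare against an independently obtained ATE estimate
# and flag a gap larger than some tolerance.
```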
d) In addition, for any given heterogeneous treatment effect, the existing refutation methods in DoWhy still apply. For example, we could take the subgroup on which a conditional ATE is estimated, artificially make the treatment random for that subgroup (forcing the true conditional ATE to zero), and then rerun the CI method for the conditional ATE.
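A minimal sketch of (d) on synthetic data: permute the treatment within the subgroup and check that the re-estimated conditional effect collapses toward zero (a simple difference in means stands in for the CI method):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4000
X = rng.normal(size=n)
T = (rng.random(n) < 0.5).astype(float)
Y = 2.0 * T * (X > 0) + X + rng.normal(size=n)  # effect exists only where X > 0

subgroup = X > 0

def subgroup_ate(t):
    """Difference in mean outcomes between arms, within the subgroup."""
    return float(Y[subgroup & (t == 1)].mean() - Y[subgroup & (t == 0)].mean())

original = subgroup_ate(T)  # recovers an effect near 2

# Placebo: reshuffle treatment within the subgroup; the conditional effect
# should now be indistinguishable from zero.
T_placebo = T.copy()
T_placebo[subgroup] = rng.permutation(T[subgroup])
placebo = subgroup_ate(T_placebo)
```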
These are some that I'm thinking about; I'm sure there are others that could also be important. Would love to hear any ideas you have.
Thanks for the response and sorry for the late reply!
- Sounds good. I'll look into calling econml's double ML implementation.
- I really like the general approach of perturbing the DAG for sensitivity analysis, as you suggest. Perhaps we could implement a set of functions for basic graph operations for a start:
- `confounder_to_iv(variable)`: takes a variable as input, converts it to an IV in the graph, and returns the 2SLS estimate of the ATE
- `add_confounder(variable_a, variable_b, coef_a, coef_b)`: adds a linear confounder between variables a and b with the specified strengths and returns a new regression estimate of the ATE
- `add_collider(variable_a, variable_b, coef_a, coef_b)`: adds a linear collider between variables a and b with the specified strengths and returns a new regression estimate of the ATE
Are there other graph operations you think would be good to include?
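To make the intended semantics concrete, here is a hypothetical sketch of what `add_confounder` might do on raw treatment/outcome arrays (plain numpy; the signature is simplified and not a proposed final API):

```python
import numpy as np

def add_confounder(T, Y, coef_t, coef_y, rng):
    """Hypothetical sketch of the proposed add_confounder operation: simulate
    a confounder U that influences both treatment and outcome with the given
    strengths, then re-estimate the effect by a simple regression."""
    U = rng.normal(size=len(T))
    T_new, Y_new = T + coef_t * U, Y + coef_y * U
    A = np.column_stack([T_new, np.ones(len(T))])
    return float(np.linalg.lstsq(A, Y_new, rcond=None)[0][0])

rng = np.random.default_rng(4)
n = 3000
T = rng.normal(size=n)
Y = 2.0 * T + rng.normal(size=n)  # unconfounded data with true effect 2

baseline = float(np.linalg.lstsq(np.column_stack([T, np.ones(n)]), Y, rcond=None)[0][0])
shifted = add_confounder(T, Y, 1.0, 1.0, rng)
# The gap between `shifted` and `baseline` shows how sensitive the estimate
# is to a simulated confounder of that strength.
```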
Thanks @j-chou. The three operations you suggest are great, and we should definitely try to add them to DoWhy. In addition, @emrekiciman and I have been discussing the broader issue of conditional effects (e.g. CATE) and how to incorporate them in the current DoWhy API. We would like to support different CATE methods in addition to double ML, while also ensuring a simple API that works for all such methods.
I am preparing a document that lays out the specs for a general (updated) API for DoWhy. I will include the three functions you suggest above, and also frame them within the general setup for CATE, ATE, and ATT, and how to refute them.
Would you like to help us arrive at the right specs for the API? One option is that I start a wiki article on GitHub that we can all comment on.
@amit-sharma Would love to help out with the API. A wiki article sounds great!
@j-chou thanks for your patience. I have added a wiki roadmap here: https://github.com/microsoft/dowhy/wiki/Roadmap
I would really appreciate your feedback on it.
First, thanks for open-sourcing this package; I've learned a lot from it!
To add to the discussion regarding conditional effects: https://github.com/uber/causalml appears promising in terms of (1) implementing a variety of meta-learners for estimating heterogeneous treatment effects and (2) flexibility in model parameterization. Perhaps inspiration can be drawn from it as well as from EconML.
As mentioned, I think cross-validation is vital for any would-be user of a meta-learner, and the authors of the R-learner implement it in a small R package (https://github.com/xnie/rlearner/tree/master).
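For reference, the model selection used with the R-learner boils down to scoring candidate CATE models with the R-loss on held-out residuals; a minimal numpy sketch (synthetic residuals, names illustrative, nuisance fitting assumed done):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 3000
X = rng.normal(size=n)
t_res = rng.normal(size=n)                     # treatment residuals (nuisances already fit)
tau_true = 1.0 + X                             # true heterogeneous effect
y_res = tau_true * t_res + rng.normal(size=n)  # outcome residuals

def r_loss(tau_hat):
    """R-learner objective: squared error of the residual-on-residual fit."""
    return float(np.mean((y_res - tau_hat * t_res) ** 2))

# Score two candidate CATE models; cross-validation would pick the lower loss.
constant_model = np.full(n, 1.0)  # ignores heterogeneity
flexible_model = 1.0 + X          # captures it
better = r_loss(flexible_model) < r_loss(constant_model)
```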
Something @amit-sharma brought up that I think would be a great refutation to add for these methods is perturbing the meta-learner's hyperparameters to measure the change in the distribution of CATEs, the change in the ATE, etc. I don't see this in the project roadmap; is this functionality worth adding?
Thanks for your comment @nsalas24, and sorry for the super late reply; I somehow missed responding here. I've just integrated the metalearners from econml into DoWhy, so you can directly call a metalearner as shown here: https://github.com/microsoft/dowhy/blob/master/docs/source/example_notebooks/dowhy-conditional-treatment-effects.ipynb
It would be great to add refutations based on cross-validation or parameter perturbation. Would you be interested in contributing?
Hey @amit-sharma,
On the surface, the integration with EconML looks great. It looks like they've introduced several new 'categories' of estimation methods, and DoWhy can now call upon them. So yes, I can start working on building a new refutation method to test the consistency of CATEs from some of these more complex estimation methods. I think it makes the most sense to start with simple cross-validation or random-seed permutation, since some of these methods actually invoke several supervised models (e.g. XLearner), making a hyperparameter space search a bit prohibitive.
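One possible shape for such a consistency refuter: train the CATE estimator on disjoint halves of the data and compare the two sets of predictions. Below, a hand-rolled S-learner with scikit-learn random forests stands in for the econml estimators; nothing here is DoWhy API:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(6)
n = 2000
X = rng.normal(size=(n, 2))
T = (rng.random(n) < 0.5).astype(float)
Y = (1.0 + 2.0 * X[:, 0]) * T + rng.normal(size=n)  # CATE = 1 + 2 * x0

def fit_cate(idx, seed):
    """S-learner sketch: one model on (X, T); CATE(x) = f(x, 1) - f(x, 0)."""
    model = RandomForestRegressor(random_state=seed)
    model.fit(np.column_stack([X[idx], T[idx]]), Y[idx])
    with_t = model.predict(np.column_stack([X, np.ones(n)]))
    without_t = model.predict(np.column_stack([X, np.zeros(n)]))
    return with_t - without_t

# Train on disjoint halves; a stable estimator should broadly agree with itself.
half = n // 2
cate_a = fit_cate(np.arange(half), seed=0)
cate_b = fit_cate(np.arange(half, n), seed=1)
agreement = float(np.corrcoef(cate_a, cate_b)[0, 1])
# Low agreement suggests the CATE estimates are too unstable to trust.
```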
Closing due to inactivity. @nsalas24, if you'd still like to contribute refuters for CATE estimators, let me know.