Comments (10)

amit-sharma commented on July 19, 2024

Sounds great, @nsalas24. Yes, it makes sense to start with simple cross-validation or permutation tests. Feel free to ping me if you have any questions as you work on the refuter.

from dowhy.

amit-sharma commented on July 19, 2024

Thanks for starting this discussion, @j-chou. These are important questions; I share my thoughts below.

  1. For double ML, ideally we would like to allow the user to choose models and parameters. However, given that econml already implements double ML, it might make the most sense to integrate with econml and call their implementation from within DoWhy. On cross-validation, that's a great point. The standard double ML method does not talk about cross-validation for the ML models and I think that's a necessary step to be able to choose suitable predictors (although it does use cross-fitting for the final estimate for unbiasedness).

  2. I can think of a few refutations that are relevant whenever a causal inference method (CI method) is conditioning on high-dimensional confounders:
    a) Identification: There is a tendency to include all known variables as confounders. What if one of the "confounders" is actually an instrument? This could be implemented by choosing a confounder variable and moving it in the causal graph to be an instrument, and then rerunning the CI method. It can be especially useful if the user has a "good guess" about which variables might be candidates for being an instrument. Interestingly, this could be useful even for refuting the average treatment effect (by simply removing a confounder, the opposite of adding one).

    b) Estimation: Many of these CI methods themselves depend on complex ML models. So perturbing the hyperparameters of these models, or even simply resetting the random seed, can be a useful way to check the sensitivity of the estimates. Of course, we might want to find an efficient way of rerunning these CI methods because many of them can take a long time to execute.

    c) Another refutation could be through an independent average treatment effect estimator. Given a set of disjoint subsets on which heterogeneous treatment effects are estimated, their weighted combination can be used to derive an estimate for the ATE. Ideally, this estimate should match that from another ATE method.

    d) In addition, for any given heterogeneous treatment effect, the refutation methods in DoWhy still apply. For example, we could consider the subgroup on which the conditional ATE is estimated, then artificially make the treatment random for that subgroup of people (and thus zero conditional ATE), and then rerun the CI method for the conditional ATE.
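
As a rough illustration of point (c), the size-weighted combination of subgroup effects can be sketched as below. The function names, the subgroup labels, and the tolerance are hypothetical and not part of DoWhy; the sketch only assumes that the subgroups are disjoint and together cover the population.

```python
def ate_from_subgroups(cate_by_group, size_by_group):
    """Size-weighted average of subgroup effects over disjoint subgroups."""
    total = sum(size_by_group.values())
    weighted = sum(cate_by_group[g] * size_by_group[g] for g in cate_by_group)
    return weighted / total


def estimates_agree(ate_a, ate_b, tolerance):
    """True if two ATE estimates differ by at most `tolerance` (a user choice)."""
    return abs(ate_a - ate_b) <= tolerance


# Hypothetical subgroup estimates and subgroup sizes.
cates = {"young": 1.0, "old": 3.0}
sizes = {"young": 100, "old": 300}

combined = ate_from_subgroups(cates, sizes)
print(combined)  # (1.0 * 100 + 3.0 * 300) / 400 = 2.5
```

The combined value would then be compared against the output of a separate ATE estimator, for example with `estimates_agree(combined, other_ate, tolerance=0.2)`, where a large gap suggests that one of the two estimates is unreliable.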

These are some that I'm thinking about; I'm sure there are others that could also be important. I would love to know if you have any ideas.
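
The placebo idea in point (d) could look roughly like the following. This is a minimal sketch on simulated data, with a plain difference in means standing in for the CI method; the helper names and the simulated effect size of 2.0 are assumptions made for illustration, not DoWhy's implementation.

```python
import random


def diff_in_means(treatment, outcome):
    """Difference between mean outcomes of treated and control units."""
    treated = [y for t, y in zip(treatment, outcome) if t == 1]
    control = [y for t, y in zip(treatment, outcome) if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def placebo_effect(treatment, outcome, rng, n_repeats=50):
    """Average estimate after shuffling the treatment labels in the subgroup."""
    estimates = []
    for _ in range(n_repeats):
        shuffled = treatment[:]
        rng.shuffle(shuffled)  # treatment is now random with respect to outcome
        estimates.append(diff_in_means(shuffled, outcome))
    return sum(estimates) / len(estimates)


rng = random.Random(0)
# Simulated subgroup whose true conditional effect is 2.0 (an assumption).
treatment = [rng.randint(0, 1) for _ in range(2000)]
outcome = [2.0 * t + rng.gauss(0, 1) for t in treatment]

original = diff_in_means(treatment, outcome)
placebo = placebo_effect(treatment, outcome, rng)
print(f"original estimate: {original:.2f}, placebo estimate: {placebo:.2f}")
```

If the estimator is behaving sensibly, the placebo estimate should sit close to zero while the original estimate stays near the true effect.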

j-chou commented on July 19, 2024

Thanks for the response and sorry for the late reply!

  1. Sounds good. I'll look into calling econml's double ML implementation.

  2. I really like the general approach of perturbing the DAG for sensitivity analysis as you suggest. Perhaps we could implement a set of functions for basic graph operations for a start:

  • confounder_to_iv(variable) - takes a variable as input, converts it to an IV and returns the 2SLS estimate of the ATE
  • add_confounder(variable_a, variable_b, coef_a, coef_b) - adds a linear confounder between variables a and b with the specified strengths and returns a new regression estimate of the ATE
  • add_collider(variable_a, variable_b, coef_a, coef_b) - adds a linear collider between variables a and b with specified strengths and returns a new regression estimate of the ATE

Are there other graph operations you think would be good to include?
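
To make the first two operations concrete, here is a minimal sketch on simulated data. The signatures differ from the ones proposed above (they take the data explicitly), and none of this is DoWhy code. It relies on the fact that, with a single instrument and no other covariates, the 2SLS estimate reduces to cov(z, y) / cov(z, t); the data-generating process and its true effect of 2.0 are assumptions.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def regression_estimate(treatment, outcome):
    """Slope of outcome on treatment, with no adjustment."""
    return cov(treatment, outcome) / cov(treatment, treatment)


def confounder_to_iv(variable, treatment, outcome):
    """Treat `variable` as an instrument (single-instrument 2SLS)."""
    return cov(variable, outcome) / cov(variable, treatment)


def add_confounder(treatment, outcome, coef_a, coef_b, rng):
    """Inject a simulated linear common cause and re-run the regression."""
    u = [rng.gauss(0, 1) for _ in treatment]
    new_t = [t + coef_a * ui for t, ui in zip(treatment, u)]
    new_y = [y + coef_b * ui for y, ui in zip(outcome, u)]
    return regression_estimate(new_t, new_y)


rng = random.Random(0)
n = 20000
z = [rng.gauss(0, 1) for _ in range(n)]  # a valid instrument
u = [rng.gauss(0, 1) for _ in range(n)]  # unobserved confounder
t = [zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
y = [2.0 * ti + ui + rng.gauss(0, 1) for ti, ui in zip(t, u)]

naive = regression_estimate(t, y)          # biased upward by u
iv = confounder_to_iv(z, t, y)             # should be close to 2.0
shifted = add_confounder(t, y, 1.0, 3.0, rng)
print(f"naive={naive:.2f} iv={iv:.2f} with_added_confounder={shifted:.2f}")
```

In this toy setup the unadjusted regression overstates the effect, the IV estimate recovers it, and the size of the shift after injecting a confounder gives a sense of how fragile the regression estimate is.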

amit-sharma commented on July 19, 2024

Thanks @j-chou. The three operations you suggest are great, and we should definitely try to add them to DoWhy. In addition, @emrekiciman and I have been discussing the broader issue of conditional effects (e.g. CATE) and how to incorporate them into the current DoWhy API. We would like to support different CATE methods in addition to double ML, but also ensure a simple API that works for all such methods.

I am preparing a document that lays down specs for a general (updated) API for DoWhy. I will include the three functions you suggest above, but also frame them in the general setup for CATE, ATE, ATT and how to refute them.

Would you like to help us arrive at the correct specs for the API? One option is that I start a Wiki article on GitHub that we can all comment on.

j-chou commented on July 19, 2024

@amit-sharma Would love to help out with the API. A Wiki article sounds great!

amit-sharma commented on July 19, 2024

@j-chou thanks for your patience. I have added a wiki roadmap here: https://github.com/microsoft/dowhy/wiki/Roadmap

Will really appreciate your feedback on it.

nsalas24 commented on July 19, 2024

First, thanks for open sourcing this package, I've learned a lot from it!

To add to the discussion regarding conditional effects: https://github.com/uber/causalml appears promising in terms of 1) implementing a variety of meta-learners for estimating heterogeneous treatment effects and 2) flexibility in model parameterization. Perhaps inspiration can be drawn from it as well as from EconML.

As mentioned, I think cross-validation is vital to any would-be user of the meta-learner, and the authors of the R-learner implement it (https://github.com/xnie/rlearner/tree/master) in a small R package.

Something @amit-sharma brought up that I think would be a great refutation to add for these methods is perturbing the meta-learner's hyper-parameters to measure the change in the distribution of CATEs, the change in ATE, etc. I don't see this in the project roadmap. Is this functionality worth adding?

amit-sharma commented on July 19, 2024

Thanks for your comment @nsalas24, and sorry for the super late reply; I somehow missed responding here. I've just integrated the metalearners from econml into DoWhy so you can directly call a metalearner as shown here: https://github.com/microsoft/dowhy/blob/master/docs/source/example_notebooks/dowhy-conditional-treatment-effects.ipynb

It would be great to add refutations based on cross-validation or parameter perturbation. Would you be interested in contributing?

nsalas24 commented on July 19, 2024

Hey @amit-sharma,

On the surface, the integration with EconML looks great. It looks like they've introduced several new 'categories' of estimation methods and DoWhy can now call upon them. So yes, I can start working on building a new refutation method to test the consistency of the CATEs from some of these more complex estimation methods. I think it makes the most sense to start with a simple cross-validation or random-seed permutation, as some of these methods actually invoke several supervised models (e.g. XLearner), making the hyper-parameter space search a bit prohibitive.
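
A random-seed check of this kind might be sketched as follows. Everything here is a stand-in: the toy estimator is a difference in means on a random half of simulated data rather than an EconML metalearner, and the names `seed_sensitivity` and `subsample_estimate` are hypothetical.

```python
import random
import statistics


def make_data(n=2000, effect=1.5, seed=0):
    """Simulated binary treatment and outcome with a known effect."""
    rng = random.Random(seed)
    treatment = [rng.randint(0, 1) for _ in range(n)]
    outcome = [effect * t + rng.gauss(0, 1) for t in treatment]
    return treatment, outcome


def subsample_estimate(treatment, outcome, seed, fraction=0.5):
    """Seed-dependent toy estimator: difference in means on a random subsample."""
    rng = random.Random(seed)
    idx = rng.sample(range(len(treatment)), int(fraction * len(treatment)))
    treated = [outcome[i] for i in idx if treatment[i] == 1]
    control = [outcome[i] for i in idx if treatment[i] == 0]
    return statistics.mean(treated) - statistics.mean(control)


def seed_sensitivity(estimate_fn, seeds):
    """Re-run the estimator under each seed; return mean and std of the estimates."""
    estimates = [estimate_fn(seed) for seed in seeds]
    return statistics.mean(estimates), statistics.stdev(estimates)


treatment, outcome = make_data()
center, spread = seed_sensitivity(
    lambda seed: subsample_estimate(treatment, outcome, seed), seeds=range(20)
)
print(f"mean estimate: {center:.2f}, spread across seeds: {spread:.2f}")
```

A spread that is large relative to the estimate itself would suggest the result depends heavily on the seed. Because `seed_sensitivity` only needs a callable, a real estimator could be swapped in without changing the check, though rerunning expensive models many times is the cost noted earlier in the thread.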

amit-sharma commented on July 19, 2024

Closing due to inactivity. @nsalas24, if you'd still like to contribute refuters for CATE estimators, let me know.
