matheusfacure / python-causality-handbook

Causal Inference for the Brave and True. A light-hearted yet rigorous approach to learning about impact estimation and causality.

Home Page: https://matheusfacure.github.io/python-causality-handbook/landing-page.html

License: MIT License

Languages: Jupyter Notebook 99.94%, Python 0.06%
Topics: causal-inference, python, causality, data-science, econometrics, impact-estimation, harmless-econometrics

python-causality-handbook's Introduction

Causal Inference for The Brave and True


A light-hearted yet rigorous approach to learning impact estimation and sensitivity analysis. All in Python and with as many memes as I could find.

Check out the book here!

If you want to read the book in Brazilian Portuguese, @rdemarqui made this awesome translation: Inferência Causal para os Corajosos e Verdadeiros

If you want to read the book in Chinese, @xieliaing was very kind to make a translation: 因果推断:从概念到实践

If you want to read the book in Spanish, @donelianc was very kind to make a translation: Inferencia Causal para los Valientes y Verdaderos

If you want to read it in Korean, @jsshin2019 has put up a team to make that translation possible: Python으로 하는 인과추론 : 개념부터 실습까지

Some really kind folks (@vietecon, @dinhtrang24 and @anhpham52) also translated this content into Vietnamese: Nhân quả Python

I like to think of this entire series as a tribute to Joshua Angrist, Alberto Abadie and Christopher Walters for their amazing Econometrics class. Most of the ideas here are taken from their classes at the American Economic Association. Watching them is what is keeping me sane during this tough year of 2020.

I'd also like to reference the amazing books from Angrist. They have shown me that Econometrics, or 'Metrics as they call it, is not only extremely useful but also profoundly fun.

Finally, I'd like to reference Miguel Hernan and Jamie Robins' book. It has been my trustworthy companion in the most thorny causal inference questions I've had to answer.

How to Support This Work

Causal Inference for the Brave and True is an open-source resource primarily focused on econometrics and the statistics of science. It exclusively utilizes free software, grounded in Python. The primary objective is to ensure accessibility, not only from a financial standpoint but also from an intellectual perspective. I've tried my best to keep the content entertaining while maintaining the necessary scientific rigor.

If you want to show your appreciation for this work, consider going to https://www.patreon.com/causal_inference_for_the_brave_and_true. Alternatively, you can purchase my book, Causal Inference in Python, which provides more insights into applying causal inference in the industry.

python-causality-handbook's People

Contributors

donelianc, dwitvliet, ethanknights, farhanreynaldo, gabrieltempass, griffinshufeldt, hellobiondi, iamlostcoast, juanitorduz, keesterbrugge, loudly-soft, matheusfacure, maxgrenderjones, mgermy, padarn, paullo0106, petrkaderabek, raulpl, robinseaside, scottlyden, sergylog, stanton119, teej, viniciusmsousa, white1033, wolfherz, xiaowei-zhang, y1-y0, zenogantner, zvibaratz


python-causality-handbook's Issues

Issue on page /15-Synthetic-Control.html

Typo in "We Have Time" section
"Also assume that the data we have span T time periods, whith"

Also in the sentence (and some subsequent ones)
"Since unit j=1 is the treated one, Y^I_{jt} is factual but Y^N_{jt} is not. The challenge then becomes how do we estimate Y^N_{jt}" might be clearer to actually replace j = 1 in the subscripts since you're specifically talking about the treated group

Chapter 7: Bad cop unclear what exactly is not ok

In chapter 7, under section "bad cop", it's not clear to me what the mistake is exactly. Is it that the equality:
E[Y_i | T_i] = E[Y_i | Y_i > 0, T_i] * P(Y_i > 0 | T_i)
does not hold? This looks like an application of the law of total probability to me, so I'm not sure where the mistake is? If this equality indeed does not hold, I would strongly advise explicitly stating that in the text with an explanation of why not.

If however this equality is correct, then it's not at all clear to me what the mistake is that the text is warning about. The worked example that follows is extremely vague, in that it makes a reference to the selection bias term we calculated. But as we can see in the math above, there are a lot more terms to be considered when calculating the causal effect that way. So as long as we calculate all of these, it looks to me like there is no problem.

In any case I really like chapter 7 in general as it gives a lot of intuition about what to do and what not to do. Only this bad cop part needs some more explicit clarification (at least for me).
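For what it's worth, the equality itself is straightforward to verify numerically when the outcome is non-negative with a point mass at zero, so that the E[Y_i | Y_i ≤ 0, T_i] term of the law of total expectation contributes nothing. A minimal sketch on simulated (made-up) spend data, not the book's:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
t = rng.integers(0, 2, size=n)  # treatment indicator
# Non-negative outcome with a point mass at zero (hypothetical spend data).
y = np.where(rng.random(n) < 0.3 + 0.2 * t, rng.exponential(10 + 5 * t), 0.0)

def decomposition(y, t, ti):
    """Return both sides of E[Y|T] = E[Y | Y>0, T] * P(Y>0 | T) for one arm."""
    arm = y[t == ti]
    return arm.mean(), arm[arm > 0].mean() * (arm > 0).mean()

lhs1, rhs1 = decomposition(y, t, 1)
lhs0, rhs0 = decomposition(y, t, 0)
# Both sides agree up to floating point in each arm.
```

This supports the reading that the identity holds; the chapter's warning is presumably about what gets dropped when only some of those terms are compared across treatment arms.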

Issue on page /21-Debiasing-with-Orthogonalization.html

Should m3 be using the debiased values of the training set? I thought the point of this was to be able to train/regress a better model with the debiased values. I think this is used in the 3rd plot, but it is confusing to build up to these debiased values and not show a model that uses them

m3 = smf.ols(f"sales ~ price*cost + price*C(weekday) + price*temp", data=train).fit()

Why does this code use 'price' and 'sales' for predictions instead of
price-Mt(X) and sales-My(X)?

These Debiased values are used in the graphs, but I think it would be useful to see them make an individual prediction from the test data.

def predict_elast(model, price_df, h=0.01):
    return (model.predict(price_df.assign(price=price_df["price"] + h))
            - model.predict(price_df)) / h

debiased_test_pred = debiased_test.assign(**{
    "m3_pred": predict_elast(m3, debiased_test),
})

debiased_test_pred.head()

Chapter 10

There is an issue in Chapter 10 on github.io, in the last paragraph of The Subclassification Estimator:

\\\\(\\bar{Y}_{k1}\\\\) and \\\\(\\bar{Y}_{k0}\\\\) are not shown as equations, but simply as raw text (\(\bar{Y}{k1}\) and \(\bar{Y}{k0}\)).

It's an odd error, as the equations are correctly shown in my local jupyter environment.

[FIX] Small typo in chapter 01

There is an issue on chapter 01, in the following paragraph

Answering a Different Kind of Question

[Last paragraph]

(...) But actually explaining why that is the case is a bit more involved. This is what this introduction to causal inference is all about. As for fhe rest of this book, it will be dedicated to figuring how to make association be causation.

It should be:

As for the rest of this book, it will be dedicated to figuring how to make association be causation.


One suggestion

Bias

[1st paragraph]

Bias is what makes association different from causation. Fortunately, it too can be easily understood with our intuition.

It could be:

Bias is what makes association different from causation. Fortunately, it can be easily understood with our intuition.

Question chapter 1: how to equate ATET to ATE

In the first chapter "Introduction to causality" (in subchapter "Bias") you state that given
E[Y0|T=0] = E[Y0|T=1]
we can deduce that
E[Y|T=1] − E[Y|T=0] = E[Y1 − Y0|T=1].

However, a few lines later you state:
E[Y|T=1] − E[Y|T=0] = ATE = ATET
In other words you say:
E[Y|T=1] − E[Y|T=0] = E[Y1−Y0] = E[Y1−Y0|T=1]

I can see why E[Y|T=1] − E[Y|T=0] = E[Y1−Y0|T=1] since you derive it. But I can't figure out why this would also be equal to E[Y1−Y0] (the overall average treatment effect). Do you not also need that E[Y1|T=0] = E[Y1|T=1]? If so why not? Could you elaborate? I think the easiest and clearest way to show this would be to just show the mathematical derivation from E[Y1−Y0] to either E[Y1−Y0|T=1] or E[Y|T=1] − E[Y|T=0].

Intuitively there also seems to be something missing for me. In your own words: E[Y0|T=0] = E[Y0|T=1] is to say that treatment and control group are comparable before the treatment. But this condition, intuitively, does not seem sufficient to say that E[Y1−Y0|T=1] = E[Y1−Y0|T=0]. Because both groups may indeed be comparable before the treatment, but the effect of the treatment might still be quite different for both groups. So in order to state E[Y1−Y0|T=1] = E[Y1−Y0|T=0], I intuitively feel another condition needs to be fulfilled, which is that the effect of the treatment is the same for both groups. In other words: E[Y1|T=0] = E[Y1|T=1]. But perhaps I am missing something?
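The gap the question points at can be made explicit with standard potential-outcomes algebra (my derivation, not quoted from the book). Decomposing the ATE over treatment groups:

```latex
\begin{align*}
E[Y_1 - Y_0] &= E[Y_1 - Y_0 \mid T=1]\,P(T=1) + E[Y_1 - Y_0 \mid T=0]\,P(T=0)
\end{align*}
```

So ATE = ATET for arbitrary P(T=1) only if E[Y_1 - Y_0 | T=1] = E[Y_1 - Y_0 | T=0], which does require E[Y_1 | T=0] = E[Y_1 | T=1] on top of E[Y_0 | T=0] = E[Y_0 | T=1], exactly as the question suspects.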

Issue in chapter 22

  1. To be clar -> to be clear
  2. problems with positivity are problems of the data itself -> positivity issues/challenges are problems of the data itself

Typo in chapter 03

There is an issue on chapter 03, in the following paragraph

What we are seeing above is exactly what is expected according to the Moivre’s equation. As the number of students grows, the average score becomes more and more precise. Schools with very few samples can have very high and very low scores simply due to chance. This is less likley to occur with large schools. Moivre’s equation talks about a fundamental fact about the reality of information and records in the form of data: it is always imprecise. The question then becomes how imprecise.

It should be:

What we are seeing above is exactly what is expected according to the Moivre’s equation. As the number of students grows, the average score becomes more and more precise. Schools with very few samples can have very high and very low scores simply due to chance. This is less likely to occur with large schools. Moivre’s equation talks about a fundamental fact about the reality of information and records in the form of data: it is always imprecise. The question then becomes how imprecise.

treated/untreated wrong way round?

There is an issue on chapter 7 , in the following paragraph

When we do that, the treated and control are no longer comparable. As we can see, the treated is now only composed of the segment of customers that will spend even without the campaign. Also notice that we can even know the direction of the bias here.

It should be (right?)

When we do that, the treated and control are no longer comparable. As we can see, the untreated is now only composed of the segment of customers that will spend even without the campaign. Also notice that we can even know the direction of the bias here.

Typo in chapter 09 (bis)

There is an issue on chapter XXX, in the following paragraph

This shows that the result with 2SLS is much lower than the one we got with OLS: 3.29 against 27.60. This makes sense, since the causal effect estimated with OLS is positively biased. We also need to remember about LATE. 3.29 is the average causal effect on compilers. Unfortunately, we can’t say anything about those never takers. This means that we are estimating the effect on the richer segment of the population that have newer phones.

It should be:

This shows that the result with 2SLS is much lower than the one we got with OLS: 3.29 against 27.60. This makes sense, since the causal effect estimated with OLS is positively biased. We also need to remember about LATE. 3.29 is the average causal effect on compliers. Unfortunately, we can’t say anything about those never takers. This means that we are estimating the effect on the richer segment of the population that have newer phones.

(Same typo in the Key Ideas paragraphs.)

Couple typos on page /04-Graphical-Causal-Models.html

Hi,

I noticed a couple small typos:

As we will see, this causal graphical models language will help us make our thinking about causality more more clear, as it makes it explicit our beliefs about how the world works.

By the same token, in our exemple,

And in the causal graph it says "Inteligence" just above the following:

Unfortunately, it is not always possible to control for all common causes

Thanks for the great book!

Typo in Chapter 08

There is an issue on chapter 08, in the following paragraph

However, I’m interested in the impact of T on Y, not Z on Y. So, I’ma estimate the easy effect of Z on Y and scale it by the effect of Z on T, to convert the effect to T units instead of Z units”.

It should be

However, I’m interested in the impact of T on Y, not Z on Y. So, I’ll estimate the easy effect of Z on Y and scale it by the effect of Z on T, to convert the effect to T units instead of Z units”.

Issue on page /17-Predictive-Models-101.html

Hi! I love your book, I'm reading it again and I'll probably file a lot of issues.

I think you should use dummy encoding on the region feature instead of transforming it into a region net income proxy. I think this might have crippled the ML model.

If this was shown at least as an alternative, I'd trust the conclusion more.

'To help it, we will take the region feature, which is categorical, and encode it with the lower end of the confidence interval of net income for that region. Remember that we have those stored in the regions_to_net dictionary?'

Typo in Chapter 4, incorrect node order

There is an issue on chapter 4, in the following paragraph

"First, look at this very simple graph. A causes B which causes C. Or X causes Y which causes Z."

g = gr.Digraph()
g.edge("A", "C")
g.edge("C", "B")

It should be

"First, look at this very simple graph. A causes B which causes C. Or X causes Y which causes Z."

g = gr.Digraph()
g.edge("A", "B")
g.edge("B", "C")

Alternatively, the wording of the sentence could be reordered; however, given the A-B-C order the reader expects, the original correction is probably best.

Maybe even A causes B, B causes C would be clear.

Really really enjoying this book, learning a lot in a short amount of time. Keep up the good work!!

Clarification of time effects (Chapter 13)

In Chapter 13, it is stated that "adding a time dummy would control for variables that are fixed across time". However, isn't it the case that time effects are fixed within each time point, but variable across time?

Intuition behind 𝜅

There's this line:

We can show that the impact of T on Z is equal to the impact of Z on Y, scaled by the impact of Z on T.

I'm confused, shouldn't it be Y instead of Z? Such that: "We can show that the impact of T on Y is equal to the impact of Z on Y, scaled by the impact of Z on T."
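The corrected sentence matches the usual Wald/IV logic, which can be sanity-checked on simulated data (all numbers below are made up for illustration): the ratio of the reduced-form effect (Z on Y) to the first stage (Z on T) recovers the effect of T on Y even when an unobserved confounder biases the naive comparison.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
z = rng.integers(0, 2, size=n)                         # randomized instrument
u = rng.normal(size=n)                                 # unobserved confounder
t = (z + u + rng.normal(size=n) > 0.5).astype(float)   # treatment take-up
y = 2.0 * t + u + rng.normal(size=n)                   # true effect of T on Y is 2

effect_z_on_y = y[z == 1].mean() - y[z == 0].mean()    # reduced form
effect_z_on_t = t[z == 1].mean() - t[z == 0].mean()    # first stage
kappa = effect_z_on_y / effect_z_on_t                  # close to 2
naive = y[t == 1].mean() - y[t == 0].mean()            # biased upward by u
```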

Part 1 - Randomised Experiments

Fourth paragraph into "In a school far, far away":

Or, on the flip side, it could be that online classes are cheaper and are composed mostly of less wealthy students, who might have to work besides studying. In this case, these students would do worse than those from the presidential schools even if they took presential classes. If this was the case, we would have bias in the other direction, where the treated are academically worse than the untreated: 𝐸[𝑌0|𝑇=1]<𝐸[𝑌0|𝑇=0].

It should be

presential

Issue on page /02-Randomised-Experiments.html

A bit confused by wording. Seems like students were, after all, randomized.

They’ve randomized not the students, but the classes. Some of them were randomly assigned to have face-to-face lectures, others, to have only online lectures and a third group, to have a blended format of both online and face-to-face lectures.

Typo in chapter 09

There is an issue on chapter XXX, in the following paragraph

This is the population where those that get the instrument turned on have the treatment level higher than if they had the instrument turned off. In other words, this is the compiler population. Just so we can remember,

Compilers means that

It should be:

This is the population where those that get the instrument turned on have the treatment level higher than if they had the instrument turned off. In other words, this is the compiler population. Just so we can remember,

Compliers mean that


Typo in Chapter 16

There is an issue on chapter XXX, in the following paragraph

We can think of fuzzy RD as a sort of non compliance. Passing the threshold should make everyone receive the diploma, but some students, the never takers, don’t get it. Likewise, being below the threshold should prevent you from getting a diploma, but some students, the allways takers, manage to get it anyway.

It should be
We can think of fuzzy RD as a sort of non compliance. Passing the threshold should make everyone receive the diploma, but some students, the never takers, don’t get it. Likewise, being below the threshold should prevent you from getting a diploma, but some students, the always takers, manage to get it anyway.

Issue on page /14-Difference-in-Difference.html

Possible typo in section "DID Estimator"

"What this does is take the treated unit before the treated unit before the intervention and adds a trend component to it"

think the extra "before the treated unit" should be deleted?

"What this does is take the treated unit before the intervention and adds a trend component to it"

Typo in Chapter 05

There is an issue on chapter 05, in the following paragraph

Now, let’s appreciate how cool this is. It means that the coefficient of a multivariate regression is the bivariate coefficient of the same regressor after accounting for the effect of other variables in the model. In causal inference terms, κ is the bivariate coeficiente of T after having used all other variables to predict it.

It should be

Now, let’s appreciate how cool this is. It means that the coefficient of a multivariate regression is the bivariate coefficient of the same regressor after accounting for the effect of other variables in the model. In causal inference terms, κ is the bivariate coefficient of T after having used all other variables to predict it.

Issue on page /12-Doubly-Robust-Estimation.html

In the last paragraph, there is a statement: "Still, doubly robust estimation can combine those two wrong models to make them less wrong."
Can you give more details about this? In practice, both models will be wrong/biased; in such a case, will the DRE be less biased (and with bigger variance)?
I've read in some documents that "if both models are slightly incorrect, the doubly robust estimator can be more biased."
If that's the case, then DRE will have problems in the real world.
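For context, the doubly robust (AIPW) estimator the chapter describes fits in a few lines, and the "one model can be wrong" half of the claim is easy to demonstrate on simulated data. The simulation below is mine, not the book's, and it does not settle the "both slightly wrong" question the comment raises: here the propensity score is exactly right while the outcome model deliberately ignores the confounder.

```python
import numpy as np

def doubly_robust_ate(y, t, ps, mu1, mu0):
    """AIPW estimate of the ATE from propensity scores (ps) and
    outcome-model predictions for each arm (mu1, mu0)."""
    ey1 = np.mean(t * (y - mu1) / ps + mu1)
    ey0 = np.mean((1 - t) * (y - mu0) / (1 - ps) + mu0)
    return ey1 - ey0

rng = np.random.default_rng(2)
n = 100_000
x = rng.normal(size=n)                      # observed confounder
ps = 1 / (1 + np.exp(-x))                   # true propensity score
t = (rng.random(n) < ps).astype(float)
y = 1.0 * t + 2.0 * x + rng.normal(size=n)  # true ATE = 1

# Deliberately wrong outcome model (ignores x), correct propensity score:
ate = doubly_robust_ate(y, t, ps,
                        np.full(n, y[t == 1].mean()),
                        np.full(n, y[t == 0].mean()))
# ate lands near the true value of 1 despite the broken outcome model.
```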

Issue on page /18-When-Prediction-Fails.html

When training the ML model for policy, why don't you include coupon value?

It is a good indicator for net value, and would surely improve model performance.
Then you could just try [model.predict(customer, coupon) for coupon in coupons] and select the best one.

Issue on page /05-The-Unreasonable-Effectiveness-of-Linear-Regression.html

model_2 = smf.ols('lhwage ~ educ +' + '+'.join(controls), data=wage).fit()
model_2.summary().tables[1]

How come the coef of educ from smf.ols, which is 0.0411, is the same as the one from kappa = t_tilde.cov(y) / t_tilde.var()? I think smf.ols('lhwage ~ educ +' + '+'.join(controls), data=wage).fit() is fitting an OLS with lhwage being the endogenous variable.

Would appreciate it if you could address this question. Thanks a lot!
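The equality being asked about is the Frisch-Waugh-Lovell theorem: the multivariate OLS coefficient on educ equals the bivariate coefficient of y on educ after educ has been residualized on the controls. A minimal check on synthetic data (the variables below are made up; this is not the wage dataset):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000
controls = rng.normal(size=(n, 2))
t = controls @ np.array([0.5, -0.3]) + rng.normal(size=n)  # "educ"-like regressor
y = 2.0 * t + controls @ np.array([1.0, -1.0]) + rng.normal(size=n)

# Coefficient on t from the full multivariate regression y ~ 1 + t + controls.
X = np.column_stack([np.ones(n), t, controls])
beta_t = np.linalg.lstsq(X, y, rcond=None)[0][1]

# FWL: residualize t on the controls, then take the bivariate coefficient.
C = np.column_stack([np.ones(n), controls])
t_tilde = t - C @ np.linalg.lstsq(C, t, rcond=None)[0]
kappa = np.cov(t_tilde, y)[0, 1] / t_tilde.var(ddof=1)
# kappa equals beta_t up to floating point.
```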

Typo in Chapter 12

There is an issue on chapter XXX, in the following paragraph

Keys Ideas¶

It should be
Key Ideas¶

Clarifications on Chapter 4

I would appreciate if you could clarify the following:

  1. Is any indirect path between T and Y a backdoor path?

A path is blocked if and only if:
It contains a non collider that has been conditioned on

Could you please clarify why you used "non collider" instead of simply "confounder"? Non colliders are either confounders or mediators. But conditioning on a mediator results in selection bias. So that leaves us with confounders.

Thank you!

Issue on page /01-Introduction-To-Causality.html

Hi Matheus, thank you very much for the hard work you have put into this book! I am just at the beginning and I already learned a lot.

I think there is a typo problem on the math of ATE shown on the figure below, as it should be 125.

image

Issue on page /01-Introduction-To-Causality.html

Below the scatterplot with counterfactuals in light color, the reference to the bias should be the right figure. The text should instead read:

the right plot, we depicted what is the bias that we’ve talked about before. We get the bias if we set everyone to not receive the treatment

Typo in Chapter 21

There is an issue on chapter XXX, in the following paragraph

The idea was to estimate the elasticity ∂y/∂t as the coeficiente of a single variable linear regression of y ~ t. However, this only works if the treatment is randomly assigned. If it isn’t, we get into trouble due to omitted variable bias.

It should be
The idea was to estimate the elasticity ∂y/∂t as the coefficient of a single variable linear regression of y ~ t. However, this only works if the treatment is randomly assigned. If it isn’t, we get into trouble due to omitted variable bias.

Typo in Chapter 18

There is an issue on chapter XXX, in the following paragraph

As a consequence, the perceived effect ends up looking smaller than the actual effect. So there you have if. We used another explanation to get to the exact same conclusion as before: segmenting units by a predictive model hinders our ability to identify the causal effect.

It should be:

As a consequence, the perceived effect ends up looking smaller than the actual effect. So there you have it. We used another explanation to get to the exact same conclusion as before: segmenting units by a predictive model hinders our ability to identify the causal effect.

Issue: Chapter Beyond Confounders

There is an issue on chapter Beyond Confounders, in the following paragraph

image

I think it should be T_i = 0 on the right hand side. But if I incorrectly raise this issue, you may close it immediately.

Grammar correction in Chapter 1 subheading "Key Ideas" ; first paragraph

There is an issue on chapter 1, in the following paragraph:

So far, we’ve seen that association is not causation. Most importantly, we’ve seen precisely why it isn’t and how can we make association be causation.

It should be:
So far, we’ve seen that association is not causation. Most importantly, we’ve seen precisely why it isn’t and how we can make association be causation.

Chapter 19 | Section Predicting Elasticity

When building the m3 model, the presented formula is:

sales_i = β0 + β1price_i + β2X_i∗price_i + β3X_i + e_i

but the fitted model in the stats model takes the following formula:

sales ~ price*cost + price*C(weekday) + price*temp

which is missing the individual terms. If my understanding is right, shouldn't the formula be:

sales ~ price + cost + temp + C(weekday) + price*cost + price*C(weekday) + price*temp

or the derivative:

δSales/δPrice = β2X_i
?

If this is right, I can submit a PR with the change. I just wanted to check if my reasoning was correct before doing it.
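One way to settle this before opening a PR is to inspect which columns patsy actually generates: in patsy formulas, `a*b` is shorthand for `a + b + a:b`, so the main effects may already be included even though they are not written out. A quick check on a tiny made-up dataset (not the book's data):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical toy data, just enough rows to build the design matrix.
df = pd.DataFrame({
    "sales":   [10.0, 12.0, 9.0, 11.0, 13.0, 8.0, 10.5, 12.5, 9.5, 11.5],
    "price":   [1.0, 2.0, 1.5, 2.5, 1.2, 2.2, 1.8, 2.8, 1.4, 2.4],
    "cost":    [0.5, 0.6, 0.4, 0.7, 0.5, 0.6, 0.4, 0.7, 0.5, 0.6],
    "temp":    [20.0, 25.0, 22.0, 24.0, 21.0, 23.0, 26.0, 19.0, 22.5, 24.5],
    "weekday": ["mon", "tue", "mon", "tue", "mon", "tue", "mon", "tue", "mon", "tue"],
})

m3 = smf.ols("sales ~ price*cost + price*C(weekday) + price*temp", data=df).fit()
print(m3.params.index.tolist())
# 'price', 'cost', 'temp' and 'C(weekday)[T.tue]' all appear as standalone terms.
```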

social image

Set the social image in the GitHub settings so that the cool splash image on the readme appears when someone posts it on LinkedIn or wherever.

wrong reference to plot

There is an issue on chapter 06-Grouped-and-Dummy-Regression, in the following paragraph
The parameter estimate is larger. What is happening here is that the regression is placing equal weight for all points. If we plot the model along the grouped points, we see that the non weighted model is giving more importance to small points in the lower right than it should. As a consequence, the line has a higher slope.
It should be
The parameter estimate is larger. What is happening here is that the regression is placing equal weight for all points. If we plot the model along the grouped points, we see that the non weighted model is giving more importance to small points in the lower left than it should. As a consequence, the line has a higher slope.
