You're reading from Causal Inference and Discovery in Python Unlock the secrets of modern causal machine learning with DoWhy, EconML, PyTorch and more

Product type Paperback

Published in May 2023

Publisher Packt

ISBN-13 9781804612989

Length 456 pages

Edition 1st Edition

Languages

Python

Tools

PyTorch

Concepts

Data Science

Author (1):

Aleksander Molak

View More author details

Table of Contents (21) Chapters

Preface

1. Part 1: Causality – an Introduction

2. Chapter 1: Causality – Hey, We Have Machine Learning, So Why Even Bother? FREE CHAPTER

3. Chapter 2: Judea Pearl and the Ladder of Causation

4. Chapter 3: Regression, Observations, and Interventions

5. Chapter 4: Graphical Models

6. Chapter 5: Forks, Chains, and Immoralities

7. Part 2: Causal Inference

8. Chapter 6: Nodes, Edges, and Statistical (In)dependence

9. Chapter 7: The Four-Step Process of Causal Inference

10. Chapter 8: Causal Models – Assumptions and Challenges

11. Chapter 9: Causal Inference and Machine Learning – from Matching to Meta-Learners

12. Chapter 10: Causal Inference and Machine Learning – Advanced Estimators, Experiments, Evaluations, and More

13. Chapter 11: Causal Inference and Machine Learning – Deep Learning, NLP, and Beyond

14. Part 3: Causal Discovery

15. Chapter 12: Can I Have a Causal Graph, Please?

16. Chapter 13: Causal Discovery and Machine Learning – from Assumptions to Applications

17. Chapter 14: Causal Discovery and Machine Learning – Advanced Deep Learning and Beyond

18. Chapter 15: Epilogue

19. Index

20. Other Books You May Enjoy

How not to lose money… and human lives

We learned that randomized experiments can help us avoid confounding. Unfortunately, they are not always available. Sometimes, experiments can be too costly to perform, unethical, or virtually impossible (for example, running an experiment where the treatment is a migration of a large group of some population). In this section, we’ll look at a couple of scenarios where we’re limited to observational data but we still want to draw causal conclusions. These examples will provide us with a solid foundation for the next chapters.

A marketer’s dilemma

Imagine you are a tech-savvy marketer and you want to effectively allocate your direct marketing budget. How would you approach this task? When allocating the budget for a direct marketing campaign, we’d like to understand what return we can expect if we spend a certain amount of money on a given person. In other words, we’re interested in estimating the effect of our actions on some customer outcomes (Gutierrez, Gérardy, 2017). Perhaps we could use supervised learning to solve this problem? To answer this question, let’s take a closer look at what exactly we want to predict.

We’re interested in understanding how a given person would react to our content. Let’s encode it in a formula:

τ i = Y i(1) − Y i(0)

In the preceding formula, the following applies:

τ i is the treatment effect for person i
Y i(1) is the outcome for person i when they received the treatment T (in our example, they received marketing content from us)
Y i(0) is the outcome for the same person i given they did not receive the treatment T

What the formula says is that we want to take the person i’s outcome Y i when this person does not receive treatment T and subtract it from the same person’s outcome when they receive treatment T.

An interesting thing here is that to solve this equation, we need to know what person i’s response is under treatment and under no treatment. In reality, we can never observe the same person under two mutually exclusive conditions at the same time. To solve the equation in the preceding formula, we need counterfactuals.

Counterfactuals are estimates of how the world would look if we changed the value of one or more variables, holding everything else constant. Because counterfactuals cannot be observed, the true causal effect τ is unknown. This is one of the reasons why classic machine learning cannot solve this problem for us. A family of causal techniques usually applied to problems like this is called uplift modeling, and we’ll learn more about it in Chapter 9 and 10.

Let’s play doctor!

Let’s take another example. Imagine you’re a doctor. One of your patients, Jennifer, has a rare disease, D. Additionally, she was diagnosed with a high risk of developing a blood clot. You study the information on the two most popular drugs for D. Both drugs have virtually identical effectiveness on D, but you’re not sure which drug will be safer for Jennifer, given her diagnosis. You look into the research data presented in Table 1.1:

Drug	A		B
Blood clot	Yes	No	Yes	No
Total	27	95	23	99
Percentage	22%	78%	19%	81%

Table 1.1 – Data for drug A and drug B

The numbers in Table 1.1 represent the number of patients diagnosed with disease D who were administered drug A or drug B. Row 2 (Blood clot) gives us information on whether a blood clot was found in patients or not. Note that the percentage scores are rounded. Based on this data, which drug would you choose? The answer seems pretty obvious. 81% of patients who received drug B did not develop blood clots. The same was true for only 78% of patients who received drug A. The risk of developing a blood clot is around 3% lower for patients receiving drug B compared to patients receiving drug A.

This looks like a fair answer, but you feel skeptical. You know that blood clots can be very risky and you want to dig deeper. You find more fine-grained data that takes the patient’s gender into account. Let’s look at Table 1.2:

Drug	A		B
Blood clot	Yes	No	Yes	No
Female	24	56	17	25
Male	3	39	6	74
Total	27	95	23	99
Percentage	22%	78%	18%	82%
Percentage (F)	30%	70%	40%	60%
Percentage (M)	7%	93%	7.5%	92.5%

Table 1.2 – Data for drug A and drug B with gender-specific results added. F = female, M = male. Color-coding added for ease of interpretation, with better results marked in green and worse results marked in orange.

Something strange has happened here. We have the same numbers as before and drug B is still preferable for all patients, but it seems that drug A works better for females and for males! Have we just found a medical Schrödinger’s cat (https://en.wikipedia.org/wiki/Schr%C3%B6dinger%27s_cat) that flips the effect of a drug when a patient’s gender is observed?

If you think that we might have messed up the numbers – don’t believe me, just check the data for yourself. The data can be found in data/ch_01_drug_data.csv (https://github.com/PacktPublishing/Causal-Inference-and-Discovery-in-Python/blob/main/data/ch_01_drug_data.csv).

What we’ve just experienced is called Simpson’s paradox (also known as the Yule-Simpson effect). Simpson’s paradox appears when data partitioning (which we can achieve by controlling for the additional variable(s) in the regression setting) significantly changes the outcome of the analysis. In the real world, there are usually many ways to partition your data. You might ask: okay, so how do I know which partitioning is the correct one?

We could try to answer this question from a pure machine learning point of view: perform cross-validated feature selection and pick the variables that contribute significantly to the outcome. This solution is good enough in some settings. For instance, it will work well when we only care about making predictions (rather than decisions) and we know that our production data will be independent and identically distributed; in other words, our production data needs to have a distribution that is virtually identical (or at least similar enough) to our training and validation data. If we want more than this, we’ll need some sort of a (causal) world model.

Associations in the wild

Some people tend to think that purely associational relationships happen rarely in the real world or tend to be weak, so they cannot bias our results too much. To see how surprisingly strong and consistent spurious relationships can be in the real world, visit Tyler Vigen’s page: https://www.tylervigen.com/spurious-correlations. Notice that relationships between many variables are sometimes very strong and they last for long periods of time! I personally like the one with space launches and sociology doctorates and I often use it in my lectures and presentations. Which one is your favorite? Share and tag me on LinkedIn, Twitter (See the Let’s stay in touch section in Chapter 15 to connect!) so we can have a discussion!