What do you get with eBook?

Instant access to your Digital eBook purchase

Download this book in EPUB and PDF formats

Access this title in our online reader with advanced features

DRM FREE - Read whenever, wherever and however you want

AI Assistant (beta) to help accelerate your learning

Causal Inference and Discovery in Python

Causality – Hey, We Have Machine Learning, So Why Even Bother?

Our journey starts here.

In this chapter, we’ll ask a couple of questions about causality.

What is it? Is causal inference different from statistical inference? If so – how?

Do we need causality at all if machine learning seems good enough?

If you have been following the fast-changing machine learning landscape over the last 5 to 10 years, you have likely noticed many examples of – as we like to call it in the machine learning community – the unreasonable effectiveness of modern machine learning algorithms in computer vision, natural language processing, and other areas.

Algorithms such as DALL-E 2 or GPT-3/4 made it not only to the consciousness of the research community but also the general public.

You might ask yourself – if all this stuff works so well, why would we bother and look into something else?

We’ll start this chapter with a brief discussion of the history of causality. Next, we’ll consider a couple of motivations for using a causal rather than purely statistical approach to modeling and we’ll introduce the concept of confounding.

Finally, we’ll see examples of how a causal approach can help us solve challenges in marketing and medicine. By the end of this chapter, you will have a good idea of why and when causal inference can be useful. You’ll be able to explain what confounding is and why it’s important.

In this chapter, we will cover the following:

A brief history of causality
Motivations to use a causal approach to modeling
How not to lose money… and human lives

A brief history of causality

Causality has a long history and has been addressed by most, if not all, advanced cultures that we know about. Aristotle – one of the most prolific philosophers of ancient Greece – claimed that understanding the causal structure of a process is a necessary ingredient of knowledge about this process. Moreover, he argued that being able to answer why-type questions is the essence of scientific explanation (Falcon, 2006; 2022). Aristotle distinguishes four types of causes (material, formal, efficient, and final), an idea that might capture some interesting aspects of reality as much as it might sound counterintuitive to a contemporary reader.

David Hume, a famous 18th-century Scottish philosopher, proposed a more unified framework for cause-effect relationships. Hume starts with an observation that we never observe cause-effect relationships in the world. The only thing we experience is that some events are conjoined:

“We only find, that the one does actually, in fact, follow the other. The impulse of one billiard-ball is attended with motion in the second. This is the whole that appears to the outward senses. The mind feels no sentiment or inward impression from this succession of objects: consequently, there is not, in any single, particular instance of cause and effect, any thing which can suggest the idea of power or necessary connexion” (original spelling; Hume & Millican, 2007; originally published in 1739).

One interpretation of Hume’s theory of causality (here simplified for clarity) is the following:

We only observe how the movement or appearance of object A precedes the movement or appearance of object B
If we experience such a succession a sufficient number of times, we’ll develop a feeling of expectation
This feeling of expectation is the essence of our concept of causality (it’s not about the world; it’s about a feeling we develop)

Hume’s theory of causality

The interpretation of Hume’s theory of causality that we give here is not the only one. First, Hume presented another definition of causality in his later work An Enquiry Concerning the Human Understanding (1758). Second, not all scholars would necessarily agree with our interpretation (for example, Archie (2005)).

This theory is very interesting from at least two points of view.

First, elements of this theory have a high resemblance to a very powerful idea in psychology called conditioning. Conditioning is a form of learning. There are multiple types of conditioning, but they all rely on a common foundation – namely, association (hence the name for this type of learning – associative learning). In any type of conditioning, we take some event or object (usually called stimulus) and associate it with some behavior or reaction. Associative learning works across species. You can find it in humans, apes, dogs, and cats, but also in much simpler organisms such as snails (Alexander, Audesirk & Audesirk, 1985).

Conditioning

If you want to learn more about different types of conditioning, check this https://bit.ly/MoreOnConditioning or search for phrases such as classical conditioning versus operant conditioning and names such as Ivan Pavlov and Burrhus Skinner, respectively.

Second, most classic machine learning algorithms also work on the basis of association. When we’re training a neural network in a supervised fashion, we’re trying to find a function that maps input to the output. To do it efficiently, we need to figure out which elements of the input are useful for predicting the output. And, in most cases, association is just good enough for this purpose.

Why causality? Ask babies!

Is there anything missing from Hume’s theory of causation? Although many other philosophers tried to answer this question, we’ll focus on one particularly interesting answer that comes from… human babies.

Interacting with the world

Alison Gopnik is an American child psychologist who studies how babies develop their world models. She also works with computer scientists, helping them understand how human babies build common-sense concepts about the external world. Children – to an even greater extent than adults – make use of associative learning, but they are also insatiable experimenters.

Have you ever seen a parent trying to convince their child to stop throwing around a toy? Some parents tend to interpret this type of behavior as rude, destructive, or aggressive, but babies often have a different set of motivations. They are running systematic experiments that allow them to learn the laws of physics and the rules of social interactions (Gopnik, 2009). Infants as young as 11 months prefer to perform experiments with objects that display unpredictable properties (for example, can pass through a wall) than with objects that behave predictably (Stahl & Feigenson, 2015). This preference allows them to efficiently build models of the world.

What we can learn from babies is that we’re not limited to observing the world, as Hume suggested. We can also interact with it. In the context of causal inference, these interactions are called interventions, and we’ll learn more about them in Chapter 2. Interventions are at the core of what many consider the Holy Grail of the scientific method: randomized controlled trial, or RCT for short.

Confounding – relationships that are not real

The fact that we can run experiments enhances our palette of possibilities beyond what Hume thought about. This is very powerful! Although experiments cannot solve all of the philosophical problems related to gaining new knowledge, they can solve some of them. A very important aspect of a properly designed randomized experiment is that it allows us to avoid confounding. Why is it important?

A confounding variable influences two or more other variables and produces a spurious association between them. From a purely statistical point of view, such associations are indistinguishable from the ones produced by a causal mechanism. Why is that problematic? Let’s see an example.

Imagine you work at a research institute and you’re trying to understand the causes of people drowning. Your organization provides you with a huge database of socioeconomic variables. You decide to run a regression model over a large set of these variables to predict the number of drownings per day in your area of interest. When you check the results, it turns out that the biggest coefficient you obtained is for daily regional ice cream sales. Interesting! Ice cream usually contains large amounts of sugar, so maybe sugar affects people’s attention or physical performance while they are in the water.

This hypothesis might make sense, but before we move forward, let’s ask some questions. How about other variables that we did not include in the model? Did we add enough predictors to the model to describe all relevant aspects of the problem? What if we added too many of them? Could adding just one variable to the model completely change the outcome?

Adding too many predictors

Adding too many predictors to the model might be harmful from both statistical and causal points of view. We will learn more on this topic in Chapter 3.

It turns out that this is possible.

Let me introduce you to daily average temperature – our confounder. Higher daily temperature makes people more likely to buy ice cream and more likely to go swimming. When there are more people swimming, there are also more accidents. Let’s try to visualize this relationship:

Figure 1.1 – Graphical representation of models with two (a) and three variables (b). Dashed lines represent the association, solid lines represent causation. ICE = ice cream sales, ACC = the number of accidents, and TMP = temperature.

In Figure 1.1, we can see that adding the average daily temperature to the model removes the relationship between regional ice cream sales and daily drownings. Depending on your background, this might or might not be surprising to you. We’ll learn more about the mechanism behind this effect in Chapter 3.

Before we move further, we need to state one important thing explicitly: confounding is a strictly causal concept. What does it mean? It means that we’re not able to say much about confounding using purely statistical language (note that this means that Hume’s definition as we presented it here cannot capture it). To see this clearly, let’s look at Figure 1.2:

Figure 1.2 – Pairwise scatterplots of relations between a, b, and c. The code to recreate the preceding plot can be found in the Chapter_01.ipynb notebook (https://github.com/PacktPublishing/Causal-Inference-and-Discovery-in-Python/blob/main/Chapter_01.ipynb).

In Figure 1.2, blue points signify a causal relationship while red points signify a spurious relationship, and variables a, b, and c are related in the following way:

b causes a and c
a and c are causally independent

Figure 1.3 presents a graphical representation of these relationships:

Figure 1.3 – Relationships between a, b, and c

The black dashed line with the red cross denotes that there is no causal relationship between a and c in any direction.

Hey, but in Figure 1.2 we see some relationship! Let’s unpack it!

In Figure 1.2, non-spurious (blue) and spurious (red) relationships look pretty similar to each other and their correlation coefficients will be similarly large. In practice, most of the time, they just cannot be distinguished based on solely statistical criteria and we need causal knowledge to distinguish between them.

Asymmetries and causal discovery

If fact, in some cases, we can use noise distribution or functional asymmetries to find out which direction is causal. This information can be leveraged to recover causal structure from observational data, but it also requires some assumptions about the data-generating process. We’ll learn more about this in Part 3, Causal Discovery (Chapter 13).

Okay, we said that there are some spurious relationships in our data; we added another variable to the model and it changed the model’s outcome. That said, I was still able to make useful predictions without this variable. If that’s true, why would I care whether the relationship is spurious or non-spurious? Why would I care whether the relationship is causal or not?

How not to lose money… and human lives

We learned that randomized experiments can help us avoid confounding. Unfortunately, they are not always available. Sometimes, experiments can be too costly to perform, unethical, or virtually impossible (for example, running an experiment where the treatment is a migration of a large group of some population). In this section, we’ll look at a couple of scenarios where we’re limited to observational data but we still want to draw causal conclusions. These examples will provide us with a solid foundation for the next chapters.

A marketer’s dilemma

Imagine you are a tech-savvy marketer and you want to effectively allocate your direct marketing budget. How would you approach this task? When allocating the budget for a direct marketing campaign, we’d like to understand what return we can expect if we spend a certain amount of money on a given person. In other words, we’re interested in estimating the effect of our actions on some customer outcomes (Gutierrez, Gérardy, 2017). Perhaps we could use supervised learning to solve this problem? To answer this question, let’s take a closer look at what exactly we want to predict.

We’re interested in understanding how a given person would react to our content. Let’s encode it in a formula:

τ i = Y i(1) − Y i(0)

In the preceding formula, the following applies:

τ i is the treatment effect for person i
Y i(1) is the outcome for person i when they received the treatment T (in our example, they received marketing content from us)
Y i(0) is the outcome for the same person i given they did not receive the treatment T

What the formula says is that we want to take the person i’s outcome Y i when this person does not receive treatment T and subtract it from the same person’s outcome when they receive treatment T.

An interesting thing here is that to solve this equation, we need to know what person i’s response is under treatment and under no treatment. In reality, we can never observe the same person under two mutually exclusive conditions at the same time. To solve the equation in the preceding formula, we need counterfactuals.

Counterfactuals are estimates of how the world would look if we changed the value of one or more variables, holding everything else constant. Because counterfactuals cannot be observed, the true causal effect τ is unknown. This is one of the reasons why classic machine learning cannot solve this problem for us. A family of causal techniques usually applied to problems like this is called uplift modeling, and we’ll learn more about it in Chapter 9 and 10.

Let’s play doctor!

Let’s take another example. Imagine you’re a doctor. One of your patients, Jennifer, has a rare disease, D. Additionally, she was diagnosed with a high risk of developing a blood clot. You study the information on the two most popular drugs for D. Both drugs have virtually identical effectiveness on D, but you’re not sure which drug will be safer for Jennifer, given her diagnosis. You look into the research data presented in Table 1.1:

Drug	A		B
Blood clot	Yes	No	Yes	No
Total	27	95	23	99
Percentage	22%	78%	19%	81%

Table 1.1 – Data for drug A and drug B

The numbers in Table 1.1 represent the number of patients diagnosed with disease D who were administered drug A or drug B. Row 2 (Blood clot) gives us information on whether a blood clot was found in patients or not. Note that the percentage scores are rounded. Based on this data, which drug would you choose? The answer seems pretty obvious. 81% of patients who received drug B did not develop blood clots. The same was true for only 78% of patients who received drug A. The risk of developing a blood clot is around 3% lower for patients receiving drug B compared to patients receiving drug A.

This looks like a fair answer, but you feel skeptical. You know that blood clots can be very risky and you want to dig deeper. You find more fine-grained data that takes the patient’s gender into account. Let’s look at Table 1.2:

Drug	A		B
Blood clot	Yes	No	Yes	No
Female	24	56	17	25
Male	3	39	6	74
Total	27	95	23	99
Percentage	22%	78%	18%	82%
Percentage (F)	30%	70%	40%	60%
Percentage (M)	7%	93%	7.5%	92.5%

Table 1.2 – Data for drug A and drug B with gender-specific results added. F = female, M = male. Color-coding added for ease of interpretation, with better results marked in green and worse results marked in orange.

Something strange has happened here. We have the same numbers as before and drug B is still preferable for all patients, but it seems that drug A works better for females and for males! Have we just found a medical Schrödinger’s cat (https://en.wikipedia.org/wiki/Schr%C3%B6dinger%27s_cat) that flips the effect of a drug when a patient’s gender is observed?

If you think that we might have messed up the numbers – don’t believe me, just check the data for yourself. The data can be found in data/ch_01_drug_data.csv (https://github.com/PacktPublishing/Causal-Inference-and-Discovery-in-Python/blob/main/data/ch_01_drug_data.csv).

What we’ve just experienced is called Simpson’s paradox (also known as the Yule-Simpson effect). Simpson’s paradox appears when data partitioning (which we can achieve by controlling for the additional variable(s) in the regression setting) significantly changes the outcome of the analysis. In the real world, there are usually many ways to partition your data. You might ask: okay, so how do I know which partitioning is the correct one?

We could try to answer this question from a pure machine learning point of view: perform cross-validated feature selection and pick the variables that contribute significantly to the outcome. This solution is good enough in some settings. For instance, it will work well when we only care about making predictions (rather than decisions) and we know that our production data will be independent and identically distributed; in other words, our production data needs to have a distribution that is virtually identical (or at least similar enough) to our training and validation data. If we want more than this, we’ll need some sort of a (causal) world model.

Associations in the wild

Some people tend to think that purely associational relationships happen rarely in the real world or tend to be weak, so they cannot bias our results too much. To see how surprisingly strong and consistent spurious relationships can be in the real world, visit Tyler Vigen’s page: https://www.tylervigen.com/spurious-correlations. Notice that relationships between many variables are sometimes very strong and they last for long periods of time! I personally like the one with space launches and sociology doctorates and I often use it in my lectures and presentations. Which one is your favorite? Share and tag me on LinkedIn, Twitter (See the Let’s stay in touch section in Chapter 15 to connect!) so we can have a discussion!

Key benefits

Examine Pearlian causal concepts such as structural causal models, interventions, counterfactuals, and more

Discover modern causal inference techniques for average and heterogenous treatment effect estimation

Explore and leverage traditional and modern causal discovery methods

Description

Causal methods present unique challenges compared to traditional machine learning and statistics. Learning causality can be challenging, but it offers distinct advantages that elude a purely statistical mindset. Causal Inference and Discovery in Python helps you unlock the potential of causality. You’ll start with basic motivations behind causal thinking and a comprehensive introduction to Pearlian causal concepts, such as structural causal models, interventions, counterfactuals, and more. Each concept is accompanied by a theoretical explanation and a set of practical exercises with Python code. Next, you’ll dive into the world of causal effect estimation, consistently progressing towards modern machine learning methods. Step-by-step, you’ll discover Python causal ecosystem and harness the power of cutting-edge algorithms. You’ll further explore the mechanics of how “causes leave traces” and compare the main families of causal discovery algorithms. The final chapter gives you a broad outlook into the future of causal AI where we examine challenges and opportunities and provide you with a comprehensive list of resources to learn more. By the end of this book, you will be able to build your own models for causal inference and discovery using statistical and machine learning techniques as well as perform basic project assessment.

Who is this book for?

This book is for machine learning engineers, researchers, and data scientists looking to extend their toolkit and explore causal machine learning. It will also help people who’ve worked with causality using other programming languages and now want to switch to Python, those who worked with traditional causal inference and want to learn about causal machine learning, and tech-savvy entrepreneurs who want to go beyond the limitations of traditional ML. You are expected to have basic knowledge of Python and Python scientific libraries along with knowledge of basic probability and statistics.

What you will learn

Master the fundamental concepts of causal inference

Decipher the mysteries of structural causal models

Unleash the power of the 4-step causal inference process in Python

Explore advanced uplift modeling techniques

Unlock the secrets of modern causal discovery using Python

Use causal inference for social impact and community benefit

What do you get with eBook?

Instant access to your Digital eBook purchase

Download this book in EPUB and PDF formats

Access this title in our online reader with advanced features

DRM FREE - Read whenever, wherever and however you want

AI Assistant (beta) to help accelerate your learning

Frequently bought together

Machine Learning with PyTorch and Scikit-Learn

$54.99

Causal Inference and Discovery in Python

$27.98 ~~$39.99~~

50 Algorithms Every Programmer Should Know

$39.98 ~~$49.99~~

Total $ 122.95 144.97 22.02 saved

Filter reviews by

All

Packt verified reviews

Feefo verified reviews

Amazon verified reviews

Christopher Payne Oct 19, 2024

The book arrived earlier than indicated. That was good. The book is written for someone that has a basic understanding of causality without going into the mathematical aspects of causality. The book needs more information on getting started with Python for non- python users.

Amazon Verified review

S. Shafei Aug 01, 2024

The book in my opinion is more like authors personal notes.Reading 7+ chapters so far and not a single in depth conversation around the topics.The author mainly relies on mentioning rules and assumptions and very briefly discussing them.There are many unnecessary paragraphs of at the beginning and at the end of sessions about what we want to do and what we did. Many of these paragraphs seems to be used to increase the number of pages without making a meaningful contribution to the learning process.Also, following its math is not fun on Kindle for iPad but I do not think it is about the book and more about Kindle.Overall, the books is more like personal notes than a book that aims at teaching you causal inference and how to code it in Python.

Rsankowski Jul 05, 2024

Beforre I came across this book I thought that the only was to causal inference was to conduct expensive experimental settings. This well written and conceptualized text gives a step-by-step introduction and code examples to build an analytical pipeline for causal inference. It covers a wide range machine and deep leagning approaches. En passant it gives an intuitive inteoduction into network anaysis. Definitely useful to all data scientists!

Eugene Slutsky May 17, 2024

On kindle for mac formulas are not rendering correctly i.e. no subscripts. Can you please fix it?

Rajith K Ravindren May 06, 2024

The paperback version has very small font size.It is very difficult to read.

Causal Inference and Discovery in Python: Unlock the secrets of modern causal machine learning with DoWhy, EconML, PyTorch and more

What do you get with eBook?

Causal Inference and Discovery in Python

Causality – Hey, We Have Machine Learning, So Why Even Bother?

A brief history of causality

Why causality? Ask babies!

Interacting with the world

Confounding – relationships that are not real

How not to lose money… and human lives

A marketer’s dilemma

Let’s play doctor!

Associations in the wild

Wrapping it up

References

Join our book's Discord space

Page 1 of 7

Key benefits

Description

Who is this book for?

What you will learn

Product Details

What do you get with eBook?

Product Details

Frequently bought together

Table of Contents

Recommendations for you

Customer reviews

Filter reviews by

People who bought this also bought

About the author

FAQs

Causal Inference and Discovery in Python: Unlock the secrets of modern causal machine learning with DoWhy, EconML, PyTorch and more

What do you get with eBook?

Key benefits

Description

Who is this book for?

What you will learn

Product Details

What do you get with eBook?

Product Details

Packt Subscriptions

Frequently bought together

Table of Contents

Recommendations for you

Customer reviews

Filter reviews by

People who bought this also bought

About the author

FAQs