Step 4 – where’s my validation set? Refutation tests
In this section, we'll discuss how causal models can be validated. We'll introduce the idea behind refutation tests and then implement a couple of them in practice.
How to validate causal models
One of the most popular ways to validate machine learning models is through cross-validation (CV). The basic idea behind CV is relatively simple:
- We split the data into k folds (subsets).
- We train the model on k - 1 folds and validate it on the remaining fold.
- We repeat this process k times.
- At every step, we train on a different set of k - 1 folds and evaluate on the remaining fold, so each fold serves as the validation set exactly once.
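The steps above can be sketched in a few lines of plain Python. This is only an illustration of the splitting logic, not a full training loop: the helper name `k_fold_indices` and its `seed` parameter are assumptions made for this example, and in practice a library utility such as scikit-learn's `KFold` would typically be used instead.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs for a k-fold CV scheme.

    Hypothetical helper written for illustration only.
    """
    indices = list(range(n_samples))
    # Shuffle once so that folds are not tied to the original row order
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, starting at position i
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val_idx = folds[i]
        # Training indices are all folds except the i-th one
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, val_idx


# Five-fold CV on a toy dataset of 10 samples
splits = list(k_fold_indices(n_samples=10, k=5))
for step, (train_idx, val_idx) in enumerate(splits, start=1):
    print(f"step {step}: train on {len(train_idx)} samples, "
          f"validate on samples {sorted(val_idx)}")
```

Each sample lands in the validation set exactly once, and at every step the training and validation indices are disjoint. A model would be fitted on `train_idx` and scored on `val_idx` at each step, with the k scores usually averaged at the end.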
Figure 7.3 presents a schematic visualization of a five-fold CV scheme:
Figure 7.3 – Schematic of five-fold CV
In Figure 7.3, the blue folds denote validation sets, while the white ones denote training sets...