Step 4 – where’s my validation set? Refutation tests
In this section, we’ll discuss how causal models can be validated and introduce the idea behind refutation tests. We’ll then implement a couple of refutation tests in practice.
How to validate causal models
One of the most popular ways to validate machine learning models is through cross-validation (CV). The basic idea behind CV is relatively simple:
- We split the data into $k$ folds (subsets).
- We train the model on $k-1$ folds and validate it on the remaining fold.
- We repeat this process $k$ times.
- At every step, we train on a different set of $k-1$ folds and evaluate on the remaining fold (which is also different at each step).
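To make the procedure above concrete, here is a minimal sketch in plain Python. It is an illustration rather than a production implementation: the helper names `k_fold_indices` and `cross_validate` are hypothetical, the number of folds is called `k`, and the "model" is assumed to be a trivial predictor that always returns the mean of the training targets.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the indices 0..n_samples-1 and split them into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, starting at position i
    return [indices[i::k] for i in range(k)]


def cross_validate(ys, k=5, seed=0):
    """Return one validation MSE per fold for a mean-only toy model."""
    folds = k_fold_indices(len(ys), k, seed)
    scores = []
    for i, val_idx in enumerate(folds):
        # Train on all folds except fold i
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        # "Training" the toy model (an assumption): store the mean target
        mean_y = sum(ys[j] for j in train_idx) / len(train_idx)
        # Validate on the held-out fold
        mse = sum((ys[j] - mean_y) ** 2 for j in val_idx) / len(val_idx)
        scores.append(mse)
    return scores


# Synthetic targets, used only to exercise the functions
ys = [float(i % 7) for i in range(50)]
scores = cross_validate(ys, k=5)
print(len(scores))  # one score per fold
```

Each observation lands in exactly one validation fold, so every data point is used for validation once and for training in the remaining rounds. In practice, a library routine such as scikit-learn's `KFold` would typically replace the hand-written splitter.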
Figure 7.3 presents a schematic visualization of a five-fold CV scheme:
![Figure 7.3 – Schematic of five-fold CV](https://static.packt-cdn.com/products/9781804612989/graphics/image/Figure_7.3_B18993.jpg)
Figure 7.3 – Schematic of five-fold CV
In Figure 7.3, the blue folds denote validation sets, while the white ones denote training sets...