The approach
You have been provided a dataset with thousands of vehicle models. It includes general, engine, pollution, drivetrain, chassis, and technical details for each model. To find the predictors for MPG, you could leverage tried and proven statistical methods such as hypothesis testing, correlation analysis, and intrinsically interpretable models such as GLMs to gain a solid data understanding. However, you would have to make sure you are using the right statistical methods on a case-by-case basis and check that your data meets their underlying assumptions. And even after all of that, your intrinsic models will lack sufficient predictive accuracy to underpin any findings. Many practitioners trust this classical approach. However, this book favors the view that black-box models can extract more knowledge from data and more reliably and efficiently than with the classical approach. Interpretable machine learning provides the toolset to do so.
To that end, let's take a...