Using mathematical transformations
We sometimes want to use features that do not have a Gaussian distribution with a machine learning algorithm that assumes our features are distributed in that way. When that happens, we either need to change our minds about which algorithm to use (choose KNN or random forest rather than linear regression, for example) or transform our features so that they approximate a Gaussian distribution. We go over a couple of strategies for doing the latter in this recipe.
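To make the idea of pulling a feature toward a Gaussian shape concrete, here is a minimal sketch. It assumes a log transformation and a Box-Cox transformation as the two candidate strategies, and it uses a synthetic right-skewed feature rather than the COVID-19 data; the variable names are illustrative only.

```python
import numpy as np
from scipy import stats

# Synthetic, right-skewed stand-in feature (log-normal). This is an
# assumption for illustration, not the COVID-19 data used in the recipe.
rng = np.random.default_rng(0)
feature = rng.lognormal(mean=3.0, sigma=1.0, size=1000)

# Strategy 1 (assumed): natural log transform. Requires strictly positive values.
log_feature = np.log(feature)

# Strategy 2 (assumed): Box-Cox transform. scipy estimates the lambda
# parameter by maximum likelihood. It also requires strictly positive values.
boxcox_feature, fitted_lambda = stats.boxcox(feature)

# Skewness near 0 suggests a roughly symmetric distribution.
skew_raw = stats.skew(feature)
skew_log = stats.skew(log_feature)
skew_boxcox = stats.skew(boxcox_feature)

print(f"raw skew:     {skew_raw:.2f}")
print(f"log skew:     {skew_log:.2f}")
print(f"Box-Cox skew: {skew_boxcox:.2f} (lambda={fitted_lambda:.2f})")
```

Comparing skewness before and after is only a rough check; a histogram or a Q-Q plot of the transformed values gives a fuller picture.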
Getting ready
We will use the transformation module from feature_engine in this recipe. We continue to work with the COVID-19 data, which has one row per country with total cases and deaths and some demographic data.
How to do it...
- We start by importing the transformation module from feature_engine, train_test_split from sklearn, and stats from scipy. We also create a training and testing DataFrame with the COVID-19 data:
  import pandas as pd
  from feature_engine...
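The code for this step is cut off in the text above, so the following is only a sketch of how a training and testing split might look. The DataFrame, its column names (total_cases, population_density, total_deaths), the choice of target, and the 25% test size are all hypothetical stand-ins and are not taken from the recipe's actual data.

```python
import numpy as np
import pandas as pd
from scipy import stats
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the per-country COVID-19 data: one row per
# country, with made-up column names and synthetic skewed values.
rng = np.random.default_rng(42)
covid = pd.DataFrame({
    "total_cases": rng.lognormal(mean=10, sigma=2, size=200),
    "population_density": rng.lognormal(mean=4, sigma=1, size=200),
    "total_deaths": rng.lognormal(mean=6, sigma=2, size=200),
})

feature_cols = ["total_cases", "population_density"]  # assumed features
target_col = ["total_deaths"]                         # assumed target

# Hold out 25% of the rows for testing (assumed proportion).
X_train, X_test, y_train, y_test = train_test_split(
    covid[feature_cols], covid[target_col],
    test_size=0.25, random_state=0,
)

# Inspect skewness on the training data only, so that decisions about
# transformations are not informed by the testing data.
train_skew = stats.skew(X_train["total_cases"])
print(X_train.shape, X_test.shape)
print(f"training skew of total_cases: {train_skew:.2f}")
```

Splitting before choosing or fitting any transformation keeps information from the testing data out of the preprocessing steps.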