Using mathematical transformations
Sometimes, we want to use features that do not have a Gaussian distribution with a machine learning algorithm that assumes our features are distributed in that way. When that happens, we either need to change our minds about which algorithm to use (for example, we could choose KNN rather than linear regression) or transform our features so that they approximate a Gaussian distribution. In this section, we will go over a couple of strategies for doing the latter:
- We start by importing the transformation module from
feature_engine
,train_test_split
fromsklearn
, andstats
fromscipy
. Additionally, we create training and testing DataFrames with the COVID-19 data:import pandas as pd from feature_engine import transformation as vt from sklearn.model_selection import train_test_split import matplotlib.pyplot as plt from scipy import stats covidtotals = pd.read_csv("data/covidtotals.csv") feature_cols = ['location','population...