Feature scaling
Often, the features we want to use in our model are on very different scales. Put simply, the distance between the minimum and maximum values, or the range, varies substantially across possible features. For example, in the COVID-19 data, the total cases feature goes from 1 to almost 34 million, while aged 65 or older goes from 9 to 27 (the number represents the percentage of the population).
Having features on very different scales impacts many machine learning algorithms. For example, KNN models often use Euclidean distance, and features with greater ranges will have a greater influence on the model. Scaling can address this problem.
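To make the effect on Euclidean distance concrete, here is a minimal sketch using NumPy. The three rows are made-up values chosen to fall inside the ranges mentioned above; they are not taken from the COVID-19 data, and the names `a`, `b`, `c`, and `euclidean` are illustrative assumptions.

```python
import numpy as np

# Hypothetical observations: [total_cases, aged_65_older].
# These values are invented for illustration only.
a = np.array([1_000_000.0, 10.0])
b = np.array([1_050_000.0, 26.0])  # similar case count, very different age share
c = np.array([2_000_000.0, 10.0])  # identical age share, twice the cases


def euclidean(u, v):
    """Plain Euclidean distance between two feature vectors."""
    return float(np.sqrt(np.sum((u - v) ** 2)))


dist_ab = euclidean(a, b)
dist_ac = euclidean(a, c)

# Share of the squared distance between a and b contributed by each feature.
sq_diff_ab = (a - b) ** 2
share_ab = sq_diff_ab / sq_diff_ab.sum()

print(f"distance a-b: {dist_ab:.2f}")
print(f"distance a-c: {dist_ac:.2f}")
print(f"share of a-b squared distance from total cases: {share_ab[0]:.6f}")
```

Even though `a` and `b` differ by 16 percentage points on the aged 65 or older feature, nearly all of the distance between them comes from total cases, so a distance-based model would effectively ignore the age feature on unscaled data.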
In this section, we will go over two popular approaches to scaling: min-max scaling and standard (or z-score) scaling. Min-max scaling replaces each value with its location in the range. More precisely, the following happens:
$$z_{ij} = \frac{x_{ij} - \min_j}{\max_j - \min_j}$$
Here, $z_{ij}$ is the min-max score, $x_{ij}$ is the value for the $i$th observation of the $j$th feature, and $\min_j$ and $\max_j$ are...