Bringing features onto the same scale
Feature scaling is a crucial step in our preprocessing pipeline that can easily be forgotten. Decision trees and random forests are two of the very few machine learning algorithms where we don't need to worry about feature scaling. Those algorithms are scale invariant. However, the majority of machine learning and optimization algorithms behave much better if features are on the same scale, as we have seen in Chapter 2, Training Simple Machine Learning Algorithms for Classification, when we implemented the gradient descent optimization algorithm.
The importance of feature scaling can be illustrated by a simple example. Let's assume that we have two features where one feature is measured on a scale from 1 to 10 and the second feature is measured on a scale from 1 to 100,000, respectively. When we think of the squared error function in Adaline in Chapter 2, Training Simple Machine Learning Algorithms for Classification, it is intuitive to say that the...