Feature scaling
If you have several features whose ranges differ significantly, many machine learning algorithms may have a hard time with your data: features with large absolute values may overwhelm features with small absolute values. A standard way to deal with this obstacle is feature scaling (also known as feature or data normalization). There are several methods to perform it, but the two most common are rescaling and standardization. This is something you want to do as a preprocessing step, before feeding your data into the learner.
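As a minimal sketch of the two methods (the toy data and variable names below are illustrative, not from the text): rescaling, also called min-max normalization, maps each feature into a fixed range such as [0, 1], while standardization gives each feature zero mean and unit variance.

```python
import numpy as np

# Toy design matrix: each row is an example (age, salary);
# the two features live on very different scales
X = np.array([[25.0, 50_000.0],
              [35.0, 72_000.0],
              [45.0, 61_000.0]])

# Rescaling (min-max normalization): each feature is mapped into [0, 1]
X_rescaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Standardization: each feature gets zero mean and unit variance
X_standardized = (X - X.mean(axis=0)) / X.std(axis=0)
```

In practice you would typically use scikit-learn's MinMaxScaler and StandardScaler instead, fitting them on the training set only so that test-set statistics don't leak into the scaler.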
The least squares method is closely related to the Euclidean distance between two points: minimizing the sum of squared errors is the same as minimizing the squared Euclidean distance between the vector of predictions and the vector of targets. When we calculate how close two points are, we want each dimension to make an equal contribution to the result. With unscaled features, the contribution of each feature to the linear regression objective depends on its absolute values. That's why feature scaling is a must before linear regression. Later, we will meet a similar technique, batch normalization, when we talk about deep learning and neural networks.
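To see concretely how an unscaled feature dominates a distance computation, here is a small illustration (the ages, salaries, and summary statistics are made up for the example):

```python
import numpy as np

# Two people described by (age in years, salary in dollars)
a = np.array([25.0, 50_000.0])
b = np.array([45.0, 52_000.0])

# Unscaled Euclidean distance: the salary gap of 2,000 dwarfs
# the 20-year age gap, so age barely affects the result
print(np.linalg.norm(a - b))  # ~2000.1

# After standardizing with assumed training-set statistics,
# both dimensions contribute on a comparable scale
mean = np.array([35.0, 60_000.0])
std = np.array([8.0, 9_000.0])
print(np.linalg.norm((a - mean) / std - (b - mean) / std))  # ~2.51
```

Before scaling, the distance is almost entirely determined by salary; after scaling, the sizable age difference is no longer drowned out, which is exactly the equal-contribution property the least squares objective benefits from.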