Transforming Numerical Variables
Statistical methods used in data analysis make certain assumptions about the data. For example, the general linear model assumes that the values of the dependent variable (the target) are independent, that there is a linear relationship between the target and the independent (predictor) variables, and that the residuals (the differences between the predictions and the real values of the target) are normally distributed and centered at 0. When these assumptions are not met, the resulting probabilistic statements might not be accurate. To correct for violations of these assumptions, and thus improve the performance of the models, we can transform variables before the analysis.
Variable transformation consists of replacing the original variable values with a function of that variable. More generally, transforming variables with mathematical functions helps reduce variable skewness, improve the value spread, and sometimes...
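As a rough illustration of how a transformation can reduce skewness, the sketch below applies the natural logarithm to a right-skewed variable and compares the skewness before and after. The data are synthetic (log-normal draws with a fixed seed), and the `skewness` helper is a hypothetical convenience function written for this example, not part of any library. The logarithm is only one possible choice of function and requires strictly positive values.

```python
import math
import random
import statistics


def skewness(values):
    """Hypothetical helper: population skewness, the mean cubed z-score."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return sum(((v - mean) / std) ** 3 for v in values) / len(values)


# Assumption: a synthetic, right-skewed variable (log-normal draws),
# seeded so that the example is repeatable.
rng = random.Random(0)
x = [rng.lognormvariate(0.0, 1.0) for _ in range(10_000)]

# The transformation: replace each original value with a function of it.
# The logarithm is defined only for strictly positive values.
x_log = [math.log(v) for v in x]

skew_before = skewness(x)
skew_after = skewness(x_log)

print(f"skewness before the transformation: {skew_before:.2f}")
print(f"skewness after the transformation:  {skew_after:.2f}")
```

For this synthetic variable, the skewness is strongly positive before the transformation and close to 0 afterward, because the logarithm of a log-normal variable is normally distributed. Real data will rarely behave this neatly, so it is worth inspecting the distribution after any transformation.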