Linear models, such as linear and logistic regression, tend to benefit from variables that are approximately normally distributed (strictly speaking, linear regression assumes normally distributed residuals rather than normally distributed predictors). When variables are skewed, we can often apply a mathematical transformation to bring their distribution closer to Gaussian, and sometimes even unmask linear relationships between the variables and the target. Transforming variables may therefore improve the performance of linear machine learning models. Commonly used mathematical transformations include the logarithm, reciprocal, power, square root, and cube root transformations, as well as the Box-Cox and Yeo-Johnson transformations. In this chapter, we will learn how to implement all of these operations on the variables in our dataset using the NumPy, SciPy, scikit-learn, and Feature-engine libraries.
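As a quick preview of the transformations named above, the following is a minimal sketch using only NumPy and SciPy. The data is synthetic (a right-skewed log-normal sample generated here for illustration, not one of the chapter's datasets), and skewness is used as a rough, assumed proxy for how far a distribution is from Gaussian.

```python
import numpy as np
from scipy import stats

# Synthetic, strictly positive, right-skewed variable (illustrative only).
rng = np.random.default_rng(seed=0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=1000)

# Simple transformations; the logarithm, reciprocal, and Box-Cox
# require strictly positive values.
transformed = {
    "logarithm": np.log(x),
    "reciprocal": 1.0 / x,
    "square root": np.sqrt(x),
    "cube root": np.cbrt(x),
}

# Box-Cox and Yeo-Johnson estimate their lambda parameter from the data
# when none is supplied, and return it alongside the transformed values.
transformed["Box-Cox"], boxcox_lambda = stats.boxcox(x)
transformed["Yeo-Johnson"], yeojohnson_lambda = stats.yeojohnson(x)

# Skewness near 0 suggests a more symmetric distribution.
raw_skewness = float(stats.skew(x))
skewness = {name: float(stats.skew(values)) for name, values in transformed.items()}

print(f"{'original':<12} skewness = {raw_skewness:.2f}")
for name, value in skewness.items():
    print(f"{name:<12} skewness = {value:.2f}")
```

No single transformation suits every variable: which one brings a variable closest to a Gaussian shape depends on the variable's original distribution, so it is worth comparing several.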
This chapter will cover the following recipes:
- Transforming variables with the logarithm
- Transforming variables...