Applying data transformations
Data transformation is a vital step in the data preparation journey. It ensures that data is ready for models that make specific assumptions about it. This is achieved by transforming data from its current shape (or distribution) to another. In other words, we transform data from its empirical distribution toward a theoretical distribution.
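As a rough illustration of changing the shape of a distribution, the sketch below applies a log transform to right-skewed data and compares the skewness before and after. The data is synthetic (drawn from a log-normal distribution purely for illustration), and the `skewness` helper is a hypothetical function written for this example, not part of any library.

```python
import math
import random
import statistics


def skewness(values):
    """Population skewness: the mean cubed z-score (near 0 for symmetric data)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return sum(((v - mean) / std) ** 3 for v in values) / len(values)


random.seed(0)  # fixed seed so the sketch is reproducible

# Assumption: synthetic right-skewed data drawn from a log-normal distribution.
raw = [random.lognormvariate(0.0, 1.0) for _ in range(10_000)]

# Taking the log of log-normal data yields approximately normal data.
transformed = [math.log(v) for v in raw]

raw_skew = skewness(raw)
transformed_skew = skewness(transformed)
print(f"skewness before: {raw_skew:.2f}")
print(f"skewness after:  {transformed_skew:.2f}")
```

The log transform works here only because the synthetic data is strictly positive and log-normal by construction; real data may call for a different transformation.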
In some cases, we need to transform our input variables to ensure that they’re interpretable by the machine learning algorithm. An input variable (also known as a feature) is a column of data that typically describes some attribute of each observation. In other cases, machine learning models require your output (also known as response) variable to have a certain distribution. An output variable is the column that we are trying to predict.
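The two cases can be sketched on a toy dataset. Everything here is an assumption made for illustration: the `city` and `price` columns, their values, and the choice of one-hot encoding for the input and a log transform for the output. Other transformations may suit a given model better.

```python
import math

# Hypothetical toy dataset: "city" is an input feature, "price" is the output.
rows = [
    {"city": "Paris", "price": 300_000.0},
    {"city": "Lyon", "price": 150_000.0},
    {"city": "Paris", "price": 1_200_000.0},
]

# Input transformation: one-hot encode the text column so that an algorithm
# expecting numeric inputs can interpret it (one 0/1 column per category).
categories = sorted({row["city"] for row in rows})
X = [[1.0 if row["city"] == c else 0.0 for c in categories] for row in rows]

# Output transformation: log1p compresses the long right tail of the prices.
y = [math.log1p(row["price"]) for row in rows]

# Predictions made on the transformed scale can be mapped back with the inverse.
recovered = [math.expm1(v) for v in y]

print(categories)
print(X)
print(y)
```

Keeping the inverse (`math.expm1`) at hand matters because a model trained on the transformed output predicts on that scale, and its predictions usually need to be reported in the original units.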
It certainly would be nice if the world accommodated our needs, but real-world data comes in all varieties! When your data does not meet a model's requirements, you may have to perform a data transformation. In this section...