The crucial question is this: why does data need to be transformed for data science? There are two principal reasons. The first is to obtain a single dataset, or a small number of datasets, because data science models are commonly built on one dataset that represents the statistical population. We can perform JOINs on our data before it is analyzed or used for machine learning training, for example, but this often leads to unnecessary complications in the model, and it can also increase training time.
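For illustration, a JOIN that combines two sources into one flat dataset might look like the following minimal sketch. It uses Python's built-in sqlite3 module with an in-memory database; the customers and orders tables, their columns, and the helper function names are hypothetical and are not taken from this chapter.

```python
import sqlite3


def make_demo_connection():
    """Create an in-memory database with two small, hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, country TEXT);
        CREATE TABLE orders (
            order_id INTEGER PRIMARY KEY,
            customer_id INTEGER,
            amount REAL
        );
        INSERT INTO customers VALUES (1, 'DE'), (2, 'US');
        INSERT INTO orders VALUES (10, 1, 25.0), (11, 2, 40.0), (12, 1, 15.5);
        """
    )
    return conn


def build_flat_dataset(conn):
    """Join orders to customers so that each row carries both sets of columns."""
    query = """
        SELECT o.order_id, c.customer_id, c.country, o.amount
        FROM orders AS o
        JOIN customers AS c ON c.customer_id = o.customer_id
        ORDER BY o.order_id
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = make_demo_connection()
    # Each printed tuple is one row of the combined dataset.
    for row in build_flat_dataset(conn):
        print(row)
    conn.close()
```

The result is one row per order with the customer attributes attached, which is the kind of single table that an analysis or training step typically consumes.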
The second reason is a bit more complicated. The world is full of data, and its volume is always growing. Chapter 3, Data Sources for Analytics, showed many data sources and data creation methods. Let's summarize the growth of data from a different point of view. We can think about data from the perspective...