The whys of data transformation and massaging
Data transformation is the last stage of data preprocessing, performed right before the analytic tools are used. By this stage, the dataset already has the following characteristics.
- Data cleaning: The dataset is cleaned at all three cleaning levels (Chapters 9–11).
- Data integration: All the potentially beneficial data sources are recognized and a dataset that includes the necessary information is created (Chapter 12, Data Fusion and Integration).
- Data reduction: If needed, the size of the dataset has been reduced (Chapter 13, Data Reduction).
Even so, we may have to make some changes to the data before moving to the analysis stage. The dataset undergoes these changes for one of three reasons, which we will call necessity, correctness, and effectiveness. The following list provides more detail for each reason.
- Necessity: The analytic method cannot...