Data Cleansing and Transformation
The success of a data pipeline is measured by how reliably it transforms input data into the attributes required of the output data. It’s the finesse of the transformation stage that separates a nice toy pipeline from a powerful and impactful enterprise pipeline. Accurate, efficient transformations come from taking a methodical approach to building each task the pipeline performs.
In this chapter, we will explore various data transformation techniques in Python and how they can be used to massage data into the desired format. You will walk away from this chapter with a firm grounding in the following areas of data manipulation:
- Data cleansing and transformation
- The importance of accuracy and consistency
- Data cleansing with Python
- Workflow for data transformation
- Creating a data transformation activity in Python
As this book is geared toward creating data pipelines, we will be covering...