Chapter 10
Data Cleaning Features
There are several techniques for validating and converting data into native Python objects for subsequent analysis. This chapter guides you through three of them, each suited to a different kind of data. It then moves on to standardization: transforming unusual or atypical values into a more useful form. The chapter concludes by integrating acquisition and cleansing into a composite pipeline.
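As a rough illustration of validation, conversion, and standardization working together, here is a minimal sketch that turns a raw text record into a native Python object. The `Sample` class, the field names, the set of "missing" markers, and the helper functions are all hypothetical, chosen only for this example; they are not the chapter's own design.

```python
from __future__ import annotations

import datetime
from dataclasses import dataclass
from typing import Optional

# Hypothetical set of text values treated as "missing" (an assumption for this sketch).
MISSING = {"", "n/a", "na", "-"}


def standardize_text(value: str) -> str:
    """Standardize an atypical text value: trim and collapse internal whitespace."""
    return " ".join(value.split())


def to_optional_float(value: str) -> Optional[float]:
    """Validate and convert text to float; missing markers become None."""
    text = standardize_text(value)
    if text.lower() in MISSING:
        return None
    return float(text)  # raises ValueError for text that is not a number


def to_date(value: str) -> datetime.date:
    """Validate and convert an ISO-format (YYYY-MM-DD) date string."""
    return datetime.date.fromisoformat(standardize_text(value))


@dataclass(frozen=True)
class Sample:
    # Hypothetical target type; field names are illustrative only.
    date: datetime.date
    temperature: Optional[float]
    site: str


def clean(raw: dict[str, str]) -> Sample:
    """Convert one raw text record into a validated Sample, or raise ValueError."""
    return Sample(
        date=to_date(raw["date"]),
        temperature=to_optional_float(raw["temperature"]),
        site=standardize_text(raw["site"]).upper(),
    )
```

For example, `clean({"date": " 2023-01-15 ", "temperature": "N/A", "site": "north  ridge"})` yields a `Sample` with a `datetime.date`, a `None` temperature, and the standardized site name `"NORTH RIDGE"`. Letting invalid text raise `ValueError` keeps the validation rules in the conversion functions themselves, so the caller decides whether to skip, log, or reject a bad record.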
This chapter will expand on the project in Chapter 9, Project 3.1: Data Cleaning Base Application. The following additional skills will be emphasized:
• CLI application extension and refactoring to add features.
• Pythonic approaches to validation and conversion.
• Techniques for uncovering key relationships.
• Pipeline architectures. This can be seen as a first step toward a processing DAG (Directed Acyclic Graph) in which various stages are connected.
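A linear chain of stages is the simplest kind of DAG. The sketch below assumes each stage is a generator function that consumes the previous stage's output; the names `acquire`, `clean_rows`, and `pipeline`, and the two-column CSV layout, are hypothetical and only illustrate the idea of composing acquisition and cleansing. The chapter's own pipeline may be structured differently.

```python
from __future__ import annotations

import csv
import io
from typing import Any, Callable, Iterable, Iterator, TextIO

# A stage is any callable that takes the previous stage's output and returns an iterator.
Stage = Callable[[Any], Iterator[Any]]


def acquire(source: TextIO) -> Iterator[dict[str, str]]:
    """Acquisition stage (hypothetical): read raw text rows without converting them."""
    yield from csv.DictReader(source)


def clean_rows(rows: Iterable[dict[str, str]]) -> Iterator[dict[str, object]]:
    """Cleansing stage (hypothetical): convert text to native objects, skipping bad rows."""
    for row in rows:
        try:
            yield {"name": row["name"].strip(), "value": float(row["value"])}
        except ValueError:
            continue  # a real application would log or report the rejected row


def pipeline(source: Any, *stages: Stage) -> Iterator[Any]:
    """Connect stages in a linear chain: each stage's output feeds the next."""
    data: Any = source
    for stage in stages:
        data = stage(data)
    return data


raw = io.StringIO("name,value\nalpha, 1.5\nbeta,oops\ngamma,3\n")
results = list(pipeline(raw, acquire, clean_rows))
```

Here `results` holds the `alpha` and `gamma` rows with float values, while the `beta` row is dropped because `"oops"` fails conversion. Because every stage is lazy, rows flow through one at a time; adding a new stage (for example, a standardization step) means passing one more callable to `pipeline` rather than rewriting the existing ones.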
We’ll start with a description of the first project to expand...