Chapter 2. Cleaning and Validating Data
In this chapter, we will cover the following recipes:
- Cleaning data with regular expressions
- Maintaining consistency with synonym maps
- Identifying and removing duplicate data
- Regularizing numbers
- Calculating relative values
- Parsing dates and times
- Lazily processing very large data sets
- Sampling from very large data sets
- Fixing spelling errors
- Parsing custom data formats
- Validating data with Valip