You already have a set of tools to manipulate data. This chapter shows different ways of applying the concepts you have learned to cleanse data and to deal with invalid data, either by discarding it or by fixing it.
We will cover the following topics in this chapter:
- Standardizing information and improving the quality of data
- Introducing some steps useful for data cleansing
- Dealing with non-exact matches
- Validating data
- Treating invalid data by splitting and merging streams