Summary
This chapter covered how to clean and process data. First, we looked at the difference between duplicate data and redundant data and how to deal with each. Then, we took on the highly debated question of what to do with missing data: the different types of missing data, the different methods of deleting it, the different types of imputation, and interpolation. Next, we went over common issues such as invalid data, specification mismatch, and data type validation. After that, we covered non-parametric data: what it is and what it means for you. Finally, we discussed outliers and how to address them. This wraps up how to clean your data. In the next chapter, we will cover how to wrangle your data and get it into a shape you can use!