Summary
In this chapter, we learned about interesting ways to deal with list data by using generator expressions. They are easy and elegant and, once mastered, they give us a powerful technique that we can use repeatedly to simplify several common data wrangling tasks. We also examined different ways to format data. Formatting data is not only useful for preparing readable reports; it is often essential for guaranteeing data integrity for the downstream system.
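As a brief refresher, the following is a minimal sketch of both ideas, not a reproduction of the chapter's exercises. The `prices` list and the 50-unit cutoff are made-up values chosen purely for illustration.

```python
# Hypothetical sample data (not taken from the chapter's datasets).
prices = [1200.5, 89.999, 15.0]

# Generator expression: values are produced lazily, one at a time,
# so no intermediate list is built before sum() consumes them.
total_over_50 = sum(p for p in prices if p > 50)

# Format each value right-aligned in a 10-character field, with a
# thousands separator and exactly two decimal places.
report_lines = [f"{p:>10,.2f}" for p in prices]

for line in report_lines:
    print(line)
```

Fixing the width and precision in this way keeps every value in a predictable shape, which is the kind of consistency a downstream system can rely on.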
We ended this chapter by checking out some methods to identify and remove outliers. This is important for us because we want our data to be properly prepared and ready for all our fancy downstream analysis jobs. We also observed how important it is to take the time and use domain expertise to set up rules for identifying outliers, as doing this incorrectly can do more harm than good.
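To make the outlier rule concrete, here is one possible sketch based on the z-score, using only the standard library's `statistics` module. The helper name `remove_outliers`, the sample readings, and the threshold of 2.0 are all assumptions for illustration; in practice, the threshold is exactly the kind of rule that should come from domain expertise.

```python
import statistics


def remove_outliers(values, threshold):
    """Keep only values whose absolute z-score is within the threshold.

    Hypothetical helper for illustration; the threshold is a
    domain-specific choice, not a universal constant.
    """
    mean = statistics.mean(values)
    # Population standard deviation of the data as given.
    std = statistics.pstdev(values)
    if std == 0:
        # All values are identical, so nothing can be an outlier.
        return list(values)
    return [v for v in values if abs(v - mean) / std <= threshold]


# Made-up readings with one suspicious value at the end.
readings = [10, 12, 11, 13, 12, 11, 95]
cleaned = remove_outliers(readings, threshold=2.0)
print(cleaned)
```

Note that an extreme value inflates both the mean and the standard deviation, so on small samples a threshold that is too loose can let a genuine outlier through, while one that is too strict can discard valid data.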
In the next chapter, we will cover how to read web pages, XML files, and APIs.