Summary
In this chapter, we applied a range of data cleaning, preparation, and analysis techniques to the Online Retail II dataset and saw why each of these steps matters. We learned how to decide whether to keep or delete outliers, and how to break one feature into several features to enhance the analysis. Lastly, we learned how to ask our data the right questions and manipulate it to provide the answers, which is the definition of successful data analysis.
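As a compact reminder of two of these ideas, the following sketch splits a timestamp into several features and flags outliers rather than deleting them outright. It is only an illustration under stated assumptions: the rows are made-up toy values, the column names Quantity and InvoiceDate are assumed to follow the Online Retail II layout, and the 1.5 * IQR rule is one common convention, not necessarily the rule applied earlier in the chapter.

```python
import pandas as pd

# Toy rows for illustration only; the values are invented, and the column
# names are assumed to match the Online Retail II layout.
df = pd.DataFrame({
    "Quantity": [6, 8, 2, 4, 3, 500],
    "InvoiceDate": pd.to_datetime([
        "2010-12-01 08:26", "2010-12-01 09:01", "2010-12-02 10:15",
        "2010-12-03 11:45", "2010-12-05 14:30", "2010-12-06 16:00",
    ]),
})

# Break one feature (the timestamp) into several simpler features.
df["Year"] = df["InvoiceDate"].dt.year
df["Month"] = df["InvoiceDate"].dt.month
df["DayOfWeek"] = df["InvoiceDate"].dt.day_name()
df["Hour"] = df["InvoiceDate"].dt.hour

# Flag outliers with the 1.5 * IQR rule (an assumed convention) instead of
# dropping them, so the keep-or-delete decision can be made later.
q1, q3 = df["Quantity"].quantile([0.25, 0.75])
iqr = q3 - q1
df["IsOutlier"] = ~df["Quantity"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

print(df)
```

Keeping the flag as a column, rather than filtering immediately, leaves room to inspect the flagged rows and judge whether they are errors or genuine bulk orders before removing anything.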
In the next chapter, we will follow a similar path with a different dataset from a new domain: appliance energy consumption. Because the techniques used depend on the data at hand, some of the steps will be repeated and others will be new.