Chapter 2. Exploring Data
When starting to work on a new dataset, it is essential to first get an idea of what conclusions can be drawn from the data. Before we can do things such as inference and hypothesis testing, we need to develop an understanding of what questions the data at hand can answer. This is the key to exploratory data analysis, which is the skill and science of developing intuition and identifying statistical patterns in the data. In this chapter, we will present graphical and numerical methods that help in this task. You will notice that there are no hard and fast rules for how to proceed at each step; instead, we give recommendations on which techniques tend to be suitable in each case. The best way to develop the skills needed to become an expert data explorer is to see lots of examples and, perhaps more importantly, to work on your own datasets. More specifically, this chapter will cover the following topics:
- Performing the initial exploration and cleaning...