Summary
In this chapter, we looked at some common techniques for exploring data. We learned how to retrieve subsets of data when that is required for our analysis. We also used pandas methods to generate key statistics on features such as mean, interquartile range, and skew. This gave us a better sense of the central tendency, spread, and shape of the distribution of each feature. It also put us in a better position to identify outliers. Finally, we used the Matplotlib and Seaborn libraries to create histograms, boxplots, and violin plots. This yielded additional insights about the distribution of features, such as the length of the tail and divergence from the normal distribution.
Visualizations are a great supplement to the tools for univariate analysis that we have discussed in this chapter. Histograms, boxplots, and violin plots display the shape and spread of each feature's distribution. Graphically, they show what we may miss by examining a few summary statistics, such as where there is a bulge (or bulges) in the distribution and where the extreme values are. These visualizations will be every bit as helpful when we explore bivariate and multivariate relationships, which we will do in Chapter 2, Examining Bivariate and Multivariate Relationships between Features and Targets.