Using the built-in statistics library
A great deal of exploratory data analysis (EDA) involves getting a summary of the data. There are several kinds of summary that might be interesting:
- Central Tendency: Values such as the mean, mode, and median can characterize the center of a set of data.
- Extrema: The minimum and maximum show the range the data covers, and are as important as the central measures of a set of data.
- Variance: The variance and standard deviation describe the dispersion of the data. A large variance means the values are spread widely; a small variance means the values cluster tightly around the central value.
This recipe will show how to create basic descriptive statistics in Python.
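As a minimal sketch of the three kinds of summary listed above, the following uses the standard library's statistics module together with the built-in min() and max() functions. The sample values in data are hypothetical, chosen only for illustration; they are not taken from the recipe's data file. The sketch assumes the values are a complete population, so it uses pvariance() and pstdev(); for a sample drawn from a larger population, variance() and stdev() would be the usual choice.

```python
import statistics

# Hypothetical sample values, used only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

summary = {
    # Central tendency
    "mean": statistics.mean(data),
    "median": statistics.median(data),
    "mode": statistics.mode(data),
    # Extrema
    "min": min(data),
    "max": max(data),
    # Dispersion, treating the data as a whole population
    "variance": statistics.pvariance(data),
    "stdev": statistics.pstdev(data),
}

for name, value in summary.items():
    print(f"{name:>8}: {value}")
```

Collecting the results in a dictionary keeps each measure labeled, which makes it easy to compare several series side by side.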
Getting ready
We'll look at some simple data that can be used for statistical analysis. We've been given a file of raw data, called anscombe.json. It's a JSON document that has four series of (x,y) pairs.
We can read this data with the...