Creating statistical summaries
One essential kind of statistical summary is the measure of central tendency. There are several variations on this theme. The mean, median, and mode are explained as follows:
The mean, also known as the average, combines all of the values into a single value
The median is the middlemost value—the data must be sorted to locate the one in the middle
The mode is the most common value
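As a quick illustration of these three definitions, here is a minimal sketch that uses Python's standard `statistics` module. The sample values are made up for this example, and the library functions are used only as a convenient shortcut; they are not necessarily the approach this chapter develops.

```python
from statistics import mean, median, mode

# Hypothetical sample values, invented for illustration only.
samples = [2, 3, 3, 5, 7, 10]

# Mean: the sum of the values divided by the number of values.
sample_mean = mean(samples)

# Median: the middle value of the sorted data. With an even number
# of values, it is the average of the two middle values.
sample_median = median(samples)

# Mode: the value that occurs most often.
sample_mode = mode(samples)

print(sample_mean, sample_median, sample_mode)
```

Here, the mean is 30 divided by 6, the median lies halfway between the two middle values 3 and 5, and the mode is 3 because it is the only value that appears twice.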
None of these describes a set of data perfectly. Data that is truly random can often be summarized by the mean. Data that isn't random, however, can be better summarized by the median. With continuous data, each value might differ slightly from the others. Every measurement in a small set of samples may be unique, making the mode meaningless.
As a consequence, we'll need algorithms to compute all three of these essential summaries. First, we need some data to work with.
In Chapter 2, Acquiring Intelligence Data, HQ asked us to gather cheese consumption data. We used the URL http://www...