Analyzing the mean of a dataset
The mean is the arithmetic average of a dataset. It is typically used on numeric tabular data, and it gives us a sense of where the center of the dataset lies. To calculate the mean, we sum all the data points and divide the sum by the number of data points in the dataset. The mean is very sensitive to outliers, which are unusually high or unusually low data points that lie far from the other data points in the dataset. Because an outlier changes the sum without changing the number of data points, even a single extreme value can pull the mean heavily toward it and distort the output of our analysis. Despite this, the mean is still very useful for getting a quick sense of a dataset's typical value.
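As a rough illustration of the calculation and of the effect of an outlier, the following minimal sketch computes the mean by hand in plain Python. The helper name `mean` and the sample values are made up for this example and do not come from the chapter's dataset.

```python
def mean(values):
    """Arithmetic mean: the sum of the data points divided by their count."""
    if not values:
        raise ValueError("mean requires at least one data point")
    return sum(values) / len(values)


# Hypothetical sample values, chosen only for illustration
typical = [10, 12, 11, 13, 14]
with_outlier = typical + [100]  # same data plus one unusually high point

mean_typical = mean(typical)            # 60 / 5 = 12.0
mean_with_outlier = mean(with_outlier)  # 160 / 6, roughly 26.67

print(mean_typical)
print(round(mean_with_outlier, 2))
```

A single extreme value more than doubles the mean here, even though five of the six data points lie between 10 and 14.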
To analyze the mean of a dataset, we will use the mean method in the numpy library in Python.
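Before turning to the real dataset, here is a small sketch of how numpy's mean can be called. It assumes numpy is installed and uses made-up arrays (`counts` and `table` are hypothetical names, not the chapter's data). The `axis` argument is shown because tabular data usually has rows and columns.

```python
import numpy as np

# Hypothetical values, not taken from the chapter's dataset
counts = np.array([4, 8, 15, 16, 23, 42])

overall_mean = np.mean(counts)  # function form: 108 / 6 = 18.0
same_mean = counts.mean()       # equivalent method form on the array

# A small two-row, three-column table to show per-column and per-row means
table = np.array([[1, 2, 3],
                  [4, 5, 6]])
column_means = np.mean(table, axis=0)  # mean of each column: 2.5, 3.5, 4.5
row_means = np.mean(table, axis=1)     # mean of each row: 2.0, 5.0

print(overall_mean)
print(column_means)
print(row_means)
```

Calling the function without `axis` averages over every element of the array, which is usually what we want for a single column of values.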
Getting ready
We will work with one dataset in this chapter: the counts...