Summary statistics
Summary statistics provide a quick way to understand the basic properties and distribution of the data. In this section, we introduce two powerful pandas methods, pd.Series.value_counts and pd.Series.describe, which can serve as useful starting points for exploration.
How to do it
The pd.Series.value_counts method counts how often each distinct value occurs and, by default, sorts the result from most to least frequent. This is particularly useful for discrete data:
ser = pd.Series(["a", "b", "c", "a", "c", "a"], dtype=pd.StringDtype())
ser.value_counts()
a    3
c    2
b    1
Name: count, dtype: Int64
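Raw counts are often easier to compare once they are expressed as proportions. The following is a minimal sketch, reusing the same example data, that passes normalize=True to pd.Series.value_counts; the variable names counts and proportions are chosen here only for illustration:

```python
import pandas as pd

ser = pd.Series(["a", "b", "c", "a", "c", "a"], dtype=pd.StringDtype())

# Absolute frequencies, sorted from most to least common by default
counts = ser.value_counts()

# Relative frequencies: each count divided by the number of non-missing values
proportions = ser.value_counts(normalize=True)

print(counts)
print(proportions)
```

Here "a" accounts for 3 of the 6 values, so its proportion is 0.5.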
For continuous data, pd.Series.describe bundles several calculations into one method call. Calling it shows the count, mean, minimum, and maximum, alongside the standard deviation and quartiles, which give a high-level view of how your data is distributed:
ser = pd.Series([0, 42...
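Because pd.Series.describe returns its results as a pd.Series indexed by statistic name, individual statistics can be picked out by label. The sketch below uses hypothetical values that are not taken from the example above; the names values and summary are assumptions made for illustration:

```python
import pandas as pd

# Hypothetical values chosen only for illustration
values = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# describe() returns a Series whose index holds the statistic names
summary = values.describe()

print(summary.index.tolist())
# ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']

# Individual statistics can be looked up by label
print(summary["mean"], summary["min"], summary["max"])
# 3.0 1.0 5.0
```

Looking up statistics by label makes it straightforward to reuse them in later calculations rather than reading them off the printed output.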