Plotting distributions of non-aggregated data
Visualizations can be of immense help in recognizing patterns and trends in your data. Is your data normally distributed? Does it skew left? Does it skew right? Is it multimodal? While you could work out the answers to these questions numerically, a visualization can highlight these patterns at a glance, yielding deeper insight into your data.
In this recipe, we are going to see how easy pandas makes it to visualize the distribution of your data. Histograms are a very popular choice for plotting distributions, so we will start with them before showcasing the even more powerful Kernel Density Estimate (KDE) plot.
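To make concrete what these two plot types summarize, here is a minimal sketch that computes both by hand instead of drawing them. A histogram counts how many values fall into each equal-width bin. A KDE places a Gaussian "bump" on every observation and averages those bumps into a smooth curve. The choice of 10 bins, the evaluation grid from -4 to 4, and the helper name `gaussian_kde_1d` are illustrative assumptions. The bandwidth uses Scott's rule of thumb (the default of `scipy.stats.gaussian_kde`). The plots pandas draws may use different bin edges and evaluation points.

```python
import numpy as np
import pandas as pd

# Reproducible standard-normal sample, similar to the data used in this recipe
rng = np.random.default_rng(42)
ser = pd.Series(rng.normal(size=10_000))

# Histogram: split the value range into 10 equal-width bins and count per bin
bin_counts = pd.cut(ser, bins=10).value_counts(sort=False)


def gaussian_kde_1d(data, grid):
    """Hypothetical helper: Gaussian KDE of `data` evaluated at each grid point."""
    data = np.asarray(data, dtype=float)
    n = data.size
    # Scott's rule of thumb for the bandwidth in one dimension
    bandwidth = data.std(ddof=1) * n ** (-1 / 5)
    # Standardized distance for every (grid point, observation) pair
    z = (grid[:, None] - data[None, :]) / bandwidth
    # Average the Gaussian kernels and rescale so the curve integrates to ~1
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return kernels.mean(axis=1) / bandwidth


grid = np.linspace(-4, 4, 81)
density = gaussian_kde_1d(ser, grid)

print(bin_counts)
print(grid[density.argmax()])  # peak of the smoothed curve, should be near 0
```

The histogram's shape depends on where the bin edges happen to fall, while the KDE trades that dependence for a single smoothing parameter (the bandwidth), which is why it tends to reveal skew or multiple modes more clearly.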
How to do it
Let’s create a pd.Series using 10,000 random records drawn from a standard normal distribution (mean 0, standard deviation 1). NumPy can easily generate this data:
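import numpy as np
import pandas as pd

# np.random.seed() only seeds NumPy's legacy global generator and has no
# effect on default_rng(), so pass the seed to the Generator directly
rng = np.random.default_rng(42)
ser = pd.Series(
    rng.normal(size=10_000),
    dtype=pd.Float64Dtype(),
)
ser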
0 0.049174...