Data aggregation with Pandas DataFrames
Data aggregation is a term used in the field of relational databases. In a database query, we can group data by the value in a column or columns. We can then perform various operations on each of these groups. The Pandas DataFrame has similar capabilities. We will generate data held in a Python dict and then use this data to create a Pandas DataFrame. We will then practice the Pandas aggregation features:
Seed the NumPy random generator to make sure that the generated data will not differ between repeated program runs. The data will have four columns:
Weather (a string)
Food (also a string)
Price (a random float)
Number (a random integer between one and nine)
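As a minimal sketch of this step, the data could be generated as follows. The seed value, the row count, the category labels, and the price scale are illustrative assumptions, not values taken from the text; only the four column names and their types come from the description above.

```python
import numpy as np
import pandas as pd

# Fix the seed so repeated runs produce the same random values.
# The value 42 is an arbitrary assumption.
np.random.seed(42)

# Hypothetical category values; the text only says these columns hold strings.
weather = ['cold', 'hot', 'cold', 'hot', 'cold', 'hot', 'cold']
food = ['soup', 'soup', 'icecream', 'chocolate', 'icecream', 'icecream', 'soup']
n_rows = len(weather)

data = {
    'Weather': weather,
    'Food': food,
    # Random floats in [0, 10); the scale factor is an assumption.
    'Price': 10 * np.random.rand(n_rows),
    # randint excludes the upper bound, so this yields integers from 1 to 9.
    'Number': np.random.randint(1, 10, size=n_rows),
}

df = pd.DataFrame(data)
print(df)
```

Because the generator is seeded before any random call, running the script twice prints the same table.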
The use case is that we have the results of some sort of consumer-purchase research, combined with weather and market pricing, where we calculate the average price and keep track of the sample size and parameters:
import pandas as pd
from numpy.random import seed...
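The grouping idea from this section can be sketched with `groupby` on a small, hand-written table. The rows and the output column names (`mean_price`, `sample_size`, `total_number`) are hypothetical and chosen only for illustration; the sketch shows one way to compute an average price per group while keeping track of the group size, and assumes a pandas version that supports named aggregation (0.25 or later).

```python
import pandas as pd

# Small hand-written sample; all values here are hypothetical.
df = pd.DataFrame({
    'Weather': ['cold', 'hot', 'cold', 'hot'],
    'Food': ['soup', 'soup', 'icecream', 'icecream'],
    'Price': [2.0, 4.0, 6.0, 8.0],
    'Number': [3, 5, 1, 9],
})

# Group rows by the value in the Weather column, then aggregate each group:
# the mean of Price, the number of Price observations, and the sum of Number.
summary = df.groupby('Weather').agg(
    mean_price=('Price', 'mean'),
    sample_size=('Price', 'count'),
    total_number=('Number', 'sum'),
)

print(summary)
```

Grouping by several columns works the same way: passing a list such as `['Weather', 'Food']` to `groupby` produces one row per combination of values.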