Counting the number of weekly crimes
The Denver crime dataset is huge, with over 460,000 rows, each marked with a reported date. Counting the number of weekly crimes is one of many queries that can be answered by grouping according to some period of time. The .resample method provides an easy interface for grouping by any span of time.
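To make "any span of time" concrete, here is a minimal sketch that groups one small series by two different spans. The timestamps are made up and only stand in for REPORTED_DATE values. The string aliases 'W' (weeks ending on Sunday) and 'MS' (calendar months labeled by their first day) are standard pandas offset aliases, and .size() simply counts rows per period.

```python
import pandas as pd

# Hypothetical report timestamps (not the Denver data); Jan 1, 2017 was a Sunday
reported = pd.to_datetime([
    "2017-01-02 08:15",  # Monday
    "2017-01-04 17:40",  # Wednesday
    "2017-01-07 23:05",  # Saturday
    "2017-01-10 11:00",  # Tuesday of the following week
    "2017-02-03 09:30",  # Friday in February
])
events = pd.Series(1, index=reported)

# The same data grouped by two different spans of time.
# 'W' = weeks ending on Sunday; 'MS' = calendar months labeled by their first day.
weekly = events.resample("W").size()
monthly = events.resample("MS").size()

print(weekly)   # weeks with no reports still appear, with a count of 0
print(monthly)
```

Changing only the alias string changes the period, which is what makes .resample convenient for this kind of question.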
In this recipe, we will use both the .resample and .groupby methods to count the number of weekly crimes.
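As a rough sketch of how the two methods relate, the snippet below counts the same hypothetical reports by week twice: once with .resample('W') on a DatetimeIndex, and once with .groupby and a pd.Grouper(freq='W') key. The DataFrame and its offense column are invented for illustration; under these assumptions both approaches should produce identical weekly counts.

```python
import pandas as pd

# Hypothetical crime reports indexed by report time (not the real dataset)
crimes = pd.DataFrame(
    {"offense": ["theft", "burglary", "theft", "assault", "theft"]},
    index=pd.to_datetime([
        "2017-01-02 08:15", "2017-01-03 12:30", "2017-01-06 22:10",
        "2017-01-09 01:45", "2017-01-12 14:00",
    ]),
)

# Option 1: resample the DatetimeIndex into Sunday-ending weeks
via_resample = crimes.resample("W").size()

# Option 2: express the same weekly bins as a groupby key
via_groupby = crimes.groupby(pd.Grouper(freq="W")).size()

print(via_resample)
print(via_resample.equals(via_groupby))
```

The .groupby form is handy when the time grouping needs to be combined with other grouping columns, while .resample is the shorter spelling when time is the only key.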
How to do it…
- Read in the crime HDF5 dataset, set REPORTED_DATE as the index, and then sort the index to improve performance for the rest of the recipe:
  >>> import pandas as pd
  >>> crime_sort = (pd.read_hdf('data/crime.h5', 'crime')
  ...     .set_index('REPORTED_DATE')
  ...     .sort_index()
  ... )
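The sorting step can be sanity-checked on a tiny, out-of-order DataFrame (hypothetical data; the real data/crime.h5 file is not read, so the sketch runs on its own). After .sort_index(), the DatetimeIndex reports is_monotonic_increasing, and a monotonic index lets pandas resolve a date-range slice by binary search instead of scanning every row.

```python
import pandas as pd

# Hypothetical, deliberately out-of-order report times
df = pd.DataFrame(
    {"reports": [1, 1, 1, 1]},
    index=pd.DatetimeIndex(
        ["2017-01-09", "2017-01-02", "2017-01-15", "2017-01-04"],
        name="REPORTED_DATE",
    ),
)
print(df.index.is_monotonic_increasing)         # False: rows are in file order

df_sorted = df.sort_index()
print(df_sorted.index.is_monotonic_increasing)  # True after sorting

# On a sorted index, this date-range slice can be located by binary search
first_week = df_sorted.loc["2017-01-02":"2017-01-08"]
print(len(first_week))                          # 2 (Jan 2 and Jan 4)
```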
- To count the number of crimes per week, we need to form a group for each week. The .resample method takes a DateOffset object or alias and returns an object ready to perform an action...
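A hedged illustration of "a DateOffset object or alias": the string alias 'W' and the offset object pd.offsets.Week(weekday=6) both describe weeks ending on Sunday (pandas numbers weekdays from Monday=0), so on the hypothetical data below they should give identical counts. The demo DataFrame is invented. Calling .resample alone only builds a resampler object; the counting happens when an aggregation such as .size() is called on it.

```python
import pandas as pd

# Hypothetical report times (Jan 1, 2017 was a Sunday)
idx = pd.DatetimeIndex(
    ["2017-01-02", "2017-01-05", "2017-01-10", "2017-01-11", "2017-01-12"],
    name="REPORTED_DATE",
)
demo = pd.DataFrame({"offense": ["a", "b", "c", "d", "e"]}, index=idx)

# .resample by itself returns a resampler object; nothing is counted yet
print(type(demo.resample("W")).__name__)

# 1) String alias: 'W' means weeks anchored on Sunday ('W-SUN')
by_alias = demo.resample("W").size()

# 2) Equivalent DateOffset object: weekday=6 is Sunday (Monday=0)
by_offset = demo.resample(pd.offsets.Week(weekday=6)).size()

print(by_alias)
print(by_alias.equals(by_offset))
```

The alias is shorter to type; the offset object is useful when the anchor day or multiple needs to be built programmatically.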