Summarizing data
Summarizing data is one of the most important tasks in data analysis: it is the step where an analyst reduces a large amount of data to a few aggregates that summarize it. First, you will learn the basics of data aggregation with pandas. Then, you will move on to a more advanced topic: pivot tables.
Grouping and aggregation
In general, a dataset holds a single observation per row, which means that you can end up with datasets comprising millions of rows. Of course, analyzing a few dozen rows is not the same as analyzing millions of them. In these situations, grouping rows by common variables and summarizing each group is a good solution.
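As a minimal sketch of this idea, the snippet below groups a small table by one column and aggregates another. The DataFrame and its column names (`store`, `year`, `sales`) are hypothetical and made up for illustration; they are not taken from any file discussed in this chapter.

```python
import pandas as pd

# Hypothetical data for illustration only: one observation per row.
sales = pd.DataFrame(
    {
        "store": ["A", "A", "B", "B", "B"],
        "year": [2019, 2020, 2019, 2020, 2021],
        "sales": [100, 150, 200, 250, 300],
    }
)

# Group rows that share the same store, then sum the sales within each group.
total_by_store = sales.groupby("store")["sales"].sum()

# Compute several aggregates at once: one output column per aggregate.
summary_by_store = sales.groupby("store")["sales"].agg(["sum", "mean", "count"])

print(total_by_store)
print(summary_by_store)
```

Here five rows collapse into one row per store. The same pattern applies unchanged to a table with millions of rows, which is what makes grouping useful for large datasets.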
Consider the following example. You are given a file containing the yearly sales of a number of stores, as follows:
And you have been asked to summarize the sales for each store, which should...