Grouping data is central to drawing conclusions during the initial exploratory analysis phase. For example, given a retail dataset with variables such as OrderID, CustomerID, Shipping Date, Product Category, Sales Region, Quantity Ordered, Cancelation Status, Total Sales, Profit, Discount, and others, grouping the data and aggregating it helps you answer questions such as the following:
- Which region was the most profitable?
- Which product category had the most cancelations?
- What percent of customers contribute to 80% of the profit?
Grouping splits the data by category, and aggregation then summarizes each group. Aggregation may involve operations such as count, sum, or mean, or a more complex user-defined function. The groupby method of pandas handles the grouping, and it works much like the GROUP BY clause in SQL.
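As a minimal sketch of how the three questions above might be answered with groupby, consider the small example below. The DataFrame, its column names (SalesRegion, ProductCategory, Cancelled, and so on), and all of its values are made up for illustration and do not come from any real retail dataset.

```python
import pandas as pd

# Hypothetical toy data: column names and values are illustrative assumptions.
orders = pd.DataFrame({
    "CustomerID": ["C1", "C2", "C3", "C1", "C4", "C2"],
    "SalesRegion": ["East", "West", "East", "South", "West", "East"],
    "ProductCategory": ["Office", "Tech", "Tech", "Office", "Furniture", "Tech"],
    "Cancelled": [False, True, False, False, True, True],
    "Profit": [120.0, 80.0, 200.0, 50.0, 30.0, 20.0],
})

# Which region was the most profitable? Sum the profit within each region.
profit_by_region = orders.groupby("SalesRegion")["Profit"].sum()
most_profitable_region = profit_by_region.idxmax()

# Which product category had the most cancelations?
# Summing a boolean column counts the True values in each group.
cancelations = orders.groupby("ProductCategory")["Cancelled"].sum()
most_cancelled_category = cancelations.idxmax()

# What percent of customers contribute to 80% of the profit?
# Rank customers by total profit, then find how many are needed
# for the cumulative share to reach 80%.
profit_by_customer = (
    orders.groupby("CustomerID")["Profit"].sum().sort_values(ascending=False)
)
cumulative_share = profit_by_customer.cumsum() / profit_by_customer.sum()
customers_needed = int((cumulative_share < 0.8).sum()) + 1
pct_customers = 100 * customers_needed / len(profit_by_customer)

# A user-defined aggregation: the spread of profit within each region.
profit_range = orders.groupby("SalesRegion")["Profit"].agg(
    lambda s: s.max() - s.min()
)

print(most_profitable_region, most_cancelled_category, pct_customers)
```

The pattern is the same each time: choose the grouping column, select the column to summarize, and apply an aggregation. The SQL counterpart of the first query would be roughly SELECT SalesRegion, SUM(Profit) FROM orders GROUP BY SalesRegion.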