Grouping the data – aggregation, filtering, and transformation
In this section, you will learn how to aggregate data over categorical variables. This is a very common practice when a dataset contains categorical variables: grouping the rows by category lets us analyse each category separately and use the results to guide later modelling decisions.
To illustrate grouping and aggregation, let's create a simple dummy data frame with a mix of numerical and categorical variables. We will use what we have learned so far about random number generation to create this data frame, as shown in the following snippet:
import numpy as np
import pandas as pd

# Categories to sample from
a=['Male','Female']
b=['Rich','Poor','Middle Class']

# Pick a random gender and socio-economic class for each of 100 rows
gender=[]
seb=[]
for i in range(1,101):
    gender.append(np.random.choice(a))
    seb.append(np.random.choice(b))

# Numerical columns: normally distributed values, scaled and shifted
height=30*np.random.randn(100)+155
weight=20*np.random.randn(100)+60
age=10*np.random.randn(100)+35
income=1500*np.random.randn(100)+15000

df=pd.DataFrame({'Gender...
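As a rough preview of what category-wise aggregation looks like, here is a minimal sketch on a small hand-written data frame rather than the random one above, so that the result is reproducible. The frame name toy and the column names Gender, Height, and Income are assumptions made for this illustration only.

```python
import pandas as pd

# Toy data written by hand for illustration only (not the random frame above).
toy = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male"],
    "Height": [170.0, 160.0, 164.0, 180.0],
    "Income": [15000, 14000, 16000, 17000],
})

# Split the rows by the categorical column, then average each numeric column
# within every group. The result has one row per category.
mean_by_gender = toy.groupby("Gender")[["Height", "Income"]].mean()
print(mean_by_gender)
```

The same pattern, groupby on a categorical column followed by an aggregate such as mean, sum, or count, should carry over to the larger random data frame once it has been built.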