Data aggregation with dplyr
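Data aggregation refers to a set of techniques that summarize a dataset at an aggregate level, characterizing the original data at a higher level. Data transformation operates at the row level for both its input and its output. Aggregation differs: it takes individual rows as input and returns fewer rows as output, typically one summary row per group or a single row for the whole dataset.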
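The following is a minimal sketch of this difference, assuming dplyr is installed and using the built-in iris dataset. The column names `petal_ratio` and `mean_sepal_length` are hypothetical, chosen only for illustration. A transformation with mutate() keeps one output row per input row. An aggregation with group_by() and summarise() collapses the rows to one per group.

```r
library(dplyr)

# Transformation: mutate() adds a column (hypothetical name `petal_ratio`)
# but keeps one output row for every input row
iris_ratio <- iris %>%
  mutate(petal_ratio = Petal.Length / Petal.Width)

# Aggregation: group_by() + summarise() collapses the rows to one per group
# (hypothetical column name `mean_sepal_length`)
iris_means <- iris %>%
  group_by(Species) %>%
  summarise(mean_sepal_length = mean(Sepal.Length))

nrow(iris)        # 150 input rows
nrow(iris_ratio)  # still 150 rows
nrow(iris_means)  # 3 rows, one per species
```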
We have already encountered a few aggregations, such as calculating the mean of a column. This section covers some of the most widely used aggregation functions provided by dplyr. We will start with the count() function, which returns the number of observations (rows) for each category of the specified input column.
Counting observations using the count() function
The count() function automatically groups the dataset into categories according to the input argument and returns the number of observations in each category. The input argument can include one or more columns of the dataset. Let’s go through an exercise and apply it to the iris dataset.
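As a rough illustration, the sketch below applies count() to iris, assuming dplyr is installed. Because iris has only one categorical column, `long_petal` is a hypothetical helper column created here so that we can count by two columns at once. The last part spells out the grouping that count() performs implicitly, using group_by() and summarise() with n().

```r
library(dplyr)

# Count the rows in each category of Species
species_counts <- iris %>% count(Species)
species_counts  # 3 rows: setosa, versicolor, virginica, each with n = 50

# Counting by more than one column: first derive a second categorical column.
# `long_petal` is a hypothetical helper column created only for this example.
two_way_counts <- iris %>%
  mutate(long_petal = Petal.Length > 4) %>%
  count(Species, long_petal, sort = TRUE)  # sort = TRUE puts the largest n first

# count(Species) is shorthand for grouping by Species and counting rows per group
manual_counts <- iris %>%
  group_by(Species) %>%
  summarise(n = n())
```

The counts in two_way_counts still add up to the 150 rows of iris, because every row falls into exactly one combination of Species and long_petal.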