Summarizing data by category
The summarize() function reduces the columns of a dataframe to summary values. The arguments to the summarize() function are expressions that create new variables as functions of the rows of other columns. Here are a couple of examples of possible arguments to the summarize() function:
avg.column.1 = mean(column.1)
sum.column.2 = sum(column.2)
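To see these arguments in context, here is a minimal sketch that passes them to summarize() on a small toy dataframe. The dataframe df and its values are made up for illustration; only the column names come from the example arguments above.

```r
library(dplyr)

# Toy dataframe: the values are invented for illustration only.
df <- data.frame(
  column.1 = c(2, 4, 6),
  column.2 = c(10, 20, 30)
)

# Each argument names a new variable and computes it from all rows
# of an existing column.
result <- summarize(
  df,
  avg.column.1 = mean(column.1),
  sum.column.2 = sum(column.2)
)

print(result)
```

Because no grouping is in effect, the result has a single row: avg.column.1 is 4 and sum.column.2 is 60.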
The group_by() function causes all subsequent operations to be performed by group. The arguments to the group_by() function are the names of the columns that the result should be grouped by. When the group_by() function is followed by the summarize() function, the summary is computed separately for each unique group.
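As a rough sketch of this behavior, the following groups a toy dataframe by one column and then summarizes another. The names df, category, value, and avg.value are hypothetical and the values are invented; this is not part of dplyr_intro.R.

```r
library(dplyr)

# Toy dataframe: 'category' and 'value' are hypothetical column names.
df <- data.frame(
  category = c("a", "a", "b", "b", "b"),
  value    = c(1, 3, 10, 20, 30)
)

# Group the rows by category, then compute one mean per group.
grouped <- group_by(df, category)
by_category <- summarize(grouped, avg.value = mean(value))

print(by_category)
```

The result has one row per unique value of category: the mean is 2 for group "a" and 20 for group "b".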
The best way to understand the group_by() function is with a demonstration. In the following continuation of dplyr_intro.R, the fuel economy data is grouped by year and summarized by the mean value of barrels08. Additionally, the filter() function is used to filter the data to include only Toyota Camry models.
Note
The barrels08...