Introduction
Most R users will agree that data frames provide a flexible and expressive structure for tabular data. While data frames are effective for small datasets, they are not ideal for processing data larger than a gigabyte in size. Additionally, it is not easy to summarize data within the data frame itself; we need to load an additional package, such as plyr or reshape2, to perform advanced aggregation. Therefore, we would like to introduce how to use data.table and dplyr to compute descriptive statistics.
We first illustrate what these two packages do:
data.table: This is an extension of data.frame; it provides the ability to quickly aggregate and process large datasets. Additionally, it provides a much more readable and less confusing syntax compared to data frames.
dplyr: This provides users with SQL-like functions so that we can quickly aggregate and summarize data from various sources.
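As a minimal sketch of what the descriptions above mean in practice, the following computes the same grouped summary with each package. It assumes both packages are installed; the data frame df and its columns group and value are made up for illustration and do not come from the text.

```r
library(data.table)
library(dplyr)

# Small illustrative dataset (hypothetical, invented for this sketch)
df <- data.frame(
  group = c("a", "a", "b", "b", "b"),
  value = c(1, 3, 2, 4, 6)
)

# data.table: in DT[i, j, by], the expression j is evaluated once per level of by;
# .N is the number of rows in the current group
dt <- as.data.table(df)
dt_summary <- dt[, .(mean_value = mean(value), n = .N), by = group]

# dplyr: group_by() followed by summarise() reads much like SQL's GROUP BY;
# n() counts the rows in the current group
dplyr_summary <- df %>%
  group_by(group) %>%
  summarise(mean_value = mean(value), n = n())

print(dt_summary)
print(dplyr_summary)
```

Both calls return one row per group, with a mean of 2 for group a (two rows) and 4 for group b (three rows). The data.table version expresses the whole aggregation inside a single bracket call, while the dplyr version chains one verb per step.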
These two packages can help users quickly and easily generate descriptive statistics...