Using data.table to manipulate data
In the first section, we reviewed some built-in functions used to manipulate data frames. Then, we introduced sqldf, which makes simple data queries and summaries easier. However, both approaches have their limitations. Built-in functions can be verbose and slow, and with sqldf it is not easy to summarize data because SQL is not as powerful as the full spectrum of R functions.
The data.table package provides a powerful, enhanced version of data.frame. It is very fast and can handle large data, as long as that data fits into memory. It introduces a natural syntax for data manipulation using []. Run the following command to install the package from CRAN if you don't have it yet:
install.packages("data.table")
Once the package is successfully installed, we will load the package and see what it offers:
library(data.table)
##
## Attaching package: 'data.table'
## The following objects are masked from 'package:reshape2': ...
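To give a first impression of the [] syntax mentioned earlier, here is a minimal sketch. The table dt and its columns id, group, and value are hypothetical and invented purely for illustration. The sketch only assumes that data.table is installed, and it shows row filtering, a grouped summary, and both combined in one call:

```r
library(data.table)

# Hypothetical toy data, invented purely for illustration
dt <- data.table(
  id = 1:6,
  group = c("a", "b", "a", "b", "a", "b"),
  value = c(10, 20, 30, 40, 50, 60)
)

# First argument (i): keep only the rows where value exceeds 15
big <- dt[value > 15]

# Second argument (j) with by: mean of value within each group
means <- dt[, .(mean_value = mean(value)), by = group]

# i, j, and by combined: count the filtered rows per group
counts <- dt[value > 15, .N, by = group]
```

Columns are referred to by name inside [], without quoting them or repeating dt$, which is a large part of what keeps the expressions short compared with the equivalent built-in calls.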