To the memory limits and beyond
We will start off by introducing you to three very useful and versatile packages which facilitate out-of-memory data processing: ff, ffbase, and ffbase2.
Data transformations and aggregations with the ff and ffbase packages
Although the ff package, authored by Adler, Gläser, Nenadic, Oehlschlägel, and Zucchini, is several years old, it remains a popular solution for large data processing with R. The title of the package, Memory-efficient storage of large data on disk and fast access functions, roughly explains what it does. It chunks the dataset and stores it on a hard drive, while the ff data structure (or the ffdf data frame), which is held in RAM like other R data structures, provides a mapping to the partitioned dataset. The chunks of raw data are simply binary flat files in native encoding, whereas the ff objects keep the metadata, which describe and link to the created binary files. Creating ff structures and binary files from the raw data does...