Data manipulation in R
For students working with carefully prepared data sets from R packages on relatively small problems, data manipulation is rarely a major concern. In the daily practice of a data scientist, however, most of the time spent on an analysis does not go into applying a suitable function to already prepared data. Most of the work is data manipulation: collecting data from several sources, shaping it into a suitable format, and extracting the relevant information. Data manipulation is therefore core work, and data scientists and statisticians should have strong data manipulation skills.
Whenever you work with data frames, the package dplyr offers user-friendly and computationally efficient tools. A package that supports even more efficient data manipulation is data.table (Dowle et al., 2015). Since both packages have their advantages, we cover both. Also, data.table works with two-dimensional data...