dplyr versus data.table
You might now be wondering, "which package should we use?"
The dplyr and data.table packages offer strikingly different syntax and a somewhat less decisive difference in performance. Although data.table seems to be slightly more efficient on larger datasets, there is no clear winner here, except when aggregating over a large number of groups. And to be honest, the pipe syntax used with dplyr, which is provided by the magrittr package, can also be used with data.table objects if needed.
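To make the syntax difference concrete, here is a minimal sketch that runs the same grouped aggregation in both packages and then pipes a data.table object through the magrittr operator. The toy data and the column names (group, value, mean_value) are made up for illustration and do not come from the datasets used elsewhere in the text.

```r
library(data.table)
library(dplyr)   # dplyr re-exports the magrittr pipe, %>%

# Toy data: hypothetical columns, for illustration only
df <- data.frame(
  group = c("a", "a", "b", "b", "b"),
  value = c(1, 2, 3, 4, 5)
)
dt <- as.data.table(df)

# dplyr: verbs chained with the pipe
res_dplyr <- df %>%
  group_by(group) %>%
  summarise(mean_value = mean(value))

# data.table: the same aggregation in its bracket syntax
res_dt <- dt[, .(mean_value = mean(value)), by = group]

# The pipe can also be used with a data.table object;
# the dot stands for the object on the left-hand side
res_piped <- dt %>%
  .[, .(mean_value = mean(value)), by = group]
```

All three results hold the same group means; which form reads better is largely a matter of taste and of what the rest of your code base already uses.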
Also, there is another R package that provides pipes, called pipeR, which claims to be a lot more efficient on larger datasets than magrittr. This performance gain comes from the fact that the pipeR operators do not try to be smart, unlike the operator in magrittr, which is modeled on the F# language's |> operator. Sometimes, the overhead of the magrittr pipe is estimated to make a call 5-15 times slower than the same call written without any pipes.
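If you want to check the pipe overhead on your own machine, a rough sketch like the following might do. It assumes only that magrittr is installed; the helper names plain and piped are hypothetical, and the timings depend heavily on the machine and on the package versions, so no output is shown.

```r
library(magrittr)

x <- 1:10

# The same computation with and without the pipe
plain <- function() sum(sqrt(x))
piped <- function() x %>% sqrt() %>% sum()

# Both forms return the same value
stopifnot(isTRUE(all.equal(plain(), piped())))

# Repeat each call many times and compare the elapsed time
n <- 1e4
t_plain <- system.time(for (i in seq_len(n)) plain())
t_piped <- system.time(for (i in seq_len(n)) piped())

elapsed <- c(plain = t_plain[["elapsed"]], piped = t_piped[["elapsed"]])
print(elapsed)
```

The overhead is per call, so it matters mostly when a piped expression is evaluated many times on small inputs, as in the loop above; on a single pass over a large dataset it is usually negligible. The %>>% operator from pipeR could be timed the same way if that package is installed.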
One should take into account the community and support...