Other R packages for large scale machine learning
Apart from RHadoop and SparkR, there are several other native R packages specifically built for large-scale machine learning. Here, we give a brief overview of them. Interested readers should refer to CRAN Task View: High-Performance and Parallel Computing with R (reference 10 in the References section of the chapter).
Though R is single-threaded, there exists several packages for parallel computation in R. Some of the well-known packages are Rmpi (R version of the popular message passing interface), multicore, snow (for building R clusters), and foreach. From R 2.14.0, a new package called parallel started shipping with the base R. We will discuss some of its features here.
The parallel R package
The parallel package is built on top of the multicore and snow packages. It is useful for running a single program on multiple datasets such as K-fold cross validation. It can be used for parallelizing in a single machine over multiple CPUs/cores...