RHadoop for using Hadoop from R
RHadoop is a collection of open source packages that lets an R user manage and analyze data stored in the Hadoop Distributed File System (HDFS). In the background, RHadoop translates these operations into MapReduce jobs and runs them on the Hadoop cluster against the data held in HDFS.
The various packages in RHadoop and their uses are as follows:
rhdfs: Using this package, a user can connect to HDFS from R and perform basic actions such as reading, writing, and modifying files.
rhbase: Using this package, a user can connect to an HBase database from R and read, write, and modify tables.
plyrmr: Using this package, an R user can perform common data manipulation tasks such as slicing and dicing datasets. It plays a role similar to that of packages such as plyr or reshape2.
rmr2: Using this package, a user can write MapReduce functions in R and execute them on a Hadoop cluster.
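To make the rmr2 description concrete, the following is a minimal sketch of a MapReduce job written in R. It assumes that rmr2 is already installed, and it switches to the package's local backend so that the job can run without a Hadoop cluster. The input vector and the variable names are made up for illustration; on a real cluster, the input would normally be a path in HDFS rather than a small in-memory vector.

```r
library(rmr2)

# Assumption: use the local backend so that no Hadoop cluster is required.
# On a real cluster, the default "hadoop" backend would be used instead.
rmr.options(backend = "local")

# Toy input, invented for this sketch.
words <- c("r", "hadoop", "r", "hdfs", "hadoop", "r")

# Write the vector to the (local) file system in rmr2's native format.
input <- to.dfs(words)

# Map: emit each word as a key with a count of 1.
# Reduce: sum the counts collected for each key.
result <- mapreduce(
  input  = input,
  map    = function(k, v) keyval(v, 1),
  reduce = function(k, counts) keyval(k, sum(counts))
)

# Read the key-value pairs back into the R session.
word_counts <- from.dfs(result)
counts <- setNames(values(word_counts), keys(word_counts))
print(counts)
```

The map and reduce steps are ordinary R functions, which is the main attraction of rmr2: the same job definition can be tried on the local backend first and then pointed at data in HDFS.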
Unlike the other packages discussed in this book, the packages associated with RHadoop are not available from CRAN. They can be downloaded...