Chapter 4. Hadoop and MapReduce Framework for R
In this chapter, we enter the diverse world of Big Data tools and applications that can be integrated relatively easily with the R language. We will present a set of guidelines and tips on the following topics:
- Deploying cloud-based virtual machines with Hadoop, the ready-to-use Hadoop Distributed File System (HDFS), and MapReduce frameworks
- Configuring your instance/virtual machine to include essential libraries and useful supplementary tools for data management in HDFS
- Managing HDFS using shell/Terminal commands and running a simple MapReduce word count in Java for comparison
- Integrating the R statistical environment with Hadoop on a single-node cluster
- Managing files in HDFS and running simple MapReduce jobs using the rhadoop bundle of R packages
- Carrying out more complex MapReduce tasks on large-scale electricity meter readings datasets on a multi-node HDInsight cluster on Microsoft Azure
However...