This chapter is written to help current R users who are new to Hadoop understand the available integration options and choose which ones to evaluate. As with most things open source, the first consideration is, of course, cost. The good news is that there are multiple free alternatives, and additional capabilities are under development in various open source projects.
We generally see four options for integrating R with Hadoop using an entirely open source stack:
- Install R on workstations and connect to the data in Hadoop
- Install R on a shared server and connect to Hadoop
- Use Revolution R Open
- Execute R inside of MapReduce using the rmr2 package
The following sections walk through each option in detail.