This chapter introduces R and shows how to use it to perform statistical computing on big data with Hadoop. The alternatives range from open source R on workstations to parallelized commercial products such as Revolution R Enterprise. Between these extremes lies a range of options, each with its own strengths in data scalability, performance, capability, and ease of use. The right choice or choices therefore depend on your data size, budget, skills, patience, and governance constraints.
In this chapter, we will summarize the alternatives that use pure open source R, along with some of their advantages. We will also describe options for achieving even greater scale, speed, stability, and ease of development by combining open source and commercial technologies.
In a nutshell...