Chapter 3. Processing – MapReduce and Beyond
In Hadoop 1, the platform had two clear components: HDFS for data storage and MapReduce for data processing. The previous chapter described the evolution of HDFS in Hadoop 2; in this chapter, we'll discuss data processing.
Processing has changed more significantly in Hadoop 2 than storage has, and Hadoop now supports multiple processing models as first-class citizens. In this chapter, we'll explore both MapReduce and other computational models in Hadoop 2. In particular, we'll cover:
- What MapReduce is and the Java API required to write applications for it
- How MapReduce is implemented in practice
- How Hadoop reads data into, and writes data out of, its processing jobs
- YARN, the Hadoop 2 component that allows processing beyond MapReduce on the platform
- An introduction to several computational models implemented on YARN