The previous chapter focused on managing resources on a Hadoop cluster. We walked through the YARN architecture, how applications execute on it, and a few examples. In this chapter, we will look more closely at the MapReduce processing framework and how it has evolved over time. We will break down how MapReduce processing works end to end and identify the major steps involved. This chapter covers the following topics:
- Deep dive into the Hadoop MapReduce framework
- YARN and MapReduce
- MapReduce workflow in the Hadoop framework
- Important MapReduce parameters
- Common MapReduce patterns
- MapReduce examples in our use case
- Optimizing MapReduce
- MapReduce command reference