MapReduce is a framework used to compute a large amount of data in a Hadoop cluster. MapReduce uses YARN to schedule the mappers and reducers as tasks, using the containers.Â
An example of a MapReduce job to count frequencies of words is shown in the following diagram:
MapReduce works closely with YARN to plan the job and the various tasks in the job, requests computing resources from the cluster manager (resource manager), schedules the execution of the tasks on the compute resources of the cluster, and then executes the plan of execution. Using MapReduce, you can read write many different types of files of varying formats and perform very complex computations in a distributed manner. We will see more of this in the next chapter on MapReduce frameworks.