Running a simple MapReduce program
In this recipe, we will look at how to make sense of the data stored on HDFS and extract useful information from the files, such as the number of occurrences of a string or a pattern, estimations, and various benchmarks. For this purpose, we will use MapReduce, a computation framework that helps us answer many questions we might have about the data.
With Hadoop, we can process huge amounts of data. However, to get an understanding of how it works, we'll start with a simple program, such as pi estimation or a word count example, as shown in the commands that follow.
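As a quick illustration, the following commands run the example jobs that ship with Hadoop. The exact path to the examples JAR varies by Hadoop version and distribution, and the HDFS paths and the words.txt file used here are placeholders:

# Estimate pi using 16 map tasks with 1,000 samples each
yarn jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi 16 1000

# Word count: copy a local file into HDFS, run the job, and read the result
hdfs dfs -mkdir -p /user/hadoop/wordcount/input
hdfs dfs -put words.txt /user/hadoop/wordcount/input
yarn jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar wordcount \
    /user/hadoop/wordcount/input /user/hadoop/wordcount/output
hdfs dfs -cat /user/hadoop/wordcount/output/part-r-00000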
The ResourceManager is the master for Yet Another Resource Negotiator (YARN). The NameNode stores the file metadata, while the actual blocks/data reside on the slave nodes, called DataNodes. All jobs are submitted to the ResourceManager, which then assigns tasks to its slaves, called NodeManagers.
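If you want to confirm that these daemons are up and registered before submitting a job, the following standard commands list the NodeManagers known to the ResourceManager and the DataNodes known to the NameNode; the hostnames in the output will depend on your cluster:

# List the NodeManagers registered with the ResourceManager
yarn node -list

# Report the DataNodes known to the NameNode
hdfs dfsadmin -report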
When a job is submitted to the ResourceManager (RM), it checks which job queue the job was submitted to and whether the user has permissions...
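As a side note, the target queue can be specified at submission time through the mapreduce.job.queuename property. A minimal sketch, assuming a queue named dev exists in your scheduler configuration:

# Submit the pi example to the dev queue (generic -D options go before the program arguments)
yarn jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi \
    -Dmapreduce.job.queuename=dev 16 1000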