We have now gained some hands-on experience with the MapReduce framework. Recall that MapReduce includes a service called the Job Tracker, which does the following:
- It gets information from the NameNode to determine where the data required by the client program resides
- It coordinates with the respective Task Trackers to assign each job's tasks to them
- It monitors the life cycle of a Task Tracker's activities
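The three responsibilities above can be pictured with a small toy model. This is only an illustrative sketch, not Hadoop code: the class `ToyJobTracker`, its methods, the node names, and the 10-second timeout are all hypothetical, and a plain dictionary stands in for the block metadata that the NameNode would supply.

```python
from dataclasses import dataclass, field


@dataclass
class ToyJobTracker:
    # Hypothetical stand-in for NameNode metadata: block id -> hosts holding a replica.
    block_locations: dict
    assignments: dict = field(default_factory=dict)     # block id -> chosen host
    last_heartbeat: dict = field(default_factory=dict)  # host -> last heartbeat time

    def assign(self, blocks, live_trackers):
        """Assign one task per block, preferring a live tracker that holds the data."""
        for block in blocks:
            local = [h for h in self.block_locations.get(block, [])
                     if h in live_trackers]
            # Fall back to any live tracker when no replica holder is available.
            self.assignments[block] = local[0] if local else sorted(live_trackers)[0]
        return self.assignments

    def heartbeat(self, host, now):
        """Record that a Task Tracker reported in at time `now`."""
        self.last_heartbeat[host] = now

    def dead_trackers(self, now, timeout=10):
        """Return trackers that have been silent for longer than `timeout` (assumed value)."""
        return sorted(h for h, t in self.last_heartbeat.items() if now - t > timeout)


tracker = ToyJobTracker({"b1": ["node1", "node2"], "b2": ["node3"]})
plan = tracker.assign(["b1", "b2", "b3"], {"node2", "node3"})
tracker.heartbeat("node2", now=0)
tracker.heartbeat("node3", now=8)
stale = tracker.dead_trackers(now=12)
```

Even in this toy version, one object looks up data locations, schedules work, and tracks the health of every worker, so all of that bookkeeping grows with the number of jobs and nodes.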
In a small setup, it seems perfectly acceptable for a single machine to handle all of these activities. Consider instead a production environment with a large cluster, where thousands of jobs are running. There, the single Job Tracker must carry out all of these activities for every job at once, which places a heavy burden on it and makes for an inelegant design. This way of working can also lead to a performance...