Since its inception, Hadoop has consisted of two major parts: the storage part, known as the Hadoop Distributed File System (HDFS), and the processing part, known as MapReduce. In the previous chapter, we discussed HDFS, its architecture, and its internals. In Hadoop version 1, MapReduce is the only type of job that can be submitted to and executed on Hadoop. Today, real-time and near real-time processing are often favored over batch processing. Thus, there is a need for a generic application executor and resource manager that can schedule and execute all types of applications, including MapReduce, in real time or near real time. In this chapter, we will learn about YARN (Yet Another Resource Negotiator), which fills this role, and will cover the following topics:
- YARN architecture
- YARN job...