Batch layer for data processing
The core strength of Hadoop technology has been its ability to run large batch workloads quickly and efficiently. It proved to be a big success in solving some of the more complex long-running batch-processing problems within organizations. The initial implementations of Hadoop were based on open source Hadoop distributions; however, as the need for professional support grew, a number of features were incorporated to make it feasible for enterprise use in terms of provisioning, management, monitoring, and alerting. This resulted in more customized distributions led by MapR, Cloudera, and Hortonworks:
Figure 03: The Hadoop 1 framework
As shown in this figure, the Hadoop 1 framework can be broadly classified into Storage and Processing. Storage here is represented by the Hadoop Distributed File System (HDFS), while processing is represented by the MapReduce API. Hadoop 2 introduced many improved capabilities, with the...
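The MapReduce processing model mentioned above can be illustrated in miniature. The following is a minimal, single-process sketch of the classic word-count job, not Hadoop's actual Java API: the map phase emits key-value pairs, a shuffle step groups them by key (as the framework does between phases), and the reduce phase aggregates each group.

```python
from collections import defaultdict

def map_phase(lines):
    # Mapper: emit a (word, 1) pair for every word in every input line
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle/sort: group all values by key, mimicking what the
    # framework does between the map and reduce phases
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reducer: sum the counts emitted for each word
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["the quick brown fox", "the lazy dog", "the fox"]
result = reduce_phase(shuffle(map_phase(lines)))
print(result["the"])  # 3
print(result["fox"])  # 2
```

In a real Hadoop job, each phase runs in parallel across the cluster, with HDFS supplying the input splits and persisting the output; the programming model, however, is exactly this map-shuffle-reduce pipeline.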