In this chapter, we looked at various aspects of Apache Hadoop. We discussed the main components of Hadoop: the Hadoop Distributed File System (HDFS), the MapReduce framework, and YARN. Along the way, we did some practical work by executing basic HDFS commands. We also developed a program that calculates a bill summary using the MapReduce framework, keeping the code easy to understand.
Then, we discussed other projects under the umbrella of the Apache Software Foundation: Apache ZooKeeper, Apache Kafka, Apache Flume, Apache Cassandra, Apache HBase, and Apache Spark. These projects are part of the Hadoop ecosystem. Some of them, such as Kafka and Flume, focus on bringing data into Hadoop, while others, such as Spark, focus on processing that data. The important lesson here is that although some projects may appear similar, their uses and architectures differ...