Hadoop on Mesos
This section introduces Hadoop, explains how to set up the Hadoop stack on Mesos, and discusses the problems commonly encountered during setup.
Introduction to Hadoop
Hadoop was developed by Mike Cafarella and Doug Cutting in 2006 to support distribution for the Nutch search engine project. The project was named after a toy elephant belonging to Doug's son.
The following modules make up the Apache Hadoop framework:
Hadoop Common: This contains the common libraries and utilities required by the other modules
Hadoop Distributed File System (HDFS): This is a distributed, scalable filesystem capable of storing petabytes of data on commodity hardware
Hadoop YARN: This is a resource manager for cluster resources (similar to Mesos)
Hadoop MapReduce: This is a processing model for parallel data processing at scale
MapReduce
MapReduce is a processing model that allows large amounts of data to be processed in parallel on distributed, commodity hardware-based infrastructure reliably and in a fault...
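To make the processing model concrete, the following is a minimal single-process sketch of a word count written in the MapReduce style. It is plain Python and does not use the Hadoop API; the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample input are hypothetical and chosen only for illustration. In a real cluster, the framework would run the map and reduce steps on many nodes and perform the grouping step itself.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group all emitted values by key, between the map and reduce steps."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample input: each string stands in for one input record.
counts = word_count(["mesos runs hadoop", "hadoop runs mapreduce"])
print(counts)
```

Because each call to map_phase depends only on its own line, and each call to reduce_phase depends only on the values for its own key, both steps could be spread across many machines without changing the result. That independence is what makes the model suitable for parallel processing.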