Summary
In this chapter, we introduced four tools to ease development on Hadoop. In particular, we covered:
How Hadoop Streaming allows us to write MapReduce jobs in dynamic languages such as Python
How Kite Data simplifies interfacing with heterogeneous data sources
How Apache Crunch provides a high-level abstraction to write pipelines of Spark and MapReduce jobs that implement common design patterns
How Morphlines allows us to declare chains of commands and data transformations that can then be embedded in any Java codebase
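As a brief reminder of the first point, the following is a minimal sketch of a word count written for Hadoop Streaming in Python. It is not the code used earlier in the chapter; the function names map_lines and reduce_lines and the single-script layout are assumptions made for illustration. The sketch relies only on the streaming contract: records arrive on standard input, tab-separated key/value pairs are written to standard output, and the reducer receives its input sorted by key.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Emit one 'word<TAB>1' record for every word in the input lines."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(lines):
    """Sum the counts per word; assumes the input is already sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


# Hypothetical entry point: the stage ("map" or "reduce") is passed as the
# first command-line argument, so one file can serve as mapper and reducer.
if __name__ == "__main__" and len(sys.argv) > 1:
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for record in step(sys.stdin):
        print(record)
```

Under these assumptions, the same script could be supplied to the streaming jar through its -mapper and -reducer options, with "map" or "reduce" as the argument. Because the logic lives in plain generator functions, it can also be exercised locally on a list of strings before it is submitted to a cluster.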
In Chapter 10, Running a Hadoop 2 Cluster, we will shift our focus from the domain of software development to system administration. We will discuss how to set up, manage, and scale a Hadoop cluster, while taking aspects such as monitoring and security into consideration.