Summary
In this chapter, we introduced four tools to ease development on Hadoop. In particular, we covered:
- How Hadoop Streaming allows MapReduce jobs to be written in dynamic languages (a short Python sketch follows this list)
- How Kite Data simplifies interfacing with heterogeneous data sources
- How Apache Crunch provides a high-level abstraction for writing pipelines of Spark and MapReduce jobs that implement common design patterns
- How Morphlines allows us to declare chains of commands and data transformations that can then be embedded in any Java codebase
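To recap the flavor of the Streaming API, the following is a minimal word-count sketch in Python. Hadoop Streaming talks to the script over stdin/stdout: the mapper receives raw input lines and the reducer receives tab-separated key/value pairs sorted by key. The script name `wc_streaming.py` is hypothetical, chosen here for illustration:

```python
#!/usr/bin/env python
# wc_streaming.py -- a minimal word-count sketch for Hadoop Streaming.
# Run with "map" as the argument for the map phase and "reduce" for
# the reduce phase. The script name and paths are hypothetical.
import sys

def mapper():
    # Emit one "word\t1" pair per word on each input line.
    for line in sys.stdin:
        for word in line.split():
            print('%s\t%d' % (word.lower(), 1))

def reducer():
    # Input arrives sorted by key, so all counts for a given word
    # are contiguous and can be summed with a single pass.
    current_word, count = None, 0
    for line in sys.stdin:
        word, value = line.rstrip('\n').split('\t', 1)
        if word != current_word:
            if current_word is not None:
                print('%s\t%d' % (current_word, count))
            current_word, count = word, 0
        count += int(value)
    if current_word is not None:
        print('%s\t%d' % (current_word, count))

if __name__ == '__main__':
    mapper() if sys.argv[1] == 'map' else reducer()
```

A job built from this script would be submitted with something like `hadoop jar hadoop-streaming.jar -files wc_streaming.py -mapper "python wc_streaming.py map" -reducer "python wc_streaming.py reduce" -input <in> -output <out>`, where the exact location of the streaming JAR depends on the distribution.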
In Chapter 10, Running a Hadoop 2 Cluster, we will shift our focus from the domain of software development to system administration. We will discuss how to set up, manage, and scale a Hadoop cluster, while taking aspects such as monitoring and security into consideration.