We have now developed an understanding of how a program is executed in the Hadoop environment, how resources are allocated to it, and how Hadoop handles stored data. That material mostly concerns cluster management and custom application processing, but many other Apache projects also help with big data processing. We will now briefly go through some of these projects, as we will use several of them in the examples and programs later in this book.
Apache Projects related to big data
Apache ZooKeeper
ZooKeeper is an open source Apache project that provides centralized infrastructure and services for synchronization across a cluster. It is...