Introduction
In this chapter, we will look at cluster planning and some of the important aspects of cluster utilization.
Although this is a recipe book, it helps to understand the Hadoop cluster layout, network components, operating system, disk arrangement, and memory. We will cover some of the fundamental concepts of cluster planning and a few formulas for estimating cluster size.
Let's say we are ready with our big data initiative and want to take the plunge into the Hadoop world. Among the first concerns are these: What size of cluster do we need? How many nodes, and with what configurations? What will the roadmap be for the software/application stack, and what will the initial investment be? What hardware should we choose? Should we go with the vanilla Hadoop distribution or with a vendor-specific Hadoop distribution?
There are no straightforward answers to these questions, and no magic formulas. Often, these decisions are influenced by market statistics, or by...