Alternative distributions
Way back in Chapter 2, Getting Up and Running, we went to the Hadoop homepage from which we downloaded the installation package. Odd as it may seem, this is far from the only way to get Hadoop. Odder still may be the fact that most production deployments don't use the Apache Hadoop distribution.
Why alternative distributions?
Hadoop is open source software. Anyone can make their own release of it, provided they comply with the Apache Software License that governs Hadoop. Alternative distributions have been created for two main reasons.
Bundling
Some providers build a pre-bundled distribution that contains not only Hadoop but also other projects, such as Hive, HBase, Pig, and many more. Installing most of these projects is rarely difficult on its own; HBase is the exception, having historically been harder to set up by hand. The real risk lies in subtle version incompatibilities between projects, which may not surface until a particular production workload hits the system...