Starting a standalone Hadoop cluster
We'll begin by setting up a simple standalone cluster: downloading an Apache Hadoop release, configuring it, and verifying that we can use it. Then, we will reproduce the same configuration and access methods in an Apache Karaf container. We will use an external cluster to show how Apache Karaf can spin up a new job engine against a large existing cluster. With the features we have deployed and will deploy, you can also embed an HDFS filesystem in a Karaf container.
Download a Hadoop release from one of the Apache mirrors at http://hadoop.apache.org/releases.html#Download. At the time of writing, the latest release is 2.4.0. A full walkthrough of setting up a cluster can be found at http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html.
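As a sketch of the download-and-verify step, the commands below fetch and unpack the 2.4.0 release; the mirror URL and the `JAVA_HOME` path are assumptions for illustration, so substitute the mirror you chose from the releases page and the Java installation on your system:

```shell
# Fetch the 2.4.0 release tarball (example mirror; pick one from the
# releases page above) and unpack it.
wget http://archive.apache.org/dist/hadoop/common/hadoop-2.4.0/hadoop-2.4.0.tar.gz
tar -xzf hadoop-2.4.0.tar.gz
cd hadoop-2.4.0

# Hadoop requires JAVA_HOME to be set; adjust for your installation.
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64

# Sanity check: this should print the Hadoop version and build details.
bin/hadoop version
```

If `bin/hadoop version` reports 2.4.0, the release is unpacked correctly and you can move on to configuration.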
The following are the changes you need to make to get HDFS up and running and talking to a locally installed node, replication...
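For a single-node setup, the single-cluster guide linked earlier boils the changes down to two configuration files under `etc/hadoop/`. The following is a minimal sketch based on that guide, pointing the filesystem at a local NameNode and dropping the replication factor to 1 (the port and replication value here follow the guide's defaults):

```xml
<!-- etc/hadoop/core-site.xml: point clients at the local NameNode -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```

```xml
<!-- etc/hadoop/hdfs-site.xml: a single node cannot replicate blocks,
     so reduce the replication factor from the default of 3 to 1 -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```

With these in place, you would format the NameNode (`bin/hdfs namenode -format`) and start HDFS with `sbin/start-dfs.sh`, as described in the walkthrough.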