Getting Apache Hadoop
The official page for Apache Hadoop is http://hadoop.apache.org, where you can find in-depth documentation, manuals, and releases of Apache Hadoop. Hadoop is written in Java, so a Java Virtual Machine (JVM) must be installed on the machine that hosts your single-node setup. Hadoop is supported on both GNU/Linux and Windows.
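If you plan to install Hadoop yourself rather than use a prebuilt VM, it can help to confirm that a JVM is available first. The following is a minimal sketch, assuming Python 3.7 or later and a `java` executable reachable through the PATH. The function name `java_version` and its `lookup` and `run` parameters are hypothetical, introduced here only for illustration; they are not part of Hadoop or any Hadoop library.

```python
import shutil
import subprocess


def java_version(lookup=shutil.which, run=subprocess.run):
    """Return the first line reported by `java -version`, or None if java is not on PATH.

    `lookup` and `run` are illustrative hooks (not a Hadoop API) that default to
    the standard library functions and can be swapped out for testing.
    """
    java_path = lookup("java")
    if java_path is None:
        return None
    # `java -version` writes its report to stderr rather than stdout.
    result = run([java_path, "-version"], capture_output=True, text=True)
    output = (result.stderr or result.stdout).strip()
    return output.splitlines()[0] if output else ""


if __name__ == "__main__":
    version = java_version()
    if version is None:
        print("No java executable found on PATH; install a JVM before running Hadoop.")
    else:
        print("Found Java:", version)
```

Running the script prints either the detected Java version line or a reminder to install a JVM. It only checks that Java is present; it does not verify that the version is one your Hadoop release supports.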
Since the purpose of this chapter is to introduce Python programming for Apache Hadoop, a quick way to get our hands on a complete Hadoop ecosystem is ideal. Cloud vendor Cloudera hosts a number of free QuickStart VMs, each containing a single-node Apache Hadoop cluster, complete with sample scripts and ready-made links that help us dive straight into managing our cluster. The following sections describe how to get a Hadoop VM running on your machine.
Getting a QuickStart VM from Cloudera
The Hadoop QuickStart VMs from Cloudera can be downloaded from http://www.cloudera.com/content/support/en/downloads/quickstart_vms.html. The VM image comes installed with the CentOS 6.4 Linux operating...