Using Whirr
Using EMR is not the only way to deploy Hadoop in the cloud. If you prefer more control over the cluster installation and configuration process, you may want to explore other options.
Whirr is an Apache project that was developed to automate setting up and configuring Hadoop clusters in the cloud. Unlike EMR, Whirr is not limited to Amazon EC2 and can also create Hadoop clusters on other cloud providers. At the time of writing, Whirr supports EC2 and Rackspace Cloud.
Installing and configuring Whirr
Whirr is not another Hadoop component. It is a collection of Java programs that help you automate the creation of a Hadoop cluster in the cloud. You can download Whirr from the project's website at:
http://www.apache.org/dyn/closer.cgi/whirr/
Whirr doesn't require any special installation steps. You can download the archive, unpack it, and start using the whirr binary, which can be found in the bin directory.
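The unpack-and-run steps can be sketched as a short shell session. This is a minimal sketch under stated assumptions: the version number 0.8.2 and the resulting archive name are placeholders, so substitute whatever release you actually downloaded from a mirror listed at the URL above. It also assumes a Java runtime is already on your PATH, since Whirr is a set of Java programs.

```shell
# Assumption: 0.8.2 is only an example release; use the version you downloaded.
WHIRR_VERSION="0.8.2"
WHIRR_ARCHIVE="whirr-${WHIRR_VERSION}.tar.gz"
WHIRR_HOME="$PWD/whirr-${WHIRR_VERSION}"

if [ -f "$WHIRR_ARCHIVE" ]; then
  # Unpack the archive into the current directory.
  tar -xzf "$WHIRR_ARCHIVE"
  # Run the whirr binary from the bin directory to confirm it starts.
  "$WHIRR_HOME/bin/whirr" version
else
  # The archive has to be fetched manually from one of the Apache mirrors.
  echo "Archive $WHIRR_ARCHIVE not found; download it from http://www.apache.org/dyn/closer.cgi/whirr/ first."
fi
```

If you plan to use Whirr regularly, you may find it convenient to add the bin directory to your PATH so that the whirr command is available from any location.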
There are several configuration files you need to tune before you can use Whirr to launch clusters...