Chapter 6. Deploying Hadoop to the Cloud
In the previous chapters, we focused on deploying Hadoop on physical servers. This is the most common and generally recommended way to use Hadoop, because a Hadoop cluster performs best when it can fully utilize the available hardware resources. For this reason, running Hadoop on virtualized servers was long not considered good practice. This has been changing over the past few years. More and more people have realized that the main benefit of running Hadoop in the cloud is not performance, but rather the flexibility it offers in provisioning resources. With the cloud, you can create a large cluster of hundreds of compute nodes in very little time, perform the required task, and then destroy the cluster when you no longer need it.

In this chapter, we will describe several options you have when deploying Hadoop in the cloud. We will focus on Amazon Web Services (AWS) since it is the most popular public cloud at the moment, and it also provides some advanced features...