Chapter 13: Migrating On-Premises Hadoop Workloads to Amazon EMR
Throughout the previous chapters, we have explained what Amazon EMR is, what its features are, how it integrates with other AWS services, and how you can build a few batch and streaming ETL pipelines using EMR. If you are just starting your big data analytics journey, you can get started with Amazon EMR and other AWS analytics services right away. However, many customers already run Hadoop and Spark in their on-premises environments and are planning to migrate those workloads to the AWS cloud.
If you have Hive, Spark, or Hadoop workloads running in an on-premises Hadoop cluster, there are several factors you need to consider before migrating to AWS. These include whether EMR supports the Hadoop services and versions you are using, how security will work in AWS, and which migration strategy to follow.
In this chapter, we will walk through possible migration approaches, options for migrating...