Section 3: Implementing Common Use Cases and Best Practices
This part of the book explains how to implement the most common Amazon EMR use cases: batch ETL with Spark, real-time streaming with Spark Streaming, and UPSERT operations on S3 data lakes with Apache Hudi. It then explains how to orchestrate your EMR jobs and how to plan the migration of an on-premises Hadoop cluster to EMR. Finally, it covers best practices and cost optimization techniques you can follow while implementing your data analytics pipeline in EMR.
This section comprises the following chapters:
- Chapter 9, Implementing Batch ETL Pipeline with Amazon EMR and Apache Spark
- Chapter 10, Implementing Real-Time Streaming with Amazon EMR and Spark Streaming
- Chapter 11, Implementing UPSERT on S3 Data Lake with Apache Spark and Apache Hudi
- Chapter 12, Orchestrating Amazon EMR Jobs with AWS Step Functions and Apache Airflow/MWAA
- Chapter 13...