You're reading from Simplify Big Data Analytics with Amazon EMR: A beginner's guide to learning and implementing Amazon EMR for building data analytics solutions

Product type Paperback

Published in Mar 2022

Publisher Packt

ISBN-13 9781801071079

Length 430 pages

Edition 1st Edition

Tools

Amazon EMR

Concepts

Big Data

Author (1):

Sakti Mishra

Table of Contents (19 Chapters)

Preface

1. Section 1: Overview, Architecture, Big Data Applications, and Common Use Cases of Amazon EMR

2. Chapter 1: An Overview of Amazon EMR

3. Chapter 2: Exploring the Architecture and Deployment Options

4. Chapter 3: Common Use Cases and Architecture Patterns

5. Chapter 4: Big Data Applications and Notebooks Available in Amazon EMR

6. Section 2: Configuration, Scaling, Data Security, and Governance

7. Chapter 5: Setting Up and Configuring EMR Clusters

8. Chapter 6: Monitoring, Scaling, and High Availability

9. Chapter 7: Understanding Security in Amazon EMR

10. Chapter 8: Understanding Data Governance in Amazon EMR

11. Section 3: Implementing Common Use Cases and Best Practices

12. Chapter 9: Implementing Batch ETL Pipeline with Amazon EMR and Apache Spark

13. Chapter 10: Implementing Real-Time Streaming with Amazon EMR and Spark Streaming

14. Chapter 11: Implementing UPSERT on S3 Data Lake with Apache Spark and Apache Hudi

15. Chapter 12: Orchestrating Amazon EMR Jobs with AWS Step Functions and Apache Airflow/MWAA

16. Chapter 13: Migrating On-Premises Hadoop Workloads to Amazon EMR

17. Chapter 14: Best Practices and Cost-Optimization Techniques

18. Other Books You May Enjoy

Conventions used

There are a number of text conventions used throughout this book.

Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "For example, the following sample JSON specifies configurations for the core-site and mapred-site classifications and includes Hadoop and MapReduce properties with values that you plan to override in the cluster."

A block of code is set as follows:

    "Properties": {
      "mapred.tasktracker.map.tasks.maximum": "10",
      "mapreduce.map.sort.spill.percent": "0.80",
      "mapreduce.tasktracker.reduce.tasks.maximum": "20"
    }

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

    {
      "Classification": "core-site",
      "Properties": {
        "hadoop.security.groups.cache.secs": "500"
      }
    }
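
The two JSON fragments above read like pieces of a single EMR configuration file. As a minimal sketch, and not the book's own listing, the following Python snippet assembles them into a list of classification objects and serializes the result. The `classification` helper and the `configurations_json` name are hypothetical, and placing the first fragment's properties under the mapred-site classification is an assumption drawn from the quoted sentence about the core-site and mapred-site classifications.

```python
import json


def classification(name, properties):
    """Hypothetical helper: wrap one classification and its properties
    in the object shape used by EMR configuration JSON."""
    return {"Classification": name, "Properties": dict(properties)}


# Values are copied from the two fragments above.
# Assumption: the first fragment's properties belong to "mapred-site".
configurations = [
    classification("core-site", {"hadoop.security.groups.cache.secs": "500"}),
    classification(
        "mapred-site",
        {
            "mapred.tasktracker.map.tasks.maximum": "10",
            "mapreduce.map.sort.spill.percent": "0.80",
            "mapreduce.tasktracker.reduce.tasks.maximum": "20",
        },
    ),
]

# Serialize to JSON text; every property value stays a string.
configurations_json = json.dumps(configurations, indent=2)
print(configurations_json)
```

The printed JSON could be saved to a file (for example, a hypothetical configurations.json) and passed to aws emr create-cluster through its --configurations option.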

Any command-line input or output is written as follows:

    aws emr create-cluster --instance-type m5.2xlarge --release-label emr-6.4.0 --security-configuration <mySecurityConfigName>

Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words in menus or dialog boxes appear in bold. Here is an example: "If you are creating a transient cluster that needs to execute a few steps and then auto terminate, then you can select Step execution for Launch mode."

Tips or Important Notes

Appear like this.