Amazon Elastic MapReduce (EMR) is a managed cluster platform for running big-data and analytics frameworks such as Apache Hadoop, Spark, HBase, Presto, Impala, Cascading, and Flink. Operating Hadoop clusters yourself is complex and time-consuming; EMR provisions the cluster and installs commonly used frameworks for data scientists, analysts, and engineers.
EMR lets you bootstrap your cluster with a series of steps that you define to install and configure software and to prepare your data for processing. For storage, EMR can use the Hadoop Distributed File System (HDFS) on EBS volumes, or EMRFS with Amazon S3 as the persistent backing store.
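As a rough illustration of such a launch, the sketch below builds the request that boto3's EMR client accepts in `run_job_flow`. In the EMR API, the install-and-configure part is expressed as bootstrap actions and the data-processing part as steps. The bucket name, script paths, cluster name, release label, and instance types are all hypothetical placeholders, and the IAM role names are the account defaults, which may not exist in every account. Input and output use `s3://` paths, so the job reads and writes through EMRFS rather than HDFS.

```python
from typing import Any, Dict


def build_cluster_request(bucket: str) -> Dict[str, Any]:
    """Build keyword arguments for boto3's EMR `run_job_flow` call.

    `bucket` and every S3 key below are hypothetical placeholders.
    """
    return {
        "Name": "example-etl-cluster",
        # Assumed release label; choose one that is available in your region.
        "ReleaseLabel": "emr-6.15.0",
        "LogUri": f"s3://{bucket}/logs/",
        "Applications": [{"Name": "Hadoop"}, {"Name": "Spark"}],
        "Instances": {
            "MasterInstanceType": "m5.xlarge",
            "SlaveInstanceType": "m5.xlarge",
            "InstanceCount": 3,
            # Terminate the cluster once all steps have finished.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        # Bootstrap actions run on every node before the applications start.
        "BootstrapActions": [
            {
                "Name": "install-dependencies",
                "ScriptBootstrapAction": {
                    "Path": f"s3://{bucket}/bootstrap/install_deps.sh",
                    "Args": [],
                },
            }
        ],
        # Steps run in order once the cluster is up.
        "Steps": [
            {
                "Name": "spark-etl",
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": [
                        "spark-submit",
                        "--deploy-mode", "cluster",
                        f"s3://{bucket}/jobs/etl.py",
                        "--input", f"s3://{bucket}/raw/",
                        "--output", f"s3://{bucket}/curated/",
                    ],
                },
            }
        ],
        # Default role names; assumed to exist in the account.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def launch_cluster(bucket: str, region: str = "us-east-1") -> str:
    """Submit the request; requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3

    emr = boto3.client("emr", region_name=region)
    response = emr.run_job_flow(**build_cluster_request(bucket))
    return response["JobFlowId"]
```

Calling `launch_cluster("my-bucket")` would create a real, billable cluster, so the request builder is kept separate and can be inspected or unit-tested without touching AWS.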
EMR clusters serve a variety of use cases, from ETL and batch processing to real-time applications that integrate Amazon Firehose or Apache Spark, along with a wide range of connectors and integration architectures. Clusters on EMR can be...