Exporting, importing, querying, and joining tables using Amazon EMR
In this section, you will learn how to use Amazon Elastic MapReduce (Amazon EMR) to run operations on data stored in DynamoDB. Amazon EMR provides a customized version of Apache Hive that includes connectivity to DynamoDB. You can find more information about Apache Hive at https://hive.apache.org/. You will perform actions such as the following:
Loading DynamoDB data tables into the Hadoop Distributed File System (HDFS)
Querying live DynamoDB data
Joining data stored in DynamoDB and exporting or querying it
With Amazon EMR and Hive, you can process large amounts of data, such as the data stored in DynamoDB, quickly and efficiently. Apache Hive is a software layer that runs on top of Hadoop and lets you query a MapReduce cluster using a simple, SQL-like language called HiveQL. To perform the operations listed earlier, you will have to launch the EMR cluster, specify the location of the DynamoDB tables, and provide the Hive command...
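To make the idea of "specifying the location of the DynamoDB tables" more concrete, the following is a minimal sketch in Python that only builds the text of a HiveQL statement; it does not connect to AWS or launch a cluster. The storage handler class and the two table properties are the ones used by the EMR DynamoDB connector. The helper name dynamodb_external_table_ddl, the Hive table hive_orders, the DynamoDB table Orders, and its attributes are hypothetical names chosen for illustration.

```python
def dynamodb_external_table_ddl(hive_table, dynamodb_table, columns):
    """Build a HiveQL statement that maps a Hive table onto a DynamoDB table.

    columns is a list of (hive_column, hive_type, dynamodb_attribute) tuples.
    This helper is illustrative only; it returns a string and runs nothing.
    """
    # Hive column definitions, e.g. "order_id string, total double"
    column_defs = ", ".join(f"{name} {hive_type}" for name, hive_type, _ in columns)
    # Mapping of Hive columns to DynamoDB attributes, e.g. "order_id:OrderId"
    mapping = ",".join(f"{name}:{attr}" for name, _, attr in columns)
    return (
        f"CREATE EXTERNAL TABLE {hive_table} ({column_defs})\n"
        "STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'\n"
        f'TBLPROPERTIES ("dynamodb.table.name" = "{dynamodb_table}",\n'
        f'               "dynamodb.column.mapping" = "{mapping}");'
    )


# Hypothetical table and attribute names, used only for this example.
ddl = dynamodb_external_table_ddl(
    "hive_orders",
    "Orders",
    [("order_id", "string", "OrderId"), ("total", "double", "Total")],
)
print(ddl)
```

The printed statement could then be pasted into a Hive session on the EMR cluster. Once such an external table exists, ordinary HiveQL SELECT and JOIN queries against it read from the live DynamoDB table.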