Accessing the DynamoDB data using AWS EMR
AWS Elastic MapReduce (EMR) has hosted Hadoop as a service from Amazon. As Hadoop has become one of the most important ETL/analytics tools these days, it is very important to know how to access the DynamoDB data from EMR so that we can use it for analytics. In this recipe, we are going to see how to access the DynamoDB data from EMR for analytics/querying.
Getting ready
To get started, you need to have a DynamoDB table created, and you should have data in it. Also, you need to have a secret key created, which will be used to connect to the EMR cluster using Putty or ssh
on the UNIX system. In case you haven't, read the documentation at http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/EMR_SetUp_KeyPair.html.
This generated .pem key
can be converted into a private key (.ppk
), which can be used in Putty. You can refer to the following docs: