Hive and Amazon Web Services
With Elastic MapReduce as the AWS Hadoop-on-demand service, it is of course possible to run Hive on an EMR cluster. But it is also possible to use Amazon storage services, particularly S3, from any Hadoop cluster, whether within EMR or on your own local cluster.
Hive and S3
As mentioned in Chapter 2, Storage, it is possible to specify a default filesystem other than HDFS for Hadoop, and S3 is one option. But it need not be all or nothing; it is possible to have specific tables stored in S3. The data for these tables will be retrieved into the cluster for processing, and any resulting data can be written either to a different S3 location (the same table cannot be both the source and destination of a single query) or onto HDFS.
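One way to achieve this is to point a Hive external table at an S3 location. The following is a minimal sketch, not a definitive recipe: the bucket name, path, and column layout are assumptions chosen to match the tab-separated tweet data used below, and depending on the Hadoop version the URI scheme may need to be s3, s3n, or s3a:

```sql
-- Hypothetical table definition over tweet data stored in S3;
-- column names and types are illustrative assumptions.
CREATE EXTERNAL TABLE tweets (
  created_at STRING,
  tweet_id   STRING,
  tweet_text STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3://<bucket-name>/tweets/';
```

Because the table is external, dropping it in Hive removes only the metadata; the underlying files in S3 are left untouched.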
We can take a file of our tweet data and place it onto a location in S3 with a command such as the following:
$ aws s3 cp tweets.tsv s3://<bucket-name>/tweets/
We first need to specify the access key and secret access key that...