Time for action – running UFO analysis on EMR
Let us explore the use of EMR with Hive by doing some UFO analysis on the platform.
Log in to the AWS management console at http://aws.amazon.com/console.
Every Hive job flow on EMR runs from an S3 bucket, so we need to select the bucket we wish to use for this purpose. Select S3 to see the list of buckets associated with your account, and then choose the bucket from which to run the example. In the example below, we select the bucket called garryt1use.
Use the web interface to create three directories called ufodata, ufoout, and ufologs within that bucket. The resulting list of the bucket's contents should look like the following screenshot:
Double-click on the ufodata directory to open it and within it create two subdirectories called ufo and states.
Create the following as s3test.hql, click on the Upload link within the ufodata directory, and follow the prompts to upload the file:
CREATE EXTERNAL TABLE IF NOT EXISTS ufodata(sighted string, reported...