Launch a Hadoop cluster with Spark installed using Amazon Elastic MapReduce (EMR). Perform the following steps to create an EMR cluster with Spark installed:
- Launch an Amazon EMR cluster.
- Open the Amazon EMR console at https://console.aws.amazon.com/elasticmapreduce/.
- Choose Create cluster:
- Choose an appropriate Amazon AMI version (3.9.0 or later), as shown in the following screenshot:
- In the Applications to be installed field, choose Spark 1.5.2 or later from the list and click on Add.
- Select other hardware options as necessary:
- The Instance Type
- The keypair to be used with SSH
- Permissions
- IAM roles (Default or Custom)
Refer to the following screenshot:
- Click on Create cluster. The cluster will start provisioning, as shown in the following screenshot (an AWS CLI alternative is sketched right after this step):
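If you prefer to script cluster creation rather than use the console, a similar cluster can be launched from the AWS CLI. The following is a minimal sketch using the AMI-version style of invocation that matches this release; the cluster name, key pair, instance type, and instance count are illustrative placeholders, so substitute your own values:
$ aws emr create-cluster --name "SparkCluster" \
    --ami-version 3.9 \
    --applications Name=Spark \
    --ec2-attributes KeyName=rd_spark-user1 \
    --instance-type m3.xlarge \
    --instance-count 3 \
    --use-default-roles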
- Log in to the master node. Once the EMR cluster is ready, you can SSH into it:
$ ssh -i rd_spark-user1.pem hadoop@ec2-52-3-242-138.compute-1.amazonaws.com
The output will be similar to the following listing:
Last login: Wed Jan 13 10:46:26 2016

       __|  __|_  )
       _|  (     /   Amazon Linux AMI
      ___|\___|___|

https://aws.amazon.com/amazon-linux-ami/2015.09-release-notes/
23 package(s) needed for security, out of 49 available
Run "sudo yum update" to apply all updates.
[hadoop@ip-172-31-2-31 ~]$
- Start the Spark Shell:
[hadoop@ip-172-31-2-31 ~]$ spark-shell
16/01/13 10:49:36 INFO SecurityManager: Changing view acls to: hadoop
16/01/13 10:49:36 INFO SecurityManager: Changing modify acls to: hadoop
16/01/13 10:49:36 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
16/01/13 10:49:36 INFO HttpServer: Starting HTTP Server
16/01/13 10:49:36 INFO Utils: Successfully started service 'HTTP class server' on port 60523.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.5.2
      /_/
scala> sc
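The sc value shown here is the SparkContext that the Spark shell creates for you and wires up to the cluster. Before reading any real data, you can run a quick sanity check with a trivial job; the numbers are arbitrary:
scala> val nums = sc.parallelize(1 to 100)
scala> nums.reduce(_ + _)
The reduce action should return 5050, confirming that jobs actually execute on the cluster.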
- Run a basic Spark sample on the EMR cluster:
scala> val textFile = sc.textFile("s3://elasticmapreduce/samples/hive-ads/tables/impressions/dt=2009-04-13-08-05/ec2-0-51-75-39.amazon.com-2009-04-13-08-05.log")
scala> val linesWithCartoonNetwork = textFile.filter(line => line.contains("cartoonnetwork.com")).count()
Your output will be as follows:
linesWithCartoonNetwork: Long = 9
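The same RDD can be reused for further exploration. The following sketch caches the file in memory so that repeated actions do not re-read it from S3, and then runs a simple word-count-style aggregation; splitting on whitespace is an illustrative assumption rather than the actual schema of the impressions logs:
scala> textFile.cache()
scala> val tokenCounts = textFile.flatMap(line => line.split("\\s+")).map(token => (token, 1)).reduceByKey(_ + _)
scala> tokenCounts.take(5)
Here, take(5) returns just the first five (token, count) pairs to the driver, which is a safer way to inspect a large RDD than collect().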