Search icon CANCEL
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Conferences
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Machine Learning with Spark

You're reading from   Machine Learning with Spark Develop intelligent, distributed machine learning systems

Arrow left icon
Product type Paperback
Published in Apr 2017
Publisher Packt
ISBN-13 9781785889936
Length 532 pages
Edition 2nd Edition
Languages
Arrow right icon
Authors (2):
Arrow left icon
Manpreet Singh Ghotra Manpreet Singh Ghotra
Author Profile Icon Manpreet Singh Ghotra
Manpreet Singh Ghotra
Rajdeep Dua Rajdeep Dua
Author Profile Icon Rajdeep Dua
Rajdeep Dua
Arrow right icon
View More author details
Toc

Table of Contents (13) Chapters Close

Preface 1. Getting Up and Running with Spark FREE CHAPTER 2. Math for Machine Learning 3. Designing a Machine Learning System 4. Obtaining, Processing, and Preparing Data with Spark 5. Building a Recommendation Engine with Spark 6. Building a Classification Model with Spark 7. Building a Regression Model with Spark 8. Building a Clustering Model with Spark 9. Dimensionality Reduction with Spark 10. Advanced Text Processing with Spark 11. Real-Time Machine Learning with Spark Streaming 12. Pipeline APIs for Spark ML

Configuring and running Spark on Amazon Elastic Map Reduce

Launch a Hadoop cluster with Spark installed using the Amazon Elastic Map Reduce. Perform the following steps to create an EMR cluster with Spark installed:

  1. Launch an Amazon EMR Cluster.
  2. Open the Amazon EMR UI console at https://console.aws.amazon.com/elasticmapreduce/.
  3. Choose Create cluster:
  1. Choose appropriate Amazon AMI Version 3.9.0 or later as shown in the following screenshot:
  1. For the applications to be installed field, choose Spark 1.5.2 or later from the list shown on the User Interface and click on Add.
  2. Select other hardware options as necessary:
    • The Instance Type
    • The keypair to be used with SSH
    • Permissions
    • IAM roles (Default orCustom)

Refer to the following screenshot:

  1. Click on Create cluster. The cluster will start instantiating as shown in the following screenshot:
  1. Log in into the master. Once the EMR cluster is ready, you can SSH into the master:
   $ ssh -i rd_spark-user1.pem
hadoop@ec2-52-3-242-138.compute-1.amazonaws.com
The output will be similar to following listing:
     Last login: Wed Jan 13 10:46:26 2016

__| __|_ )
_| ( / Amazon Linux AMI
___|___|___|

https://aws.amazon.com/amazon-linux-ami/2015.09-release-notes/
23 package(s) needed for security, out of 49 available
Run "sudo yum update" to apply all updates.
[hadoop@ip-172-31-2-31 ~]$
  1. Start the Spark Shell:
      [hadoop@ip-172-31-2-31 ~]$ spark-shell
16/01/13 10:49:36 INFO SecurityManager: Changing view acls to:
hadoop

16/01/13 10:49:36 INFO SecurityManager: Changing modify acls to:
hadoop

16/01/13 10:49:36 INFO SecurityManager: SecurityManager:
authentication disabled; ui acls disabled; users with view
permissions: Set(hadoop); users with modify permissions:
Set(hadoop)

16/01/13 10:49:36 INFO HttpServer: Starting HTTP Server
16/01/13 10:49:36 INFO Utils: Successfully started service 'HTTP
class server' on port 60523.

Welcome to
____ __
/ __/__ ___ _____/ /__
_ / _ / _ `/ __/ '_/
/___/ .__/_,_/_/ /_/_ version 1.5.2
/_/
scala> sc
  1. Run Basic Spark sample from the EMR:
    scala> val textFile = sc.textFile("s3://elasticmapreduce/samples
/hive-ads/tables/impressions/dt=2009-04-13-08-05
/ec2-0-51-75-39.amazon.com-2009-04-13-08-05.log")

scala> val linesWithCartoonNetwork = textFile.filter(line =>
line.contains("cartoonnetwork.com")).count()
Your output will be as follows:
     linesWithCartoonNetwork: Long = 9
You have been reading a chapter from
Machine Learning with Spark - Second Edition
Published in: Apr 2017
Publisher: Packt
ISBN-13: 9781785889936
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at €18.99/month. Cancel anytime