You're reading from Apache Spark 2.x Cookbook

Product type Book

Published in May 2017

Publisher

ISBN-13 9781787127265

Pages 294 pages

Edition 1st Edition

Languages

Scala

Concepts

Data Processing

Author (1):

Rishi Yadav

Table of Contents (19) Chapters

Title Page

Credits

About the Author

About the Reviewer

www.PacktPub.com

Customer Feedback

Preface

1. Getting Started with Apache Spark

2. Developing Applications with Spark

3. Spark SQL

4. Working with External Data Sources

5. Spark Streaming

6. Getting Started with Machine Learning

7. Supervised Learning with MLlib — Regression

8. Supervised Learning with MLlib — Classification

9. Unsupervised Learning

10. Recommendations Using Collaborative Filtering

11. Graph Processing Using GraphX and GraphFrames

12. Optimizations and Performance Tuning

Deploying Spark on a cluster with Mesos

Mesos is emerging as a data center operating system that manages the compute resources across an entire data center. It runs on any machine running Linux and is built on the same principles as the Linux kernel. Let's see how to install Mesos.

How to do it…

Mesosphere provides a binary distribution of Mesos. The most recent package of the Mesos distribution can be installed from the Mesosphere repositories by performing the following steps:

Add the Mesosphere repository on Ubuntu 14.04 (trusty):

$ sudo apt-key adv --keyserver keyserver.ubuntu.com --recv E56151BF
$ DISTRO=$(lsb_release -is | tr '[:upper:]' '[:lower:]')
$ CODENAME=$(lsb_release -cs)
$ sudo vi /etc/apt/sources.list.d/mesosphere.list
deb http://repos.mesosphere.io/ubuntu trusty main
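The repository entry can also be generated without opening an editor. The following is a minimal sketch, assuming the same repos.mesosphere.io URL as in the step above; the function name mesosphere_repo_line is hypothetical, and the function only builds the line without touching the system.

```shell
#!/bin/sh
# mesosphere_repo_line is a hypothetical helper, not part of any package.
# It prints the APT source line for the Mesosphere repository.
#   $1: distributor ID (for example, the output of `lsb_release -is`)
#   $2: release codename (for example, the output of `lsb_release -cs`)
mesosphere_repo_line() {
    # The repository path uses the lowercase distributor ID.
    distro=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    printf 'deb http://repos.mesosphere.io/%s %s main\n' "$distro" "$2"
}

# Usage on the target machine (needs lsb_release and sudo), left commented out:
# mesosphere_repo_line "$(lsb_release -is)" "$(lsb_release -cs)" |
#     sudo tee /etc/apt/sources.list.d/mesosphere.list
```

Keeping the line construction separate from the write makes it easy to review the entry before it is placed under /etc/apt/sources.list.d.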

Update the repositories:

$ sudo apt-get -y update

Install Mesos:

$ sudo apt-get -y install mesos

To integrate Spark with Mesos, make the Spark binaries available to Mesos and configure the Spark driver to connect to Mesos.
Use the Spark binaries from the first recipe and upload them to HDFS:

$ hdfs dfs -put spark-2.1.0-bin-hadoop2.7.tgz spark-2.1.0-bin-hadoop2.7.tgz

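For a single-master Mesos cluster, the master URL is mesos://host:5050; for a ZooKeeper-managed Mesos cluster, it is mesos://zk://host:2181 followed by the znode path under which Mesos is registered (for example, mesos://zk://host:2181/mesos).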
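The two URL forms can be sketched as a small shell helper. This is an illustration only: mesos_master_url is a hypothetical function name, and the /mesos znode path used in the usage note is an assumption that depends on how the Mesos masters were configured.

```shell
#!/bin/sh
# mesos_master_url is a hypothetical helper that formats a Spark master URL.
#   mesos_master_url HOST:PORT            -> single-master form
#   mesos_master_url zk HOSTS ZNODE_PATH  -> ZooKeeper form, where HOSTS is a
#                                            comma-separated host:port list
mesos_master_url() {
    if [ "$1" = "zk" ]; then
        printf 'mesos://zk://%s%s\n' "$2" "$3"
    else
        printf 'mesos://%s\n' "$1"
    fi
}
```

For example, mesos_master_url host:5050 prints the single-master URL, and mesos_master_url zk host1:2181,host2:2181 /mesos prints the ZooKeeper form for a two-node ensemble.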
Set the following variables in spark-env.sh:

$ sudo vi spark-env.sh
export MESOS_NATIVE_LIBRARY=/usr/local/lib/libmesos.so
export SPARK_EXECUTOR_URI=hdfs://localhost:9000/user/hduser/spark-2.1.0-bin-hadoop2.7.tgz
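Each export must be a single line with no whitespace after the equals sign; otherwise the shell treats the value as a separate command. As a hedged sketch, the two entries could be appended from a script instead of an editor. The helper name write_mesos_env is hypothetical, and the paths are the ones used in this recipe and may differ on your system.

```shell
#!/bin/sh
# write_mesos_env is a hypothetical helper that appends the two exports
# to an environment file.
#   $1: file to append to (for example, conf/spark-env.sh)
#   $2: path to libmesos.so
#   $3: URI of the Spark tarball that executors will download
write_mesos_env() {
    {
        printf 'export MESOS_NATIVE_LIBRARY=%s\n' "$2"
        printf 'export SPARK_EXECUTOR_URI=%s\n' "$3"
    } >> "$1"
}

# Usage with the values from this recipe, left commented out:
# write_mesos_env spark-env.sh /usr/local/lib/libmesos.so \
#     hdfs://localhost:9000/user/hduser/spark-2.1.0-bin-hadoop2.7.tgz
```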

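Create the SparkContext in a Scala program:

import org.apache.spark.{SparkConf, SparkContext}
// The application name is a placeholder; SparkContext requires one to be set.
val conf = new SparkConf().setMaster("mesos://host:5050").setAppName("MesosApp")
val sparkContext = new SparkContext(conf)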

Alternatively, start the Spark shell against the Mesos master:

$ spark-shell --master mesos://host:5050

Note

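Mesos has two run modes:

Fine-grained: In this mode, every Spark task runs as a separate Mesos task. It was the default in Spark 1.x and is deprecated as of Spark 2.0.

Coarse-grained: This mode launches only one long-running Spark task on each Mesos machine and schedules Spark's own tasks within it. It is the default as of Spark 2.0.

The mode is controlled by the spark.mesos.coarse property. To select the coarse-grained mode explicitly, set it to true:

conf.set("spark.mesos.coarse","true")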
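The same property can also be passed on the command line through the --conf option of spark-shell. The sketch below only prints the invocation so that it can be reviewed before running; mesos_shell_command is a hypothetical helper name, and the master URL is the placeholder used throughout this recipe.

```shell
#!/bin/sh
# mesos_shell_command is a hypothetical helper. It prints (does not run)
# a spark-shell invocation.
#   $1: Mesos master URL
#   $2: value for spark.mesos.coarse (true or false)
mesos_shell_command() {
    printf 'spark-shell --master %s --conf spark.mesos.coarse=%s\n' "$1" "$2"
}

# Prints: spark-shell --master mesos://host:5050 --conf spark.mesos.coarse=true
mesos_shell_command mesos://host:5050 true
```

Setting the property at launch time keeps the run mode out of the application code, so the same program can be started in either mode.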
