Apache Spark 2.x Cookbook: Over 70 cloud-ready recipes for distributed Big Data processing and analytics

Developing Applications with Spark

In this chapter, we will cover the following recipes:

  • Exploring the Spark shell
  • Developing a Spark application in Eclipse with Maven
  • Developing a Spark application in Eclipse with SBT
  • Developing a Spark application in IntelliJ IDEA with Maven
  • Developing a Spark application in IntelliJ IDEA with SBT
  • Developing applications using the Zeppelin notebook
  • Setting up Kerberos to do authentication
  • Enabling Kerberos authentication for Spark

Introduction

Before we start this chapter, it is important that we discuss some trends that directly affect how we develop applications. 

Big data applications can be divided into the following three categories:

  • Batch
  • Interactive
  • Streaming or continuous applications

When Hadoop was designed, the primary focus was to provide cost-effective storage for large amounts of data. This remained the main show until it was upended by S3 and other cheaper and more reliable cloud storage alternatives. Computation on these large amounts of data in the Hadoop environment was primarily in the form of MapReduce jobs. Since Spark took the ball from Hadoop (OK! Snatched!) and started running with it, Spark also reflected a batch-oriented focus in its initial phase, but it did a better job than Hadoop of exploiting in-memory storage.

The most compelling factor of the success of Hadoop was that the cost of storage...

Exploring the Spark shell

Spark comes bundled with a read–eval–print loop (REPL) shell, which is a wrapper around the Scala shell. Though the Spark shell looks like a command line for simple things, in reality, a lot of complex queries can also be executed using it. Often, the Spark shell is used in the initial development phase; once the code has stabilized, it is written as a class file and bundled as a JAR to be run using the spark-submit script. This chapter explores different development environments in which Spark applications can be developed.
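
As a rough sketch of what moving from the shell to a packaged application can look like (this is not taken from the book; the object name `WordCountApp`, the JAR name, and the input path are hypothetical, and Spark 2.x is assumed to be on the classpath):

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical application: the object name, JAR name, and paths are illustrative only.
object WordCountApp {
  // Split a line on whitespace and drop empty tokens.
  def tokenize(line: String): Seq[String] =
    line.split("\\s+").filter(_.nonEmpty).toSeq

  def main(args: Array[String]): Unit = {
    // Unlike the shell, a standalone application must create its own session.
    val spark = SparkSession.builder().appName("WordCountApp").getOrCreate()
    try {
      val counts = spark.sparkContext
        .textFile(args(0))          // input path is passed on the command line
        .flatMap(tokenize)
        .map(word => (word, 1))
        .reduceByKey(_ + _)
      counts.take(20).foreach(println)
    } finally {
      spark.stop()
    }
  }
}

// Once packaged as a JAR, it could be launched along these lines:
//   spark-submit --class WordCountApp --master "local[*]" wordcount-app.jar input.txt
```

The master is left out of the code so that it can be supplied at launch time through `--master`, which keeps the same JAR usable locally and on a cluster.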

How to do it...

Hadoop MapReduce's word count, which takes at least three class files and one configuration file, namely project object model (POM), becomes very simple...
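
For comparison, a word count in the Spark shell can be sketched in a few lines. This is an illustrative version rather than the book's own listing; it assumes it is pasted into `spark-shell`, where the SparkContext `sc` is predefined, and it uses made-up in-memory sample lines instead of a file:

```scala
// Paste into spark-shell, where `sc` (the SparkContext) is already defined.
// The sample sentences are made up for illustration.
val lines = sc.parallelize(Seq("to be or not to be", "that is the question"))

val counts = lines
  .flatMap(_.split("\\s+"))   // break each line into words
  .map(word => (word, 1))     // pair every word with a count of 1
  .reduceByKey(_ + _)         // sum the counts per word

// Bring the (small) result to the driver and print it in alphabetical order.
counts.collect().sortBy(_._1).foreach(println)
```

To count words in a real dataset, `sc.parallelize(...)` would typically be swapped for `sc.textFile(...)` pointing at a file or directory.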

Key benefits

  • Contains quick solutions to even the most complex Big Data processing problems using Apache Spark
  • Leverage the power of Apache Spark as a unified compute engine and perform streaming analytics, machine learning and graph processing with ease
  • From installing and setting up Spark to fine-tuning its performance, this practical guide is all you need to become a master in using Apache Spark

Description

While Apache Spark 1.x gained a lot of traction and adoption in the early years, Spark 2.x delivers notable improvements in the areas of API, schema awareness, performance, Structured Streaming, and simplified building blocks for building better, faster, smarter, and more accessible big data applications. This book uncovers all these features in the form of structured recipes to analyze and mature large and complex sets of data. Starting with installing and configuring Apache Spark with various cluster managers, you will learn to set up development environments. Further on, you will be introduced to working with RDDs, DataFrames, and Datasets to operate on schema-aware data, and real-time streaming with various sources such as Twitter Stream and Apache Kafka. You will also work through recipes on machine learning, including supervised learning, unsupervised learning, and recommendation engines in Spark. Last but not least, the final few chapters delve deeper into the concepts of graph processing using GraphX, securing your implementations, cluster optimization, and troubleshooting.

Who is this book for?

This book is for data engineers, data scientists, and Big Data professionals who want to leverage the power of Apache Spark 2.x for real-time Big Data processing. If you’re looking for quick solutions to common problems while using Spark 2.x effectively, this book will also help you. The book assumes you have a basic knowledge of Scala as a programming language.

What you will learn

  • Install and configure Apache Spark with various cluster managers and on AWS
  • Set up a development environment for Apache Spark, including the Databricks Cloud notebook
  • Find out how to operate on data in Spark with schemas
  • Get to grips with real-time streaming analytics using Spark Streaming and Structured Streaming
  • Master supervised learning and unsupervised learning using MLlib
  • Build a recommendation engine using MLlib
  • Perform graph processing using the GraphX and GraphFrames libraries
  • Develop a set of common applications or project types, and solutions that solve complex big data problems

Product Details

Publication date: May 31, 2017
Length: 294 pages
Edition: 1st
Language: English
ISBN-13: 9781787127517
Vendor: Apache

Table of Contents

12 Chapters

  • Getting Started with Apache Spark
  • Developing Applications with Spark
  • Spark SQL
  • Working with External Data Sources
  • Spark Streaming
  • Getting Started with Machine Learning
  • Supervised Learning with MLlib - Regression
  • Supervised Learning with MLlib - Classification
  • Unsupervised Learning
  • Recommendations Using Collaborative Filtering
  • Graph Processing Using GraphX and GraphFrames
  • Optimizations and Performance Tuning

Customer reviews

Rating distribution: 3.3 out of 5 stars (3 ratings)

  • 5 star: 33.3%
  • 4 star: 33.3%
  • 3 star: 0%
  • 2 star: 0%
  • 1 star: 33.3%

Delayen Weeden, Jun 13, 2017 (5 out of 5 stars)
I wanted to get a better understanding of graph processing and I was directed to this book. It answered alot of questions that I had, and it informed me on other areas that I needed to know. Its a great blueprint for people who work in this field. Very informative
Amazon Verified review

S. Jamal, Sep 22, 2017 (4 out of 5 stars)
Not bad- but needs editing. I have the print version - there are duplicate sentences in paragraphs that obviously should have been removed. Other than that, this is a decent reference and a good introductory book. It is not deep as regards Data Science (but perhaps it is not meant to be).
Amazon Verified review

TC, Dec 11, 2018 (1 out of 5 stars)
Compared with other "Cookbook": No depth. More like an introduction concept book.
Amazon Verified review
FAQs

How do I buy and download an eBook?

Where there is an eBook version of a title available, you can buy it from the book details for that title. Add either the standalone eBook or the eBook and print book bundle to your shopping cart. Your eBook will show in your cart as a product on its own. After completing checkout and payment in the normal way, you will receive your receipt on the screen containing a link to a personalised PDF download file. This link will remain active for 30 days. You can download backup copies of the file by logging in to your account at any time.

If you already have Adobe Reader installed, then clicking on the link will download and open the PDF file directly. If you don't, then save the PDF file on your machine and download the Reader to view it.

Please Note: Packt eBooks are non-returnable and non-refundable.

Packt eBook and Licensing

When you buy an eBook from Packt Publishing, completing your purchase means you accept the terms of our licence agreement. Please read the full text of the agreement. In it we have tried to balance the need for the eBook to be usable for you, the reader, with our need to protect our rights as publishers and those of our authors. In summary, the agreement says:

  • You may make copies of your eBook for your own use onto any machine
  • You may not pass copies of the eBook on to anyone else

How can I make a purchase on your website?

If you want to purchase a video course, eBook, or Bundle (Print+eBook), please follow the steps below:

  1. Register on our website using your email address and a password.
  2. Search for the title by name or ISBN using the search option.
  3. Select the title you want to purchase.
  4. Choose the format you wish to purchase the title in; if you order the Print Book, you get a free eBook copy of the same title. 
  5. Proceed with the checkout process (payment to be made using Credit Card, Debit Card, or PayPal).

Where can I access support around an eBook?

  • If you experience a problem with using or installing Adobe Reader, then contact Adobe directly.
  • To view the errata for the book, see www.packtpub.com/support and view the pages for the title you have.
  • To view your account details or to download a new copy of the book go to www.packtpub.com/account
  • To contact us directly if a problem is not resolved, use www.packtpub.com/contact-us

What eBook formats does Packt support?

Our eBooks are currently available in a variety of formats such as PDF and ePub. In the future, this may well change with trends and development in technology, but please note that our PDFs are not Adobe eBook Reader format, which has greater restrictions on security.

You will need to use Adobe Reader v9 or later in order to read Packt's PDF eBooks.

What are the benefits of eBooks?

  • You can get the information you need immediately
  • You can easily take them with you on a laptop
  • You can download them an unlimited number of times
  • You can print them out
  • They are copy-paste enabled
  • They are searchable
  • There is no password protection
  • They are priced lower than print
  • They save resources and space

What is an eBook?

Packt eBooks are a complete electronic version of the print edition, available in PDF and ePub formats. Every piece of content down to the page numbering is the same. Because we save the costs of printing and shipping the book to you, we are able to offer eBooks at a lower cost than print editions.

When you have purchased an eBook, simply log in to your account and click on the link in Your Download Area. We recommend saving the file to your hard drive before opening it.

For optimal viewing of our eBooks, we recommend you download and install the free Adobe Reader version 9.