What this book covers
Chapter 1, Installing Spark and Setting Up Your Cluster, details some common methods for setting up Spark.
Chapter 2, Using the Spark Shell, introduces the command line for Spark. The shell is good for trying out quick program snippets or just figuring out the syntax of a call interactively.
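For instance, a quick interactive check in spark-shell might look like the following sketch; the numbers are arbitrary, and the only thing relied on is the spark session object that the shell predefines:

    // Inside spark-shell, a SparkSession named `spark` is already in scope.
    val nums = spark.range(1, 1000)      // a Dataset with a single "id" column, values 1 to 999
    nums.filter("id % 7 == 0").count()   // quickly answer: how many multiples of 7?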
Chapter 3, Building and Running a Spark Application, covers the ways of compiling and running Spark applications.
Chapter 4, Creating a SparkSession Object, describes the programmatic aspects of connecting to a Spark cluster through the SparkSession object and the SparkContext it encloses.
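As a foretaste, creating a session typically looks like the following minimal sketch; the application name and the local master URL are illustrative placeholders, not settings prescribed by the chapter:

    import org.apache.spark.sql.SparkSession

    // Build (or reuse) the SparkSession, the entry point to Spark 2.x functionality.
    val spark = SparkSession.builder()
      .appName("MyFirstApp")    // placeholder application name
      .master("local[*]")       // assumption: run locally on all available cores
      .getOrCreate()

    // The classic SparkContext is still available, enclosed within the session.
    val sc = spark.sparkContext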
Chapter 5, Loading and Saving Data in Spark, deals with how we can get data in and out of a Spark environment.
Chapter 6, Manipulating Your RDD, describes how to program with Resilient Distributed Datasets (RDDs), the fundamental data abstraction layer in Spark that makes all the magic possible.
Chapter 7, Spark 2.0 Concepts, is a short, interesting chapter that discusses the evolution of Spark and the concepts underpinning the Spark 2.0 release, which is a major milestone.
Chapter 8, Spark SQL, deals with the SQL interface in Spark. Spark SQL is probably the most widely used feature.
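A typical interaction, sketched below with a hypothetical people.json file and made-up column names, registers a DataFrame as a temporary view and queries it with plain SQL:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("SqlSketch").master("local[*]").getOrCreate()

    // "people.json" and its name/age columns are placeholders for illustration.
    val people = spark.read.json("people.json")
    people.createOrReplaceTempView("people")
    spark.sql("SELECT name, age FROM people WHERE age > 21").show()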
Chapter 9, Foundations of Datasets/DataFrames – The Proverbial Workhorse for Data Scientists, is another interesting chapter, which introduces the Datasets/DataFrames that were added in the Spark 2.0 release.
Chapter 10, Spark with Big Data, describes the interfaces with Parquet and HBase.
Chapter 11, Machine Learning with Spark ML Pipelines, is my favorite chapter. We talk about regression, classification, clustering, and recommendation in this chapter. This is probably the largest chapter in this book. If you were stranded on a remote island and could take only one chapter with you, this should be the one!
Chapter 12, GraphX, talks about an important capability, processing graphs at scale, and also discusses interesting algorithms such as PageRank.