You're reading from Apache Spark 2.x for Java Developers: Explore big data at scale using Apache Spark 2.x Java APIs

Product type: Paperback
Published: Jul 2017
Publisher: Packt
ISBN-13: 9781787126497
Length: 350 pages
Edition: 1st Edition
Languages: Java
Tools: Apache Spark
Concepts: Big Data
Authors (2): Sourav Gulati, Sumit Kumar


Chapter 1, Introduction to Spark, covers the history of big data, its dimensions, and basic concepts of Hadoop and Spark.

Chapter 2, Revisiting Java, refreshes core Java concepts and focuses on the newer features of Java 8 that are leveraged when developing Spark applications.

Chapter 3, Let Us Spark, provides step-by-step instructions that familiarize the reader with installing Apache Spark in standalone mode, along with its dependencies.

Chapter 4, Understanding the Spark Programming Model, explains the word count problem in Apache Spark using Java while also walking through setting up an IDE.

Chapter 5, Working with Data and Storage, teaches you how to read data into Spark from, and store data from Spark to, different storage systems.

Chapter 6, Spark on Cluster, discusses in detail the cluster setup process and some popular cluster managers available with Spark. After this chapter, you will be able to execute Spark jobs effectively in distributed mode.

Chapter 7, Spark Programming Model – Advanced, covers partitioning concepts in RDDs along with advanced transformations and actions in Spark.

Chapter 8, Working with Spark SQL, discusses Spark SQL and its related concepts, such as DataFrames, Datasets, and UDFs. We will also discuss SQLContext and the newly introduced SparkSession.

Chapter 9, Near-Real-Time Processing with Spark Streaming, covers the internals of Spark Streaming, reading streams of data into Spark from various data sources with examples, and a newer extension of stream processing in Spark known as Structured Streaming.

Chapter 10, Machine Learning Analytics with Spark MLlib, introduces the concepts of machine learning and then moves on to their implementation using the Apache Spark MLlib library. We also discuss some real-world problems solved with Spark MLlib.

Chapter 11, Learning Spark GraphX, looks into another Spark module, GraphX. We will discover the types of GraphX RDDs and the various operations associated with them, and we will also discuss use cases for GraphX.