This chapter focused on handling streaming data from sources such as Kafka, sockets, and the filesystem. We also covered various stateful and stateless transformations of DStreams, along with checkpointing of data. Checkpointing alone, however, does not guarantee fault tolerance, so we discussed other approaches to making a Spark Streaming job fault tolerant. We also talked about the transform operation, which comes in handy when an operation from the RDD API is not available on DStreams. Spark 2.0 introduced Structured Streaming as a separate module; because of its similarity to Spark Streaming, we also discussed the newly introduced Structured Streaming APIs.
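As a brief reminder of how transform bridges the two APIs, the following is a minimal Scala sketch, not code from the chapter: the host, port, and blacklist data are illustrative. It applies RDD-only operations (leftOuterJoin and sortByKey) to each micro-batch of a socket DStream.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object TransformSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("TransformSketch").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))

    // A static pair RDD; joining a stream against it is an RDD operation
    // with no direct DStream equivalent.
    val blacklist = ssc.sparkContext.parallelize(Seq(("spam", true), ("junk", true)))

    val lines = ssc.socketTextStream("localhost", 9999)
    val wordCounts = lines.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)

    // transform exposes each micro-batch as an RDD, so RDD-only operations
    // such as leftOuterJoin and sortByKey become available.
    val filtered = wordCounts.transform { rdd =>
      rdd.leftOuterJoin(blacklist)
        .filter { case (_, (_, flagged)) => flagged.isEmpty } // drop blacklisted words
        .map { case (word, (count, _)) => (word, count) }
        .sortByKey()
    }

    filtered.print()
    ssc.start()
    ssc.awaitTermination()
  }
}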
In the next chapter, we will introduce the concepts of machine learning and then move towards their implementation using Apache Spark's MLlib library. We will also discuss some real-world problems using...