Apache Spark 2.x for Java Developers: Explore big data at scale using Apache Spark 2.x Java APIs

Product type Paperback

Published in Jul 2017

Publisher Packt

ISBN-13 9781787126497

Length 350 pages

Edition 1st Edition

Languages

Java

Tools

Apache Spark

Concepts

Big Data

Authors (2):

Sourav Gulati

Sumit Kumar

Table of Contents (12 Chapters)

Preface

1. Introduction to Spark

2. Revisiting Java

3. Let Us Spark

4. Understanding the Spark Programming Model

5. Working with Data and Storage

6. Spark on Cluster

7. Spark Programming Model - Advanced

8. Working with Spark SQL

9. Near Real-Time Processing with Spark Streaming

10. Machine Learning Analytics with Spark MLlib

11. Learning Spark GraphX

Why use Java for Spark?

With the rise of multi-core CPUs, Java's design could not keep pace and make full use of the extra power at its disposal, largely because of the complexity surrounding concurrency and immutability. We will discuss this in detail later. First, let's understand the importance and usability of Java in the Hadoop ecosystem. As MapReduce was gaining popularity, Google introduced a framework called FlumeJava that helped in pipelining multiple MapReduce jobs. FlumeJava consists of immutable parallel collections that support lazily evaluated, optimized, chained operations. That might sound eerily similar to what Apache Spark does, but even before Apache Spark and FlumeJava there was Cascading, which built an abstraction over MapReduce to simplify the way MapReduce tasks are developed, tested, and run. All these frameworks were majorly...
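The idea of lazily evaluated, chained operations mentioned above can be illustrated with plain Java 8 streams. This is only a rough sketch under stated assumptions: it uses `java.util.stream` as a stand-in and is not the FlumeJava, Cascading, or Spark API. The class name `LazyPipelineSketch` and the `visited` list are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration class; not part of any framework named in the text.
public class LazyPipelineSketch {

    // Records which input elements the map step actually processed.
    static final List<Integer> visited = new ArrayList<>();

    static List<Integer> run() {
        visited.clear();
        List<Integer> input = Arrays.asList(1, 2, 3, 4, 5, 6);

        // Chaining map, filter and limit only describes the computation.
        // Nothing is evaluated at this point.
        Stream<Integer> pipeline = input.stream()
                .map(n -> { visited.add(n); return n * n; }) // square each element
                .filter(sq -> sq % 2 == 0)                   // keep even squares
                .limit(2);                                   // stop after two results

        if (!visited.isEmpty()) {
            throw new IllegalStateException("pipeline ran before the terminal operation");
        }

        // The terminal operation triggers evaluation of the whole chain.
        return pipeline.collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(run());   // prints [4, 16]
        System.out.println(visited); // prints [1, 2, 3, 4]
    }
}
```

Because the stream is sequential and `limit(2)` short-circuits, elements 5 and 6 are never squared. Deferring work until a result is requested, and skipping work that is not needed, is the behavior the paragraph attributes to these frameworks at a much larger scale.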

Authors (2)

Gulati

Shekhar Gulati is a developer and OpenShift evangelist working with Red Hat. He has been evangelizing about OpenShift for the last 2 years. He regularly speaks at various conferences and user groups around the world to spread the goodness of OpenShift. He regularly blogs on the OpenShift official blog and has written more than 50 blogs on OpenShift. Shekhar has also written many technical articles for IBM developerWorks, Developer.com, and Javalobby.

Kumar

Ashish Kumar is a seasoned data science professional, a published author, and a thought leader in the field of data science and machine learning. An IIT Madras graduate and a Young India Fellow, he has around 7 years of experience in implementing and deploying data science and machine learning solutions for challenging industry problems in both hands-on and leadership roles. Natural Language Processing, IoT analytics, R Shiny product development, and ensemble ML methods are his core areas of expertise. He is fluent in Python and R and teaches a popular ML course at Simplilearn. When not crunching data, Ashish sneaks off to the next hip beach around and enjoys the company of his Kindle. He also trains and mentors data science aspirants and fledgling start-ups.
