Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Newsletter Hub
Free Learning
Arrow right icon
timer SALE ENDS IN
0 Days
:
00 Hours
:
00 Minutes
:
00 Seconds
Apache Spark 2.x for Java Developers
Apache Spark 2.x for Java Developers

Apache Spark 2.x for Java Developers: Explore big data at scale using Apache Spark 2.x Java APIs

Arrow left icon
Profile Icon Kumar Profile Icon Gulati
Arrow right icon
Free Trial
Full star icon Full star icon Empty star icon Empty star icon Empty star icon 2 (4 Ratings)
Paperback Jul 2017 350 pages 1st Edition
eBook
S$41.98 S$59.99
Paperback
S$74.99
Subscription
Free Trial
Arrow left icon
Profile Icon Kumar Profile Icon Gulati
Arrow right icon
Free Trial
Full star icon Full star icon Empty star icon Empty star icon Empty star icon 2 (4 Ratings)
Paperback Jul 2017 350 pages 1st Edition
eBook
S$41.98 S$59.99
Paperback
S$74.99
Subscription
Free Trial
eBook
S$41.98 S$59.99
Paperback
S$74.99
Subscription
Free Trial

What do you get with a Packt Subscription?

Free for first 7 days. $19.99 p/m after that. Cancel any time!
Product feature icon Unlimited ad-free access to the largest independent learning library in tech. Access this title and thousands more!
Product feature icon 50+ new titles added per month, including many first-to-market concepts and exclusive early access to books as they are being written.
Product feature icon Innovative learning tools, including AI book assistants, code context explainers, and text-to-speech.
Product feature icon Thousands of reference materials covering every tech concept you need to stay up to date.
Subscribe now
View plans & pricing
Table of content icon View table of contents Preview book icon Preview Book

Apache Spark 2.x for Java Developers

Revisiting Java

This chapter is added as the refresher course of Java. In this chapter, we will discuss some concepts of Java that will be useful while creating applications in Apache Spark.

This book assumes that the reader is comfortable with the basics of Java. We will discuss useful Java concepts and mainly focus on what is new in Java 8? More importantly, we will touch upon on topics such as:

  • Generics
  • Interfaces
  • Lambda expressions
  • Streams

Why use Java for Spark?

With the rise in multi-core CPUs, Java could not keep up with the change in its design to utilize that extra power available to its disposal because of the complexity surrounding concurrency and immutability. We will discuss this in detail, later. First let's understand the importance and usability of Java in the Hadoop ecosystem. As MapReduce was gaining popularity, Google introduced a framework called Flume Java that helped in pipelining multiple MapReduce jobs. Flume Java consists of immutable parallel collections capable of performing lazily evaluated optimized chained operations. That might sound eerily similar to what Apache Spark does, but then even before Apache Spark and Java Flume, there was Cascading, which built an abstraction over MapReduce to simplify the way MapReduce tasks are developed, tested, and run. All these frameworks were majorly...

Generics

Generics were introduced in Java 1.5. Generics help the user to create the general purpose code that has abstract type in its definition. That abstract type can be replaced with any concrete type in the implementation.

For example, the list interface or its implementations, such as ArrayList, LinkedList and so on, are defined with generic type. Users can provide the concrete type such as Integer, Long, or String while implementing the list:

List<Integer> list1 =new ArrayList<Integer>(); 
List<String> list2 =new ArrayList<String>(); 

Here, list1 is the list of integers and list2 is the list of strings. With Java 7, the compiler can infer the type. So the preceding code can also be written as follows:

List<Integer> list1 =new ArrayList<>(); 
List<String> list2 =new ArrayList<>(); 

Another huge benefit of generic type is...

Interfaces

Interfaces are the reference types in Java. They are used in Java to define contracts among classes. Any class that implements that interface has to adhere to the contract that the interface defines.

For example, we have an interface car as follows that consists of three abstract methods:

public interface Car { 
   void shape(); 
   void price(); 
   void color(); 
} 

Any class that implements this interface has to implement all the abstract methods of this interface unless it is an abstract class. Interfaces can only be implemented or extended by other interfaces, they cannot be instantiated.

Prior to Java 8, interfaces consisted only of abstract methods and final variables. In Java 8, interfaces may contain default and static methods as well.

Static method in an interface...

Lambda expressions

Lambda expressions are the brand new feature of Java. Lambda expressions are introduced in Java 8 and it is a step towards facilitating functional programming in Java.

Lambda expressions help you to define a method without declaring it. So, you do not need a name for the method, return-type, and so on. Lambda expressions, like anonymous inner classes, provide the way to pass behaviors to functions. Lambda, however, is a much more concise way of writing the code.

For example, the preceding example of an anonymous inner class can be converted to Lambda as follows:

public class MyFilterImpl { 
   public static void main(String[] args) { 
      File dir = new File("src/main/java"); 
      dir.list((dirname,name)->name.endsWith("java")); //Lambda Expression 
     } 
} 

Note that the signature of the Lambda expression is exactly matching the...

Lexical scoping

Lexical scoping is also referred to as Static scoping. As per lexical scoping, a variable will be accessible in the scope in which it is defined. Here, the scope of the variable is determined at compile time.

Let us consider the following example:

public class LexicalScoping { 
   int a = 1; 
   // a has class level scope. So It will be available to be accessed 
   // throughout the class 
 
   public void sumAndPrint() { 
      int b = 1; 
      int c = a + b; 
      // b and c are local variables of method. These will be accessible 
      // inside the method only 
   } 
   // b and c are no longer accessible 
} 

Variable a will be available throughout the class (let's not consider the difference of static and non-static as of now). However, variables b and c will be available inside the sumAndPrint method only.

Similarly, a variable given inside lambda...

Why use Java for Spark?


With the rise in multi-core CPUs, Java could not keep up with the change in its design to utilize that extra power available to its disposal because of the complexity surrounding concurrency and immutability. We will discuss this in detail, later. First let's understand the importance and usability of Java in the Hadoop ecosystem. As MapReduce was gaining popularity, Google introduced a framework called Flume Java that helped in pipelining multiple MapReduce jobs. Flume Java consists of immutable parallel collections capable of performing lazily evaluated optimized chained operations. That might sound eerily similar to what Apache Spark does, but then even before Apache Spark and Java Flume, there was Cascading, which built an abstraction over MapReduce to simplify the way MapReduce tasks are developed, tested, and run. All these frameworks were majorly a Java implementation to simplify MapReduce pipelines among other things.

These abstractions were simple in fact...

Generics


Generics were introduced in Java 1.5. Generics help the user to create the general purpose code that has abstract type in its definition. That abstract type can be replaced with any concrete type in the implementation.

For example, the list interface or its implementations, such as ArrayList, LinkedList and so on, are defined with generic type. Users can provide the concrete type such as Integer, Long, or String while implementing the list:

List<Integer> list1 =new ArrayList<Integer>(); 
List<String> list2 =new ArrayList<String>(); 

Here, list1 is the list of integers and list2 is the list of strings. With Java 7, the compiler can infer the type. So the preceding code can also be written as follows:

List<Integer> list1 =new ArrayList<>(); 
List<String> list2 =new ArrayList<>(); 

Another huge benefit of generic type is that it brings compile-time safety. Let's create a list without the use of generics:

List list =new ArrayList<>();...

Interfaces


Interfaces are the reference types in Java. They are used in Java to define contracts among classes. Any class that implements that interface has to adhere to the contract that the interface defines.

For example, we have an interface car as follows that consists of three abstract methods:

public interface Car { 
   void shape(); 
   void price(); 
   void color(); 
} 

Any class that implements this interface has to implement all the abstract methods of this interface unless it is an abstract class. Interfaces can only be implemented or extended by other interfaces, they cannot be instantiated.

Prior to Java 8, interfaces consisted only of abstract methods and final variables. In Java 8, interfaces may contain default and static methods as well.

Static method in an interface

The static method in an interface is similar to the static method in a class. Users cannot override them. So even if a class implements an interface, it cannot override a static method of an interface.

Like a static...

Lambda expressions


Lambda expressions are the brand new feature of Java. Lambda expressions are introduced in Java 8 and it is a step towards facilitating functional programming in Java.

Lambda expressions help you to define a method without declaring it. So, you do not need a name for the method, return-type, and so on. Lambda expressions, like anonymous inner classes, provide the way to pass behaviors to functions. Lambda, however, is a much more concise way of writing the code.

For example, the preceding example of an anonymous inner class can be converted to Lambda as follows:

public class MyFilterImpl { 
   public static void main(String[] args) { 
      File dir = new File("src/main/java"); 
      dir.list((dirname,name)->name.endsWith("java")); //Lambda Expression 
     } 
} 

Note that the signature of the Lambda expression is exactly matching the signature of the accept method in the FilenameFilter interface.

Note

One of the huge differences between Lambda and anonymous inner classes...

Left arrow icon Right arrow icon
Download code icon Download Code

Key benefits

  • Perform big data processing with Spark—without having to learn Scala!
  • Use the Spark Java API to implement efficient enterprise-grade applications for data processing and analytics
  • Go beyond mainstream data processing by adding querying capability, Machine Learning, and graph processing using Spark

Description

Apache Spark is the buzzword in the big data industry right now, especially with the increasing need for real-time streaming and data processing. While Spark is built on Scala, the Spark Java API exposes all the Spark features available in the Scala version for Java developers. This book will show you how you can implement various functionalities of the Apache Spark framework in Java, without stepping out of your comfort zone. The book starts with an introduction to the Apache Spark 2.x ecosystem, followed by explaining how to install and configure Spark, and refreshes the Java concepts that will be useful to you when consuming Apache Spark's APIs. You will explore RDD and its associated common Action and Transformation Java APIs, set up a production-like clustered environment, and work with Spark SQL. Moving on, you will perform near-real-time processing with Spark streaming, Machine Learning analytics with Spark MLlib, and graph processing with GraphX, all using various Java packages. By the end of the book, you will have a solid foundation in implementing components in the Spark framework in Java to build fast, real-time applications.

Who is this book for?

If you are a Java developer interested in learning to use the popular Apache Spark framework, this book is the resource you need to get started. Apache Spark developers who are looking to build enterprise-grade applications in Java will also find this book very useful.

What you will learn

  • Process data using different file formats such as XML, JSON, CSV, and plain and delimited text, using the Spark core Library.
  • Perform analytics on data from various data sources such as Kafka, and Flume using Spark Streaming Library
  • Learn SQL schema creation and the analysis of structured data using various SQL functions including Windowing functions in the Spark SQL Library
  • Explore Spark Mlib APIs while implementing Machine Learning techniques to solve real-world problems
  • Get to know Spark GraphX so you understand various graph-based analytics that can be performed with Spark

Product Details

Country selected
Publication date, Length, Edition, Language, ISBN-13
Publication date : Jul 26, 2017
Length: 350 pages
Edition : 1st
Language : English
ISBN-13 : 9781787126497
Vendor :
Apache
Category :
Languages :
Concepts :

What do you get with a Packt Subscription?

Free for first 7 days. $19.99 p/m after that. Cancel any time!
Product feature icon Unlimited ad-free access to the largest independent learning library in tech. Access this title and thousands more!
Product feature icon 50+ new titles added per month, including many first-to-market concepts and exclusive early access to books as they are being written.
Product feature icon Innovative learning tools, including AI book assistants, code context explainers, and text-to-speech.
Product feature icon Thousands of reference materials covering every tech concept you need to stay up to date.
Subscribe now
View plans & pricing

Product Details

Publication date : Jul 26, 2017
Length: 350 pages
Edition : 1st
Language : English
ISBN-13 : 9781787126497
Vendor :
Apache
Category :
Languages :
Concepts :

Packt Subscriptions

See our plans and pricing
Modal Close icon
$19.99 billed monthly
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Simple pricing, no contract
$199.99 billed annually
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Choose a DRM-free eBook or Video every month to keep
Feature tick icon PLUS own as many other DRM-free eBooks or Videos as you like for just S$6 each
Feature tick icon Exclusive print discounts
$279.99 billed in 18 months
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Choose a DRM-free eBook or Video every month to keep
Feature tick icon PLUS own as many other DRM-free eBooks or Videos as you like for just S$6 each
Feature tick icon Exclusive print discounts

Frequently bought together


Stars icon
Total S$ 216.97
Apache Spark 2.x for Java Developers
S$74.99
Building Data Streaming Applications with Apache Kafka
S$66.99
Mastering Apache Spark 2.x
S$74.99
Total S$ 216.97 Stars icon

Table of Contents

11 Chapters
Introduction to Spark Chevron down icon Chevron up icon
Revisiting Java Chevron down icon Chevron up icon
Let Us Spark Chevron down icon Chevron up icon
Understanding the Spark Programming Model Chevron down icon Chevron up icon
Working with Data and Storage Chevron down icon Chevron up icon
Spark on Cluster Chevron down icon Chevron up icon
Spark Programming Model - Advanced Chevron down icon Chevron up icon
Working with Spark SQL Chevron down icon Chevron up icon
Near Real-Time Processing with Spark Streaming Chevron down icon Chevron up icon
Machine Learning Analytics with Spark MLlib Chevron down icon Chevron up icon
Learning Spark GraphX Chevron down icon Chevron up icon

Customer reviews

Rating distribution
Full star icon Full star icon Empty star icon Empty star icon Empty star icon 2
(4 Ratings)
5 star 0%
4 star 0%
3 star 25%
2 star 50%
1 star 25%
Ray Brown Apr 07, 2020
Full star icon Full star icon Full star icon Empty star icon Empty star icon 3
The index needs a lot of help. I don't know if this is a packt publisher problem. The book has a few typos, but only annoying. Spark is a huge subject and this text -- used as a notebook so you can add your own material, combined with a course on Spark can get you started in the right direction. I've not seen any great texts that cover Spark thoroughly and do not require some research on your own. Spark is a changing product that can provide significant throughput increases with Machine Learning and Extract Transform and Load (ETL) systems. Regardless of which text you purchase you will be doing research on the web to find all your answers.
Amazon Verified review Amazon
Amazon Customer Oct 19, 2019
Full star icon Full star icon Empty star icon Empty star icon Empty star icon 2
content not upto the mark
Amazon Verified review Amazon
mark berman Dec 21, 2017
Full star icon Full star icon Empty star icon Empty star icon Empty star icon 2
Lots of grammatical and spelling mistakes. Detracts from quality of this book. Suggest the authors engage a professional proof reader next time.
Amazon Verified review Amazon
phani kumar yadavilli Mar 31, 2018
Full star icon Empty star icon Empty star icon Empty star icon Empty star icon 1
Some of the chapters are staggered and they are completely unreadable. Please check the screenshots for more details.
Amazon Verified review Amazon
Get free access to Packt library with over 7500+ books and video courses for 7 days!
Start Free Trial

FAQs

What is included in a Packt subscription? Chevron down icon Chevron up icon

A subscription provides you with full access to view all Packt and licnesed content online, this includes exclusive access to Early Access titles. Depending on the tier chosen you can also earn credits and discounts to use for owning content

How can I cancel my subscription? Chevron down icon Chevron up icon

To cancel your subscription with us simply go to the account page - found in the top right of the page or at https://subscription.packtpub.com/my-account/subscription - From here you will see the ‘cancel subscription’ button in the grey box with your subscription information in.

What are credits? Chevron down icon Chevron up icon

Credits can be earned from reading 40 section of any title within the payment cycle - a month starting from the day of subscription payment. You also earn a Credit every month if you subscribe to our annual or 18 month plans. Credits can be used to buy books DRM free, the same way that you would pay for a book. Your credits can be found in the subscription homepage - subscription.packtpub.com - clicking on ‘the my’ library dropdown and selecting ‘credits’.

What happens if an Early Access Course is cancelled? Chevron down icon Chevron up icon

Projects are rarely cancelled, but sometimes it's unavoidable. If an Early Access course is cancelled or excessively delayed, you can exchange your purchase for another course. For further details, please contact us here.

Where can I send feedback about an Early Access title? Chevron down icon Chevron up icon

If you have any feedback about the product you're reading, or Early Access in general, then please fill out a contact form here and we'll make sure the feedback gets to the right team. 

Can I download the code files for Early Access titles? Chevron down icon Chevron up icon

We try to ensure that all books in Early Access have code available to use, download, and fork on GitHub. This helps us be more agile in the development of the book, and helps keep the often changing code base of new versions and new technologies as up to date as possible. Unfortunately, however, there will be rare cases when it is not possible for us to have downloadable code samples available until publication.

When we publish the book, the code files will also be available to download from the Packt website.

How accurate is the publication date? Chevron down icon Chevron up icon

The publication date is as accurate as we can be at any point in the project. Unfortunately, delays can happen. Often those delays are out of our control, such as changes to the technology code base or delays in the tech release. We do our best to give you an accurate estimate of the publication date at any given time, and as more chapters are delivered, the more accurate the delivery date will become.

How will I know when new chapters are ready? Chevron down icon Chevron up icon

We'll let you know every time there has been an update to a course that you've bought in Early Access. You'll get an email to let you know there has been a new chapter, or a change to a previous chapter. The new chapters are automatically added to your account, so you can also check back there any time you're ready and download or read them online.

I am a Packt subscriber, do I get Early Access? Chevron down icon Chevron up icon

Yes, all Early Access content is fully available through your subscription. You will need to have a paid for or active trial subscription in order to access all titles.

How is Early Access delivered? Chevron down icon Chevron up icon

Early Access is currently only available as a PDF or through our online reader. As we make changes or add new chapters, the files in your Packt account will be updated so you can download them again or view them online immediately.

How do I buy Early Access content? Chevron down icon Chevron up icon

Early Access is a way of us getting our content to you quicker, but the method of buying the Early Access course is still the same. Just find the course you want to buy, go through the check-out steps, and you’ll get a confirmation email from us with information and a link to the relevant Early Access courses.

What is Early Access? Chevron down icon Chevron up icon

Keeping up to date with the latest technology is difficult; new versions, new frameworks, new techniques. This feature gives you a head-start to our content, as it's being created. With Early Access you'll receive each chapter as it's written, and get regular updates throughout the product's development, as well as the final course as soon as it's ready.We created Early Access as a means of giving you the information you need, as soon as it's available. As we go through the process of developing a course, 99% of it can be ready but we can't publish until that last 1% falls in to place. Early Access helps to unlock the potential of our content early, to help you start your learning when you need it most. You not only get access to every chapter as it's delivered, edited, and updated, but you'll also get the finalized, DRM-free product to download in any format you want when it's published. As a member of Packt, you'll also be eligible for our exclusive offers, including a free course every day, and discounts on new and popular titles.