Apache Spark 2.x for Java Developers

Apache Spark 2.x for Java Developers: Explore big data at scale using Apache Spark 2.x Java APIs

  • eBook: €22.99 €32.99
  • Paperback: €41.99
  • Subscription: Free Trial (renews at €18.99 p/m)

What do you get with eBook?

  • Instant access to your Digital eBook purchase
  • Download this book in EPUB and PDF formats
  • Access this title in our online reader with advanced features
  • DRM FREE - Read whenever, wherever and however you want

Apache Spark 2.x for Java Developers

Revisiting Java

This chapter is a refresher course on Java. In it, we will discuss some Java concepts that will be useful when creating applications in Apache Spark.

This book assumes that the reader is comfortable with the basics of Java. We will discuss useful Java concepts, mainly focusing on what is new in Java 8. More importantly, we will touch upon topics such as the following:

  • Generics
  • Interfaces
  • Lambda expressions
  • Streams

Why use Java for Spark?

With the rise of multi-core CPUs, Java could not keep up with the design changes needed to utilize the extra power at its disposal, because of the complexity surrounding concurrency and immutability. We will discuss this in detail later. First, let's understand the importance and usability of Java in the Hadoop ecosystem. As MapReduce was gaining popularity, Google introduced a framework called Flume Java that helped in pipelining multiple MapReduce jobs. Flume Java consists of immutable parallel collections capable of performing lazily evaluated, optimized, chained operations. That might sound eerily similar to what Apache Spark does, but even before Apache Spark and Flume Java, there was Cascading, which built an abstraction over MapReduce to simplify the way MapReduce tasks are developed, tested, and run. All these frameworks were majorly Java implementations to simplify MapReduce pipelines, among other things.

These abstractions were simple in fact...

Generics

Generics were introduced in Java 1.5. Generics help you create general-purpose code that has an abstract type in its definition; that abstract type can be replaced with any concrete type in the implementation.

For example, the List interface and its implementations, such as ArrayList, LinkedList, and so on, are defined with a generic type. Users can provide a concrete type, such as Integer, Long, or String, while instantiating the list:

List<Integer> list1 = new ArrayList<Integer>();
List<String> list2 = new ArrayList<String>();

Here, list1 is a list of integers and list2 is a list of strings. Since Java 7, the compiler can infer the type, so the preceding code can also be written as follows:

List<Integer> list1 = new ArrayList<>();
List<String> list2 = new ArrayList<>();

Another huge benefit of generic types is that they bring compile-time safety. Let's create a list without the use of generics:

List list = new ArrayList<>();...
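
To see that compile-time safety in a complete, compilable form, here is a minimal sketch of our own (the class and variable names are illustrative, not from the book) contrasting a raw list with a generic one:

import java.util.ArrayList;
import java.util.List;

public class GenericsSafetyDemo {
   public static void main(String[] args) {
      // Raw list: the compiler cannot check element types,
      // so a bad cast only fails at runtime with a ClassCastException
      List rawList = new ArrayList();
      rawList.add("one");
      rawList.add(2); // compiles, even though the list now mixes types

      // Generic list: the element type is checked at compile time
      List<String> safeList = new ArrayList<>();
      safeList.add("one");
      // safeList.add(2); // would not compile: incompatible types
      String first = safeList.get(0); // no cast needed
      System.out.println(first);
   }
}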

Interfaces

Interfaces are reference types in Java. They are used to define contracts among classes: any class that implements an interface has to adhere to the contract that the interface defines.

For example, here is an interface, Car, that consists of three abstract methods:

public interface Car { 
   void shape(); 
   void price(); 
   void color(); 
} 

Any class that implements this interface has to implement all of its abstract methods, unless it is an abstract class. Interfaces can only be implemented by classes or extended by other interfaces; they cannot be instantiated.

Prior to Java 8, interfaces consisted only of abstract methods and final variables. In Java 8, interfaces may contain default and static methods as well.

Static method in an interface

The static method in an interface is similar to a static method in a class. Users cannot override it, so even if a class implements an interface, it cannot override a static method of that interface.

Like a static...
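
To make the Java 8 additions concrete, here is a minimal sketch of our own (the Vehicle and Bike names are illustrative, not from the book) showing an interface with an abstract, a default, and a static method:

public interface Vehicle {
   // Abstract method: implementing classes must provide this
   String name();

   // Default method: inherited by implementing classes and may be overridden
   default String describe() {
      return "Vehicle: " + name();
   }

   // Static method: belongs to the interface itself and cannot be overridden
   static boolean isVehicle(Object obj) {
      return obj instanceof Vehicle;
   }
}

class Bike implements Vehicle {
   @Override
   public String name() {
      return "Bike";
   }
}

With these definitions, a call such as new Bike().describe() returns "Vehicle: Bike", while Vehicle.isVehicle(new Bike()) is invoked on the interface itself rather than on an instance.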

Lambda expressions

Lambda expressions were introduced in Java 8 and are a step towards facilitating functional programming in Java.

Lambda expressions help you define a method without declaring it, so you do not need a name for the method, a return type, and so on. Like anonymous inner classes, lambda expressions provide a way to pass behavior to functions; a lambda, however, is a much more concise way of writing the code.

For example, the preceding anonymous inner class example can be converted to a lambda as follows:

import java.io.File;

public class MyFilterImpl {
   public static void main(String[] args) {
      File dir = new File("src/main/java");
      dir.list((dirname, name) -> name.endsWith("java")); // Lambda expression
   }
}

Note that the signature of the lambda expression exactly matches the signature of the accept method in the FilenameFilter interface.

Note

One of the huge differences between Lambda and anonymous inner classes...
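
The anonymous inner class version referred to above does not appear in this excerpt; a roughly equivalent implementation using an anonymous FilenameFilter (our own sketch) would look like this:

import java.io.File;
import java.io.FilenameFilter;

public class MyFilterAnonymousImpl {
   public static void main(String[] args) {
      File dir = new File("src/main/java");
      // Anonymous inner class implementing FilenameFilter
      dir.list(new FilenameFilter() {
         @Override
         public boolean accept(File dirname, String name) {
            return name.endsWith("java");
         }
      });
   }
}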

Lexical scoping

Lexical scoping is also referred to as static scoping. Under lexical scoping, a variable is accessible only in the scope in which it is defined, and that scope is determined at compile time.

Let us consider the following example:

public class LexicalScoping {
   int a = 1;
   // a has class-level scope, so it is accessible throughout the class

   public void sumAndPrint() {
      int b = 1;
      int c = a + b;
      System.out.println(c);
      // b and c are local variables of the method and are accessible
      // inside the method only
   }
   // b and c are no longer accessible here
}

Variable a is available throughout the class (let's not consider the difference between static and non-static for now). However, variables b and c are available inside the sumAndPrint method only.

Similarly, a variable given inside lambda...
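
As a related illustration, here is a minimal sketch of our own (the class and variable names are illustrative, not from the book) showing that a lambda can read a local variable from its enclosing lexical scope only if that variable is effectively final:

import java.util.function.Supplier;

public class LambdaScoping {
   public static void main(String[] args) {
      int base = 10; // effectively final: never reassigned after this
      // The lambda captures 'base' from the enclosing lexical scope
      Supplier<Integer> plusFive = () -> base + 5;
      System.out.println(plusFive.get()); // prints 15
      // base = 20; // uncommenting this reassignment would make 'base'
      //            // no longer effectively final, and the lambda above
      //            // would then fail to compile
   }
}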



Key benefits

  • Perform big data processing with Spark—without having to learn Scala!
  • Use the Spark Java API to implement efficient enterprise-grade applications for data processing and analytics
  • Go beyond mainstream data processing by adding querying capability, Machine Learning, and graph processing using Spark

Description

Apache Spark is the buzzword in the big data industry right now, especially with the increasing need for real-time streaming and data processing. While Spark is built on Scala, the Spark Java API exposes all the Spark features available in the Scala version for Java developers. This book will show you how you can implement various functionalities of the Apache Spark framework in Java, without stepping out of your comfort zone. The book starts with an introduction to the Apache Spark 2.x ecosystem, followed by explaining how to install and configure Spark, and refreshes the Java concepts that will be useful to you when consuming Apache Spark's APIs. You will explore RDD and its associated common Action and Transformation Java APIs, set up a production-like clustered environment, and work with Spark SQL. Moving on, you will perform near-real-time processing with Spark streaming, Machine Learning analytics with Spark MLlib, and graph processing with GraphX, all using various Java packages. By the end of the book, you will have a solid foundation in implementing components in the Spark framework in Java to build fast, real-time applications.

Who is this book for?

If you are a Java developer interested in learning to use the popular Apache Spark framework, this book is the resource you need to get started. Apache Spark developers who are looking to build enterprise-grade applications in Java will also find this book very useful.

What you will learn

  • Process data in different file formats such as XML, JSON, CSV, and plain and delimited text, using the Spark Core library
  • Perform analytics on data from various data sources, such as Kafka and Flume, using the Spark Streaming library
  • Learn SQL schema creation and the analysis of structured data using various SQL functions, including windowing functions, in the Spark SQL library
  • Explore Spark MLlib APIs while implementing machine learning techniques to solve real-world problems
  • Get to know Spark GraphX so you understand the various graph-based analytics that can be performed with Spark

Product Details

Publication date: Jul 26, 2017
Length: 350 pages
Edition: 1st
Language: English
ISBN-13: 9781787129429
Vendor: Apache



Frequently bought together


  • Apache Spark 2.x for Java Developers: €41.99
  • Building Data Streaming Applications with Apache Kafka: €36.99
  • Mastering Apache Spark 2.x: €41.99
Total: €120.97

Table of Contents

11 Chapters
Introduction to Spark
Revisiting Java
Let Us Spark
Understanding the Spark Programming Model
Working with Data and Storage
Spark on Cluster
Spark Programming Model - Advanced
Working with Spark SQL
Near Real-Time Processing with Spark Streaming
Machine Learning Analytics with Spark MLlib
Learning Spark GraphX

Customer reviews

Rating distribution: 2 out of 5 stars (4 ratings)
5 star: 0%
4 star: 0%
3 star: 25%
2 star: 50%
1 star: 25%

Ray Brown, Apr 07, 2020 (3 stars)
The index needs a lot of help. I don't know if this is a packt publisher problem. The book has a few typos, but only annoying. Spark is a huge subject and this text -- used as a notebook so you can add your own material, combined with a course on Spark can get you started in the right direction. I've not seen any great texts that cover Spark thoroughly and do not require some research on your own. Spark is a changing product that can provide significant throughput increases with Machine Learning and Extract Transform and Load (ETL) systems. Regardless of which text you purchase you will be doing research on the web to find all your answers.
Amazon Verified review

Amazon Customer, Oct 19, 2019 (2 stars)
content not upto the mark
Amazon Verified review

mark berman, Dec 21, 2017 (2 stars)
Lots of grammatical and spelling mistakes. Detracts from quality of this book. Suggest the authors engage a professional proof reader next time.
Amazon Verified review

phani kumar yadavilli, Mar 31, 2018 (1 star)
Some of the chapters are staggered and they are completely unreadable. Please check the screenshots for more details.
Amazon Verified review

FAQs

How do I buy and download an eBook?

Where there is an eBook version of a title available, you can buy it from the book details for that title. Add either the standalone eBook or the eBook and print book bundle to your shopping cart. Your eBook will show in your cart as a product on its own. After completing checkout and payment in the normal way, you will receive your receipt on the screen containing a link to a personalised PDF download file. This link will remain active for 30 days. You can download backup copies of the file by logging in to your account at any time.

If you already have Adobe reader installed, then clicking on the link will download and open the PDF file directly. If you don't, then save the PDF file on your machine and download the Reader to view it.

Please Note: Packt eBooks are non-returnable and non-refundable.

Packt eBook and Licensing When you buy an eBook from Packt Publishing, completing your purchase means you accept the terms of our licence agreement. Please read the full text of the agreement. In it we have tried to balance the need for the ebook to be usable for you the reader with our needs to protect the rights of us as Publishers and of our authors. In summary, the agreement says:

  • You may make copies of your eBook for your own use onto any machine
  • You may not pass copies of the eBook on to anyone else
How can I make a purchase on your website?

If you want to purchase a video course, eBook, or Bundle (Print + eBook), please follow the steps below:

  1. Register on our website using your email address and a password.
  2. Search for the title by name or ISBN using the search option.
  3. Select the title you want to purchase.
  4. Choose the format you wish to purchase the title in; if you order the Print Book, you get a free eBook copy of the same title.
  5. Proceed with the checkout process (payment can be made using Credit Card, Debit Card, or PayPal).
Where can I access support around an eBook?
  • If you experience a problem with using or installing Adobe Reader, please contact Adobe directly.
  • To view the errata for the book, see www.packtpub.com/support and view the pages for the title you have.
  • To view your account details or to download a new copy of the book go to www.packtpub.com/account
  • To contact us directly if a problem is not resolved, use www.packtpub.com/contact-us
What eBook formats does Packt support?

Our eBooks are currently available in a variety of formats such as PDF and ePub. In the future, this may well change with trends and developments in technology, but please note that our PDFs are not in Adobe eBook Reader format, which has greater restrictions on security.

You will need to use Adobe Reader v9 or later in order to read Packt's PDF eBooks.

What are the benefits of eBooks?
  • You can get the information you need immediately
  • You can easily take them with you on a laptop
  • You can download them an unlimited number of times
  • You can print them out
  • They are copy-paste enabled
  • They are searchable
  • There is no password protection
  • They are lower in price than print
  • They save resources and space
What is an eBook?

Packt eBooks are a complete electronic version of the print edition, available in PDF and ePub formats. Every piece of content down to the page numbering is the same. Because we save the costs of printing and shipping the book to you, we are able to offer eBooks at a lower cost than print editions.

When you have purchased an eBook, simply log in to your account and click on the link in Your Download Area. We recommend saving the file to your hard drive before opening it.

For optimal viewing of our eBooks, we recommend you download and install the free Adobe Reader version 9.