Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Conferences
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Apache Spark 2.x for Java Developers

You're reading from   Apache Spark 2.x for Java Developers Explore big data at scale using Apache Spark 2.x Java APIs

Arrow left icon
Product type Paperback
Published in Jul 2017
Publisher Packt
ISBN-13 9781787126497
Length 350 pages
Edition 1st Edition
Languages
Concepts
Arrow right icon
Authors (2):
Arrow left icon
Sourav Gulati Sourav Gulati
Author Profile Icon Sourav Gulati
Sourav Gulati
Sumit Kumar Sumit Kumar
Author Profile Icon Sumit Kumar
Sumit Kumar
Arrow right icon
View More author details
Toc

Table of Contents (12) Chapters Close

Preface 1. Introduction to Spark FREE CHAPTER 2. Revisiting Java 3. Let Us Spark 4. Understanding the Spark Programming Model 5. Working with Data and Storage 6. Spark on Cluster 7. Spark Programming Model - Advanced 8. Working with Spark SQL 9. Near Real-Time Processing with Spark Streaming 10. Machine Learning Analytics with Spark MLlib 11. Learning Spark GraphX

Exploring the Spark ecosystem

Apache Spark is considered as the general purpose system in the big data world. It consists of a lot of libraries that help to perform various analytics on your data. It provides built-in libraries to perform batch analytics, perform real-time analytics, apply machine learning algorithms on your data, and much more.

The following are the various built-in libraries available in Spark:

  • Spark Core: As its name says, the Spark Core library consists of all the core modules of Spark. It consists of the basics of the Spark model, including RDD and various transformation and actions that can be performed with it. Basically, all the batch analytics that can be performed with the Spark programming model using the MapReduce paradigm is the part of this library. It also helps to analyze different varieties of data.
  • Spark Streaming: The Spark Streaming library consists of modules that help users to run near real-time streaming processing on the incoming data. It helps to handle the velocity part of the big data territory. It consists of a lot of modules that help to listen to various streaming sources and perform analytics in near real time on the data received from those sources.
  • Spark SQL: The Spark SQL library helps to analyze structured data using the very popular SQL queries. It consists of a dataset library which helps to view the structured data in the form of a table and provides the capabilities of running SQL on top of it. The SQL library consists of a lot of functions which are available in SQL of RDBMS. It also provides an option to write your own function, called the User Defined Function (UDF).
  • MLlib: Spark MLlib helps to apply various machine learning techniques on your data, leveraging the distributed and scalable capability of Spark. It consists of a lot of learning algorithms and utilities and provides algorithms for classification, regression, clustering, decomposition, collaborative filtering, and so on.
  • GraphX: The Spark GraphX library provides APIs for graph-based computations. With the help of this library, the user can perform parallel computations on graph-based data. GraphX is the one of the fastest ways of performing graph-based computations.
  • Spark-R: The Spark R library is used to run R scripts or commands on Spark Cluster. This helps to provide distributed environment for R scripts to execute. Spark comes with a shell called sparkR which can be used to run R scripts on Spark Cluster. Users which are more familiar with R, can use tool such as RStudio or Rshell and can execute R scripts which will run on the Spark cluster.
You have been reading a chapter from
Apache Spark 2.x for Java Developers
Published in: Jul 2017
Publisher: Packt
ISBN-13: 9781787126497
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime
Banner background image