Apache Spark fundamentals

This section covers the fundamentals of Apache Spark. It is important to become familiar with the concepts presented here before moving on to the following chapters, where we'll explore the available APIs.

As mentioned in the introduction to this chapter, the Spark engine processes data in distributed memory across the nodes of a cluster. The following diagram shows the logical structure of how a typical Spark job processes information:

Figure 1.1

Spark executes a job in the following way:

Figure 1.2
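To make this execution model concrete, here is a minimal sketch of a Spark job in Scala. The application name, the local[*] master, and the input path are placeholders for illustration; the key point is that transformations such as flatMap and reduceByKey are lazy, and only the final action (collect) triggers the distributed computation:

    import org.apache.spark.sql.SparkSession

    object WordCount {
      def main(args: Array[String]): Unit = {
        // The driver builds a DAG of transformations; the cluster manager
        // schedules the resulting tasks on the worker nodes.
        val spark = SparkSession.builder()
          .appName("WordCount")            // placeholder application name
          .master("local[*]")              // local mode; use a cluster URL in production
          .getOrCreate()

        val counts = spark.sparkContext
          .textFile("input.txt")           // placeholder input path
          .flatMap(_.split("\\s+"))        // lazy transformation
          .map(word => (word, 1))          // lazy transformation
          .reduceByKey(_ + _)              // lazy transformation

        counts.collect().foreach(println)  // action: triggers the computation

        spark.stop()
      }
    }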

The Master controls how data is partitioned and takes advantage of data locality, while keeping track of all the distributed data computation on the Slave machines. If a certain Slave machine becomes unavailable, the data on that machine is reconstructed on other available machines. In standalone mode, the Master is a single point of failure. The Cluster mode using different managers section of this chapter covers the possible running modes and explains fault tolerance in Spark.
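As a sketch of what standalone mode looks like from the application side, a SparkSession can be pointed directly at a standalone Master. The host name below is a placeholder; 7077 is the standalone Master's default port:

    import org.apache.spark.sql.SparkSession

    // Connecting to a standalone cluster Master. "master-host" is a
    // placeholder; 7077 is the standalone Master's default port.
    val spark = SparkSession.builder()
      .appName("StandaloneApp")
      .master("spark://master-host:7077")
      .getOrCreate()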

Spark comes with five major components:

Figure 1.3

These components are as follows:

  • Spark Core: The underlying general execution engine, on top of which all the other functionality is built.
  • Spark SQL: A module for structured data processing (a short example follows this list).
  • Spark Streaming: An extension of the core Spark API that enables the processing of live data streams. Its strengths include scalability, high throughput, and fault tolerance.
  • MLlib: The Spark machine learning library.
  • GraphX: A library for graphs and graph-parallel computation.
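As a taste of the Spark SQL component, the following sketch builds a DataFrame from an in-memory collection and queries it with both the DataFrame API and plain SQL. The names and data are invented for illustration:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("SparkSqlExample")   // placeholder application name
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Build a DataFrame from an in-memory sequence...
    val people = Seq(("Alice", 34), ("Bob", 29)).toDF("name", "age")

    // ...and query it twice: through the DataFrame API and through SQL.
    people.filter($"age" > 30).show()

    people.createOrReplaceTempView("people")
    spark.sql("SELECT name FROM people WHERE age > 30").show()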

Spark can access data stored in different systems, such as HDFS, Cassandra, MongoDB, and relational databases, as well as cloud storage services such as Amazon S3 and Azure Data Lake Storage.
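In practice, switching between these storage systems mostly amounts to changing the URI scheme and format passed to the reader. The paths below are placeholders, reusing the spark session built in the previous sketch; reading from S3 assumes the hadoop-aws connector is on the classpath:

    // Local file system: a CSV file with a header row.
    val localCsv = spark.read.option("header", "true").csv("data/users.csv")

    // HDFS: a Parquet dataset addressed through the hdfs:// scheme.
    val hdfsParquet = spark.read.parquet("hdfs://namenode:8020/data/events.parquet")

    // Amazon S3: JSON logs via the s3a connector (requires hadoop-aws on the classpath).
    val s3Json = spark.read.json("s3a://my-bucket/logs/")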
