You're reading from Databricks Certified Associate Developer for Apache Spark Using Python: The ultimate guide to getting certified in Apache Spark using practical examples with Python

Product type Paperback

Published in Jun 2024

Publisher Packt

ISBN-13 9781804619780

Length 274 pages

Edition 1st Edition

Languages

Python

Tools

Apache Spark

Concepts

Data Engineering

Author (1):

Saba Shah

Why choose Apache Spark?

In this section, we will discuss the applications of Apache Spark and its key features: speed, reusability, in-memory computation, and its role as a unified platform.

Speed

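Apache Spark is one of the fastest data processing frameworks available today. For many workloads, it outperforms Hadoop MapReduce by a large margin. The main reasons are in-memory computation, which avoids writing intermediate results to disk between processing steps, and lazy evaluation, which defers work until a result is actually requested so that Spark can optimize the whole chain of operations first. We will learn more about this when we discuss Spark architecture in the next chapter.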
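To make lazy evaluation and in-memory reuse more concrete, here is a minimal plain-Python sketch. It is only an analogy, not Spark's API: the `LazyDataset` class, its methods, and the `read_source` function are hypothetical names invented for illustration. Transformations only record work, an action triggers it, and a cached result is served from memory on later actions.

```python
class LazyDataset:
    """Toy stand-in for a lazily evaluated dataset (an analogy, not Spark's API)."""

    def __init__(self, compute):
        self._compute = compute    # zero-argument callable that returns an iterator
        self._use_cache = False    # set to True by cache()
        self._cache = None         # filled in by the first action after cache()

    def _iterate(self):
        # Serve records from memory if this dataset was cached and already computed.
        if self._use_cache:
            if self._cache is None:
                self._cache = list(self._compute())
            return iter(self._cache)
        return self._compute()

    # Transformations: return a new dataset and do no work yet.
    def map(self, fn):
        return LazyDataset(lambda: (fn(x) for x in self._iterate()))

    def filter(self, pred):
        return LazyDataset(lambda: (x for x in self._iterate() if pred(x)))

    def cache(self):
        self._use_cache = True
        return self

    # Actions: these actually run the recorded chain.
    def collect(self):
        return list(self._iterate())

    def count(self):
        return sum(1 for _ in self._iterate())


reads = {"n": 0}  # counts how often the (hypothetical) source is read


def read_source():
    reads["n"] += 1
    return iter(range(10))


evens_squared = (
    LazyDataset(read_source)
    .filter(lambda x: x % 2 == 0)
    .map(lambda x: x * x)
    .cache()
)

reads_before_action = reads["n"]   # still 0: nothing has run yet
result = evens_squared.collect()   # first action reads the source once
total = evens_squared.count()      # second action is answered from the cache
```

After both actions, the source has been read only once, even though two results were requested. Spark applies the same broad idea at cluster scale, with a query optimizer and distributed memory behind it.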

Reusability

Reusability is an important consideration for large organizations that use modern data platforms. Spark can join batch and streaming data seamlessly, so the same logic can serve both. You can also augment incoming datasets with historical data to better serve your use cases. This provides a long historical view of the data for running queries or building modern analytical systems.
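The idea of reusing one piece of logic for batch and streaming input, enriched with historical data, can be sketched in plain Python. This is an illustration under stated assumptions rather than Spark code: the `HISTORY` lookup, the `enrich` function, and the sample events are all hypothetical. In PySpark, the comparable pattern would be a function that takes a DataFrame and is applied to the results of both `spark.read` and `spark.readStream`.

```python
from typing import Dict, Iterable, Iterator

# Hypothetical historical reference data: past order counts per customer.
HISTORY: Dict[str, int] = {"c1": 12, "c2": 3}


def enrich(events: Iterable[dict], history: Dict[str, int]) -> Iterator[dict]:
    """Attach each customer's historical order count to incoming events."""
    for event in events:
        past = history.get(event["customer"], 0)  # unknown customers get 0
        yield {**event, "past_orders": past}


# Batch input: a finite collection that is fully available up front.
batch_events = [
    {"customer": "c1", "amount": 40},
    {"customer": "c3", "amount": 15},
]


# Streaming input: records that arrive one at a time (simulated by a generator).
def event_stream() -> Iterator[dict]:
    yield {"customer": "c2", "amount": 99}
    yield {"customer": "c1", "amount": 5}


# The same enrichment logic is reused unchanged for both kinds of input.
batch_result = list(enrich(batch_events, HISTORY))
stream_result = [row for row in enrich(event_stream(), HISTORY)]
```

Because `enrich` only assumes an iterable of records, it does not need to know whether the data is a stored batch or a live feed, which is the property that makes the logic reusable.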