Part 4: Spark Applications
In this part, we will cover Spark’s Structured Streaming, focusing on real-time data processing and concepts such as event time processing, watermarking, triggers, and output modes. Practical examples will illustrate how to build and deploy streaming applications with Structured Streaming. We will also delve into Spark ML, Spark’s machine learning library, exploring supervised and unsupervised techniques, model building, evaluation, and hyperparameter tuning across various algorithms. Practical examples will demonstrate how Spark ML applies to real-world machine learning tasks, which are central to contemporary data science. Although these topics are not included in the Spark certification exam, understanding them is essential in modern data engineering.
This part has the following chapters:
- Chapter 7, Structured Streaming in Spark
- Chapter 8, Machine Learning with Spark ML