You're reading from Modern Data Architectures with Python A practical guide to building and deploying data pipelines, data warehouses, and data lakes with Python

Product type Paperback

Published in Sep 2023

Publisher Packt

ISBN-13 9781801070492

Length 318 pages

Edition 1st Edition

Languages

Python

Tools

MLflow

Concepts

Data Science

Author (1):

Brian Lipp

Table of Contents (19) Chapters

Preface

1. Part 1: Fundamental Data Knowledge

2. Chapter 1: Modern Data Processing Architecture

3. Chapter 2: Understanding Data Analytics

4. Part 2: Data Engineering Toolset

5. Chapter 3: Apache Spark Deep Dive

6. Chapter 4: Batch and Stream Data Processing Using PySpark

7. Chapter 5: Streaming Data with Kafka

8. Part 3: Modernizing the Data Platform

9. Chapter 6: MLOps

10. Chapter 7: Data and Information Visualization

11. Chapter 8: Integrating Continuous Integration into Your Workflow

12. Chapter 9: Orchestrating Your Data Workflows

13. Part 4: Hands-on Project

14. Chapter 10: Data Governance

15. Chapter 11: Building out the Groundwork

16. Chapter 12: Completing Our Project

17. Index

18. Other Books You May Enjoy

Summary

We have come a long way, friends! In this chapter, we explored both batch and streaming data processing in Apache Spark with Python, uncovering the strengths and nuances of each approach.

The chapter began with a deep dive into batch processing, where a bounded dataset is processed as a whole rather than record by record as it arrives. We learned how to work with DataFrames in Spark, apply transformations (which are evaluated lazily) and actions (which trigger execution), and leverage optimizations for efficient data processing.
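The difference between transformations and actions can be pictured with a small plain-Python analogy. This is only a sketch and does not use Spark: `LazyRows`, `with_column`, and `is_large` are hypothetical names invented for illustration. The idea it tries to show is that transformations merely record a plan, and nothing runs until an action such as `collect()` is called.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional, Tuple

Row = Dict[str, Any]


class LazyRows:
    """Toy stand-in for a DataFrame (hypothetical, not a Spark class)."""

    def __init__(self, rows: Iterable[Row],
                 steps: Optional[List[Tuple[str, Any]]] = None):
        self._rows = rows
        self._steps = list(steps or [])

    def filter(self, predicate: Callable[[Row], bool]) -> "LazyRows":
        # Transformation: returns a new object with one more planned step.
        return LazyRows(self._rows, self._steps + [("filter", predicate)])

    def with_column(self, name: str, fn: Callable[[Row], Any]) -> "LazyRows":
        # Transformation: also only recorded, not executed.
        return LazyRows(self._rows, self._steps + [("with_column", (name, fn))])

    def collect(self) -> List[Row]:
        # Action: the recorded plan is executed here, row by row.
        out = []
        for source_row in self._rows:
            row = dict(source_row)  # copy so the input is left untouched
            keep = True
            for kind, arg in self._steps:
                if kind == "filter":
                    if not arg(row):
                        keep = False
                        break
                else:
                    name, fn = arg
                    row[name] = fn(row)
            if keep:
                out.append(row)
        return out


calls = []  # records when the predicate actually runs


def is_large(row: Row) -> bool:
    calls.append(row["amount"])
    return row["amount"] > 10


rows = [{"amount": 5}, {"amount": 20}]
planned = LazyRows(rows).filter(is_large).with_column(
    "double", lambda r: r["amount"] * 2
)
calls_before_action = list(calls)  # still empty: nothing has executed yet
result = planned.collect()         # the action triggers the work
```

In Spark itself the plan is also optimized before execution, which this toy version does not attempt.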

Moving on to stream processing, we learned about Spark Structured Streaming, which enables the continuous processing of real-time data streams. Understanding the distinction between micro-batch processing and true streaming clarified how Spark handles streaming data. This chapter highlighted the importance of defining schemas and...
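The micro-batch idea mentioned above can be sketched in plain Python. This is a conceptual illustration only, not Spark code: the helper names `micro_batches` and `running_totals` and the `amount` field are assumptions made for the example, and grouping by record count is a simplification of how a streaming engine decides when to start a batch. Instead of handling each event the moment it arrives, the stream is cut into small batches and each batch is processed as an ordinary batch job.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List

Event = Dict[str, int]


def micro_batches(events: Iterable[Event], batch_size: int) -> Iterator[List[Event]]:
    """Cut a (possibly unbounded) stream of events into small lists."""
    iterator = iter(events)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return  # the source is exhausted
        yield batch


def running_totals(events: Iterable[Event], batch_size: int) -> List[int]:
    """Update a running sum once per micro-batch, not once per event."""
    total = 0
    totals = []
    for batch in micro_batches(events, batch_size):
        total += sum(event["amount"] for event in batch)
        totals.append(total)
    return totals


events = [{"amount": n} for n in [1, 2, 3, 4, 5]]
batches = list(micro_batches(events, batch_size=2))
totals = running_totals(events, batch_size=2)
```

A true record-at-a-time engine would update its state after every single event; the micro-batch approach trades a little latency for the ability to reuse batch machinery.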
