You're reading from Modern Data Architectures with Python: A practical guide to building and deploying data pipelines, data warehouses, and data lakes with Python

Product type Paperback

Published in Sep 2023

Publisher Packt

ISBN-13 9781801070492

Length 318 pages

Edition 1st Edition

Languages

Python

Tools

MLflow

Concepts

Data Science

Author (1):

Brian Lipp


Table of Contents (19 chapters)

Preface

1. Part 1: Fundamental Data Knowledge

2. Chapter 1: Modern Data Processing Architecture

3. Chapter 2: Understanding Data Analytics

4. Part 2: Data Engineering Toolset

5. Chapter 3: Apache Spark Deep Dive

6. Chapter 4: Batch and Stream Data Processing Using PySpark

7. Chapter 5: Streaming Data with Kafka

8. Part 3: Modernizing the Data Platform

9. Chapter 6: MLOps

10. Chapter 7: Data and Information Visualization

11. Chapter 8: Integrating Continuous Integration into Your Workflow

12. Chapter 9: Orchestrating Your Data Workflows

13. Part 4: Hands-on Project

14. Chapter 10: Data Governance

15. Chapter 11: Building out the Groundwork

16. Chapter 12: Completing Our Project

17. Index


18. Other Books You May Enjoy

Batch processing

Batch processing is the most common form of data processing, and for most companies, it is their bread-and-butter approach to data. In batch processing, a job runs only when it is triggered, and it processes the data available at that moment. The trigger may be manual or based on a schedule. Streaming, on the other hand, processes data as it arrives; a common way to achieve this is micro-batch processing, in which small batches are triggered in rapid succession. Streaming is implemented differently on different systems. In Spark, streaming is designed to look and work like batch processing, but without the need to trigger the job over and over by hand.
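To make the distinction concrete, here is a plain-Python sketch, not Spark's API, in which the same processing function is applied once to everything available (as a triggered batch job would do) and then repeatedly to small micro-batches of an incoming stream. The function names `micro_batches`, `run_batch_job`, and `run_micro_batch_stream` are hypothetical and exist only for illustration.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def micro_batches(records: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group an incoming stream of records into small lists (micro-batches)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[T] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush any leftover records that did not fill a full batch
        yield batch


def run_batch_job(records: Iterable[T], process: Callable[[List[T]], R]) -> R:
    """A 'triggered' batch job: process everything available in a single run."""
    return process(list(records))


def run_micro_batch_stream(
    records: Iterable[T], batch_size: int, process: Callable[[List[T]], R]
) -> List[R]:
    """Reuse the same batch logic, applying it once per micro-batch."""
    return [process(batch) for batch in micro_batches(records, batch_size)]


# Usage: the same `sum` logic run as one batch, then as a micro-batch stream.
print(run_batch_job(range(10), sum))              # 45
print(run_micro_batch_stream(range(10), 4, sum))  # [6, 22, 17]
```

Keeping `process` identical in both modes mirrors, in a very simplified way, the idea that Spark streaming code looks like batch code; a real streaming engine also handles state, late data, and fault tolerance, which this sketch ignores.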

In this section, we will set up some fake data for our examples using the Faker Python library. Faker is used only for example purposes, because having data to work with is very important to the learning process. If you prefer an alternative way to generate data, please feel free to use that instead:

from faker import Faker
import pandas as pd
import random
fake = Faker()
def generate_data...
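The listing above is cut off before the body of `generate_data`, so its exact fields are not shown here. As one of the alternatives the text invites, the following is a standard-library-only sketch; the function name `generate_fake_records`, the field names, and the sample value lists are hypothetical and are not the book's `generate_data` function.

```python
import random
from datetime import date, timedelta
from typing import Dict, List

# Hypothetical sample values; Faker would produce far more varied data.
FIRST_NAMES = ["Ana", "Ben", "Chen", "Dana", "Eli", "Fatima"]
LAST_NAMES = ["Garcia", "Kim", "Lopez", "Nguyen", "Okafor", "Smith"]
CITIES = ["Austin", "Berlin", "Lagos", "Lima", "Seoul", "Toronto"]


def generate_fake_records(n: int, seed: int = 42) -> List[Dict[str, object]]:
    """Return n fake customer-like records, reproducible for a given seed."""
    rng = random.Random(seed)  # local RNG, so the global random state is untouched
    start = date(2020, 1, 1)
    records = []
    for i in range(n):
        first = rng.choice(FIRST_NAMES)
        last = rng.choice(LAST_NAMES)
        records.append(
            {
                "id": i,
                "name": f"{first} {last}",
                # The id suffix keeps emails unique even when names repeat.
                "email": f"{first.lower()}.{last.lower()}{i}@example.com",
                "city": rng.choice(CITIES),
                "signup_date": start + timedelta(days=rng.randrange(1000)),
                "purchase_amount": round(rng.uniform(5, 500), 2),
            }
        )
    return records


rows = generate_fake_records(3)
for row in rows:
    print(row)
```

Because the records are plain dictionaries, `pd.DataFrame(rows)` turns them into a pandas DataFrame if pandas is installed, as in the listing above. Passing a fixed `seed` makes the output repeatable, which helps when comparing the results of batch and streaming examples.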
