Creating a SparkSession for PySpark
First introduced in Chapter 1, PySpark is the Python API for Apache Spark. It lets us use Python to access Spark functionality such as data manipulation, batch and real-time processing, and machine learning.
However, before we can ingest or process any data with PySpark, we must initialize a SparkSession. This recipe shows how to create a SparkSession using PySpark and explains why it is important.
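To preview where we are headed, the following is a minimal sketch of creating a SparkSession with the builder pattern; the application name (sparksession-recipe) and the local[*] master URL are illustrative choices, not requirements:

from pyspark.sql import SparkSession

# Build (or reuse) a SparkSession; getOrCreate() returns the existing
# session if one is already active in this process
spark = (
    SparkSession.builder
    .master("local[*]")                 # run Spark locally on all available cores
    .appName("sparksession-recipe")     # illustrative application name
    .getOrCreate()
)

print(spark.version)  # confirm the session is up and report its Spark version

Calling getOrCreate() rather than constructing a session directly is the idiomatic approach, since only one active SparkSession is expected per application.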
Getting ready
We first need to ensure we have the correct PySpark version. We installed PySpark in Chapter 1, but it is always good to confirm which version we are running. Run the following command:
$ pyspark --version
You should see the following output:
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version <your Spark version>
      /_/
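If the pyspark launcher is not on your PATH, you can also check the version from a Python interpreter using the standard pyspark.__version__ attribute:

import pyspark

# Prints the installed PySpark version string
print(pyspark.__version__)

Both checks should report the same version; if they differ, the pyspark launcher and your Python environment are pointing at different installations.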