You're reading from Essential PySpark for Scalable Data Analytics: A beginner's guide to harnessing the power and ease of PySpark 3

Product type: Paperback

Published in: Oct 2021

Publisher: Packt

ISBN-13: 9781800568877

Length: 322 pages

Edition: 1st Edition

Languages: Python

Tools: PySpark

Concepts: Big Data

Author (1): Sreeram Nudurupati

Table of Contents (19 chapters)

Preface

1. Section 1: Data Engineering

2. Chapter 1: Distributed Computing Primer

3. Chapter 2: Data Ingestion

4. Chapter 3: Data Cleansing and Integration

5. Chapter 4: Real-Time Data Analytics

6. Section 2: Data Science

7. Chapter 5: Scalable Machine Learning with PySpark

8. Chapter 6: Feature Engineering – Extraction, Transformation, and Selection

9. Chapter 7: Supervised Machine Learning

10. Chapter 8: Unsupervised Machine Learning

11. Chapter 9: Machine Learning Life Cycle Management

12. Chapter 10: Scaling Out Single-Node Machine Learning Using PySpark

13. Section 3: Data Analysis

14. Chapter 11: Data Visualization with PySpark

15. Chapter 12: Spark SQL Primer

16. Chapter 13: Integrating External Tools with Spark SQL

17. Chapter 14: The Data Lakehouse

18. Other Books You May Enjoy

Conventions used

There are a number of text conventions used throughout this book.

Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "The readStream() method of the DataStreamReader object is used to create the streaming DataFrame."

A block of code is set as follows:

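# Assumes `sc` is an existing SparkContext (predefined in Databricks notebooks)
lines = sc.textFile("/databricks-datasets/README.md")
# Split each line into words
words = lines.flatMap(lambda s: s.split(" "))
# Pair each word with a count of 1, then sum the counts per word
word_tuples = words.map(lambda s: (s, 1))
word_count = word_tuples.reduceByKey(lambda x, y: x + y)
# Preview ten (word, count) pairs
word_count.take(10)
# Writes a directory of part files at this path, not a single text file
word_count.saveAsTextFile("/tmp/wordcount.txt")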

Any command-line input or output is written as follows:

%fs ls /FileStore/shared_uploads/delta/online_retail

Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words in menus or dialog boxes appear in bold. Here is an example: "There can be multiple Map stages followed by multiple Reduce stages."

Tips or important notes

Tips and important notes appear like this.