You're reading from Python Data Cleaning and Preparation Best Practices A practical guide to organizing and handling data from various sources and formats using Python

Product type: Paperback
Published in: Sep 2024
Publisher: Packt
ISBN-13: 9781837634743
Length: 456 pages
Edition: 1st Edition
Languages: Python
Concepts: Data Analysis
Author (1): Maria Zervou


Table of Contents (19) Chapters

Preface

1. Part 1: Upstream Data Ingestion and Cleaning

2. Chapter 1: Data Ingestion Techniques

3. Chapter 2: Importance of Data Quality

4. Chapter 3: Data Profiling – Understanding Data Structure, Quality, and Distribution

5. Chapter 4: Cleaning Messy Data and Data Manipulation

6. Chapter 5: Data Transformation – Merging and Concatenating

7. Chapter 6: Data Grouping, Aggregation, Filtering, and Applying Functions

8. Chapter 7: Data Sinks

9. Part 2: Downstream Data Cleaning – Consuming Structured Data

10. Chapter 8: Detecting and Handling Missing Values and Outliers

11. Chapter 9: Normalization and Standardization

12. Chapter 10: Handling Categorical Features

13. Chapter 11: Consuming Time Series Data

14. Part 3: Downstream Data Cleaning – Consuming Unstructured Data

15. Chapter 12: Text Preprocessing in the Era of LLMs

16. Chapter 13: Image and Audio Preprocessing with LLMs

17. Index

Why subscribe?

18. Other Books You May Enjoy

Z-score scaling

Z-score scaling, also known as standardization, transforms each feature so that it has a mean of 0 and a standard deviation of 1. It is widely used in statistical analysis and machine learning, especially with algorithms that are sensitive to the scale of their input features, such as k-means clustering (which compares points by Euclidean distance) and Principal Component Analysis (PCA) (which looks for the directions of maximum variance).
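To see why scale matters for a distance-based algorithm such as k-means, here is a minimal, standard-library-only sketch using four hypothetical houses described by price and number of bedrooms (the values are invented for illustration). It checks which house is nearest to house 0 before and after standardizing each column. For simplicity it uses the population standard deviation; `euclidean` and `zscore_columns` are illustrative helpers, not functions from the book.

```python
import math
import statistics

# Hypothetical houses (invented values): (price in dollars, bedrooms)
houses = [(250_000, 2), (252_000, 5), (400_000, 3), (310_000, 4)]


def euclidean(a, b):
    """Euclidean distance, the metric k-means uses to compare points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def zscore_columns(rows):
    """Scale each column to mean 0 and population standard deviation 1."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    stds = [statistics.pstdev(c) for c in cols]
    return [tuple((v - m) / s for v, m, s in zip(row, means, stds)) for row in rows]


scaled = zscore_columns(houses)

for label, rows in (("raw", houses), ("z-scored", scaled)):
    d01 = euclidean(rows[0], rows[1])  # similar price, 3 more bedrooms
    d02 = euclidean(rows[0], rows[2])  # much higher price, 1 more bedroom
    nearest = 1 if d01 < d02 else 2
    print(f"{label}: d(0,1)={d01:.3f}, d(0,2)={d02:.3f}, nearest to house 0: {nearest}")
```

On the raw values the price gap (in dollars) swamps the bedroom gap, so house 1 looks nearest to house 0; after z-score scaling both features are measured in standard deviations, the three-bedroom difference now counts, and house 2 becomes the nearest neighbour.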

Here is the formula for z-score:

$$X_{\text{scaled}} = \frac{X - \operatorname{mean}(X)}{\operatorname{std}(X)}$$
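As a worked instance of the formula, the sketch below standardizes five hypothetical house sizes with Python's standard `statistics` module. It uses the sample standard deviation (`statistics.stdev`, denominator n - 1), which matches the default of the pandas `.std()` call used in the book's code; the data values are invented for illustration.

```python
import statistics

# Hypothetical sample of house sizes in square metres
x = [50.0, 70.0, 90.0, 110.0, 130.0]

mean_x = statistics.mean(x)   # 90.0
std_x = statistics.stdev(x)   # sample standard deviation (n - 1 denominator)

# Apply (X - mean(X)) / std(X) element by element
x_scaled = [(v - mean_x) / std_x for v in x]

print([round(z, 3) for z in x_scaled])
print(round(statistics.mean(x_scaled), 10), round(statistics.stdev(x_scaled), 10))
```

Each value is now expressed as a number of standard deviations from the mean: the middle value (90) maps to 0, and the scaled list has mean 0 and standard deviation 1.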

Let’s continue with the house price prediction use case to showcase z-score scaling. The code can be found at https://github.com/PacktPublishing/Python-Data-Cleaning-and-Preparation-Best-Practices/blob/main/chapter09/zscaler.py.

We first perform z-score scaling:

# pandas' .std() uses the sample standard deviation (ddof=1) by default
data_zscore = (data - data.mean()) / data.std()

Then, we print the dataset statistics:

print("\nDataset Statistics after Z-score Scaling:")
print(data_zscore.describe())
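The one-liner above relies on pandas alignment: `data.mean()` and `data.std()` each return a Series with one value per column, and pandas matches that Series to the DataFrame's columns when subtracting and dividing. The minimal sketch below applies the same expression to a small hypothetical DataFrame (column names and values are invented, not the book's dataset) and contrasts the pandas default `ddof=1` with `ddof=0`, the population convention that scikit-learn's `StandardScaler` follows.

```python
import pandas as pd

# Hypothetical stand-in for the house-pricing DataFrame (not the book's dataset)
data = pd.DataFrame({
    "size_sqm": [50.0, 70.0, 90.0, 110.0, 130.0],
    "bedrooms": [1, 2, 2, 3, 4],
    "price": [150_000, 210_000, 260_000, 330_000, 400_000],
})

# Column-wise z-score: the per-column mean/std Series align on column labels
data_zscore = (data - data.mean()) / data.std()   # sample std, ddof=1

# Same transformation with the population standard deviation (ddof=0)
data_zscore_pop = (data - data.mean()) / data.std(ddof=0)

# describe() also reports the sample std, so this row reads 1 for every column
print(data_zscore.describe().loc[["mean", "std"]].round(6))
print(data_zscore_pop.std(ddof=0).round(6))
```

With `ddof=1`, the `std` row of `describe()` is 1 for every column and the `mean` row is 0 up to floating-point noise. The `ddof=0` version spreads the z-scores slightly wider, by a factor of sqrt(n / (n - 1)), a difference that becomes negligible on large datasets.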

Finally, we visualize the distributions:

data_zscore.hist(figsize=(12, 10), bins=20, color='green...
