Z-score scaling
Z-score scaling, also known as standardization, is applied when you want to transform your data to have a mean of 0 and a standard deviation of 1. Z-score scaling is widely used in statistical analysis and machine learning, especially when algorithms such as k-means clustering or Principal Component Analysis (PCA) are employed.
Here is the formula for z-score:
X _ scaled =(X − mean(X)) / std(X)
Let’s continue with the house pricing prediction use case to showcase the z-score scaling. The code can be found at https://github.com/PacktPublishing/Python-Data-Cleaning-and-Preparation-Best-Practices/blob/main/chapter09/zscaler.py:
- We first perform z-score scaling:
data_zscore = (data - data.mean()) / data.std()
- Then, we print the dataset statistics:
print("\nDataset Statistics after Z-score Scaling:") print(data_zscore.describe())
- Finally, we visualize the distributions:
data_zscore.hist(figsize=(12, 10), bins=20, color='green...