Statistical methods
Statistical methods provide practical tools for identifying outliers and anomalies in our data, which supports both data preprocessing and decision-making. In this section, we’ll look at how to use Z-scores, the interquartile range (IQR), box plots, and scatter plots to uncover anomalies in our data.
Z-scores
A Z-score, also known as a standard score, is a statistical measure that indicates how many standard deviations a data point lies from the mean of the data. Z-scores standardize data and allow comparisons between different datasets, even if they have different units or scales. They are particularly useful for detecting outliers and identifying extreme values in a dataset. The formula for the Z-score of a data point x in a dataset with mean μ and standard deviation σ is as follows:
Z = (x − μ) / σ
Here, the following applies:
- Z is the Z-score of the data point x ...
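As a minimal sketch of the formula, the following snippet computes Z-scores using only Python's standard library and flags values whose absolute Z-score exceeds a cutoff. The function names, the sample readings, and the cutoff of 2.0 are illustrative assumptions rather than anything prescribed by the text. The sketch also assumes σ is the population standard deviation (`pstdev`). Using the sample standard deviation (`stdev`) would give slightly smaller Z-scores.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Return the Z-score of each value: (x - mean) / standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        raise ValueError("Z-scores are undefined when all values are identical")
    return [(x - mu) / sigma for x in values]


def flag_outliers(values, threshold=2.0):
    """Return the values whose absolute Z-score exceeds the threshold.

    The default threshold of 2.0 is an illustrative choice, not a fixed rule.
    """
    return [x for x, z in zip(values, z_scores(values)) if abs(z) > threshold]


# Hypothetical sample: six similar readings and one extreme value.
readings = [10, 12, 11, 13, 12, 11, 50]
scores = z_scores(readings)
outliers = flag_outliers(readings)

print([round(z, 2) for z in scores])
print(outliers)
```

Note that in a small sample a single extreme value inflates both the mean and the standard deviation, which limits how large its own Z-score can get. With seven points, for example, no absolute Z-score can exceed about 2.45 under the population standard deviation, so a commonly quoted cutoff such as 3 would flag nothing here.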