Summary
This chapter introduced pandas tools for identifying outliers in our data. We explored univariate, bivariate, and multivariate approaches to detecting observations that are far enough out of range, or otherwise unusual enough, to distort our analysis. These approaches included using the interquartile range to identify extreme values, examining a variable's relationship with a correlated variable, and using parametric and non-parametric multivariate techniques: linear regression and k-nearest neighbors (KNN), respectively. We also saw how visualizations can give us a better feel for how a variable is distributed and how it moves with a correlated variable. We will go into much greater detail on how to create and interpret visualizations in the next chapter.
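As a brief reminder of the interquartile range approach mentioned above, here is a minimal sketch. The function name `iqr_outliers`, the series name, the sample values, and the conventional 1.5 multiplier are assumptions chosen for illustration, not code from the chapter.

```python
import pandas as pd


def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return series[(series < lower) | (series > upper)]


# Hypothetical data: seven typical values and one extreme value
values = pd.Series([10, 12, 11, 13, 12, 14, 11, 95], name="hypothetical_measure")
outliers = iqr_outliers(values)
print(outliers)
```

On this sample, only the value 95 falls outside the fences. A larger `k` (3 is a common choice) would flag only the most extreme observations.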