Identify and Clean Outliers
When working with real-world data, we often find that a few records do not fit with the rest: some of their values are too big, too small, or missing altogether. Such records are called outliers.
Statistics offers formal definitions of what an outlier is, and you often need deep domain expertise to decide whether a particular record should be called one. In this exercise, however, we will look at some basic, commonly used techniques to flag and filter outliers in real-world data for day-to-day work.
Exercise 79: Outliers in Numerical Data
In this exercise, we will first build a notion of an outlier using numerical data. Imagine a cosine curve. If you remember the math from high school, a cosine curve is a very smooth curve whose values stay within the range [-1, 1].
To construct a cosine curve, execute the following command:
from math import...
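As a minimal sketch of the idea behind this exercise (an illustration under stated assumptions, not necessarily the exact listing used here), the following builds one period of a cosine curve with the standard library, overwrites a few points with values a cosine can never take, and then flags and filters them. The names injected, noisy, outlier_idx, and cleaned, as well as the injected values themselves, are hypothetical and chosen only for this example.

```python
from math import cos, pi

# Sample one full period of the cosine curve, one point per degree.
ys = [cos(i * pi / 180) for i in range(360)]

# Hypothetical outliers: position -> value outside the valid range [-1, 1].
injected = {45: 3.5, 180: -4.0, 300: 2.2}

# Copy the clean curve and overwrite the chosen positions.
noisy = list(ys)
for idx, val in injected.items():
    noisy[idx] = val

# Flag every point whose magnitude exceeds 1, since a cosine cannot produce it.
outlier_idx = [i for i, y in enumerate(noisy) if abs(y) > 1]

# Filter the flagged points out to recover a clean series.
cleaned = [y for y in noisy if abs(y) <= 1]

print(outlier_idx)
print(len(noisy), len(cleaned))
```

The range check works here only because we know in advance that a cosine stays within [-1, 1]. For data without such a known bound, a statistical rule would be needed instead.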