Removing outliers
A simple approach to handling outliers is to remove them completely before analyzing our dataset; this is also known as trimming. A major drawback of this approach is that we may lose useful insights, especially if the outliers are legitimate values. Therefore, it is very important to understand the context of the dataset before removing outliers. In certain scenarios, edge cases exist, and these can easily be tagged as outliers when the context isn’t properly understood. Edge cases are scenarios that rarely occur, yet they can reveal important insights that will be lost if they are removed.
Trimming can be useful when the distribution of the data is important and needs to be retained. It is also useful when there are only a few outliers.
We will explore how to remove outliers from our dataset using the drop method in pandas.
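As a rough preview of the idea, the following is a minimal sketch of trimming with the drop method. The toy data, the price column name, and the use of the interquartile range (IQR) rule with a 1.5 multiplier to flag outliers are all assumptions made for illustration; they are not taken from the recipe's dataset.

```python
import pandas as pd

# Hypothetical toy data; the "price" column name is an assumption.
df = pd.DataFrame({"price": [280, 300, 320, 310, 290, 305, 5000]})

# Assumed detection rule: flag values outside 1.5 * IQR of the quartiles.
q1 = df["price"].quantile(0.25)
q3 = df["price"].quantile(0.75)
iqr = q3 - q1
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Collect the index labels of the rows flagged as outliers.
outlier_index = df[(df["price"] < lower) | (df["price"] > upper)].index

# Trim: drop the flagged rows and keep the result in a new dataframe.
trimmed = df.drop(index=outlier_index)
print(trimmed)
```

Keeping the trimmed result in a separate dataframe leaves the original data available, which helps if the removed rows later turn out to be legitimate edge cases.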
Getting ready
We will work with the Amsterdam House Prices...