Activity 8: Handling Outliers and Missing Data
In this activity, we will identify and remove outliers from the data in a CSV file. The goal is to clean the data using what we have learned so far and produce a well-formatted DataFrame. Identify the types of outliers present, assess their effect on the data, and clean the messy data.
The steps that will help you solve this activity are as follows:
Read the visit_data.csv file.
Check for duplicates.
Check if any essential column contains NaN.
Get rid of the outliers.
Report the size difference.
Create a box plot to check for outliers.
Get rid of any outliers that the box plot reveals.
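The steps above can be sketched in pandas as follows. This is only one possible approach, not the book's solution. The column names (`id`, `first_name`, `email`, `visit`), the inline sample data standing in for visit_data.csv, and the helper functions are all assumptions made for illustration. The 1.5 x IQR rule used here is one common way to flag outliers; the activity does not prescribe it.

```python
import io
import pandas as pd

# Hypothetical stand-in for visit_data.csv; the real file's columns may differ.
SAMPLE_CSV = """id,first_name,email,visit
1,Ann,ann@example.com,120
2,Bob,bob@example.com,95
3,Cy,cy@example.com,
4,Di,di@example.com,130
5,Ed,ed@example.com,2900
6,Flo,flo@example.com,110
7,Gus,ann@example.com,105
"""


def drop_missing(df, column="visit"):
    """Keep only the rows where the essential column is not NaN."""
    return df[df[column].notnull()]


def iqr_filter(df, column="visit", k=1.5):
    """Keep rows whose value lies within k * IQR of the quartiles."""
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    return df[df[column].between(q1 - k * iqr, q3 + k * iqr)]


def show_box_plot(df, column="visit"):
    """Draw a box plot of the column (requires matplotlib; not called here)."""
    import matplotlib.pyplot as plt
    df.boxplot(column=column)
    plt.show()


# Read the data (replace the StringIO object with "visit_data.csv" for the real file).
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Check for duplicates in a column that should be unique.
dup_emails = df["email"].duplicated().sum()

# Check whether the essential column contains NaN.
missing = df["visit"].isnull().sum()

# Drop the rows with missing values and report the size difference.
df_valid = drop_missing(df)
size_diff = len(df) - len(df_valid)
print(f"Duplicate emails: {dup_emails}, missing visits: {missing}")
print(f"Rows removed: {size_diff}")

# Remove outliers; call show_box_plot(df_valid) first to inspect them visually.
df_clean = iqr_filter(df_valid)
print(f"Rows after outlier removal: {len(df_clean)}")
```

Whether a flagged value is a true outlier or a valid extreme depends on the data, so it is worth inspecting the box plot before choosing a cutoff.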
Note
The solution for this activity can be found on page 312.