Dealing with Outliers and Missing Values
Outliers and missing values are common issues we will encounter when analyzing almost any kind of data. Left unhandled, they can lead to inaccurate or biased conclusions, so it is important to address them appropriately before analyzing our data.
Outliers are unusually high or low values that deviate significantly from the rest of the data points in a dataset. They occur for a wide variety of reasons; the common ones are covered in this chapter. Missing values, on the other hand, are data points that are absent for a specific variable or observation in our dataset. They also occur for several reasons, and the common ones are covered in this chapter as well.
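To make the two definitions concrete, here is a minimal sketch using only the Python standard library. The sample values are hypothetical, and the 1.5 × IQR rule used to flag outliers is an assumption chosen for illustration; it is one common convention, not necessarily the technique this chapter relies on.

```python
import statistics

# Hypothetical sample: None marks a missing value; 250 is an unusually high reading.
values = [12, 15, None, 14, 13, 250, 16, None, 15, 14]

# Missing values: positions where no data point was recorded.
missing_positions = [i for i, v in enumerate(values) if v is None]

# Outliers (assumed rule): observed values lying more than 1.5 * IQR
# below the first quartile or above the third quartile.
observed = [v for v in values if v is not None]
q1, _, q3 = statistics.quantiles(observed, n=4)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in observed if v < lower or v > upper]

print("missing at positions:", missing_positions)
print("outliers:", outliers)
```

With this sample, positions 2 and 7 are reported as missing and 250 is the only value flagged as an outlier. Note that the missing entries are set aside before the quartiles are computed, since they carry no value to compare.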
When handling outliers and missing values, care is needed because using the wrong technique can itself lead to inaccurate or biased conclusions. An important step when handling missing values and outliers is...