In the previous section, Exploratory data analysis, we discovered that certain columns contain 0 values, which indicate missing values. We also saw that the variables are on different scales, which can negatively impact model performance. In this section, we will preprocess the data to handle these issues.
Data preprocessing
Handling missing values
First, let's call the isnull() method on the DataFrame to check whether the dataset contains any missing values:
print(df.isnull().any())
We'll see the following output:
It seems like there are no missing values in the dataset, but are we sure? isnull() only flags NaN and None entries, so missing values recorded as 0 would slip through. Let's get a statistical summary of the dataset to investigate further:
print(df.describe())
The output is as...
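To make the gap between the two checks concrete, here is a minimal sketch on a small made-up DataFrame. The name toy and its column names are hypothetical and are not taken from the dataset used in this chapter. It shows that isnull() can report nothing while a per-column count of zeros reveals the placeholder values:

```python
import pandas as pd

# Hypothetical toy data: the column names are illustrative only and do not
# come from the dataset used in this chapter.
toy = pd.DataFrame({
    "measurement_a": [120, 0, 95, 0],
    "measurement_b": [30.5, 28.1, 0.0, 33.2],
    "label": [1, 0, 0, 1],
})

# isnull() only flags NaN/None entries, so zero placeholders go unnoticed.
has_nan = toy.isnull().any()

# Comparing against 0 and summing the booleans counts zeros per column.
zero_counts = (toy == 0).sum()

print(has_nan)
print(zero_counts)
```

Note that a zero count alone does not prove a value is missing. In this toy example, the zeros in label are legitimate class values, so deciding which columns treat 0 as a placeholder still requires knowledge of what each variable measures.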