Handling missing values
Data is messy and often includes incomplete values for data points. Failing to fill in missing values or remove rows containing them can cause errors downstream when training machine learning models. Because of this, you will sometimes need to replace a missing value with an actual value, or remove the rows with missing values entirely.
Using a DataFrame, you can detect missing values in any column by displaying the column in your notebook and looking at the NullCount property in the result.
The following C# code cell displays the height_in_cm column:
dfPlayers["height_in_cm"]
This outputs the information about the column, including a NullCount value of 2, as shown in Figure 4.13:
Figure 4.13 – Two null values in the height_in_cm column
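If you would rather check every column at once than display them one by one, the same NullCount property can be read in code. The following is a minimal sketch and not the chapter's own code: dfSample and its two columns are hypothetical stand-ins for dfPlayers, built in memory so that the cell runs on its own. It assumes that the Microsoft.Data.Analysis NuGet package is available to the notebook.

```csharp
#r "nuget: Microsoft.Data.Analysis"
using System;
using Microsoft.Data.Analysis;

// Hypothetical sample data standing in for dfPlayers (an assumption for illustration).
// Two of the four heights are deliberately left null.
var names = new StringDataFrameColumn("name", new[] { "A", "B", "C", "D" });
var heights = new PrimitiveDataFrameColumn<float>(
    "height_in_cm", new float?[] { 180f, null, 175f, null });
var dfSample = new DataFrame(names, heights);

// Report how many null values each column contains
foreach (DataFrameColumn column in dfSample.Columns)
{
    Console.WriteLine($"{column.Name}: {column.NullCount} null value(s)");
}
```

Run against your own DataFrame, a loop like this gives a quick overview of which columns need attention before you decide whether to fill in or remove the incomplete rows.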
When examining this data, we know that there can’t really be any players who have a null height. These null values represent incomplete records...