Deleting rows and columns containing missing data
One way to manage missing data is simply to drop the records that contain it. You could also drop a column if so many of its values are missing that it is no longer useful.
In this recipe, we’ll cover how to delete rows and columns that contain missing data.
Getting ready
Read the same dataset that we used in the previous recipe:
df = pl.read_csv('temperatures.csv')
How to do it...
Here are the ways to delete rows and columns that contain missing data:
- Delete rows that contain null values in a whole DataFrame using .drop_nulls(). Apply the .null_count() method to check that it worked:
df.drop_nulls().null_count()
Here’s the output of the preceding code.
Figure 5.8 – DataFrame after dropping nulls
- Delete rows with null values for selected columns:
df.select( pl.col('avg_temp_celsius') ...