Identifying missing data
The first step in handling missing data is to find out whether your data has any missing values and, if so, how many. Polars provides several ways to do this.
Getting ready
We’ll be using the NumPy library to generate NaN values. Note that you can also generate NaN values in native Python with code such as float('nan').
Install NumPy with the following command if you haven’t already. It is an optional dependency of Polars, so installing Polars doesn’t install it for you:
pip install numpy
We’ll be using a small dataset that we create manually. Make sure to run the following code before proceeding to the next steps:
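import polars as pl
from datetime import date
import numpy as np

# One row per day from Jan 1 to Jan 15, 2023
date_col = pl.date_range(date(2023, 1, 1), date(2023, 1, 15), '1d', eager=True)
# None values become nulls; np.nan values stay as NaN floats
avg_temp_c_list = [-3, None, 6, -1, np.nan, 6, 4, None, 1, 2, np.nan, 7, 9, -2, None]
df = pl.DataFrame({
    'date': date_col,
    'avg_temp_celsius...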