Filling in missing data
One way to handle missing data is to fill it with substitute values, a process also known as imputation. If you’re building a machine learning model or conducting a statistical test, how you fill your missing data can affect the output. Knowing the various ways of filling in missing data lets you choose the best approach for your particular use case.
In this recipe, we’ll look at how to fill missing data with a constant value, a fill strategy, interpolation, and expressions.
Getting ready
We’ll be using the same temperature dataset we’ve used throughout this chapter. Run the following code to read the CSV file:
import polars as pl
df = pl.read_csv('../data/temperatures.csv')
Note
We will only cover how to fill null values and won’t cover how to fill NaN values, as the functionality of these methods and expressions is also available for NaN values. For instance, the .fill_null() expression...