Handling Missing Data
In data analysis, data science, and data engineering, much of the work goes into manipulating and cleaning data. Real-world data is often messy and can contain many missing values that need to be treated with care. Before you can compute anything reliable, you need to identify the missing data and decide what to do with it.
There are two common approaches to handling missing data. One is to substitute missing data with alternate values. The other is to simply drop the records that contain missing data. Whichever you choose should align with your end goal, because the goal determines both the appropriate approach and the values you might use as replacements.
We’ll cover null and Not a Number (NaN) values in this chapter. Polars treats them differently: NaN values are technically a type of floating-point data rather than missing data. That also means that there are different methods and expressions for null and NaN.
You...