Handling missing data with interpolation
Another commonly used technique for imputing missing values is interpolation. The pandas library provides the DataFrame.interpolate() method for more complex univariate imputation strategies.
For example, one of the available methods is linear interpolation, which imputes a missing value by drawing a straight line between the two points surrounding it. In a time series, this means using the last observation before the gap and the first observation after it. Polynomial interpolation, on the other hand, fits a curve through the surrounding points. Each method therefore uses a different mathematical operation to determine how to fill in the missing data.
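As a minimal sketch of the linear case, the snippet below builds a small, made-up daily series (the dates and values are assumptions for illustration only) and fills its gaps with method="linear". Note that this method treats the observations as equally spaced and ignores the index values.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series with two missing observations (illustrative data)
idx = pd.date_range("2023-01-01", periods=6, freq="D")
s = pd.Series([10.0, np.nan, 14.0, 20.0, np.nan, 30.0], index=idx)

# Linear interpolation: each gap is filled along the straight line
# joining the nearest observed values on either side of it
linear = s.interpolate(method="linear")

print(linear)
```

In this toy series, each gap sits between exactly two known neighbors, so each filled value is simply their midpoint.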
The interpolation capabilities in pandas can be extended further through the SciPy library, which offers additional univariate and multivariate interpolations.
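One way to see this in practice, assuming SciPy is installed, is to request a SciPy-backed method such as method="polynomial" through the same interpolate() call. The series below is hypothetical: its known values lie on the curve y = x squared, so the two methods can be compared against a known shape.

```python
import numpy as np
import pandas as pd

# Hypothetical series whose known values follow y = x**2 (illustrative data);
# the default integer index 0..5 serves as the x-axis
s = pd.Series([0.0, 1.0, np.nan, 9.0, 16.0, 25.0])

# Straight-line fill between the two neighbors of the gap
linear = s.interpolate(method="linear")

# Curved fill; pandas hands this method to SciPy and requires an order
poly = s.interpolate(method="polynomial", order=2)

# Show both results side by side for comparison
print(pd.DataFrame({"original": s, "linear": linear, "polynomial": poly}))
```

Because the underlying shape is curved, the polynomial fill should track it more closely than the straight-line fill, which is the kind of difference to look for when choosing a method.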
In this recipe...