Handling missing data
While dealing with large datasets in the wild, you are bound to encounter missing data. If it is not part of the time series, it may be part of the additional information you collect and map. Before we jump the gun and fill it with a mean value or drop those rows, let's think about a few aspects:
- The first consideration should be whether the data we are worried about is actually missing. For that, we need to think about the Data Generating Process (DGP), the process that generates the time series. As an example, let's look at sales at a local supermarket. You have been given the point-of-sale (POS) transactions for the last 2 years and you are processing the data into a time series. While analyzing the data, you find a few products with no transactions for a few days. Now, what you need to think about is whether that data is really missing or whether there is some information that this missingness is giving...
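To make the supermarket example concrete, here is a minimal sketch of how days without transactions can be surfaced when POS records are turned into a daily series. It assumes pandas; the `transactions` table, its column names, and its values are made up for illustration. The sketch only exposes the gaps. Whether a gap means zero sales or unknown sales depends on the DGP.

```python
import pandas as pd

# Hypothetical POS transactions: one row per sale (made-up data).
transactions = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2023-01-01", "2023-01-01", "2023-01-02", "2023-01-05"]
        ),
        "product": ["milk", "milk", "milk", "milk"],
        "quantity": [2, 1, 3, 1],
    }
)

# Aggregating only produces rows for days that had at least one sale.
daily = transactions.groupby("date")["quantity"].sum()

# Reindex onto the full calendar so the silent days show up as NaN.
full_range = pd.date_range(daily.index.min(), daily.index.max(), freq="D")
daily_full = daily.reindex(full_range)

# Days with no recorded transactions. Treating them as zero sales or as
# unknown values is a judgment about the DGP and is not decided here.
gap_days = daily_full[daily_full.isna()].index
print(gap_days.strftime("%Y-%m-%d").tolist())
# → ['2023-01-03', '2023-01-04']
```

Note that the gaps are invisible in `daily` and appear only after reindexing onto the full date range, which is why it helps to build the complete calendar before deciding how to treat them.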