Median forecasting
Medians are a good sanity check and an often underrated forecasting tool. A median is the value separating the higher half of a distribution from the lower half; it sits exactly in the middle of the distribution. Medians filter out noise and are less susceptible to outliers than means, and they are also easy to compute.
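To make the point about outliers concrete, here is a minimal sketch that compares the mean and the median of a small sample. The numbers are made up for illustration and are not taken from the chapter's dataset.

```python
import numpy as np

# Hypothetical values: five typical observations and one extreme outlier.
values = np.array([10.0, 11.0, 9.0, 10.0, 12.0, 500.0])

mean_value = np.mean(values)      # pulled far upward by the single outlier
median_value = np.median(values)  # stays close to the typical level

print(f"mean:   {mean_value:.1f}")
print(f"median: {median_value:.1f}")
```

A single spike moves the mean far away from the typical level, while the median barely reacts. That robustness is what makes the median a useful baseline forecast.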
To make a forecast, we compute the median over a look-back window in our training data. In this case, we use a window size of 50, but you could experiment with other values. The next step is to select the last 50 values from our X values and compute the median.
Take a minute to note that in the NumPy median function, we have to set keepdims=True. This ensures that we keep a two-dimensional matrix rather than a flat array, which is important when computing the error. So, to make a forecast, we need to run the following code:
lookback = 50
lb_data = X_train...
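Since the listing above is cut off, the following is a self-contained sketch of the same idea rather than the original code. It assumes that X_train has one row per series and one column per time step, and that y_train holds the values to be predicted. Both arrays are filled with synthetic data here, and mean absolute error stands in for whichever error metric the chapter actually uses.

```python
import numpy as np

# Assumption: rows are individual series, columns are time steps.
# Synthetic stand-ins for the chapter's X_train and y_train.
rng = np.random.default_rng(0)
X_train = rng.poisson(lam=20, size=(5, 200)).astype(float)  # 5 series, 200 past steps
y_train = rng.poisson(lam=20, size=(5, 10)).astype(float)   # 5 series, 10 future steps

lookback = 50

# Select the last `lookback` values of every series.
lb_data = X_train[:, -lookback:]  # shape (5, 50)

# One median per series. keepdims=True keeps the result two-dimensional,
# shape (5, 1), so it broadcasts across the forecast horizon.
median_forecast = np.median(lb_data, axis=1, keepdims=True)

# Illustrative error metric (an assumption, not the chapter's):
# mean absolute error between the targets and the constant forecast.
abs_error = np.abs(y_train - median_forecast)  # shape (5, 10)
mae = abs_error.mean()

print("forecast shape:", median_forecast.shape)
print(f"MAE: {mae:.3f}")
```

Without keepdims=True the medians would have shape (5,), and subtracting them from a (5, 10) target array would raise a broadcasting error. If the number of series happened to equal the horizon, it would instead silently pair each median with the wrong axis.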