Correcting outliers that cause wide uncertainty intervals
In the first type of outlier we looked at, the problem was that the outliers distorted the seasonality, which permanently changed yhat in the forecast (if you remember from Chapter 2, Getting Started with Facebook Prophet, yhat is the predicted value for future dates contained in Prophet's forecast DataFrame). In this second type of problem, yhat is minimally affected, but the uncertainty intervals widen dramatically.
To simulate this issue, we need to modify our NatGeo data a bit. Let's say that Instagram introduced a bug in its code that capped likes at 100,000 per post. It somehow went unnoticed for a year before being fixed, but unfortunately, all likes above 100,000 were lost. Such an error would look like this:
You can simulate this new dataset yourself with the following code:
df3 = df.copy()
df3.loc[df3['ds'].dt...
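If you don't have the NatGeo data to hand, the capping idea can be sketched on a toy dataset. The following is a minimal illustration, not the code used for the book's figures. It assumes Prophet's usual ds and y column names, and the random like counts, the date range, and the choice of 2016 as the bug year are all made up for the example:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the NatGeo data (assumption): one row per day,
# in Prophet's ds/y layout, with random like counts.
rng = np.random.default_rng(0)
dates = pd.date_range("2014-01-01", "2018-12-31", freq="D")
toy = pd.DataFrame({
    "ds": dates,
    "y": rng.integers(50_000, 200_000, size=len(dates)),
})

# Hypothetical bug window: one calendar year, chosen only for illustration.
capped = toy.copy()
in_bug_year = capped["ds"].dt.year == 2016

# Inside the window, any value above 100,000 is cut down to 100,000;
# rows outside the window are left untouched.
capped.loc[in_bug_year, "y"] = capped.loc[in_bug_year, "y"].clip(upper=100_000)
```

Working on a copy keeps the original data intact, so a model fitted on the capped data can later be compared with one fitted on the uncapped data.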