Correcting outliers that cause seasonality swings
We'll be using a new dataset in this chapter to look at outliers: the average number of likes per day of posts on National Geographic's Instagram account, @NatGeo. This data was collected on November 21, 2019.
I've chosen this dataset because it exhibits several significant outliers, which are marked in the following plot:
Each dashed vertical line marks a moment where the time series deviated significantly. The second line from the left marks a radical trend change in the summer of 2015, but the other four lines mark outliers, with the last two sets of outliers spanning wide time ranges. We'll look specifically at the line in mid-2016, in August to be precise, which represents the most extreme outliers. The 2014 set of outliers can be safely ignored, as they do not affect the forecast...
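To make the idea of isolating an outlier window concrete, here is a minimal sketch that uses only pandas and NumPy. It is an illustration under stated assumptions, not necessarily the approach taken with the real data: the series is synthetic (it is not the @NatGeo data), the spike is injected artificially, and the `ds` and `y` column names are simply Prophet's usual input convention. The sketch blanks out the values in August 2016 while keeping the dates, which is one common way to stop a window of outliers from influencing a fitted model.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a likes-per-day series (NOT the real @NatGeo data).
rng = np.random.default_rng(0)
df = pd.DataFrame({"ds": pd.date_range("2016-06-01", "2016-10-31", freq="D")})
df["y"] = 100_000 + rng.normal(0, 5_000, len(df))

# Inject an artificial spike in August 2016 to play the role of the outliers.
in_august = (df["ds"] >= "2016-08-01") & (df["ds"] <= "2016-08-31")
df.loc[in_august, "y"] *= 3

# Mask the outlier window: keep the dates, but mark the values as missing.
df_masked = df.copy()
df_masked.loc[in_august, "y"] = np.nan

# Count how many daily values were masked.
print(df_masked["y"].isna().sum())
```

Keeping the rows and setting only `y` to missing preserves the date index, so the gap stays visible in a plot and the rest of the history is left untouched.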