Neural networks
A relatively recent development in time series forecasting is the use of recurrent neural networks (RNNs). Their practical use on long sequences was made possible by the Long Short-Term Memory (LSTM) unit, introduced by Sepp Hochreiter and Jürgen Schmidhuber in 1997. Essentially, a recurrent network processes a sequence of data, such as speech or video, instead of a single data point, such as an image, and the LSTM unit lets it retain information across many steps of that sequence.
A standard RNN is called recurrent because it has loops built into it: the state produced at each step is fed back in at the next step, which gives the network memory, that is, access to previous information. A basic feedforward neural network has no such memory. It can be trained to recognize an image of a pedestrian on a street by learning what a pedestrian looks like from previous images, but it cannot be trained to predict that a pedestrian in a video will soon cross the street based on the pedestrian’s approach in earlier frames, because it has no knowledge of the sequence of images that leads to the pedestrian stepping out into the road. The recurrent loop supplies this context as a form of short-term memory, but in a simple RNN that memory degrades quickly.
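The loop described above can be sketched in a few lines of NumPy. This is a minimal illustration of a simple (Elman-style) recurrent layer, not a trained model: the function name rnn_forward, the weight names, the sizes, and the random weights are all assumptions chosen for the example.

```python
import numpy as np

def rnn_forward(inputs, W_xh, W_hh, b_h):
    """Run a simple recurrent layer over a sequence and return every hidden state."""
    hidden_size = W_hh.shape[0]
    h = np.zeros(hidden_size)  # the memory starts out empty
    states = []
    for x_t in inputs:  # one step per element of the sequence
        # The "loop": the new state depends on the current input
        # AND on the state left behind by the previous step.
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        states.append(h)
    return np.stack(states)

# Hypothetical sizes and random (untrained) weights, for illustration only.
rng = np.random.default_rng(0)
input_size, hidden_size, seq_len = 3, 4, 5
W_xh = rng.normal(scale=0.5, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.5, size=(hidden_size, hidden_size))
b_h = np.zeros(hidden_size)

sequence = rng.normal(size=(seq_len, input_size))
states = rnn_forward(sequence, W_xh, W_hh, b_h)
print(states.shape)  # one hidden state per time step
```

Because each state is computed from the previous one, changing the first element of the sequence changes every later state, which is the sense in which the network has access to previous information.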
Early RNNs had a memory problem: their memory did not reach back very far. Given “airplanes fly in the …,” a simple RNN may be able to guess that the next word is sky, but given “I went to France for vacation last summer. That’s why I spent my spring learning to speak …,” it is much harder for the RNN to guess that French comes next; it recognizes that the name of a language should follow but has forgotten that the passage began by mentioning France. An LSTM retains this context by giving the network’s short-term memory more longevity. For time series data, where patterns can recur over long time scales, LSTMs can perform very well.
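One way to see how an LSTM keeps context for longer is to look at a single step of the cell. The sketch below uses the commonly taught formulation with forget, input, and output gates, which is not necessarily the exact 1997 variant. The function names, the stacked weight layout, the sizes, and the hand-picked biases are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One step of an LSTM cell.

    W has shape (4 * hidden, input + hidden) and b has shape (4 * hidden,);
    the four blocks hold the forget, input, output, and candidate weights.
    """
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([x_t, h_prev]) + b
    f = sigmoid(z[0:hidden])               # forget gate: how much old memory to keep
    i = sigmoid(z[hidden:2 * hidden])      # input gate: how much new information to write
    o = sigmoid(z[2 * hidden:3 * hidden])  # output gate: how much memory to expose
    g = np.tanh(z[3 * hidden:4 * hidden])  # candidate values to write
    c = f * c_prev + i * g                 # cell state: the longer-lived memory
    h = o * np.tanh(c)                     # hidden state passed to the next step
    return h, c

# Hypothetical sizes and random (untrained) weights.
rng = np.random.default_rng(0)
input_size, hidden = 2, 3
W = rng.normal(scale=0.5, size=(4 * hidden, input_size + hidden))
b = np.zeros(4 * hidden)

h, c = np.zeros(hidden), np.zeros(hidden)
for x_t in rng.normal(size=(6, input_size)):
    h, c = lstm_step(x_t, h, c, W, b)

# Illustration: with the forget gate pushed toward 1 and the input gate
# toward 0 (here by hand-set biases), the cell state survives a step unchanged.
W_keep = np.zeros_like(W)
b_keep = np.concatenate([np.full(hidden, 50.0), np.full(hidden, -50.0), np.zeros(2 * hidden)])
c_start = np.array([0.3, -0.7, 0.1])
_, c_kept = lstm_step(np.ones(input_size), np.zeros(hidden), c_start, W_keep, b_keep)
```

In a trained network the gates are learned rather than set by hand, so the cell can learn when to hold on to something like "France" and when to overwrite it.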
Time series forecasting with LSTMs is still in its infancy compared to the other forecasting methods discussed here; however, it shows promise. One strong advantage over other forecasting techniques is the ability of neural networks to capture non-linear relationships, but as with any deep learning problem, LSTM forecasting requires large amounts of data, considerable computing power, and long training times.
Additionally, there are many decisions to be made regarding the architecture of the model and the hyperparameters to be used, which call for a very experienced forecaster. In most practical problems, where budget and deadlines must be considered, an ARIMA model is often the better choice.
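To make the hyperparameter decisions mentioned above a little more concrete: even before a network is defined, the forecaster typically has to choose how many past observations each training example contains and how far ahead to predict. The sketch below shows one common way to frame a series as training examples. The helper make_windows and its window and horizon parameters are hypothetical names for this illustration, and the toy series is made up.

```python
import numpy as np

def make_windows(series, window, horizon=1):
    """Turn a 1-D series into (past window, future value) training pairs.

    window  : number of past observations fed to the model per example
    horizon : how many steps ahead the target lies
    """
    series = np.asarray(series, dtype=float)
    n_samples = len(series) - window - horizon + 1
    if n_samples <= 0:
        raise ValueError("series is too short for this window and horizon")
    X = np.stack([series[i:i + window] for i in range(n_samples)])
    y = np.array([series[i + window + horizon - 1] for i in range(n_samples)])
    # Add a trailing feature axis: (samples, time steps, features).
    return X[..., np.newaxis], y

series = np.arange(10.0)  # toy data: 0, 1, ..., 9
X, y = make_windows(series, window=3, horizon=1)
print(X.shape, y.shape)
```

A longer window gives the model more context per example but leaves fewer examples to train on, which is one of the trade-offs that makes tuning these models time-consuming.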