GRUs are a close cousin of long short-term memory (LSTM) recurrent neural networks. Both LSTM and GRU networks have additional parameters that control when and how their internal memory is updated, and both can capture long- and short-term dependencies in sequences. The GRU, however, has fewer parameters than its LSTM cousin and, as a result, is faster to train. The GRU learns how to use its reset and update gates to protect its memory, which lets it make longer-term predictions. Let's look at a simple diagram of a GRU:
[Figure: a GRU cell]
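To make the gating mechanism concrete, here is a minimal sketch of a single GRU forward step written with NumPy. The weight names (W_z, U_z, b_z, and so on), shapes, and helper functions are illustrative assumptions, not code from the text; the equations follow the standard GRU formulation with a reset gate, an update gate, and a candidate hidden state.

```python
# A minimal, illustrative GRU step (not the author's implementation).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, params):
    """Compute the next hidden state h_t from input x_t and previous state h_prev.

    params holds the weight matrices and biases (names are assumptions):
      W_z, U_z, b_z -- update gate
      W_r, U_r, b_r -- reset gate
      W_h, U_h, b_h -- candidate hidden state
    """
    # Update gate: how much of the old memory to keep versus overwrite.
    z = sigmoid(params["W_z"] @ x_t + params["U_z"] @ h_prev + params["b_z"])
    # Reset gate: how much of the old memory to expose to the candidate state.
    r = sigmoid(params["W_r"] @ x_t + params["U_r"] @ h_prev + params["b_r"])
    # Candidate hidden state, built from the input and the reset-gated old state.
    h_tilde = np.tanh(params["W_h"] @ x_t + params["U_h"] @ (r * h_prev) + params["b_h"])
    # Blend the old state and the candidate; keeping (1 - z) of h_prev is what
    # protects long-term memory from being overwritten at every step.
    return (1.0 - z) * h_prev + z * h_tilde

# Example usage: one step with a 3-dimensional input and a 4-dimensional hidden state.
rng = np.random.default_rng(0)
x_dim, h_dim = 3, 4
params = {name: rng.standard_normal((h_dim, x_dim if name.startswith("W") else h_dim))
          for name in ["W_z", "U_z", "W_r", "U_r", "W_h", "U_h"]}
params.update({b: np.zeros(h_dim) for b in ["b_z", "b_r", "b_h"]})
h_next = gru_step(rng.standard_normal(x_dim), np.zeros(h_dim), params)
```

Because the GRU folds the LSTM's separate cell state and output gate into this single blended hidden state, it needs fewer weight matrices per step, which is where its training-speed advantage comes from.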