Excellent at mitigating the vanishing gradients problem, the GRU is a good choice for modeling long-term dependencies such as grammar, punctuation, and word morphology:
def GRU_stacked_model():
    model = Sequential()
    # First GRU returns the full sequence (a 3D tensor) so the next GRU can consume it
    model.add(GRU(128, input_shape=(seq_len, len(characters)), return_sequences=True))
    # Second GRU collapses the sequence into a single 128-dimensional vector
    model.add(GRU(128))
    # Softmax over the character vocabulary to predict the next character
    model.add(Dense(len(characters), activation='softmax'))
    return model
Just like with the SimpleRNN, we define the dimensions of the input at the first layer. Because the first GRU is created with return_sequences=True, it outputs a 3D tensor (one hidden state per timestep) to the second GRU layer, which helps retain the more complex time-dependent representations present in our training data. We also stack the two GRU layers on top of each other to see what the increased representational power of our model produces.
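As a rough sketch of how this fits together (assuming seq_len, characters, and the one-hot-encoded arrays x and y from the earlier preprocessing steps; the optimizer and training settings here are illustrative, not the book's exact choices), the stacked model could be compiled and inspected like this:

model = GRU_stacked_model()
model.compile(loss='categorical_crossentropy', optimizer='adam')

# The summary shows the first GRU emitting (None, seq_len, 128) -- one
# 128-dimensional hidden state per timestep -- while the second GRU collapses
# the sequence to (None, 128) before the softmax over the characters.
model.summary()

# Hypothetical training call, assuming x has shape
# (num_sequences, seq_len, len(characters)) and y has shape
# (num_sequences, len(characters)).
model.fit(x, y, batch_size=128, epochs=20)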
Hopefully, this architecture results in realistic albeit novel sequences of text that even a...