Tuning LSTM hyperparameters and GRU
Nevertheless, I still believe it is possible to attain close to 100% accuracy by stacking more LSTM layers. The following are the hyperparameters that I would still try to tune to see how the accuracy changes:
// Hyperparameters for the LSTM training
val learningRate = 0.001f
val trainingIters = trainingDataCount * 1000 // Loop 1000 times on the dataset
val batchSize = 1500 // I would set it to 5000 and see the performance
val displayIter = 15000 // To show test set accuracy during training
val numLstmLayer = 3 // Try 5, 7, 9, and so on
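Rather than trying these values one at a time by hand, a simple grid search can evaluate every combination systematically. The following is a minimal sketch, assuming a hypothetical trainAndEvaluate helper that stands in for the actual LSTM training routine and returns the test-set accuracy for a given configuration:

// Hypothetical stand-in for the real training routine; replace its body
// with the actual LSTM training call for this project
def trainAndEvaluate(learningRate: Float, batchSize: Int, numLstmLayer: Int): Double = {
  // ... train the LSTM with these settings and return its test-set accuracy ...
  scala.util.Random.nextDouble() // placeholder result so the sketch runs
}

// Candidate values mirror the comments in the hyperparameter block above
val candidateBatchSizes = Seq(1500, 5000)
val candidateLstmLayers = Seq(3, 5, 7, 9)

// Train one model per combination and record its accuracy
val results = for {
  bs <- candidateBatchSizes
  nl <- candidateLstmLayers
} yield ((bs, nl), trainAndEvaluate(learningRate = 0.001f, batchSize = bs, numLstmLayer = nl))

// Pick the configuration with the highest test-set accuracy
val ((bestBatchSize, bestNumLayers), bestAccuracy) = results.maxBy(_._2)
println(s"Best config: batchSize=$bestBatchSize, numLstmLayer=$bestNumLayers, accuracy=$bestAccuracy")

Each configuration is trained independently, so a loop like this is also easy to parallelize across machines if training time becomes the bottleneck.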
There are many other variants of the LSTM cell. One particularly popular variant is the Gated Recurrent Unit (GRU) cell, which is a slightly more dramatic variation on the LSTM: it combines the forget and input gates into a single update gate, merges the cell state and hidden state, and makes some other changes. The resulting model is simpler than the standard LSTM model and has been growing increasingly popular. This cell was proposed by Kyunghyun Cho et al. in a 2014 paper that also introduced the encoder-decoder network.
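To make this merging concrete, here are the standard GRU update equations (the notation is mine, not taken from the Cho et al. paper): the update gate z_t plays the combined role of the LSTM's forget and input gates, and a single state vector h_t replaces the separate cell and hidden states.

\begin{aligned}
z_t &= \sigma(W_z x_t + U_z h_{t-1} + b_z) && \text{update gate}\\
r_t &= \sigma(W_r x_t + U_r h_{t-1} + b_r) && \text{reset gate}\\
\tilde{h}_t &= \tanh\bigl(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h\bigr) && \text{candidate state}\\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t && \text{new state}
\end{aligned}

Because there is no output gate and no separate cell state, a GRU layer has fewer parameters than an LSTM layer of the same width, which is one reason it is often faster to train.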