Distributed deep RNNs
Now that you have an understanding of RNNs, their applications, features, and architecture, we can move on to discuss how to use this network in a distributed architecture. Distributing an RNN is not an easy task, and hence only a few researchers have worked on it in the past. Although the primary concept of data parallelism is similar across all networks, distributing RNNs among multiple servers requires careful thought and a fair amount of tedious work.
Recently, a work from Google [119] distributed recurrent networks across many servers for a speech recognition task. In this section, we will discuss this work on distributed RNNs with the help of Hadoop.
Asynchronous stochastic gradient descent (ASGD) can be used for large-scale training of an RNN. ASGD has shown particular success in sequence discriminative training of deep neural networks.
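To make the idea concrete, the following is a minimal sketch of ASGD in Python. It is not the implementation from the cited work: a simple linear model stands in for the RNN, and Python threads stand in for distributed workers, so the names (`worker`, `compute_gradient`) and the toy dataset are purely illustrative. The defining property of ASGD is visible, though: each worker computes a gradient against a possibly stale snapshot of the shared parameters and applies its update without waiting for the other workers.

```python
import threading
import numpy as np

# Shared parameters (a linear model standing in for the RNN weights).
# This array plays the role of the central parameter-server copy.
params = np.zeros(3)
lock = threading.Lock()  # guards only the apply step, not the gradient computation

def compute_gradient(theta, x, y):
    # Gradient of the squared error (theta.x - y)^2 with respect to theta.
    return 2.0 * (theta @ x - y) * x

def worker(data, lr=0.01, steps=200):
    rng = np.random.default_rng()
    for _ in range(steps):
        x, y = data[rng.integers(len(data))]
        # Read a (possibly stale) snapshot of the shared parameters.
        snapshot = params.copy()
        grad = compute_gradient(snapshot, x, y)
        # Apply the update asynchronously: other workers may have moved
        # the parameters since the snapshot was taken.
        with lock:
            params -= lr * grad

# Toy dataset: y = w.x for a fixed true w.
true_w = np.array([1.0, -2.0, 0.5])
data = [(x, true_w @ x) for x in np.random.default_rng(0).normal(size=(500, 3))]

threads = [threading.Thread(target=worker, args=(data,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("learned parameters:", params)
```

In a real distributed setup, the snapshot read and the update write would be remote calls to a parameter server rather than operations on a shared in-process array, but the staleness-tolerant update pattern is the same.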
A two-layer deep long short-term memory (LSTM) RNN is used to build the network. Each LSTM...