Dropout
If the problem lies in the network's layers being densely connected, then simply force the network to be sparse while training: the units can no longer co-adapt so heavily, and learning can proceed properly without over-fitting. The algorithm based on this idea is the dropout algorithm. Dropout for deep neural networks was introduced in Improving neural networks by preventing co-adaptation of feature detectors (Hinton et al., 2012, http://arxiv.org/pdf/1207.0580.pdf) and refined in Dropout: A Simple Way to Prevent Neural Networks from Overfitting (Srivastava et al., 2014, https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf). In dropout, some of the units are, quite literally, forcibly dropped while training. What does this mean? Let's look at the following figures. Firstly, the neural network itself:
There is nothing special about this figure: it is a standard neural network with one input layer, two hidden layers, and one output layer. Secondly, the graphical model obtained by applying dropout to this network can be represented as follows:
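To make the idea of dropping units concrete, here is a minimal sketch, assuming NumPy, of how a dropout mask might be applied to one layer's activations during training. The function name dropout_forward, the rate p_drop, and the inverted-dropout scaling (dividing the surviving activations by 1 - p_drop so that no rescaling is needed at test time) are illustrative choices for this sketch, not taken from the papers cited above.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(activations, p_drop=0.5, training=True):
    """Apply an (inverted) dropout mask to one layer's activations.

    During training, each unit is kept with probability 1 - p_drop;
    the surviving activations are scaled by 1 / (1 - p_drop) so the
    expected output matches the no-dropout network at test time.
    """
    if not training or p_drop == 0.0:
        return activations
    # Boolean mask: True means the unit is kept, False means it is dropped.
    mask = rng.random(activations.shape) >= p_drop
    return activations * mask / (1.0 - p_drop)

# Example: randomly dropping units from a hidden layer of 8 units.
hidden = np.array([0.2, 1.5, -0.7, 0.9, 0.0, 2.1, -1.3, 0.4])
print(dropout_forward(hidden, p_drop=0.5))
```

At test time the mask is simply not applied (training=False), which is why the surviving activations are scaled up during training in this variant.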