Deep learning algorithms without pre-training
In the previous chapter, you learned that layer-wise pre-training was the breakthrough behind DBN and SDA. These algorithms need pre-training because, in a network built by simply stacking layers, the output error shrinks as it propagates backward and training stops working well (we call this the vanishing gradient problem). You might therefore assume that any deep learning algorithm, whether it improves an existing method or reinvents one, must include pre-training.
In fact, however, the deep learning algorithms in this chapter have no pre-training phase, and yet they can achieve results with higher precision and accuracy. Why is this possible? Here is a brief explanation. Think about why the vanishing gradient problem occurs, and recall the equation of backpropagation: the delta of a layer is distributed to all the units of the previous layer, literally by propagating...
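As a minimal sketch of the step this refers to, with notation assumed here rather than taken from the earlier chapter: let \(\delta_j^{(l)}\) denote the error term of unit \(j\) in layer \(l\), \(a_j^{(l)}\) its pre-activation, \(w_{kj}^{(l+1)}\) the weight connecting it to unit \(k\) of the next layer, and \(f'\) the derivative of the activation function. Then the delta is propagated backward as

\[
\delta_j^{(l)} = f'\!\left(a_j^{(l)}\right)\sum_{k} w_{kj}^{(l+1)}\,\delta_k^{(l+1)}
\]

Each backward step multiplies the deltas by the weights and by \(f'\), and for the sigmoid function \(f'\) is at most 0.25, so the deltas tend to shrink layer by layer in a deep stack.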