Limitations of neural networks
In this section, we will discuss in detail the issues faced by neural networks; understanding these issues is the stepping stone toward building deep learning networks.
Vanishing gradients, local optima, and slow training
One of the major issues with neural networks is the problem of "vanishing gradient" (References [8]). We will try to give a simple explanation of the issue rather than exploring the mathematical derivations in depth. We will choose the sigmoid activation function and a two-layer neural network, as shown in the following figure, to demonstrate the issue:
As we saw in the activation function description, the sigmoid function squashes its output into the range 0 to 1. The derivative of the sigmoid, g'(a) = g(a)(1 – g(a)), lies between 0 and 0.25. The goal of learning is to minimize the output loss, that is, to minimize the error E = ½(y – ŷ)² between the target y and the network's output ŷ. Backpropagation computes the gradient of this loss layer by layer using the chain rule, multiplying in one sigmoid derivative (at most 0.25) per layer, so the gradient shrinks rapidly as it propagates back toward the earlier layers and their weights learn very slowly. In general, the output error does not go to 0, so maximum iterations, a user-defined parameter, determines when training stops.
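The following is a minimal NumPy sketch (not taken from the text) that illustrates the two facts above: the sigmoid derivative never exceeds 0.25, and multiplying one such derivative per layer drives the gradient signal toward zero as the network gets deeper. The function names sigmoid and sigmoid_grad are illustrative choices, not part of any library API:

```python
import numpy as np

def sigmoid(a):
    """Squashes any real input into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-a))

def sigmoid_grad(a):
    """Derivative g'(a) = g(a)(1 - g(a)); its maximum value is 0.25 at a = 0."""
    g = sigmoid(a)
    return g * (1.0 - g)

# Numerically confirm the 0.25 upper bound on the derivative.
a = np.linspace(-10, 10, 1001)
print("max sigmoid derivative:", sigmoid_grad(a).max())  # ~0.25

# Backpropagation multiplies one such derivative per layer (times the weights),
# so the gradient reaching the earliest layers shrinks roughly as 0.25^depth.
for depth in (1, 2, 5, 10, 20):
    print(f"depth {depth:2d}: gradient factor upper bound = {0.25 ** depth:.2e}")
```

Running the loop shows the bound falling from 0.25 at one layer to below 10⁻¹² at twenty layers, which is why early layers in a deep sigmoid network receive almost no learning signal.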