If we look at how neural networks learn, a typical architecture consists of a large number of parameters and is optimized with a gradient-descent algorithm, which takes many iterative steps over many examples to reach good performance. Gradient descent generally trains models well, but there are scenarios where it fails. Let's look at such scenarios in the coming sections.
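As a quick refresher, here is a minimal sketch of what those iterative gradient-descent steps look like on a toy one-parameter regression problem (all the data and names here are illustrative, not from any particular library):

```python
import numpy as np

# Toy data: fit w in y = w * x by minimizing the mean squared error.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x  # the true weight is 2.0

w = np.random.randn()  # random initialization of the parameter
lr = 0.01              # learning rate (step size)

for step in range(1000):                   # many iterative steps
    y_pred = w * x
    grad = np.mean(2 * (y_pred - y) * x)   # dL/dw for the MSE loss
    w -= lr * grad                         # gradient-descent update

print(w)  # approaches 2.0 only after many such updates
```

Even on this trivial problem, the parameter only converges after hundreds of updates, which hints at why gradient descent struggles when very few examples are available.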
There are two main reasons why the gradient-descent algorithm fails to optimize a neural network when only a limited amount of data is available:
- For each new task, the neural network has to start from a random initialization of its parameters, which results in slow convergence. Transfer learning has been used to alleviate this problem by starting from a pretrained network, but it is constrained in that the data for the new task must be similar to the data the network was originally trained on.