Previously, we learned that when the input value is large, the Sigmoid output saturates, so even a considerable change in the weight values makes very little difference to the output.
Now, let's consider the opposite scenario, where the input values are very small:
When the input value is very small, the Sigmoid output again changes only slightly, so the weight value has to change by a very large amount to make any meaningful difference to the output.
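To make this concrete, here is a minimal sketch (assuming PyTorch, with illustrative weight and input values) that evaluates the Sigmoid for one very small and one very large input across a wide range of weights:

```python
# Contrast the two regimes discussed above: for a very small or a very large
# input, the Sigmoid output barely moves even when the weight changes a lot.
import torch

weights = torch.tensor([0.1, 1.0, 10.0, 100.0])  # illustrative weight values

for x in (1e-4, 1e4):  # a very small and a very large input value
    out = torch.sigmoid(weights * x)
    print(f"input={x:g} -> sigmoid outputs: {out.tolist()}")

# Expected behaviour (approximately):
#   input=0.0001 -> outputs stay close to 0.5 (~0.5000, 0.5000, 0.5003, 0.5025)
#   input=10000  -> outputs saturate at 1.0 for every weight
# In both cases a 1,000x change in the weight produces almost no change in the
# output, so the gradient flowing back to that weight is tiny.
```

In other words, both extremes leave the network with very little signal to update its weights.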
Additionally, in the Scaling the input data section, we saw that large input values have a negative effect on training accuracy. This suggests that our input can have neither very small nor very large values.
Along with very small or very large input values, we may also encounter a scenario where the value of one of the nodes in the hidden layer becomes either very small or very large, causing the same issue for the weights connecting the hidden layer to the next layer.
Batch normalization comes to the rescue in such scenarios.
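As a rough sketch of the idea (assuming PyTorch and hypothetical hidden-layer values), batch normalization rescales each hidden node's values to have roughly zero mean and unit variance across the batch, so they end up neither very small nor very large:

```python
# Normalize a batch of hidden-layer values with batch normalization.
import torch
import torch.nn as nn

torch.manual_seed(0)
# Hypothetical hidden-layer output: 32 samples x 10 nodes, on a large scale
hidden = torch.randn(32, 10) * 100 + 500

bn = nn.BatchNorm1d(num_features=10)   # one statistic per hidden node
normalized = bn(hidden)

print(hidden.mean().item(), hidden.std().item())          # roughly 500 and 100
print(normalized.mean().item(), normalized.std().item())  # roughly 0 and 1
```

With the node values brought back to a moderate range, the Sigmoid in the next layer operates in its sensitive region, where weight updates actually change the output.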