Understanding the impact of batch normalization
Previously, we learned that when the input value is large, the sigmoid output barely changes even when the weight values change considerably.
Now, let’s consider the opposite scenario, where the input values are very small:
Figure 3.19: Sigmoid value for the different values of input and weight
When the input value is very small, the sigmoid output again changes only slightly, so a big change in the weight value is needed to achieve optimal results.
Additionally, in the Scaling the input data section, we saw that large input values have a negative effect on training accuracy. This suggests that our input values should be neither very small nor very big.
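The two regimes can be verified with a quick sketch (the specific input and weight values here are illustrative, not taken from the book): with a very small input, even a tenfold change in the weight barely moves the sigmoid output, while a very large input saturates the sigmoid entirely.

```python
import math

def sigmoid(x):
    """Standard sigmoid activation: 1 / (1 + e^(-x))."""
    return 1 / (1 + math.exp(-x))

# Very small input: multiplying the weight by 10 barely changes the output,
# so the network would need a huge weight update to make progress.
x_small = 0.001
for w in (1.0, 10.0):
    print(f"x={x_small}, w={w}: sigmoid = {sigmoid(x_small * w):.4f}")

# Very large input: the sigmoid is saturated near 1, so weight changes
# again have almost no effect on the output.
x_large = 100.0
for w in (1.0, 10.0):
    print(f"x={x_large}, w={w}: sigmoid = {sigmoid(x_large * w):.4f}")
```

In both cases the gradient flowing back through the sigmoid is close to zero, which is exactly the situation batch normalization is designed to avoid by keeping node values in a moderate range.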
Along with very small or very big input values, we may also encounter a scenario where the value of one of the nodes in the hidden layer becomes either very small or very large, leading to the same...