Initial Weight Values
The initial weight values are especially important in neural network training. The choice of initial values often determines whether training succeeds or fails. In this section, we explain the recommended initial weight values and then run an experiment to confirm that they accelerate neural network learning.
How About Setting the Initial Weight Values to 0?
Later, we will look at a technique called weight decay, which suppresses overfitting and improves generalization performance. In short, weight decay penalizes large weight parameter values, keeping the weights small to prevent overfitting.
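As a rough illustration only (the book's own implementation comes later), weight decay can be realized by adding an L2 penalty on the weights to the loss. The helper below, loss_with_weight_decay, and the parameter name weight_decay_lambda are hypothetical names chosen for this sketch:

import numpy as np

# Minimal sketch of L2 weight decay: add (lambda / 2) * sum(W**2)
# for each weight matrix to the data loss. Minimizing this combined
# loss pushes the weight values toward smaller magnitudes.
def loss_with_weight_decay(data_loss, weights, weight_decay_lambda=0.1):
    penalty = 0.0
    for W in weights:
        penalty += 0.5 * weight_decay_lambda * np.sum(W ** 2)
    return data_loss + penalty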
If we want the weights to be small, starting with the smallest possible initial values is probably a good approach. Here, we use initial weight values such as 0.01 * np.random.randn(10, 100). These small values are generated from a standard Gaussian distribution and multiplied by 0.01, that is, they follow a Gaussian distribution with a standard deviation of 0.01.
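A minimal sketch of this initialization follows, with a check that the sample standard deviation is indeed close to 0.01 (the shape 10x100 would correspond to a layer with 10 inputs and 100 outputs):

import numpy as np

# Draw from a standard Gaussian and scale by 0.01, which is
# equivalent to sampling from a Gaussian with standard deviation 0.01.
W = 0.01 * np.random.randn(10, 100)

print(W.shape)  # (10, 100)
print(W.std())  # roughly 0.01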