- Which regularization strategy discussed in this chapter alleviates overfitting in deep models?
Dropout.
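For reference, here is a minimal Keras sketch of dropout used as a regularizer; the layer sizes and the 0.5 rate are illustrative choices, not values prescribed by this chapter:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# A small classifier where Dropout randomly zeroes 50% of the
# activations during training, discouraging co-adaptation of neurons.
model = models.Sequential([
    layers.Dense(256, activation='relu', input_shape=(784,)),
    layers.Dropout(0.5),  # active only during training, a no-op at inference
    layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
```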
- Does adding a batch normalization layer make the learning algorithm have to learn more parameters?
Only marginally. For every layer in which batch normalization is used, there are just two new parameters to learn per neuron: the scale γ and the shift β. If you do the math, the addition of new parameters is rather small.
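To verify the math, consider this small sketch (the layer sizes are arbitrary); `model.summary()` reports the per-layer counts, and the batch normalization layer contributes only 2 × 100 = 200 trainable parameters, tiny next to the 78,500 of the dense layer before it:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    # 784 * 100 weights + 100 biases = 78,500 parameters
    layers.Dense(100, activation='relu', input_shape=(784,)),
    # gamma and beta: 2 x 100 = 200 trainable parameters
    # (plus 200 non-trainable running mean/variance statistics)
    layers.BatchNormalization(),
    layers.Dense(10, activation='softmax')
])
model.summary()
```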
- What other deep belief networks are out there?
Restricted Boltzmann machines, for example, are another very popular type of deep belief network. Chapter 10, Restricted Boltzmann Machines, will cover these in more detail.
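As a quick preview before Chapter 10, scikit-learn ships a ready-made implementation; the binary data here is random toy data, used only to show the API:

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM

# Toy binary data; an RBM learns a distribution over visible binary units.
rng = np.random.RandomState(0)
X = (rng.rand(200, 16) > 0.5).astype(np.float64)

rbm = BernoulliRBM(n_components=8, learning_rate=0.05, n_iter=20,
                   random_state=0)
rbm.fit(X)
hidden = rbm.transform(X)  # hidden-unit activation probabilities
print(hidden.shape)        # (200, 8)
```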
- How come deep autoencoders perform better on MNIST than on CIFAR-10?
Actually, we do not have an objective way of saying that deep autoencoders perform better on one dataset than the other. We are biased toward judging them in terms of clustering and data labels. Our bias in reading the latent representations in Figure 8.12 and Figure 8.16 in terms of labels obscures the fact that the autoencoder never sees those labels; it is trained only to minimize reconstruction error. MNIST digits are visually simple enough that low reconstruction error also happens to separate the classes, whereas CIFAR-10 images vary widely within each class, so the latent space organizes around other visual properties instead.
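To see this bias concretely, the following sketch (a small, hypothetical autoencoder, not the exact models behind Figure 8.12 and Figure 8.16) learns a 2-dimensional latent space from MNIST pixels alone and only afterwards colors the points by labels the model never saw:

```python
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras import layers, models

# Load MNIST and flatten to 784-dimensional vectors in [0, 1].
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype('float32') / 255.0

# A small autoencoder with a 2-dimensional bottleneck; the sizes are
# illustrative, not the chapter's exact architecture.
inputs = layers.Input(shape=(784,))
h = layers.Dense(128, activation='relu')(inputs)
latent = layers.Dense(2, name='latent')(h)
h = layers.Dense(128, activation='relu')(latent)
outputs = layers.Dense(784, activation='sigmoid')(h)

autoencoder = models.Model(inputs, outputs)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
# Training never sees y_train; reconstruction is the only objective.
autoencoder.fit(x_train, x_train, epochs=5, batch_size=256, verbose=0)

# Coloring the latent space by label is where our bias enters.
encoder = models.Model(inputs, autoencoder.get_layer('latent').output)
z = encoder.predict(x_train[:5000])
plt.scatter(z[:, 0], z[:, 1], c=y_train[:5000], s=2, cmap='tab10')
plt.colorbar(label='digit label')
plt.show()
```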