The following hyperparameters are particularly important and must be tuned to achieve good results; a brief configuration sketch follows the list below.
- Dropout: Used for random omission of feature detectors to prevent overfitting
- Sparsity: Penalizes activations so that only a few units respond strongly, which suits sparse/rare input features
- Adagrad: Used for feature-specific learning-rate optimization
- Regularization: L1 and L2 weight penalties that discourage large weights
- Weight transforms: Useful for deep autoencoders
- Probability distribution manipulation: Used for initial weight generation
- Gradient normalization and clipping: Rescales or caps gradients to prevent exploding updates
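
As a rough illustration, the Keras sketch below shows where several of these knobs typically appear in a model definition. The layer sizes, rates, and penalty strengths are hypothetical placeholders, not recommendations.

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers, initializers, optimizers

model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    # Initial weights drawn from a chosen probability distribution
    # (Glorot/Xavier uniform here), plus an L2 penalty on the weights.
    layers.Dense(256, activation="relu",
                 kernel_initializer=initializers.GlorotUniform(),
                 kernel_regularizer=regularizers.l2(1e-4)),
    # Dropout: randomly omit 50% of feature detectors during training.
    layers.Dropout(0.5),
    # Sparsity: an L1 activity penalty pushes most activations toward zero.
    layers.Dense(128, activation="relu",
                 activity_regularizer=regularizers.l1(1e-5)),
    layers.Dense(10, activation="softmax"),
])

# Adagrad adapts the learning rate per parameter; clipnorm applies
# gradient-norm clipping before each update.
model.compile(optimizer=optimizers.Adagrad(learning_rate=0.01, clipnorm=1.0),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```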
Another important question is: when would you add a max pooling layer rather than a convolutional layer with the same stride? A max pooling layer has no parameters at all, whereas a convolutional layer has quite a few (see the sketch below). Sometimes it also helps to add a local response normalization layer, which makes the neurons that activate most strongly inhibit neurons at the same location in neighboring feature maps.
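
As a rough sketch (the input shape and filter count are placeholders), the snippet below compares the parameter counts of the two choices and shows the low-level op for local response normalization:

```python
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(64, 64, 32))

# Both layers halve the spatial resolution, but only the convolution learns weights.
pool = layers.MaxPooling2D(pool_size=2, strides=2)(inputs)
conv = layers.Conv2D(filters=32, kernel_size=3, strides=2, padding="same")(inputs)

pool_model = tf.keras.Model(inputs, pool)
conv_model = tf.keras.Model(inputs, conv)

print(pool_model.count_params())  # 0 -- max pooling learns nothing
print(conv_model.count_params())  # 3*3*32*32 + 32 = 9,248 trainable parameters

# Local response normalization (cross-channel inhibition) is available as a
# low-level op; here it is applied to a random tensor purely for illustration.
x = tf.random.normal([1, 64, 64, 32])
lrn = tf.nn.local_response_normalization(x, depth_radius=2)
```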