Tuning and optimizing CNN hyperparameters
The following hyperparameters have a strong influence on training and should be tuned carefully to achieve good results (a short configuration sketch follows this list):
- Dropout: Randomly omits feature detectors during training to prevent overfitting
- Sparsity: Encourages sparse activations, which helps when the informative inputs are sparse or rare
- Adagrad: Adapts the learning rate per parameter, so each feature is optimized with its own effective learning rate
- Regularization: L1 and L2 weight penalties
- Weight transforms: Useful for deep autoencoders
- Probability distribution manipulation: Controls the distribution used to generate the initial weights
- Gradient normalization and clipping: Keeps gradient magnitudes bounded to stabilize training
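To make these settings concrete, here is a minimal sketch of a small CNN in TensorFlow/Keras (an assumed framework; the layer sizes, penalty strengths, and learning rate are illustrative placeholders, not tuned values) showing where dropout, a sparsity penalty, L1/L2 regularization, a custom weight-initialization distribution, Adagrad, and gradient clipping are configured:

```python
# Illustrative sketch only: values below are placeholders, not tuned settings.
import tensorflow as tf
from tensorflow.keras import layers, regularizers, initializers

model = tf.keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    # Probability distribution manipulation: initial weights drawn from a
    # truncated normal distribution; L1/L2 penalties regularize the kernel.
    layers.Conv2D(32, 3, activation="relu",
                  kernel_initializer=initializers.TruncatedNormal(stddev=0.05),
                  kernel_regularizer=regularizers.l1_l2(l1=1e-5, l2=1e-4)),
    layers.MaxPooling2D(2),
    layers.Flatten(),
    # Sparsity: an L1 penalty on activations pushes most of them toward zero.
    layers.Dense(128, activation="relu",
                 activity_regularizer=regularizers.l1(1e-5)),
    # Dropout: randomly omits feature detectors during training.
    layers.Dropout(0.5),
    layers.Dense(10, activation="softmax"),
])

# Adagrad: per-parameter learning rates; clipnorm applies gradient clipping.
model.compile(optimizer=tf.keras.optimizers.Adagrad(learning_rate=0.01, clipnorm=1.0),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```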
Another important question is when to add a max pooling layer rather than a convolutional layer with the same stride. A max pooling layer has no parameters at all, whereas a strided convolutional layer introduces quite a few weights to learn (a brief sketch of both options follows). It can also help to add a local response normalization layer: it makes the most strongly activated neurons inhibit neurons at the same location in neighboring feature maps, which encourages different feature maps to specialize, pushes them apart, and forces them to explore a wider range of features.
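The following is a minimal sketch (TensorFlow/Keras assumed; the 64-channel input and filter counts are illustrative) comparing the parameter counts of the two downsampling options, with local response normalization applied via tf.nn.local_response_normalization:

```python
# Illustrative comparison: parameter-free max pooling vs. a strided convolution.
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(32, 32, 64))

# Option 1: max pooling halves the spatial resolution with zero parameters.
pooled = layers.MaxPooling2D(pool_size=2)(inputs)

# Option 2: a 3x3 convolution with stride 2 also halves the resolution,
# but learns 3 * 3 * 64 * 64 + 64 = 36,928 parameters.
conved = layers.Conv2D(64, 3, strides=2, padding="same")(inputs)

# Local response normalization (illustrative settings): strongly activated
# neurons inhibit neighbors at the same location in nearby feature maps.
lrn = layers.Lambda(lambda x: tf.nn.local_response_normalization(
    x, depth_radius=2, bias=1.0, alpha=1e-4, beta=0.75))(conved)

pool_model = tf.keras.Model(inputs, pooled)
conv_model = tf.keras.Model(inputs, lrn)
print(pool_model.count_params())  # 0
print(conv_model.count_params())  # 36928
```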