A convolutional layer with N sets of different neurons is thus defined by N weight matrices (also called filters or kernels) of shape D × k × k (when the filters are square) and N bias values. Therefore, this layer has only N × (Dk² + 1) values to train. A fully connected layer with similar input and output dimensions would instead require (H × W × D) × (Ho × Wo × N) parameters. As we demonstrated previously, the number of parameters of a fully connected layer is tied to the dimensionality of the data, whereas the parameter count of a convolutional layer is unaffected by it.
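The two formulas above can be compared numerically. The following sketch uses hypothetical values (a 32 × 32 RGB input, sixteen 3 × 3 filters, and "same" padding so the output keeps the input's spatial size); the function names are illustrative, not from any library:

```python
def conv_params(d, k, n):
    # Each of the N filters has D * k * k weights plus one bias value,
    # giving N * (D * k^2 + 1) trainable values in total.
    return n * (d * k * k + 1)

def dense_params(h, w, d, h_o, w_o, n):
    # A fully connected layer needs one weight per input/output pair:
    # (H * W * D) input values times (Ho * Wo * N) output values.
    return (h * w * d) * (h_o * w_o * n)

# Hypothetical setting: 32x32 RGB image, 16 filters of size 3x3,
# output spatial dimensions equal to the input's.
H = W = H_O = W_O = 32
D, K, N = 3, 3, 16

print(conv_params(D, K, N))                # 448 trainable values
print(dense_params(H, W, D, H_O, W_O, N))  # 50,331,648 weights
```

Note that doubling the image size to 64 × 64 leaves the convolutional count at 448, while the fully connected count grows sixteen-fold.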
This property makes convolutional layers powerful tools in computer vision, for two reasons. First, as implied in the previous paragraph, we can train networks on larger input images without increasing the number of parameters to tune. Second, it also means that convolutional layers can be applied to...