A convolutional layer is first defined by its number of filters, N, by its input depth, D (that is, the number of input channels), and by its filter/kernel size, (kH, kW). As square filters are commonly used, the size is usually simply defined by k (though, as mentioned earlier, non-square filters are sometimes considered).
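To make these hyperparameters concrete, here is a minimal sketch of how N, D, and (kH, kW) determine the size of a layer's parameters. The helper names are hypothetical, the (kH, kW, D, N) axis order simply follows the layout TensorFlow uses for its 2D kernels, and the one-bias-per-filter term is an assumption, since biases are not discussed in this passage.

```python
def conv_kernel_shape(N, D, k_h, k_w):
    """Shape of the weight tensor holding N filters of size k_h x k_w x D.

    The (k_h, k_w, D, N) axis order is an assumption borrowed from
    TensorFlow's kernel layout; other frameworks order the axes differently.
    """
    return (k_h, k_w, D, N)


def conv_param_count(N, D, k_h, k_w, use_bias=True):
    """Number of trainable values in the layer.

    Each filter holds k_h * k_w * D weights. One bias per filter is
    assumed when use_bias is True.
    """
    weights = N * D * k_h * k_w
    biases = N if use_bias else 0
    return weights + biases


# Example: 32 square filters with k = 3 applied to an RGB input (D = 3).
shape = conv_kernel_shape(N=32, D=3, k_h=3, k_w=3)
n_params = conv_param_count(N=32, D=3, k_h=3, k_w=3)
```

Note that neither value depends on the height or width of the input images: the same filters are reused at every position.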
However, as mentioned previously, convolutional layers actually differ from the mathematical operation of the same name. The operation between the input and the filters can take several additional hyperparameters, which affect how the filters slide over the images.
First, we can apply different strides, that is, different step sizes with which the filters slide over the images. The stride hyperparameter thus defines whether the dot product between the image patches and the filters should be computed at every position when sliding (stride = 1), or only at every s-th position (stride = s). The larger the stride, the smaller and more sparsely sampled the resulting feature maps.
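The effect of the stride can be illustrated with a small NumPy sketch. This is a naive, loop-based version for a single-channel image and a single filter, without padding; the function name `conv2d_single` is hypothetical and this is not how a framework implements the operation in practice.

```python
import numpy as np


def conv2d_single(image, kernel, stride=1):
    """Slide one 2D filter over one 2D image, without padding.

    At each visited position, the output is the sum of the element-wise
    product between the image patch and the filter (a dot product).
    """
    H, W = image.shape
    k_h, k_w = kernel.shape
    # Number of positions the filter can visit along each axis.
    out_h = (H - k_h) // stride + 1
    out_w = (W - k_w) // stride + 1
    out = np.zeros((out_h, out_w), dtype=np.float64)
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            patch = image[top:top + k_h, left:left + k_w]
            out[i, j] = np.sum(patch * kernel)
    return out


image = np.arange(49, dtype=np.float64).reshape(7, 7)
kernel = np.ones((3, 3))

dense = conv2d_single(image, kernel, stride=1)    # visits every position
strided = conv2d_single(image, kernel, stride=2)  # visits every 2nd position
```

With a 7 × 7 input and a 3 × 3 filter, a stride of 1 yields a 5 × 5 feature map, while a stride of 2 yields a 3 × 3 one. The strided result contains the same values as every second row and column of the dense result, which is what makes it a sparser sampling of the same responses.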
Images can also be zero-padded before...