Convolutional layers
The convolutional layer is the most important building block of a CNN. It consists of a set of filters (also known as kernels or feature detectors), where each filter is applied across all areas of the input data. A filter is defined by a set of learnable weights.
To add some meaning to this laconic definition, we’ll start with the following figure:

Figure 4.1 – Convolution operation start
The preceding figure shows a two-dimensional input layer of a CNN. For the sake of simplicity, we’ll assume that this is the input layer, but it can be any layer of the network. We’ll also assume that the input is a grayscale image, and each input unit represents the color intensity of a pixel. This image is represented by a two-dimensional tensor.
We’ll start the convolution by applying a 3×3 filter of weights (again, a two-dimensional tensor) in the top-left corner of the image. Each input unit is associated...