These pooling layers are a bit peculiar because they have no trainable parameters. Each neuron simply takes the values in its window (the receptive field) and returns a single output computed by a predefined function. The two most common pooling methods are max-pooling and average-pooling. Max-pooling layers return only the maximum value at each depth (channel) of the pooled area (refer to Figure 3.5), while average-pooling layers compute the average at each depth of the pooled area (refer to Figure 3.6).
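To make the two pooling functions concrete, here is a minimal NumPy sketch. It assumes an input of shape (H, W, D) and a hypothetical helper `pool2d` with window size `k` and stride `s`; it is an illustration of the operation, not how a framework implements it (TensorFlow, for instance, provides `tf.keras.layers.MaxPooling2D` and `tf.keras.layers.AveragePooling2D`). Note that the function holds no weights: the output depends only on the input values and the chosen reduction.

```python
import numpy as np

def pool2d(x, k=2, s=2, mode="max"):
    """Hypothetical helper: pool an (H, W, D) array over k x k windows with
    stride s, reducing each depth channel independently."""
    if mode == "max":
        reduce_fn = np.max
    elif mode == "average":
        reduce_fn = np.mean
    else:
        raise ValueError(f"unknown pooling mode: {mode!r}")

    h, w, d = x.shape
    h_out = (h - k) // s + 1  # number of window positions along the height
    w_out = (w - k) // s + 1  # number of window positions along the width
    out = np.empty((h_out, w_out, d), dtype=np.float64)
    for i in range(h_out):
        for j in range(w_out):
            window = x[i * s:i * s + k, j * s:j * s + k, :]  # shape (k, k, D)
            # Reduce over the two spatial axes only, keeping one value per depth.
            out[i, j, :] = reduce_fn(window, axis=(0, 1))
    return out

# A 4 x 4 input with a single depth channel.
x = np.array([[1, 2, 5, 6],
              [3, 4, 7, 8],
              [9, 10, 13, 14],
              [11, 12, 15, 16]])[..., np.newaxis]

print(pool2d(x, mode="max")[..., 0].tolist())      # [[4.0, 8.0], [12.0, 16.0]]
print(pool2d(x, mode="average")[..., 0].tolist())  # [[2.5, 6.5], [10.5, 14.5]]
```

With `k = s = 2`, each 2 × 2 patch is visited exactly once, and the 4 × 4 input shrinks to 2 × 2 while the depth stays unchanged.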
Pooling layers are commonly used with a stride equal to their window (kernel) size, so that the pooling function is applied over non-overlapping patches. Their purpose is to reduce the spatial dimensionality of the data, which cuts down the number of parameters needed in the subsequent layers of the network, as well as its computation time. For instance, a pooling layer with a 2 × 2 window size and stride of 2 (that is, k = 2 and...