The goal of the RoI pooling layer is simple: to take a variable-size part of the activation map and convert it to a fixed size. The input activation map sub-window has size h × w, and the target activation map has size H × W. RoI pooling works by dividing its input into an H × W grid in which each cell has size h/H × w/W.
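The fractional cell size h/H × w/W has to be turned into whole-number cells somehow. One possible convention, assumed here and not necessarily the one used for Figure 5.13, places the cell boundaries at rounded-up multiples of h/H and w/W. Implementations round in different ways, and some even let neighboring cells overlap. Under this convention, output cell (i, j), with 0 ≤ i < H and 0 ≤ j < W, covers the input rows r and columns c given by the following, and its value is the maximum activation x inside it:

$$
\left\lceil \frac{i\,h}{H} \right\rceil \le r < \left\lceil \frac{(i+1)\,h}{H} \right\rceil,
\qquad
\left\lceil \frac{j\,w}{W} \right\rceil \le c < \left\lceil \frac{(j+1)\,w}{W} \right\rceil,
\qquad
y_{ij} = \max_{(r,\,c)\,\in\,\text{cell}(i,\,j)} x_{rc}
$$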
Let's use an example. If the input has size h × w = 5 × 4 and the target activation map has size H × W = 2 × 2, then each cell should have size 2.5 × 2. Because cell sizes must be whole numbers, we will make some cells 3 × 2 and others 2 × 2. Then, we will take the maximum of each cell:
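The example can be sketched in Python with NumPy. This is a minimal illustration for a single-channel RoI, not a production layer. The helper names `cell_bounds` and `roi_max_pool` are hypothetical. The boundaries sit at ⌈k·h/H⌉, which is an assumed rounding rule; for this input it yields 3 × 2 cells in the top row and 2 × 2 cells in the bottom row. The input values are made up and are not taken from Figure 5.13.

```python
import numpy as np


def cell_bounds(size, bins):
    """Split `size` positions into `bins` contiguous, non-overlapping ranges.

    Boundaries sit at ceil(k * size / bins) for k = 0..bins (assumed rounding
    rule), computed with integer arithmetic to avoid floating-point error.
    """
    edges = [(k * size + bins - 1) // bins for k in range(bins + 1)]
    return list(zip(edges[:-1], edges[1:]))


def roi_max_pool(roi, out_h, out_w):
    """Max-pool a single-channel RoI of shape (h, w) into an (out_h, out_w) map."""
    h, w = roi.shape
    if h < out_h or w < out_w:
        # Smaller RoIs would produce empty cells under this rounding rule.
        raise ValueError("RoI must be at least as large as the output grid")
    out = np.empty((out_h, out_w), dtype=roi.dtype)
    for i, (r0, r1) in enumerate(cell_bounds(h, out_h)):
        for j, (c0, c1) in enumerate(cell_bounds(w, out_w)):
            # Each output value is the maximum of one grid cell.
            out[i, j] = roi[r0:r1, c0:c1].max()
    return out


# Made-up 5 x 4 RoI (h x w), pooled to a 2 x 2 output (H x W).
roi = np.arange(20).reshape(5, 4)
print(cell_bounds(5, 2), cell_bounds(4, 2))  # [(0, 3), (3, 5)] [(0, 2), (2, 4)]
print(roi_max_pool(roi, 2, 2).tolist())  # [[9, 11], [17, 19]]
```

The output size depends only on `out_h` and `out_w`, not on the RoI's shape. That is what lets RoIs of different sizes feed the same fixed-size layers downstream. A full RoI pooling layer applies this same per-cell maximum independently to every channel of the activation map.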
Figure 5.13: Example of RoI pooling with an RoI of size 5 × 4 (from B3 to E7) and an output of size 2 × 2 (from J4 to K5)
An RoI pooling layer is very similar to a max-pooling layer. The difference is that RoI pooling works with inputs of variable size, while max-pooling...