The Mask R-CNN algorithm (2017), by He et al., introduces several improvements over the Faster R-CNN algorithm for region-based object detection. Its two primary contributions are:
- RoI Pooling is replaced with a RoIAlign module, which is more accurate.
- An additional branch is attached to the output of the RoIAlign module. It feeds the RoIAlign output through two successive convolutional layers, and the output of the last convolutional layer forms the object mask.
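
The accuracy difference in the first item can be illustrated with a minimal sketch. It is not the reference implementation: it samples a single point only, whereas real RoIAlign samples several points per output bin and aggregates them. The toy feature map, the stride of 16, the example coordinates, and the helper names `bilinear_sample` and `quantized_sample` are all assumptions made for illustration. The sketch contrasts snapping a fractional coordinate to an integer cell (the quantization that RoI Pooling performs) with bilinear interpolation at the exact fractional position (the idea behind RoIAlign).

```python
import math

def bilinear_sample(fmap, y, x):
    """Bilinearly interpolate a 2D feature map (list of rows) at fractional (y, x)."""
    h, w = len(fmap), len(fmap[0])
    # Clamp the sampling point to the valid range of the map.
    y = min(max(y, 0.0), h - 1)
    x = min(max(x, 0.0), w - 1)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    # Interpolate along x on the two neighbouring rows, then along y.
    top = fmap[y0][x0] * (1 - dx) + fmap[y0][x1] * dx
    bottom = fmap[y1][x0] * (1 - dx) + fmap[y1][x1] * dx
    return top * (1 - dy) + bottom * dy

def quantized_sample(fmap, y, x):
    """Snap the fractional coordinate to an integer cell (floor is an assumption here)."""
    h, w = len(fmap), len(fmap[0])
    yi = min(max(int(math.floor(y)), 0), h - 1)
    xi = min(max(int(math.floor(x)), 0), w - 1)
    return fmap[yi][xi]

# Toy 4x4 feature map whose value grows linearly: f(y, x) = 10*y + x.
feature_map = [[10 * r + c for c in range(4)] for r in range(4)]

# Hypothetical RoI corner at image pixel (y=20, x=28), feature stride 16 (assumed).
stride = 16
fy, fx = 20 / stride, 28 / stride  # fractional feature-map coordinates (1.25, 1.75)

aligned = bilinear_sample(feature_map, fy, fx)     # value at the exact position
quantized = quantized_sample(feature_map, fy, fx)  # value at the snapped cell (1, 1)

print(aligned, quantized)
```

Because the toy map is linear, the interpolated value matches the true value at the fractional position, while the quantized lookup returns the value of a neighbouring cell. That offset is small for bounding boxes but becomes significant when predicting per-pixel masks.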
The RoIAlign module provides a more precise correspondence between the selected regions of the feature map and the corresponding regions of the input image. Pixel-level segmentation needs much finer alignment than bounding-box computation alone. The following screenshot shows the architecture of Mask R-CNN:
In...