Mask R-CNN (https://arxiv.org/abs/1703.06870) was proposed by Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick at ICCV 2017. Mask R-CNN efficiently detects objects in an image by building on Faster R-CNN, while simultaneously predicting a segmentation mask for each region of interest. The segmentation task therefore runs in parallel with classification and bounding-box regression. The high-level architecture of Mask R-CNN is as follows:
The details of the Mask R-CNN implementation are as follows:
- Mask R-CNN follows the general two-stage principle of Faster R-CNN, with one modification. The first stage, the Region Proposal Network (RPN), remains the same as in Faster R-CNN. The second stage, Fast R-CNN, which performs feature extraction from each Region of Interest (RoI), classification, and bounding-box regression, also outputs a binary mask for each RoI...
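
To make the three second-stage outputs concrete, the following is a minimal, simplified sketch of how the per-RoI predictions might be combined at inference time. It is not the authors' implementation. The names `RoIPrediction` and `predict_roi` are hypothetical, and the inputs are toy lists standing in for real network outputs. The sketch assumes that the mask branch produces one mask per class, that the mask of the predicted class is selected, and that a per-pixel sigmoid is binarized at a threshold of 0.5.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class RoIPrediction:
    """Hypothetical container for the three outputs of one RoI."""
    label: int                 # index of the highest-scoring class
    box_deltas: List[float]    # (dx, dy, dw, dh) refinement for that class
    mask: List[List[int]]      # binary mask, 1 = object pixel


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def predict_roi(class_logits, box_deltas, mask_logits, threshold=0.5):
    """Combine the classification, box, and mask outputs for one RoI.

    class_logits: one score per class
    box_deltas:   one 4-element list per class
    mask_logits:  one 2D grid of logits per class (assumed layout)
    """
    # Classification branch: pick the class with the largest logit.
    label = max(range(len(class_logits)), key=lambda k: class_logits[k])
    # Box branch: keep the deltas predicted for the chosen class.
    deltas = box_deltas[label]
    # Mask branch: per-pixel sigmoid on the chosen class's mask, then threshold.
    mask = [
        [1 if sigmoid(v) >= threshold else 0 for v in row]
        for row in mask_logits[label]
    ]
    return RoIPrediction(label, deltas, mask)


# Toy example with two classes and a 2x2 mask per class.
pred = predict_roi(
    class_logits=[0.1, 2.0],
    box_deltas=[[0.0, 0.0, 0.0, 0.0], [0.1, -0.2, 0.05, 0.0]],
    mask_logits=[
        [[-1.0, -1.0], [-1.0, -1.0]],
        [[3.0, -2.0], [0.5, -0.1]],
    ],
)
print(pred.label, pred.box_deltas, pred.mask)
```

Because each pixel is decided by an independent sigmoid rather than a softmax across classes, the mask for one class does not compete with the masks of the others. The class decision is left entirely to the classification branch.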