Mask R-CNN (https://arxiv.org/abs/1703.06870) was proposed by Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick at ICCV 2017. Mask R-CNN efficiently detects objects in an image using the R-CNN approach while simultaneously performing object segmentation for each region of interest. The segmentation task therefore runs in parallel with classification and bounding-box regression. The high-level architecture of Mask R-CNN is as follows:
![](https://static.packt-cdn.com/products/9781838827069/graphics/assets/2884d889-f0b0-44e5-83cf-22ec8797f59a.png)
The details of the Mask R-CNN implementation are as follows:
- Mask R-CNN follows the general two-stage principle of Faster R-CNN, with one modification. The first stage, the RPN, remains the same as in Faster R-CNN. The second stage, Fast R-CNN, which performs feature extraction from each Region of Interest (RoI), classification, and bounding-box regression, also outputs a binary mask for each RoI...
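
The parallel second-stage outputs described above can be sketched in plain Python. This is a toy illustration under stated assumptions, not the paper's implementation: the real heads are convolutional networks, whereas here each head is a single random linear layer, and every name (`second_stage`, `w_cls`, `w_box`, `w_mask`, `mask_size`) is hypothetical. It only shows that one RoI feature vector feeds three heads side by side, and that the mask head yields a binary mask by thresholding per-pixel sigmoid outputs.

```python
import math
import random


def softmax(logits):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def linear(features, weights):
    """Apply one weight row per output unit (stand-in for a real head)."""
    return [sum(w * f for w, f in zip(row, features)) for row in weights]


def second_stage(roi_features, w_cls, w_box, w_mask, mask_size):
    """Toy second stage: three heads read the same RoI features in parallel."""
    class_probs = softmax(linear(roi_features, w_cls))  # classification head
    box_deltas = linear(roi_features, w_box)            # box regression head
    mask_logits = linear(roi_features, w_mask)          # mask head
    # Threshold each per-pixel probability at 0.5 to get a binary mask.
    flat = [1 if sigmoid(z) > 0.5 else 0 for z in mask_logits]
    mask = [flat[r * mask_size:(r + 1) * mask_size] for r in range(mask_size)]
    return class_probs, box_deltas, mask


def rand_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


# Assumed toy sizes: 8 features per RoI, 3 classes, a 4x4 mask.
random.seed(0)
feat_dim, num_classes, mask_size = 8, 3, 4
roi_features = [random.uniform(-1, 1) for _ in range(feat_dim)]
probs, deltas, mask = second_stage(
    roi_features,
    rand_matrix(num_classes, feat_dim),
    rand_matrix(4, feat_dim),  # four box offsets per RoI
    rand_matrix(mask_size * mask_size, feat_dim),
    mask_size,
)
for row in mask:
    print(row)
```

Because the weights are random, the printed mask carries no meaning; the point is the shape of the outputs. Each RoI yields class probabilities, four box offsets, and a `mask_size` x `mask_size` binary mask from the same features.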