When we introduced object detection in Chapter 1, Computer Vision and Neural Networks, we explained that this process is often used as a preliminary step, providing image patches containing a single instance for further analysis. With this in mind, instance segmentation becomes a matter of two steps:
- Using an object detection model to return bounding boxes for each instance of target classes
- Feeding each patch to a semantic segmentation model to obtain the instance mask
If the predicted bounding boxes are accurate (each capturing a whole, single element), then the task of the segmentation network is straightforward: to classify which pixels in the corresponding patch belong to the captured class, and which belong to the background or to another class.
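The two steps above can be sketched as a short pipeline. This is only a minimal illustration, not a reference implementation: `segment_instances`, `detect_fn`, and `segment_fn` are hypothetical names, the box format (`y_min, x_min, y_max, x_max` in pixels) is an assumption, and the toy detector and segmenter merely stand in for trained models. Resizing each patch to a fixed network input size, which real models usually require, is left out for brevity.

```python
import numpy as np


def segment_instances(image, detect_fn, segment_fn, threshold=0.5):
    """Two-step instance segmentation: detect boxes, then segment each patch.

    Assumed (hypothetical) interfaces:
      detect_fn(image) -> iterable of (class_id, (y_min, x_min, y_max, x_max))
      segment_fn(patch, class_id) -> array of foreground probabilities,
                                     same height/width as the patch
    """
    height, width = image.shape[:2]
    instances = []
    for class_id, (y_min, x_min, y_max, x_max) in detect_fn(image):
        # Clip the box to the image bounds and skip empty boxes.
        y_min, y_max = max(0, int(y_min)), min(height, int(y_max))
        x_min, x_max = max(0, int(x_min)), min(width, int(x_max))
        if y_max <= y_min or x_max <= x_min:
            continue

        # Step 1 result: a patch that should contain a single instance.
        patch = image[y_min:y_max, x_min:x_max]

        # Step 2: binary decision per pixel (captured class vs. everything else).
        probabilities = segment_fn(patch, class_id)

        # Paste the patch mask back into a full-size mask for this instance.
        mask = np.zeros((height, width), dtype=bool)
        mask[y_min:y_max, x_min:x_max] = probabilities > threshold
        instances.append(
            {"class_id": class_id, "box": (y_min, x_min, y_max, x_max), "mask": mask}
        )
    return instances


# Toy stand-ins for trained models (assumptions for illustration only).
def toy_detector(image):
    return [(1, (0, 0, 2, 2)), (1, (2, 2, 4, 4))]


def toy_segmenter(patch, class_id):
    return (patch > 0).astype(np.float32)


image = np.zeros((4, 4), dtype=np.float32)
image[0, 0] = 1.0
image[3, 3] = 1.0
results = segment_instances(image, toy_detector, toy_segmenter)
```

Each entry in `results` holds one instance with its own full-size mask, so two objects of the same class remain separate, which is what distinguishes this output from a plain semantic segmentation map.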
This way of solving instance segmentation is advantageous, as we already have all the necessary tools to implement it (object detection and semantic segmentation models...