With models trained for object segmentation, the softmax output represents for each pixel the probability that it belongs to one of N classes. However, it does not express whether two pixels or blobs of pixels belong to the same instance of a class. For example, given the predicted label map shown in Figure 6-10, we have no way of counting the number of tree or building instances.
In the following subsection, we will present two different ways of achieving instance segmentation by extending solutions for two related tasks that we've tackled already—object segmentation and object detection.