As we saw in the first part of this chapter, encoding-decoding networks are trained to map data samples from one domain to another (for example, from noisy to noiseless, or from color to depth). Object segmentation can be seen as one such operation – the mapping of images from the color domain to the class domain. Given its value and context, we want to assign one of the target classes to each pixel of a picture, returning a label map with the same height and width.
Teaching encoders-decoders to take an image and return a label map still requires some consideration, which we will now discuss.