Classifying pixels
As we have already discussed, the desired output of a model performing semantic segmentation is an image in which each pixel is assigned the label of its most likely class (or even a specific instance of a class). Throughout this book, we have also seen that the layers of a deep neural network learn features that activate when the corresponding pattern appears in the input. We can visualize these activations using a technique called class activation maps (CAMs). The output is a heatmap of class activations over the input image: a matrix of scores associated with a specific class, essentially giving us a spatial map of how strongly each input region activates that class. The following figure shows the output of a CAM visualization for the class cat. Here, you can see that the heatmap highlights what the model considers the important features (and therefore regions) for this class:
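To make the idea concrete, here is a minimal NumPy sketch of the core CAM computation. It assumes the common CAM architecture: a final convolutional layer producing K feature maps, followed by global average pooling and a dense classification layer. All names here are illustrative, and clipping negative values and normalizing to [0, 1] are visualization conventions, not part of the model itself:

```python
import numpy as np

def class_activation_map(feature_maps, class_weights, class_idx):
    """Compute a CAM heatmap as the class-weighted sum of feature maps.

    feature_maps: shape (H, W, K) - activations of the last conv layer
    class_weights: shape (K, num_classes) - weights of the dense layer
        that follows global average pooling
    class_idx: index of the class to visualize
    """
    w = class_weights[:, class_idx]                       # (K,)
    # Weighted sum over the channel axis yields one score per location
    cam = np.tensordot(feature_maps, w, axes=([2], [0]))  # (H, W)
    cam = np.maximum(cam, 0)          # keep positive evidence only
    if cam.max() > 0:
        cam = cam / cam.max()         # normalize to [0, 1] for display
    return cam

# Toy example: a 4x4 spatial map with 3 channels and 2 classes
fmap = np.random.rand(4, 4, 3)
weights = np.random.rand(3, 2)
heatmap = class_activation_map(fmap, weights, class_idx=0)
print(heatmap.shape)  # (4, 4)
```

In practice, the low-resolution heatmap is upsampled to the input image size and overlaid on it, which produces figures like the one shown below.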
Note
The preceding figure was produced...