The semantic segmentation network generally consists of an encoder-decoder network. The encoder produces high-level features using convolution, while the decoder helps in interpreting these high-level features using classes. The encoder is a common encoding mechanism that is used by pre-trained networks and the decoder weight that's learned while training a segmentation network. The following diagram shows the architecture of the encoder-decoder-based FCN architecture for semantic segmentation:
Fig 8.2: Semantic segmentation architecture
You can check out the preceding diagram at the following link: https://www.mdpi.com/2313-433X/4/10/116/pdf.
The encoder gradually reduces the spatial dimension with the help of pooling layers, while the decoder recovers the features of the object and spatial dimensions. You can read more about semantic segmentation in the paper on ECRU: An Encoder-Decoder-Based Convolution Neural...