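In the decoder, upsampling and convolutions are performed, followed by a softmax classifier at the end that labels each pixel. During upsampling, the max-pooling indices stored at the corresponding encoder layer are recalled and used to place each value back at the location it was pooled from. The decoder's convolutions then turn these sparse upsampled feature maps into dense ones. Finally, a K-class softmax classifier predicts the class of each pixel.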
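To make the index-based upsampling concrete, the following is a minimal sketch in plain Python. It is not SegNet's actual implementation: the function names, the 4 x 4 feature map, and the three class scores are all hypothetical values chosen for illustration, and the decoder's trainable convolutions between unpooling and the classifier are left out.

```python
import math

def max_pool_2x2_with_indices(fmap):
    """2x2 max pooling with stride 2 that also records where each maximum was found."""
    pooled, indices = [], []
    for r in range(0, len(fmap), 2):
        pooled_row, index_row = [], []
        for c in range(0, len(fmap[0]), 2):
            # Collect the four values of the window together with their positions.
            window = [(fmap[r + dr][c + dc], (r + dr, c + dc))
                      for dr in (0, 1) for dc in (0, 1)]
            value, position = max(window, key=lambda item: item[0])
            pooled_row.append(value)
            index_row.append(position)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices

def max_unpool_2x2(pooled, indices, height, width):
    """Put each pooled value back at its recorded position; all other cells stay zero."""
    out = [[0.0] * width for _ in range(height)]
    for pooled_row, index_row in zip(pooled, indices):
        for value, (r, c) in zip(pooled_row, index_row):
            out[r][c] = value
    return out

def softmax(scores):
    """Numerically stable softmax over the K class scores of a single pixel."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 4x4 encoder feature map (one channel).
feature_map = [
    [1.0, 3.0, 2.0, 0.0],
    [4.0, 2.0, 1.0, 5.0],
    [0.0, 1.0, 7.0, 2.0],
    [6.0, 2.0, 3.0, 1.0],
]

# Encoder side: pool and remember the indices.
pooled, indices = max_pool_2x2_with_indices(feature_map)

# Decoder side: recall the indices to upsample back to 4x4.
upsampled = max_unpool_2x2(pooled, indices, 4, 4)

# Final step for one pixel: hypothetical scores for K = 3 classes.
pixel_scores = [2.0, 1.0, 0.1]
probabilities = softmax(pixel_scores)
predicted_class = probabilities.index(max(probabilities))
```

Because the decoder reuses the stored indices rather than learning how to upsample, the unpooling step itself adds no trainable parameters; only the convolutions that follow it are learned.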
SegNet, a deep convolutional encoder-decoder architecture for robust semantic pixel-wise labeling, was researched and developed by members of the Computer Vision and Robotics Group at the University of Cambridge, UK. For more details, see http://mi.eng.cam.ac.uk/projects/segnet/.
In the next section, we'll cover the Pyramid Scene Parsing Network (PSPNet).