As we saw in previous chapters, such as Chapter 3, Modern Neural Networks, and Chapter 4, Influential Classification Tools, CNNs are great feature extractors. Their convolutional layers convert their input tensors into increasingly high-level feature maps, while their pooling layers gradually down-sample the data, leading to compact and semantically rich features. Therefore, CNNs make for performant encoders.
However, how can this process be reversed to decode these low-dimensional features into full images? As we will present in the following paragraphs, just as convolutions and pooling operations replaced dense layers for encoding images, reverse operations, such as transposed convolutions (also known as deconvolutions), dilated convolutions, and unpooling, were developed to better decode features.
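To build intuition for how a transposed convolution up-samples a feature map, the following is a minimal NumPy sketch (the function name, shapes, and kernel values are illustrative, not taken from any library): each input value scales the kernel, and the result is accumulated into the output at stride-spaced positions, so a small map grows into a larger one.

```python
import numpy as np

def transposed_conv2d(x, kernel, stride=2):
    """Naive 2D transposed convolution (single channel, no padding).

    Each input element scatters a scaled copy of the kernel into the
    output, spaced `stride` apart, up-sampling the feature map.
    """
    h, w = x.shape
    kh, kw = kernel.shape
    out = np.zeros((stride * (h - 1) + kh, stride * (w - 1) + kw))
    for i in range(h):
        for j in range(w):
            # Scatter-add the scaled kernel at the up-sampled location:
            out[i*stride:i*stride+kh, j*stride:j*stride+kw] += x[i, j] * kernel
    return out

# A 2x2 feature map decoded into a 4x4 output:
x = np.array([[1., 2.],
              [3., 4.]])
k = np.ones((2, 2))
y = transposed_conv2d(x, k, stride=2)
print(y.shape)  # → (4, 4)
```

With stride 2 and a 2x2 kernel, the copies do not overlap, so each input value simply becomes a 2x2 block in the output; with a larger kernel or smaller stride, the scattered copies overlap and sum, which is how learned kernels blend neighboring features during decoding.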