Researchers have found that convolutional neural networks (CNNs) work particularly well with images because they can extract the spatial information hidden in the image. It is therefore natural to expect that an autoencoder whose encoder and decoder networks are built from CNNs will handle images better than the other autoencoders, and so we have convolutional autoencoders (CAEs). Chapter 4, Convolutional Neural Networks, explained the process of convolution and max-pooling, which we will use as a basis for understanding how convolutional autoencoders work.
A CAE is an autoencoder in which both the encoder and the decoder are CNNs. The convolutional network of the encoder learns to encode the input as a set of signals, and the decoder CNN then tries to reconstruct the input from those signals. CAEs work as general-purpose feature extractors and learn the optimal filters needed to capture the features from the...
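To make the encode-then-reconstruct idea concrete, the following minimal sketch traces how the shape of a feature map could change as it passes through a small convolutional autoencoder. It is an illustration under stated assumptions, not the architecture used in this chapter: the 28 x 28 x 1 input, the filter counts (16 and 8), the use of stride-1 convolutions with "same" padding, 2 x 2 max-pooling in the encoder, and 2 x 2 upsampling in the decoder are all assumed, and the helper names (`conv_same`, `max_pool`, `upsample`, `trace_cae`) are hypothetical. No weights are learned here; only the shape arithmetic is shown.

```python
# Shape trace of a hypothetical convolutional autoencoder (no training involved).
# Shapes are (height, width, channels). All layer sizes are illustrative assumptions.


def conv_same(shape, filters):
    """Stride-1 convolution with 'same' padding: height and width are kept,
    the channel count becomes the number of filters."""
    height, width, _ = shape
    return (height, width, filters)


def max_pool(shape, size=2):
    """Max-pooling with a size x size window: height and width shrink by `size`."""
    height, width, channels = shape
    return (height // size, width // size, channels)


def upsample(shape, size=2):
    """Upsampling by repetition: height and width grow by `size`."""
    height, width, channels = shape
    return (height * size, width * size, channels)


def trace_cae(input_shape=(28, 28, 1), encoder_filters=(16, 8)):
    """Return a list of (stage name, shape) pairs for the assumed encoder and decoder."""
    stages = [("input", input_shape)]
    x = input_shape

    # Encoder: each block is convolution followed by max-pooling.
    for i, filters in enumerate(encoder_filters, start=1):
        x = conv_same(x, filters)
        stages.append((f"encoder conv {i}", x))
        x = max_pool(x)
        stages.append((f"encoder pool {i}", x))

    # Decoder: mirror of the encoder, convolution followed by upsampling.
    for i, filters in enumerate(reversed(encoder_filters), start=1):
        x = conv_same(x, filters)
        stages.append((f"decoder conv {i}", x))
        x = upsample(x)
        stages.append((f"decoder upsample {i}", x))

    # Final convolution maps back to the input's channel count.
    x = conv_same(x, input_shape[2])
    stages.append(("output", x))
    return stages


if __name__ == "__main__":
    for name, shape in trace_cae():
        print(f"{name:20s} {shape}")
```

Under these assumptions, the encoder reduces the 28 x 28 x 1 input (784 values) to a 7 x 7 x 8 representation (392 values), and the decoder expands it back to 28 x 28 x 1 so that the reconstruction can be compared with the input pixel by pixel.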