Decoding the standard autoencoder
Autoencoders are more of a concept than a specific neural network architecture, because they can be built from different base neural network layers. When dealing with images, you build CNN autoencoders; when dealing with text, you might build RNN autoencoders; and when dealing with multimodal datasets that mix images, text, audio, and numerical and categorical data, you use a combination of different layer types as a base. Autoencoders consist of three main components: the encoder, the bottleneck, and the decoder. This is illustrated in Figure 5.1.
Figure 5.1 – The autoencoder concept
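To make the three components concrete, here is a minimal sketch of a dense autoencoder in Keras. The framework choice, the flattened 784-dimensional input (for example, MNIST images), and the layer sizes of 128 and 32 units are illustrative assumptions, not prescriptions from the text.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# Input: flattened 784-dimensional vectors (e.g., 28x28 MNIST images)
inputs = layers.Input(shape=(784,))

# Encoder: progressively compresses the input
encoded = layers.Dense(128, activation='relu')(inputs)

# Bottleneck: the compact, low-dimensional representation
bottleneck = layers.Dense(32, activation='relu', name='bottleneck')(encoded)

# Decoder: reconstructs the original input from the bottleneck
decoded = layers.Dense(128, activation='relu')(bottleneck)
outputs = layers.Dense(784, activation='sigmoid')(decoded)

autoencoder = Model(inputs, outputs)
autoencoder.compile(optimizer='adam', loss='mse')
autoencoder.summary()
```

Training such a model with the input data used as both the features and the targets (for example, `autoencoder.fit(x_train, x_train)`) forces the network to squeeze the data through the 32-unit bottleneck and reconstruct it on the other side.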
The encoder of a standard autoencoder typically takes in high-dimensional data and compresses it into a representation with fewer dimensions than the original data. This compressed output is known as the bottleneck representation, because it is produced at the bottleneck, the narrowest layer of the network, and it serves as a compact representation...