At this point, you might be wondering what the purpose of autoencoders is. Why bother learning a representation of the original input, only to reconstruct a similar output? The answer lies in the learned representation itself. By forcing the learned representation to be compressed (that is, to have smaller dimensions than the input), we essentially force the neural network to learn only the most salient features of the input. This ensures that the representation captures just the most relevant characteristics of the input, known as the latent representation.
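To make this idea concrete, the following is a minimal sketch of an autoencoder with a compressed latent representation, written in Keras. The layer sizes here (an input of 784 values and a latent representation of 32 values) are illustrative assumptions rather than values tied to any particular dataset:

```python
import numpy as np
from tensorflow.keras import layers, models

input_dim = 784    # e.g. a flattened 28x28 image (illustrative assumption)
latent_dim = 32    # the compressed latent representation

# Encoder: compresses the input down to the smaller latent representation
inputs = layers.Input(shape=(input_dim,))
latent = layers.Dense(latent_dim, activation='relu')(inputs)

# Decoder: reconstructs the original input from the latent representation
outputs = layers.Dense(input_dim, activation='sigmoid')(latent)

autoencoder = models.Model(inputs, outputs)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')

# Training uses the input itself as the target, since the goal is reconstruction:
# autoencoder.fit(x_train, x_train, epochs=10, batch_size=256)
```

Because the latent layer is much smaller than the input, the network cannot simply copy the input through; it has to discover a compact encoding that preserves enough information to reconstruct the output.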
As a concrete example of latent representations, consider an autoencoder trained on the cats and dogs dataset, as shown in the following diagram:
An autoencoder trained on this dataset will eventually learn that the salient characteristics of cats and dogs are the shape...