Autoencoders impose a bottleneck on the network, which enforces a compressed knowledge representation of the original input. If the bottleneck were not present, the network would simply learn to memorize the input values. This would mean that the model wouldn't generalize well on unseen data.
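The bottleneck idea can be sketched in a few lines of numpy. This is a minimal, untrained illustration (the dimensions and weights are hypothetical, chosen only to show the shapes): an 8-dimensional input is forced through a 3-dimensional code before being reconstructed, so the network cannot simply copy the input through.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: an 8-dimensional input squeezed into a 3-dimensional code.
input_dim, bottleneck_dim = 8, 3

# Random (untrained) encoder and decoder weights -- shapes are what matter here.
W_enc = rng.normal(size=(input_dim, bottleneck_dim))
W_dec = rng.normal(size=(bottleneck_dim, input_dim))

x = rng.normal(size=(1, input_dim))
code = np.tanh(x @ W_enc)   # the bottleneck: only 3 values survive
x_hat = code @ W_dec        # reconstruction attempted from the compressed code

print(code.shape)   # (1, 3)
print(x_hat.shape)  # (1, 8)
```

Because the code layer has fewer units than the input, training must discover a compressed representation rather than an identity mapping.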
Let's take a look at the following diagram, which depicts an autoencoder bottleneck:
We want the model to be sensitive enough to the inputs to capture their signal, but not so sensitive that it simply memorizes them and fails to predict well on unseen data. As such, we need to construct a loss/cost function that captures this trade-off.
The loss function can be defined as per the following formula, where xi is the value at each node in the input layer and x̄i is the value at the corresponding node in the output layer:

L(x, x̄) = Σi (xi − x̄i)²

Here, L(x, x̄) represents the reconstruction error: it penalizes the network whenever its output x̄ deviates from the original input x, and it is zero only for a perfect reconstruction.
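A minimal sketch of this reconstruction loss, assuming the common squared-error form summed (here averaged) over the nodes:

```python
import numpy as np

def reconstruction_loss(x, x_bar):
    # Mean squared error between the input x and the reconstruction x_bar:
    # L(x, x_bar) = (1/n) * sum_i (x_i - x_bar_i)^2
    x, x_bar = np.asarray(x, dtype=float), np.asarray(x_bar, dtype=float)
    return float(np.mean((x - x_bar) ** 2))

print(reconstruction_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.0 -- perfect reconstruction
print(reconstruction_loss([0.0, 0.0], [1.0, 1.0]))            # 1.0
```

Averaging rather than summing over nodes only rescales the loss by a constant, so it does not change which reconstruction is optimal.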