At one extreme, we can limit the network's learning capacity by sticking to shallow layers and a very small latent dimension. This approach can even provide an excellent baseline for benchmarking against more complex methods. However, other methods allow us to benefit, up to a point, from the representational power of deeper layers without being penalized by overcapacity. These methods modify the autoencoder's loss function to incentivize desirable representational criteria in the latent space the network learns.
For example, instead of simply requiring the network to copy its inputs, we may make the loss function account for the sparsity of the latent space, favoring sparser (and arguably more informative) representations over dense ones; a minimal sketch of this idea follows below. As we will see, we may even consider...
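To make the sparsity idea concrete, here is a minimal sketch of one such loss modification: a sparse autoencoder in which an L1 activity penalty on the latent code is added to the reconstruction loss. The framework (Keras), the layer sizes, the penalty weight, and the 784-dimensional (MNIST-style) input are illustrative assumptions, not details taken from the text.

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

inputs = tf.keras.Input(shape=(784,))

# Encoder: the L1 activity regularizer adds lambda * sum(|z|) to the loss,
# penalizing dense activations and pushing the latent code toward sparsity.
latent = layers.Dense(
    64,
    activation="relu",
    activity_regularizer=regularizers.l1(1e-5),  # lambda is a tunable assumption
)(inputs)

# Decoder: reconstruct the input from the (now sparse) latent code.
outputs = layers.Dense(784, activation="sigmoid")(latent)

autoencoder = tf.keras.Model(inputs, outputs)

# Total objective = reconstruction error + sparsity penalty; Keras folds the
# regularizer's contribution into the compiled loss automatically.
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")
```

Note that nothing about the architecture itself changes here; only the objective does, which is exactly the point: the same deep encoder is retained, but the penalty discourages it from using its full capacity to memorize inputs.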