Sparse autoencoders
In general, standard autoencoders produce dense internal representations, meaning that most of the values in the code are different from zero. In some cases, however, it's more useful to have a sparse code that can better represent the atoms belonging to a dictionary. In this case, if the code contains only a few non-zero components, we can consider each sample as the overlap of specific atoms weighted accordingly. To achieve this objective, we can simply apply an L1 penalty to the code layer, as explained in Chapter 2, Loss functions and Regularization. The loss function for a single sample, therefore, becomes the following:

L(x_i, \tilde{x}_i) = \|x_i - \tilde{x}_i\|_2^2 + \alpha \|z_i\|_1

Here, z_i is the code produced by the encoder for the sample x_i, and \tilde{x}_i is the corresponding reconstruction.
In this case, we need to consider the extra hyperparameter α, which must be tuned to increase the sparsity without negatively impacting the accuracy. As a general rule of thumb, I suggest starting with a value equal to 0.01 and then reducing it until the desired result has been achieved. In most cases, higher values yield very poor performance and are therefore generally avoided.
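As a concrete illustration, the following is a minimal sketch of a sparse autoencoder assuming TensorFlow/Keras; the layer sizes, input dimension, and the starting value α = 0.01 are illustrative choices, not prescriptions from the text. The L1 penalty on the code is applied through an activity regularizer on the encoding layer, so the total loss matches the formula above.

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers, Model

alpha = 0.01  # L1 coefficient on the code layer (suggested starting value)

# Encoder: the activity regularizer adds alpha * ||z||_1 to the loss,
# pushing most code components toward zero (sparse code).
inputs = tf.keras.Input(shape=(784,))            # e.g., flattened 28x28 images
code = layers.Dense(
    128, activation='relu',
    activity_regularizer=regularizers.l1(alpha)
)(inputs)

# Decoder: reconstructs the input from the sparse code.
outputs = layers.Dense(784, activation='sigmoid')(code)

autoencoder = Model(inputs, outputs)
# Total loss per sample: ||x - x~||^2 (MSE) + alpha * ||z||_1 (regularizer)
autoencoder.compile(optimizer='adam', loss='mse')

# Typical usage (hypothetical data):
# autoencoder.fit(X_train, X_train, epochs=30, batch_size=128)
```

Using an activity regularizer keeps the training loop unchanged: Keras adds the penalty to the reconstruction loss automatically, so tuning α only requires changing the single coefficient above.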