Using label smoothing to increase performance
One of the constant battles we have to fight in machine learning is overfitting. There are many techniques we can use to prevent a model from losing generalization power, such as dropout, L1 and L2 regularization, and even data augmentation. A recent addition to this group is label smoothing, a more forgiving alternative to one-hot encoding.
In one-hot encoding, we represent each category as a binary vector whose only non-zero element corresponds to the encoded class. With label smoothing, we instead represent each label as a probability distribution in which every element has a non-zero probability. The one with the highest probability, of course, still corresponds to the encoded class.
For instance, a smoothed version of the [0, 1, 0] vector would be [0.01, 0.98, 0.01].
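If you want to see the arithmetic behind that example, here's a minimal sketch (assuming TensorFlow 2.x and NumPy are available). The smoothing factor `alpha = 0.03` is an assumption chosen to reproduce the vector above; the `label_smoothing` argument of Keras' `CategoricalCrossentropy` applies the same transformation for us inside the loss:

```python
import numpy as np
import tensorflow as tf

# One-hot label for a 3-class problem.
one_hot = np.array([[0.0, 1.0, 0.0]], dtype="float32")

# Manual label smoothing: take a small amount of probability mass (alpha)
# away from the true class and spread it uniformly across all classes.
alpha = 0.03  # assumed smoothing factor for this illustration
num_classes = one_hot.shape[-1]
smoothed = one_hot * (1.0 - alpha) + alpha / num_classes
print(smoothed)  # [[0.01 0.98 0.01]]

# In practice, we can let Keras smooth the labels for us inside the loss:
loss_fn = tf.keras.losses.CategoricalCrossentropy(label_smoothing=0.03)
```

The upshot is that the network is never pushed to produce probabilities of exactly 0 or 1, which discourages overconfident predictions and tends to improve generalization.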
In this recipe, we'll learn how to use label smoothing. Keep reading!
Getting ready
Install Pillow, which...