4. Loss functions
In SSD, there are thousands of anchor boxes. As discussed earlier in this chapter, the goal of object detection is to predict both the category and offsets of each anchor box. We can use the following loss functions for each prediction:
- Categorical cross-entropy loss for y_cls.
- L1 or L2 loss for y_off. Note that only positive anchor boxes contribute to the offset loss. L1 is also known as mean absolute error (MAE) loss, while L2 is also known as mean squared error (MSE) loss.
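The two loss terms above can be sketched in plain Python. This is a minimal illustration, not the chapter's implementation: the function names, the convention that class index 0 means background (so every other class marks a positive anchor box), and the averaging over anchor boxes are all assumptions made for the example. A real model would use vectorized tensor operations instead of Python loops.

```python
import math

def cross_entropy(y_true, y_pred, eps=1e-7):
    """Categorical cross-entropy for one anchor box (y_true is one-hot)."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

def l1(y_true, y_pred):
    """L1 loss: mean absolute error (MAE) over the offset values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def l2(y_true, y_pred):
    """L2 loss: mean squared error (MSE) over the offset values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def ssd_losses(cls_true, cls_pred, off_true, off_pred, offset_loss=l1):
    """Return (category loss, offset loss) for a list of anchor boxes.

    Assumptions (hypothetical, for illustration only):
    - class index 0 is background, so an anchor box is positive
      when its one-hot label has 0 in position 0;
    - each term is averaged over the anchor boxes that contribute to it.
    """
    n = len(cls_true)
    loss_cls = sum(cross_entropy(t, p) for t, p in zip(cls_true, cls_pred)) / n

    # Only positive anchor boxes contribute to the offset loss.
    positive = [i for i, t in enumerate(cls_true) if t[0] == 0]
    if not positive:
        return loss_cls, 0.0
    loss_off = sum(offset_loss(off_true[i], off_pred[i]) for i in positive)
    return loss_cls, loss_off / len(positive)

# Two anchor boxes: the first is background, the second is positive.
cls_true = [[1, 0, 0], [0, 1, 0]]
cls_pred = [[0.9, 0.05, 0.05], [0.2, 0.7, 0.1]]
off_true = [[0.0, 0.0, 0.0, 0.0], [2.0, -1.0, 3.0, 0.0]]
off_pred = [[5.0, 5.0, 5.0, 5.0], [1.0, -1.0, 1.0, 0.0]]

loss_cls, loss_off_l1 = ssd_losses(cls_true, cls_pred, off_true, off_pred, l1)
_, loss_off_l2 = ssd_losses(cls_true, cls_pred, off_true, off_pred, l2)
print(loss_cls, loss_off_l1, loss_off_l2)
```

In this example the background anchor box has a large offset error, yet it does not affect the offset loss, because only the positive anchor box is counted.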
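The total loss function is the sum of the category loss and the offset loss:

$$\mathcal{L} = \mathcal{L}_{cls} + \mathcal{L}_{off}$$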
For each anchor box, the network predicts the following:
- y_cls, the category or class in the form of a one-hot vector.
- y_off = ((x_omin, y_omin), (x_omax, y_omax)), the offsets in the form of pixel coordinates relative to the anchor box.
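To make the two prediction targets concrete, the sketch below builds them for a single anchor box. The helper names `one_hot` and `corner_offsets` are hypothetical, and the sign convention (ground truth corner minus anchor box corner) is an assumption. The text only states that the offsets are pixel coordinates relative to the anchor box.

```python
def one_hot(class_index, n_classes):
    """Return y_cls: a one-hot vector with 1.0 at class_index."""
    vec = [0.0] * n_classes
    vec[class_index] = 1.0
    return vec

def corner_offsets(anchor, box):
    """Return y_off = ((x_omin, y_omin), (x_omax, y_omax)) in pixels.

    anchor and box are (xmin, ymin, xmax, ymax) tuples.
    Assumed convention: offset = ground truth corner - anchor corner.
    """
    ax_min, ay_min, ax_max, ay_max = anchor
    bx_min, by_min, bx_max, by_max = box
    return ((bx_min - ax_min, by_min - ay_min),
            (bx_max - ax_max, by_max - ay_max))

# Hypothetical anchor box and ground truth box, in pixel coordinates.
anchor = (100, 50, 200, 150)
ground_truth = (110, 45, 215, 160)

y_cls = one_hot(2, 4)                         # class 2 out of 4 classes
y_off = corner_offsets(anchor, ground_truth)  # ((10, -5), (15, 10))
print(y_cls, y_off)
```

Here the ground truth box starts 10 pixels to the right of and 5 pixels above the anchor box's top-left corner, and ends 15 pixels to the right of and 10 pixels below its bottom-right corner.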
For computational convenience, the offsets are better expressed in the form:
SSD is a supervised object detection algorithm...