14. Conclusion
In this chapter, the concept of multi-scale single shot object detection was discussed. The ground truth bounding box offsets are computed relative to anchor boxes centered on the centroids of the receptive field patches. Normalizing these offsets, instead of using the raw pixel error, keeps their values within a bounded range that is more suitable for optimization.
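To make the difference between raw and normalized offsets concrete, the following is a minimal sketch in plain Python. It assumes boxes in centroid format `(cx, cy, w, h)` and uses the common SSD-style encoding: center offsets divided by the anchor size, log-ratios for width and height, and division by variances of 0.1 and 0.2. The helper names `compute_offsets` and `decode_offsets` and the variance values are assumptions for illustration, not necessarily the exact settings used in this chapter.

```python
import math

# Assumed SSD-style variances (hypothetical, not confirmed by this chapter).
SIGMA_XY = 0.1
SIGMA_WH = 0.2


def compute_offsets(gt_box, anchor_box, normalize=True):
    """Offsets (dx, dy, dw, dh) of gt_box relative to anchor_box.

    Boxes are (cx, cy, w, h) in pixels. Hypothetical helper.
    """
    gx, gy, gw, gh = gt_box
    ax, ay, aw, ah = anchor_box
    if not normalize:
        # Raw pixel error: unbounded and grows with image/object size.
        return (gx - ax, gy - ay, gw - aw, gh - ah)
    # Center offsets scaled by the anchor size, then by the variance.
    dx = (gx - ax) / aw / SIGMA_XY
    dy = (gy - ay) / ah / SIGMA_XY
    # Log-ratio of sizes keeps width/height offsets in a small range.
    dw = math.log(gw / aw) / SIGMA_WH
    dh = math.log(gh / ah) / SIGMA_WH
    return (dx, dy, dw, dh)


def decode_offsets(offsets, anchor_box):
    """Inverse of the normalized encoding: recover (cx, cy, w, h)."""
    dx, dy, dw, dh = offsets
    ax, ay, aw, ah = anchor_box
    return (ax + dx * SIGMA_XY * aw,
            ay + dy * SIGMA_XY * ah,
            aw * math.exp(dw * SIGMA_WH),
            ah * math.exp(dh * SIGMA_WH))


gt, anchor = (120, 80, 60, 40), (110, 90, 50, 50)
print(compute_offsets(gt, anchor, normalize=False))  # (10, -10, 10, -10)
print(compute_offsets(gt, anchor))  # small, scale-independent values
```

Because the normalized offsets are expressed relative to the anchor size, the same relative misalignment produces the same target value for small and large objects, which is what keeps the regression targets in a bounded range.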
A ground truth class label is assigned to each anchor box. An anchor box that does not overlap any object is assigned the background class, and its offset is excluded from the offset loss computation. Focal loss was proposed as an improvement to the category (classification) loss function, while the default L1 offset loss function can be replaced by a smooth L1 loss function.
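The per-anchor labeling, the masked offset loss, and the two alternative loss functions can be sketched as follows. This is a simplified illustration under stated assumptions: boxes are in corner format `(xmin, ymin, xmax, ymax)`, class `0` is the background, an anchor counts as overlapping an object when its best IoU reaches a hypothetical threshold of 0.5, and focal loss uses a single `alpha` for all classes. All function names are hypothetical and do not reproduce the chapter's own implementation.

```python
import math

BACKGROUND = 0  # assumed index of the background class


def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def assign_labels(anchors, gt_boxes, gt_classes, threshold=0.5):
    """Per-anchor class labels plus an offset mask (1.0 = used in offset loss)."""
    labels, mask = [], []
    for anchor in anchors:
        ious = [iou(anchor, gt) for gt in gt_boxes]
        best = max(range(len(ious)), key=ious.__getitem__) if ious else None
        if best is not None and ious[best] >= threshold:
            labels.append(gt_classes[best])
            mask.append(1.0)
        else:
            labels.append(BACKGROUND)  # no object: background class
            mask.append(0.0)           # offset excluded from the loss
    return labels, mask


def l1(x):
    return abs(x)


def smooth_l1(x, beta=1.0):
    # Quadratic near zero (smooth gradient), linear for large errors.
    return 0.5 * x * x / beta if abs(x) < beta else abs(x) - 0.5 * beta


def offset_loss(pred, target, mask, elementwise=l1):
    """Sum of per-coordinate losses over anchors whose mask is 1.0."""
    total = 0.0
    for p_off, t_off, m in zip(pred, target, mask):
        total += m * sum(elementwise(p - t) for p, t in zip(p_off, t_off))
    return total


def focal_loss(probs, label, alpha=0.25, gamma=2.0):
    """Focal loss for one anchor; gamma=0, alpha=1 gives cross-entropy."""
    p_t = probs[label]
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


anchors = [(0, 0, 10, 10), (50, 50, 60, 60)]
labels, mask = assign_labels(anchors, [(1, 1, 11, 11)], [3])
print(labels, mask)  # [3, 0] [1.0, 0.0]
```

The `(1 - p_t) ** gamma` factor shrinks the loss of anchors that are already classified confidently (most of them background), so training focuses on the hard examples. Smooth L1 behaves like L1 for large errors but has a gradient that goes to zero near the target, which avoids the constant-magnitude gradient of plain L1 around zero.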
Evaluation on the test dataset shows that normalized offsets with the default loss functions give the best average precision and recall, while mIoU improves when offset normalization is removed. The performance can be improved by increasing the number...