Understanding the DEtection TRansformer
DEtection TRansformer (DETR; End-to-End Object Detection with Transformers, https://arxiv.org/abs/2005.12872) introduces a transformer-based object detection algorithm.
A quick recap of the YOLO object detection algorithm
We first introduced YOLO in Chapter 5. It has three main components. The first is the backbone – that is, a CNN model that extracts features from the input image. Next is the neck – an intermediate part of the model that connects the backbone to the head. Finally, the head outputs the detected objects using a multi-step algorithm. More specifically, it splits the image into a grid of cells. Each cell is associated with several pre-defined anchor boxes of different shapes. For each anchor box, the model predicts whether it contains an object and the coordinates of that object’s bounding box. Many of the predicted boxes will overlap and describe the same object. The model filters the overlapping boxes with the help...