In previous chapters on object detection, we learned to leverage anchor boxes/region proposals to perform object classification and detection. However, this involved a multi-step pipeline to arrive at the detections. DETR (DEtection TRansformer) is a technique that leverages transformers to build an end-to-end pipeline, simplifying the object detection network architecture considerably. Transformers are among the most popular recent techniques for performing various tasks in NLP. In this section, we will learn about the working details of transformers and DETR, and then code DETR up to perform our task of detecting trucks versus buses.
The working details of transformers
Transformers have proven to be a remarkable architecture for sequence-to-sequence problems. As of the time of writing this book, almost all state-of-the-art implementations for NLP tasks come from transformers. This class of networks uses only linear layers and softmax to compute self-attention ...
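To make the preceding point concrete, here is a minimal sketch of single-head self-attention built from nothing but linear layers and a softmax. This is an illustrative toy, not the book's code: the class name, the layer names (to_q, to_k, to_v), and the embedding size of 64 are all assumptions chosen for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    # Hypothetical single-head self-attention module for illustration
    def __init__(self, embed_dim=64):
        super().__init__()
        # Three linear layers project each token into query, key, and value vectors
        self.to_q = nn.Linear(embed_dim, embed_dim)
        self.to_k = nn.Linear(embed_dim, embed_dim)
        self.to_v = nn.Linear(embed_dim, embed_dim)
        self.scale = embed_dim ** -0.5  # scaling factor for dot products

    def forward(self, x):
        # x has shape (batch, sequence_length, embed_dim)
        q, k, v = self.to_q(x), self.to_k(x), self.to_v(x)
        # Softmax over scaled dot products gives attention weights,
        # i.e. how much each token attends to every other token
        attn = F.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        # Each output token is a weighted sum of the value vectors
        return attn @ v

x = torch.randn(2, 10, 64)   # a batch of 2 sequences, 10 tokens each
out = SelfAttention()(x)
print(out.shape)             # torch.Size([2, 10, 64])
```

Note that the output has the same shape as the input, which is what lets transformers stack many such layers on top of one another.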