First released in 2015, YOLO ran much faster than most object detection architectures of the time while remaining competitive in accuracy. Since then, the architecture has been improved several times. In this chapter, we will draw on the following three papers:
- You Only Look Once: Unified, Real-Time Object Detection (2015), Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi
- YOLO9000: Better, Faster, Stronger (2016), Joseph Redmon and Ali Farhadi
- YOLOv3: An Incremental Improvement (2018), Joseph Redmon and Ali Farhadi
For the sake of clarity and simplicity, we will not describe every small detail that allows YOLO to reach its maximum performance. Instead, we will focus on the general architecture of the network. We provide an implementation of YOLO, available in the chapter's repository, so that you can compare the architecture described here with the code.
This implementation has been designed to be easy to read and understand. We invite those readers...