A comparison of the three YOLO versions is shown in this table:
| | YOLO | YOLO v2 | YOLO v3 |
|---|---|---|---|
| Input size | 224 x 224 | 448 x 448 | |
| Framework | Darknet, trained on ImageNet (1,000 classes). | Darknet-19: 19 convolutional layers and 5 max pool layers. | Darknet-53: 53 convolutional layers. For detection, 53 more layers are added, giving a total of 106 layers. |
| Small-object detection | Struggles to detect small objects. | Better than YOLO at detecting small objects. | Better than YOLO v2 at detecting small objects. |
| | | Uses anchor boxes. | Uses a residual block. |
The following diagram compares the architectures of YOLO v2 and YOLO v3:
The basic convolution layers are similar, but YOLO v3 carries out detection at three separate layers: 82, 94, and 106.
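To make the three detection layers more concrete, here is a minimal sketch of the output grid each one would produce. The layer indices (82, 94, and 106) come from the text above; the strides (32, 16, and 8), the 416 x 416 input, the three anchors per scale, and the 80 classes are assumptions based on the commonly used YOLO v3 configuration, not values stated in this section. The function names are hypothetical and exist only for illustration.

```python
# Layer indices are from the text; the stride assigned to each layer is an
# assumption taken from the commonly used YOLO v3 configuration.
DETECTION_LAYERS = {82: 32, 94: 16, 106: 8}  # layer index -> assumed stride
ANCHORS_PER_SCALE = 3   # assumed: three anchor boxes per detection scale
NUM_CLASSES = 80        # assumed: COCO class count


def detection_grid_shapes(input_size=416, num_classes=NUM_CLASSES):
    """Return {layer: (grid_h, grid_w, channels)} for a square input image."""
    if input_size % 32 != 0:
        raise ValueError("input_size must be a multiple of 32")
    # Each anchor predicts 4 box coordinates, 1 objectness score,
    # and one score per class.
    channels = ANCHORS_PER_SCALE * (5 + num_classes)
    shapes = {}
    for layer, stride in DETECTION_LAYERS.items():
        grid = input_size // stride
        shapes[layer] = (grid, grid, channels)
    return shapes


def total_predicted_boxes(input_size=416):
    """Count the candidate boxes produced across all three detection layers."""
    shapes = detection_grid_shapes(input_size)
    return sum(h * w * ANCHORS_PER_SCALE for h, w, _ in shapes.values())


if __name__ == "__main__":
    for layer, shape in detection_grid_shapes().items():
        print(f"layer {layer}: output shape {shape}")
    print("total boxes:", total_predicted_boxes())
```

Under these assumptions, the coarsest grid (layer 82) has the fewest cells and suits large objects, while the finest grid (layer 106) has the most cells, which is one way to see why YOLO v3 handles small objects better than its predecessors.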
The most critical item that you should take from YOLO v3 is its object detection...