In this chapter, we covered the architecture of two object detection models. The first, YOLO, is known for its inference speed. We went through its general architecture, how inference works, and the training procedure, including the loss used to train the model. The second, Faster R-CNN, is known for its state-of-the-art accuracy. We analyzed the two stages of the network and how to train them, and described how to use Faster R-CNN through the TensorFlow Object Detection API.
In the next chapter, we will go beyond object detection by learning how to segment images into meaningful parts, as well as how to transform and enhance them.