Most of the natural images that we come across in everyday life don't consist of a single object covering the whole image. Often, it's a mixture of different objects located at different positions. In such cases, simple object recognition is not going to work. Hence, detecting various objects that are present in an image along with their position becomes challenging. This is where deep learning shines!
So, object detection can be broken down into two parts:
- Object localization: Determining the x, y co-ordinates of an object in the image
- Object recognition: Determining whether the location has an object or not and, if so, what object it is
Thus, object detection networks have two separate sub-networks to perform these two tasks. The first network generates different regions of interest in the image while the second network classifies...