Historically, object detection relied on a classical computer vision technique: image descriptors. To detect an object such as a bike, you would start with several pictures of it. Descriptors, each representing a specific part of the bike, would be extracted from those pictures. To find the object, the algorithm would then look for the same descriptors in the target images.
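The matching step described above can be sketched in a few lines. This is only an illustration under stated assumptions: descriptors are modeled as small binary codes stored as Python integers and compared by Hamming distance, which is one common choice among several. The function names and the `max_distance` threshold are hypothetical and do not come from any particular library.

```python
from typing import List, Tuple


def hamming(a: int, b: int) -> int:
    """Number of bits that differ between two binary descriptors."""
    return bin(a ^ b).count("1")


def match_descriptors(
    reference: List[int],
    target: List[int],
    max_distance: int = 2,  # hypothetical threshold, chosen for illustration
) -> List[Tuple[int, int]]:
    """Pair each reference descriptor with its closest target descriptor.

    Returns (reference_index, target_index) pairs whose distance is at
    most max_distance. Reference descriptors with no close match are skipped.
    """
    matches = []
    if not target:
        return matches
    for i, ref in enumerate(reference):
        # Nearest neighbor among the target descriptors.
        j = min(range(len(target)), key=lambda k: hamming(ref, target[k]))
        if hamming(ref, target[j]) <= max_distance:
            matches.append((i, j))
    return matches


# Toy descriptors: two from pictures of the object, three from a target image.
reference = [0b10110010, 0b01001101]
target = [0b11110000, 0b10110011, 0b01001111]
matches = match_descriptors(reference, target)
```

Real descriptors are much longer (hundreds of bits or floating-point vectors) and are extracted by a dedicated algorithm, but the nearest-neighbor idea stays the same.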
To locate the bike in the image, the most commonly used technique was the floating window (more commonly called a sliding window). Small rectangular areas of the image would be examined one after the other, and the area with the most matching descriptors would be considered to contain the object. Over time, many variations of this approach were used.
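As a rough sketch of this window search, assume the matching step has already produced the (x, y) positions of the matched descriptors in the target image. The code below slides a fixed-size window over the image and keeps the position that contains the most matches. The names `count_matches` and `best_window`, the fixed window size, and the `stride` parameter are assumptions made for illustration, not part of a specific historical implementation.

```python
from typing import List, Optional, Tuple

Point = Tuple[int, int]             # (x, y) position of a matched descriptor
Window = Tuple[int, int, int, int]  # (x, y, width, height)


def count_matches(points: List[Point], window: Window) -> int:
    """Count the matched descriptors that fall inside the window."""
    x0, y0, w, h = window
    return sum(1 for (x, y) in points if x0 <= x < x0 + w and y0 <= y < y0 + h)


def best_window(
    points: List[Point],
    image_size: Tuple[int, int],   # (width, height) of the target image
    window_size: Tuple[int, int],  # (width, height) of the window
    stride: int = 8,               # step between windows (assumed value)
) -> Tuple[Optional[Window], int]:
    """Examine windows one after the other and keep the one with most matches."""
    img_w, img_h = image_size
    win_w, win_h = window_size
    best, best_count = None, -1
    for y in range(0, img_h - win_h + 1, stride):
        for x in range(0, img_w - win_w + 1, stride):
            window = (x, y, win_w, win_h)
            count = count_matches(points, window)
            if count > best_count:  # ties keep the first window found
                best, best_count = window, count
    return best, best_count


# Toy input: positions of matched descriptors in a 128x128 image.
matched_points = [(12, 10), (60, 62), (70, 66), (75, 80), (90, 70)]
window, count = best_window(matched_points, (128, 128), (32, 32), stride=16)
```

A single fixed window size only finds objects of roughly that size. Repeating the search at several window sizes or image scales is one of the variations that addresses this.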
This technique had a few advantages: it was robust to rotation and color changes, it did not require a lot of training data, and it worked with most objects. However, its accuracy was not satisfactory.
While neural networks...