R-CNN improved object detection accuracy substantially over previous methods, but it was slow because it ran a separate CNN forward pass for every region proposal. Moreover, training was a multistage pipeline: first fine-tuning the CNN on region proposals, then training SVMs for object classification, and finally training bounding box regressors to refine the boxes. Ross Girshick, the lead author of R-CNN, proposed a model called Fast R-CNN that improves detection with a single-stage training method. The following figure shows the architecture of Fast R-CNN:
The steps used in Fast R-CNN are as follows:
- The Fast R-CNN network processes the whole image with several convolutional and max pooling layers to produce a feature map.
- Region proposals are generated by running selective search on the input image, and each proposal is projected onto the feature map.
- For each region...