Region-based CNN (R-CNN), which the original paper describes as Regions with CNN features, was introduced by Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik in a paper titled "Rich feature hierarchies for accurate object detection and semantic segmentation". It is a simple and scalable object detection algorithm that improves the mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012, reaching a mAP of 53.3%. The paper can be read at https://arxiv.org/abs/1311.2524.
VOC stands for Visual Object Classes (http://host.robots.ox.ac.uk/pascal/VOC) and PASCAL stands for Pattern Analysis, Statistical Modelling and Computational Learning. The PASCAL VOC project ran object-class recognition challenges from 2005 to 2012. The PASCAL VOC annotation format, which stores the labels for each image in an .xml file, is widely used in object detection.
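To make the annotation format concrete, here is a minimal sketch of reading a PASCAL VOC style .xml annotation with Python's standard library. The sample XML is made up for illustration (the file name, classes, and coordinates are not from any real dataset), and the helper name parse_voc_annotation is hypothetical. It assumes the usual VOC layout of a size element plus one object element per labeled box, each with a name and a bndbox.

```python
import xml.etree.ElementTree as ET

# Illustrative annotation only: values are invented, not taken from the VOC dataset.
SAMPLE_XML = """<annotation>
  <filename>example.jpg</filename>
  <size><width>500</width><height>375</height><depth>3</depth></size>
  <object>
    <name>dog</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
  <object>
    <name>person</name>
    <bndbox><xmin>8</xmin><ymin>12</ymin><xmax>352</xmax><ymax>370</ymax></bndbox>
  </object>
</annotation>"""


def parse_voc_annotation(xml_text):
    """Return the file name, image size, and labeled boxes from a VOC-style XML string."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    objects = []
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        # Coordinates are pixel positions: (xmin, ymin, xmax, ymax).
        box = tuple(
            int(float(bndbox.findtext(tag)))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        objects.append((obj.findtext("name"), box))
    return {
        "filename": root.findtext("filename"),
        "width": int(size.findtext("width")),
        "height": int(size.findtext("height")),
        "objects": objects,
    }


annotation = parse_voc_annotation(SAMPLE_XML)
for label, box in annotation["objects"]:
    print(label, box)
```

For an annotation stored on disk, ET.parse(path).getroot() could be used in place of ET.fromstring; the rest of the function would stay the same.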
The entire object detection model is broken down into image segmentation, selective search-based region proposal, feature...