Issues with the annotation process
As we have seen so far, annotations are critical to both training and testing. Any mislabeled data, biased annotations, or shortage of annotated data will drastically degrade both the learning and the evaluation of your ML model. At the same time, the annotation process itself is time-consuming, expensive, and error-prone, as we will see in this section.
The annotation process is expensive
To train state-of-the-art computer vision or natural language processing (NLP) models, you need large-scale training data. For example, BERT (https://arxiv.org/abs/1810.04805) was trained on BooksCorpus (800 million words) and Wikipedia (2,500 million words). Similarly, ViT (https://arxiv.org/abs/2010.11929) was trained on ImageNet (14 million images) and JFT (303 million images). Annotating datasets of this scale is extremely challenging, as well as time-consuming and expensive. It should be noted that the time required to annotate...
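To get an intuition for the scale of the problem, a back-of-the-envelope estimate can be helpful. The sketch below is purely illustrative: the per-item annotation time, hourly rate, and team size are all assumptions, not figures from any real annotation project.

```python
def annotation_estimate(num_items, seconds_per_item, hourly_rate_usd, annotators):
    """Rough estimate of total annotation effort and cost.

    All inputs are assumptions supplied by the caller; real projects
    also incur overhead for training, review, and re-labeling.
    """
    total_hours = num_items * seconds_per_item / 3600
    total_cost = total_hours * hourly_rate_usd
    hours_per_annotator = total_hours / annotators
    return total_hours, total_cost, hours_per_annotator

# Hypothetical scenario: 14 million images (ImageNet scale),
# 2 seconds per label, $15/hour, a team of 100 annotators.
hours, cost, per_annotator = annotation_estimate(
    num_items=14_000_000,
    seconds_per_item=2,
    hourly_rate_usd=15,
    annotators=100,
)
print(f"{hours:,.0f} total hours, ${cost:,.0f}, "
      f"{per_annotator:,.0f} hours per annotator")
```

Even under these optimistic assumptions (a single fast pass, no quality review), labeling an ImageNet-sized dataset amounts to thousands of person-hours, which helps explain why annotation budgets dominate many ML projects.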