Challenges with labeling data at scale
Besides the conceptual challenge of agreeing on how to label data, you also need to consider the logistics. SageMaker Ground Truth lets you assign data labeling jobs to a human workforce, but you may still face additional challenges such as the following:
- Unique labeling logic: If your labeling task requires a custom workflow, you need to model that workflow in Ground Truth.
- Annotation quality: The labels applied by workers may be subject to implicit bias that affects the results.
- Cost and time: Labeling data requires people for a period of time. If you have a very large dataset, you'll consume a lot of person-hours.
- Security: Given that your data may be sensitive, you need to make sure that access to the data is restricted to an authorized workforce.
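To make the cost and time concern concrete, here is a rough back-of-the-envelope sketch. Every number in it is a hypothetical assumption chosen for illustration (dataset size, seconds per item, number of workers who label each item, and hourly rate); none of them reflect actual Ground Truth pricing or measured labeling speeds. The function name `estimate_labeling_effort` is also made up for this example.

```python
def estimate_labeling_effort(num_items, seconds_per_item, workers_per_item, hourly_rate):
    """Return (person_hours, cost) for a labeling effort.

    All inputs are illustrative assumptions, not Ground Truth pricing.
    """
    # Each item is labeled by several workers, so the work multiplies.
    total_seconds = num_items * seconds_per_item * workers_per_item
    person_hours = total_seconds / 3600
    return person_hours, person_hours * hourly_rate


# Hypothetical scenario: 1 million items, 10 seconds each,
# 3 workers per item, and an assumed rate of $15 per hour.
hours, cost = estimate_labeling_effort(
    num_items=1_000_000,
    seconds_per_item=10,
    workers_per_item=3,
    hourly_rate=15.0,
)
print(f"{hours:,.0f} person-hours, about ${cost:,.0f}")
```

Under these assumed inputs, the estimate comes to roughly 8,333 person-hours. Having several workers label each item is one common way to address annotation quality, but as the sketch shows, it multiplies the time and cost accordingly.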
Additional information
If you need an introduction to Ground Truth, please review Chapter 2 of Learn Amazon SageMaker, written by Julien Simon.
To put these concerns into focus, let's...