Weak supervision
Weak supervision is a labeling technique in machine learning that leverages imperfect or noisy sources of supervision, such as heuristics, rules, or probabilistic methods, to assign approximate labels to data instances. Unlike traditional labeling, which relies on manual annotation, weak supervision offers a more scalable and automated approach.
Rather than relying on a single authoritative source of supervision, weak supervision harnesses multiple sources that may introduce noise or inconsistency. The objective is to generate labels that are “weakly” indicative of the true underlying labels, enabling model training in scenarios where obtaining fully labeled data is challenging or expensive.
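As a concrete illustration, several heuristic labeling functions can each cast a noisy vote on an instance, with the votes then combined into an approximate label. The sketch below uses hypothetical function names and a simple majority vote, which is only one possible combination strategy, to weakly label product reviews as positive or negative:

```python
# Minimal sketch of weak supervision with heuristic labeling functions.
# The function names and the majority-vote combination are illustrative
# assumptions, not the API of any particular weak-supervision framework.

POSITIVE, NEGATIVE, ABSTAIN = 1, 0, -1

def lf_positive_words(review: str) -> int:
    """Rule-based heuristic: obvious positive wording suggests a positive label."""
    return POSITIVE if any(w in review.lower() for w in ("great", "excellent", "love")) else ABSTAIN

def lf_negative_words(review: str) -> int:
    """Rule-based heuristic: obvious negative wording suggests a negative label."""
    return NEGATIVE if any(w in review.lower() for w in ("terrible", "awful", "refund")) else ABSTAIN

def lf_exclamation_heavy(review: str) -> int:
    """Weak stylistic cue: repeated exclamation marks loosely correlate with positive reviews."""
    return POSITIVE if review.count("!") >= 2 else ABSTAIN

LABELING_FUNCTIONS = [lf_positive_words, lf_negative_words, lf_exclamation_heavy]

def weak_label(review: str) -> int:
    """Combine the noisy votes of all labeling functions by simple majority."""
    votes = [lf(review) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN          # no heuristic fired; leave the instance unlabeled
    return POSITIVE if votes.count(POSITIVE) >= votes.count(NEGATIVE) else NEGATIVE

reviews = [
    "Great product, I love it!!",
    "Terrible quality, I want a refund.",
    "Arrived on Tuesday.",
]
approximate_labels = [weak_label(r) for r in reviews]
# -> [1, 0, -1]: two weakly labeled instances and one left unlabeled
```

In practice, the simple majority vote is often replaced by a probabilistic model that estimates the accuracy of each noisy source before producing training labels.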
For instance, consider a task where we want to build a machine learning model to identify whether an email is spam or not. Ideally, we would have a large dataset of emails that are...