Sampling with EER
Expected error reduction (EER) focuses on measuring the potential decrease in generalization error, rather than the expected change in the model seen in the previous approach. The goal is to estimate the anticipated future error of a model trained on the current labeled set together with a candidate unlabeled sample, and to query the candidate that minimizes that error. EER can be defined as follows:

$$R(\hat{P}_{\mathcal{D}}) = \mathbb{E}_{x \sim P(x)}\Big[\, L\big(P(y \mid x),\; \hat{P}_{\mathcal{D}}(y \mid x)\big) \Big]$$

Here, $\mathcal{D} = \{(x_i, y_i)\}$ is the pool of paired labeled data, and $\hat{P}_{\mathcal{D}}(y \mid x)$ is the output distribution estimated by the learner trained on $\mathcal{D}$. $L$ is a chosen loss function that measures the error between the true distribution, $P(y \mid x)$, and the learner's prediction, $\hat{P}_{\mathcal{D}}(y \mid x)$.
The strategy selects for querying the instance whose addition is expected to produce the lowest future error (referred to as risk). This focuses active ML on reducing long-term generalization error rather than just immediate training performance.
In other words, EER selects the unlabeled data points that, once queried and learned from, are expected to most reduce the model's errors on new data points drawn from the same distribution.
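The selection loop described above can be sketched in code. The snippet below is a minimal illustration, not the definitive implementation: it assumes a scikit-learn-style classifier, uses a 0/1 loss (one minus the top predicted probability) as the error estimate over the remaining pool, and weights each hypothetical label by the current model's belief. The function name `expected_error_reduction` and the synthetic data setup are choices made for this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

def expected_error_reduction(make_model, X_lab, y_lab, X_pool):
    """Score each unlabeled candidate by the expected future error (risk)
    the model would have after hypothetically adding that candidate."""
    model = make_model().fit(X_lab, y_lab)
    probs = model.predict_proba(X_pool)  # current belief P̂(y|x) per candidate
    classes = model.classes_
    risks = np.zeros(len(X_pool))
    for i, x in enumerate(X_pool):
        risk = 0.0
        for j, y in enumerate(classes):
            # Retrain as if the oracle returned label y for candidate x
            X_aug = np.vstack([X_lab, x[None, :]])
            y_aug = np.append(y_lab, y)
            m = make_model().fit(X_aug, y_aug)
            # Estimated 0/1 error over the rest of the pool: 1 - max_y P̂(y|x_u)
            rest = np.delete(X_pool, i, axis=0)
            err = np.mean(1.0 - m.predict_proba(rest).max(axis=1))
            risk += probs[i, j] * err  # weight by current belief in label y
        risks[i] = risk
    return risks

# Demo on synthetic data; ensure both classes appear in the labeled set
X, y = make_classification(n_samples=60, n_features=5, random_state=0)
lab_idx = np.concatenate([np.where(y == 0)[0][:5], np.where(y == 1)[0][:5]])
pool_idx = np.setdiff1d(np.arange(20, 40), lab_idx)
X_lab, y_lab = X[lab_idx], y[lab_idx]
X_pool = X[pool_idx]

risks = expected_error_reduction(
    lambda: LogisticRegression(max_iter=1000), X_lab, y_lab, X_pool
)
best = int(np.argmin(risks))  # candidate expected to yield the lowest future error
```

Note the cost: each candidate requires retraining the model once per possible label, which is why EER is usually paired with cheap learners or approximations in practice.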