Understanding query-by-committee approaches
Query-by-committee aims to add diversity by querying points where an ensemble of models disagrees the most. Different models will disagree where the data is most uncertain or ambiguous.
In the query-by-committee approach, a group of models (the committee) is trained on a labeled set of data. Combined, the ensemble can provide more robust and accurate predictions than a single model.
The key step in this approach is identifying the unlabeled data point that causes the most disagreement among the ensemble members. This data point is then queried to obtain a label.
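The selection step can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: it assumes each committee member has already produced a hard label for every unlabeled point, and it measures disagreement with vote entropy, which is one common choice among several. The names `vote_entropy`, `select_query`, and the toy predictions are hypothetical and used only for illustration.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Entropy of the labels that the committee members voted for one point."""
    n = len(votes)
    counts = Counter(votes)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def select_query(committee_predictions):
    """Return the index of the point with maximal disagreement, plus all scores.

    committee_predictions[m][i] is the label that member m predicts for point i.
    """
    n_points = len(committee_predictions[0])
    scores = [
        vote_entropy([member[i] for member in committee_predictions])
        for i in range(n_points)
    ]
    best = max(range(n_points), key=scores.__getitem__)
    return best, scores


# Hypothetical predictions from three committee members on four unlabeled points.
committee_predictions = [
    ["cat", "dog", "cat", "dog"],
    ["cat", "dog", "dog", "dog"],
    ["cat", "cat", "bird", "dog"],
]

query_index, scores = select_query(committee_predictions)
print(query_index)  # index of the point to send for labeling
```

In this toy data, the members agree fully on the first and last points, so those score zero, while the third point receives three different labels and is the one selected for querying.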
This method works well because different models tend to disagree most on difficult and boundary examples, as depicted in Figure 2.2. These are the instances where there is ambiguity or uncertainty, and by labeling these points of maximal disagreement, the ensemble can reach consensus and make more confident predictions: