Model Selection by Multiple Disagreeing Metrics
What happens if the metrics do not agree on the ranking of our models? In the last chapter, on classification, we learned about the precision and recall metrics, which we "merged" into the F1 score, because it is easier to compare models on one metric than two. But what if we did not want to (or couldn't) merge two or more metrics into one (possibly arbitrary) metric?
Pareto Dominance
If a model is better than another model on at least one metric, and at least as good on all other metrics, it should be considered better overall. We say that the first model dominates the second.
If we remove all the models that are dominated by other models, we will have the nondominated models left. This set of models is referred to as the Pareto set (or the Pareto front). We will see in a moment why Pareto front is a fitting name.
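The two definitions above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the model names and scores are made up, the helper names `dominates` and `pareto_set` are our own, and we assume that higher is better on every metric.

```python
from typing import Dict, List

Scores = Dict[str, float]

# Hypothetical scores, invented for illustration; higher is better on every metric.
models: Dict[str, Scores] = {
    "A": {"precision": 0.90, "recall": 0.60},
    "B": {"precision": 0.70, "recall": 0.85},
    "C": {"precision": 0.65, "recall": 0.80},
    "D": {"precision": 0.90, "recall": 0.55},
}


def dominates(a: Scores, b: Scores) -> bool:
    """Return True if a is at least as good as b on all metrics
    and strictly better on at least one."""
    at_least_as_good = all(a[m] >= b[m] for m in a)
    strictly_better = any(a[m] > b[m] for m in a)
    return at_least_as_good and strictly_better


def pareto_set(candidates: Dict[str, Scores]) -> List[str]:
    """Return the names of all models that no other model dominates."""
    return [
        name
        for name, scores in candidates.items()
        if not any(
            dominates(other, scores)
            for other_name, other in candidates.items()
            if other_name != name
        )
    ]


print(pareto_set(models))  # ['A', 'B']
```

Here C is dominated by B (worse on both metrics) and D is dominated by A (equal precision, lower recall), so only A and B remain. Neither A nor B dominates the other, because each is better on a different metric. The pairwise comparison is quadratic in the number of models, which is fine for the handful of candidates we typically compare.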
Let's say that our Pareto set consists of two models. One has high precision, but low recall. The other...