In the viral infection example from the previous section, your quarantine capacity may be limited to, say, 500 patients. In that case, you would want as many positive cases as possible to be among the top 500 patients when ranked by predicted probability. In other words, we do not care much about the model's overall precision; we care only about its precision for the top k samples.
We can calculate the precision for the top k samples using the following code:
def precision_at_k_score(y_true, y_pred_proba, k=1000, pos_label=1):
    topk = [
        y_true_ == pos_label
        for y_true_, y_pred_proba_
        in sorted(
            zip(y_true, y_pred_proba),
            key=lambda y: y[1],
            reverse=True
        )[:k]
    ]
    return sum(topk) / len(topk)
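To see the function in action, here is a minimal sketch on a toy dataset. The eight labels and probabilities are made up purely for illustration, and the function is repeated in a condensed form so that the snippet runs on its own. The variable names p_at_3 and base_rate are assumptions of this sketch, not part of the original example.

```python
def precision_at_k_score(y_true, y_pred_proba, k=1000, pos_label=1):
    # Same logic as the function above: rank by predicted probability,
    # keep the k highest-scoring samples, and measure the share of positives.
    ranked = sorted(
        zip(y_true, y_pred_proba),
        key=lambda pair: pair[1],
        reverse=True,
    )
    topk = [label == pos_label for label, _ in ranked[:k]]
    return sum(topk) / len(topk)


# Hypothetical data: true labels and predicted probabilities for 8 patients.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred_proba = [0.95, 0.90, 0.80, 0.40, 0.35, 0.30, 0.20, 0.10]

# Precision among the 3 patients the model is most confident about.
p_at_3 = precision_at_k_score(y_true, y_pred_proba, k=3)

# With k equal to the dataset size, the score is just the share of positives.
base_rate = precision_at_k_score(y_true, y_pred_proba, k=len(y_true))

print(f"precision@3 = {p_at_3:.3f}")
print(f"share of positives = {base_rate:.3f}")
```

Two of the three highest-ranked patients are positive, so the precision at k=3 is higher than the overall share of positives in the data, which is what we want from a ranking used to fill a limited number of quarantine places. Note that the function raises a ZeroDivisionError if k is 0 or the inputs are empty.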
If you are not a big fan of the functional programming paradigm, then let me explain the code to you in detail. The zip() function combines the two lists and...