A basic tenet of science is measurement, and the science of machine learning is no exception. We need to be able to measure, or evaluate, how well our models are performing, so we can continue to improve them, compare one model to another, and detect when our models are behaving poorly.
There's only one problem. How do we evaluate how our models are performing? Should we measure how fast they can be trained or make inferences? Should we measure how many times they get the right answer? How do we know what the right answer is? Should we measure how far we deviated from the observed values? How do we measure that distance?
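To make two of these questions concrete, here is a minimal sketch in plain Python. It assumes we already have the true labels or observed values on hand, and the toy data and function names are invented for illustration. Accuracy counts how often we get the right answer, while mean absolute error and mean squared error are two different ways of measuring how far predictions fall from observed values.

```python
# Hypothetical toy data, invented purely for illustration.
true_labels = ["cat", "dog", "dog", "cat"]
predicted_labels = ["cat", "dog", "cat", "cat"]

observed_values = [3.0, 5.0, 2.0, 8.0]
predicted_values = [2.0, 5.0, 4.0, 7.0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true label."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average absolute distance between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average squared distance; penalizes large misses more heavily."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


print(accuracy(true_labels, predicted_labels))
print(mean_absolute_error(observed_values, predicted_values))
print(mean_squared_error(observed_values, predicted_values))
```

Notice that the two distance measures disagree about the same set of predictions, because squaring weights the single largest miss more heavily. Which one is appropriate is exactly the kind of decision the questions above point to.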
As you can see, evaluating a model involves many decisions, and the right choices depend on the context. In some cases, efficiency definitely matters, but every machine learning context requires us to measure how our predictions...