As stated, to measure how fast a model performs, we want to compute the time it takes to process a single image. However, to minimize measurement error, we will actually measure the processing time over several images and then divide the total time by the number of images.
We are not measuring the computing time over a single image for several reasons. First, a single measurement is noisy: when running inference for the first time, the machine could be busy, the GPU might not be initialized yet, or many other technicalities could cause a slowdown. Running the inference several times allows us to average out this noise.
The second reason is TensorFlow's and CUDA's warmup. When running an operation for the first time, deep learning frameworks are usually slower—they have to initialize variables, allocate memory, move data around, and so on. Moreover, when an operation is run repeatedly, they usually optimize for it automatically.
For all those reasons, we run a few warmup iterations first, then time the inference over many images and report the average per-image time.
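The following is a minimal sketch of such a benchmark, assuming a Keras model running in TensorFlow 2's eager mode; the MobileNetV2 model, input shape, and run counts are arbitrary choices for illustration:

```python
import time

import tensorflow as tf

NUM_WARMUP_RUNS = 10   # discarded: lets TensorFlow/CUDA warm up
NUM_TIMED_RUNS = 100   # averaged to reduce measurement noise


def measure_inference_time(model, input_shape=(1, 224, 224, 3)):
    """Return the average per-image inference time in seconds."""
    images = tf.random.uniform(input_shape)

    # Warmup runs: the first calls trigger memory allocation and
    # GPU initialization, so we do not time them.
    for _ in range(NUM_WARMUP_RUNS):
        model(images, training=False)

    # Timed runs: calling .numpy() forces the result back to the host,
    # so asynchronous GPU work is included in the measurement.
    start = time.perf_counter()
    for _ in range(NUM_TIMED_RUNS):
        model(images, training=False).numpy()
    total = time.perf_counter() - start

    return total / NUM_TIMED_RUNS


# Example usage with a stock Keras model:
model = tf.keras.applications.MobileNetV2(weights=None)
print(f"Average inference time: {measure_inference_time(model) * 1000:.2f} ms")
```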