As we saw previously in the book, most models require post-processing operations. If implemented using the wrong tools, post-processing can take a lot of time. While most post-processing happens on the CPU, it is sometimes possible to run some operations on the GPU.
Using tracing tools, we can analyze the time taken by post-processing to optimize it. Non-Maximum Suppression (NMS) is an operation that can take a lot of time if not implemented correctly (refer to Chapter 5, Object Detection Models):
Notice in the preceding diagram that the computing time of the slow implementation grows linearly with the number of boxes, while that of the fast implementation stays almost constant. Though four milliseconds may seem quite low, keep in mind that some models can return an even larger number of boxes, resulting in an even longer post-processing time.
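To get a feel for how post-processing time depends on the number of boxes, the following minimal sketch times a naive, pure-Python greedy NMS on randomly generated boxes. It is not the implementation measured in the diagram, and the helper names (`iou`, `naive_nms`, `random_boxes`, `time_nms`) are hypothetical, introduced here for illustration only. The absolute timings depend on the machine and on how much the random boxes overlap.

```python
import random
import time


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def naive_nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes by decreasing score and keep a box only if
    it does not overlap an already kept box by more than the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


def random_boxes(n, seed=0):
    """Generate n random boxes and scores (synthetic data, assumption)."""
    rng = random.Random(seed)
    boxes, scores = [], []
    for _ in range(n):
        x, y = rng.uniform(0, 500), rng.uniform(0, 500)
        w, h = rng.uniform(10, 100), rng.uniform(10, 100)
        boxes.append((x, y, x + w, y + h))
        scores.append(rng.random())
    return boxes, scores


def time_nms(n, repeats=3):
    """Return the best wall-clock time, in milliseconds, over a few runs."""
    boxes, scores = random_boxes(n)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        naive_nms(boxes, scores)
        best = min(best, time.perf_counter() - start)
    return best * 1000.0


if __name__ == "__main__":
    for n in (100, 500, 1000):
        print(f"{n:>5} boxes: {time_nms(n):.2f} ms")
```

In a TensorFlow pipeline, a built-in operation such as `tf.image.non_max_suppression` would normally be preferred over a hand-written loop like this one; the sketch only serves to make the measurement step concrete.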