Events are objects that exist on the GPU, whose purpose is to act as milestones or progress markers for a stream of operations. Events are generally used to measure time durations on the device side so that operations can be timed precisely; the measurements we have made so far have used host-based Python profilers and standard Python library functions such as time. Events can also be used to give the host a status update on the state of a stream and which operations it has already completed, as well as for explicit stream-based synchronization.
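To make the timing and status roles concrete before we go further, here is a minimal sketch using PyCUDA's `Event` class. It assumes PyCUDA and a CUDA-capable GPU are available; the function name `time_on_device`, the array size, and the arithmetic being timed are illustrative choices of ours, not anything prescribed by the library.

```python
import numpy as np


def time_on_device(n=1_000_000):
    """Time one GPU array operation with events; returns (elapsed_ms, finished_early)."""
    # Imported here because pycuda.autoinit creates a CUDA context as a side effect.
    import pycuda.autoinit  # noqa: F401
    import pycuda.driver as drv
    from pycuda import gpuarray

    x_gpu = gpuarray.to_gpu(np.random.randn(n).astype(np.float32))

    # Warm-up run so that one-time kernel compilation is not included in the timing.
    _ = 2 * x_gpu + 1

    start = drv.Event()
    end = drv.Event()

    start.record()       # marker placed in the default stream before the work
    y_gpu = 2 * x_gpu + 1
    end.record()         # marker placed in the default stream after the work

    # Status update: True only if the GPU has already reached the end marker.
    finished_early = end.query()

    # Block the host until the end marker has been reached.
    end.synchronize()

    # Elapsed time between the two markers, in milliseconds, measured on the device.
    elapsed_ms = start.time_till(end)
    return elapsed_ms, finished_early
```

Calling `time_on_device()` returns the device-side duration in milliseconds together with a flag telling us whether the work had already finished at the moment the host asked. Since no stream is passed to `record`, both events are recorded in the default stream.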
Let's start with an example that uses no explicit streams and uses events to measure a single kernel launch. (If we don't explicitly use streams in our code, CUDA invisibly defines a default stream into which all operations are placed.)
Here, we will use the same useless...