We will now learn about atomic operations in CUDA. Atomic operations are simple, thread-safe read-modify-write operations on a single global array element or shared memory variable. Performing the same update from many threads without them would normally lead to race conditions.
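To make the race condition concrete, here is a minimal CUDA C sketch that increments one global counter from many threads, first with a plain read-modify-write and then with atomicAdd. The kernel names, the helper count_with, and the launch sizes are assumptions chosen for illustration, and error checking is left out for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Racy version: each thread reads, adds 1, and writes back with no
// synchronization, so threads can overwrite each other's updates.
__global__ void racy_increment(int *counter)
{
    *counter = *counter + 1;
}

// Atomic version: the read-modify-write happens as one indivisible step.
__global__ void atomic_increment(int *counter)
{
    atomicAdd(counter, 1);
}

// Hypothetical helper: zero a device counter, launch one of the kernels
// above, and return the final count to the host.
int count_with(bool use_atomics, int blocks, int threads)
{
    int *d_counter = nullptr;
    int h_counter = 0;

    cudaMalloc(&d_counter, sizeof(int));
    cudaMemset(d_counter, 0, sizeof(int));

    if (use_atomics)
        atomic_increment<<<blocks, threads>>>(d_counter);
    else
        racy_increment<<<blocks, threads>>>(d_counter);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(&h_counter, d_counter, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_counter);
    return h_counter;
}

int main()
{
    const int blocks = 4, threads = 256;

    // The racy result is unpredictable and is usually far below the total.
    printf("racy count:   %d\n", count_with(false, blocks, threads));

    // The atomic result equals the total number of threads launched.
    printf("atomic count: %d (threads launched: %d)\n",
           count_with(true, blocks, threads), blocks * threads);
    return 0;
}
```

With a working CUDA toolkit this should compile with nvcc. Only the atomic count is guaranteed to equal the number of threads launched.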
Let's think of one example. Suppose that we have a kernel in which every thread sets a local variable called x at some point. We then want to find the maximum value over all of these xs and store it in a shared variable that we declare with __shared__ int x_largest. We can do this by calling atomicMax(&x_largest, x) from every thread. Since shared memory is private to each thread block, this yields the maximum over the threads of one block.
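The atomicMax idea described above might look like the following CUDA C sketch. The names x and x_largest come from the text. The kernel name, the helper block_max, the formula used to produce x, and the single-block launch are assumptions made only so the example is self-contained. The sketch also initializes x_largest before any thread updates it, since shared variables start with undefined contents.

```cuda
#include <cstdio>
#include <climits>
#include <cuda_runtime.h>

__global__ void block_max_kernel(int *out)
{
    __shared__ int x_largest;

    // One thread initializes the shared variable before anyone updates it.
    if (threadIdx.x == 0)
        x_largest = INT_MIN;
    __syncthreads();

    // Hypothetical per-thread value; a real kernel would compute x from data.
    int x = (threadIdx.x * 37) % 101;

    // Every thread folds its own x into the shared maximum.
    atomicMax(&x_largest, x);
    __syncthreads();

    // After the barrier, x_largest holds the maximum over the block.
    if (threadIdx.x == 0)
        *out = x_largest;
}

// Hypothetical helper: run a single block and return its maximum to the host.
int block_max(int threads)
{
    int *d_out = nullptr;
    int h_out = 0;

    cudaMalloc(&d_out, sizeof(int));
    block_max_kernel<<<1, threads>>>(d_out);
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;
}

int main()
{
    printf("largest x in the block: %d\n", block_max(128));
    return 0;
}
```

The two __syncthreads() calls matter here: the first ensures that no thread calls atomicMax before x_largest is initialized, and the second ensures that all updates have landed before the result is read.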
Let's look at a brief example of atomic operations. We will write a small program for two experiments:
- Setting a variable to 0 and then adding 1 to it from each thread
- Finding the maximum thread ID value across all threads
Let's start out by setting the...