GPU, a vectorial and parallel architecture
GPUs provide incredible processing power in certain situations. If you have ever tried to write a software rasterizer for your CPU, you will have noticed that its performance is terrible. Even the most advanced software rasterizers, which take advantage of vectorial instruction sets such as SSE3 or make intensive use of all available cores through multithreading, offer very poor performance compared with a GPU. CPUs are simply not meant for pixels.
So, why are GPUs so fast at processing fragments, pixels, and vertices compared to a CPU? The answer lies in the scalar nature of the CPU: it always processes one instruction after another. A GPU, on the other hand, processes hundreds of instructions simultaneously. A CPU has a few (or only one) big, multipurpose cores, each of which can execute a single shader instance at a time, whereas a GPU has dozens or hundreds of small, very specialized cores that execute many shader instances in parallel.
Another great advantage of the GPU over the CPU is that all its native types are vectorial. Imagine a typical CPU structure for a vector of floats:
struct Vector3 { float x, y, z; };
Now suppose that you want to calculate the cross product of two vectors:
Vector3 b = {1, 2, 3};
Vector3 c = {1, 1, 1};
Vector3 a; // a = cross(b, c), computed component by component
a.x = (b.y * c.z) - (b.z * c.y);
a.y = (b.z * c.x) - (b.x * c.z);
a.z = (b.x * c.y) - (b.y * c.x);
As you can see, this simple operation took six multiplications, three subtractions, and three assignments, because the CPU has to handle every component with scalar instructions; whereas in a GPU, vectorial types are native. A vec3 type is as fundamental to a GPU as a float or an int is to a CPU, and the operations on those native types are native too.
vec3 b = vec3(1, 2, 3);
vec3 c = vec3(1, 1, 1);
vec3 a = cross(b, c);
And that is all. The cross product is performed in a single, atomic operation. This is a pretty simple example, but now think of the number of operations of this kind that are needed to process vertices and fragments every second, and how a CPU would handle that. A 4 x 4 matrix multiplication, for instance, involves 64 multiplications and 48 additions when done with scalar instructions, while on a GPU it is a single operation.
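For instance, here is a minimal vertex shader sketch (the uniform and attribute names are illustrative, not taken from the text) showing how matrix and vector products are written as single expressions:

#version 330 core

uniform mat4 modelMatrix;    // illustrative name
uniform mat4 viewProjection; // illustrative name
in vec3 position;

void main()
{
    // Each product below is a single expression; the per-component
    // multiplications and additions are handled by the hardware.
    mat4 modelViewProjection = viewProjection * modelMatrix;
    gl_Position = modelViewProjection * vec4(position, 1.0);
}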
A GPU offers many other built-in operations (either directly native or built upon native operations) for its native types: addition, subtraction, dot products, inner and outer products, and geometric, trigonometric, and exponential functions. All of these built-in operations are mapped (totally or partially) onto the graphics hardware, and therefore cost only a small fraction of their CPU equivalents.
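As a hedged illustration (the function and parameter names here are hypothetical), a typical lighting term uses several of these built-ins, each of them a single call rather than a loop over components:

float phongTerm(vec3 normal, vec3 lightDir, vec3 viewDir)
{
    vec3 n = normalize(normal);                             // geometric built-in
    float diffuse = max(dot(n, lightDir), 0.0);             // dot product built-in
    vec3 r = reflect(-lightDir, n);                         // geometric built-in
    float specular = pow(max(dot(r, viewDir), 0.0), 32.0);  // exponential built-in
    return diffuse + specular;
}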
All shader computations rely heavily on linear algebra, mostly to compute things such as light vectors, surface normals, displacement vectors, reflections and refractions, cube map lookups, and so on. All of these computations, and many more, are vector-based, so it is easy to see why a GPU has a great advantage over a CPU at these tasks.
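For example, the following fragment shader sketch, assuming hypothetical uniform and input names, computes a refraction vector and uses it to sample a cube map, each step being a single built-in operation:

#version 330 core

uniform samplerCube environmentMap; // illustrative name
in vec3 fragNormal;
in vec3 viewDir;
out vec4 fragColor;

void main()
{
    vec3 n = normalize(fragNormal);
    // refract() and the cube map lookup are single built-in operations
    vec3 refracted = refract(normalize(viewDir), n, 1.0 / 1.33); // air-to-water ratio
    fragColor = texture(environmentMap, refracted);
}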
The following are the reasons why GPUs are faster than CPUs for vectorial calculations and graphics computations:
Many shader instances can be executed at the same time
Inside a shader, many scalar operations can be executed as a single block (a vectorial instruction)