On-device memory bottlenecks
Nowadays, host (CPU) memory often measures tens or even hundreds of gigabytes. Compared to this large pool of host memory, GPU memory is quite limited in size. The following table shows the memory sizes of some commonly used GPUs:

GPU model            | GPU memory size
NVIDIA Tesla K80     | 12 GB (per GPU)
NVIDIA RTX 2080      | 8 GB
NVIDIA RTX 2080 Ti   | 11 GB
NVIDIA Tesla V100    | 16/32 GB
NVIDIA A100          | 40 GB
As shown in the preceding table, even a state-of-the-art GPU such as the A100 offers only 40 GB of memory. More affordable and widely used GPUs, such as the NVIDIA RTX 2080 or the Tesla K80, provide only around 10 GB of GPU memory.
When conducting DNN training, the intermediate results that are generated (for example, feature maps) are often orders of magnitude larger than the original input data, which makes the GPU memory limitation even more pronounced.
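To make this concrete, the following back-of-the-envelope calculation estimates the activation (feature map) memory of a small, hypothetical convolutional network. All layer shapes here are illustrative assumptions, and float32 (4-byte) activations are assumed:

```python
# Estimate activation (feature map) memory for a toy CNN, assuming
# float32 activations. Layer shapes below are illustrative only.

def feature_map_bytes(batch, channels, height, width, bytes_per_elem=4):
    """Memory needed to hold one layer's output activations."""
    return batch * channels * height * width * bytes_per_elem

# Input: a batch of 64 RGB images, each 224 x 224 pixels.
batch = 64
input_bytes = feature_map_bytes(batch, 3, 224, 224)

# Hypothetical conv layer outputs: (channels, height, width).
layers = [(64, 224, 224), (128, 112, 112), (256, 56, 56), (512, 28, 28)]
activation_bytes = sum(feature_map_bytes(batch, c, h, w) for c, h, w in layers)

print(f"input batch : {input_bytes / 2**20:.1f} MiB")
print(f"activations : {activation_bytes / 2**20:.1f} MiB")
print(f"ratio       : {activation_bytes / input_bytes:.1f}x")  # 40.0x
```

Even for this four-layer toy network, the activations that must be kept alive for the backward pass take roughly 40 times more memory than the input batch itself, which is why activation memory, not the raw data, usually dominates the GPU footprint.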
There are two main ways to reduce the memory footprint on accelerators: recomputation and quantization. Let's take a look at each.
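Before diving into the details, a minimal sketch can show the intuition behind quantization: storing the same number of values at lower precision (here, symmetric int8 instead of float32) cuts memory by 4x at the cost of a small rounding error. This is a toy CPU-side illustration using Python's standard library, not any framework's actual quantization API:

```python
# Toy illustration of symmetric linear quantization: x ~ q * scale,
# with q stored as a signed 8-bit integer in [-127, 127].
from array import array

activations = [i / 317.0 for i in range(-500, 500)]  # toy float values

scale = max(abs(v) for v in activations) / 127
quantized = array('b', [round(v / scale) for v in activations])  # 1 byte each
fp32 = array('f', activations)                                   # 4 bytes each

print(f"fp32 storage: {fp32.itemsize * len(fp32)} bytes")
print(f"int8 storage: {quantized.itemsize * len(quantized)} bytes")  # 4x smaller

# Dequantize and measure the worst-case rounding error introduced.
max_err = max(abs(q * scale - v) for q, v in zip(quantized, activations))
print(f"max reconstruction error: {max_err:.4f}")
```

The reconstruction error is bounded by half the quantization step (scale / 2), which is the trade-off quantization makes: a 4x smaller footprint in exchange for bounded precision loss.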