State-of-the-art hardware
Training giant NLP models demands enormous computation power, so we usually run the training on state-of-the-art hardware accelerators. In the following sections, we will look at some of the most capable GPUs and hardware links (GPU interconnects) from NVIDIA.
P100, V100, and DGX-1
The Tesla P100 (Pascal architecture) and Tesla V100 (Volta architecture) are among the most powerful data center GPUs launched by NVIDIA. Each P100/V100 GPU offers roughly the following:
- 5–8 teraflops of double-precision computation power
- 16 GB of on-device memory
- 700 GB/s or more of high-bandwidth memory I/O
- NVLink support for fast GPU-to-GPU communication
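To get a feel for how these figures relate to each other, the following minimal sketch computes two back-of-the-envelope quantities from the listed numbers: how long it takes to read the whole on-device memory once, and how many double-precision operations the GPU can execute in the time it takes to move one byte. The function and constant names are hypothetical, and the inputs are the approximate figures from the list, not measured values.

```python
# Approximate per-GPU figures taken from the list above (not measurements).
FP64_TFLOPS_RANGE = (5.0, 8.0)   # double-precision teraflops, low and high end
MEMORY_GB = 16.0                 # on-device memory
BANDWIDTH_GB_PER_S = 700.0       # high-bandwidth memory I/O


def full_memory_sweep_ms(memory_gb: float, bandwidth_gb_per_s: float) -> float:
    """Time in milliseconds to read the whole on-device memory once."""
    return memory_gb / bandwidth_gb_per_s * 1000.0


def flops_per_byte(tflops: float, bandwidth_gb_per_s: float) -> float:
    """Peak FLOPs the GPU can execute in the time it takes to move one byte."""
    return (tflops * 1e12) / (bandwidth_gb_per_s * 1e9)


if __name__ == "__main__":
    sweep = full_memory_sweep_ms(MEMORY_GB, BANDWIDTH_GB_PER_S)
    print(f"One full pass over device memory: {sweep:.1f} ms")
    for tflops in FP64_TFLOPS_RANGE:
        ratio = flops_per_byte(tflops, BANDWIDTH_GB_PER_S)
        print(f"{tflops:.0f} TFLOPS -> {ratio:.1f} FLOPs per byte moved")
```

With these inputs, a full pass over the 16 GB of memory takes about 23 ms, and the GPU can perform roughly 7 to 11 double-precision operations per byte loaded. Workloads that do less arithmetic per byte than this are limited by memory bandwidth rather than by compute.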
As these specifications show, each P100/V100 GPU delivers a huge amount of computation power. An even more powerful machine packs eight P100/V100 GPUs into a single box, which is called the DGX-1.
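As a rough illustration of what eight GPUs in one box add up to, the sketch below multiplies the per-GPU figures by the GPU count. The `GpuSpec` class and `aggregate` function are hypothetical names introduced here, and the result is a theoretical peak that assumes perfect scaling across GPUs, which real training jobs do not reach.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuSpec:
    """Approximate per-device figures (hypothetical container for this sketch)."""
    fp64_tflops_low: float
    fp64_tflops_high: float
    memory_gb: float


def aggregate(spec: GpuSpec, num_gpus: int) -> GpuSpec:
    """Scale per-GPU figures linearly to a multi-GPU box (theoretical peak only)."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return GpuSpec(
        fp64_tflops_low=spec.fp64_tflops_low * num_gpus,
        fp64_tflops_high=spec.fp64_tflops_high * num_gpus,
        memory_gb=spec.memory_gb * num_gpus,
    )


# Per-GPU figures for a P100/V100, as listed earlier in this section.
P100_V100 = GpuSpec(fp64_tflops_low=5.0, fp64_tflops_high=8.0, memory_gb=16.0)

if __name__ == "__main__":
    box = aggregate(P100_V100, num_gpus=8)
    print(f"FP64 peak: {box.fp64_tflops_low:.0f}-{box.fp64_tflops_high:.0f} TFLOPS")
    print(f"Total GPU memory: {box.memory_gb:.0f} GB")
```

For eight GPUs, this gives a combined peak of 40 to 64 double-precision teraflops and 128 GB of GPU memory. The memory is still split into eight separate 16 GB pools, so a model or batch that spans GPUs has to move data between them.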
The DGX-1 is designed for high-performance computing. With eight P100/V100 GPUs embedded inside a single box, the cross-GPU network bandwidth becomes the main...