Summary
After reading this chapter, you should be able to identify the real bottleneck in single-node training. You should also understand how data parallelism mitigates this bottleneck, thus increasing overall training throughput. Finally, you should know the main hyperparameters involved in data parallel training.
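As a quick recap of why data parallelism preserves correctness while raising throughput, recall that averaging the gradients computed by each worker on its own data shard recovers the full-batch gradient (for equal-sized shards and a loss averaged over samples). A minimal NumPy sketch, with illustrative names not taken from this chapter:

```python
import numpy as np

def grad(w, X, y):
    # Gradient of 0.5 * mean((X @ w - y)^2) with respect to w.
    return X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))   # full batch of 8 samples
y = rng.normal(size=8)
w = np.zeros(3)

# Split the batch across 4 hypothetical workers (equal shard sizes).
shards = np.split(np.arange(8), 4)
local_grads = [grad(w, X[idx], y[idx]) for idx in shards]

# Averaging the per-worker gradients (what an All-Reduce step computes)
# matches the gradient over the full batch.
avg_grad = np.mean(local_grads, axis=0)
full_grad = grad(w, X, y)
print(np.allclose(avg_grad, full_grad))  # True
```

Because each worker processes only its shard, the per-step compute is divided across workers, which is the source of the throughput gain discussed in this chapter.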
In the next chapter, we will focus on the two major system architectures for data parallel training, namely the parameter server (PS) and All-Reduce paradigms.