Summary
In this chapter, we discussed the two most popular data parallel training paradigms: parameter server and All-Reduce. You should now understand the parameter server architecture, how it is implemented, and its shortcomings. You should also understand the All-Reduce architecture and the broader family of collective communication operations to which it belongs.
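As a quick refresher before moving on, the following is a minimal sketch of the All-Reduce primitive using PyTorch's torch.distributed package. It assumes a CPU-only machine, the gloo backend, and two local processes; the address, port, and tensor values are illustrative choices, not fixed requirements.

```python
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp


def worker(rank: int, world_size: int):
    # Each process joins the same group; "gloo" works on CPU-only machines.
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    # Stand-in for a local gradient: each worker holds a different tensor.
    grad = torch.ones(3) * (rank + 1)

    # All-Reduce sums the tensors across all workers, in place; dividing by
    # world_size yields the averaged gradient that every worker would apply
    # in data parallel training.
    dist.all_reduce(grad, op=dist.ReduceOp.SUM)
    grad /= world_size

    print(f"rank {rank}: averaged gradient = {grad.tolist()}")
    dist.destroy_process_group()


if __name__ == "__main__":
    world_size = 2
    mp.spawn(worker, args=(world_size,), nprocs=world_size, join=True)
```

Unlike the parameter server paradigm, note that there is no central node here: every worker ends up holding the identical averaged result, which is what makes All-Reduce a symmetric, peer-to-peer collective.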
In the next chapter, we will focus on implementing the whole model training and serving pipeline using data parallelism.