Chapter 17. Distributed Models with TensorFlow Clusters
Previously, we learned how to run TensorFlow models at scale in production using Kubernetes, Docker, and TensorFlow Serving. TensorFlow Serving is not the only way to run TensorFlow models at scale. TensorFlow also provides a mechanism to train models, not just run them, across multiple devices, whether those devices sit on a single node or are spread over several nodes. In this chapter, we shall learn how to distribute TensorFlow models to run on multiple devices across multiple nodes.
In this chapter, we shall cover the following topics:
- Strategies for distributed execution
- TensorFlow clusters
- Data parallel models
- Asynchronous and synchronous updates to distributed models