Each TensorFlow computation is described in terms of a graph. This gives a natural degree of flexibility in how operations are structured and where they are placed: the graph can be partitioned into multiple subgraphs that are assigned to different nodes in a cluster of servers.
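To make this concrete, the following is a minimal sketch using the TensorFlow 1.x distributed API (the host names are hypothetical) of how two pieces of the same graph can be pinned to different tasks of a cluster; TensorFlow inserts the send/receive nodes needed to move tensors between the resulting subgraphs.

```python
import tensorflow as tf

# Hypothetical two-worker cluster; in practice each host runs its own server.
cluster = tf.train.ClusterSpec({
    "worker": ["worker0.example.com:2222", "worker1.example.com:2222"]
})
server = tf.train.Server(cluster, job_name="worker", task_index=0)

# Operations of one graph placed on different worker tasks.
with tf.device("/job:worker/task:0"):
    a = tf.constant(3.0)
    b = tf.constant(4.0)
with tf.device("/job:worker/task:1"):
    c = a * b

with tf.Session(server.target) as sess:
    print(sess.run(c))  # 12.0
```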
I strongly suggest the reader have a look at the paper Large Scale Distributed Deep Networks by Jeffrey Dean, Greg S. Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Quoc V. Le, Mark Z. Mao, Marc'Aurelio Ranzato, Andrew Senior, Paul Tucker, Ke Yang, and Andrew Y. Ng (NIPS, 2012), https://research.google.com/archive/large_deep_networks_nips2012.html
One key result of the paper is to show that it is possible to run distributed stochastic gradient descent (SGD), where multiple nodes work in parallel on separate data shards and push their parameter updates independently and asynchronously.
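As a rough illustration of that idea, here is a minimal sketch of between-graph, asynchronous data-parallel training with the TensorFlow 1.x API. The cluster addresses, the command-line flags, and the next_batch() helper are hypothetical, and the model is a deliberately simple softmax classifier.

```python
import tensorflow as tf

flags = tf.app.flags
flags.DEFINE_string("job_name", "worker", "'ps' or 'worker'")
flags.DEFINE_integer("task_index", 0, "Index of the task within its job")
FLAGS = flags.FLAGS

# Hypothetical cluster: one parameter server and two workers.
cluster = tf.train.ClusterSpec({
    "ps": ["ps0.example.com:2222"],
    "worker": ["worker0.example.com:2222", "worker1.example.com:2222"]
})
server = tf.train.Server(cluster, job_name=FLAGS.job_name,
                         task_index=FLAGS.task_index)

if FLAGS.job_name == "ps":
    server.join()  # the parameter server just holds and serves the variables
else:
    # Variables are placed on the ps task; each worker builds its own replica
    # of the compute graph and applies gradient updates asynchronously.
    with tf.device(tf.train.replica_device_setter(cluster=cluster)):
        x = tf.placeholder(tf.float32, [None, 784])
        y = tf.placeholder(tf.float32, [None, 10])
        w = tf.get_variable("w", [784, 10])
        b = tf.get_variable("b", [10])
        logits = tf.matmul(x, w) + b
        loss = tf.reduce_mean(
            tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=logits))
        global_step = tf.train.get_or_create_global_step()
        train_op = tf.train.GradientDescentOptimizer(0.01).minimize(
            loss, global_step=global_step)

    with tf.train.MonitoredTrainingSession(
            master=server.target,
            is_chief=(FLAGS.task_index == 0)) as sess:
        while not sess.should_stop():
            batch_x, batch_y = next_batch()  # each worker reads its own shard
            sess.run(train_op, feed_dict={x: batch_x, y: batch_y})
```

Each worker runs this same script with a different --task_index, so gradients computed on different data shards reach the parameter server at different times, which is exactly the asynchronous update scheme the paper describes.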