TensorFlow clusters
A TensorFlow (TF) cluster is one mechanism that implements the distributed strategies that we have just discussed. At the logical level, a TF cluster runs one or more jobs, and each job consists of one or more tasks. Thus, a job is just a logical grouping of tasks. At the process level, each task runs as a TF server. At the machine level, each physical machine or node can run more than one task by running more than one server, one server per task. The client creates the graph on different servers and starts the execution of the graph on one server by calling the remote session.
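The cluster, job, and task hierarchy described above can be sketched in plain Python without TensorFlow itself. This is a minimal illustration only: the job names `ps` and `worker`, the host names, the ports, and the helper functions `list_tasks` and `tasks_by_node` are all assumptions made up for this example and are not part of any TF API.

```python
from collections import defaultdict

# Hypothetical cluster layout: job name -> list of task addresses ("host:port").
# The position of an address in its list is the task index within that job.
CLUSTER = {
    "ps": ["node1.example.com:2222"],
    "worker": [
        "node1.example.com:2223",
        "node2.example.com:2222",
        "node2.example.com:2223",
    ],
}


def list_tasks(cluster):
    """Return a (job, task_index, address) tuple for every task in the cluster."""
    return [
        (job, index, address)
        for job, addresses in sorted(cluster.items())
        for index, address in enumerate(addresses)
    ]


def tasks_by_node(cluster):
    """Group tasks by host to show that one machine can run several servers."""
    nodes = defaultdict(list)
    for job, index, address in list_tasks(cluster):
        host = address.rsplit(":", 1)[0]  # strip the port, keep the host
        nodes[host].append("/job:{}/task:{}".format(job, index))
    return dict(nodes)


if __name__ == "__main__":
    for host, tasks in sorted(tasks_by_node(CLUSTER).items()):
        print(host, tasks)
```

In TF 1.x, a dictionary of this shape is the kind of argument that `tf.train.ClusterSpec` accepts, and each task would typically start its own `tf.train.Server` with the matching job name and task index. The `/job:<name>/task:<index>` strings follow the form TF uses to name the devices belonging to a task.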
As an example, the following diagram depicts two clients connected to two jobs named m1:
Each of the two nodes runs three tasks. The job w1 is spread across both nodes, while each of the other jobs is contained within a single node.
Note
A TF server implements two services: master and worker. The master coordinates the computation with other tasks, and the worker is the one actually running the...