Parallelizing TensorFlow
To take parallelization further, we can also run separate operations of our graph on entirely different machines in a distributed manner. This recipe shows how that is achieved.
Getting ready
A few months after the release of TensorFlow, Google released TensorFlow Distributed. This was a big upgrade to the TensorFlow ecosystem: it allows a TensorFlow cluster (a set of separate worker machines) to be set up to share the computational task of training and evaluating models. Using TensorFlow Distributed is as easy as setting up some parameters for the workers and then assigning different jobs to different workers.
In this recipe, we will set up two local workers and assign them different jobs.
How to do it…
To start, we load TensorFlow and define our two local workers with a configuration dictionary file (ports 2222 and 2223):
import tensorflow as tf
# Cluster for 2 local workers (tasks 0 and 1):
cluster = tf.train.ClusterSpec({'local': ['localhost:2222', ...
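The shape of the cluster dictionary can be illustrated separately from the listing above. The following is a minimal sketch, not the recipe's own code: the helper names make_local_cluster and build_cluster_spec are hypothetical, and it assumes that both workers run on localhost at the two ports named in the text. The dictionary is built in plain Python, so that part runs without TensorFlow installed; only build_cluster_spec needs the library.

```python
# Sketch only: both helper names are hypothetical and not part of the recipe.

def make_local_cluster(ports, job_name='local'):
    """Return a cluster dictionary mapping one job name to localhost addresses.

    Each address becomes one task; task indices follow list order (0, 1, ...).
    """
    return {job_name: ['localhost:%d' % port for port in ports]}


def build_cluster_spec(cluster_def):
    """Wrap the dictionary in a tf.train.ClusterSpec (requires TensorFlow)."""
    import tensorflow as tf  # imported lazily so the code above runs without it
    return tf.train.ClusterSpec(cluster_def)


# Ports 2222 and 2223 are the ones named in the text; the host is an assumption.
cluster_def = make_local_cluster([2222, 2223])
for task_index, address in enumerate(cluster_def['local']):
    print('task %d -> %s' % (task_index, address))
```

Keeping the dictionary separate from the ClusterSpec call makes it easy to inspect which address belongs to which task index before any server is started.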