Using the Google Cloud GPU through AI Platform
Having worked through the previous section on using Cloud TPU with AI Platform, we are ready to do the same with the GPU. As it turns out, the format of the training script and the invocation commands is very similar. With the exception of a few additional parameters and slight differences in the distribution strategy definition, everything else remains the same.
There are several distribution strategies (https://www.tensorflow.org/guide/distributed_training#types_of_strategies) currently available. For a TensorFlow Enterprise distribution in Google AI Platform, MirroredStrategy and TPUStrategy are the only two that are fully supported; all the others are experimental. Therefore, in this section's example, we will use MirroredStrategy. This strategy creates a copy of all the model's variables on each GPU. As these variables are updated at each gradient descent step, the value updates are copied to each GPU synchronously. By default...
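To make the pattern concrete, here is a minimal sketch of building a model under MirroredStrategy. The layer sizes and input shape are placeholders for illustration, not the chapter's actual training script; variables created inside `strategy.scope()` are the ones mirrored onto each GPU.

```python
import tensorflow as tf

# MirroredStrategy replicates model variables onto each visible GPU.
# Gradients are aggregated across replicas, and variable updates are
# applied synchronously on every copy.
strategy = tf.distribute.MirroredStrategy()
print("Number of replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    # Variables created inside the scope are mirrored across devices.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(10,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
```

On a machine with a single GPU (or CPU only), the strategy still works with one replica, so the same script runs unchanged locally and on a multi-GPU AI Platform job.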