Training a model on a single machine
As described in Chapter 3, Developing a Powerful Deep Learning Model, training a DL model involves extracting meaningful patterns from a dataset. When the dataset is small and the model has few parameters to tune, a central processing unit (CPU) can be sufficient for training. However, DL models achieve better performance when they are trained on larger datasets and built with more neurons. Therefore, training on a graphics processing unit (GPU) has become the standard, since you can exploit its massive parallelism for matrix multiplication.
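Before choosing a training setup, it is worth checking which devices TensorFlow can actually see. The following sketch (assuming TensorFlow 2.x is installed) lists the visible CPU and GPU devices; an empty GPU list means training will fall back to the CPU:

```python
import tensorflow as tf

# List the physical devices TensorFlow has detected. If no GPU is
# found, all operations are placed on the CPU by default.
gpus = tf.config.list_physical_devices("GPU")
cpus = tf.config.list_physical_devices("CPU")

print(f"GPUs available: {len(gpus)}")
print(f"CPUs available: {len(cpus)}")
```

If a GPU is present, TensorFlow places supported operations on it automatically; no explicit device placement is required for the common case.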
Utilizing multiple devices for training in TensorFlow
TF provides the tf.distribute.Strategy module, which allows you to use multiple GPU or CPU devices for training with very simple code modifications (https://www.tensorflow.org/guide/distributed_training). tf.distribute.Strategy is fully compatible with tf.keras.Model.fit, as well as custom training loops.
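As a minimal sketch of the tf.keras.Model.fit path, the example below uses tf.distribute.MirroredStrategy, the strategy for synchronous training on multiple GPUs within one machine (with no GPUs available, it falls back to a single CPU replica). The model architecture and the random training data are illustrative placeholders:

```python
import numpy as np
import tensorflow as tf

# MirroredStrategy replicates the model on every visible GPU and
# keeps the copies in sync; on a CPU-only machine it uses one replica.
strategy = tf.distribute.MirroredStrategy()
print(f"Number of replicas: {strategy.num_replicas_in_sync}")

# Variables (layers, optimizer state) must be created inside the
# strategy's scope so they are mirrored across all replicas.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

# model.fit works unchanged: each global batch is split evenly
# across the replicas.
x = np.random.rand(64, 8).astype("float32")
y = np.random.rand(64, 1).astype("float32")
model.fit(x, y, epochs=1, batch_size=16, verbose=0)
```

Note that only the model construction and compilation move inside strategy.scope(); the data preparation and the call to fit are identical to single-device code.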