If you want to run TensorFlow on multiple GPUs, you can construct your model by assigning specific parts of the computation to each GPU. For example, with two GPUs we can split the previous code, assigning the first matrix computation to the first GPU as follows:
c1 = []  # shared list collecting the partial results from both GPUs
with tf.device('/gpu:0'):
    a = tf.placeholder(tf.float32, [10000, 10000])
    c1.append(matpow(a, n))
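The matpow helper is assumed to be defined with the earlier single-GPU example: it builds the n-th matrix power as a chain of matmul operations. As an illustration of what that chain computes, the same logic in plain NumPy is:

```python
import numpy as np

def matpow(M, n):
    # n-th matrix power by repeated multiplication, mirroring the
    # chain of matmul ops the graph version builds (assumes n >= 1).
    result = M
    for _ in range(n - 1):
        result = result @ M
    return result

M = np.array([[1.0, 1.0],
              [0.0, 1.0]])
cubed = matpow(M, 3)  # same as M @ M @ M
```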
The second matrix computation goes to the second GPU as follows:
with tf.device('/gpu:1'):
    b = tf.placeholder(tf.float32, [10000, 10000])
    c1.append(matpow(b, n))
Finally, the CPU sums the partial results; note that both GPU computations were collected in the shared c1 list:
with tf.device('/cpu:0'):
    total = tf.add_n(c1)  # element-wise sum of the two matrix powers
print(total)  # in graph mode this prints the Tensor, not its value
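Putting the pieces together, here is a minimal end-to-end sketch of this multi-GPU pattern. It is written against the TF 1.x graph API (accessed through tf.compat.v1 so it also runs under TensorFlow 2), uses much smaller matrices than the 10000 x 10000 ones in the text so it finishes quickly, and sets allow_soft_placement so the ops fall back to the CPU on machines without two GPUs:

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

def matpow(M, n):
    # Build M^n as a chain of tf.matmul ops (assumes n >= 1).
    if n == 1:
        return M
    return tf.matmul(M, matpow(M, n - 1))

n = 2
size = 100  # small for the sketch; the text uses 10000
c1 = []

# First matrix power on the first GPU.
with tf.device('/gpu:0'):
    a = tf.placeholder(tf.float32, [size, size])
    c1.append(matpow(a, n))

# Second matrix power on the second GPU.
with tf.device('/gpu:1'):
    b = tf.placeholder(tf.float32, [size, size])
    c1.append(matpow(b, n))

# The CPU sums the partial results collected in c1.
with tf.device('/cpu:0'):
    total = tf.add_n(c1)

# allow_soft_placement lets TensorFlow place an op on the CPU when the
# requested GPU is not available, so this also runs on CPU-only machines.
config = tf.ConfigProto(allow_soft_placement=True)
x = np.random.rand(size, size).astype(np.float32)
y = np.random.rand(size, size).astype(np.float32)
with tf.Session(config=config) as sess:
    result = sess.run(total, feed_dict={a: x, b: y})
```

The key design point is that device placement is declared at graph-construction time with tf.device, while the single sess.run call executes both branches, letting TensorFlow run the two matrix powers in parallel when two GPUs are present.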