The gradient tape allows easy backpropagation in eager mode. To illustrate this, let's use a simple example: we want to solve the equation A × X = B, where A and B are constants, by finding the value of X that satisfies it. To do so, we will try to minimize a simple loss, abs(A × X - B).
In code, this translates to the following:
import tensorflow as tf

A, B = tf.constant(3.0), tf.constant(6.0)
X = tf.Variable(20.0)  # In practice, we would start with a random value
loss = tf.math.abs(A * X - B)
Now, to update the value of X, we would like to compute the gradient of the loss with respect to X. However, when we print the loss, we obtain the following:
<tf.Tensor: id=18525, shape=(), dtype=float32, numpy=54.0>
In eager mode, TensorFlow computed the result of the operation instead of storing the operation! With no information on the operation and its inputs, it would be impossible to automatically differentiate...
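This is where the gradient tape comes in. As a minimal sketch (reusing the A, B, and X defined above; the names tape and grad_X are chosen for illustration), we can wrap the loss computation in a tf.GradientTape context so that TensorFlow records the operations, then ask the tape for the gradient:

with tf.GradientTape() as tape:
    loss = tf.math.abs(A * X - B)
# The operations were recorded by the tape, so we can now differentiate:
grad_X = tape.gradient(loss, X)  # equals A = 3.0 here, since A * X - B > 0

Trainable variables such as X are watched by the tape automatically. The resulting gradient can then be used to update X, for instance with a gradient descent step such as X.assign_sub(0.01 * grad_X).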