The last architectural change improved our model's accuracy, but we can do even better by replacing the sigmoid activation function with the Rectified Linear Unit, defined as follows:
A Rectified Linear Unit (ReLU) computes the function f(x) = max(0, x). ReLU is computationally fast because it does not require any exponential computation, such as those required by the sigmoid or tanh activations. Furthermore, it has been found to greatly accelerate the convergence of stochastic gradient descent compared to the sigmoid/tanh functions.
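To make the difference concrete, here is a minimal NumPy sketch (separate from the model code) that applies both activations element-wise to a few arbitrary example values; note that only the sigmoid needs an exponential per element:
import numpy as np

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])

# ReLU: a simple element-wise threshold, no transcendental functions
relu = np.maximum(0.0, x)            # [0.   0.   0.   1.5  3. ]

# Sigmoid: requires computing an exponential for every element
sigmoid = 1.0 / (1.0 + np.exp(-x))   # [0.119 0.378 0.5 0.818 0.953]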
To use the ReLU function, we simply change the definitions of the first four layers in the previously implemented model:
First layer output:
Y1 = tf.nn.relu(tf.matmul(XX, W1) + B1)
Second layer output:
Y2 = tf.nn.relu(tf.matmul(Y1, W2) + B2)
Third layer output:
Y3 = tf.nn.relu(tf.matmul(Y2, W3) + B3) ...
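For completeness, the following is a minimal sketch of how the remaining layers might look; the variable names W4, B4, W5, B5, Ylogits, and Y are assumed to continue the naming pattern of the previous model and are shown here only for illustration. Only the hidden layers switch to ReLU; the output layer keeps the softmax, since that is what turns the logits into class probabilities:
# Fourth hidden layer, also switched to ReLU (assuming W4/B4 are defined as before)
Y4 = tf.nn.relu(tf.matmul(Y3, W4) + B4)

# Output layer: linear logits followed by softmax for classification (unchanged)
Ylogits = tf.matmul(Y4, W5) + B5
Y = tf.nn.softmax(Ylogits)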