Picking the right activation functions
So far, we have used the ReLU and sigmoid activation functions in our implementations. You may wonder how to pick the right activation function for your neural networks. Here is some guidance on when to choose each one:
- Linear: f(z) = z. You can interpret this as using no activation function at all. We usually use it in the output layer of regression networks, since the outputs don't need any further transformation.
- Sigmoid (logistic) transforms the output of a layer to a range between 0 and 1. You can interpret it as the probability of an output prediction, so we usually use it in the output layer of binary classification networks. Besides that, we sometimes use it in hidden layers. However, note that while the sigmoid function is monotonic, its derivative is not: the gradient is close to zero for large positive or negative inputs, so hidden layers with sigmoid activations can suffer from vanishing gradients and the network may get stuck at a suboptimal solution (a short NumPy sketch of these functions is shown below).
- Softmax: As was mentioned in Chapter 4, Predicting Online...
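To make the descriptions above concrete, here is a minimal NumPy sketch of the linear, sigmoid (with its derivative), and softmax functions. The helper names are just for illustration and do not correspond to any particular library API:

```python
import numpy as np

def linear(z):
    # Identity activation: typically used in the output layer of regression networks
    return z

def sigmoid(z):
    # Squashes inputs into (0, 1); typical output activation for binary classification
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    # The derivative peaks at z = 0 and approaches 0 for large |z|,
    # which is why gradients can vanish when sigmoid is used in hidden layers
    s = sigmoid(z)
    return s * (1.0 - s)

def softmax(z):
    # Normalizes a vector of scores into a probability distribution;
    # typical output activation for multiclass classification
    e = np.exp(z - np.max(z))   # subtract the max for numerical stability
    return e / e.sum()

if __name__ == '__main__':
    z = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
    print(sigmoid(z))             # values close to 0 or 1 at the extremes
    print(sigmoid_derivative(z))  # near-zero gradients at the extremes
    print(softmax(z))             # non-negative values that sum to 1
```

Running this for a few sample inputs shows the behavior discussed above: the sigmoid saturates (and its gradient shrinks) for large positive or negative inputs, while softmax turns arbitrary scores into a probability distribution.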