Choosing activation functions for multilayer neural networks
For simplicity, we have only discussed the sigmoid activation function in the context of multilayer feedforward NNs so far; we used it in the hidden layer as well as the output layer in the MLP implementation in Chapter 12, Implementing a Multilayer Artificial Neural Network from Scratch.
Note that in this book, the sigmoidal logistic function, $\sigma(z) = \frac{1}{1 + e^{-z}}$, is referred to as the sigmoid function for brevity, which is common in machine learning literature. In the following subsections, you will learn more about alternative nonlinear functions that are useful for implementing multilayer NNs.
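As a quick reference, here is a minimal NumPy sketch of the logistic sigmoid defined above (the function name sigmoid and the sample inputs are arbitrary choices for this illustration):

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid: maps real-valued net inputs into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Evaluate the sigmoid at a few example net input values
z = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
print(sigmoid(z))
# approx. [0.0067 0.2689 0.5 0.7311 0.9933]
```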
Technically, we can use any function as an activation function in multilayer NNs as long as it is differentiable. We can even use linear activation functions, such as in Adaline (Chapter 2, Training Simple Machine Learning Algorithms for Classification). However, in practice, it would not be very useful to use linear activation functions in the hidden layers, since we typically want to introduce nonlinearity so that the network can tackle complex, nonlinearly separable problems; a composition of linear functions is, after all, just another linear function.
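To make this last point concrete, the following sketch (using NumPy, with arbitrarily chosen weight matrices and input) shows that two stacked layers with linear (identity) activations collapse into a single equivalent linear layer:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two "layers" with identity (linear) activations; shapes chosen arbitrarily
W1 = rng.normal(size=(4, 3))   # first layer: 3 inputs -> 4 hidden units
W2 = rng.normal(size=(2, 4))   # second layer: 4 hidden units -> 2 outputs
x = rng.normal(size=(3,))      # an example input vector

# Forward pass with linear activations: h = W1 x, y = W2 h
h = W1 @ x
y_two_layers = W2 @ h

# The same mapping expressed as a single linear layer with weights W = W2 W1
W_combined = W2 @ W1
y_one_layer = W_combined @ x

print(np.allclose(y_two_layers, y_one_layer))  # True: the extra layer adds no expressive power
```

In other words, no matter how many purely linear layers we stack, the overall mapping remains linear, which is why nonlinear activation functions are essential in the hidden layers.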