Activation functions
Activation functions are mathematical functions that are generally applied to the outputs of ANN layers to limit or bound the values of the layer. Bounding matters because, without activation functions, the values and their corresponding gradients can either explode or vanish, making the results unusable. This happens because the final value is the cumulative product of the values from each successive layer: repeatedly multiplying by factors larger than 1 grows the product without bound, while repeatedly multiplying by factors smaller than 1 shrinks it toward zero. As the number of layers increases, so does the likelihood of values and gradients exploding to infinity or vanishing to zero. This is known as the exploding and vanishing gradient problem. Activation functions are also used to decide whether a node in a layer should be activated, hence their name. Common activation functions, illustrated in Figure 1.36, are as follows:
- Step function: The output is non-zero if the input is above a certain threshold; otherwise it is zero. This is shown in Figure 1.36a...
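
As a rough illustration of the step function and of the cumulative-product argument made earlier, the following Python sketch may help. The names `step` and `cumulative_product`, the threshold of 0.0, the non-zero output of 1.0, and the factors 0.5 and 1.5 over 50 layers are all assumptions chosen for illustration; they do not come from the text or from any particular library.

```python
import math


def step(x, threshold=0.0):
    """Step activation (illustrative): 1.0 if x is above the threshold, else 0.0."""
    return 1.0 if x > threshold else 0.0


def cumulative_product(factor, num_layers):
    """Multiply the same per-layer factor together num_layers times.

    This is a toy stand-in for how a value is scaled as it passes
    through many layers; real networks do not use one fixed factor.
    """
    return math.prod([factor] * num_layers)


# A factor below 1 shrinks toward zero; a factor above 1 grows rapidly.
shrinking = cumulative_product(0.5, 50)
growing = cumulative_product(1.5, 50)

print("step(0.7)  =", step(0.7))
print("step(-0.2) =", step(-0.2))
print("0.5 over 50 layers:", shrinking)
print("1.5 over 50 layers:", growing)
```

Even with these modest per-layer factors, 50 layers are enough to push the product very close to zero in one case and to a very large number in the other, which is the behaviour the exploding and vanishing gradient problem describes.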