Neural activation units
Several kinds of neural activation units are used in neural networks, depending on the architecture and the problem at hand. We will discuss the most commonly used activation functions, since they play an important role in determining the network architecture and performance. Linear and sigmoid activation functions were the primary choices in artificial neural networks until rectified linear units (ReLUs), popularized by Hinton and his collaborators, revolutionized the performance of neural networks.
Linear activation units
A linear activation unit outputs the total input to the neuron, attenuated by a constant factor, as shown in the following graph:
If x is the total input to the linear activation unit, then the output, y, can be represented as follows:

$y = kx$

Here, k is a constant factor.
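As a minimal sketch of this (assuming NumPy; the function name linear_activation and the sample inputs are illustrative, and k may simply be 1), a linear unit just scales its input:

```python
import numpy as np

def linear_activation(x, k=1.0):
    """Linear activation: the output is the total input scaled by a constant k."""
    return k * x

# The gradient with respect to the input is the constant k everywhere.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
print(linear_activation(x))         # [-2. -1.  0.  1.  2.]
print(linear_activation(x, k=0.5))  # [-1.  -0.5  0.   0.5  1. ]
```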
Sigmoid activation units
The output of the sigmoid activation unit, y, as a function of its total input, x, is expressed as follows:

$y = \frac{1}{1 + e^{-x}}$
Since the sigmoid activation unit response is a nonlinear function, as shown in the following graph, it is used to introduce nonlinearity in the neural network:
Any complex process in nature is generally nonlinear in its input-output relationship; hence, we need nonlinear activation functions to model such processes through neural networks. For a two-class classification, the output probability of a neural network is generally given by the output of a sigmoid unit, since it produces values between zero and one. The output probability can be represented as follows:

$P(\text{class} = 1 \mid x) = \frac{1}{1 + e^{-x}}$
Here, x represents the total input to the sigmoid unit in the output layer.
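A minimal sketch of this idea, assuming NumPy and an arbitrary decision threshold of 0.5 (both are illustrative choices, not taken from the text):

```python
import numpy as np

def sigmoid(x):
    """Squash the total input x into the (0, 1) range."""
    return 1.0 / (1.0 + np.exp(-x))

# Total inputs to the output unit for a few examples.
logits = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
probs = sigmoid(logits)           # interpreted as P(class = 1 | x)
predictions = (probs >= 0.5)      # threshold at 0.5 for a two-class decision
print(probs)        # [0.047 0.378 0.5   0.622 0.953] (approximately)
print(predictions)  # [False False  True  True  True]
```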
The hyperbolic tangent activation function
The output, y, of a hyperbolic tangent activation function (tanh) as a function of its total input, x, is given as follows:

$y = \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$
The tanh activation function outputs values in the range [-1, 1], as you can see in the following graph:
One thing to note is that both the sigmoid and the tanh activation functions are approximately linear only within a small range of the input, beyond which the output saturates. In the saturation zone, the gradients of the activation functions (with respect to the input) are very small or close to zero; this makes them very prone to the vanishing gradient problem. As you will see later on, neural networks learn through the backpropagation method, where the gradient of a layer depends on the gradients of the activation units in the succeeding layers, all the way up to the final output layer. Therefore, if the activation units are operating in the saturation region, much less of the error is backpropagated to the early layers of the neural network. Neural networks learn their weights and biases (W) by minimizing the prediction error using these gradients, which means that, if the gradients are small or vanish to zero, the network will fail to learn these weights properly.
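To make the saturation effect concrete, here is a small numerical sketch (assuming NumPy; the sample inputs are arbitrary) that evaluates the sigmoid and tanh gradients at moderate and large input values:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x))
    s = sigmoid(x)
    return s * (1.0 - s)

def tanh_grad(x):
    # d/dx tanh(x) = 1 - tanh(x)^2
    return 1.0 - np.tanh(x) ** 2

x = np.array([0.0, 2.0, 5.0, 10.0])
print(sigmoid_grad(x))  # approx [0.25  0.105  0.0066  0.00005] -- shrinks rapidly
print(tanh_grad(x))     # approx [1.    0.071  0.00018 8e-9  ]  -- shrinks even faster
```

The further the input moves into the saturation zone, the closer the gradient gets to zero, which is exactly what starves the early layers of error signal during backpropagation.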
Rectified linear unit (ReLU)
The output of a ReLU is linear in the total input to the neuron when that input is greater than zero, and the output is zero when the total input is negative. This simple activation function provides nonlinearity to a neural network and, at the same time, provides a constant gradient of one with respect to the total input whenever the input is positive. This constant gradient helps to keep the network from developing the saturation and vanishing gradient problems seen with activation functions such as sigmoid and tanh. The ReLU function output (as shown in Figure 1.8) can be expressed as follows:

$y = \max(0, x)$
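A minimal NumPy sketch of the ReLU and its gradient (the function names and sample inputs are illustrative):

```python
import numpy as np

def relu(x):
    """ReLU: passes positive inputs through unchanged, clamps negatives to zero."""
    return np.maximum(0.0, x)

def relu_grad(x):
    """The gradient is 1 for positive inputs and 0 for negative inputs."""
    return (x > 0).astype(float)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))       # [0.  0.  0.  0.5 2. ]
print(relu_grad(x))  # [0. 0. 0. 1. 1.]
```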
The ReLU activation function can be plotted as follows:
One of the constraints of ReLU is its zero gradient for negative values of the input. This may slow down training, especially at the initial phase. Leaky ReLU activation functions (as shown in Figure 1.9) can be useful in this scenario, since their output and gradients are nonzero even for negative values of the input. A leaky ReLU output function can be expressed as follows:

$y = \begin{cases} x, & x > 0 \\ \alpha x, & x \leq 0 \end{cases}$
The parameter $\alpha$ is to be provided for leaky ReLU activation functions, whereas for a parametric ReLU, $\alpha$ is a parameter that the neural network learns through training. The following graph shows the output of the leaky ReLU activation function:
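A short sketch of a leaky ReLU (assuming NumPy; the slope value of 0.01 is a common default rather than one prescribed here, and the function names are illustrative):

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: small nonzero slope alpha for negative inputs."""
    return np.where(x > 0, x, alpha * x)

def leaky_relu_grad(x, alpha=0.01):
    """The gradient is 1 for positive inputs and alpha (not 0) for negative inputs."""
    return np.where(x > 0, 1.0, alpha)

x = np.array([-2.0, -0.5, 0.5, 2.0])
print(leaky_relu(x))       # [-0.02  -0.005  0.5    2.   ]
print(leaky_relu_grad(x))  # [ 0.01   0.01   1.     1.   ]
```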
The softmax activation unit
The softmax activation unit is generally used to output class probabilities in multi-class classification problems. Suppose that we are dealing with an n-class classification problem, and that the total inputs corresponding to the n classes are given by the following:

$x_1, x_2, \ldots, x_n$
In this case, the output probability of the kth class from the softmax activation unit is given by the following formula:

$p_k = \frac{e^{x_k}}{\sum_{i=1}^{n} e^{x_i}}$
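A minimal NumPy sketch of the softmax computation (the max-subtraction is a standard numerical-stability trick, not something introduced in the text, and the example inputs are arbitrary):

```python
import numpy as np

def softmax(x):
    """Convert a vector of total class inputs into probabilities that sum to one."""
    shifted = x - np.max(x)          # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / np.sum(exps)

logits = np.array([2.0, 1.0, 0.1])   # total inputs for a 3-class problem
probs = softmax(logits)
print(probs)          # [0.659 0.242 0.099] (approximately)
print(np.sum(probs))  # 1.0
```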
There are several other activation functions, mostly variations of these basic versions. We will discuss them as we encounter them in the different projects that we will cover in the following chapters.