As mentioned, both the MP neuron and the perceptron are unable to deal with nonlinear problems; for example, neither can learn a function whose classes are not linearly separable, such as XOR. To address this issue, modern-day perceptrons use an activation function that introduces nonlinearity into the output.
The perceptrons (neurons, but we will mostly refer to them as nodes going forward) we will use take the following form:

$$y = \varphi\left(\sum_{i} w_i x_i + b\right)$$

Here, y is the output, φ is a nonlinear activation function, the x_i are the inputs to the unit, the w_i are the weights, and b is the bias. This improved version of the perceptron looks as follows:
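To make the computation concrete, here is a minimal sketch in Python of a single node, assuming NumPy is available; the name node_output and the use of tanh as a placeholder nonlinearity are our own choices, not from the text:

```python
import numpy as np

def node_output(x, w, b, phi):
    """Compute y = phi(sum_i w_i * x_i + b) for a single node."""
    return phi(np.dot(w, x) + b)

# Example: three inputs, using tanh as the nonlinearity phi.
x = np.array([0.5, -1.0, 2.0])   # inputs x_i
w = np.array([0.1, 0.4, -0.3])   # weights w_i
b = 0.2                          # bias
y = node_output(x, w, b, np.tanh)
print(y)  # a single scalar output
```

Note that the weighted sum alone is still a linear operation; it is φ, applied afterward, that gives the node its nonlinear behavior.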
In the preceding diagram, the activation function is generally the sigmoid function:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$
The sigmoid activation function squashes all output values into the (0, 1) range. It is largely used for historical reasons: the developers of the earlier neurons focused on thresholding, and when gradient-based learning was introduced, the sigmoid was a natural fit because it is a smooth, differentiable approximation of a threshold.
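As a quick illustration (a minimal sketch, again assuming NumPy), the following shows how the sigmoid squashes inputs of any magnitude into the (0, 1) range:

```python
import numpy as np

def sigmoid(z):
    """sigma(z) = 1 / (1 + exp(-z)); maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Large negative inputs approach 0, large positive inputs approach 1,
# but the output never actually reaches either bound.
z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(sigmoid(z))  # approximately [0.0000454, 0.269, 0.5, 0.731, 0.99995]
```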