ANNs are the foundation of deep learning. They are loosely modeled on the human nervous system, which consists of a large number of neurons that communicate with each other through axons.
Artificial Neural Networks
Biological neurons
The working principles of ANNs are inspired by how a human brain works, depicted in Figure 7. The receptors receive stimuli either internally or from the external world and pass the information to the biological neurons for further processing. Each neuron has a number of dendrites, in addition to one long extension called the axon.
Towards its extremity, there are minuscule structures called synaptic terminals, used to connect one neuron to the dendrites of other neurons. Biological neurons receive short electrical impulses called signals from other neurons, and in response, they trigger their own signals:
We can thus summarize that a neuron comprises a cell body (also known as the soma), one or more dendrites for receiving signals from other neurons, and an axon that carries the signals generated by the neuron to other neurons.
A neuron is in an active state when it is sending signals to other neurons, and in an inactive (idle) state when it is only receiving them. While idle, a neuron accumulates the signals it receives until a certain activation threshold is reached, at which point it fires. This behavior motivated researchers to introduce the ANN.
A brief history of ANNs
Inspired by the working principles of biological neurons, Warren McCulloch and Walter Pitts proposed the first artificial neuron model in 1943 as a computational model of nervous activity. This simple model of a biological neuron, also known as an artificial neuron (AN), has one or more binary (on/off) inputs and only one output.
An AN simply activates its output when more than a certain number of its inputs are active. For example, here we see a few ANNs that perform various logical operations. In this example, we assume that a neuron is activated only when at least two of its inputs are active:
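The threshold rule described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and wiring the same input in twice (so that a single active input counts double) is a choice made here to express OR and identity with a fixed threshold of two.

```python
def mcp_neuron(inputs, threshold=2):
    """Fire (return 1) when at least `threshold` binary inputs are active."""
    return 1 if sum(inputs) >= threshold else 0


def logical_and(a, b):
    # One connection from each input: both must be on to reach the threshold.
    return mcp_neuron([a, b])


def logical_or(a, b):
    # Two connections from each input (an assumption of this sketch):
    # a single active input already counts twice and reaches the threshold.
    return mcp_neuron([a, a, b, b])


def identity(a):
    # Two connections from the same input copy it to the output.
    return mcp_neuron([a, a])


for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", logical_and(a, b), "OR:", logical_or(a, b))
```

The neuron itself never changes; only the wiring of its inputs decides which logical operation it computes.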
The example may seem trivial, but even with such a simplified model it is possible to build a network of ANs, and these networks can be combined to compute complex logical expressions. This simplified model inspired John von Neumann, Marvin Minsky, Frank Rosenblatt, and many others, and in 1957 Rosenblatt introduced another model called the perceptron.
The perceptron is one of the simplest ANN architectures. It is based on a slightly different AN called a Linear Threshold Unit (LTU). The only difference is that the inputs and outputs are now numbers instead of binary on/off values. Each input connection is associated with a weight. The LTU computes a weighted sum of its inputs, then applies a step function (which acts as the activation function) to that sum, and outputs the result:
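The computation performed by an LTU can be written out as a short sketch. The weights, inputs, and the bias term below are hypothetical values chosen for illustration (the text above does not mention a bias; it is included here as an optional assumption), and the step function is taken to be the Heaviside step.

```python
def step(z):
    """Heaviside step: 1 if the weighted sum is non-negative, otherwise 0."""
    return 1 if z >= 0 else 0


def ltu(inputs, weights, bias=0.0):
    """Linear Threshold Unit: weighted sum of the inputs followed by a step."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return step(z)


# Hypothetical weights and inputs, for illustration only.
weights = [0.5, -0.25]
print(ltu([2.0, 1.0], weights, bias=-0.1))  # weighted sum is positive
print(ltu([1.0, 2.0], weights, bias=-0.1))  # weighted sum is negative
```

The first call yields an active output and the second an inactive one, because only the sign of the weighted sum matters to the step function.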
One of the downsides of a perceptron is that its decision boundary is linear. Perceptrons are therefore incapable of learning complex patterns, and cannot even solve some simple problems such as Exclusive OR (XOR). However, some of these limitations were later overcome by stacking multiple perceptrons; the resulting network is called a multilayer perceptron (MLP).
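To illustrate how stacking helps with XOR, here is a minimal sketch of a two-layer network of threshold units. The weights and biases are picked by hand for this example (they are assumptions, not learned values): one hidden unit behaves like OR, the other like AND, and the output unit fires when OR is active but AND is not.

```python
def step(z):
    """Heaviside step: 1 if z is non-negative, otherwise 0."""
    return 1 if z >= 0 else 0


def ltu(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, followed by a step."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


def xor_mlp(x1, x2):
    # Hidden layer: hand-picked weights (assumed for illustration).
    h_or = ltu([x1, x2], [1.0, 1.0], bias=-0.5)   # active if at least one input is 1
    h_and = ltu([x1, x2], [1.0, 1.0], bias=-1.5)  # active only if both inputs are 1
    # Output layer: "OR but not AND" is exactly XOR.
    return ltu([h_or, h_and], [1.0, -1.0], bias=-0.5)


for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor_mlp(x1, x2))
```

No single threshold unit can reproduce this truth table, because the two classes cannot be separated by one straight line; the hidden layer reshapes the inputs so that the output unit can.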
How does an ANN learn?
The idea of the AN, and the term itself, derive from the biological neuron. Like a biological neuron, an artificial neuron consists of the following:
- One or more incoming connections that aggregate signals from other neurons
- One or more output connections for carrying the signal to other neurons
- An activation function, which determines the numerical value of the output signal
The learning process of a neural network is an iterative optimization of the weights (see more in the next section). The weights are updated in each epoch, with the aim of minimizing the loss function so that the network's predictions improve. Once training is complete, the performance of the network is evaluated on the test set.
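The iterative weight update can be illustrated with a deliberately small sketch: a single sigmoid neuron trained by batch gradient descent on the OR truth table. Everything specific here is an assumption made for illustration, including the sigmoid activation, the cross-entropy loss, the learning rate, the number of epochs, and the function names. With only four data points there is no separate test set, so the sketch checks predictions on the training data instead.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, targets, epochs=2000, lr=0.5):
    """Batch gradient descent on the mean cross-entropy loss."""
    n = len(samples)
    weights = [0.0] * len(samples[0])
    bias = 0.0
    losses = []
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        loss = 0.0
        for x, y in zip(samples, targets):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            loss -= y * math.log(p) + (1 - y) * math.log(1 - p)
            error = p - y  # gradient of the loss w.r.t. the weighted sum
            for i, xi in enumerate(x):
                grad_w[i] += error * xi
            grad_b += error
        # One weight update per epoch, moving against the mean gradient.
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
        losses.append(loss / n)
    return weights, bias, losses


def predict(x, weights, bias):
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if sigmoid(z) >= 0.5 else 0


samples = [[0, 0], [0, 1], [1, 0], [1, 1]]
targets = [0, 1, 1, 1]  # logical OR
weights, bias, losses = train(samples, targets)
predictions = [predict(x, weights, bias) for x in samples]
print("first loss:", losses[0], "last loss:", losses[-1])
print("predictions:", predictions)
```

The recorded loss should shrink from the first epoch to the last, which is the sense in which training "minimizes the loss function"; a real workflow would hold out a test set for the final evaluation.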
Now we know the simple concept of an artificial neuron. However, generating artificial signals alone is not enough to learn a complex task. For that, a training algorithm is needed; backpropagation is a supervised learning algorithm very commonly used to train complex ANNs.