Multilayer perceptrons
The main limitation of a perceptron is its linearity. How can this kind of architecture be exploited once that constraint is removed? The solution is simpler than one might expect: adding at least one non-linear layer between input and output leads to a highly non-linear combination, parametrized with a larger number of variables. The resulting architecture, called a Multilayer Perceptron (MLP) and containing a single hidden layer (only for simplicity), is shown in the following diagram:
This is a so-called feed-forward network, meaning that the flow of information begins in the first layer, always proceeds in the same direction, and ends at the output layer. Architectures that allow partial feedback (for example, in order to implement a local memory) are called recurrent networks and will be analyzed in the next chapter.
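As a concrete sketch of such a feed-forward MLP with a single non-linear hidden layer, a Keras model could be defined as follows; the layer sizes (10 inputs, 32 hidden neurons, 1 output) and the tanh/sigmoid activations are arbitrary illustrative choices, not prescribed by the text:

    import tensorflow as tf

    # A single non-linear hidden layer between input and output
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(10,)),
        tf.keras.layers.Dense(32, activation='tanh'),     # non-linear hidden layer
        tf.keras.layers.Dense(1, activation='sigmoid')    # output layer
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy')
    model.summary()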
In this case, there are two weight matrices, W and H, and two corresponding bias vectors, b and c. If there are m hidden neurons, x_i ∈ ℜ^(n × 1) (column...
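Using these quantities, the complete feed-forward computation can be sketched in NumPy; the tanh hidden activation, the identity output, and the sizes n = 4, m = 8, k = 2 (with k output neurons) are illustrative assumptions rather than part of the original formulation:

    import numpy as np

    # Illustrative sizes (assumptions): n inputs, m hidden neurons, k outputs
    n, m, k = 4, 8, 2
    rng = np.random.default_rng(0)

    x = rng.normal(size=(n, 1))                        # input column vector, n x 1
    W, b = rng.normal(size=(m, n)), np.zeros((m, 1))   # hidden-layer weights and bias
    H, c = rng.normal(size=(k, m)), np.zeros((k, 1))   # output-layer weights and bias

    z = np.tanh(W @ x + b)                             # non-linear hidden representation
    y = H @ z + c                                      # network output, k x 1
    print(y.shape)                                     # (k, 1)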