The following implementation increases the network complexity by adding four layers before the softmax layer. To determine the appropriate size of the network, that is, the number of hidden layers and the number of neurons per layer, we generally rely on empirical criteria, personal experience, or appropriate tests.
The following table summarizes the implemented network architecture, showing the number of neurons per layer and the respective activation functions:
| Layer  | Number of neurons | Activation function |
|--------|-------------------|---------------------|
| First  | L = 200           | sigmoid             |
| Second | M = 100           | sigmoid             |
| Third  | N = 60            | sigmoid             |
| Fourth | O = 30            | sigmoid             |
| Fifth  | 10                | softmax             |
The activation function for the first four layers is the sigmoid; the last layer always uses the softmax, since the output of the network must express a probability distribution over the input digit classes. In general, the number...
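As a minimal sketch of this architecture, the forward pass can be written in plain NumPy with randomly initialized weights. The 784-unit input size is an assumption (a flattened 28x28 digit image); the hidden-layer sizes follow the table above.

```python
import numpy as np

# Layer sizes from the table; the 784-unit input is an assumed 28x28 image
LAYER_SIZES = [784, 200, 100, 60, 30, 10]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Subtract the row-wise max for numerical stability
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
# One (weights, bias) pair per layer, randomly initialized
params = [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
          for m, n in zip(LAYER_SIZES[:-1], LAYER_SIZES[1:])]

def forward(x):
    """Sigmoid on the four hidden layers, softmax on the output layer."""
    a = x
    for w, b in params[:-1]:
        a = sigmoid(a @ w + b)
    w, b = params[-1]
    return softmax(a @ w + b)

x = rng.random((1, 784))   # one fake flattened input digit
probs = forward(x)
print(probs.shape)         # (1, 10): one probability per digit class
```

Because the output layer is a softmax, the ten values of `probs` sum to 1 and can be read as class probabilities, which is exactly why the table fixes the last activation while the hidden ones are a design choice.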