Computational Graph of the Softmax-with-Loss Layer
This appendix presents the computational graph of the Softmax-with-Loss layer and derives its backward propagation. We will call the softmax function the Softmax layer, the cross-entropy error the Cross-Entropy Error layer, and the combination of the two the Softmax-with-Loss layer. The Softmax-with-Loss layer can be represented with the computational graph shown in Figure A.1:
Figure A.1: Computational graph of the Softmax-with-Loss layer
The computational graph shown in Figure A.1 assumes a neural network that classifies inputs into three classes. The input from the previous layer is (a1, a2, a3), and the Softmax layer outputs (y1, y2, y3). The label is (t1, t2, t3), and the Cross-Entropy Error layer outputs the loss L.
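For reference, the two functions composed in this graph are the softmax function and the cross-entropy error. For the three-class case of Figure A.1 they take their standard forms:

$$y_k = \frac{\exp(a_k)}{\sum_{i=1}^{3} \exp(a_i)}, \qquad L = -\sum_{k=1}^{3} t_k \log y_k$$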
This appendix shows that the result of the backward propagation of the Softmax-with-Loss layer, that is, the gradient of the loss L with respect to the inputs (a1, a2, a3), is (y1 − t1, y2 − t2, y3 − t3), as shown in Figure...
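Before walking through the graph, we can sanity-check this claim numerically. The following is a minimal NumPy sketch, not code from the figure: the input values and the helper names softmax, cross_entropy_error, and numerical_gradient are illustrative assumptions. It compares a central-difference gradient of the composed loss against y − t:

```python
import numpy as np

def softmax(a):
    # Shift by the max before exponentiating, for numerical stability
    a = a - np.max(a)
    exp_a = np.exp(a)
    return exp_a / np.sum(exp_a)

def cross_entropy_error(y, t):
    # The tiny constant guards against log(0)
    return -np.sum(t * np.log(y + 1e-7))

def numerical_gradient(f, x, eps=1e-4):
    # Central-difference approximation of the gradient of f at x
    grad = np.zeros_like(x)
    for i in range(x.size):
        orig = x[i]
        x[i] = orig + eps
        f_plus = f(x)
        x[i] = orig - eps
        f_minus = f(x)
        x[i] = orig
        grad[i] = (f_plus - f_minus) / (2 * eps)
    return grad

a = np.array([0.3, 2.9, 4.0])  # illustrative input (a1, a2, a3)
t = np.array([0.0, 0.0, 1.0])  # one-hot label (t1, t2, t3)

y = softmax(a)    # Softmax layer output (y1, y2, y3)
analytic = y - t  # the claimed backward result
numeric = numerical_gradient(lambda x: cross_entropy_error(softmax(x), t), a)

print(analytic)  # approximately [ 0.018  0.245 -0.263]
print(numeric)   # matches the analytic gradient to several decimal places
```

The agreement between the two printed gradients illustrates why the combined layer is so convenient: once the forward pass has computed (y1, y2, y3), the backward pass requires only a subtraction of the label.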