As briefly mentioned when introducing the architecture, GoogLeNet has two auxiliary branches that also lead to predictions; they are used only during training and removed afterward.
Their purpose is to improve the propagation of the loss through the network during training. Indeed, deeper CNNs are often plagued by vanishing gradients. Many operations used in neural networks (the sigmoid activation, for instance) have derivatives with small amplitudes (below one). As a result, the higher the number of layers, the smaller the product of these derivatives becomes during backpropagation (the more values below one are multiplied together, the closer to zero the result gets). The gradient often simply vanishes, that is, shrinks toward zero, by the time it reaches the first layers. Since the gradient values are directly used to update the parameters, these layers cannot learn effectively if the gradient is too small.
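To make this shrinking effect concrete, the following short NumPy sketch (a toy illustration, not code from the chapter) chains the sigmoid's derivative, whose maximum value is 0.25, across several layers:

```python
import numpy as np

def sigmoid_derivative(x):
    """Derivative of the sigmoid; its maximum value is 0.25, reached at x = 0."""
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s)

# Even in the best case (a factor of 0.25 at every layer), the chained
# gradient factor shrinks geometrically with the number of layers:
best_case = sigmoid_derivative(0.0)  # 0.25
for depth in (5, 10, 20):
    print(f"{depth} layers -> gradient factor ~ {best_case ** depth:.1e}")
# 5 layers  -> gradient factor ~ 9.8e-04
# 10 layers -> gradient factor ~ 9.5e-07
# 20 layers -> gradient factor ~ 9.1e-13
```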
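By attaching classifiers to intermediate layers, GoogLeNet injects additional loss (and therefore gradient) signals partway through the network, so the earlier layers receive stronger updates. The following is a minimal Keras-style sketch of such an auxiliary branch, assuming `mid_features` is an intermediate feature tensor from the main body of the network; the tensor names and the commented wiring are placeholders for illustration, not the actual implementation (the original paper down-weights the auxiliary losses with a factor of 0.3):

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def auxiliary_branch(mid_features, num_classes, name):
    """Auxiliary classifier following the GoogLeNet recipe:
    average pooling, 1x1 convolution, dense layer, dropout, softmax."""
    x = layers.AveragePooling2D(pool_size=5, strides=3)(mid_features)
    x = layers.Conv2D(128, kernel_size=1, activation='relu')(x)
    x = layers.Flatten()(x)
    x = layers.Dense(1024, activation='relu')(x)
    x = layers.Dropout(0.7)(x)
    return layers.Dense(num_classes, activation='softmax', name=name)(x)

# Hypothetical wiring (inputs, mid_features_1/2, and main_predictions would
# come from the main GoogLeNet body, which is not shown here):
# aux1 = auxiliary_branch(mid_features_1, num_classes=1000, name='aux1')
# aux2 = auxiliary_branch(mid_features_2, num_classes=1000, name='aux2')
# model = Model(inputs, [main_predictions, aux1, aux2])
# model.compile(optimizer='sgd',
#               loss='sparse_categorical_crossentropy',
#               # Down-weight the auxiliary losses so they only assist training:
#               loss_weights=[1.0, 0.3, 0.3],
#               metrics=['accuracy'])
```

At inference time, the auxiliary outputs are simply discarded, for instance by building a second Model that exposes only the main prediction head.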