Bayesian treatment of neural networks
To set neural network learning in a Bayesian context, consider the error function for the regression case. It can be interpreted as arising from a Gaussian noise model for observing the given dataset D conditioned on the weights w. The corresponding likelihood function can be written as follows:

P(D \mid w, \beta) = \frac{1}{Z_D(\beta)} \exp\left(-\beta E_D(w)\right), \qquad E_D(w) = \frac{1}{2} \sum_{n=1}^{N} \left\{ t_n - y(x_n, w) \right\}^2
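As a minimal sketch of this Gaussian noise model (using JAX; the toy dataset, the small tanh network predict, and the value of β are illustrative assumptions, not taken from the text), the scaled sum-of-squares error βE_D(w) is the negative log-likelihood up to an additive constant:

```python
import jax
import jax.numpy as jnp

# Hypothetical toy 1-D regression data (illustrative, not from the text).
key = jax.random.PRNGKey(0)
x = jnp.linspace(-1.0, 1.0, 50).reshape(-1, 1)
t = jnp.sin(3.0 * x) + 0.2 * jax.random.normal(key, x.shape)

beta = 25.0  # noise precision, beta = 1 / sigma^2 (illustrative value)

def predict(w, x):
    # Placeholder network: one tanh hidden layer with 5 units,
    # parameterized by a flat 16-element weight vector w.
    h = 5
    W1, b1 = w[:h].reshape(1, h), w[h:2 * h]
    W2, b2 = w[2 * h:3 * h].reshape(h, 1), w[3 * h]
    return jnp.tanh(x @ W1 + b1) @ W2 + b2

def data_error(w):
    # E_D(w) = 0.5 * sum of squared residuals.
    return 0.5 * jnp.sum((t - predict(w, x)) ** 2)

def neg_log_likelihood(w):
    # beta * E_D(w) is the negative log of the Gaussian likelihood,
    # up to the additive constant log Z_D(beta) that does not depend on w.
    return beta * data_error(w)
```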
Here, σ² is the variance of the noise term, given by σ² = 1/β, Z_D(β) is a normalization constant, and y(x, w) represents the output of the probabilistic model. The regularization term can be considered as the negative log of the prior probability distribution over the parameters:

P(w \mid \alpha) = \frac{1}{Z_W(\alpha)} \exp\left(-\alpha E_W(w)\right), \qquad E_W(w) = \frac{1}{2} \sum_i w_i^2
Here, σ_w² is the variance of the prior distribution of weights, given by σ_w² = 1/α. It can be shown using Bayes' theorem that the objective function M(w) = βE_D(w) + αE_W(w) then corresponds to the negative log of the posterior distribution of the parameters w:

P(w \mid D, \alpha, \beta) = \frac{1}{Z_M(\alpha, \beta)} \exp\left(-M(w)\right), \qquad M(w) = \beta E_D(w) + \alpha E_W(w)
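A short continuation of the sketch above (the value of α and the helper make_M are illustrative, not from the text): the quadratic regularizer supplies the log prior up to a constant, and combining it with the data error gives M(w), whose negative exponential is proportional to the posterior.

```python
import jax.numpy as jnp

alpha = 1.0  # prior precision, alpha = 1 / sigma_w^2 (illustrative value)

def weight_error(w):
    # E_W(w) = 0.5 * ||w||^2; -alpha * E_W(w) is the log prior up to a constant.
    return 0.5 * jnp.sum(w ** 2)

def make_M(E_D, E_W, alpha, beta):
    # M(w) = beta * E_D(w) + alpha * E_W(w). By Bayes' theorem, exp(-M(w)) is
    # proportional to the posterior P(w | D), so minimizing M(w) yields the
    # most probable weights w_MP.
    def M(w):
        return beta * E_D(w) + alpha * E_W(w)
    return M
```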
In the neural network case, we are interested in the local maxima of P(w | D). The posterior is then approximated as a Gaussian around each maximum w_MP, as follows:

P(w \mid D) \approx \frac{1}{Z^{*}} \exp\left( -M(w_{MP}) - \frac{1}{2} (w - w_{MP})^{T} A \, (w - w_{MP}) \right)
Here, A = ∇∇M(w), evaluated at w_MP, is the matrix of second derivatives (the Hessian) of M(w) with respect to w, and it represents the inverse of the covariance matrix of the Gaussian approximation to the posterior.
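A brief sketch of this Gaussian (Laplace) approximation, continuing the code above: it reuses data_error, weight_error, alpha, beta, and make_M from the earlier sketches, and the plain gradient-descent optimizer, step size, and iteration count are arbitrary illustrative choices, not prescribed by the text.

```python
import jax
import jax.numpy as jnp

# Build the objective M(w) from the earlier sketches.
M = make_M(data_error, weight_error, alpha, beta)

# Find a local minimum w_MP of M(w) with plain gradient descent
# (an illustrative optimizer choice).
w = 0.1 * jax.random.normal(jax.random.PRNGKey(1), (16,))
grad_M = jax.jit(jax.grad(M))
for _ in range(5000):
    w = w - 1e-4 * grad_M(w)
w_MP = w

# Laplace approximation: A is the Hessian of M at w_MP and plays the role
# of the inverse covariance of the Gaussian approximation to the posterior.
A = jax.hessian(M)(w_MP)
posterior_cov = jnp.linalg.inv(A)
```

In this approximation, posterior_cov describes the uncertainty in the fitted weights around w_MP.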