Bayesian treatment of neural networks
To set neural network learning in a Bayesian context, consider the sum-of-squares error function $E_D(w)$ for the regression case. It can be interpreted as arising from a Gaussian noise model for the observed dataset $D$, conditioned on the weights $w$. This is precisely the likelihood function, which can be written as follows:

$$P(D \mid w, \beta) = \frac{1}{Z_D(\beta)} \exp\bigl(-\beta E_D(w)\bigr)$$

Here, $\sigma_D^2$ is the variance of the noise term, given by $\sigma_D^2 = 1/\beta$, $Z_D(\beta)$ is a normalization constant, and $P(D \mid w, \beta)$ represents a probabilistic model of the observed data.
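As an illustration of this correspondence, the following sketch (using an assumed toy dataset and a trivial stand-in model rather than an actual neural network; all names and values are hypothetical) checks numerically that the Gaussian log-likelihood differs from $-\beta E_D(w)$ only by a constant that does not depend on the weights:

```python
# A minimal sketch, assuming a toy linear model in place of a neural network.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=50)
sigma_D = 0.1                                  # assumed noise standard deviation
y = 0.5 - 1.2 * x + rng.normal(0, sigma_D, size=x.shape)
beta = 1.0 / sigma_D ** 2                      # noise precision beta = 1 / sigma_D^2

def f(x, w):
    """Hypothetical stand-in for a neural network's prediction."""
    return w[0] + w[1] * x

def E_D(w):
    """Sum-of-squares error function E_D(w) = 1/2 * sum_n (y_n - f(x_n; w))^2."""
    return 0.5 * np.sum((y - f(x, w)) ** 2)

def log_likelihood(w):
    """log P(D | w, beta) under Gaussian noise with precision beta."""
    log_Z_D = 0.5 * len(y) * np.log(2 * np.pi / beta)   # log of the normalizer Z_D(beta)
    return -beta * E_D(w) - log_Z_D

w1, w2 = np.array([0.4, -1.0]), np.array([0.6, -1.3])
# The change in log-likelihood between two weight settings depends only on E_D:
print(log_likelihood(w1) - log_likelihood(w2))
print(-beta * (E_D(w1) - E_D(w2)))             # same value
```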
represents a probabilistic model. The regularization term can be considered as the log of the prior probability distribution over the parameters:

Here, is the variance of the prior distribution of weights. It can be easily shown using Bayes' theorem that the objective function M(w) then corresponds to the posterior distribution of parameters w:

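The sketch below (again with an assumed toy dataset and model; the precisions $\alpha$ and $\beta$ are arbitrary illustrative values) verifies that differences in the unnormalized log posterior, $\log P(D \mid w, \beta) + \log P(w \mid \alpha)$, match differences in $-M(w)$, so maximizing the posterior is the same as minimizing the regularized objective:

```python
# A minimal sketch, assuming a toy linear model and arbitrary precisions alpha, beta.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=50)
y = 0.5 - 1.2 * x + rng.normal(0, 0.1, size=x.shape)
alpha, beta = 1.0, 100.0                      # assumed prior and noise precisions

def f(x, w):                                  # hypothetical stand-in for a network
    return w[0] + w[1] * x

def E_D(w):                                   # sum-of-squares error
    return 0.5 * np.sum((y - f(x, w)) ** 2)

def E_W(w):                                   # weight term behind the regularizer
    return 0.5 * np.dot(w, w)

def log_likelihood(w):                        # log P(D | w, beta), Gaussian noise
    return -beta * E_D(w) - 0.5 * len(y) * np.log(2 * np.pi / beta)

def log_prior(w):                             # log P(w | alpha), zero-mean Gaussian prior
    return -alpha * E_W(w) - 0.5 * len(w) * np.log(2 * np.pi / alpha)

def M(w):                                     # regularized objective M(w) = beta*E_D + alpha*E_W
    return beta * E_D(w) + alpha * E_W(w)

w1, w2 = np.array([0.4, -1.0]), np.array([0.6, -1.3])
# Differences of the unnormalized log posterior equal differences of -M(w):
print((log_likelihood(w1) + log_prior(w1)) - (log_likelihood(w2) + log_prior(w2)))
print(-(M(w1) - M(w2)))                       # same value
```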
In the neural network case, we are interested in the local maxima of $P(w \mid D, \alpha, \beta)$. The posterior is then approximated as a Gaussian around each maximum $w_{\mathrm{MP}}$, as follows:

$$P(w \mid D) \simeq \frac{1}{Z^*} \exp\Bigl(-M(w_{\mathrm{MP}}) - \tfrac{1}{2}(w - w_{\mathrm{MP}})^{\mathrm{T}} A\, (w - w_{\mathrm{MP}})\Bigr)$$

Here, $A$ is the matrix of second derivatives of $M(w)$ with respect to $w$, evaluated at $w_{\mathrm{MP}}$, and represents the inverse of the covariance matrix of the Gaussian approximation.
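A minimal sketch of this Gaussian (Laplace-style) approximation is given below, assuming a tiny nonlinear toy model in place of a real network and arbitrary precisions; the posterior mode $w_{\mathrm{MP}}$ is found by minimizing $M(w)$ with scipy, and $A$ is estimated by finite differences:

```python
# A minimal sketch: locate a posterior mode w_MP by minimizing M(w), then form the
# Gaussian approximation with covariance A^{-1}, A being the Hessian of M at w_MP.
# The model, data, and precisions are assumed for illustration only.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=80)
y = 0.8 * np.tanh(1.5 * x) + rng.normal(0, 0.1, size=x.shape)
alpha, beta = 1.0, 100.0                      # assumed prior and noise precisions

def f(x, w):
    """Tiny nonlinear model standing in for a neural network."""
    return w[0] * np.tanh(w[1] * x)

def M(w):
    """M(w) = beta * E_D(w) + alpha * E_W(w): negative log posterior up to a constant."""
    E_D = 0.5 * np.sum((y - f(x, w)) ** 2)
    E_W = 0.5 * np.dot(w, w)
    return beta * E_D + alpha * E_W

# One local maximum of the posterior corresponds to one local minimum of M(w).
w_mp = minimize(M, x0=np.array([0.5, 1.0])).x

def hessian(fun, w, eps=1e-4):
    """Finite-difference estimate of the matrix of second derivatives of fun at w."""
    d = len(w)
    H = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            e_i, e_j = np.eye(d)[i] * eps, np.eye(d)[j] * eps
            H[i, j] = (fun(w + e_i + e_j) - fun(w + e_i - e_j)
                       - fun(w - e_i + e_j) + fun(w - e_i - e_j)) / (4 * eps ** 2)
    return H

A = hessian(M, w_mp)                          # second-derivative matrix of M at w_MP
cov = np.linalg.inv(A)                        # covariance of the Gaussian approximation
print("w_MP:", w_mp)
print("posterior covariance:\n", cov)
```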