If we expand equation 5.6, we get the following:
Both terms in this expression look very similar. The first one, the lppd (log point-wise predictive density), computes the mean likelihood over the posterior samples. We do this for each data point, then take the logarithm and sum over all data points. Compare this term with equations 5.3 and 5.4: it is essentially the deviance, but computed taking the posterior into account. Thus, if we accept that the log-likelihood is a good way to measure how well a model fits the data, then computing it from the posterior is a logical path for a Bayesian approach. As we already said, the lppd of the observed data is an overestimate of the lppd for future data. Thus, we introduce a second term to correct for this overestimation. The second term computes the variance of the log-likelihood over...
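To make the two terms concrete, here is a minimal NumPy sketch, assuming the log-likelihood has already been evaluated as a matrix `log_lik` of shape (S, n), with entry `[s, i]` holding log p(y_i | θ^s) for posterior draw s and data point i. The function name `waic_terms` is hypothetical. It assumes the second term is the sample variance (`ddof=1`) of the log-likelihood across posterior draws, and it reports WAIC on the deviance scale as -2 (lppd - p_waic); both are conventions chosen for this illustration rather than taken from the text. The "posterior" in the usage part is simulated directly instead of being sampled from a real model.

```python
import numpy as np

def waic_terms(log_lik):
    """Return (lppd, p_waic, waic) from a log-likelihood matrix.

    log_lik[s, i] holds log p(y_i | theta^s) for posterior draw s
    and data point i, so the array has shape (S, n).
    """
    log_lik = np.asarray(log_lik, dtype=float)
    S = log_lik.shape[0]
    # lppd: for each data point, the log of the mean likelihood over the
    # S draws. A max-shifted log-sum-exp avoids underflow in exp().
    m = log_lik.max(axis=0)
    log_mean_lik = m + np.log(np.exp(log_lik - m).sum(axis=0)) - np.log(S)
    lppd = log_mean_lik.sum()
    # Correction term: for each data point, the sample variance (ddof=1)
    # of the log-likelihood across draws, summed over data points.
    p_waic = np.var(log_lik, axis=0, ddof=1).sum()
    # Deviance scale: lower values indicate better expected predictive fit.
    waic = -2.0 * (lppd - p_waic)
    return lppd, p_waic, waic

# Usage: a toy normal model with known sigma = 1, where the "posterior"
# draws of mu are simulated directly (an illustrative assumption).
rng = np.random.default_rng(0)
y = rng.normal(0.0, 1.0, size=20)
mu_draws = rng.normal(y.mean(), 1.0 / np.sqrt(y.size), size=1000)
# Normal log-density log N(y_i | mu_s, 1), broadcast to shape (1000, 20)
log_lik = -0.5 * np.log(2 * np.pi) - 0.5 * (y[None, :] - mu_draws[:, None]) ** 2
lppd, p_waic, waic = waic_terms(log_lik)
print(f"lppd={lppd:.2f}  p_waic={p_waic:.2f}  waic={waic:.2f}")
```

Working in log space with a log-sum-exp keeps the lppd finite even when individual likelihoods are far too small to represent directly, which is common once the number of data points grows.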