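Let's assume we can model a value $y$ as a function $f$ of $x$ plus some noise:
$$y = f(x) + \epsilon$$
Here, $\epsilon \sim \mathcal{N}(0, \sigma_\epsilon)$ is Gaussian noise with standard deviation $\sigma_\epsilon$.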
This is similar to the assumption that we made in Chapter 3, Modeling with Linear Regression, for linear regression models. The main difference is that now we will put a prior distribution over $f$. Gaussian processes can work as such a prior, thus we can write:
$$f(x) \sim \mathcal{GP}(\mu_x, K(x, x'))$$
Here, $\mathcal{GP}$ represents a Gaussian process distribution, with $\mu_x$ being the mean function and $K(x, x')$ the kernel, or covariance, function. We use the word function to indicate that, mathematically, the mean and covariance are infinite objects, even though, in practice, we always work with finite objects.
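To make the "finite objects" remark concrete, the following is a minimal sketch of drawing functions from a GP prior evaluated on a finite grid. It assumes a zero mean function and an exponentiated quadratic kernel. The helper `exp_quad_kernel`, the length scale `ell`, the grid, and the jitter value are all illustrative choices, not taken from the text.

```python
import numpy as np

def exp_quad_kernel(a, b, ell=1.0):
    """Exponentiated quadratic kernel (an assumed choice of K)."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return np.exp(-sq_dist / (2 * ell**2))

rng = np.random.default_rng(0)

# On a finite grid, the mean function becomes a vector and the
# kernel becomes a covariance matrix.
x_grid = np.linspace(0, 10, 50)
mean = np.zeros(len(x_grid))      # zero mean function (assumption)
jitter = 1e-6                     # small diagonal term for numerical stability
cov = exp_quad_kernel(x_grid, x_grid) + jitter * np.eye(len(x_grid))

# Sample from N(mean, cov) through the Cholesky factor of cov.
chol = np.linalg.cholesky(cov)
n_draws = 3
prior_draws = mean + (chol @ rng.standard_normal((len(x_grid), n_draws))).T
```

Each row of `prior_draws` is one function drawn from the prior, evaluated at the 50 grid points. A larger `ell` should give smoother draws.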
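If the prior distribution is a GP and the likelihood is a normal distribution, then the posterior is also a GP and we can compute it analytically. For a zero mean function:
$$p(f(X_*) \mid X_*, X, y) \sim \mathcal{N}(\mu, \Sigma)$$
$$\mu = K_*^T K^{-1} y$$
$$\Sigma = K_{**} - K_*^T K^{-1} K_*$$
Here, with $\sigma_\epsilon$ the standard deviation of the noise and $I$ the identity matrix:
$$K = K(X, X) + \sigma_\epsilon^2 I$$
$$K_* = K(X, X_*)$$
$$K_{**} = K(X_*, X_*)$$
$X$ represents the observed data points and $X_*$ represents the test points...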
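As a rough illustration of the analytic posterior, here is a minimal NumPy sketch of the standard GP regression posterior mean and covariance for a zero-mean prior with Gaussian noise. The kernel (`exp_quad_kernel`), the function name `gp_posterior`, the hyperparameter values, and the toy data are assumptions made for this example only.

```python
import numpy as np

def exp_quad_kernel(a, b, ell=1.0):
    """Exponentiated quadratic kernel (an assumed choice of K)."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return np.exp(-sq_dist / (2 * ell**2))

def gp_posterior(X, y, X_star, sigma_eps=0.1, ell=1.0):
    """Posterior mean and covariance of f at X_star (zero-mean prior)."""
    # Covariance among observed points, plus noise variance on the diagonal
    K = exp_quad_kernel(X, X, ell) + sigma_eps**2 * np.eye(len(X))
    # Covariance between observed and test points
    K_s = exp_quad_kernel(X, X_star, ell)
    # Covariance among test points
    K_ss = exp_quad_kernel(X_star, X_star, ell)
    # Solve linear systems rather than forming the inverse of K explicitly
    mu = K_s.T @ np.linalg.solve(K, y)
    Sigma = K_ss - K_s.T @ np.linalg.solve(K, K_s)
    return mu, Sigma

# Toy data (hypothetical): four noiseless samples of a sine curve
X = np.array([0.0, 1.0, 2.0, 3.0])
y = np.sin(X)
X_star = np.linspace(0, 3, 7)

mu, Sigma = gp_posterior(X, y, X_star)
```

The diagonal of `Sigma` gives the posterior variance of the function at each test point. It should be smallest at test points that coincide with observed points and grow in between them.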