Let's assume we can model a value $y$ as a function $f$ of $x$ plus some noise:
$$y = f(x) + \epsilon$$
Here, $\epsilon \sim \mathcal{N}(0, \sigma_\epsilon)$ is Gaussian noise with standard deviation $\sigma_\epsilon$.
This is similar to the assumption that we made in Chapter 3, Modeling with Linear Regression, for linear regression models. The main difference is that now we will put a prior distribution over the function $f$ itself. Gaussian processes can work as such a prior, thus we can write:
$$f(x) \sim \mathcal{GP}(\mu_x, K(x, x'))$$
Here, $\mathcal{GP}$ represents a Gaussian process distribution, with $\mu_x$ being the mean function and $K(x, x')$ the kernel, or covariance, function. We use the word function to indicate that, mathematically, the mean and covariance are infinite objects, even though, in practice, we always work with finite objects.
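To make the "finite objects" point concrete, here is a minimal sketch of drawing functions from a GP prior evaluated on a finite grid. It assumes a zero mean function and an exponentiated quadratic kernel; the helper name `exp_quad_kernel`, the length-scale `ell`, and the grid are illustrative choices, not something fixed by the text above.

```python
import numpy as np

def exp_quad_kernel(a, b, ell=1.0):
    """Exponentiated quadratic kernel: exp(-(a - b)^2 / (2 * ell^2)).

    Hypothetical helper for illustration; any valid kernel could be used.
    """
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return np.exp(-sq_dist / (2 * ell**2))

rng = np.random.default_rng(0)

# A finite set of points where we evaluate the (infinite) mean and covariance
x_grid = np.linspace(0, 10, 50)
mean = np.zeros(len(x_grid))          # assumed zero mean function
cov = exp_quad_kernel(x_grid, x_grid)  # 50 x 50 covariance matrix
cov += 1e-8 * np.eye(len(x_grid))      # small jitter for numerical stability

# On a finite grid, a GP prior is just a multivariate normal
prior_draws = rng.multivariate_normal(mean, cov, size=3)
```

Each row of `prior_draws` is one function sampled from the prior, evaluated at the 50 grid points. A larger `ell` would give smoother draws.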
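If the prior distribution is a GP and the likelihood is a normal distribution, then the posterior is also a GP and we can compute it analytically (shown here for a zero mean function):
$$f(X_*) \mid X_*, X, y \sim \mathcal{N}(\mu, \Sigma)$$
$$\mu = K_*^T K^{-1} y$$
$$\Sigma = K_{**} - K_*^T K^{-1} K_*$$
Here:
$K = K(X, X) + \sigma_\epsilon^2 I$, with $\sigma_\epsilon$ the standard deviation of the noise
$K_* = K(X, X_*)$
$K_{**} = K(X_*, X_*)$
$X$ represents the observed data points and $X_*$ represents the test points...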
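The analytic posterior can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the chapter's implementation: it assumes a zero mean function, an exponentiated quadratic kernel, and Gaussian noise whose variance is added to the diagonal of the training covariance. The names `exp_quad_kernel`, `gp_posterior`, `ell`, and `sigma_eps` are hypothetical.

```python
import numpy as np

def exp_quad_kernel(a, b, ell=1.0):
    """Exponentiated quadratic kernel (illustrative choice of kernel)."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return np.exp(-sq_dist / (2 * ell**2))

def gp_posterior(X, y, X_star, ell=1.0, sigma_eps=0.1):
    """Posterior mean and covariance of f at X_star, assuming a zero prior mean."""
    # Training covariance, with the noise variance on the diagonal
    K = exp_quad_kernel(X, X, ell) + sigma_eps**2 * np.eye(len(X))
    K_s = exp_quad_kernel(X, X_star, ell)        # train vs. test points
    K_ss = exp_quad_kernel(X_star, X_star, ell)  # test vs. test points

    # Use a Cholesky factor instead of an explicit matrix inverse
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # K^{-1} y
    mu = K_s.T @ alpha

    v = np.linalg.solve(L, K_s)
    Sigma = K_ss - v.T @ v  # K_ss - K_s^T K^{-1} K_s
    return mu, Sigma

# Toy data: a few observations of a sine curve
X = np.array([0.0, 2.0, 4.0, 6.0, 8.0])
y = np.sin(X)
X_star = np.linspace(0, 8, 20)

mu, Sigma = gp_posterior(X, y, X_star)
```

Solving with the Cholesky factor avoids forming the inverse of the training covariance explicitly, which tends to be more stable numerically. The diagonal of `Sigma` gives the posterior variance at each test point.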