Gaussian processes
We have just seen a brief introduction to how kernels can be used to build statistical models that describe arbitrary functions. Kernelized regression may sound a little like ad hoc trickery, and having to somehow specify the number and distribution of a set of knots is somewhat problematic. We will now see an alternative way to use kernels: doing inference directly in function space. This alternative is mathematically and computationally more appealing, and it is based on Gaussian processes.
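To make "inference directly in function space" a little more concrete, here is a minimal sketch that draws random functions from a zero-mean Gaussian process prior. It is only an illustration under stated assumptions. It uses plain NumPy rather than a probabilistic programming library. It assumes a squared exponential (RBF) kernel with a length scale of 1 and evaluates the functions on a grid of 100 points in [0, 10]. The names `exp_quad_kernel` and `sample_gp_prior` are hypothetical helpers written for this example. The key idea is that a function evaluated at a finite set of inputs is treated as a single draw from a multivariate Gaussian whose covariance matrix is built from the kernel.

```python
import numpy as np

def exp_quad_kernel(x, length_scale=1.0):
    """Hypothetical helper: squared exponential (RBF) kernel matrix.

    k(x_i, x_j) = exp(-(x_i - x_j)**2 / (2 * length_scale**2))
    """
    sq_dist = (x[:, None] - x[None, :]) ** 2  # pairwise squared distances
    return np.exp(-sq_dist / (2 * length_scale ** 2))

def sample_gp_prior(x, n_samples=3, length_scale=1.0, seed=0):
    """Hypothetical helper: draw functions from a zero-mean GP prior at inputs x.

    Returns an array of shape (n_samples, len(x)); each row is one function.
    """
    cov = exp_quad_kernel(x, length_scale)
    # A small "jitter" on the diagonal keeps the covariance numerically
    # positive definite so that the Cholesky factorization succeeds.
    cov = cov + 1e-8 * np.eye(len(x))
    chol = np.linalg.cholesky(cov)
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((len(x), n_samples))  # independent N(0, 1) draws
    # If z ~ N(0, I), then chol @ z ~ N(0, chol @ chol.T) = N(0, cov).
    return (chol @ z).T

x = np.linspace(0, 10, 100)
functions = sample_gp_prior(x, n_samples=3, length_scale=1.0)
print(functions.shape)  # (3, 100)
```

Each row of `functions` is a smooth random curve; a smaller length scale yields wigglier curves and a larger one yields smoother curves. No knots or basis functions are specified anywhere: the kernel alone determines what kind of functions are plausible a priori.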
Before introducing Gaussian processes, let's think about what a function is. We can think of a function as a mapping from a set of inputs to a set of outputs. One way to learn this mapping is to restrict it to a line, as we did in Chapter 4, Understanding and Predicting Data with Linear Regression Models, and then use the Bayesian machinery to infer the plausible values of the parameters controlling that line. But suppose we do not want to restrict our model...