The Gaussian process (GP) can be thought of as an alternative Bayesian approach to regression problems. They are also referred to as infinite dimensional Gaussian distributions. GP defines a priori over functions that can be converted into a posteriori once we have observed a few data points. Although it doesn’t seem possible to define distributions over functions, it turns out that we only need to define distributions over a function's values at observed data points.
Formally, let's say that we observed a function, , at n values
as
. The function is a GP if all of the values,
, are jointly Gaussian, with a mean of
and a covariance of
given by
. Here, the
function defines how two variables are related to each other. We will discuss different kinds of kernels later in this section...