The Gaussian process (GP) can be thought of as an alternative Bayesian approach to regression problems. GPs are also referred to as infinite-dimensional Gaussian distributions. A GP defines a prior over functions that can be converted into a posterior once we have observed a few data points. Although it may not seem possible to define a distribution over functions, it turns out that we only need to define a distribution over the function's values at the observed data points.
Formally, let's say that we observe a function, $f$, at $n$ values $x_1, \dots, x_n$ as $f(x_1), \dots, f(x_n)$. The function is a GP if all of the values, $f(x_1), \dots, f(x_n)$, are jointly Gaussian, with a mean of $\mu$ and a covariance of $\Sigma$ given by $\Sigma_{ij} = k(x_i, x_j)$. Here, the kernel function $k$ defines how two variables are related to each other. We will discuss different kinds of kernels later in this section...
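As a minimal sketch of this definition, the following NumPy code builds a covariance matrix by evaluating a kernel on every pair of inputs and then draws a few sets of function values from the resulting joint Gaussian. It assumes a zero mean and a squared-exponential (RBF) kernel, neither of which is fixed by the text above. The names `rbf_kernel` and `sample_gp_prior`, and the `length_scale`, `variance`, and `jitter` parameters, are illustrative choices rather than part of any particular library.

```python
import numpy as np


def rbf_kernel(x1, x2, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel for 1-D inputs (an assumed kernel choice).

    Returns the matrix K with K[i, j] = k(x1[i], x2[j]).
    """
    sq_dist = (x1[:, None] - x2[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / length_scale ** 2)


def sample_gp_prior(x, n_samples=3, jitter=1e-6, seed=0):
    """Draw function values at the points x from a zero-mean GP prior."""
    rng = np.random.default_rng(seed)
    n = len(x)
    # Covariance of the joint Gaussian over the n function values.
    # A small jitter on the diagonal keeps the Cholesky factorization stable.
    cov = rbf_kernel(x, x) + jitter * np.eye(n)
    chol = np.linalg.cholesky(cov)
    # If z ~ N(0, I), then chol @ z ~ N(0, cov); the mean is assumed to be zero.
    z = rng.standard_normal((n, n_samples))
    return (chol @ z).T


# Hypothetical input locations; each row of `samples` is one sampled function.
x = np.linspace(-3.0, 3.0, 50)
samples = sample_gp_prior(x)
```

Each row of `samples` holds one draw of the function's values at the 50 input points. Plotting the rows against `x` gives smooth curves, because the RBF kernel assigns high covariance to nearby inputs.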