The kernel trick
To learn how the kernel trick allows us to do feature construction implicitly and efficiently, we will first have to learn what a kernel is.
What is a kernel?
The simplest way to think about a kernel is as a mapping that takes two vectors as input and returns a scalar, that is, a mapping $\mathbb{R}^N \times \mathbb{R}^N \to \mathbb{R}$. A kernel is therefore a function $K(\mathbf{x}, \mathbf{z})$ whose input vectors are $\mathbf{x} \in \mathbb{R}^N$ and $\mathbf{z} \in \mathbb{R}^N$, and whose value $K(\mathbf{x}, \mathbf{z})$ is a real number. The inner product $\mathbf{x}^T \mathbf{z}$ is thus an example of a kernel function.
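As a minimal sketch of this definition, the inner product can be written as a function that accepts two vectors of equal length and returns a single real number. The function name `linear_kernel` and the sample vectors are illustrative choices, not taken from the text.

```python
from typing import Sequence


def linear_kernel(x: Sequence[float], z: Sequence[float]) -> float:
    """Inner-product kernel: maps two equal-length vectors to one real number."""
    if len(x) != len(z):
        raise ValueError("x and z must have the same dimension")
    # Sum of element-wise products, i.e. the inner product of x and z.
    return float(sum(xi * zi for xi, zi in zip(x, z)))


# Illustrative vectors (assumed values, chosen only for the example).
x = [1.0, 2.0, 3.0]
z = [4.0, 5.0, 6.0]

value = linear_kernel(x, z)  # 1*4 + 2*5 + 3*6 = 32.0
```

Whatever the dimension of the inputs, the output is always a single scalar, which is the defining shape of a kernel described above.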
That is the high-level mathematical definition of a kernel, but what is the intuition behind it? A kernel function applied to two vectors $\mathbf{x}$ and $\mathbf{z}$ is typically used to measure the similarity between them. Consequently, we usually want the kernel to take its largest values when $\mathbf{x}$ and $\mathbf{z}$ are most similar and its smallest values when they are least similar, decreasing smoothly and monotonically between those two extremes.
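One way to see this similarity behaviour concretely is with a Gaussian (radial basis function) kernel. This kernel is not named in the text above. It is used here only as an assumed, commonly cited example of a function that peaks when the two vectors coincide and decays smoothly as they move apart. The name `gaussian_kernel`, the bandwidth parameter `sigma`, and the sample points are hypothetical.

```python
import math
from typing import Sequence


def gaussian_kernel(x: Sequence[float], z: Sequence[float], sigma: float = 1.0) -> float:
    """Similarity that equals 1 when x == z and decays with squared distance."""
    if len(x) != len(z):
        raise ValueError("x and z must have the same dimension")
    # Squared Euclidean distance between the two vectors.
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


# Illustrative points (assumed values): one reference, one nearby, one far away.
x = [0.0, 0.0]
near = [0.1, 0.0]
far = [3.0, 0.0]

k_same = gaussian_kernel(x, x)     # identical vectors give the maximum value
k_near = gaussian_kernel(x, near)  # slightly smaller
k_far = gaussian_kernel(x, far)    # much smaller, approaching 0
```

Under these assumptions the values are ordered `k_same > k_near > k_far`, matching the intuition that the kernel should be largest for the most similar vectors and fall off monotonically as similarity decreases.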