When working with non-linear problems, it's useful to transform the original vectors by projecting them into an (often higher-dimensional) space where they can be linearly separated. Suppose we have a mapping function from the input sample space X to another space, V:
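Writing the mapping as φ (an assumed symbol; any name would do), and adding one illustrative instance for two-dimensional inputs that is a hypothetical choice rather than something prescribed here:

```latex
\phi: X \rightarrow V
\qquad \text{for example} \qquad
\phi(x_1, x_2) = \left( x_1,\; x_2,\; x_1^2 + x_2^2 \right)
```

In this example X is the plane and V is three-dimensional; the extra coordinate is the squared distance from the origin, which is enough to separate points lying on concentric rings with a single plane.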
We saw a similar approach when we discussed polynomial regression. SVMs adopt the same strategy, assuming that once the samples have been projected onto V they can be separated easily. However, this introduces a complexity problem that we need to overcome. The mathematical formulation, in fact, becomes as follows:
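As a sketch only, assuming the usual hard-margin notation (weight vector w, bias b, labels y_i in {-1, +1}, and φ as the name of the mapping; none of these symbols is defined in this section), the problem can be written as:

```latex
\min_{w,\, b} \; \frac{1}{2} \lVert w \rVert^2
\quad \text{subject to} \quad
y_i \left( w^T \phi(x_i) + b \right) \ge 1 \quad \forall i
```

The only difference from the linear case is that each sample x_i is replaced by φ(x_i), so w now lives in V rather than in X. A soft-margin version would add slack variables and a penalty term, which are left out here.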
Every feature vector is now passed through a non-linear function that can completely reshape the geometry of the problem. However, introducing such a function generally increases the computational cost, because every operation on the samples must be carried out in the (possibly much larger) space V, and this may discourage you from using the approach...
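The following minimal sketch illustrates both points under stated assumptions. The ring-shaped toy data, the mapping `phi`, the separating plane with threshold 4, and all helper names are hypothetical choices made for illustration. It shows that no straight line found by a coarse search separates the two rings in X, that a single plane does so in V, and how quickly the number of features grows when an explicit polynomial mapping is used.

```python
import math
import random


def make_rings(n_per_class=100, seed=0):
    """Two noisy concentric rings: class -1 near radius 1, class +1 near radius 3."""
    rng = random.Random(seed)
    samples, labels = [], []
    for label, radius in ((-1, 1.0), (1, 3.0)):
        for _ in range(n_per_class):
            angle = rng.uniform(0.0, 2.0 * math.pi)
            r = radius + rng.uniform(-0.3, 0.3)
            samples.append((r * math.cos(angle), r * math.sin(angle)))
            labels.append(label)
    return samples, labels


def phi(x):
    """Hypothetical mapping from X = R^2 to V = R^3: append the squared norm."""
    x1, x2 = x
    return (x1, x2, x1 * x1 + x2 * x2)


def accuracy(points, labels, w, b):
    """Fraction of points whose sign(w . p + b) matches the label."""
    correct = 0
    for p, y in zip(points, labels):
        score = sum(wi * pi for wi, pi in zip(w, p)) + b
        if (1 if score > 0 else -1) == y:
            correct += 1
    return correct / len(points)


def best_line_accuracy(points, labels):
    """Coarse grid search over lines in X; returns the best accuracy found."""
    best = 0.0
    for deg in range(0, 360, 5):
        t = math.radians(deg)
        w = (math.cos(t), math.sin(t))
        for k in range(-40, 41):
            best = max(best, accuracy(points, labels, w, k / 10.0))
    return best


def poly_feature_count(n_features, degree):
    """Number of monomials of total degree <= degree in n_features variables."""
    return math.comb(n_features + degree, degree)


if __name__ == "__main__":
    samples, labels = make_rings()

    # In X, even the best line from the grid search misclassifies many points.
    print("best line in X:", best_line_accuracy(samples, labels))

    # In V, the plane z = 4 (w = (0, 0, 1), b = -4) lies between the two rings.
    mapped = [phi(x) for x in samples]
    print("plane in V:", accuracy(mapped, labels, (0.0, 0.0, 1.0), -4.0))

    # Cost of an explicit polynomial mapping as the input dimension grows.
    for n in (2, 10, 100):
        print(n, "inputs, degree 3 ->", poly_feature_count(n, 3), "features")
```

The last loop is where the complexity concern becomes concrete: with an explicit degree-3 polynomial mapping, the dimension of V grows roughly with the cube of the number of input features, and every dot product has to be computed over all of those coordinates.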