What if the classes in our data cannot be separated by a linear decision boundary? In such a case, we say the data is not linearly separable.
The basic idea for dealing with data that is not linearly separable is to create nonlinear combinations of the original features. This amounts to projecting the data into a higher-dimensional space (for example, from 2D to 3D) in which the classes become linearly separable.
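To make the idea of projecting into a higher-dimensional space concrete, here is a minimal sketch. It assumes a hypothetical mapping that sends (x1, x2) to (x1, x2, x1² + x2²) and a synthetic two-class dataset of an inner disc surrounded by an outer ring. Both the mapping and the data are illustrative choices, not something prescribed by the text.

```python
import numpy as np


def phi(X):
    """Hypothetical mapping (an assumption): (x1, x2) -> (x1, x2, x1^2 + x2^2)."""
    X = np.asarray(X, dtype=float)
    return np.column_stack([X[:, 0], X[:, 1], X[:, 0] ** 2 + X[:, 1] ** 2])


# Synthetic 2D data (an assumption): class 0 lies inside the unit circle,
# class 1 lies on a ring with radius between 2 and 3. No straight line
# in the 2D plane can separate the two classes.
rng = np.random.default_rng(0)
angles = rng.uniform(0.0, 2.0 * np.pi, size=100)
radii = np.concatenate([rng.uniform(0.0, 1.0, 50), rng.uniform(2.0, 3.0, 50)])
X = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])
y = np.concatenate([np.zeros(50, dtype=int), np.ones(50, dtype=int)])

# Project the data from 2D into 3D.
Z = phi(X)

# In the 3D space the third coordinate equals the squared radius, so the
# flat plane z3 = 2.5 separates the classes (class 0 has z3 < 1, class 1
# has z3 >= 4).
y_pred = (Z[:, 2] > 2.5).astype(int)
separable = bool(np.all(y_pred == y))
print("Linearly separable after mapping:", separable)
```

The separating plane in 3D corresponds to a circle when projected back onto the original 2D feature space, which is why a linear boundary in the new space can act as a nonlinear boundary in the old one.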
This concept is illustrated in the following diagram:
The preceding diagram shows how to find linear hyperplanes in higher-dimensional spaces. If data in its original input space (left) cannot be linearly separated, we can apply a mapping function ϕ(.) that projects the data from 2D into a 3D (or a high-dimensional) space. In this higher-dimensional space, we may find that there is now a linear decision...