Let's consider a generic linear classification problem with two classes. The following figure shows an example:
![](https://static.packt-cdn.com/products/9781785889622/graphics/assets/0dd53ff2-a2c2-4e03-bc96-71ad5c4837ad.png)
Our goal is to find an optimal hyperplane that separates the two classes. In multi-class problems, the one-vs-all strategy is normally adopted, so the discussion can focus on binary classification only. Suppose we have the following dataset:
![](https://static.packt-cdn.com/products/9781785889622/graphics/assets/832e6e01-15e0-447a-b591-dc79770c9f1a.png)
This dataset is associated with the following target set:
![](https://static.packt-cdn.com/products/9781785889622/graphics/assets/246a5f0a-c886-4530-9ce9-10d60f7ba27d.png)
We can now define a weight vector made of m continuous components:

$$w = (w_1, w_2, \ldots, w_m), \quad w_i \in \mathbb{R}$$
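We can also define the quantity z as the dot product of the weight vector w and a sample x with m components:

$$z = w \cdot x = \sum_{i=1}^{m} w_i x_i$$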
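If x is a variable, z is the value determined by the hyperplane equation. Therefore, if the coefficients w have been determined correctly, the sign of z identifies the class of x:

$$\operatorname{sign}(z) = \begin{cases} +1 & \text{if } x \text{ belongs to the first class} \\ -1 & \text{if } x \text{ belongs to the second class} \end{cases}$$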
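As a minimal sketch of this decision rule, the following snippet computes z for a few points and assigns a class from its sign. It assumes that z is the dot product of w and x, with no bias term. The dataset `X` and the weight vector `w` are hypothetical values chosen by hand for illustration; they are not taken from the figures above and are not the result of any training procedure.

```python
import numpy as np

# Hypothetical toy dataset: 4 samples, each with m = 2 features
X = np.array([
    [2.0, 1.0],
    [1.5, 2.5],
    [-1.0, -2.0],
    [-2.5, -0.5],
])

# Hypothetical weight vector, picked by hand (not learned)
w = np.array([1.0, 1.0])

# z_i = w . x_i for every sample (one dot product per row of X)
z = X @ w

# Assign +1 when z is positive, -1 otherwise.
# Points with z == 0 lie exactly on the hyperplane; this sketch
# arbitrarily places them in the -1 class.
labels = np.where(z > 0, 1, -1)

print(z)
print(labels)
```

With these values, the first two points fall on the positive side of the hyperplane and the last two on the negative side. A different w would rotate the hyperplane and could change the assignment, which is why w has to be optimized.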
Now we must find a way to optimize w, in order to reduce the classification error. If such a combination exists (with a certain error threshold), we say...