In a supervised learning problem, there will always be a dataset, defined as a finite set of real vectors with m features each:
$$X = \{\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n\} \quad \text{where } \bar{x}_i \in \mathbb{R}^m$$
Considering that our approach is always probabilistic, we need to consider each sample x̄ as drawn from a statistical multivariate distribution D. For our purposes, it's also useful to impose a very important condition on the whole dataset X: we expect all samples to be independent and identically distributed (i.i.d.). This means that all samples belong to the same distribution D and that, considering an arbitrary subset of k values, the joint probability factorizes into the product of the individual probabilities:
$$p(\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k) = \prod_{i=1}^{k} p(\bar{x}_i)$$
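A minimal sketch of this setup, assuming NumPy: we draw n i.i.d. samples from a single multivariate distribution D (here an illustrative multivariate Gaussian), so the dataset X becomes an n × m matrix whose rows are the vectors x̄ᵢ. The sizes n and m are arbitrary choices, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

n, m = 100, 3  # n samples, m features (illustrative sizes)

# Draw n i.i.d. samples from one multivariate distribution D
# (a multivariate Gaussian with zero mean and identity covariance).
# Each row of X is one sample vector x_i in R^m.
X = rng.multivariate_normal(mean=np.zeros(m), cov=np.eye(m), size=n)

print(X.shape)
```

Because every row comes from the same call with the same parameters, the samples are identically distributed, and the generator produces each row independently of the others.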
The corresponding output values can be either numerical and continuous or categorical. In the first case, the process is called regression, while in the second, it is called classification. Examples of numerical outputs are:
$$\bar{y} = \{y_1, y_2, \ldots, y_n\} \quad \text{where } y_i \in (0, 1) \text{ or } y_i \in \mathbb{R}^+$$
Categorical examples are:
$$y_i \in \{\text{red}, \text{black}, \text{white}, \text{green}\} \quad \text{or} \quad y_i \in \{0, 1\}$$
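The two kinds of target can be sketched side by side, assuming NumPy; the label set and sample count below are illustrative, not prescribed by the text:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5  # illustrative number of samples

# Regression: numerical-continuous targets, one real value per sample
# (here bounded in (0, 1), matching the first example above).
y_regression = rng.uniform(0.0, 1.0, size=n)

# Classification: categorical targets drawn from a finite label set.
labels = np.array(["red", "black", "white", "green"])
y_classification = rng.choice(labels, size=n)

# Binary classification is the special case with the label set {0, 1}.
y_binary = rng.integers(0, 2, size=n)
```

The practical difference is that regression targets support arithmetic (distances, averages), while categorical targets are only compared for equality; this is why the two cases lead to different loss functions and algorithms.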
We define a generic regressor, a vector...