A typical semi-supervised scenario is not very different from a supervised one. Let's suppose we have a data generating process, p_data(x, y), defined over pairs made up of a sample x and its label y.
However, unlike in the supervised case, only a limited number N of the samples drawn from p_data come with a label; they form the labeled set {(x_i, y_i)}, with i = 1, ..., N.
Alongside them, we have a larger number M of unlabeled samples drawn from the marginal distribution p(x); they form the unlabeled set {x_j}, with j = 1, ..., M.
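The setup can be illustrated with a minimal sketch. The two-class, one-dimensional Gaussian mixture used as the data generating process is purely an assumption made for illustration, as are the function name make_semi_supervised and the values N = 10 and M = 1000; none of them come from the text. The only point is that labeled pairs are drawn from the joint distribution, while unlabeled samples are drawn from the marginal by discarding the label.

```python
import random


def make_semi_supervised(n_labeled, n_unlabeled, seed=0):
    """Draw N labeled pairs from a toy p_data(x, y) and M samples from p(x).

    The generating process is a hypothetical two-class Gaussian mixture,
    chosen only to make the example concrete.
    """
    rng = random.Random(seed)

    def draw_pair():
        # Toy p_data: pick the class first, then a 1-D sample around its mean.
        y = rng.randint(0, 1)
        x = rng.gauss(-2.0 if y == 0 else 2.0, 1.0)
        return x, y

    # Labeled set: N complete pairs (x_i, y_i).
    labeled = [draw_pair() for _ in range(n_labeled)]
    # Unlabeled set: sampling from the marginal p(x) amounts to drawing
    # a pair from p_data and discarding the label.
    unlabeled = [draw_pair()[0] for _ in range(n_unlabeled)]
    return labeled, unlabeled


labeled, unlabeled = make_semi_supervised(n_labeled=10, n_unlabeled=1000)
print(len(labeled), len(unlabeled))  # prints "10 1000"
```

With M = 1000 and N = 10, the unlabeled set is two orders of magnitude larger than the labeled one, which is the regime where semi-supervised methods are normally considered.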
In general, there are no restrictions on the values of N and M; however, a semi-supervised problem arises when the number of unlabeled samples is much larger than the number of labeled ones. If we can draw N >> M labeled samples from p_data, there is probably little point in using semi-supervised approaches, and classical supervised methods are likely to be the best choice. The extra complexity we need is justified by M >> N, which is a...