An introduction to semi-supervised learning
Semi-supervised learning is a class of supervised learning that also takes unlabeled data into account. When we have a very large amount of data, we usually want to learn from all of it. Training on that data with supervised learning is a problem, however, because a supervised learning algorithm always requires a target variable: a class (label) assigned to each instance in the dataset.
Suppose that we have millions of instances of a particular type of data. Manually assigning a class to every one of these instances would be prohibitively time-consuming and expensive. Instead, we take a small subset of that data and tag it by hand (that is, we manually provide a class for each instance in the subset). We then train our model on this small labeled set and use it to work with the remaining unlabeled data. Typically, a small amount of labeled data is used alongside a large amount of unlabeled data. Semi-supervised learning falls...
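One common way to put the label-a-little-then-train procedure into practice is self-training (also called pseudo-labeling): train on the small hand-labeled set, predict classes for the unlabeled data, adopt the most confident predictions as new labels, and retrain. The sketch below is a minimal illustration under stated assumptions, not necessarily the approach this text goes on to describe: the classifier is a simple nearest-centroid model, the data are two synthetic 2-D clusters, and the names `fit_centroids`, `predict_with_margin`, `self_train`, `batch_size`, and the class labels `"cat"`/`"dog"` are all hypothetical.

```python
import math
import random


def fit_centroids(points, labels):
    """Train a nearest-centroid classifier: one mean point per class."""
    sums = {}
    for (x, y), label in zip(points, labels):
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}


def predict_with_margin(model, point):
    """Return the nearest class and a confidence margin
    (distance gap between the closest and second-closest centroid)."""
    dists = sorted((math.dist(point, c), label) for label, c in model.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return dists[0][1], margin


def self_train(labeled, labels, unlabeled, batch_size=10):
    """Self-training: repeatedly pseudo-label the most confident unlabeled points."""
    labeled, labels, pool = list(labeled), list(labels), list(unlabeled)
    while pool:
        model = fit_centroids(labeled, labels)
        scored = [(predict_with_margin(model, p), p) for p in pool]
        # Most confident predictions first (largest margin).
        scored.sort(key=lambda item: item[0][1], reverse=True)
        # Adopt the top batch as pseudo-labeled training data.
        for (label, _), point in scored[:batch_size]:
            labeled.append(point)
            labels.append(label)
        pool = [point for _, point in scored[batch_size:]]
    # Final model trained on hand labels plus pseudo-labels.
    return fit_centroids(labeled, labels)


rng = random.Random(0)


def make_blob(cx, cy, n):
    """Hypothetical synthetic data: n Gaussian points around (cx, cy)."""
    return [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)) for _ in range(n)]


cats = make_blob(0.0, 0.0, 200)
dogs = make_blob(5.0, 5.0, 200)

# Hand-label only a tiny subset: 3 points per class.
labeled = cats[:3] + dogs[:3]
labels = ["cat"] * 3 + ["dog"] * 3
unlabeled = cats[3:] + dogs[3:]

model = self_train(labeled, labels, unlabeled)

# Evaluate on fresh points drawn from the same two clusters.
test = make_blob(0.0, 0.0, 50) + make_blob(5.0, 5.0, 50)
truth = ["cat"] * 50 + ["dog"] * 50
correct = sum(predict_with_margin(model, p)[0] == t for p, t in zip(test, truth))
accuracy = correct / len(truth)
print(f"accuracy on held-out points: {accuracy:.2f}")
```

With only three hand-labeled points per class, the loop grows the labeled set from the model's own confident predictions. Adding points in small batches, most confident first, limits how far an early mistake can propagate; scikit-learn's `SelfTrainingClassifier` addresses the same concern with a confidence `threshold`.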