Label propagation is a family of semi-supervised algorithms based on a graph representation of the dataset. In particular, if we have N labeled points (with bipolar labels +1 and -1) and M unlabeled points (denoted by y=0), it's possible to build an undirected graph based on a measure of geometric affinity among samples. If G = {V, E} is the formal definition of the graph, the set of vertices is made up of sample labels V = { -1, +1, 0 }, while the edge set is based on an affinity matrix W (often called adjacency matrix when the graph is unweighted), which depends only on the X values, not on the labels.
In the following graph, there's an example of such a structure:
In the preceding example graph, there are four labeled points (two with y=+1Â and two with y=-1), and two unlabeled points (y=0). The affinity matrix is normally...