There are several techniques used to quantify the similarity between nodes. They can be divided into two categories:
- Set-based measures: Compare the content of two sets globally. For instance, sets (A, B, C) and (C, D, B) have two common elements.
- Vector-based measures: Compare vectors element-wise, meaning that the position of each element matters. Euclidean distance is one such measure.
Let's look at these metrics in more detail, starting with the set-based similarities.
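To make the distinction between the two categories concrete, here is a minimal sketch in plain Python. It is independent of GDS, and the variable names and sample vectors are purely illustrative.

```python
import math

# Set-based: only membership matters, not the order of the elements.
a = {"A", "B", "C"}
b = {"C", "D", "B"}
common = a & b          # intersection of the two sets
print(len(common))      # number of shared elements

# Vector-based: values are compared position by position,
# so reordering a vector changes the result.
u = [1.0, 2.0, 3.0]
v = [3.0, 2.0, 1.0]     # same values as u, in a different order
euclidean = math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))
print(euclidean)        # non-zero, because the positions differ
```

Seen as sets, u and v would be identical, whereas the Euclidean distance between them is greater than zero.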
Set-based similarities
GDS 1.0 implements two set-based similarity measures, both of which we'll cover here.
Overlapping
The overlapping similarity measures the number of elements two sets have in common, relative to the size of the smaller set.
Definition
This measure's mathematical definition is as follows:
O(A, B) = | A ∩ B | / min(|A|, |B|)
A ∩ B is the intersection between sets A and B (common elements) and |A| denotes the...
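The definition of O(A, B) can be sketched in a few lines of plain Python. This is an illustration of the formula only, not the way GDS computes it: the function name `overlap` is hypothetical, and raising an error for empty sets is an assumption made here because the formula is undefined when the smaller set is empty.

```python
def overlap(a: set, b: set) -> float:
    """Overlapping similarity: |A ∩ B| / min(|A|, |B|)."""
    if not a or not b:
        # Assumption: reject empty sets, since min(|A|, |B|) would be 0.
        raise ValueError("overlap is undefined when either set is empty")
    return len(a & b) / min(len(a), len(b))


# Two of three elements are shared, and both sets have size 3.
print(overlap({"A", "B", "C"}, {"C", "D", "B"}))

# The smaller set is fully contained in the larger one.
print(overlap({"A", "B"}, {"A", "B", "C", "D"}))
```

The first call evaluates 2 / 3. The second evaluates 2 / 2, which shows that the measure reaches its maximum of 1 whenever one set is a subset of the other, regardless of how large the other set is.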