Nearest-neighbor methods
A nearest-neighbor method takes a parameter that sets a distance threshold between strings; any pair of strings whose distance falls within that threshold is grouped together, as illustrated in the following screenshot:
Let's look at some of these methods. For k-nearest neighbor (kNN) methods, Optimus implements the Levenshtein distance, so let's see how it works.
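The threshold-based grouping described above can be sketched in plain Python. This is a minimal illustration under our own assumptions, not Optimus's implementation: the function name `group_by_threshold` is hypothetical, the metric is a compact Levenshtein helper included so the block runs on its own, and "within the threshold" is taken to mean a distance less than or equal to it.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    # Compact edit-distance helper so this sketch is self-contained.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def group_by_threshold(values, threshold):
    """Hypothetical helper: put any two strings whose distance is
    <= threshold into the same group (connected components via union-find)."""
    parent = list(range(len(values)))

    def find(i):
        # Follow parent links to the group's root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Compare every pair once and merge the groups of close pairs.
    for i, j in combinations(range(len(values)), 2):
        if levenshtein(values[i], values[j]) <= threshold:
            parent[find(i)] = find(j)

    # Collect the strings by the root of their group, in first-seen order.
    groups = {}
    for i, v in enumerate(values):
        groups.setdefault(find(i), []).append(v)
    return list(groups.values())


print(group_by_threshold(["Optimus", "optimus", "Optimuss", "pandas", "Pandas"], 1))
# [['Optimus', 'optimus', 'Optimuss'], ['pandas', 'Pandas']]
```

Note the design choice: merging is transitive (single linkage), so "optimus" and "Optimuss" land in the same group even though they are two edits apart, because each is one edit away from "Optimus". A stricter scheme could require every member to be within the threshold of every other member.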
Levenshtein distance
The Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions, or substitutions) needed to convert one word into the other.
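To make the definition concrete, here is a minimal Python sketch of the classic dynamic-programming (Wagner-Fischer) computation. It illustrates the definition only and is not Optimus's internal code; the function name `levenshtein` is our own choice.

```python
def levenshtein(s1: str, s2: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn s1 into s2."""
    # prev[j] is the distance between the current prefix of s1 and s2[:j];
    # before reading any of s1, turning "" into s2[:j] takes j insertions.
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        curr = [i]  # turning s1[:i] into "" takes i deletions
        for j, c2 in enumerate(s2, start=1):
            cost = 0 if c1 == c2 else 1
            curr.append(min(
                prev[j] + 1,         # delete c1
                curr[j - 1] + 1,     # insert c2
                prev[j - 1] + cost,  # substitute c1 with c2 (free if equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
print(levenshtein("AABBCC", "ABZ"))      # 4
```

For the pair used in the example that follows, the result is 4: at least three deletions are needed to shrink six characters to three, and one more edit is unavoidable because "AABBCC" contains no "Z". Keeping only one previous row means memory grows with the length of the second string rather than with the product of both lengths.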
In this example, let's look at the steps needed to transform the string "AABBCC" into "ABZ". Let's refer to "AABBCC" as String1 and "ABZ" as String2. We'll proceed as follows:
- First, delete "A" from String1 ("...