String distance
A string distance algorithm measures how similar two strings are to one another. The strings smell and bell can be considered similar, as they share three characters (ell). The strings bell and fell are even closer: they share the same three characters, have the same length, and differ in only one character. When distance is measured against bell, the string fell will therefore rank as a closer match than smell.
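To make "one character apart" concrete, the following is a minimal sketch of an edit distance calculation: it counts the single-character insertions, deletions, and substitutions needed to turn one string into another. The function name editDistance is hypothetical and written only for illustration. It is not taken from any library, and a production implementation may differ in detail.

```javascript
// Hypothetical helper for illustration only (not a library function):
// counts the minimum number of single-character edits between a and b.
function editDistance(a, b) {
  // prev[j] = distance between the first (i - 1) chars of a and the first j chars of b
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);

  for (let i = 1; i <= a.length; i++) {
    const curr = [i]; // turning i chars of a into an empty string takes i deletions
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // delete a character from a
        curr[j - 1] + 1,    // insert a character into a
        prev[j - 1] + cost  // substitute (or keep, when the characters match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

console.log(editDistance('bell', 'fell'));  // 1
console.log(editDistance('bell', 'smell')); // 2
```

A lower number means a closer match, so fell (one substitution) ranks ahead of smell (one substitution plus one insertion) when both are compared with bell.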
The NPM package natural provides three different algorithms for string distance calculation: Jaro-Winkler, the Dice coefficient, and the Levenshtein distance. Their main differences can be described as follows:
Dice coefficient: This measures the similarity between two strings, typically by comparing the pairs of adjacent characters they share, and expresses it as a value between zero and one, where zero means completely different and one means identical.
Jaro-Winkler: This is similar to the Dice coefficient in that it also produces a score between zero and one, but it gives greater weight to matches at the beginning of the strings.
Levenshtein...
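As a rough illustration of the zero-to-one scoring described for the Dice coefficient, here is a minimal sketch that compares the character pairs (bigrams) of two strings. The names bigrams and diceSimilarity are hypothetical, and the bigram-counting approach is an assumption for the sake of the example. The natural package's own implementation may tokenize or normalize differently and can return different values.

```javascript
// Hypothetical helpers for illustration only; not the natural package's code.

// Split a string into overlapping pairs of adjacent characters.
function bigrams(s) {
  const pairs = [];
  for (let i = 0; i < s.length - 1; i++) {
    pairs.push(s.slice(i, i + 2));
  }
  return pairs;
}

// Dice coefficient: 2 * (shared bigrams) / (total bigrams in both strings).
function diceSimilarity(a, b) {
  const x = bigrams(a.toLowerCase());
  const y = bigrams(b.toLowerCase());

  // Strings shorter than two characters have no bigrams; fall back to equality.
  if (x.length + y.length === 0) {
    return a.toLowerCase() === b.toLowerCase() ? 1 : 0;
  }

  // Count each bigram of b so that repeated pairs are matched only once each.
  const remaining = new Map();
  for (const pair of y) {
    remaining.set(pair, (remaining.get(pair) || 0) + 1);
  }

  let shared = 0;
  for (const pair of x) {
    const count = remaining.get(pair) || 0;
    if (count > 0) {
      shared++;
      remaining.set(pair, count - 1);
    }
  }
  return (2 * shared) / (x.length + y.length);
}

console.log(diceSimilarity('bell', 'bell'));             // 1
console.log(diceSimilarity('bell', 'fell').toFixed(2));  // 0.67
console.log(diceSimilarity('bell', 'smell').toFixed(2)); // 0.57
```

Under this sketch a higher score means a closer match, so fell again ranks ahead of smell when compared with bell.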