Now that we have measured numerical and text distances, we will spend some time learning how to combine them to measure distances between observations that have both text and numerical features.
Using an address matching example
Getting ready
The nearest-neighbor algorithm is well suited to address matching. Address matching is a type of record matching in which the same addresses appear in multiple datasets and we would like to pair them up. The records may contain typos, or list a different city or ZIP code, and still refer to the same address. Applying the nearest-neighbor algorithm across both the numerical and character components of an address may help us identify addresses that are actually the same...
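To make the idea concrete, here is a minimal plain-Python sketch (standard library only), not the recipe's own implementation. It assumes each address has already been split into a street string and an integer ZIP code. The helper names, the equal 0.5 weights, and the sample addresses are hypothetical. The text component uses Levenshtein edit distance divided by the longer string's length. The ZIP component uses the absolute difference scaled by the range of the reference ZIP codes. The two are combined as a weighted sum, and the single nearest neighbor is returned.

```python
from typing import List, Tuple

# Hypothetical representation: an address is (street, zip_code).
Address = Tuple[str, int]


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def mixed_distance(query: Address, candidate: Address, zip_range: float,
                   text_weight: float = 0.5, zip_weight: float = 0.5) -> float:
    """Weighted sum of a normalized text distance and a scaled numeric distance."""
    street_q, zip_q = query
    street_c, zip_c = candidate
    # Divide the edit distance by the longer string's length so it lies in [0, 1].
    longest = max(len(street_q), len(street_c), 1)
    text_dist = levenshtein(street_q.lower(), street_c.lower()) / longest
    # Scale the absolute ZIP difference by the spread of the reference ZIP codes.
    zip_dist = abs(zip_q - zip_c) / zip_range
    return text_weight * text_dist + zip_weight * zip_dist


def nearest_match(query: Address, reference: List[Address]) -> Address:
    """Return the reference address with the smallest mixed distance to query."""
    zips = [z for _, z in reference]
    zip_range = (max(zips) - min(zips)) or 1  # avoid dividing by zero
    return min(reference, key=lambda cand: mixed_distance(query, cand, zip_range))


reference = [("123 main st", 10001), ("124 main st", 10001),
             ("123 maple ave", 10002), ("500 oak rd", 94105)]
print(nearest_match(("123 mian st", 10001), reference))  # ('123 main st', 10001)
print(nearest_match(("500 oak rd", 94150), reference))   # ('500 oak rd', 94105)
```

Scaling both components into comparable ranges keeps one feature from dominating. Without it, raw ZIP differences in the thousands would swamp edit distances of a few characters. Treating ZIP codes as numbers is itself a simplification, since numeric closeness only loosely tracks geographic closeness.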