What is entity matching?
Finding matching items is one of the oldest tasks in database processing, and as databases grow larger and more distributed, the task becomes increasingly important. Each time two datasets are merged, questions arise about how to identify duplicates and how to connect items in the first dataset to similar items in the second. When we find ourselves asking "Are these two things different, even though they have the same name?" or "Are these other two things the same, even though they have different names?", we can apply entity matching techniques to find the answer.
In light of all this concern with the names for an item, it is perhaps appropriate that this task itself has many names: entity matching, entity disambiguation, object consolidation, duplicate identification, merge/purge, and record linkage, to name a few. We will use the term entity matching in this chapter to generically describe this class of activities.
Consider the following examples where...