Demonstrating the use of the RecordLinkage package
We will use the RecordLinkage package in R. The data shown in the previous section ships with the package:
Note
RecordLinkage: Record Linkage in R provides functions to link and deduplicate datasets. Methods based on a stochastic approach are implemented, as well as classification algorithms from the machine learning domain. Authors: Andreas Borg and Murat Sariyar.
> library(RecordLinkage, quietly = TRUE)
> data(RLdata500)
> str(RLdata500)
'data.frame': 500 obs. of 7 variables:
 $ fname_c1: Factor w/ 146 levels "ALEXANDER","ANDRE",..: 19 42 114 128 112 77 42 139 26 99 ...
 $ fname_c2: Factor w/ 23 levels "ALEXANDER","ANDREAS",..: NA NA NA NA NA NA NA NA NA NA ...
 $ lname_c1: Factor w/ 108 levels "ALBRECHT","BAUER",..: 61 2 31 106 50 23 76 61 77 30 ...
 $ lname_c2: Factor w/ 8 levels "ENGEL","FISCHER",..: NA NA NA NA NA NA NA NA NA NA ...
 $ by      : int 1949 1968 1930 1957 1966 1929 1967 1942 1978 1971 ...
 $ bm ...
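To show how the package links and deduplicates these records, here is a minimal sketch of one possible deduplication run on RLdata500. It uses the package's compare.dedup(), epiWeights() and epiClassify() functions together with identity.RLdata500, the vector of true entity IDs that is distributed alongside the data. The blocking fields, the choice of string comparison on the four name columns, and the 0.5 classification threshold are illustrative assumptions, not settings prescribed by this text.

```r
# Minimal deduplication sketch for RLdata500.
# Assumption: blocking fields, string-compared columns and the 0.5 threshold
# are illustrative choices only.
library(RecordLinkage, quietly = TRUE)
data(RLdata500)  # provides RLdata500 and identity.RLdata500 (true entity IDs)

# Build candidate pairs. Each element of blockfld is a separate blocking rule,
# so a pair is compared if it agrees on first name (col 1) OR last name (col 3)
# OR birth year (5) OR birth month (6) OR birth day (7).
# strcmp = 1:4 scores the four name columns with a string similarity
# (Jaro-Winkler, the default strcmpfun) instead of exact agreement.
rpairs <- compare.dedup(RLdata500,
                        blockfld = list(1, 3, 5, 6, 7),
                        strcmp = 1:4,
                        identity = identity.RLdata500)

# Attach EpiLink weights (between 0 and 1) to every candidate pair
rpairs <- epiWeights(rpairs)

# Pairs whose weight exceeds 0.5 are classified as links, the rest as non-links
result <- epiClassify(rpairs, 0.5)

# Print an overview of the classification
summary(result)
```

Because identity is supplied, each candidate pair carries its true match status, so the output of summary(result) can be used to judge how many links were missed or wrongly made; moving the threshold up or down trades one kind of error against the other.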