Complete R code
The complete R code for this chapter follows. For ease of reading, it is split into sections, one for each of the record linkage algorithms used.
Feature generation
Let us begin with the feature generation R code:
library(RecordLinkage, quietly = TRUE)

###### Quick look at our data #########
data(RLdata500)
str(RLdata500)
head(RLdata500)

#### Feature generation #############
rec.pairs <- compare.dedup(RLdata500
                           ,blockfld = list(1, 5:7)
                           ,strcmp = c(2,3,4)
                           ,strcmpfun = levenshteinSim)
summary(rec.pairs)

matches <- rec.pairs$pairs
matches[c(1:3, 1203:1204), ]

RLdata500[1,]
RLdata500[174,]

# String features
rec.pairs.matches <- compare.dedup(RLdata500
                                   ,blockfld = list(1, 5:7)
                                   ,strcmp = c(2,3,4)
                                   ,strcmpfun = levenshteinSim)

# Not specifying the fields for string comparison...
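To make the string features above more concrete, here is a minimal base R sketch of a normalized Levenshtein similarity, the kind of score that strcmpfun = levenshteinSim is understood to produce (one minus the edit distance divided by the length of the longer string). The helper lev_sim and the example names are hypothetical and are not part of RecordLinkage or RLdata500; the sketch relies on utils::adist from base R, so it runs without the package installed.

```r
# Hypothetical helper (not part of RecordLinkage): normalized Levenshtein
# similarity, assumed here to be 1 - distance / length of the longer string.
lev_sim <- function(a, b) {
  d <- utils::adist(a, b)[1, 1]          # edit distance between the two strings
  longest <- max(nchar(a), nchar(b))     # length of the longer string
  if (longest == 0) return(1)            # treat two empty strings as identical
  1 - d / longest
}

# Toy inputs chosen for illustration only
sim_same  <- lev_sim("MEIER", "MEIER")   # identical names
sim_close <- lev_sim("MEIER", "MEYER")   # one substitution in five characters
sim_far   <- lev_sim("MEIER", "SCHMIDT") # largely different names

print(c(same = sim_same, close = sim_close, far = sim_far))
```

A score of 1 means the two strings are identical, and the score falls toward 0 as more edits are needed. Fields listed in strcmp receive a graded score of this kind in place of a binary agree/disagree comparison.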