Nearest neighbor estimator
With the nearest neighbor method, we start with an unclassified object and a set of objects whose classes are already known. We compare the attributes of the unclassified object against those of each classified object and assign the class of the closest one. Closeness is measured as the Euclidean distance between two points, where each object's attribute values serve as its coordinates.
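The idea can be sketched in a few lines of base R. This is only a minimal illustration, not the approach used in the rest of the section: the toy data frame `known`, the point `unknown`, and the attribute names `x` and `y` are all hypothetical and chosen just to show the distance comparison.

```r
# Hypothetical toy data: two numeric attributes per object, plus a known class.
known <- data.frame(
  x = c(1.0, 1.5, 5.0, 6.0),
  y = c(1.0, 2.0, 5.5, 5.0),
  class = c("A", "A", "B", "B"),
  stringsAsFactors = FALSE
)

# The unclassified object, described by the same two attributes.
unknown <- c(x = 5.2, y = 4.8)

# Euclidean distance from the unknown object to every known object.
distances <- sqrt((known$x - unknown[["x"]])^2 + (known$y - unknown[["y"]])^2)

# The nearest neighbor is the known object with the smallest distance;
# the unknown object takes that neighbor's class.
nearest <- which.min(distances)
predicted_class <- known$class[nearest]
print(predicted_class)
```

Here the third known object lies closest to the unknown point, so the unknown is assigned class "B". With real data, the attributes usually need to be on comparable scales first, since otherwise the attribute with the largest range dominates the distance.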
Nearest neighbor using R
For this example, we use the housing data from the UCI Machine Learning Repository (archive.ics.uci.edu). First, we load the data and assign column names:
housing <- read.table("http://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.data")
colnames(housing) <- c("CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE", "DIS", "RAD", "TAX", "PRATIO", "B", "LSTAT", "MDEV")
summary(housing)
We reorder the data so the key (the housing price, MDEV) is in ascending order:
housing <- housing[order(housing$MDEV),]
Now, we can split the data into a training...