K nearest neighborhood
K nearest neighborhood is another supervised learning algorithm which helps us to figure out the class of the out-sample data among k classes. K has to be chosen appropriately, otherwise it might increase variance or bias, which reduces the generalization capacity of the algorithm. I am considering Up
, Down
, and Nowhere
as three classes which have to be recognized on the out-sample data. This is based on Euclidian distance. For each data point in the out-sample data, we calculate its distance from all data points in the in-sample data. Each data point has a vector of distances and the K distance which is close enough will be selected and the final decision about the class of the data point is based on a weighted combination of all k neighborhoods:
>library(class)
The K nearest neighborhood function in R does not need labeled values in the training data. So I am going to use the normalized in-sample and normalized out-sample data created in the Logistic regression...