Predicting cluster assignments
The goal of this exercise is to score the test dataset by assigning each row to a cluster, using the predict method of the model fitted on the training dataset.
Using flexclust to predict cluster assignment
The standard kmeans function does not have a prediction method. However, we can use the flexclust package, which does. Since the prediction method can take a long time to run, we will illustrate it on only a sample of the rows and columns. To compare the test and training results, both datasets also need to have the same number of columns. For illustration purposes, we will set that number to 10.
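As a minimal sketch of the idea, the following fits a k-means model with flexclust on a small synthetic training matrix and then scores a separate test matrix with predict. The data, the object names (train, test, fit), and the choice of two clusters are assumptions made for illustration only; they are not taken from the OnlineRetail example.

```r
library(flexclust)

set.seed(1)

# Synthetic stand-in data (an assumption for this sketch):
# two well-separated groups in two dimensions.
train <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
               matrix(rnorm(100, mean = 5), ncol = 2))
test  <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),
               matrix(rnorm(20, mean = 5), ncol = 2))

# Training and test data must share the same columns.
colnames(train) <- c("x1", "x2")
colnames(test)  <- c("x1", "x2")

# Fit k-means through flexclust's kcca() so that a predict method is available.
fit <- kcca(train, k = 2, family = kccaFamily("kmeans"))

# Cluster assignments for the training rows.
train.clusters <- clusters(fit)

# Assign each test row to the nearest centroid learned from the training data.
test.clusters <- predict(fit, newdata = test)

table(test.clusters)
```

Because the test rows are assigned to centroids learned from the training data alone, the cluster labels are directly comparable between the two datasets.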
To begin, take a sample from the OnlineRetail training data:
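set.seed(1)
# Number of rows and columns to keep for the illustration
sample.size <- 10000
max.cols <- 10
library("flexclust")
# Keep the first sample.size rows of the training data
OnlineRetail <- OnlineRetail[1:sample.size, ]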
Next, create the document term matrix from the description column in the sampled dataset. We will use the create_matrix function from the RTextTools package, which can create a TDM first without having a separate...