K-means clustering of terms
Now we can cluster the document-term matrix (dtms) using k-means, so that each product description (one row of the matrix) is assigned to a cluster. For illustration, we will ask for five clusters:
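set.seed(123)  # k-means picks random starting centres; fixing the seed makes the clusters reproducible
kmeans5 <- kmeans(dtms, 5)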
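As a minimal, self-contained sketch of what this step does, the snippet below builds a tiny binary document-term matrix in base R from a few hypothetical product descriptions and clusters its rows with kmeans(). The names docs, tokens, vocab, dtm and km are illustrative only, not objects from the original analysis, and set.seed() plus nstart = 10 are choices made here to make the result reproducible and less sensitive to the random starting centres.

```r
# Hypothetical product descriptions standing in for the OnlineRetail data
docs <- c("vintage paisley bag", "vintage paisley shopper", "billboard mug",
          "vintage mug", "red paisley bag", "love mug")

# Build a binary document-term matrix in base R:
# one row per document, one column per distinct term
tokens <- strsplit(docs, " ")
vocab <- sort(unique(unlist(tokens)))
dtm <- t(sapply(tokens, function(tk) as.integer(vocab %in% tk)))
colnames(dtm) <- vocab

# Cluster the documents (rows); several random starts reduce the
# chance of settling on a poor local optimum
set.seed(42)
km <- kmeans(dtm, centers = 2, nstart = 10)

km$cluster         # one cluster label per document, in row order
table(km$cluster)  # how many documents fell into each cluster
```

Because kmeans() works on rows, the cluster vector has one entry per document, which is what allows it to be bound back onto the original data in the next step.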
Once k-means has finished, we append each row's cluster number to the original OnlineRetail data and then create five subsets, one per cluster:
kw_with_cluster <- as.data.frame(cbind(OnlineRetail, Cluster = kmeans5$cluster))
# subset the five clusters
cluster1 <- subset(kw_with_cluster, subset = Cluster == 1)
cluster2 <- subset(kw_with_cluster, subset = Cluster == 2)
cluster3 <- subset(kw_with_cluster, subset = Cluster == 3)
cluster4 <- subset(kw_with_cluster, subset = Cluster == 4)
cluster5 <- subset(kw_with_cluster, subset = Cluster == 5)
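The five subset() calls can also be collapsed into a single split() call, which returns a named list holding one data frame per cluster label. This is a sketch on hypothetical stand-in data: retail, clusters, kw and by_cluster are illustrative names, and clusters plays the role of kmeans5$cluster. With the real objects, the equivalent call would be split(kw_with_cluster, kw_with_cluster$Cluster).

```r
# Hypothetical stand-in for OnlineRetail and for kmeans5$cluster
retail <- data.frame(Desc = c("vintage paisley bag", "billboard mug",
                              "love mug", "vintage paisley shopper"))
clusters <- c(1, 2, 2, 1)

# Append the cluster label to the data, as in the step above
kw <- cbind(retail, Cluster = clusters)

# split() groups the rows by cluster label into a named list of data frames
by_cluster <- split(kw, kw$Cluster)

names(by_cluster)   # the list is named by cluster label: "1", "2"
by_cluster[["1"]]   # the same rows that subset(kw, Cluster == 1) returns
```

A list keyed by cluster label scales to any number of clusters without creating a new variable for each one, and can be passed straight to lapply() to summarise every cluster in one go.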
Examining cluster 1
Print the first few rows of cluster 1 (columns 10 to 13):
> head(cluster1[10:13])
                           Desc2 lastword firstword Cluster
50  VintageBillboardLove/hateMug      MUG   VINTAGE       1
86             BagVintagePaisley  PAISLEY       BAG       1
113        ShopperVintagePaisley  PAISLEY   SHOPPER       1
145        ShopperVintagePaisley...
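head() only shows the first rows, so one way to summarise a whole cluster is to tabulate a column such as lastword and sort the counts. So that the sketch runs on its own, it rebuilds by hand the three complete rows visible in the printout (cluster1_sample and top_last are illustrative names); with the real data one would simply call sort(table(cluster1$lastword), decreasing = TRUE).

```r
# The three complete rows shown in the printout, rebuilt by hand
cluster1_sample <- data.frame(
  Desc2     = c("VintageBillboardLove/hateMug", "BagVintagePaisley",
                "ShopperVintagePaisley"),
  lastword  = c("MUG", "PAISLEY", "PAISLEY"),
  firstword = c("VINTAGE", "BAG", "SHOPPER"),
  Cluster   = 1
)

# Count how often each last word occurs in the cluster, most frequent first
top_last <- sort(table(cluster1_sample$lastword), decreasing = TRUE)
top_last
```

On the full cluster, the words at the top of this table give a quick label for what the cluster is about.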