If you don't know in advance how many clusters you have, how do you choose the optimal k? This is essentially a chicken-and-egg problem. Several approaches are popular; we'll discuss one of them: the elbow method.
Do you remember the mysterious WCSS (within-cluster sum of squares) that we calculated on every iteration of k-means? This measure tells us how far the points in each cluster are from their centroid. We can calculate it for several different values of k and plot the results. The plot usually looks something like the following graph:
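To make the calculation concrete, here is a minimal Python sketch of how such a WCSS-versus-k curve could be computed. It is an illustration under stated assumptions, not the implementation used elsewhere in this chapter: the helper names (`sq_dist`, `assign`, `k_means_wcss`), the three synthetic blobs, the random-point initialization, and the ten restarts per k are all choices made for this example.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]


def k_means_wcss(points, k, rng, max_iterations=100):
    """Run plain k-means once and return the final WCSS."""
    centroids = rng.sample(points, k)  # assumption: random points as initial centroids
    for _ in range(max_iterations):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            # An empty cluster keeps its previous centroid.
            updated.append(tuple(sum(axis) / len(members) for axis in zip(*members))
                           if members else centroids[j])
        if updated == centroids:  # no centroid moved: converged
            break
        centroids = updated
    # Re-assign once more so labels match the final centroids, then sum up.
    labels = assign(points, centroids)
    return sum(sq_dist(p, centroids[label]) for p, label in zip(points, labels))


# Hypothetical toy data: three 2D blobs of 20 points each.
rng = random.Random(42)
blob_centers = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
points = [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
          for cx, cy in blob_centers for _ in range(20)]

# WCSS for several k values; keep the best of a few restarts for each k,
# because a single run can get stuck in a poor local minimum.
curve = {}
for k in range(1, 7):
    curve[k] = min(k_means_wcss(points, k, rng) for _ in range(10))
    print(f"k={k}  WCSS={curve[k]:.1f}")
```

On data like this, the printed values should fall steeply while k is below the number of blobs and change much less afterwards. Plotting `curve` with k on the horizontal axis gives the kind of graph discussed here.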
This plot should remind you of the similar loss-function plots from Chapter 3, K-Nearest Neighbors Classifier. It shows how well our model fits the data. The idea of the elbow method is to choose the k value after which the result is not...