Machine learning algorithms can be seen as optimization problems. They take data samples, and an objective function, and try to optimize this function. In the case of supervised learning, the objective function is based on the labels given to it. We try to minimize the differences between our predictions and the actual labels. In the case of unsupervised learning, things are different due to the lack of labels. Clustering algorithms, in essence, try to put the data samples into clusters so that it minimizes the intracluster distances and maximizes the intercluster distances. In other words, we want samples that are in the same cluster to be as similar as possible, and samples from different clusters to be as different as possible.
Nevertheless, there is one trivial solution to this optimization problem. If we treat each sample as its own cluster, then the intracluster distances are all zeros and the intercluster distances are at their maximum....