Chapter 7. Clustering with Python
In the previous two chapters, we discussed and understood two important algorithms used in predictive analytics, namely, linear regression and logistic regression. Both of them are very widely used. They are supervised algorithms. If you stress your memory a tad bit and have thoroughly read the previous chapters of the book, you would remember that a supervised algorithm is one where the historical value of an output variable is known from the data. A supervised algorithm uses this value to train and build the model to forecast the value of an output variable for a dataset in future. An unsupervised algorithm, on the other hand, doesn't have the luxury or constraints (different perspectives of looking at it) of the output variable. It uses the values of the predictor variables instead to build a model.
Clustering—the algorithm that we are going to discuss in this chapter—is an unsupervised algorithm. Clustering or segmentation...