Issues in common with supervised learning
Many of the issues that we discussed for supervised learning also apply to unsupervised learning. Some of them are listed here:
Types of features handled by the algorithm: Most clustering and outlier detection algorithms need a numeric representation of the data to work effectively. Categorical or ordinal data must be transformed carefully.
Curse of dimensionality: A large number of features makes the feature space sparse and degrades the performance of clustering algorithms. Dimensionality must be reduced in some suitable way: either by feature selection, where only a subset of the most relevant features is retained, or by feature extraction, which transforms the feature space into a new, lower-dimensional space of principal variables.
Scalability in memory and training time: Many unsupervised learning algorithms cannot scale beyond a few thousand instances because of either memory or training-time constraints.
Outliers and noise in...
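As a rough illustration of the first two issues in this list, the following minimal sketch encodes categorical and ordinal values as numbers and then performs a very simple form of feature selection. Everything in it is an assumption made for illustration: the helper names (one_hot, ordinal_encode, select_by_variance), the toy data, and the use of variance as a stand-in for feature relevance are not taken from the text, and a real project would more likely rely on a machine learning library for both steps.

```python
from statistics import pvariance


def one_hot(values):
    """Map each categorical value to a 0/1 indicator vector (hypothetical helper)."""
    categories = sorted(set(values))
    vectors = [[1.0 if v == c else 0.0 for c in categories] for v in values]
    return vectors, categories


def ordinal_encode(values, order):
    """Map ordinal values to their rank in an explicitly supplied order."""
    rank = {level: float(i) for i, level in enumerate(order)}
    return [rank[v] for v in values]


def select_by_variance(rows, threshold):
    """Keep only the columns whose population variance exceeds the threshold.

    Variance is an assumed, deliberately crude proxy for relevance.
    """
    columns = list(zip(*rows))
    keep = [i for i, col in enumerate(columns) if pvariance(col) > threshold]
    return [[row[i] for i in keep] for row in rows], keep


# Toy data: one categorical feature and one ordinal feature.
colors = ["red", "green", "red", "blue"]
sizes = ["small", "large", "medium", "small"]

color_vectors, color_categories = one_hot(colors)
size_ranks = ordinal_encode(sizes, order=["small", "medium", "large"])

# Combine the encoded features and append a constant column that carries
# no information, so the selection step has something to discard.
rows = [vec + [rank, 1.0] for vec, rank in zip(color_vectors, size_ranks)]
reduced, kept = select_by_variance(rows, threshold=0.0)

print(color_categories)  # category order used for the indicator columns
print(kept)              # indices of the columns that were retained
```

One-hot encoding avoids imposing an artificial order on the colors, whereas the ordinal encoder keeps the order of the sizes but implicitly assumes the levels are equally spaced; that assumption affects any distance-based algorithm and is one reason the transformation has to be done carefully.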