Unsupervised learning with KMC on large datasets
KMC takes unlabeled data and groups the data points into clusters. The names of these clusters (integers) can then serve as labels, which provide a basis for running a supervised learning algorithm such as a decision tree.
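To make the idea concrete, here is a minimal sketch of the clustering step in plain Python, with no external libraries. It is an illustration under stated assumptions, not the implementation used in this book: the function names `squared_distance` and `kmeans`, the toy `points`, and the deterministic farthest-point initialization are all hypothetical choices made so that the example is small and reproducible.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100):
    """Cluster `points` into `k` groups; return (labels, centroids)."""
    # Assumption: deterministic farthest-point initialization, chosen here
    # only so that the result is reproducible without a random seed.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )

    labels = None
    for _ in range(max_iter):
        # Assignment step: each point takes the index of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels

        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


# Hypothetical toy data: two well-separated groups of unlabeled points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centroids = kmeans(points, k=2)

# The integer cluster names become labels for a supervised learner.
labeled = list(zip(points, labels))
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The `labeled` pairs are what a supervised algorithm such as a decision tree would then receive as its training set: the points are the features and the cluster integers are the targets.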
In this section, we will see how to use KMC with large datasets.
When facing a project with large unlabeled datasets, the first step is to evaluate whether machine learning is feasible at all. Trying to avoid AI in a book on AI may seem paradoxical. However, in AI, as in real life, you should use the right tools at the right time. If AI is not necessary to solve a problem, do not use it.
Use a proof of concept (POC) approach to see whether a given AI project is feasible. A POC costs much less than the project itself and helps to build a team that believes in the outcome. Alternatively, the POC might show that it is too risky to go forward with an ML solution. Intractable problems exist. It's best to avoid spending months on something...