Summary
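Although it may seem paradoxical, try to avoid AI before jumping into a project that involves millions to billions of records (held, for example, in SQL databases, Oracle, or big data platforms). Try simpler classical solutions first, such as big data methods. If the AI project goes through, the law of large numbers (LLN) will lead to random sampling over the datasets, which the central limit theorem (CLT) justifies: statistics computed on sufficiently large random samples approximate those of the full dataset.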
A pipeline of classical and ML processes will solve the volume problem, as well as the human analytic limit problem. The random sampling does not have to be done by a mini-batch function built into the k-means clustering (KMC) program. Batches can be generated in a preprocessing phase by classical programs. These programs will feed random batches of equal size to the KMC program, transposing its NP-hard problem into an NP problem.
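The preprocessing phase described above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the function name `random_batches`, the batch size, the number of batches, and the integer record IDs standing in for a large table are all hypothetical and do not come from the chapter's programs.

```python
import random

def random_batches(records, batch_size, n_batches, seed=0):
    """Draw n_batches random batches of equal size from a sequence of records.

    Each batch is sampled without replacement; different batches may overlap.
    """
    rng = random.Random(seed)  # seeded so that runs are repeatable
    return [rng.sample(records, batch_size) for _ in range(n_batches)]

# Hypothetical stand-in for a large table: one million integer record IDs.
records = range(1_000_000)

# Five equal-size batches, ready to be handed to a clustering program.
batches = random_batches(records, batch_size=1_000, n_batches=5)
print(len(batches), [len(b) for b in batches])
```

Because the batches are produced before clustering starts, the clustering program itself does not need any mini-batch logic; it simply receives one fixed-size batch at a time.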
KMC, an unsupervised learning algorithm, will transform unlabeled data into labeled output in which each record carries its cluster number as its label.
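To make the labeling step concrete, here is a minimal sketch of k-means (Lloyd's iteration) in plain Python that turns unlabeled points into `(point, cluster_number)` pairs. It is not the chapter's KMC program: the function names, the tiny two-dimensional dataset, the number of iterations, and the choice of `k=2` are assumptions made for illustration.

```python
import random

def nearest(point, centroids):
    """Return the index of the centroid closest to point (squared Euclidean distance)."""
    def sq_dist(c):
        return sum((p - q) ** 2 for p, q in zip(point, c))
    return min(range(len(centroids)), key=lambda i: sq_dist(centroids[i]))

def kmeans_labels(points, k, n_iter=20, seed=0):
    """Cluster points into k groups and return (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(n_iter):
        labels = [nearest(pt, centroids) for pt in points]
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    # Final assignment so that labels match the returned centroids.
    labels = [nearest(pt, centroids) for pt in points]
    return labels, centroids

# Hypothetical unlabeled data: six 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]

labels, centroids = kmeans_labels(points, k=2)

# Labeled output: each record now carries its cluster number.
labeled = [(pt, lab) for pt, lab in zip(points, labels)]
for row in labeled:
    print(row)
```

The `labeled` list is the kind of output a downstream supervised model could train on, with the cluster number playing the role of the target label.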
In turn, a decision tree, chained to the KMC program, will train its model using the output of...