Assumptions and mathematical notations
Many stream machine learning techniques share a set of key assumptions, which we state explicitly here (a brief code sketch illustrating this setting follows the list):
- The number of features in the data is fixed.
- The data has a small to medium number of dimensions, or features, typically in the hundreds.
- The number of training examples can be very large or effectively unbounded, typically in the millions or billions.
- The number of class labels (in supervised learning) or clusters is small and finite, typically fewer than 10.
- Normally, there is an upper bound on memory; that is, we cannot fit all the data in memory, so learning from the data must take this into account, especially for lazy learners such as K-Nearest Neighbors.
- Normally, there is an upper bound on the time taken to process an event or data instance, typically a few milliseconds.
- The patterns or distributions in the data can evolve over time.
- Learning algorithms must converge to a solution in finite time.
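To make this setting concrete, here is a minimal sketch of one-pass, bounded-memory learning on a stream, assuming scikit-learn's SGDClassifier and a purely synthetic data source; these choices are illustrative and are not prescribed by the text.

```python
# Minimal sketch: learn incrementally from an unbounded stream, seeing each
# example once and keeping memory and per-example time bounded.
# Assumption: scikit-learn's SGDClassifier; the stream itself is synthetic.
import numpy as np
from sklearn.linear_model import SGDClassifier

N_FEATURES = 100             # fixed, modest number of features
CLASSES = np.array([0, 1])   # small, finite set of class labels

model = SGDClassifier()      # linear model trained by stochastic gradient descent

def synthetic_stream(rng=np.random.default_rng(0)):
    """Hypothetical, effectively infinite stream of (x, y) examples."""
    while True:
        x = rng.normal(size=(1, N_FEATURES))
        y = np.array([int(x[0, 0] > 0.0)])
        yield x, y

for i, (x, y) in enumerate(synthetic_stream()):
    # One constant-time update per example; the example is then discarded,
    # so memory stays bounded no matter how long the stream runs.
    model.partial_fit(x, y, classes=CLASSES)
    if i == 9_999:           # in practice the stream may never end
        break
```

The key point is that each call to partial_fit performs a single, constant-time update and the example is then thrown away, which matches the bounded-memory and bounded-latency assumptions above.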
Let D_t = {(x_i, y_i) : y_i = f(x_i)} be the given data available...