Batch Big Data Machine Learning
Batch Big Data Machine Learning involves two basic steps, as discussed in Chapter 2, Practical Approach to Real-World Supervised Learning, Chapter 3, Unsupervised Machine Learning Techniques, and Chapter 4, Semi-Supervised and Active Learning: learning a model from historical datasets (training), and applying the learned model to unseen future data. The following figure shows the two environments, along with their component tasks and some of the technologies and frameworks that accomplish them:
We will discuss two of the most well-known frameworks for Machine Learning on batch data, and will use the case study to highlight the code or tools used to perform modeling.
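Before turning to the frameworks, the two steps described earlier (training on historical data, then applying the persisted model to unseen data) can be illustrated with a minimal, framework-free sketch. Everything in it is an assumption made for illustration: the nearest-centroid classifier is chosen only for brevity and is not a method taken from this chapter, the toy records are made up, and the names train, predict, and model.json are hypothetical.

```python
import json
import os
import tempfile


def train(historical):
    """Training environment: fit a nearest-centroid model on labeled history."""
    sums, counts = {}, {}
    for features, label in historical:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    # One centroid (mean feature vector) per label.
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(model, features):
    """Scoring environment: assign the label of the closest centroid."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(model, key=lambda label: sq_dist(model[label]))


# Step 1: learn from a (toy, made-up) historical dataset and persist the model.
historical = [([1.0, 1.0], "low"), ([1.2, 0.8], "low"),
              ([5.0, 5.2], "high"), ([4.8, 5.0], "high")]
model_path = os.path.join(tempfile.mkdtemp(), "model.json")  # hypothetical file name
with open(model_path, "w") as f:
    json.dump(train(historical), f)

# Step 2: later, load the persisted model and score a batch of unseen records.
with open(model_path) as f:
    model = json.load(f)
unseen = [[0.9, 1.1], [5.1, 4.9]]
predictions = [predict(model, row) for row in unseen]
print(predictions)
```

The point of the sketch is the separation between the two environments: the only thing that crosses from training to scoring is the serialized model. At Big Data scale the same split holds, but each step would typically run on a distributed platform rather than in a single process.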
H2O as a Big Data Machine Learning platform
H2O (References [13]) is a leading open source platform for Machine Learning at Big Data scale, with a focus on bringing AI to the enterprise. The company was founded in 2011...