Chapter 1, Introduction to Large-Scale Machine Learning, invites readers into the land of machine learning and big data, introduces historical paradigms, and describes contemporary tools, including Apache Spark and H2O.
Chapter 2, Detecting Dark Matter: The Higgs-Boson Particle, focuses on the training and evaluation of binomial models.
Chapter 3, Ensemble Methods for Multi-Class Classification, checks into a gym and tries to predict human activities based on data collected from body sensors.
Chapter 4, Predicting Movie Reviews Using NLP, introduces the problem of natural language processing with Spark and demonstrates its power on the sentiment analysis of movie reviews.
Chapter 5, Online Learning with Word2Vec, goes into detail about contemporary NLP techniques.
Chapter 6, Extracting Patterns from Clickstream Data, introduces the basics of frequent pattern mining and three algorithms available in Spark MLlib, before deploying one of these algorithms in a Spark Streaming application.
Chapter 7, Graph Analytics with GraphX, familiarizes the reader with the basic concepts of graphs and graph analytics, explains the core functionality of Spark GraphX, and introduces graph algorithms such as PageRank.
Chapter 8, Lending Club Loan Prediction, combines the techniques introduced in the previous chapters into end-to-end examples covering data processing, model search and training, and model deployment as a Spark Streaming application.