This chapter summarizes everything you learned throughout the book with end-to-end examples. We analyzed the data, transformed it, ran several experiments to work out how to set up the model-training pipeline, and built models. The chapter also stresses the need for well-designed code that can be shared across several projects. In our example, we created a shared library that was used both at training time and at scoring time. This was demonstrated on the critical operation called "model deployment," in which trained models and related artifacts are used to score unseen data.
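The shared-library pattern described above can be sketched in a few lines. This is a minimal, framework-free illustration, not the book's actual code: the record schema, the `extract_features` helper, and the toy centroid "model" are all hypothetical stand-ins. The key point it demonstrates is that the training pipeline and the scoring pipeline import the same feature-extraction code, so the two can never drift apart, and the trained model is persisted as an artifact that scoring later loads.

```python
import pickle

# Shared library: imported by BOTH the training pipeline and the scoring
# pipeline, so features are computed identically in each.
def extract_features(record):
    """Turn a raw record into a numeric feature vector (hypothetical schema)."""
    return [record["amount"], 1.0 if record["is_weekend"] else 0.0]

def train(records, labels):
    """Fit a toy model (a stand-in for a real learner): the centroid of the
    positive class in feature space."""
    pos = [extract_features(r) for r, y in zip(records, labels) if y == 1]
    centroid = [sum(col) / len(pos) for col in zip(*pos)]
    return {"centroid": centroid}

def score(model, record):
    """Score unseen data: distance from the persisted centroid, using the
    same shared extract_features as training."""
    feats = extract_features(record)
    return sum((f - c) ** 2 for f, c in zip(feats, model["centroid"])) ** 0.5

# Training time: fit the model and persist it as an artifact.
train_data = [
    {"amount": 10.0, "is_weekend": True},
    {"amount": 2.0, "is_weekend": False},
]
model = train(train_data, labels=[1, 0])
artifact = pickle.dumps(model)

# Scoring ("model deployment") time: load the artifact, score unseen data.
loaded = pickle.loads(artifact)
print(score(loaded, {"amount": 9.0, "is_weekend": True}))  # → 1.0
```

In a real Spark project the same idea applies at a larger scale: the shared library holds the transformation logic, the trained model and its artifacts are exported during training, and the deployment code loads them to score new data.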
This chapter also brings us to the end of the book. Our goal was to show that solving machine learning challenges with Spark is mainly about experimenting with data, parameters, and models, debugging data- and model-related issues, and writing code that can...