We are almost at the end of the book, but this last chapter draws on all the tricks and knowledge covered in the previous chapters. We showed you how to use the power of Spark for data manipulation and transformation, and we showed you different methods for data modeling, including linear models, tree models, and model ensembles. Essentially, this chapter is the kitchen sink of chapters: we will deal with many problems all at once, ranging from data ingestion, manipulation, preprocessing, and outlier handling, through modeling, all the way to model deployment.
One of our main goals is to provide a realistic picture of a data scientist's daily life: start with almost raw data, explore the data, build a few models, compare them, find the best model, and deploy it into production. If only it were this easy all the time! In...