Chapter 12. Common Recipes for Implementing a Robust Machine Learning System
In this chapter, we will cover:
- Spark's basic statistical API to help you build your own algorithms
- ML pipelines for real-life machine learning applications
- Normalizing data with Spark
- Splitting data for training and testing
- Common operations with the new Dataset API
- Creating and using RDD versus DataFrame versus Dataset from a text file in Spark 2.0
- LabeledPoint data structure for Spark ML
- Getting access to the Spark cluster in Spark 2.0+
- Getting access to the Spark cluster pre-Spark 2.0
- Getting access to SparkContext vis-à-vis the SparkSession object in Spark 2.0
- New model export and PMML markup in Spark 2.0
- Regression model evaluation using Spark 2.0
- Binary classification model evaluation using Spark 2.0
- Multilabel classification model evaluation using Spark 2.0
- Multiclass classification model evaluation using Spark 2.0
- Using the Scala Breeze library to do graphics in Spark 2.0