This is the first of two recipes that cover the ML pipeline in Spark 2.0. For a more advanced treatment of ML pipelines, with additional details such as API calls and parameter extraction, see later chapters in this book.
In this recipe, we build a single pipeline that tokenizes text, uses HashingTF (an old technique known as the hashing trick) to map terms to term frequencies, fits a regression model, and then predicts which group a new term belongs to (for example, news filtering, gesture classification, and so on).
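To make the hashing trick behind HashingTF concrete, the following is a minimal sketch in plain Python rather than Spark code. It covers only the first two stages described above (tokenizing and hashed term frequencies). The names `tokenize`, `hashing_tf`, and `NUM_FEATURES` are hypothetical, the tokenizer is a simplification, and `zlib.crc32` is used only because it is deterministic; Spark's own HashingTF relies on its own hash function and a much larger default vector size.

```python
import re
import zlib

# Hypothetical, deliberately tiny vector size so the result is easy to read.
NUM_FEATURES = 16


def tokenize(text):
    """Lowercase the text and split it on runs of non-alphanumeric characters."""
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]


def hashing_tf(tokens, num_features=NUM_FEATURES):
    """Map tokens to a fixed-length term-frequency vector via the hashing trick.

    Each token is hashed to a bucket index; the bucket counts how many
    tokens landed there. Distinct tokens may collide in the same bucket.
    """
    vector = [0] * num_features
    for tok in tokens:
        # crc32 is an assumption made for reproducibility, not Spark's hash.
        index = zlib.crc32(tok.encode("utf-8")) % num_features
        vector[index] += 1
    return vector


doc = "Spark makes pipelines simple and Spark makes them fast"
tokens = tokenize(doc)
vector = hashing_tf(tokens)

print(tokens)
print(vector)
```

No vocabulary has to be built or stored: the vector length is fixed up front, and the cost is that two different terms can occasionally share a bucket. In a pipeline, a vector like this is what the regression stage would receive as its feature input.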