Summary
In this chapter, we looked in detail at how to use PySpark ML, the official, primary machine learning library for PySpark. We explained what the Transformer and the Estimator are, and showed their role in another concept introduced in the ML library: the Pipeline. We then presented some of the methods available for fine-tuning the hyperparameters of models. Finally, we gave examples of how to use some of the feature extractors and models from the library.
In the next chapter, we will delve into graph theory and GraphFrames, which help in tackling machine learning problems that are better represented as graphs.