Chapter 6: Feature Engineering – Extraction, Transformation, and Selection
The previous chapter introduced MLlib, Apache Spark's native, scalable machine learning library, and gave an overview of its major architectural components, including transformers, estimators, and pipelines.
This chapter covers the first stage of the scalable machine learning journey: feature engineering. Feature engineering is the process of deriving machine learning features from clean, preprocessed data so that the data is suitable for machine learning. You will learn the concepts of feature extraction, feature transformation, feature scaling, and feature selection, and implement each technique with code examples that use the algorithms available in Spark MLlib. By the end of this chapter, you will have learned the techniques needed to implement scalable feature engineering pipelines that convert...