In this chapter, we will cover the following topics:
- Two methods of ingesting and preparing a CSV file for processing in Spark
- Singular Value Decomposition (SVD) to reduce high dimensionality in Spark
- Principal Component Analysis (PCA) to pick the most effective latent factors for machine learning in Spark