"One machine can do the work of fifty ordinary men. No machine can do the work of one extraordinary man."
- Elbert Hubbard
In this chapter, you will learn how to use Spark to analyze structured data. Unstructured data, such as a document containing arbitrary text or data in some other format, must first be transformed into a structured form. We will see how DataFrames and Datasets are the cornerstone here, and how Spark SQL's APIs make querying structured data simple yet robust. We also introduce Datasets and look at the differences between Datasets, DataFrames, and RDDs. In a nutshell, the following topics will be covered in this chapter:
- Spark SQL and DataFrames
- DataFrame and SQL API
- DataFrame schema
- Datasets and encoders
- Loading and saving data
- Aggregations
- Joins