In the previous section, we looked at Spark's core functionality using RDDs. RDDs are powerful constructs; however, there are still some low-level details that a Spark user has to understand and master before making use of them. Spark's Dataset and DataFrame constructs provide higher-level APIs for working with data.
Spark's Dataset brings a declarative style of programming alongside the functional programming style of RDDs. Structured Query Language (SQL) is a widely used declarative language, particularly among people who do not have a strong background in functional programming. The Spark DataFrame is a special type of Dataset that provides the concepts of rows and columns, as seen in traditional relational database (RDBMS) work.
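To make the relationship between the two concrete, here is a minimal sketch in Scala. It assumes a local SparkSession; the Person case class and the sample data are illustrative placeholders, not the example used elsewhere in this book.

```scala
import org.apache.spark.sql.SparkSession

object DatasetVsDataFrame {
  // Hypothetical record type used only for this sketch.
  case class Person(name: String, age: Int)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("DatasetVsDataFrame")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // A Dataset is strongly typed: every element is a Person.
    val peopleDS = Seq(Person("Alice", 29), Person("Bob", 35)).toDS()

    // A DataFrame is a Dataset of rows: the same data viewed as rows and columns.
    val peopleDF = peopleDS.toDF()

    // Declarative, SQL-like operations on the DataFrame.
    peopleDF.select("name").where("age > 30").show()

    spark.stop()
  }
}
```

The typed Dataset keeps compile-time checking of fields such as `age`, while the DataFrame exposes the same data through named columns that can be queried declaratively.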
Let's explore the example we used earlier with RDDs. We will use the dataset and...