Coding our first Spark SQL job
In this section, we will discuss the basics of writing Spark SQL jobs in Scala and Java. Spark SQL exposes the rich DataFrame API (http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.DataFrame) for loading and analyzing datasets in various forms. It not only provides operations for loading and analyzing data from structured sources such as Hive, Parquet, and RDBMS, but also offers the flexibility to load data from semi-structured formats such as JSON and CSV. In addition to the explicit operations exposed by the DataFrame API, Spark SQL also facilitates the execution of SQL queries against the data loaded into Spark.
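To make this concrete, here is a brief sketch of what loading data from these different formats and querying it with SQL can look like. This assumes the Spark 2.x SparkSession entry point (earlier releases use SQLContext instead), and the file paths and view name are hypothetical placeholders:

```scala
import org.apache.spark.sql.SparkSession

object DataFrameFormats {
  def main(args: Array[String]): Unit = {
    // Local-mode session for illustration only
    val spark = SparkSession.builder()
      .appName("DataFrame-Formats")
      .master("local[*]")
      .getOrCreate()

    // Structured and semi-structured sources (paths are hypothetical)
    val parquetDF = spark.read.parquet("/data/events.parquet")
    val jsonDF    = spark.read.json("/data/events.json")
    val csvDF     = spark.read.option("header", "true").csv("/data/events.csv")

    // Register a DataFrame as a temporary view and run SQL against it
    jsonDF.createOrReplaceTempView("events")
    spark.sql("SELECT COUNT(*) FROM events").show()

    spark.stop()
  }
}
```

The same DataFrame operations are available regardless of the source format, which is what lets a single SQL query run unchanged over Parquet, JSON, or CSV data.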
Let's move ahead and code our first Spark SQL job in Scala, and then we will look at the corresponding implementation in Java.
Coding a Spark SQL job in Scala
In this section, we will code and execute our first Spark SQL job using the Scala APIs.
It is our first Spark SQL job, so we will make it simple and use some sample...
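As a rough preview of the shape such a first job can take, the following is a minimal, self-contained sketch. It assumes the Spark 2.x SparkSession entry point and uses a small in-memory dataset invented for illustration (the `people` data, view name, and query are hypothetical):

```scala
import org.apache.spark.sql.SparkSession

object FirstSparkSQLJob {
  def main(args: Array[String]): Unit = {
    // Local-mode session so the example runs without a cluster
    val spark = SparkSession.builder()
      .appName("First-Spark-SQL-Job")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Sample in-memory data (hypothetical names and ages)
    val people = Seq(("Alice", 34), ("Bob", 28)).toDF("name", "age")

    // Expose the DataFrame to SQL via a temporary view, then query it
    people.createOrReplaceTempView("people")
    spark.sql("SELECT name FROM people WHERE age > 30").show()

    spark.stop()
  }
}
```

Running this with `spark-submit` (or directly from an IDE in local mode) prints the rows matching the query; the pattern of building a session, creating a view, and issuing SQL is the skeleton that the rest of this section fills in.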