Summary
In this chapter, you learned about SQL as a declarative language that has been widely adopted for structured data analysis because of its ease of use and expressiveness. You learned about the basic constructs of SQL, including its DDL and DML sublanguages. You were introduced to the Spark SQL engine, the unified distributed query engine that powers both the Spark SQL and DataFrame APIs. SQL optimizers were introduced in general, followed by Spark's own query optimizer, Catalyst, and its inner workings, including how it takes a Spark SQL query and converts it into JVM bytecode. A reference to the Spark SQL language was also presented, along with examples of the most important DDL and DML statements. Finally, a few performance optimization techniques were discussed to help you get the best out of Spark SQL for all your data analysis needs. In the next chapter, we will extend our Spark SQL knowledge and see how external...