Summary
In this chapter, we explored Spark SQL and DataFrames. DataFrames add a rich layer of abstraction on top of Spark's core engine, which makes manipulating tabular data much easier. Additionally, the data source API lets us serialize and deserialize DataFrames to and from a wide variety of data files.
In the next chapter, we will draw on our knowledge of Spark and DataFrames to build a spam filter using MLlib.