In this chapter, we present typical use cases for Spark SQL in graph applications. Graphs are common in many different domains, and they are typically analyzed using specialized graph processing engines. GraphX is the Spark component for graph computations. It is based on RDDs and supports graph abstractions and operations, such as subgraphs, aggregateMessages, and so on. It also exposes a variant of the Pregel API. However, our focus will be on the GraphFrame API, which is implemented on top of the Spark SQL Dataset/DataFrame APIs. GraphFrames is an integrated system that combines graph algorithms, pattern matching, and queries. The GraphFrame API is still in beta (as of Spark 2.2) but is likely to become the primary graph processing API for Spark applications.
More specifically, this chapter covers the following topics:
- Using GraphFrames...