GraphFrames
Having worked with GraphX over the course of this chapter, you may have wondered what happened to DataFrames. If you are reading this book cover to cover, you might be asking why it switches between the RDD and DataFrame APIs. We saw that the DataFrame has become the primary API for Spark, and that you benefit from new optimizations only if you use the DataFrame API, so why is there no DataFrame API for GraphX?
Well, the reality is that a lot of the focus is now on GraphFrames, the DataFrame-based API for graphs in Spark. There are several motivations for a DataFrame-based graph API, and some of them stem from shortcomings of GraphX.
Why GraphFrames?
GraphX poses certain challenges, for example:
- Supports Scala only: The promise of Spark is that the same set of algorithms is available to a wide variety of users, whether they program in Java, Scala, Python, or R. GraphX provides only a Scala API. This is a serious limitation...