Getting started with GraphX
You don't need to install any additional software to get started with GraphX, because it is included in the Spark installation. This section introduces how to create and explore graphs using a simple family relationship graph, which is used in all of the operations in this section.
Basic operations of GraphX
GraphX does not provide a Python API yet, so let's use spark-shell (the Scala shell) to work with GraphX interactively. First, let's create the input data (vertex and edge files) needed for our GraphX operations and store it on HDFS.
Note
All programs in this chapter are executed on a CDH 5.8 VM. In other environments, file paths may differ, but the concepts are the same.
Creating a graph
We can create a graph using the following steps:
Create a vertex file with vertex ID, name, and age as shown here:
[cloudera@quickstart ~]$ cat vertex.csv
1,Jacob,48
2,Jessica,45
3,Andrew,25
4,Ryan,53
5,Emily,22
6,Lily,52
...
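Each line of vertex.csv eventually has to become a GraphX vertex, which is a pair of a vertex ID and an attribute. The following is a minimal sketch of that mapping, assuming the `id,name,age` layout shown above. The object and function names (`VertexParsing`, `parseVertex`) are hypothetical. Because GraphX defines `VertexId` as a type alias for `Long`, the sketch uses plain `Long` so that it runs without any Spark dependency.

```scala
// Hypothetical helper: converts one "id,name,age" line from vertex.csv
// into the (vertexId, (name, age)) shape used for GraphX vertices.
// GraphX's VertexId is an alias for Long, so Long is used directly here.
object VertexParsing {

  // Parses a line such as "1,Jacob,48" into (1L, ("Jacob", 48)).
  def parseVertex(line: String): (Long, (String, Int)) = {
    val fields = line.split(",").map(_.trim)
    require(fields.length == 3, s"expected id,name,age but got: $line")
    (fields(0).toLong, (fields(1), fields(2).toInt))
  }

  def main(args: Array[String]): Unit = {
    // A few of the records listed in vertex.csv above.
    val sample = Seq("1,Jacob,48", "2,Jessica,45", "3,Andrew,25")
    sample.map(parseVertex).foreach(println)
  }
}
```

In spark-shell, where `sc` is predefined, you could apply the same function as `sc.textFile(<path to vertex.csv on HDFS>).map(VertexParsing.parseVertex)`. This produces an `RDD[(Long, (String, Int))]`, which is the vertex RDD shape that `Graph(vertices, edges)` accepts together with an edge RDD. The path is a placeholder, not a location taken from this chapter.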