In this application section, we discuss triangle counting, (strongly) connected components, PageRank, and other algorithms available in GraphX. To do so, we load another interesting graph dataset from http://networkrepository.com/. This time, please download the data from http://networkrepository.com/ca-hollywood-2009.php. It is an undirected graph whose vertices represent actors appearing in movies. Each line of the file contains two vertex IDs that define an edge, meaning that the two actors appeared together in a movie.
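To make the file format concrete, here is a minimal sketch in plain Scala of how one line of such an edge list can be turned into a pair of vertex IDs. The helper `parseEdge` and the sample lines are hypothetical and only illustrate the shape of the data. The sketch also assumes that any header or comment lines start with `%` or `#`, which you should check against the file you actually download.

```scala
// Hypothetical helper (not part of GraphX): parse one line of a
// whitespace-separated edge list into a pair of vertex IDs.
// Assumption: header or comment lines start with '%' or '#'.
def parseEdge(line: String): Option[(Long, Long)] = {
  val trimmed = line.trim
  if (trimmed.isEmpty || trimmed.startsWith("%") || trimmed.startsWith("#")) None
  else {
    val fields = trimmed.split("\\s+")
    if (fields.length < 2) None
    else scala.util.Try((fields(0).toLong, fields(1).toLong)).toOption
  }
}

// Made-up sample lines, used only to show the expected input shape.
val sample = Seq("% comment", "1 2", "2 3", "", "3 1")

// Keep only the lines that parse as an edge.
val edges = sample.flatMap(parseEdge)

// The graph is undirected, so each pair stands for a connection in both
// directions; the vertex set is every ID that occurs in some edge.
val vertices = edges.flatMap { case (a, b) => Seq(a, b) }.distinct
```

GraphX's `GraphLoader.edgeListFile` reads this kind of two-column file directly, but it skips only lines beginning with `#`, so a file with other header lines may need a filtering step like the one sketched here first.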
The dataset has about 1.1 million vertices and 56.3 million edges. Although the file, even after unzipping, is not particularly large, a graph of this size is a real challenge for a graph processing engine. Since we assume you work with Spark's standalone mode locally, this graph will likely...