Accumulators and implementing BFS in Spark
Now that we have the concept of breadth-first search under our belt and we understand how it can be used to find the degrees of separation between superheroes, let's apply that and actually write some Spark code to make it happen. So how do we turn breadth-first search into a Spark problem? This will make a lot more sense if the explanation of how BFS works is still fresh in your head. If it's not, it might be a good idea to go back and re-read the previous section; understanding the theory will really help here.
Convert the input file into structured data
The first thing we need to do is convert our input file into something that looks like the nodes we described in the BFS algorithm in the previous section, Superhero degrees of separation - introducing breadth-first search.
We're starting off, for example, with a line of input like the one shown here, which says that hero ID 5983 appeared with heroes 1165...
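To make this concrete, here is a minimal sketch in Python of what that conversion might look like. The function and variable names (convertToBFS, startCharacterID) and the input file name (Marvel-Graph.txt) are illustrative assumptions, not taken from the text; the node structure of connections, distance, and color follows the BFS representation from the previous section.

```python
from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster("local").setAppName("DegreesOfSeparation")
sc = SparkContext(conf=conf)

# Illustrative: the hero ID we start the BFS from (5983 is the example ID above).
startCharacterID = 5983

def convertToBFS(line):
    # An input line looks like "5983 1165 ...": the first field is a hero ID,
    # and the remaining fields are the IDs of heroes they appeared with.
    fields = line.split()
    heroID = int(fields[0])
    connections = [int(connection) for connection in fields[1:]]

    # Every node starts out WHITE (unexplored) with a large distance standing
    # in for infinity; the starting character is GRAY (ready to explore) at
    # distance 0.
    color = 'WHITE'
    distance = 9999
    if heroID == startCharacterID:
        color = 'GRAY'
        distance = 0

    return (heroID, (connections, distance, color))

# Hypothetical input path: map every line of the social graph into a BFS node.
inputFile = sc.textFile("Marvel-Graph.txt")
iterationRdd = inputFile.map(convertToBFS)
```

Mapping every line of the input through a function like this gives us an RDD of key/value pairs, where the key is the hero ID and the value carries that node's connections, its current distance from the starting character, and its color, ready to be processed iteratively.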