Traversing a Graph

A common activity on a graph is visiting each vertex of it in a given order. We will start by introducing the breadth-first search, and then follow with depth-first search. Both of these techniques form the archetype for many important graph algorithms, as we will see later with the cycle detection and Dijkstra's algorithm for single-source shortest paths.

Breadth-First Search

Given a graph G = (V, E) and a source vertex s, breadth-first search explores the edges of G systematically to discover every vertex that is reachable from s. While doing so, it computes the smallest number of edges from s to each reachable vertex, making it suitable to solve the single-source shortest path problem on unweighted graphs, or graphs whose edges all have the same weight.

Breadth-First Search (BFS) is named so because it expands the frontier between discovered and undiscovered vertices uniformly across the breadth of the frontier. In that sense, the algorithm first explores vertices at distance k from s before discovering vertices at distance k + 1. To keep track of progress, breadth-first search identifies each vertex as undiscovered, discovered, or expanded. All vertices start out undiscovered. A vertex is discovered the first time it is encountered during search, and is expanded when all the vertices adjacent to it have been discovered.

BFS constructs a breadth-first tree, rooted at source vertex s. Whenever the search discovers an undiscovered vertex v when scanning the outward edges of already discovered vertex u, the vertex v and the edge (u, v) are added to the tree. Therefore, u becomes the parent of v in the breadth-first tree. Since a vertex is discovered at most once, it has at most one parent.

In order to illustrate this, let's look at a run of breadth-first searches for the directed graph of Figure 6.1, starting at node 2, in the following table:

Table 6.3: A Run of BFS on the directed graph of Figure 6.1, starting at node 2

There are a lot of insights to take from the breadth-first tree. For instance, the path from the root to a given node in the tree is the shortest path (in terms of edges) from those two vertices. Another thing to note is that vertices that are not in the breadth-first tree (as is the case of 0) are unreachable from the root vertex.

We previously saw how to perform the breadth-first search on trees. BFS on graphs is similar, but we need to keep track of explored nodes so that we don't get stuck in cycles. In order to implement breadth-first search, we will assume that our graph is represented using adjacency lists.

We will attach certain attributes to each vertex in the graph that will allow us to guide our search and later construct the breadth-first tree. We will also use a first-in, first-out queue (covered in Chapter 2, Sorting Algorithms and Fundamental Data Structures) to manage the set of discovered vertices. The following code snippet illustrates the implementation of breadth-first search:

Queue<Integer> q = new LinkedList<>();
q.add(start);
while (!q.isEmpty()) {
  int current = q.remove();
  for (int i = 0; i < this.adj[current].size(); i++) {
    int next = this.adj[current].get(i);
    if (!visited[next]) {
      visited[next] = true;
      parent[next] = current;
      q.add(next);
    }
  }
}

Snippet 6.4: Implementation of breadth-first search. Source class name: BFS.Graph

Go to https://goo.gl/VqrQWM to access this code.

Let's focus on the implementation of the BFS function. We will start by initializing a couple of auxiliary arrays: parent and visited. The first one will hold, at parent[i], the parent of node i in the breadth-first tree. The second one will tell us, at visited[i], whether or not vertex i has been discovered. We start by discovering the starting node and adding it to a queue. The queue will keep those vertices that have been discovered but not yet expanded. As such, while there are still elements in the queue, we will take its first element, go through its adjacent vertices, and discover those that haven't already been discovered, adding them to the queue.

When the queue becomes empty, we're sure of having expanded all vertices that are reachable from the starting vertex.

In the previous implementation, we've returned the array of parent nodes of the breadth-first tree in the bfs() function, allowing us to reconstruct the paths. If not necessary, you could just return the size of paths, or any other information we might be interested in extracting from the breadth-first search traversal.

In the bfs() method, we're sure of enqueuing, and hence dequeuing, each vertex at most once. As such, the total time dedicated to queue operations is O(V). After dequeuing each vertex, we scan its adjacency list. Since we dequeue each vertex at most once, we scan each adjacency list at most once. As the sum of lengths of all the adjacency lists is O(E), the total time spent in scanning adjacency lists is O(E). Therefore, the BFS procedure has an initialization time of O(V) and a total running time of O(V + E), running in linear time to the size of the adjacency list representation of G.

As we will see in later sections, the BFS procedure is the archetype for many important graph algorithms.

Depth-First Search

Given a graph G = (V, E) and a source vertex s, depth-first search explores the edges of the graph by going "deeper" in the graph whenever possible. Depth-First Search (DFS) explores edges adjacent to the most recently discovered vertex v that still has unexplored edges whose head is adjacent to it. Once all of v's edges have been explored, the search "backtracks" to explore edges, leaving the vertex from which v was discovered. The process continues until all vertices that are reachable from the original source vertex have been discovered.

If any undiscovered vertices remain, then DFS selects one of them as a new source, and it repeats the search from that source. While it may seem odd that BFS limits itself to vertices reachable from a single source whereas DFS considers multiple sources, the reason behind it is related to the applications of these searches.

BFS is usually used to find shortest-path distances while DFS is often used as a subroutine in another algorithm, which we shall see when we explore the cycle detection problem.

Similar to BFS, when we discover a vertex v during the scan of the adjacency list of an already discovered vertex, we record its parent attribute. Since we mentioned that we explore different sources, the parent subgraph produced by DFS is, unlike the breadth-first tree, a forest (that is, a set of trees).

In order to illustrate this, let's look at a run of DFS for the directed graph of Figure 6.1, starting at node 2, in the following table:

Table 6.4: A run of DFS on the directed graph of Figure 6.1, starting at node 2

Note that the results of DFS may depend on the order in which the vertices are examined. In the previous case, we started with 2 and always went for the lowest-numbered vertex in the adjacency list of a vertex first. If we had started with vertex 0, we would have a different forest. In practice, we can usually use any DFS result with equivalent results.

We previously saw how to perform DFS on trees. DFS on graphs is similar, but we need to keep track of explored nodes so that we don't get stuck in cycles.

In order to implement DFS, we will assume that our graph is represented using adjacency lists. We will attach certain attributes to each vertex in the graph, which will allow us to guide our search and later construct the depth-first forest. The following code snippet illustrates the implementation of depth-first search:

public void dfsVisit(int u, boolean[] visited, int[] parent) {
  visited[u] = true;
  for (int i = 0; i < adj[u].size(); i++) {
    int next = adj[u].get(i);
    if (!visited[next]) {
      parent[next] = u;
      dfsVisit(next, visited, parent);
    }
  }
}

Snippet 6.5: Implementation of depth-first search. Source class name:dfs.Graph.

Go to https://goo.gl/saZYQp to access this code.

The DFS procedure works by initializing all vertices as not visited, and setting their parents to -1 (meaning that they have no parent). Then, we find the first undiscovered vertex and visit it. In each visit, we start by recording the vertex as visited and then going through its adjacency list. There, we are looking for vertices not yet discovered. Once we find one, we visit it. Looking at the previous implementation, we see that the loops inside DFS take time O(V), as they run for each vertex in the graph. We can also see that the dfsVisit() method is called exactly once for each vertex. During the execution of dfsVisit(), the loop scanning the adjacency list executes in time proportional to the size of the vertex's adjacency list. Since we said before that dfsVisit() is called exactly once for each vertex, the total time spent in the loop is proportional to the sum of the sizes of all adjacency lists, that is, O(E). Therefore, the total running time of DFS is O(V + E).

In the DFS method, we're returning the parent array, but the return type of this routine is usually adapted depending on the larger task that a more general algorithm that uses DFS is trying to achieve. We'll see DFS adapted to our specific needs in the next section.

Cycle Detection

A useful application of DFS is determining whether or not a graph is acyclic (that is, it does not contain cycles). In order to do so, it's important to define four types of edges in terms of the depth-first forest produced by DFS. They are as follows:

Tree edges: They are edges in the depth-first forest. An edge can only be a tree edge if it was the one explored when first discovering a vertex.
Back edges: They are edges connecting a vertex to an ancestor in a depth-first tree. Self-loops (which may occur in directed graphs) are back edges.
Forward edges: They are edges that do not belong to a depth-first tree but connect a vertex to one of its descendants in a depth-first tree. Forward edges are therefore edges that weren't used when performing the DFS, but connect vertices u and v in a depth-first tree provided that v is a descendant of u in the tree.
Cross edges: They are all other edges. They can go between vertices in the same depth-first tree or they can go between vertices in different depth-first trees. They are therefore edges that weren't used when performing the depth-first search, but connect vertices that don't share an ancestor relationship in the same tree or vertices in different trees.

Having classified edges, it is possible to show that a directed graph is acyclic if and only if a DFS does not produce back edges. If a depth-first search produces a back edge (u, v), then vertex v is an ancestor of vertex u in the depth-first forest. Therefore, G contains a path from v to u, and (u, v) completes a cycle. This algorithm is generalizable for undirected graphs. In undirected graphs, if we find a back edge (u, v) and v is not the parent of u in the depth-first forest, then we are in the presence of a cycle.

Activity: Using BFS to Find the Shortest Path Out of a Maze

Scenario

Our maze is an H by W rectangle, represented by an array of size H of W-sized strings. Each character in a string can either be '#' or '.'. '#' represents a wall, which we cannot cross, and '.' represents a free space, which we can cross. The border of the maze is always filled with '#' except for one square, which represents the exit. For example, the following is a valid maze:

Find the total number of steps to exit the maze, when supplied with a starting point (i, j) (with (0, 0) being the upper-left point and (H, W) being the lower-right point).

Aim

To use BFS to find the shortest path out of a given maze.

Prerequisites

Implement the distToExit() method of the Maze class in the source code, which returns the integer distance from that point to the exit of the maze. It is available at the following URL:
https://github.com/TrainingByPackt/Data-Structures-and-Algorithms-in-Java/blob/master/src/main/java/com/packt/datastructuresandalg/lesson6/activity/maze/Maze.java
Assume that the points supplied to distToExit() are valid (that is, they're not inside walls)
Remember that we can only move in the cardinal directions (North, South,
East, and West)

Steps for Completion

Encode the maze representation to a graph representation
Apply the BFS implementation shown in the preceding section (with a small modification to account for distances), or you can build the graph as you go
Since you know that there are at most four outward edges for a given vertex, compute their positions as you go

In this section, we've introduced two different ways to traverse a graph—breadth-first search (BFS) and depth-first search (DFS). We've seen how to use BFS to find the single-source shortest path in unweighted graphs and how to use DFS to find cycles in a graph. In the next section, we will be looking at two different algorithms to find shortest paths in a graph.