Anatomy of DFS in Digraphs - Digraphs and DAGs - Algorithms Third Edition in C++ Part 5. Graph Algorithms (2006)

Algorithms Third Edition in C++ Part 5. Graph Algorithms (2006)

CHAPTER NINETEEN
Digraphs and DAGs

19.2 Anatomy of DFS in Digraphs

We can use our DFS code for undirected graphs from Chapter 18 to visit each edge and each vertex in a digraph. The basic principle behind the recursive algorithm holds: To visit every vertex that can be reached from a given vertex, we mark the vertex as having been visited, then (recursively) visit all the vertices that can be reached from each of the vertices on its adjacency list.

In an undirected graph, we have two representations of each edge, but the second representation that is encountered in a DFS always leads to a marked vertex and is ignored (see Section 18.2). In a digraph, we have just one representation of each edge, so we might expect DFS algorithms to be more straightforward. But digraphs themselves are more complicated combinatorial objects than undirected graphs, so this expectation is not justified. For example, the search trees that we use to understand the operation of the algorithm have a more complicated structure for digraphs than for undirected graphs. This complication makes digraph-processing algorithms more difficult to devise. For example, as we will see, it is more difficult to make inferences about directed paths in digraphs than it is to make inferences about paths in graphs.

As we did in Chapter 18, we use the term standard adjacency-lists DFS to refer to the process of inserting a sequence of edges into a digraph ADT implemented with an adjacency-lists representation (Program 17.9, invoked with true as the constructor’s second argument), then doing a DFS with, for example, Program 18.3 and the parallel term standard adjacency-matrix DFS to refer to the process of inserting a sequence of edges into a digraph ADT implemented with an adjacency-matrix representation (Program 17.7, invoked with true as the constructor’s second argument), then doing a DFS with, for example, Program 18.3.

For example, Figure 19.9 shows the recursive-call tree that describes the operation of a standard adjacency-lists DFS on the sample digraph in Figure 19.1. Just as for undirected graphs, such trees have internal nodes that correspond to calls on the recursive DFS function for each vertex, with links to external nodes that correspond to edges that take us to vertices that have already been seen. Classifying the nodes and links gives us information about the search (and the di-graph), but the classification for digraphs is quite different from the classification for undirected graphs.

Figure 19.9 DFS forest for a digraph

image

This forest describes a standard adjacency-lists DFS of the sample digraph in Figure 19.1. External nodes represent previously visited internal nodes with the same label; otherwise the forest is a representation of the digraph, with all edges pointing down. There are four types of edges: tree edges, to internal nodes; back edges, to external nodes representing ancestors (shaded circles); down edges, to external nodes representing descendants (shaded squares); and cross edges, to external nodes representing nodes that are neither ancestors nor descendants (white squares). We can determine the type of edges to visited nodes, by comparing the preorder and postorder numbers (bottom) of their source and destination:

image

For example, 7-6 is a cross edge because 7‘s preorder and postorder numbers are both larger than 6’s.

In undirected graphs, we assigned each link in the DFS tree to one of four classes according to whether it corresponded to a graph edge that led to a recursive call and to whether it corresponded to the first or second representation of the edge encountered by the DFS. In digraphs, there is a one-to-one correspondence between tree links and graph edges, and they fall into four distinct classes:

• Those representing a recursive call ( tree edges)

• Those from a vertex to an ancestor in its DFS tree ( back edges)

• Those from a vertex to a descendant in its DFS tree ( down edges)

• Those from a vertex to another vertex that is neither an ancestor nor a descendant in its DFS tree ( cross edges)

A tree edge is an edge to an unvisited vertex, corresponding to a recursive call in the DFS. Back, cross, and down edges go to visited vertices. To identify the type of a given edge, we use preorder and postorder numbering (the order in which nodes are visited in preorder and postorder walks of the forest, respectively).

Property 19.3 In a DFS forest corresponding to a digraph, an edge to a visited node is a back edge if it leads to a node with a higher postorder number; otherwise, it is a cross edge if it leads to a node with a lower preorder number and a down edge if it leads to a node with a higher preorder number.

Proof: These facts follow from the definitions. A node’s ancestors in a DFS tree have lower preorder numbers and higher postorder numbers; its descendants have higher preorder numbers and lower postorder numbers. It is also true that both numbers are lower in previously visited nodes in other DFS trees, and both numbers are higher in yetto-be-visited nodes in other DFS trees, but we do not need code that tests for these cases.

Program 19.2 is a DFS class that identifies the type of each edge in the digraph. Figure 19.10 illustrates its operation on the example digraph of Figure 19.1. During the search, testing to see whether an edge leads to a node with a higher postorder number is equivalent to testing whether a postorder number has yet been assigned. Any node for which a preorder number has been assigned but for which a postorder number has not yet been assigned is an ancestor in the DFS tree and will therefore have a postorder number higher than that of the current node.

Figure 19.10 Digraph DFS trace

image

This DFS trace is the output of Program 19.2 for the example digraph in Figure 19.1. It corresponds precisely to a preorder walk of the DFS tree depicted in Figure 19.9.

As we saw in Chapter 17 for undirected graphs, the edge types are properties of the dynamics of the search, rather than of only the graph. Indeed, different DFS forests of the same graph can differ remarkably in character, as illustrated in Figure 19.11. For example, even the number of trees in the DFS forest depends upon the start vertex.

Despite these differences, several classical digraph-processing algorithms are able to determine digraph properties by taking appropriate action when they encounter the various types of edges during a DFS. For example, consider the following basic problem:

Directed cycle detection Does a given digraph have any directed cycles? (Is the digraph a DAG?) In undirected graphs, any edge to a visited vertex indicates a cycle in the graph; in digraphs, we must restrict our attention to back edges.

Program 19.2 DFS of a digraph

This DFS class uses preorder and postorder numberings to show the role that each edge in the graph plays in the DFS (see Figure 19.10).

image

Property 19.4 A digraph is a DAG if and only if we encounter no back edges when we use DFS to examine every edge.

Proof: Any back edge belongs to a directed cycle that consists of the edge plus the tree path connecting the two nodes, so we will find no back edges when using DFS on a DAG. To prove the converse, we show that if the digraph has a cycle, then the DFS encounters a back

Figure 19.11 DFS forests for a digraph

image

These forests describes depth-first search of the same graph as Figure 19.9, when the graph search function checks the vertices (and calls the recursive function for the unvisited ones) in the order s, s+1, …, V-1, 0, 1, …, s-1 for each s. The forest structure is determined both by the search dynamics and the graph structure. Each node has the same children (the nodes on its adjacency list, in order) in every forest. The leftmost tree in each forest contains all the nodes reachable from its root, but reachability inferences about other nodes are complicated because of back, cross, and down edges. Even the number of trees in the forest depends on the starting node, so we do not necessarily have a direct correspondence between trees in the forest and strong components, the way that we did for components in undirected graphs. For example, we see that all vertices are reachable from 8 only when we start the DFS at 8.

edge. Suppose that v is the first of the vertices on the cycle that is visited by the DFS. That vertex has the lowest preorder number of all the vertices on the cycle. The edge that points to it will therefore be a back edge: It will be encountered during the recursive call for v (for a proof that it must be, see Property 19.5), and it points from some node on the cycle to v, a node with a lower preorder number (see Property 19.3).

We can convert any digraph into a DAG by doing a DFS and removing any graph edges that correspond to back edges in the DFS. For example, Figure 19.9 tells us that removing the edges 2-0, 3-5, 2-3, 9-11, 10-12, 4-2, and 8-7 makes the digraph in Figure 19.1 a DAG. The specific DAG that we get in this way depends on the graph representation and the associated implications for the dynamics of the DFS (see Exercise 19.37). This method is a useful way to generate large arbitrary DAGs randomly (see Exercise 19.76) for use in testing DAG-processing algorithms.

Directed cycle detection is a simple problem, but contrasting the solution just described with the solution that we considered in Chapter 18 for undirected graphs gives insight into the necessity of considering the two types of graphs as different combinatorial objects, even though their representations are similar and the same programs work on both types for some applications. By our definitions, we seem to be using the same method to solve this problem as for cycle detection in undirected graphs (look for back edges), but the implementation that we used for undirected graphs would not work for digraphs. For example, in Section 18.5 we were careful to distinguish between parent links and back links since the existence of a parent link does not indicate a cycle (cycles in undirected graphs must involve at least three vertices). But to ignore links back to a node’s parents in digraphs would be incorrect; we do consider a doubly-connected pair of vertices in a digraph to be a cycle. Theoretically, we could have defined back edges in undirected graphs in the same way as we have done here, but then we would have needed an explicit exception for the two-vertex case. More important, we can detect cycles in undirected graphs in time proportional to V (see Section 18.5), but we may need time proportional to E to find a cycle in a digraph (see Exercise 19.32).

The essential purpose of DFS is to provide a systematic way to visit all the vertices and all the edges of a graph. It therefore gives us a basic approach for solving reachability problems in digraphs, although, again, the situation is more complicated than for undirected graphs.

Single-source reachability Which vertices in a given digraph can be reached from a given start vertex s? How many such vertices are there?

Property 19.5 With a recursive DFS starting at s, we can solve the single-source reachability problem for a vertex s in time proportional to the number of edges in the subgraph induced by the reachable vertices.

Proof: This proof is essentially the same as the proof of Property 18.1, but it is worth restating to underline the distinction between reachability in digraphs and connectivity in undirected graphs. The property is certainly true for a digraph that has one vertex and no edges. For any digraph that has more than one vertex, we assume the property to be true for all digraphs that have fewer vertices. Now, the first edge that we take from s divides the digraph into the subgraphs induced by two subsets of vertices (see Figure 19.12): ( i ) the vertices that we can reach by directed paths that begin with that edge and do not otherwise include s;and( ii ) the vertices that we cannot reach with a directed path that begins with that edge without returning to s. We apply the inductive hypothesis to these subgraphs, noting that there are no directed edges from a vertex in the first subgraph to any vertex other than s in the second subgraph (such an edge would be a contradiction because its destination vertex should be in the first subgraph), that directed edges to s will be ignored because it has the lowest preorder number, and that all the vertices in the first subgraph have lower preorder numbers than any vertex in the second subgraph, so all directed edges from a vertex in the second subgraph to a vertex in the first subgraph will be ignored.

Figure 19.12 Decomposing a digraph

image

To prove by induction that DFS takes us everywhere reachable from a given node in a digraph, we use essentially the same proof as for Trémaux exploration. The key step is depicted here as a maze (top), for comparison with Figure 18.4. We break the graph into two smaller pieces (bottom), induced by two sets of vertices: Those vertices that can be reached by following the first edge from the start vertex without revisiting it (bottom piece), and those vertices that cannot be reached by following the first edge without going back through the start vertex (top piece). Any edge that goes from a vertex in the first set to the start vertex is skipped during the search of the first set because of the mark on the start vertex. Any edge that goes from a vertex in the second set to a vertex in the first set is skipped because all vertices in the first set are marked before the search of the second subgraph begins.

By contrast with undirected graphs, a DFS on a digraph does not give full information about reachability from any vertex other than the start node, because tree edges are directed and because the search structures have cross edges. When we leave a vertex to travel down a tree edge, we cannot assume that there is a way to get back to that vertex via digraph edges; indeed, there is not, in general. For example, there is no way to get back to 4 after we take the tree edge 4-11 in Figure 19.9. Moreover, when we ignore cross and forward edges (because they lead to vertices that have been visited and are no longer active), we are ignoring information that they imply (the set of vertices that are reachable from the destination). For example, following the cross edge 6-9 in Figure 19.9 is the only way for us to find out that 10, 11, and 12 are reachable from 6.

To determine which vertices are reachable from another vertex, we apparently need to start over with a new DFS from that vertex (see Figure 19.11). Can we make use of information from previous searches to make the process more efficient for later ones? We consider such reachability questions in Section 19.7.

To determine connectivity in undirected graphs, we rely on knowing that vertices are connected to their ancestors in the DFS tree, through (at least) the path in the tree. By contrast, the tree path goes in the wrong direction in a digraph: There is a directed path from a vertex in a digraph to an ancestor only if there is a back edge from a descendant to that or a more distant ancestor. Moreover, connectivity in undirected graphs for each vertex is restricted to the DFS tree rooted at that vertex; in contrast, in digraphs, cross edges can take us to any previously visited part of the search structure, even one in another tree in the DFS forest. For undirected graphs, we were able to take advantage of these properties of connectivity to identify each vertex with a connected component in a single DFS, then to use that information as the basis for a constant-time ADT operation to determine whether any two vertices are connected. For digraphs, as we see in this chapter, this goal is elusive.

We have emphasized throughout this and the previous chapter that different ways of choosing unvisited vertices lead to different search dynamics for DFS. For digraphs, the structural complexity of the DFS trees leads to differences in search dynamics that are even more pronounced than those we saw for undirected graphs. For example, Figure 19.11 illustrates that we get marked differences for digraphs even when we simply vary the order in which the vertices are examined in the top-level search function. Only a tiny fraction of even these possibilities is shown in the figure—in principle, each of the V ! different orders of examining vertices might lead to different results. In Section 19.7, we shall examine an important algorithm that specifically takes advantage of this flexibility, processing the unvisited vertices at the top level (the roots of the DFS trees) in a particular order that immediately exposes the strong components.

Exercises

19.30 Draw the DFS forest that results from a standard adjacency-lists DFS of the digraph

3-7 1-4 7-8 0-5 5-2 3-8 2-9 0-6 4-9 2-6 6-4.

19.31 Draw the DFS forest that results from a standard adjacency-matrix DFS of the digraph

3-7 1-4 7-8 0-5 5-2 3-8 2-9 0-6 4-9 2-6 6-4.

19.32 Describe a family of digraphs with V vertices and E edges for which a standard adjacency-lists DFS requires time proportional to E for cycle detection.

19.33 Show that, during a DFS in a digraph, no edge connects a node to another node whose preorder and postorder numbers are both smaller.

19.34 Show all possible DFS forests for the digraph

0-1 0-2 0-3 1-3 2-3.

Tabulate the number of tree, back, cross, and down edges for each forest.

19.35 If we denote the number of tree, back, cross, and down edges by t, b, c, and d, respectively, then we have t + b + c + d = E and t < V for any DFS of any digraph with V vertices and E edges. What other relationships among these variables can you infer? Which of the values are dependent solely on graph properties, and which are dependent on dynamic properties of the DFS?

19.36 Prove that every source in a digraph must be a root of some tree in the forest corresponding to any DFS of that digraph.

19.37 Construct a connected DAG that is a subgraph of Figure 19.1 by removing five edges (see Figure 19.11).

19.38 Implement a digraph class that provides the capability for a client to check that a digraph is indeed a DAG, and provide a DFS-based implementation.

19.39 Use your solution to Exercise 19.38 to estimate (empirically) the probability that a random digraph with V vertices and E edges is a DAG for various types of digraphs (see Exercises 19.11–18).

19.40 Run empirical studies to determine the relative percentages of tree, back, cross, and down edges when we run DFS on various types of digraphs (see Exercises 19.11–18).

19.41 Describe how to construct a sequence of directed edges on V vertices for which there will be no cross or down edges and for which the number of back edges will be proportional to V2 in a standard adjacency-lists DFS.

19.42 Describe how to construct a sequence of directed edges on V vertices for which there will be no back or down edges and for which the number of cross edges will be proportional to V2 in a standard adjacency-lists DFS.

19.43 Describe how to construct a sequence of directed edges on V vertices for which there will be no back or cross edges and for which the number of down edges will be proportional to V2 in a standard adjacency-lists DFS.

19.44 Give rules corresponding to Trémaux traversal for a maze where all the passages are one-way.

19.45 Extend your solutions to Exercises 17.56 through 17.60 to include arrows on edges (see the figures in this chapter for examples).