# Algorithms Third Edition in C++ Part 5. Graph Algorithms (2006)

### CHAPTER SEVENTEEN

Graph Properties and Types

### 17.1 Glossary

A substantial amount of nomenclature is associated with graphs. Most of the terms have straightforward definitions, and, for reference, it is convenient to consider them in one place: here. We have already used some of these concepts when considering basic algorithms in Part 1; others of them will not become relevant until we address associated advanced algorithms in *Chapters 18* through *22*.

**Definition 17.1** *A* **graph** *is a set of* **vertices** *and a set of* **edges** *that connect pairs of distinct vertices (with at most one edge connecting any pair of vertices).*

We use the names 0 through V-1 for the vertices in a *V* -vertex graph. The main reason that we choose this system is that we can access quickly information corresponding to each vertex, using vector indexing. In *Section 17.6*, we consider a program that uses a symbol table to establish a 1–1 mapping to associate *V* arbitrary vertex names with the *V* integers between 0 and *V −* 1. With that program in hand, we can use indices as vertex names (for notational convenience) without loss of generality. We sometimes assume that the set of vertices is defined implicitly, by taking the set of edges to define the graph and considering only those vertices that are included in at least one edge. To avoid cumbersome usage such as “the ten-vertex graph with the following set of edges,” we do not explicitly mention the number of vertices when that number is clear from the context. By convention, we always denote the number of vertices in a given graph by *V*, and denote the number of edges by *E*.

We adopt as standard this definition of a graph (which we first encountered in Chapter 5), but note that it embodies two technical simplifications. First, it disallows duplicate edges (mathematicians sometimes refer to such edges as *parallel* edges, and a graph that can contain them as a*multigraph*). Second, it disallows edges that connect vertices to themselves; such edges are called *self-loops*. Graphs that have no parallel edges or self-loops are sometimes referred to as *simple graphs*.

We use simple graphs in our formal definitions because it is easier to express their basic properties and because parallel edges and self-loops are not needed in many applications. For example, we can bound the number of edges in a simple graph with a given number of vertices.

**Property 17.1** *Agraphwith V vertices has at most V* (*V −* 1)*/* 2 *edges.*

*Proof*: The total of *V*^{2} possible pairs of vertices includes *V* self-loops and accounts twice for each edge between distinct vertices, so the number of edges is at most (*V*^{2} *− V*)*/*2=*V* (*V −* 1)*/*2. •

No such bound holds if we allow parallel edges: a graph that is not simple might consist of two vertices and billions of edges connecting them (or even a single vertex and billions of self-loops).

For some applications, we might consider the elimination of parallel edges and self-loops to be a data-processing problem that our implementations must address. For other applications, ensuring that a given set of edges represents a simple graph may not be worth the trouble. Throughout the book, whenever it is more convenient to address an application or to develop an algorithm by using an extended definition that includes parallel edges or self-loops, we shall do so. For example, self-loops play a critical role in a classical algorithm that we will examine in *Section 17.4*; and parallel edges are common in the applications that we address in *Chapter 22*. Generally, it is clear from the context whether we intend the term “graph” to mean “simple graph” or “multigraph” or “multigraph with self-loops.”

Mathematicians use the words *vertex* and *node* interchangeably, but we generally use *vertex* when discussing graphs and *node* when discussing representations—for example, in C++ data structures. We normally assume that a vertex can have a name and can carry other associated information. Similarly, the words *arc, edge,* and *link* are all widely used by mathematicians to describe the abstraction embodying a connection between two vertices, but we consistently use *edge* when discussing graphs and *link* when discussing C++ data structures.

When there is an edge connecting two vertices, we say that the vertices are *adjacent to* one another and that the edge is *incident on* both vertices. The *degree* of a vertex is the number of edges incident on it. We use the notation v-w to represent an edge that connects v and w; the notation w-v is an alternative way to represent the same edge.

A *subgraph* is a subset of a graph’s edges (and associated vertices) that constitutes a graph. Many computational tasks involve identifying subgraphs of various types. If we identify a subset of a graph’s vertices, we call that subset, together with all edges that connect two of its members, the*induced subgraph* associated with those vertices.

We can draw a graph by marking points for the vertices and drawing lines connecting them for the edges. A drawing gives us intuition about the structure of the graph; but this intuition can be misleading, because the graph is defined independently of the representation. For example, the two drawings in *Figure 17.1* and the list of edges represent the same graph, because the graph is only its (unordered) set of vertices and its (unordered) set of edges (pairs of vertices)—nothing more. Although it suffices to consider a graph simply as a set of edges, we examine other representations that are particularly suitable as the basis for graph data structures in *Section 17.4*.

**Figure 17.1 Three different representations of the same graph**

A graph is defined by its vertices and its edges, not by the way that we choose to draw it. These two drawings depict the same graph, as does the list of edges (bottom), given the additional information that the graph has 13 vertices labeled 0 through 12.

Placing the vertices of a given graph on the plane and drawing them and the edges that connect them is known as *graph drawing*. The possible vertex placements, edge-drawing styles, and aesthetic constraints on the drawing are limitless. Graph-drawing algorithms that respect various natural constraints have been studied heavily and have many successful applications (*see reference section*). For example, one of the simplest constraints is to insist that edges do not intersect. A *planar graph* is one that can be drawn in the plane without any edges crossing. Determining whether or not a graph is planar is a fascinating algorithmic problem that we discuss briefly in *Section 17.8*. Being able to produce a helpful visual representation is a useful goal, and graph drawing is a fascinating field of study, but successful drawings are often difficult to realize. Many graphs that have huge numbers of vertices and edges are abstract objects for which no suitable drawing is feasible.

For some applications, such as graphs that represent maps or circuits, a graph drawing can carry considerable information because the vertices correspond to points in the plane and the distances between them are relevant. We refer to such graphs as *Euclidean graphs*. For many other applications, such as graphs that represent relationships or schedules, the graphs simply embody connectivity information, and no particular geometric placement of vertices is ever implied. We consider examples of algorithms that exploit the geometric information in Euclidean graphs in*Chapters 20* and *21*, but we primarily work with algorithms that make no use of any geometric information, and stress that graphs are generally independent of any particular representation in a drawing or in a computer.

Focusing solely on the connections themselves, we might wish to view the vertex labels as merely a notational convenience, and to regard two graphs as being the same if they differ in only the vertex labels. Two graphs are *isomorphic* if we can change the vertex labels on one to make its set of edges identical to the other. Determining whether or not two graphs are isomorphic is a difficult computational problem (see *Figure 17.2* and *Exercise 17.5*). It is challenging because there are *V* ! possible ways to label the vertices—far too many for us to try all the possibilities. Therefore, despite the potential appeal of reducing the number of different graph structures that we have to consider by treating isomorphic graphs as identical structures, we rarely do so.

As we saw with trees in Chapter 5, we are often interested in basic structural properties that we can deduce by considering specific sequences of edges in a graph.

**Definition 17.2** *A* **path** *in a graph is a sequence of vertices in which each successive vertex (after the first) is adjacent to its predecessor in the path. In a* **simple path** *, the vertices and edges are distinct. A* **cycle** *is a path that is simple except that the first and final vertices are the same.*

**Figure 17.2 Graph isomorphism examples**

The top two graphs are isomorphic because we can relabel the vertices to make the two sets of edges identical (to make the middle graph the same as the top graph, change 10 to 4, 7 to 3, 2 to 5, 3 to 1, 12 to 0, 5 to 2, 9 to 11, 0 to 12, 11 to 9, 1 to 7, and 4 to 10). The bottom graph is not isomorphic to the others because there is no way to relabel the vertices to make its set of edges identical to either.

We sometimes use the term *cyclic path* to refer to a path whose first and final vertices are the same (and is otherwise not necessarily simple); and we use the term *tour* to refer to a cyclic path that includes every vertex. An equivalent way to define a path is as the sequence of *edges* that connect the successive vertices. We emphasize this in our notation by connecting vertex names in a path in the same way as we connect them in an edge. For example, the simple paths in *Figure 17.1* include 3-4-6-0-2, and 9-12-11, and the cycles in the graph include 0-6-4-3-5-0 and 5-4-3-5. We define the *length* of a path or a cycle to be its number of edges.

**Figure 17.3 Graph terminology**

This graph has 55 vertices, 70 edges, and 3 connected components. One of the connected components is a tree (right). The graph has many cycles, one of which is highlighted in the large connected component (left). The diagram also depicts a spanning tree in the small connected component (center). The graph as a whole does not have a spanning tree, because it is not connected.

We adopt the convention that each single vertex is a path of length 0 (a path from the vertex to itself with no edges on it, which is different from a self-loop). Apart from this convention, in a graph with no parallel edges and no self-loops, a pair of vertices uniquely determines an edge, paths must have on them at least two distinct vertices, and cycles must have on them at least three distinct edges and three distinct vertices.

We say that two simple paths are *disjoint* if they have no vertices in common other than, possibly, their endpoints. Placing this condition is slightly weaker than insisting that the paths have no vertices at all in common, and is useful because we can combine simple disjoint paths from *s* to *t* and*t* to *u* to get a simple disjoint path from *s* to *u* if *s* and *u* are different (and to get a cycle if *s* and *u* are the same). The term *vertex disjoint* is sometimes used to distinguish this condition from the stronger condition of *edge disjoint*, where we require that the paths have no *edge* in common.

**Definition 17.3** *A graph is a* **connected graph** *if there is a path from every vertex to every other vertex in the graph. A graph that is not connected consists of a set of* **connected components** *, which are maximal connected subgraphs.*

The term *maximal connected subgraph* means that there is no path from a subgraph vertex to any vertex in the graph that is not in the subgraph. Intuitively, if the vertices were physical objects, such as knots or beads, and the edges were physical connections, such as strings or wires, a connected graph would stay in one piece if picked up by any vertex, and a graph that is not connected comprises two or more such pieces.

**Definition 17.4** *An acyclic connected graph is called a* **tree** *(see Chapter 4). A set of trees is called a* **forest** *. A* **spanning tree** *of a connected graph is a subgraph that contains all of that graph’s vertices and is a single tree. A* **spanning forest** *of a graph is a subgraph that contains all of that graph’s vertices and is a forest.*

For example, the graph illustrated in *Figure 17.1* has three connected components, and is spanned by the forest 7-8 9-10 9-11 9-12 0-1 0-2 0-5 5-3 5-4 4-6 (there are many other spanning forests). *Figure 17.3* highlights these and other features in a larger graph.

We explore further details about trees in Chapter 4, and look at various equivalent definitions. For example, a graph *G* with *V* vertices is a tree if and only if it satisfies *any* of the following four conditions:

• *G* has *V* − 1 edges and no cycles.

• *G* has *V* − 1 edges and is connected.

• Exactly one simple path connects each pair of vertices in G.

• G is connected, but removing any edge disconnects it.

Any one of these conditions is necessary and sufficient to prove the other three, and we can develop other combinations of facts about trees from them (see *Exercise 17.1*). Formally, we should choose one condition to serve as a definition; informally, we let them collectively serve as the definition, and freely engage in usage such as the “acyclic connected graph” choice in Definition 17.4.

Graphs with all edges present are called *complete graphs* (see *Figure 17.4*). We define the *complement* of a graph *G* by starting with a complete graph that has the same set of vertices as the original graph and then removing the edges of *G*. The *union* of two graphs is the graph induced by the union of their sets of edges. The union of a graph and its complement is a complete graph. All graphs that have *V* vertices are subgraphs of the complete graph that has *V* vertices. The total number of different graphs that have *V* vertices is 2^{V}^{ (V −1)/2} (the number of different ways to choose a subset from the *V* (*V* - 1)*/*2 possible edges). A complete subgraph is called a *clique*.

**Figure 17.4 Complete graphs**

These complete graphs, with every vertex connected to every other vertex, have 10, 15, 21, 28, and 36 edges (bottom to top). Every graph with between 5 and 9 vertices (there are more than 68 billion such graphs) is a subgraph of one of these graphs.

Most graphs that we encounter in practice have relatively few of the possible edges present. To quantify this concept, we define the *density* of a graph to be the average vertex degree, or 2*E/V*. A *dense graph* is a graph whose average vertex degree is proportional to *V*; a *sparse graph* is a graph whose complement is dense. In other words, we consider a graph to be dense if *E* is proportional to *V*^{2} and sparse otherwise. This asymptotic definition is not necessarily meaningful for a particular graph, but the distinction is generally clear: A graph that has millions of vertices and tens of millions of edges is certainly sparse, and a graph that has thousands of vertices and millions of edges is certainly dense. We might contemplate processing a sparse graph with billions of vertices, but a dense graph with billions of vertices would have an overwhelming number of edges.

Knowing whether a graph is sparse or dense is generally a key factor in selecting an efficient algorithm to process the graph. For example, for a given problem, we might develop one algorithm that takes about *V*^{2} steps and another that takes about *E* lg *E* steps. These formulas tell us that the second algorithm would be better for sparse graphs, whereas the first would be preferred for dense graphs. For example, a dense graph with millions of edges might have only thousands of vertices: in this case *V*^{2} and *E* would be comparable in value, and the *V*^{2} algorithm would be 20 times faster than the *E* lg *E* algorithm. On the other hand, a sparse graph with millions of edges also has millions of vertices, so the *E* lg *E* algorithm could be *millions* of times faster than the *V*^{2} algorithm. We could make specific tradeoffs on the basis of analyzing these formulas in more detail, but it generally suffices in practice to use the terms *sparse* and *dense* informally to help us understand fundamental performance characteristics.

When analyzing graph algorithms, we assume that *V/E* is bounded above by a small constant, so that we can abbreviate expressions such as *V* (*V* + *E* ) to *V E*. This assumption comes into play only when the number of edges is tiny in comparison to the number of vertices—a rare situation. Typically, the number of edges far exceeds the number of vertices (*V/E* is much less than 1).

A *bipartite graph* is a graph whose vertices we can divide into two sets such that all edges connect a vertex in one set with a vertex in the other set. *Figure 17.5* gives an example of a bipartite graph. Bipartite graphs arise in a natural way in many situations, such as the matching problems described at the beginning of this chapter. Any subgraph of a bipartite graph is bipartite.

**Figure 17.5 A bipartite graph**

All edges in this graph connect odd-numbered vertices with even-numbered ones, so it is bipartite. The bottom diagram makes the property obvious.

Graphs as defined to this point are called *undirected graphs*. In *directed graphs*, also known as *digraphs*, edges are one-way: we consider the pair of vertices that defines each edge to be an *ordered* pair that specifies a one-way adjacency where we think about having the ability to get from the first vertex to the second but not from the second vertex to the first. Many applications (for example, graphs that represent the Web, scheduling constraints, or telephone-call transactions) are naturally expressed in terms of digraphs.

We refer to edges in digraphs as *directed edges*, though that distinction is generally obvious in context (some authors reserve the term *arc* for directed edges). The first vertex in a directed edge is called the *source*; the second vertex is called the *destination*. (Some authors use the terms *tail* and*head*, respectively, to distinguish the vertices in directed edges, but we avoid this usage because of overlap with our use of the same terms in data-structure implementations.) We draw directed edges as arrows pointing from source to destination, and often say that the edge *points to* the destination. When we use the notation v-w in a digraph, we mean it to represent an edge that points from v to w; it is different from w-v, which represents an edge that points from w to v. We speak of the *indegree* and *outdegree* of a vertex (the number of edges where it is the destination and the number of edges where it is the source, respectively).

Sometimes, we are justified in thinking of an undirected graph as a digraph that has two directed edges (one in each direction); other times, it is useful to think of undirected graphs simply in terms of connections. Normally, as discussed in detail in *Section 17.4*, we use the same representation for directed and undirected graphs (see *Figure 17.6*). That is, we generally maintain two representations of each edge for undirected graphs, one pointing in each direction, so that we can immediately answer questions such as, “Which vertices are connected to vertex *v*?”

**Figure 17.6 Two digraphs**

The drawing at the top is a representation of the example graph in *Figure 17.1* interpreted as a directed graph, where we take the edges to be ordered pairs and represent them by drawing an arrow from the first vertex to the second. It is also a DAG. The drawing at the bottom is a representation of the undirected graph from *Figure 17.1* that indicates the way that we usually represent undirected graphs: as digraphs with two edges corresponding to each connection (one in each direction).

*Chapter 19* is devoted to exploring the structural properties of digraphs; they are generally more complicated than the corresponding properties for undirected graphs. A *directed cycle* in a digraph is a cycle in which all adjacent vertex pairs appear in the order indicated by (directed) graph edges. A *directed acyclic graph (DAG)* is a digraph that has no directed cycles. A DAG (an acyclic digraph) is not the same as a tree (an acyclic undirected graph). Occasionally, we refer to the *underlying undirected graph* of a digraph, meaning the undirected graph defined by the same set of edges, but where these edges are not interpreted as directed.

Chapters 20 through 22 are generally concerned with algorithms for solving various computational problems associated with graphs in which other information is associated with the vertices and edges. In *weighted graphs*, we associate numbers (*weights*) with each edge, which generally represents a distance or cost. We also might associate a weight with each vertex, or multiple weights with each vertex and edge. In *Chapter 20* we work with weighted undirected graphs; in *Chapters 21* and *22* we study weighted digraphs, which we also refer to as *networks*. The algorithms in*Chapter 22* solve classic problems that arise from a particular interpretation of networks known as *flow networks*.

As was evident even in Chapter 1, the combinatorial structure of graphs is extensive. This extent of this structure is all the more remarkable because it springs forth from a simple mathematical abstraction. This underlying simplicity will be reflected in much of the code that we develop for basic graph processing. However, this simplicity sometimes masks complicated dynamic properties that require deep understanding of the combinatorial properties of graphs themselves. It is often far more difficult to convince ourselves that a graph algorithm works as intended than the compact nature of the code might suggest.

**Exercises**

**17.1** Prove that any acyclic connected graph that has V vertices has V - 1 edges.

• **17.2** Give all the connected subgraphs of the graph

0-1 0-2 0-3 1-3 2-3.

• **17.3** Write down a list of the nonisomorphic cycles of the graph in *Figure 17.1*. For example, if your list contains 3-4-5-3, it should not contain 3-5-4-3, 4-5-3-4, 4-3-5-4, 5-3-4-5, or 5-4-3-5.

**17.4** Consider the graph

3-7 1-4 7-8 0-5 5-2 3-8 2-9 0-6 4-9 2-6 6-4.

Determine the number of connected components, give a spanning forest, list all the simple paths with at least three vertices, and list all the nonisomorphic cycles (see *Exercise 17.3*).

• **17.5** Consider the graphs defined by the following four sets of edges:

0-1 0-2 0-3 1-3 1-4 2-5 2-9 3-6 4-7 4-8 5-8 5-9 6-7 6-9 7-8

0-1 0-2 0-3 0-3 1-4 2-5 2-9 3-6 4-7 4-8 5-8 5-9 6-7 6-9 7-8

0-1 1-2 1-3 0-3 0-4 2-5 2-9 3-6 4-7 4-8 5-8 5-9 6-7 6-9 7-8

4-1 7-9 6-2 7-3 5-0 0-2 0-8 1-6 3-9 6-3 2-8 1-5 9-8 4-5 4-7.

Which of these graphs are isomorphic to one another? Which of them are planar?

**17.6** Consider the more than 68 billion graphs referred to in the caption to *Figure 17.4*. What percentage of them has fewer than nine vertices?

• **17.7** How many different subgraphs are there in a given graph with *V* vertices and *E* edges?

• **17.8** Give tight upper and lower bounds on the number of connected components in graphs that have *V* vertices and *E* edges.

• **17.9** How many different undirected graphs are there that have *V* vertices and *E* edges?

••• **17.10** If we consider two graphs to be different only if they are not isomorphic, how many different graphs are there that have *V* vertices and *E* edges?

**17.11** How many *V* -vertex graphs are bipartite?