Programming Problems in Java: A Primer for The Technical Interview (2014)
Chapter 8. Graphical Search
We have already considered search over a binary search tree, now we consider search over general graphs. The most common graphical search algorithms are breadth first search and depth first search. In the following sections, we will look at basic implementation of each of these search algorithms. Afterward we will solve programming problems with these search techniques.
8.1 Definition of a graph
Before we begin, we formally define a graph. A graph is a collection of vertices with edge relations. Two vertices are adjacent if they share an edge between them. Adjacent vertices are called neighbors. In an undirected graph, edges are symmetric relations. That is if a and b are vertices and (a,b) is an edge, then (b,a) is also an edge. A graph is directed if edges are not symmetric. That is if (a,b) is an edge, (b,a) is not necessarily an edge.
We represent graphs with nodes and adjacency lists. Each vertex in a graph will be represented by the followingNode class.
Listing 8.1: Graph Node
class Node {
int value;
List<Node> neighbors;
Node(int value, List<Node> neighbors) {
this.value = value;
this.neighbors = neighbors;
}
}
Each node has a value stored. The edges originating at a node are stored in theListmemberneighbors. As such,neighborscontains the list of all vertices adjacent to theNode.
A graph is connected if for any pair of vertices a and z in the graph, there is a list of vertices starting with a and ending with z in which every adjacent pair in the list are neighbors. Such a list is called a path. A simple path is one that does not intersect itself. A cycle in a graph is a list of nodes starting and ending with the same node in which every adjacent pair in the list are neighbors. Trees are connected graphs that do not contain cycles. This allows tree traversal algorithms to not need to worry about infinite loops. However in graph traversals special care must be taken to ensure we do not double back and be caught in an infinite loop.
8.2 Breadth first search
With breadth first search, abbreviated bfs, we study our first graph search algorithm. bfs starts from an origin node and visits all reachable nodes in a graph in order of their distance from the origin. The number of edges from the origin measures distance. bfs is characterized by the use of a queue to maintain the order in which nodes are visited. The queue contains the known frontier of the search, which are all nodes that are known to exist but have not yet been visited.
As mentioned above, graphs can have cycles between the nodes. To avoid doubling back while traversing the graph, bfs must keep track of all visited nodes and not add nodes multiple times to the queue. To do this, we use aSetto track our history.
As with tree traversal, bfs is usually required to do more than simply pop every node from the queue. Often bfs is implemented to take a generalvisitfunction that is called each time a node is examined. We achieve this by implementing the followingVisitor interface.
Listing 8.2: Visitor Interface
interface Visitor {
public void visit(Node node);
}
Using theNodeclass defined previously, we can list an implementation of bfs with a parameter called visit.
Listing 8.3: Breadth First Search
void bfs(Node origin, Visitor visit) {
Set<Node> queued = new HashSet<Node>();
Queue<Node> queue = new LinkedList<Node>();
queue.add(origin);
while (!queue.isEmpty()) {
Node current = queue.poll();
visit.visit(current);
for (Node neighbor : current.neighbors) {
if (!queued.contains(neighbor)) {
queued.add(current);
queue.add(neighbor);
}
}
}
}
As we will see, it is the visit function that makesbfspowerful.
8.2.1 Find the distance between nodes in a graph
Since bfs visits each node in order of its distance from the origin, it can be used to solve the problem of finding the minimal distance from an origin to all other nodes in a graph. In doing so we implement a special case of Dijkstra’s algorithm for the case where all edges are of equal weight.
The idea is to update the known distance of a node when we first add it to the queue. Doing so will require keeping track of the state of node exploration. Theget_distancemethod sets distance of unvisited nodes to infinity. When a node is first encountered, all the children are given distance equal to the distance of the parent plus one. Since bfs visits each node in order of its distance, a parent’s distance will always be set before a child is encountered. Translating this into code, we have the following listing.
Listing 8.4: Shortest Path Distances
class DistanceVisitor implements Visitor {
private Map<Node, Integer> distances;
DistanceVisitor(Map<Node, Integer> distances) {
this.distances = distances;
}
public void visit(Node node) {
if (distances.isEmpty()) {
distances.put(node, 0);
}
for (Node neighbor : node.neighbors) {
int vertexDistance = distance(distances,
node);
int neighborDistance =
distance(distances, neighbor);
distances.put(neighbor,
Math.min(neighborDistance,
1 + vertexDistance));
}
}
private int distance(
Map<Node, Integer> distances,
Node vertex) {
Integer distance = distances.get(vertex);
if (null == distance) {
return Integer.MAX_VALUE;
}
return distance;
}
}
void find_distances(
Node origin,
Map<Node, Integer> distances) {
Visitor visit = new DistanceVisitor(distances);
bfs(origin, visit);
}
At completion of the algorithm, the parameterdistancesmaps nodes to their shortest path distances to the origin.
Tracking and modifying bfs to provide state information when visiting a node is a powerful technique that has many other uses.
8.2.2 Ancestry from genealogical data
This interview question asks a candidate to determine if two people are related. We can treat each person as a node. Every node has at most two parents. The question is framed as having incomplete genealogical data, that is, the full ancestry of each individual may not be known. This gives rise to graphs that may not be connected.
To solve this problem, we first represent the domain as a directed graph. Each person is a node, with the key being an identifier for a person. Every node is adjacent to at most two other nodes representing the parents. Edges are orientated from a child to parent. We use aMapfrom child id to parent ids to represent the graph.
To determine whether or not two individuals are related, we first use bfs to calculate the full ancestry of each individual. Since we are dealing with a graph represented in aMap, bfs will need to be modified to understand this representation. The visit function simply adds parents to an instance. In the listing below, we inline the visit function instead of passing it as a parameter.
Listing 8.5: Ancestry from Genealogical Data
void find_ancestry(
Map<Integer, List<Integer>> data,
int id,
Set<Integer> ancestry) {
Queue<Integer> queue = new LinkedList<Integer>();
queue.add(id);
ancestry.add(id);
while (!queue.isEmpty()) {
List<Integer> parents = data.get(queue.poll());
if (null == parents) {
continue;
}
for (int parent : parents) {
if (!ancestry.contains(parent)) {
ancestry.add(parent);
queue.add(parent);
}
}
}
}
If the intersection of these sets is non-empty, then we are assured that the individuals are related. Putting this all together, the solution to the problem is given below.
Listing 8.6: Decide if Two People are Related
boolean related(Map<Integer, List<Integer>> data,
int id1,
int id2) {
Set<Integer> ancestry1 = new HashSet<Integer>();
find_ancestry(data, id1, ancestry1);
Set<Integer> ancestry2 = new HashSet<Integer>();
find_ancestry(data, id2, ancestry2);
ancestry1.retainAll(ancestry2);
return !ancestry1.isEmpty();
}
The listing above is not efficient, but it does solve the problem. Speaking to efficiency, it requires a full bfs traversal of the ancestry of one individual before beginning the second. Since the number of parents grows exponentially in a full family tree, the set of ancestors has the possibility of being extremely large. There are many ways to overcome this problem, such as a level traversal of the ancestry graph.
8.3 Depth first search
Depth first search is a graph traversal algorithm characterized by its use of a stack to maintain the frontier of traversal. We often abbreviate it as dfs. There are many important differences with bfs. Depth first search attempts to discover the next longest path, while breadth first search iterates through a frontier of equidistant nodes. Further, for each vertex visited in depth first search the stack maintains a path from that node to the root of the search. In dfs, the order of traversal history is lost when the next node is visited.
As with bfs, an implementation of dfs requires maintaining a list of vertices that have been visited. There are implicitly two such lists. The nodes on the stack will have their children visited, and the nodes already visited will never again be elements of the stack.
Listing 8.7: Depth First Search
void dfs(Node origin, Visitor visit) {
Set<Node> queued = new HashSet<Node>();
Stack<Node> stack = new Stack<Node>();
stack.push(origin);
queued.add(origin);
while (!stack.isEmpty()) {
Node current = stack.pop();
for (Node neighbor : current.neighbors) {
if (!queued.contains(neighbor)) {
queued.add(neighbor);
stack.push(neighbor);
}
}
visit.visit(current);
}
}
The listing could be simplified with recursion. Instead of maintaining the iterators in the stack, we could call dfs with the currentNodeas the origin. However we provide an iterative stack implementation to show the similarity with bfs. Notice that aside from using a stack instead of queue the code is similar. However, surprisingly different behavior results from such a small change from bfs.
8.3.1 Determine execution order of a task
Given a graphical representation of sub-processes of a large task, where each node in the graph represents a sub-process. The children of a node depend on the completion of their parent. The problem is to determine an execution order for all the processes in which no process is started before its parents have completed. For this problem the directed graph is special in that it has no cycles, and is called a directed acyclic graph. If it had cycles, then no sequential execution order would exist.
This problem of determining sequential execution order can be solved by dfs. Suppose the origin is a node that has no incoming edges. If we begin dfs from this node, because there are no cycles we will eventually reach a node that has no children. This node represents a process that is dependent upon the entire contents of the stack during dfs. We save this to the back of a list holding the reverse of execution order.
We continue in this fashion until every node is visited and added to the list. If the graph has not been fully traversed but dfs has an empty stack, we choose the next node from the graph with no incoming edges and iterate.
When this process completes, the list contains the reverse of the execution order. Reversing the list, we then have execution order.
The following listing implements the determination of execution order of the graph starting withorigin. In this listing we assume there is exactly one node with no incoming edges, and that node is provided as the parameter. Since our implementation of dfs takes aVisitorparameter, the code is simplified by using an implementation ofVisitor.
Listing 8.8: Dependency graph
class TopologicalSortVisitor implements Visitor {
private List<Node> list;
TopologicalSortVisitor(List<Node> list) {
this.list = list;
}
public void visit(Node node) {
list.add(node);
}
}
void topological_sort(Node origin,
List<Node> list) {
Visitor visit = new TopologicalSortVisitor(list);
dfs(origin, visit);
Collections.reverse(list);
}
It is worth noting that this problem is a specialization of the general problem of topologically sorting nodes in a directed graph, and has many other applications such as classifying nodes in connected components.
8.3.2 Find cycles in a graph
Suppose we are given a graph of nodes represented as an adjacency list, the question is to determine if there exist any cycles in the graph. Depth first search visits nodes by following edges. An edge that is traversed is called a tree edge. When dfs attempts to visit an element that is already on the stack, that edge is called a back-edge. Such an edge provides positive identification of a cycle, as it loops back to a path already traversed. So the problem of determining if a graph contains a cycle is as simple as finding a single back-edge in a dfs.
Finding a cycle is simplified ifVisitoris provided the stack when called. However, this is not necessary. Consider the first node provided toVisitor. If this node has any children, then a cycle exists, else that child would have been added to the stack and visited first. Likewise, if any node is visited in which some child has not been previously visited then a cycle exists. With this intuition, we can quickly develop a solution.
Listing 8.9: Detect Cycles
class DetectCycleVisitor implements Visitor {
boolean cycleDetected;
private Set<Node> visited;
DetectCycleVisitor(Boolean cycle_detected,
Set<Node> visited) {
this.cycleDetected = cycle_detected;
this.visited = visited;
}
public void visit(Node node) {
visited.add(node);
for (Node neighbor : node.neighbors) {
if (visited.contains(neighbor)) {
cycleDetected = true;
}
}
}
}
boolean detect_cycle(Node origin) {
Set<Node> visited = new HashSet<Node>();
Visitor visit = new DetectCycleVisitor(
false,
visited);
dfs(origin, visit);
return ((DetectCycleVisitor)visit).cycleDetected;
}
When a cycle is detected by the presence of a back-edge, memoizing the tree edges and reconstructing the cycle from the paths from the origin to the first node in which a cycle was found could determine the nodes of the cycle.
Reconstructing a cycle is not much more involved. When the first cycle is detected, theVisitorwould begin looking for tree edges that connect to either the head or tail of a partially constructed cycle. When an edge matches the head, it should be pushed to the front. When an edge originates at the tail, it is pushed back. When the head and tail are equal, a cycle has been found.
8.3.3 Find all the words in a boggle game
Let’s consider a boggle board, defined as a two dimensional array of characters.
Listing 8.10: Definition of a Boggle Board
class GameBoard {
char[][] board;
GameBoard(char[][] board) {
this.board = board;
}
int getNumRows() {
return board.length;
}
int getNumCols() {
return board[0].length;
}
}
Each position is defined by a set of coordinates on the board. In the following list forPosition, we also override thehashCodeandequalsmethods so that we can comparePositionobjects by their coordinates later on.
Listing 8.11: Definition of a Position on the Board
class Position {
int x;
int y;
Position(int x, int y) {
this.x = x;
this.y = y;
}
public int hashCode() {
return x + y;
}
public boolean equals(Object obj) {
if (!(obj instanceof Position)) {
return false;
}
if (obj == this) {
return true;
}
Position other = (Position)obj;
return x == other.x && y == other.y;
}
}
The n × n game board contains a single letter in each square. The question is to find all words that are simple paths in the graph represented by the matrix. In this graph, elements of the matrix are neighbors if they are above, below, left, right or diagonal. The problem is complicated by the fact that words can start from any node in the graph and that all paths need to be checked.
To solve this problem, we require anis_wordfunction that returns true if a word has been found. We output each word found during the traversal that satisfies this function.
With this method available, let’s begin by translating a matrix Mgiven as a two dimensional array of characters into a directed graph. We begin by modifyingdfsto the matrix to look for all neighbors. Neighbors are in any adjacent entry. Since we want all paths, we do not use thequeuedvariable. But we do not want to backtrack on the stack, so instead of verifying that we have pushed aNodeonto the stack, we only verify that theNodeis not currently on the stack.
The following listing enumerates all words that can begin at a node in the graph.
Listing 8.12: Word Search in a Matrix from a Single Point
void boggle_dfs(GameBoard board,
Set<String> words,
Set<Position> visited,
Position pos,
StringBuilder word) {
for (int row_offset = -1;
row_offset <= 1;
row_offset++) {
for (int col_offset = -1;
col_offset <= 1;
col_offset++) {
Position next_pos = new Position(
pos.x + row_offset,
pos.y + col_offset);
if (next_pos.x < board.getNumRows()
&& next_pos.x >= 0
&& next_pos.y < board.getNumCols()
&& next_pos.y >= 0
&& !visited.contains(next_pos)) {
visited.add(next_pos);
word.append(
board.board[next_pos.x][next_pos.y]);
if (is_word(word.toString())) {
words.add(word.toString());
}
boggle_dfs(board,
words,
visited,
next_pos,
word);
visited.remove(next_pos);
word.setLength(word.length() - 1);
}
}
}
}
void boggle(GameBoard board,
Position pos,
Set<String> words) {
Set<Position> visited = new HashSet<Position>();
StringBuilder word = new StringBuilder();
boggle_dfs(board, words, visited, pos, word);
}
This problem is unlike our previous examples where we traversed from a single starting point. To this end we must attempt multiple traversals, one for each node in the graph. So using the previous listing, the following function provides a complete solution to the problem.
Listing 8.13: Word Search in a Matrix
void boggle(GameBoard board, Set<String> words) {
for (int row = 0;
row < board.getNumRows();
row++) {
for (int col = 0;
col < board.getNumCols();
col++) {
boggle(board,
new Position(row, col),
words);
}
}
}
8.4 A⋆ Search
We conclude graphical search with a brief example of A⋆ search. A⋆ search is a graphical search which uses a priority queue to order the frontier of exploration. The value of a node in the priority queue can be given by a heuristic measurement of the distance to a goal. In this way it differs from both bfs and dfs in that the order in which the nodes are to be visited are not fixed by the order in which they are discovered.
Suppose we are given maze represented by a square n × n matrix defined as aGameBoard. We are given a start position and an end position. The question is to write a function that produces a path through the maze.
To begin we refer to each position in the maze as coordinates, and apassablefunction determines whether or not a given node is passable. Only positions above, below, left, and right are neighbors. Neighbors that are passable are adjacent to a position in the board.
Now to solve the problem efficiently with A⋆ search, we would like to visit each node in order of its distance from the exit of the maze. The hope is that positions closer to the exit will have shorter paths to the exit.
However, if we knew the actual distance we could solve the problem by traversing from the exit to the start. Instead we use the heuristic that we want to visit the next node that is closest to the exit of the maze where distance is measured as if there is always a path directly from the node to the exit. So to apply A⋆ search, we will define a function which calculates the shortest possible distance between a position in the maze and the exit. While many metrics are available, we use Manhattan distance as our maze distance.
Listing 8.14: Admissible Maze Distance Heuristic
int distance(Position begin, Position end) {
return Math.abs(begin.x - end.x)
+ Math.abs(begin.y - end.y);
}
Let us also define the following object to represent a position in the maze and the distance between it and the exit.
Listing 8.15: Definition of DistancePosition
class DistancePosition {
int distance;
Position position;
DistancePosition(int distance,
Position position) {
this.distance = distance;
this.position = position;
}
}
In solving the maze, we being at the start and continue as withdfs by iteratively constructing a path. However we use a priority queue instead of stack, where the distance heuristic gives the priority of a node. Nodes are considered in lowest priority value first. When the path reaches the exit, the path through the maze is reconstructed and the function terminates. If no solution exists, we return an empty path.
Listing 8.16: Search Through a Maze
void maze(GameBoard maze,
Position start,
Position exit,
List<Position> route) {
List<Position> offsets = Arrays.asList(
new Position(-1, 0), new Position(1, 0),
new Position(0, -1), new Position(0, 1));
Map<Position, Position> parent =
new HashMap<Position, Position>();
PriorityQueue<DistancePosition> priority_queue =
new PriorityQueue<DistancePosition>(
maze.getNumRows() ⋆ maze.getNumCols(),
new Comparator<DistancePosition>() {
public int compare(DistancePosition a,
DistancePosition b) {
return a.distance - b.distance;
}
});
priority_queue.add(new DistancePosition(
distance(start, exit), start));
parent.put(start, start);
while (!priority_queue.isEmpty()) {
Position current =
priority_queue.peek().position;
if (exit.equals(current)) {
do {
route.add(0, current);
current = parent.get(current);
} while (current != start);
route.add(0, start);
return;
}
priority_queue.poll();
for (Position offset : offsets) {
Position neighbor = new Position(
current.x + offset.x,
current.y + offset.y);
if (neighbor.x < maze.getNumRows()
&& neighbor.x >= 0
&& neighbor.y < maze.getNumCols()
&& neighbor.y >= 0
&& !parent.containsKey(neighbor)
&& passable(maze, neighbor)) {
parent.put(neighbor, current);
priority_queue.add(new DistancePosition(
distance(neighbor, exit), neighbor));
}
}
}
}
Notice that as with the boggle question, we use a static set of offsets to choose the next position in the maze. However since diagonal positions are not adjacent we have a restricted set of offsets. Theparentvariable serves two purposes; it is used to determine if a position has already been queued and it is used in reconstruction of the path. Finally, notice the usage of thePriorityQueueclass where we pass to the constructor a customComparatorthat orders the elements by distance. While this question had a longer solution than most, the basic framework is similar to that of bfsand dfs.