Programming Problems in Java: A Primer for The Technical Interview (2014)
Chapter 7. Binary Search
Searching and sorting are the fundamental operations of computer programming, and in the next few chapters we examine each in some detail. There are three search techniques that every programmer should memorize. These are binary search, breadth first search, and depth first search. Binary search can be used for search in any sorted, random access data structure. Breadth first search and depth first search are graph search algorithms useful for finding minimum distance between nodes and detecting cycles. Before sorting, we will look at basic implementations and programming problems related to each of these search algorithms.
We begin with binary search.
7.1 Binary search
Binary search is fundamental, and it should be known by rote as it is often asked as a gating or screening question. It is one of the first algorithms we learn, and influences many data structures we use in programming. It is also one for the first applications of branch and bound, in that the update rule divides the search space in half.
The following is an implementation of binary search for searching an integer array. The result of the search is the location of the element with the specified key.
Listing 7.1: Binary Search
int find(int[] array, int key) {
int lower = 0;
int upper = array.length - 1;
while (lower < upper) {
int mid = lower + (upper - lower) / 2;
if (key == array[mid]) {
return mid;
} else if (key < array[mid]) {
upper = mid - 1;
} else {
lower = mid + 1;
}
}
return lower;
}
A few details of this implementation require further discussion. First, the function above returns either the position of the entry or the point at which the algorithm was searching when it determined the key is not present. We will see this definition is useful in the sequel.
Second, if we did not find an entry, we know that key≠array[mid], and so we do not need to consider mid. So we can set lower and upper one past mid in the appropriate direction. Concretely, if the value we are looking for is less than mid, we can set upper to be mid- 1, as that is the lowest index that we have not evaluated.
Last, the update rule used to find the midpoint should always be used in programming interviews. It is the only means to write the expression to calculate the midpoint of a without the danger of an operation overflow. The more common way to write that line is mid = (upper + lower)∕2. However, it is possible for upper + lower to overflow.
Binary search provides order log n lookup, and is optimal for comparison search. However while the screening interview may ask for a quick implementation, its specific application to interview problems is sometimes subtle.
7.2 Find the first occurrence in a sorted array
An interview question that can be solved with slight modification to binary search is the problem of finding the leftmost occurrence of a target value n in a sorted array. Modifying both the update rules and the exit criteria of the previous implementation of binary search can solve this question.
While previous implementation of the binary search algorithm provides an incomplete answer, it does provide information on the location of the target. We should immediately see that once we find any index that matches the target value, we could scan linearly until we find the leftmost element with the target value. While this provides a correct solution, the run time of the algorithm is linear.
Note that when an index with the target value is found, we can update upper to be that index that we discovered. Doing so would always keep the target solution in range of our search. So we change the line upper = mid - 1 to upper = mid. The update rule for lower need not change. However, we see that if the lower bound is ever equal to the target after an update it must be pointing to the leftmost entry. Our final solution then is a modified binary search.
Listing 7.2: Binary Search to Find Leftmost Target
int find_leftmost(int[] array, int key) {
int lower = 0;
int upper = array.length - 1;
while (lower < upper && array[lower] != key) {
int mid = lower + (upper - lower) / 2;
if (key <= array[mid]) {
upper = mid;
} else {
lower = mid + 1;
}
}
return lower;
}
A similar modification of the update rules can be made to locate the rightmost index with the target value.
7.3 Search in a rotated array
A more complicated question asks the candidate to search for a target value in a sorted array that has been rotated. A rotated array is a sorted array of length n in which every index i has been mapped to index i + j for some integer value j. Concretely, the array 12345 rotated by 2 is the array 45123.
For simplicity, we assume that there do not exist duplicate values in each given array.
There are two common solutions to this interview question; one requires a complex modification to the update rule. The other requires a two-step search algorithm.
Let’s again see what modifications to binary search are required. Given the example above, we immediate see that the problem with our candidate solution is only the update rules. Since we cannot guarantee that array is increasing from lower to mid and from mid to upper, we are not updating properly. What is needed is to consider the location of the pivot. By enumerating cases, we can see that the pivot is either in between lower and mid, mid and upper or no longer being considered in the sub array that is being searched. Since the array does not contain duplicates, we can determine whether or not the pivot is in the range lower to mid by testing if lower < mid and if the key is bounded by lower and mid. This leads to our first solution.
Listing 7.3: Binary Search in Rotated Array
int find_rotated(int[] array, int key) {
int lower = 0;
int upper = array.length - 1;
while (lower < upper) {
int mid = lower + (upper - lower) / 2;
if (key == array[mid]) {
upper = mid;
} else if (array[lower] <= array[mid]) {
if (array[lower] <= key
&& key < array[mid]) {
upper = mid;
} else {
lower = mid + 1;
}
} else if (array[mid] <= array[upper]) {
if (array[mid] < key
&& key <= array[upper]) {
lower = mid + 1;
} else {
upper = mid;
}
}
}
return lower;
}
Let’s consider a second solution by addressing the problem of an unknown pivot another way. If we knew the location of the pivot j, then instead of special casing the ranges of lower to mid and mid to upper, we can instead modify the binary search update rule to use modulo arithmetic. Concretely, if n is the length of the array, we could set lower = j, upper = j + n - 1 = j - 1, and mid = lower + (upper -lower)∕2. Our previous binary search is then correct. This is implemented below.
Listing 7.4: Binary Search with Pivot Offset
int find_with_pivot_offset(int[] array,
int pivot,
int key) {
int lower = 0;
int upper = array.length - 1;
while (lower < upper) {
int mid = lower + (upper - lower) / 2;
int mid_val = (mid + pivot) % array.length;
if (key <= array[mid_val]) {
upper = mid;
} else {
lower = mid + 1;
}
}
return (lower + pivot) % array.length;
}
How can we find j? In linear time we can search for the index of the minimum value. However, an optimal method can be built upon binary search using update rules similar to the first solution of this problem. That is we update the lower and upper bounds of a binary search to insure that the pivot is always between them. When this can no longer be satisfied, then the lower bound must be the pivot value.
Listing 7.5: Binary Search for the Pivot in Rotated Array
int find_pivot(int[] array) {
int lower = 0;
int upper = array.length - 1;
while (lower < upper) {
int mid = lower + (upper - lower) / 2;
if (array[lower] <= array[mid]) {
if (array[mid] < array[upper]) {
break;
}
lower = mid + 1;
} else {
upper = mid;
}
}
return lower;
}
7.4 Find a fixed-point
An interesting mathematical programming problem asks the candidate to find the fixed-point in a function.
Concretely, suppose that a monotonically increasing integral function is defined on the integers [0,100] taking values in [0,100]. It is represented by an array A, where f(x) = A[x]. Since the function is bounded and increasing, there must be a value y such that f(y) = y. Such a y is called a fixed-point.
We can find the fixed-point with binary search with a process similar to what was used to find the pivot in a rotated array. The idea is to always update the bounds so that the fixed-point is in the range being searched. To do this, we update so that lower ≤ A[lower] and upper ≥ A[upper]. Below is an implementation of this solution.
Listing 7.6: Find a Fixed-Point
int fixed_point(int[] array) {
int lower = 0;
int upper = array.length - 1;
while (lower < upper) {
int mid = lower + (upper - lower) / 2;
if (array[mid] <= mid) {
upper = mid;
} else {
lower = mid + 1;
}
}
return lower;
}
7.5 Find duplicate values
Suppose we are asked to find one value in a sequence of integers that appears twice. We can solve this problem with binary search.
The idea is to update the bounds so that we consider the half that is larger than the number of integers it could hold if there were no duplicates. The reason this works is similar to above. For if there is a duplicate value in the array, and the array has all the elements in the sequence, then the length of the array must be larger than the difference of its last and first elements.
Listing 7.7: Find Duplicates in a Sequence
int find_duplicates(int[] array) {
int lower = 0;
int upper = array.length - 1;
while (lower < upper) {
int mid = lower + (upper - lower) / 2;
int mid_val = array[lower]
+ (array[upper] - array[lower]) / 2;
if (mid_val < array[mid]) {
upper = mid;
} else {
lower = mid + 1;
}
}
return lower;
}
Note the similarity with finding the fixed-point of a function. The expected middle value is what would be present if there were no duplicates. By calculating it above and comparing against the actual mid, we are converting the bounds so this and the fixed-point problem have the same problem.
7.6 Binary search in a Young Tableau
Finally, let’s consider an example that uses the technique of binary search in a more complex implementation. Suppose we are given an n × m matrix M of integers. The integers are sorted from smallest to largest in columns and in rows. That is Mij ≤ Mil whenever j ≤ l, and Mij ≤ Mkj whenever i≤ k. Such a matrix is called a Young tableau. The question is to find the coordinates of an element of the matrix with a specified value.
A coordinate is defined by the following class.
Listing 7.8: Definition of a Coordinate
class Coordinate {
int x;
int y;
Coordinate(int x, int y) {
this.x = x;
this.y = y;
}
}
And matrix data type we use is defined below.
Listing 7.9: Definition of a Matrix
class Matrix {
int[][] matrix;
Matrix(int rows, int cols) {
this.matrix = new int[rows][cols];
}
int getRows() {
return matrix.length;
}
int getCols() {
return getRows() == 0 ? 0 : matrix[0].length;
}
}
We want to choose bounds such that after a comparison the search space can be significantly reduced. If we try to flatten the matrix and search, preprocessing would require visiting all n2 elements. But we should be able to do better with a kind of binary search.
To apply binary search, we would need to find an update rule that reduces the search space significantly. Suppose we have a midpoint of the matrix at position i,j. If M[i.j] is less than our key value, we could increase the value of the matrix entry. So it is appropriate to increment either the row or column. If M[i,j] is greater than our key value we need to decrement either the row or column. To avoid branching we want to choose a starting position where we have exactly one choice of updates after comparing an entry to the key value. There are two natural locations that provide this property: the bottom left corner and the top right corner. Below we start in the bottom left corner of the matrix and perform a diagonal search upward.
Listing 7.10: Diagonal Search of a Matrix
Coordinate matrix_find(Matrix m, int key) {
int rows = m.getRows();
int cols = m.getCols();
Coordinate coord = new Coordinate(rows - 1, 0);
while (coord.x < rows
&& coord.y < cols
&& m.matrix[coord.x][coord.y] != key) {
if (m.matrix[coord.x][coord.y] > key) {
coord.x--;
} else {
coord.y++;
}
}
return coord;
}
The algorithm above is linear. To see this, note that in the worst case it may traverse to the opposite corner of the matrix. But as we’ve seen, binary search attains logarithmic run time by reducing the search space by a constant amount. And with some insight that bound can be achieved for search on the Young tableau.
First we will need to find a variant of binary search that returns the location of the first value within a sub-array that is greater than or equal to the search key. This is similar to thefind_leftmost function derived earlier with the exception that the sub-array boundaries are passed in as parameters.
Listing 7.11: Find Lower Bound
int lower_bound(int[] array,
int lower,
int upper,
int key) {
while (lower < upper) {
int mid = lower + (upper - lower) / 2;
if (key <= array[mid]) {
upper = mid;
} else {
lower = mid + 1;
}
}
return lower;
}
Now let’s consider the extreme matrix entries. M[1,1] is the smallest entry in the matrix, and M[n,m] is the largest. Generalizing, given a sub-matrix defined by entries M[p,q] and M[s,t], where p ≤ s, and q ≤ t, we know that M[p,q] is the smallest entry in the sub-matrix and M[s,t] is the largest.
With this insight, we can write a matrix binary search that reduces the search space by a constant fraction each step. This will produce an optimal solution. The idea is to start with a sub matrix bounded by M[p,q] on the top left and M[s,t] on the bottom right. We calculate a middle row i = (p + s)∕2, and search this row to find the column which best separates the matrix into four sub-matrices. The column j that achieves this is the first entry M[i,j] in row i that is greater than or equal to the key value. Now consider the separation of the matrix into the four sub matrices bounded by row i and column j. The key cannot appear in the top left sub-matrix because it is larger than M[i - 1,j - 1]. Further the key can not appear in the bottom right sub-matrix because it is smaller than M[i,j]. Hence we need only look at the remaining two sub-matrices: the upper right and lower left. We have narrowed the search down to half the original domain.
Listing 7.12: Binary Search in Young Tableau
boolean recursive_find(Matrix m,
int key,
Coordinate lower,
Coordinate upper,
Coordinate out) {
if (lower.x > upper.x
|| lower.y > upper.y
|| m.getRows() <= Math.max(lower.x, upper.x)
|| m.getCols() <= Math.max(lower.y,
upper.y)) {
return false;
}
int mid_row = lower.x + (upper.x - lower.x) / 2;
int mid_col = lower_bound(m.matrix[mid_row],
lower.y,
upper.y,
key);
if (mid_row < m.getRows()
&& mid_col < m.getCols()
&& m.matrix[mid_row][mid_col] == key) {
out.x = mid_row;
out.y = mid_col;
return true;
}
return recursive_find(
m,
key,
new Coordinate(mid_row + 1, lower.y),
new Coordinate(upper.x, mid_col - 1),
out)
|| recursive_find(
m,
key,
new Coordinate(lower.x, mid_col),
new Coordinate(mid_row - 1, upper.y),
out);
}
Coordinate matrix_find_binary_search(Matrix m,
int key) {
Coordinate lower = new Coordinate(0, 0);
Coordinate upper = new Coordinate(
m.getRows() - 1,
m.getCols() - 1);
Coordinate out = new Coordinate(0, 0);
recursive_find(m, key, lower, upper, out);
return out;
}
This poly-logarithmic algorithm above is quite difficult, and may take a couple readings. The best takeaway is that there is oftentimes a best separation of the search space, and that is worth finding.