Heaps - Programming Problems in Java: A Primer for The Technical Interview (2014)

Programming Problems in Java: A Primer for The Technical Interview (2014)

Chapter 5. Heaps

In this chapter we examine the heap data structure. The heap, also known as a priority queue, has applications in graph algorithms, caching, and sorting. We begin by discussing multiple variations of the heap. We then examine an implementation the standard operations of the binary heap. And finally, we look at a number of interview problems that can be solved by clever applications of a heap.

5.1 Definition

A heap is an abstract data structure similar to a binary search tree. Like a binary tree, a heap is a collection of nodes that have a data member value and at most two child nodes. The differentiating factor is that nodes in a heap satisfy the heap property. The heap property is held whenp.value≥ n.valuefor each child nodenof a parentp. A heap that satisfies this property is called a max heap. A min heap satisfies the heap property wherein a parent’s data value never exceeds any child’s data value. Note that the root of a min heap has the smallest value in the data structure, and the root of a max heap has the largest value in the data structure.

The heap operations areinsert,find_max,remove_max, andincrease_key. The names are self-descriptive of the operations. By maintaining the heap property as an invariant, thefind_max operation runs in constant time.

Heaps have a number of variants with different worst case running time for other operations. The Brodal heap is a complex collection of algorithms that obtains optimal worst case run time for heap operations. The Brodal heap has constant time implementationinsert,find_max, andincrease_key. Forremove_max, we use the Brodal variant to obtain a logarithmic run time.

A second variant is the Fibonacci heap. The Fibonacci heap is so named because it creates and maintains a forest of trees where the numbers of children of the root nodes are bounded below by Fibonacci constants. The relationship between the trees maintains the heap property. The Fibonacci heap achieves only amortized asymptotically optimal bounds on running time for the heap operations. This is a trade-off between complexity of implementation and efficiency.

A simpler variant is the binary heap. In the binary heap, there is a single binary tree structure. The tree structure is always balanced, and grows by insertion in level order. While the binary heap is simple to implement and understand, it does not achieve the asymptotic efficiency of the other variants. Namely, whilefind_maxtakes constant time andremove_maxachieves logarithmic time, all other operations in a binary heap have at least logarithmic time lower bounds.

Throughout this section we will take a closer look at the binary heap.

5.2 Operations

In the following section we implement the basic operations for the binary heap:insert,find_max,remove_max, andincrease_key. Each operates on a binary tree node satisfying the heap property. We begin therefore with theheapifyoperation which can initialize a binary heap fromaListin-place.

5.2.1 Initialization

Initialization of a binary heap from aListinvolves a logical representation of a binary tree and reordering of the elements so that they satisfy the heap property. We begin by describing the binary tree structure used to represent the heap.

The binary tree structure comes from corresponding the index of entries in theListto level order position of nodes in a complete binary tree. Index 0 of theListcorresponds to the root of the binary tree. For a node at position i in theList, the left child is at position 2 * i + 1, and the right child is at 2 * i + 2. It will be useful to have a method that computes these indexes.


Listing 5.1: Heap Index of Children


int lchild_index(int index) {
return (index ⋆ 2) + 1;
}

int rchild_index(int index) {
return (index ⋆ 2) + 2;
}


Notice with this scheme the next available leaf node is the rightmost element passed the end of the heap. The parent of a node at index i is located at (i - 1)2. Integer division achieves rounding down.


Listing 5.2: Heap Index of Parent


int parent_index(int index) {
return (index != 0) ? (index - 1) / 2 : 0;
}


Creating the heap starts with the single element of theListat index 0. This single element sub-array trivially satisfies the heap property. The operation next iteratively adds elements from theListin such a way that the heap property is maintained. Concretely, new elements are first added as leaves. Afterward, parent and child pairs are repeatedly swapped if in violation of the heap property. When the algorithm terminates, the maximum value of the heap is at the first index and the heap property holds.


Listing 5.3: Heap Creation


void heapify(List<Integer> array) {
for (int i = 0; i < array.size(); i++) {
int parent = parent_index(i);
while (i != parent
&& array.get(parent) < array.get(i)) {
int temp = array.get(i);
array.set(i, array.get(parent));
array.set(parent, temp);
i = parent;
parent = parent_index(i);
}
}
}


While a binary heap is not an optimally efficient heap structure, as we see the binary heap can be constructed in-place instead of requiring memory allocation for the tree. The time savings realized from not having to allocate ancillary objects often overcome the disadvantage of using an implementation not optimized for worst case behavior.

5.2.2 Insertion

Insertion of a new value into a binary heap begins with adding the value as the next available leaf node in the binary tree. Afterward, the parent-child relationships are updated to satisfy the heap property as was done in creating the heap. Insertion requires theListto be dynamically resized, and hence is not an in-place operation.


Listing 5.4: Insert Into a Heap


void insert(List<Integer> heap, int value) {
heap.add(value);
int index = heap.size() - 1;
int parent = parent_index(index);
while (index != 0
&& heap.get(index) > heap.get(parent)) {
int temp = heap.get(index);
heap.set(index, heap.get(parent));
heap.set(parent, temp);
index = parent;
parent = parent_index(index);
}
}


5.2.3 Find maximum element

In a binary heap, thefind_maxoperation requires only returning the first index of theList. Correctness of this action is guaranteed by the heap property.


Listing 5.5: Find Maximum in a Heap


int find_max(List<Integer> heap) {
return heap.get(0);
}


5.2.4 Remove maximum element

Removal of the maximum heap value must accomplish two objectives aside from replacing the value at the initial position. First, it must promote the second minimum value to the first position. Doing so causes the tree to pivot, and may have a cascading effect on the structure of the heap. Secondly, the tree structure and parent-child relationships must be maintained.

Both requirements are satisfied by the following sequence. First replace the root value of the heap with the value of the last leaf node, and then resize the array to the smaller size. This maintains the tree structure and removes the maximum element from the array. We are required to then fix up the heap property. In this operation, the heap property is maintained by iteratively swapping parent-child pairs if in violation of the heap property.


Listing 5.6: Remove Maximum from a Heap


void remove_max(List<Integer> heap) {
heap.remove(0);
int index = 0;
while (index < heap.size()) {
int child = lchild_index(index);
if (child >= heap.size()) {
break;
}
int right = rchild_index(index);
if (right < heap.size()
&& heap.get(right) > heap.get(child)) {
child = right;
}
if (heap.get(index) >= heap.get(child)) {
break;
}
int temp = heap.get(index);
heap.set(index, heap.get(child));
heap.set(child, temp);
index = child;
}
}


5.2.5 Increase key

When a key is increased in a max heap, the heap property may no longer hold for that node and its parent. Hence, by increasing the value of a key, we may need to decrease its index repeatedly until the array again has the heap property. This is done by our standard practice of iteratively swapping.


Listing 5.7: Increase a Key in a Heap


void increase_key(List<Integer> heap, int index) {
heap.set(index, heap.get(index) + 1);
int parent = parent_index(index);
while (0 != index
&& heap.get(index) > heap.get(parent)) {
int temp = heap.get(index);
heap.set(index, heap.get(parent));
heap.set(parent, temp);
index = parent;
parent = parent_index(index);
}
}


This operation generalizes to decrease key and key change in the obvious ways.

5.3 Enumeration of powers

Solving programming problems relies as much upon choosing the correct data structure as it does the proper algorithm. Consider the problem of enumerating powers of an initial set of non-negative numbers. Concretely, given a set of integers the question is to list the powers of the integers in increasing order without duplicates. For example, given the set of 2,3,4, the first six powers would be 1 = 20,2 = 21,3 = 31,4 = 41 = 4,8 = 23,9 = 32. We will define the following class to represent the number value and power term pair.


Listing 5.8: ValueTerm Pair


class ValueTerm<V, T> {
private V value;
private T term;

ValueTerm(V value, T term) {
this.value = value;
this.term = term;
}
}


Enumerating the powers can be solved by using a priority queue pairing a term and its lowest unvisited power.


Listing 5.9: Enumeration of Powers


void enumerate_powers(Set<Integer> set,
int num_powers,
List<Integer> out) {
PriorityQueue<ValueTerm<Integer,
Integer>> heap =
new PriorityQueue<ValueTerm<Integer,
Integer>>(
set.size(),
new Comparator<ValueTerm<Integer,
Integer>>() {
public int compare(ValueTerm<Integer,
Integer> a,
ValueTerm<Integer,
Integer> b) {
return a.value - b.value;
}
});
for (Integer value : set) {
heap.add(new ValueTerm<Integer,
Integer>(1, value));
}
int value = 0;
while (0 != num_powers && 0 != heap.size()) {
ValueTerm<Integer, Integer> entry =
heap.poll();
if (value != entry.value) {
value = entry.value;
out.add(value);
num_powers--;
}
heap.add(new ValueTerm<Integer, Integer>(
entry.value ⋆ entry.term, entry.term));
}
}


Note that by preserving the heap between calls, the solution can be converted to a generator.

5.4 Find top k elements

Finally, let’s consider the problem of determining the top k elements of a stream. Online algorithms with stream inputs usually constrain the problem to a single pass, as either the data is volatile or too large to fit into main memory.

Let’s begin by reviewing the simple case of finding the maximum element. The maximum element of an array can be found by a linear scan.


Listing 5.10: Find Max Value in an Array


int max(int[] array) {
int max_val = array[0];
for (int i = 0; i < array.length; i++) {
max_val = Math.max(array[i], max_val);
}
return max_val;
}


A little intuition leads us to recall that a single element array is a trivial priority queue. The generalization of this process to finding the top k elements of an n element stream is solved in a single pass with a min heap. To operate within the constraints of memory, we will bound size of the priority queue.

To operate on a bounded size heap, we only insert when the value of an item is larger than the smallest element in the heap. When it is, we remove the smallest element with aremove_maxoperation and then insert the new value.


Listing 5.11: Find Top k Elements of a Stream


void find_top_k(InputStream in,
int k,
PriorityQueue<Integer> heap)
throws IOException {
int val = 0;
while (heap.size() < k
&& (val = in.read()) != -1) {
heap.add(val);
}
while (in.available() > 0
&& (val = in.read()) != -1) {
if (val > heap.peek()) {
heap.poll();
heap.add(val);
}
}
}


The run time of this algorithm is an efficient O(log k * n), as each removal of the minimum element and insertion may take time logarithmic in the size of the heap. This is efficient for the online problem.