Sorting - Programming Problems in Java: A Primer for The Technical Interview (2014)

Programming Problems in Java: A Primer for The Technical Interview (2014)

Chapter 10. Sorting

Sorting is a fundamental task in programming, and no one should be taken off guard when asked to sort a collection of integers or strings. One should be able to produce a number of optimal sorts from memory. This means not only avoiding quadratic sorts, but also choosing appropriately between quick sort, merge sort, radix sort, and others depending on the problem constraints.

All optimal sorts have worst case run times of order nlog n over unbounded data. Insertion sort is often used for small data sets, but it has quadratic worst case running time. It should be avoided unless the problem is explicitly constrained so that it is efficient. Otherwise the only quadratic sorting algorithm you should ever give in an interview is quick sort. And even with a naive quick sort implementation, you should be able to explain that in its basic implementation it has order nlog(n) running time in expectation, but can be made to have deterministic order nlog(n) running time with a clever choice of pivots.

In this chapter we will implement five sorting algorithms and discuss their comparative merits. These algorithms are insertion sort, heap sort, quick sort, radix sort, and merge sort.

10.1 Insertion sort

Insertion sort is an inefficient sort that is practical for small data sets or data sets with only a small perturbation from sorted order.

Insertion sort works by iteratively inserting each element of an array into its proper position in a sorted sub-array. A standard implementation has an outer loop and an inner loop. The outer loop visits each value of an array from front to back, leaving behind a sorted sub-array. The inner loop searches for the position of the outer loop value in the sorted sub-array by scanning from back to front. During this search a portion of the sorted sub-array is shifted right one position, and the value is inserted into the vacated position.


Listing 10.1: Insertion Sort


void insertion_sort(int[] array) {
for (int i = 1; i < array.length; i++) {
int val = array[i];
for (int j = i - 1;
j >= 0 && array[j] > val;
j--) {
int tmp = array[j];
array[j] = array[j + 1];
array[j + 1] = tmp;
}
}
}


As can be seen, insertion sort is an in-place algorithm as it does not require allocation of memory from the heap. Insertion sort is also a stable sort, as the algorithm maintains the relative order of elements. Lastly, as only the indices and a temporary value are required insertion sort requires constant extra space.

The main drawback is that insertion sort has quadratic worst case behavior. Our next algorithm overcomes this drawback.

10.2 Heap sort

Heap sort uses a priority queue to achieve optimal worst case running time for a sort.

Heap-sort operates as follows. We create a max heap using the aPriorityQueue. Afterward, max values are iteratively removed and stored in their proper location in a partially sorted sub-array at the tail end of the array. Recalling that creating a heap is an order nlog n operation and removing the first element in a heap is a logarithmic time operation called once for each element in the array, we can see that heap sort is an optimal sort.


Listing 10.2: Heap Sort


void heap_sort(int[] array) {
PriorityQueue<Integer> heap =
new PriorityQueue<Integer>(array.length);
for (int value : array) {
heap.add(value);
}
for (int i = 0; i < array.length; i++) {
array[i] = heap.poll();
}
}


Since building the heap and removing the max require no external memory, heap sort is an in-place algorithm. Since the heap creation may shuffle the relative locations of elements, heap-sort is not a stable sort. And lastly, from our implementation of the in-place heapify method we can see that heap sort uses constant space.

10.3 Quick sort

Quick sort is a divide and conquer algorithm that, like selection, partitions the data on a properly chosen pivot and recursively operates on the sub-arrays produced.

With a random pivot selection, quick sort provides expected optimal run time. However the worst case run time is quadratic. This is overcome by choosing a pivot element that is near the median of the array. The implementation of median-of-3 provided in the section on selection would suffice to guarantee optimal worst case behavior. Instead, we use the linear timenth_elementpivot selection method presented earlier which is modified and reimplemented here for convenience. Usingnth_elementwill increase effective run time, but the worst case asymptotic bound in unaffected.


Listing 10.3: Quick Sort


void quick_sort(int[] array) {
quick_sort(array, 0, array.length - 1);
}

void quick_sort(int[] array, int low, int high) {
if ((high - low) < 1) {
return;
}
int mid = (low + high) / 2;
int i = low;
int j = high;
while (i <= j) {
while (array[i] <= array[mid] && i < mid) {
i++;
}
while (array[j] > array[mid]) {
j--;
}
if (i <= j) {
int temp = array[i];
array[i] = array[j];
array[j] = temp;
i++;
j--;
}
}
if (j > low) {
quick_sort(array, low, j);
}
if (i < high) {
quick_sort(array, i, high);
}
}


As seen in our study of selection, finding the median element is an in-place operation. Partitioning and sub-array recursing are also in-place, and hence quick sort is an in-place sort. Further, ifnth_elementis stable, then so is quick sort. Lastly, note that quick sort uses only a constant amount of extra space.

For these reasons quick sort is often the best generic sorting algorithm to use in applications.

10.4 Radix sort

In radix sort, values are first bucketed by the value of some digit prior to merging. By bucketing over more than two sets, radix sort is a generalization of quick sort. For input from a bounded set, radix sort achieves linear time worst case behavior. However this run time falls back to quadratic when the input is from an unbounded domain.

Radix sort can be implemented on any radix; for instance binary digits or decimal digits. Below we implement radix sort on hexadecimal digit representations of integers.


Listing 10.4: Radix Sort


void radix_sort(int[] array) {
List<List<Integer>> buckets =
new ArrayList<List<Integer>>(0xf + 1);
for (int i = 0; i < (0xf + 1); i++) {
buckets.add(new ArrayList<Integer>());
}
for (int i = 0; i < 4; i++) {
long offset = i ⋆ 4;
long mask = 0xf << offset;
for (int val : array) {
int bucket = (int) ((mask & val) >> offset);
buckets.get(bucket).add(val);
}
int index = 0;
for (List<Integer> bucket : buckets) {
for (Integer val : bucket) {
array[index++] = val;
}
bucket.clear();
}
}
}


Radix sort is only optimal for fixed length data, such as integers bounded by some maximum. In order to bucket the input, radix sort requires external memory and is not in-place. Radix sort is a stable sort, as the relative order of values is maintained through each iteration of the algorithm.

10.5 Merge sort

Merge sort is another divide-and-conquer algorithm for sorting. The idea springs from the observation that if one starts with two sorted sub-arrays, it is simple to combine them into a single sorted array. This can be done by first comparing the smallest elements of each array for the minimum. That element is removed from its current array and added to the result, and the process iterates. Since the original arrays are sorted, this involves only a linear scan. So all that is left is how to set up the smaller arrays. As with quick sort, we can always divide input into smaller halves until we are left with individual units. We can then combine these into sorted output.


Listing 10.5: Merge Sort


void merge_sort(int[] data) {
if (data.length <= 1)
return;
int mid = data.length / 2;
int[] left = Arrays.copyOfRange(data, 0, mid);
merge_sort(left);
int[] right = Arrays.copyOfRange(data,
mid,
data.length);
merge_sort(right);
int index = 0;
int left_index = 0;
int right_index = 0;
while (index < data.length) {
if (left_index == left.length) {
data[index] = right[right_index];
right_index++;
} else if (right_index == right.length) {
data[index] = left[left_index];
left_index++;
} else if (left[left_index]
< right[right_index]) {
data[index] = left[left_index];
left_index++;
} else {
data[index] = right[right_index];
right_index++;
}
index++;
}
}


As the merge is in-place, this implementation requires only memory allocation on the stack. However the recursive calls may require up to logarithmic extra space. Merge sort can be implemented as a stable sort.

While worst case behavior is asymptotically equivalent to that of quick sort, it has been argued that in-place quick sort is faster in practice. There are many possible reasons for this, including optimization of the inner loop of quick sort and lower stack space requirements. However, merge sort is the obvious choice when sorting over large data.

10.6 Sorting over large data

A favorite interview question at Google asks to sort 4GB of data using only 100MB of memory. While the specific numerical limits may change, the key insight is to use merge sort to accomplish sorting over large data.

This is an exercise in partitioning and merging. First, each contiguous sub-array of the smaller bound of data should be sorted with quick sort. Doing so is fast, in place, and requires only a constant amount of extra memory. Afterward, each of these sub-arrays is merged as with merge sort. A sample implementation of the merge is provided below.


Listing 10.6: Sorting Large Data with Merge Sort


void merge_arrays(List<List<Integer>> in,
List<Integer> out) {
PriorityQueue<List<Integer>> heap =
new PriorityQueue<List<Integer>>(in.size(),
new Comparator<List<Integer>>() {
public int compare(List<Integer> a,
List<Integer> b) {
return a.get(0) - b.get(0);
}
});
for (List<Integer> list : in) {
if (null != list && !list.isEmpty()) {
heap.add(list);
}
}
while (!heap.isEmpty()) {
List<Integer> list = heap.poll();
out.add(list.remove(0));
if (!list.isEmpty()) {
heap.add(list);
}
}
}


Afterward

And with that, we complete this step in our journey. I hope you’ve enjoyed this tour through elementary data structures and fundamental algorithms. We’ve covered a lot of material and solved some of the most difficult problems encountered in technical interviews. I enjoyed the process of writing this book, and sincerely hope you enjoyed reading it. If you’re new to computer programming, I hope you enjoyed learning the tricks and techniques the selection of problems covered. If you’re an experienced developer, I hope this volume reminded you of the joy of solving problems that keeps us in engineering. Thanks for coming along with me, and I look forward to seeing you again in volume two, Advanced Algorithms inJava.

Uploaded by .X0RN. {Zer07} [BЯ]