Algorithms in a Nutshell 2E (2015)
Chapter 2. The Mathematics of Algorithms
One of the most important factors in the choice of an algorithm is the speed with which it is likely to complete. Characterizing the expected computation time of an algorithm is inherently a mathematical process. This chapter presents the mathematical tools behind this time prediction. After reading the chapter, you should understand the various mathematical terms used throughout this book—and in the rest of the literature that describes algorithms.
Size of a Problem Instance
An instance of a problem is a particular input data set given to a program. In most problems, the execution time of a program increases with the size of this data set. At the same time, overly compact representations (possibly using compression techniques) may unnecessarily slow down the execution of a program. It is surprisingly difficult to define the optimal way to encode an instance because problems occur in the real world and must be translated into an appropriate representation to be solved by a program.
When evaluating an algorithm, we want as much as possible to assume that the encoding of the problem instance is not the determining factor in whether the algorithm can be implemented efficiently. Your representation of a problem instance should depend just on the type and variety of operations that need to be performed. Designing efficient algorithms often starts by selecting the proper data structures in which to represent the problem.
Because we cannot formally define the size of an instance, we assume that an instance is encoded in some generally accepted, concise manner. For example, when sorting n integers, we adopt the general convention that each of the n numbers fits into a 32-bit word in the computing platform, and the size of an instance to be sorted is n. In case some of the numbers require more than one word—but only a constant, fixed number of words—our measure of the size of an instance is off only by a constant. So an algorithm that performs a computation using integers stored using 64 bits may take twice as long as a similar algorithm coded using integers stored in 32 bits.
Algorithmic researchers accept that they are unable to compute with pinpoint accuracy the costs involved in using a particular encoding in an implementation. Therefore, they assert that performance costs that differ by a multiplicative constant are asymptotically equivalent, or in other words, will not matter as the problem size continues to grow. As an example, we can expect 64-bit integers to require more time than 32-bit integers, but we should be able to ignore that and assume that a good algorithm for a million 32-bit integers will also be good for a million 64-bit integers. Although such a definition would be impractical for real-world situations (who would be satisfied to learn they must pay a bill that is 1,000 times greater than expected?), it serves as the universal means by which algorithms are compared.
For all algorithms in this book, the constants are small for virtually all platforms. However, when implementing an algorithm in production code, you must pay attention to the details reflected by the constants.
To store collections of information, most programming languages support arrays, contiguous regions of memory indexed by an integer i to enable rapid access to the ith element. An array is one-dimensional when each element fits into a word in the platform (for example, an array of integers or Boolean values). Some arrays extend into multiple dimensions, enabling more interesting data representations.
Rate of Growth of Functions
We describe the behavior of an algorithm by representing the rate of growth of its execution time as a function of the size of the input problem instance. Characterizing an algorithm’s performance in this way is a common abstraction that ignores numerous details. To use this measure properly requires an awareness of the details hidden by the abstraction. Every program is run on a platform, which is a general term meant to encompass:
§ The computer on which the program is run, its CPU, data cache, floating-point unit (FPU), and other on-chip features
§ The programming language in which the program is written, along with the compiler/interpreter and optimization settings for generated code
§ The operating system
§ Other processes being run in the background
We assume that changing the platform will change the execution time of the program by a constant factor, and that we can therefore ignore platform differences in conformance with the asymptotically equivalent principle described earlier.
To place this discussion in context, we briefly discuss the Sequential Search algorithm, presented later in Chapter 5. Sequential Search examines a list of n ≥ 1 distinct elements, one at a time, until a desired value, v, is found. For now, assume that:
§ There are n distinct elements in the list
§ The element being sought, v, is in the list
§ Each element in the list is equally likely to be the desired value v
To understand the performance of Sequential Search, we must know how many elements it examines “on average.” Since v is known to be in the list and each element is equally likely to be v, the average number of examined elements, E(n), is the sum of the number of elements examined for each of the n values divided by n. Mathematically:

E(n) = (1 + 2 + ⋯ + n)/n = n*(n+1)/(2*n) = (n+1)/2
Thus, Sequential Search examines about half of the elements in a list of n distinct elements subject to these assumptions. If the number of elements in the list doubles, then Sequential Search should examine about twice as many elements; the expected number of probes is a linear function of n. That is, the expected number of probes is “about” c*n for some constant c; here, c=0.5. A fundamental insight of performance analysis is that the constant c is unimportant in the long run, because the most important cost factor is the size of the problem instance, n. As n gets larger and larger, the error in claiming that:

E(n) = (n+1)/2 ≈ n/2

becomes less significant. In fact, the ratio between the two sides of this approximation approaches 1. That is:

lim (n→∞) [(n+1)/2] / [n/2] = 1
although the error in the estimation is significant for small values of n. In this context we say that the rate of growth of the expected number of elements that Sequential Search examines is linear. That is, we ignore the constant multiplier and are concerned only when the size of an instance is large.
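This linear growth is easy to confirm empirically. The sketch below (our own illustration; the method names are hypothetical, not from the book’s code repository) counts the probes Sequential Search makes for each possible target and averages them over all n equally likely targets:

```java
public class AverageProbes {
    // Probes list[0..n-1] one at a time until v is found; returns probe count.
    static int probes(int[] list, int v) {
        int count = 0;
        for (int x : list) {
            count++;
            if (x == v) { break; }
        }
        return count;
    }

    // Averages the probe count over all n equally likely targets.
    static double averageProbes(int n) {
        int[] list = new int[n];
        for (int i = 0; i < n; i++) { list[i] = i; }   // n distinct elements
        long total = 0;
        for (int i = 0; i < n; i++) { total += probes(list, list[i]); }
        return (double) total / n;
    }

    public static void main(String[] args) {
        // (1 + 2 + ... + n)/n = (n+1)/2, so the expected count grows linearly.
        System.out.println(averageProbes(10));     // 5.5
        System.out.println(averageProbes(1000));   // 500.5
    }
}
```

For n = 1000 the average is exactly (n+1)/2 = 500.5, matching the analysis above.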
When using the abstraction of the rate of growth to choose between algorithms, remember that:
Constants matter
That’s why we use supercomputers and upgrade our computers on a regular basis.
The size of n is not always large
We will see in Chapter 4 that the rate of growth of the execution time of Quicksort is less than the rate of growth of the execution time of Insertion Sort. Yet Insertion Sort outperforms Quicksort for small arrays on the same platform.
An algorithm’s rate of growth determines how it will perform on increasingly larger problem instances. Let’s apply this underlying principle to a more complex example.
Consider evaluating four sorting algorithms for a specific sorting task. The following performance data was generated by sorting a block of n random strings. For string blocks of size n = 1 to 512, 50 trials were run. The best and worst performances were discarded, and the chart in Figure 2-1 shows the average running time (in microseconds) of the remaining 48 results. The variance between the runs is surprising.
Figure 2-1. Comparing four sort algorithms on small data sets
One way to interpret these results is to try to design a function that will predict the performance of each algorithm on a problem instance of size n. Because we are unlikely to guess such a function, we use commercially available software to compute a trend line with a statistical process known as regression analysis. The “fitness” of a trend line to the actual data is based on a value between 0 and 1, known as the R² value. R² values near 1 indicate a high fitness. For example, if R² = 0.9948, there is only a 0.52% chance that the fitness of the trend line is due to random variations in the data.
Sort-4 is clearly the worst performing of these sort algorithms. Given the 512 data points as plotted in a spreadsheet, the trend line to which the data conforms is:
y = 0.0053*n² − 0.3601*n + 39.212
R² = 0.9948
Having an R² confidence value so close to 1 declares this an accurate estimate. Sort-2 offers the fastest implementation over the given range of points. Its behavior is characterized by the following trend line equation:
y= 0.05765*n*log(n)+7.9653
Sort-2 marginally outperforms Sort-3 initially, and its ultimate behavior is perhaps 10% faster than Sort-3. Sort-1 shows two distinct behavioral patterns. For blocks of 39 or fewer strings, the behavior is characterized by:
y = 0.0016*n² + 0.2939*n + 3.1838
R² = 0.9761
However, with 40 or more strings, the behavior is characterized by:
y= 0.0798*n*log(n)+142.7818
The numeric coefficients in these equations are entirely dependent upon the platform on which these implementations execute. As described earlier, such incidental differences are not important. The long-term trend as n increases dominates the computation of these behaviors. Indeed, Figure 2-1 graphs the behavior using two different ranges to show that the real behavior for an algorithm may not be apparent until n gets large enough.
Algorithm designers seek to understand the behavioral differences that exist between algorithms. Sort-1 reflects the performance of qsort on Linux 2.6.9. When reviewing the source code (which can be found through any of the available Linux code repositories[1]), one discovers the following comment: “Qsort routine from Bentley & McIlroy’s Engineering a Sort Function.” Bentley and McIlroy (1993) describe how to optimize Quicksort by varying the strategy for problem sizes less than 7, between 8 and 39, and for 40 and higher. It is satisfying to see that the empirical results presented here confirm the underlying implementation.
Analysis in the Best, Average, and Worst Cases
One question to ask is whether the results of the previous section will be true for all input problem instances. How will the behavior of Sort-2 change with different input problem instances of the same size?
§ The data could contain large runs of elements already in sorted order.
§ The input could contain duplicate values.
§ Regardless of the size n of the input set, the elements could be drawn from a much smaller set and contain a significant number of duplicate values.
Although Sort-4 from Figure 2-1 was the slowest of the four algorithms for sorting n random strings, it turns out to be the fastest when the data is already sorted. This advantage rapidly fades away, however, with just 32 random items out of position, as shown in Figure 2-2.
Figure 2-2. Comparing sort algorithms on sorted/nearly sorted data
However, suppose an input array with n strings is “nearly sorted”—that is, n/4 of the strings (25% of them) are swapped with another position just four locations away. It may come as a surprise to see in Figure 2-3 that Sort-4 outperforms the others.
Figure 2-3. Sort-4 wins on nearly sorted data
The conclusion to draw is that for many problems, no single optimal algorithm exists. Choosing an algorithm depends on understanding the problem being solved and the underlying probability distribution of the instances likely to be treated, as well as the behavior of the algorithms being considered.
To provide some guidance, algorithms are typically presented with three common cases in mind:
Worst case
Defines a class of input instances for which an algorithm exhibits its worst runtime behavior. Instead of trying to identify the specific input, algorithm designers typically describe properties of the input that prevent an algorithm from running efficiently.
Average case
Defines the expected behavior when executing the algorithm on random input instances. While some input problems will require greater time to complete because of some special cases, the vast majority of input problems will not. This measure describes the expectation an average user of the algorithm should have.
Best case
Defines a class of input instances for which an algorithm exhibits its best runtime behavior. For these input instances, the algorithm does the least work. In reality, the best case rarely occurs.
By knowing the performance of an algorithm under each of these cases, you can judge whether an algorithm is appropriate to use in your specific situation.
Worst Case
For any particular value of n, the work done by an algorithm or program may vary dramatically over all the instances of size n. For a given program and a given value n, the worst-case execution time is the maximum execution time, where the maximum is taken over all instances of size n.
We are interested in the worst-case behavior of an algorithm because it often is the easiest case to analyze. It also explains how slow the program could be in any situation.
More formally, if Sn is the set of instances si of size n, and t is a function that measures the work done by an algorithm on each instance, then the work done by an algorithm on Sn in the worst case is the maximum of t(si) over all si ∈ Sn. Denoting this worst-case performance on Sn by Twc(n), the rate of growth of Twc(n) defines the worst-case complexity of the algorithm.
We rarely have enough resources to run the algorithm on every individual instance si of size n to determine empirically which one leads to worst-case performance. Instead, an adversary crafts a worst-case problem instance given the description of the algorithm.
Average Case
A telephone system designed to support a large number n of telephones must, in the worst case, be able to complete all calls where n/2 people pick up their phones and call the other n/2 people. Although this system will never crash because of overload, it would be prohibitively expensive to construct. In reality, the probability that each of n/2 people calls a unique member of the other n/2 people is exceedingly small. Instead, one could design a system that is cheaper to build and use mathematical tools to consider the probability of crash due to overload.
For the set of instances of size n, we associate a probability distribution Pr{si}, which assigns a probability between 0 and 1 to each instance si such that the sum, over all instances of size n, of the probability of that instance is 1. More formally, if Sn is the set of instances of size n, then:

Σ (si ∈ Sn) Pr{si} = 1
If t measures the work done by an algorithm on each instance, then the average-case work done by an algorithm on Sn is:

Σ (si ∈ Sn) t(si)*Pr{si}
That is, the actual work done on instance si, t(si), is weighted with the probability that si will actually be presented as input. If Pr{si}=0, then the actual value of t(si) does not impact the expected work done by the program. Denoting this average-case work on Sn by Tac(n), then the rate of growth of Tac(n) defines the average-case complexity of the algorithm.
Recall that when describing the rate of growth of work or time, we consistently ignore constants. So when we say that Sequential Search of n elements takes, on average:

(n+1)/2

probes (subject to our earlier assumptions), then by convention we simply say that subject to these assumptions, we expect Sequential Search will examine a linear number of elements, or order n.
Best Case
Knowing the best case for an algorithm is useful even though the situation rarely occurs in practice. In many cases, it provides insight into the optimal circumstance for an algorithm. For example, the best case for Sequential Search is when it searches for a desired value, v, which ends up being the first element in the list. Consider a slightly different approach, which we’ll call Counting Search, that counts the number of times that v appears in a list. If the computed count is zero, then the item was not found, so it returns false; otherwise, it returns true. Note that Counting Search always searches through the entire list; therefore, even though its worst-case behavior is O(n)—the same as Sequential Search—its best-case behavior remains O(n), so it is unable to take advantage of either the best-case or average-case situations in which it could have performed better.
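A minimal sketch of this contrast (our own illustration; the method name is hypothetical) shows why Counting Search can never do better than O(n): its loop has no early exit, even after a match is found:

```java
public class CountingSearch {
    // Returns true if v appears in list; always scans all n elements.
    static boolean countingSearch(int[] list, int v) {
        int count = 0;
        for (int x : list) {
            if (x == v) { count++; }   // no early exit, even after a match
        }
        return count > 0;
    }

    public static void main(String[] args) {
        int[] list = {7, 2, 9, 2};
        System.out.println(countingSearch(list, 2));   // true
        System.out.println(countingSearch(list, 5));   // false
    }
}
```

Replacing the count with an immediate `return true` on a match recovers Sequential Search and, with it, the O(1) best case.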
Performance Families
We compare algorithms by evaluating their performance on input data of size n. This methodology is the standard means developed over the past half-century for comparing algorithms. By doing so, we can determine which algorithms scale to solve problems of a nontrivial size by evaluating the running time needed by the algorithm in relation to the size of the provided input. A secondary performance evaluation is to consider how much memory or storage an algorithm needs; we address these concerns within the individual algorithm descriptions, as appropriate.
We use the following classifications which are ordered by decreasing efficiency:
§ Constant
§ Logarithmic
§ Sublinear
§ Linear
§ n log(n)
§ Quadratic
§ Exponential
When evaluating the performance of an algorithm, keep in mind that you must identify the most expensive computation within an algorithm to determine its classification. For example, consider an algorithm that is subdivided into two tasks, a task classified as linear followed by a task classified as quadratic. The overall performance of the algorithm must therefore be classified as quadratic.
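This rule can be made concrete with a schematic sketch (the tasks below are hypothetical placeholders, not algorithms from this book): the quadratic phase dwarfs the linear phase as n grows, so the method as a whole is classified as quadratic:

```java
public class DominantTerm {
    // A linear task followed by a quadratic task: O(n) + O(n^2) = O(n^2).
    static long work(int n) {
        long steps = 0;
        for (int i = 0; i < n; i++) { steps++; }       // linear phase: n steps
        for (int i = 0; i < n; i++) {                  // quadratic phase: n*n steps
            for (int j = 0; j < n; j++) { steps++; }
        }
        return steps;                                  // n + n^2 steps in total
    }

    public static void main(String[] args) {
        System.out.println(work(1000));   // 1001000: the n^2 term dominates
    }
}
```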
We’ll now illustrate these performance classifications by example.
Constant Behavior
When analyzing the performance of the algorithms in this book, we frequently claim that some primitive operations provide constant performance. Clearly this claim is not an absolute determinant for the actual performance of the operation since we do not refer to specific hardware. For example, comparing whether two 32-bit numbers x and y are the same value should have the same performance regardless of the actual values of x and y. A constant operation is defined to have O(1) performance.
What about the performance of comparing two 256-bit numbers? Or two 1,024-bit numbers? It turns out that for a predetermined fixed size k, you can compare two k-bit numbers in constant time. The key is that the problem size (i.e., the values x and y being compared) cannot grow beyond the fixed size k. We abstract the extra effort, which is multiplicative in terms of k, using the notation O(1).
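As a sketch of this idea (our own illustration, assuming numbers stored as fixed-length arrays of 32-bit words), the comparison loop below always executes exactly K iterations regardless of the values being compared, and is therefore O(1):

```java
public class FixedWidthEquals {
    static final int K = 8;   // fixed width: 8 words of 32 bits = 256 bits

    // Compares two 256-bit numbers stored as K-word arrays.
    // The loop bound K is a predetermined constant, so this is O(1).
    static boolean equals256(int[] x, int[] y) {
        for (int i = 0; i < K; i++) {
            if (x[i] != y[i]) { return false; }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] a = new int[K], b = new int[K];
        a[3] = 42; b[3] = 42;
        System.out.println(equals256(a, b));   // true
        b[7] = 1;
        System.out.println(equals256(a, b));   // false
    }
}
```

Doubling K to compare 512-bit numbers doubles the constant hidden inside O(1), but the operation remains constant time because K cannot grow with the problem.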
Log n Behavior
A bartender offers the following $10,000 bet to any patron. “I will choose a number from 1 to 1,000,000 and you can guess 20 numbers, one at a time; after each guess, I will either tell you Too Low, Too High, or You Win. If you guess the number in 20 questions, I give you $10,000; otherwise, you give me $10,000.” Would you take this bet? You should, because you can always win. Table 2-1 shows a sample scenario for the range 1–10 that asks a series of questions, reducing the problem size by about half each time.
Table 2-1. Sample behavior for guessing number from 1–10
Number  First guess         Second guess        Third guess          Fourth guess
1       Is it 5? Too High   Is it 3? Too High   Is it 2? Too High    Must be 1! You Win
2       Is it 5? Too High   Is it 3? Too High   Is it 2? You Win
3       Is it 5? Too High   Is it 3? You Win
4       Is it 5? Too High   Is it 3? Too Low    Must be 4! You Win
5       Is it 5? You Win
6       Is it 5? Too Low    Is it 8? Too High   Is it 7? Too Low     Must be 6! You Win
7       Is it 5? Too Low    Is it 8? Too High   Is it 7? You Win
8       Is it 5? Too Low    Is it 8? You Win
9       Is it 5? Too Low    Is it 8? Too Low    Is it 9? You Win
10      Is it 5? Too Low    Is it 8? Too Low    Is it 9? Too Low     Must be 10! You Win
In each turn, depending on the bartender’s answer, the size of the range that could contain the hidden number is cut roughly in half. Eventually, the range is limited to just one possible number; this happens after 1+⌊log(n)⌋ turns, where log is the base-2 logarithm. The floor function ⌊x⌋ rounds the number x down to the largest integer smaller than or equal to x. For example, if the bartender chooses a number between 1 and 10, you could guess it in 1+⌊log(10)⌋ = 1+⌊3.32⌋, or four guesses, as shown in the table.
This same approach works equally well for 1,000,000 numbers. In fact, the Guessing algorithm shown in Example 2-1 works for any range [low, high] and determines the value of n in 1+⌊log (high-low+1)⌋ turns. If there are 1,000,000 numbers, this algorithm will locate the number in at most 1+⌊log (1,000,000)⌋ = 1+⌊19.93⌋, or 20 guesses (the worst case).
Example 2-1. Java code to guess number in range [low,high]
// Compute number of turns when n is guaranteed to be in range [low,high].
public static int turns(int n, int low, int high) {
int turns = 0;
// Continue while more than two potential numbers remain.
while (high - low >= 2) {
turns++;
int mid = (low + high)/2;
if (mid == n) {
return turns;
} else if (mid < n) {
low = mid + 1;
} else {
high = mid - 1;
}
}
// At this point, you can make final guess
return 1 + turns;
}
Logarithmic algorithms are extremely efficient because they rapidly converge on a solution. These algorithms succeed because they reduce the size of the problem by about half each time. The Guessing algorithm reaches a solution after at most k = 1+⌊log(n)⌋ iterations, and at the ith iteration (i > 0), the algorithm computes a guess that is known to be within ±ϵ = 2^(k−i) of the actual hidden number. The quantity ϵ is considered the error, or uncertainty. After each iteration of the loop, ϵ is cut in half.
Another example showing efficient behavior is Newton’s method for computing the roots of equations in one variable, namely, for what values of x does f(x) = 0? To find the roots of f(x) = x*sin(x) − 5*x − cos(x), compute its derivative f'(x) = sin(x) + x*cos(x) − 5 + sin(x) = x*cos(x) + 2*sin(x) − 5. The Newton iteration computes x_{n+1} = x_n − f(x_n)/f'(x_n). Starting with a “guess” that x is zero, this algorithm quickly determines an appropriate solution of x = −0.189302759, as shown in Table 2-2. The binary and decimal digits enclosed in brackets, [], are the accurate digits.
Table 2-2. Newton’s method
n   xn in decimal        xn in bits (binary digits)
0   0.0
1   −0.2                 [1011111111001]0011001100110011001100110…
2   −[0.18]8516717588…   [1011111111001000001]0000101010000110110…
3   −[0.1893]59749489…   [101111111100100000111]10011110000101101…
4   −[0.189]298621848…   [10111111110010000011101]011101111111011…
5   −[0.18930]3058226…   [1011111111001000001110110001]0101001001…
6   −[0.1893027]36274…   [1011111111001000001110110001001]0011100…
7   −[0.189302759]639…   [101111111100100000111011000100101]01001…
Sublinear O(n^d) Behavior for d < 1
In some cases, the behavior of an algorithm is better than linear, yet not as efficient as logarithmic. As discussed in Chapter 9, the kd-tree in multiple dimensions can partition a set of n d-dimensional points efficiently. If the tree is balanced, the search time for range queries that conform to the axes of the points is O(n^(1−1/d)). For two-dimensional queries, the resulting performance is O(sqrt(n)).
Linear Performance
Some problems clearly seem to require more effort to solve than others. Any eight-year-old can evaluate 7+5 to get 12. How much harder is the problem 37+45?
Specifically, how hard is it to add two n-digit numbers a_n…a_1 + b_n…b_1 to produce an (n+1)-digit value c_{n+1}…c_1? The primitive operation used in this Addition algorithm is single-digit addition: adding two decimal digits and an incoming carry to produce a result digit and an outgoing carry.
A sample Java implementation of Addition is shown in Example 2-2, where an n-digit number is represented as an array of int values; for the examples in this section, it is assumed that each of these values is a decimal digit d such that 0 ≤ d ≤ 9.
Example 2-2. Java implementation of add
public static void add (int[] n1, int[] n2, int[] sum) {
int position = n1.length-1;
int carry = 0;
while (position >= 0) {
int total = n1[position] + n2[position] + carry;
sum[position+1] = total % 10;
if (total > 9) { carry = 1; } else { carry = 0; }
position--;
}
sum[0] = carry;
}
As long as the input problem can be stored in memory, add computes the addition of the two numbers as represented by the input integer arrays n1 and n2. Would this implementation be as efficient as the following plus alternative, listed in Example 2-3, which computes the exact same answer using different computations?
Example 2-3. Java implementation of plus
public static void plus(int[] n1, int[] n2, int[] sum) {
int position = n1.length;
int carry = 0;
while (--position >= 0) {
int total = n1[position] + n2[position] + carry;
if (total > 9) {
sum[position+1] = total-10;
carry = 1;
} else {
sum[position+1] = total;
carry = 0;
}
}
sum[0] = carry;
}
Do these small implementation details affect the performance of an algorithm? Let’s consider two other potential factors that can impact the algorithm’s performance:
§ add and plus can trivially be converted into C programs. How does the choice of language affect the algorithm’s performance?
§ The programs can be executed on different computers. How does the choice of computer hardware affect the algorithm’s performance?
The implementations were executed 10,000 times on numbers ranging from 256 digits to 32,768 digits. For each digit size a random number of that size was generated; thereafter, for each of the 10,000 trials, these two numbers were circular shifted (one left and one right) to create two different numbers to be added. Two different programming languages were used (C and Java). We start with the hypothesis that as the problem size doubles, the execution time for the algorithm doubles as well. We would like to know that this overall behavior occurs regardless of the machine, programming language, or implementation variation used. Each variation was executed on a set of configurations:
g
C version was compiled with debugging information included.
O1, O2, O3
C version was compiled under these different optimization levels. Increasing numbers imply better performance.
Java
Java implementation of algorithm.
Table 2-3 contains the results for both add and plus. The final column shows the ratio of the time for plus on problems of size 2n to the time for plus on problems of size n. Define t(n) to be the actual running time of the Addition algorithm on an input of size n. This growth pattern provides empirical evidence of the time to compute plus for two n-digit numbers.
Table 2-3. Time (in milliseconds) to execute 10,000 add/plus invocations on random digits of size n
n        Add-g   Add-java  Add-O3  Plus-g   Plus-java  Plus-O3  Ratio
256      33      19        10      31       20         11
512      67      22        20      58       32         23       2.09
1024     136     49        40      126      65         46       2.00
2048     271     98        80      241      131        95       2.07
4096     555     196       160     489      264        195      2.05
8192     1107    392       321     972      527        387      1.98
16384    2240    781       647     1972     1052       805      2.08
32768    4604    1554      1281    4102     2095       1721     2.14
65536    9447    3131      2572    8441     4200       3610     2.10
131072   19016   6277      5148    17059    8401       7322     2.03
262144   38269   12576     10336   34396    16811      14782    2.02
524288   77147   26632     21547   69699    35054      30367    2.05
1048576  156050  51077     53916   141524   61856      66006    2.17
We can classify the Addition algorithm as being linear with respect to its input size n. That is, there is some constant c > 0 such that t(n) ≤ c*n for all n > n0. We don’t actually need to know the full details of the c and n0 values, just that they exist. An argument can be made to establish a linear-time lower bound on the complexity of Addition by showing that every digit must be examined (consider the consequences of not probing one of the digits).
For all plus executions (regardless of language or compilation configuration) of Addition, we can set c to 1/15 and choose n0 to be 256. Other implementations of Addition would have different constants, yet their overall behavior would still be linear. This result may seem surprising given that most programmers assume that integer arithmetic is a constant time operation; however, constant time addition is achievable only when the integer representation (such as 16-bit or 64-bit) uses a fixed integer size n.
When considering differences in algorithms, the constant c is not as important as knowing the order of the algorithm. Seemingly inconsequential differences resulted in different performance. The plus implementation of Addition is markedly more efficient after eliminating the modulo operator (%), which is notoriously slow when used with values that are not powers of 2. In this case, “% 10” is just not efficient since a division by 10 must occur, which is a costly operation on binary computers. This is not to say that we ignore the value of c. Certainly if we execute Addition a large number of times, even small changes to the actual value of c can have a large impact on the performance of a program.
n log n Performance
A common behavior in efficient algorithms is best described by this performance family. To explain how this behavior occurs in practice, let’s define t(n) to represent the time that an algorithm takes to solve an input problem instance of size n. “Divide and conquer” is an efficient way to solve a problem in which a problem of size n is divided into (roughly equal) subproblems of size n/2, which are solved recursively. The solutions of these sub-problems are combined together in some form to solve the original problem of size n. Mathematically, this can be stated as:
t(n)=2*t(n/2)+O(n)
That is, t(n) includes the cost of the two subproblems together with no more than a linear time cost to merge the results. Now, on the right side of the equation, t(n/2) is the time to solve a problem of size n/2; using the same logic, this can be represented as:
t(n/2)=2*t(n/4)+O(n/2)
and so the original equation is now:
t(n)=2*[2*t(n/4)+O(n/2)]+O(n)
If we expand this out once more, we see that:
t(n)=2*[2*[2*t(n/8)+O(n/4)]+O(n/2)]+O(n)
This last equation reduces to t(n) = 8*t(n/8) + O(n) + 2*O(n/2) + 4*O(n/4), which can be simplified as 8*t(n/8) + 3*O(n). We can then say that t(n) = 2^k*t(n/2^k) + k*O(n). This expansion ends when 2^k = n, that is, when k = log(n). In the final base case, when the problem size is 1, the performance t(1) is a constant c. Thus, the closed-form formula is t(n) = c*n + O(n*log(n)). Because n*log(n) is asymptotically greater than c*n for any fixed constant c, t(n) can be simply written as O(n log n).
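Merge Sort is the classic instance of this recurrence; the sketch below (our own illustration, not the book’s repository code) makes the two half-size recursive calls and the linear-time merge explicit:

```java
import java.util.Arrays;

public class MergeSortSketch {
    // Solves a problem of size n as two subproblems of size n/2
    // plus an O(n) merge: t(n) = 2*t(n/2) + O(n) = O(n log n).
    static void mergeSort(int[] a, int lo, int hi) {
        if (hi - lo <= 1) { return; }            // base case: size 1, constant work
        int mid = (lo + hi) / 2;
        mergeSort(a, lo, mid);                   // t(n/2)
        mergeSort(a, mid, hi);                   // t(n/2)
        merge(a, lo, mid, hi);                   // O(n)
    }

    // Merges the sorted halves a[lo..mid) and a[mid..hi) in linear time.
    static void merge(int[] a, int lo, int mid, int hi) {
        int[] tmp = new int[hi - lo];
        int i = lo, j = mid, k = 0;
        while (i < mid && j < hi) { tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++]; }
        while (i < mid) { tmp[k++] = a[i++]; }
        while (j < hi)  { tmp[k++] = a[j++]; }
        System.arraycopy(tmp, 0, a, lo, tmp.length);
    }

    // Convenience helper: returns a sorted copy of the input.
    static int[] sorted(int[] a) {
        int[] copy = Arrays.copyOf(a, a.length);
        mergeSort(copy, 0, copy.length);
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sorted(new int[]{5, 1, 4, 2, 3})));
        // [1, 2, 3, 4, 5]
    }
}
```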
Quadratic Performance
Now consider a similar problem where two integers of size n are multiplied together. Example 2-4 shows an implementation of Multiplication, an elementary school algorithm.
Example 2-4. mult implementation of Multiplication in Java
public static void mult(int[] n1, int[] n2, int[] result) {
int pos = result.length-1;
// clear all values
for (int i = 0; i < result.length; i++) { result[i] = 0; }
for (int m = n1.length-1; m>=0; m--) {
int off = n1.length-1 - m;
for (int n = n2.length-1; n>=0; n--,off++) {
int prod = n1[m]*n2[n];
// compute partial total by carrying previous digit's position
result[pos-off] += prod % 10;
result[pos-off-1] += result[pos-off]/10 + prod/10;
result[pos-off] %= 10;
}
}
}
Once again, an alternative program is written, times, which eliminates the need for the costly modulo operator, and skips the innermost computations when n1[m] is zero (note that times is not shown here, but can be found in the provided code repository). The times variation contains 203 lines of generated Java code to remove the two modulo operators. Does this variation show cost savings that validate the extra maintenance and development cost in managing this generated code?
Table 2-4 shows the behavior of these implementations of Multiplication using the same random input set used when demonstrating Addition. Figure 2-4 graphically depicts the performance, showing the parabolic growth curve that is the trademark of quadratic behavior.
Figure 2-4. Comparison of mult versus times
Table 2-4. Time (in milliseconds) to execute 10,000 multiplications
n    | mult_n (ms) | times_n (ms) | mult_2n/mult_n
4    | 2           | 41           |
8    | 8           | 83           | 4
16   | 33          | 129          | 4.13
32   | 133         | 388          | 4.03
64   | 530         | 1276         | 3.98
128  | 2143        | 5009         | 4.04
256  | 8519        | 19014        | 3.98
512  | 34231       | 74723        | 4.02
Even though the times variation is twice as slow, both times and mult exhibit the same asymptotic performance. The ratio of mult_2n/mult_n is roughly 4, which demonstrates that the performance of Multiplication is quadratic. Let’s define t(n) to be the actual running time of the Multiplication algorithm on an input of size n. By this definition, there must be some constant c > 0 such that t(n) ≤ c*n^2 for all n > n0. We don’t actually need to know the full details of the c and n0 values, just that they exist. For the mult implementation of Multiplication on our platform, we can set c to 1/7 and choose n0 to be 16.
Once again, individual variations in implementation are unable to “break” the inherent quadratic performance behavior of an algorithm. However, other algorithms exist (Zuras, 1994) to multiply a pair of n-digit numbers that are significantly faster than quadratic. These algorithms are important for applications such as data encryption, in which one frequently multiplies large integers.
Less Obvious Performance Computations
In most cases, reading the description of an algorithm (as shown in Addition and Multiplication) is sufficient to classify an algorithm as being linear or quadratic. The primary indicator for quadratic, for example, is a nested loop structure. Some algorithms defy such straightforward analysis, however. Consider the GCD algorithm in Example 2-5, designed by Euclid to compute the greatest common divisor between two integers stored using arrays of digits.
Example 2-5. Euclid’s GCD algorithm
public static void gcd (int a[], int b[], int gcd[]) {
  if (isZero(a)) { assign (gcd, b); return; }   // gcd(0, b) = b
  if (isZero(b)) { assign (gcd, a); return; }   // gcd(a, 0) = a

  // ensure a and b are not modified
  a = copy (a);
  b = copy (b);

  while (!isZero(b)) {
    // last argument to subtract represents sign of result, which
    // we can ignore since we only subtract smaller from larger.
    if (compareTo(a, b) > 0) {
      subtract (a, b, gcd, new int[1]);
      assign (a, gcd);
    } else {
      subtract (b, a, gcd, new int[1]);
      assign (b, gcd);
    }
  }

  // value held in a is the computed gcd of original (a, b)
  assign (gcd, a);
}
This algorithm repeatedly compares two numbers (a and b) and subtracts the smaller number from the larger until zero is reached. The implementations of the helper methods (isZero, assign, compareTo, subtract) can be found in the accompanying code repository.
This algorithm produces the greatest common divisor of two numbers, but there is no clear answer as to how many iterations will be required based on the size of the input. During each pass through the loop, either a or b is reduced and never becomes negative, so we can guarantee that the algorithm will terminate, but some GCD requests take longer than others; for example, using this algorithm, gcd(1000,1) takes 999 steps! Clearly the performance of this algorithm is more sensitive to its inputs than Addition or Multiplication, in that different input instances of the same size can require very different computation times. This GCD algorithm exhibits its worst-case performance when asked to compute the GCD of (10^n−1, 1); it needs to process the while loop 10^n−1 times! Because we have already shown that addition and subtraction are O(n) in terms of the input size n, GCD requires n*(10^n−1) operations of its loop. Converting this expression to base 2, we have n*2^(3.3219*n)−n, which exhibits exponential performance. We classify this algorithm as O(n*2^n).
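The input sensitivity is easy to observe with a small sketch of our own that counts the passes through the subtractive loop, using plain int values rather than the book's digit arrays:

```java
public class GcdSteps {
    // Count passes through the subtractive GCD loop of Example 2-5,
    // modeled with plain ints for illustration.
    public static int steps(int a, int b) {
        int count = 0;
        while (b != 0) {
            if (a > b) { a -= b; } else { b -= a; }
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(steps(1000, 1));   // on the order of 1,000 passes
        System.out.println(steps(1024, 512)); // 2 passes: 1024-512, then b reaches 0
    }
}
```

The exact count for the final pass depends on how ties are handled, but the order of magnitude is what matters: the loop count tracks the numeric value of the inputs, not their digit length.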
The gcd implementation in Example 2-5 will be outperformed handily by the ModGCD algorithm described in Example 2-6, which relies on the modulo operator to compute the integer remainder of a divided by b.
Example 2-6. ModGCD algorithm for GCD computation
public static void modgcd (int a[], int b[], int gcd[]) {
  if (isZero(a)) { assign (gcd, b); return; }   // gcd(0, b) = b
  if (isZero(b)) { assign (gcd, a); return; }   // gcd(a, 0) = a

  // align a and b to have same number of digits and work on copies
  a = copy(normalize(a, b.length));
  b = copy(normalize(b, a.length));

  // ensure that a is greater than b. Also return trivial gcd.
  int rc = compareTo(a, b);
  if (rc == 0) { assign (gcd, a); return; }
  if (rc < 0) {
    int t[] = b;
    b = a;
    a = t;
  }

  int quot[] = new int[a.length];
  int remainder[] = new int[a.length];
  while (!isZero(b)) {
    int t[] = copy (b);
    divide (a, b, quot, remainder);
    assign (b, remainder);
    assign (a, t);
  }

  // value held in a is the computed gcd of (a, b).
  assign (gcd, a);
}
ModGCD arrives at a solution more rapidly because it won’t waste time subtracting really small numbers from large numbers within the while loop. This difference is not simply an implementation detail; it reflects a fundamental shift in how the algorithm solves the problem.
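The difference is easy to quantify with a rough sketch of our own (plain ints again) that counts iterations of the modulo-based loop:

```java
public class ModGcdSteps {
    // Count passes through the modulo-based loop of Example 2-6,
    // modeled with plain ints for illustration.
    public static int steps(int a, int b) {
        int count = 0;
        while (b != 0) {
            int r = a % b;   // remainder of a divided by b
            a = b;
            b = r;
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // 1 pass, versus roughly 1,000 subtractions for the
        // same input in the subtractive version.
        System.out.println(steps(1000, 1));
    }
}
```

Each remainder step discards as much of a as whole multiples of b allow, so the loop count grows with the number of digits rather than the numeric value.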
The computations shown in Figure 2-5 (and enumerated in Table 2-5) show the result of generating 142 random n-digit numbers and computing the GCD of all 10,011 pairs of these numbers.
Figure 2-5. Comparison of gcd versus modgcd
Table 2-5. Time (in milliseconds) to execute 10,011 gcd computations
n    | modgcd | gcd   | n^2/modgcd | n^2/gcd | modgcd_2n/modgcd_n | gcd_2n/gcd_n
4    | 68     | 45    | 0.24       | 0.36    | 0.23               | 0.85
8    | 226    | 408   | 0.28       | 0.16    | 3.32               | 9.07
16   | 603    | 1315  | 0.42       | 0.19    | 2.67               | 3.22
32   | 1836   | 4050  | 0.56       | 0.25    | 3.04               | 3.08
64   | 5330   | 18392 | 0.77       | 0.22    | 2.9                | 4.54
128  | 20485  | 76180 | 0.8        | 0.22    | 3.84               | 4.14
Even though the ModGCD implementation is nearly three times faster than the corresponding GCD implementation, the performance of ModGCD is quadratic, or O(n^2), whereas GCD is exponential. That is, the worst-case performance of GCD (not exhibited in this small input set) is orders of magnitude slower than the worst-case performance of ModGCD.
More sophisticated algorithms for computing GCD have been designed—though most are impractical except for extremely large integers—and analysis suggests that the problem allows for more efficient algorithms.
Exponential Performance
Consider a lock with three numeric dials in sequence, each of which contains the digits from 0 to 9. Each dial can be set independently to one of these ten digits. Assume you have found such a lock but don't have its combination; it is simply a matter of some manual labor to try each of the 1,000 possible combinations, from 000 to 999. To generalize this problem, assume the lock has n dials; then the total number of possibilities is 10^n. Solving this problem using a brute-force approach would be considered exponential performance, or O(10^n), in this case in base 10. In many cases the exponential base is 2, but this analysis holds true for any base b > 1.
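The brute-force search can be sketched as follows (a toy illustration of ours; the n dials are modeled as a single integer from 0 to 10^n − 1):

```java
public class LockSearch {
    // Try every combination of an n-dial lock in order; returns the
    // number of attempts needed to reach the secret combination.
    public static long attempts(int n, long secret) {
        long total = 1;
        for (int i = 0; i < n; i++) { total *= 10; }  // 10^n combinations
        for (long combo = 0; combo < total; combo++) {
            if (combo == secret) { return combo + 1; }
        }
        return -1;  // unreachable when secret is a valid combination
    }

    public static void main(String[] args) {
        // Worst case for 3 dials: all 1,000 combinations tried.
        System.out.println(attempts(3, 999));
    }
}
```

Adding one dial multiplies the worst-case work by 10, which is the defining characteristic of exponential growth.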
Exponential algorithms are practical only for very small values of n. Some algorithms might have a worst-case behavior that is exponential, yet still are heavily used in practice because of their average-case behavior. A good example is the Simplex Method for solving linear programming problems.
Benchmark Operations
The Python operator ** rapidly performs exponentiation. The sample computation 2**851 is shown below.
150150336576094004599423153910185137226235191870990070733 557987815252631252384634158948203971606627616971080383694 109252383653813326044865235229218132798103200794538451818 051546732566997782908246399595358358052523086606780893692 34238529227774479195332149248
In Python, computations are relatively independent of the underlying platform. That is, computing 2^851 in Java or C on most platforms would cause a numeric overflow. But a fast computation in Python yields the result shown in the example. Is it an advantage or a disadvantage that the underlying architecture is hidden from us, abstracted away? Consider the following two hypotheses:
Hypothesis H1
Computing 2^n has consistent behavior, regardless of the value of n.
Hypothesis H2
Large numbers (such as shown previously in expanded form) can be treated in the same way as any other number, such as 123,827 or 997.
To refute hypothesis H1, we conduct 10,000 evaluations of 2^n. The total execution time for each value of n is shown in Figure 2-6.
Figure 2-6. Execution times for computing 2^x in Python
Oddly enough, the performance seems to have different behaviors: one for x smaller than 16, a second for x values around 145, and a third for x greater than 200. This behavior reveals that Python uses an Exponentiation By Squaring algorithm for computing powers using the ** operator. Manually computing 2^x using a for loop would cause quadratic performance.
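Exponentiation by squaring needs only about log2(x) multiplications instead of x − 1 repeated ones. A self-contained sketch of ours using Java's BigInteger (Python's built-in ** applies a comparable strategy internally):

```java
import java.math.BigInteger;

public class PowDemo {
    // Square-and-multiply: roughly log2(n) squarings instead of
    // n-1 repeated multiplications by the base.
    public static BigInteger pow2(int n) {
        BigInteger result = BigInteger.ONE;
        BigInteger base = BigInteger.valueOf(2);
        while (n > 0) {
            if ((n & 1) == 1) { result = result.multiply(base); }  // odd bit set
            base = base.multiply(base);                            // square the base
            n >>= 1;
        }
        return result;
    }

    public static void main(String[] args) {
        // Matches the 2**851 computation shown earlier.
        System.out.println(pow2(851).equals(BigInteger.valueOf(2).pow(851)));
    }
}
```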
To refute hypothesis H2, we conduct an experiment that pre-computes the value of 2^n and then evaluates the time to compute 3.14159*2^n. The total execution time of these 10,000 trials is shown in Figure 2-7.
Figure 2-7. Execution times for computing large multiplication
Why do the points in Figure 2-7 not appear on a straight line? For what value of x does the line break? The multiplication operation (*) appears to be overloaded. It does different things depending upon whether the numbers being multiplied are floating-point numbers, or integers that each fit into a single word of the machine, or integers that are so large that they must each be stored in several words of the machine, or some combination of these.
The break in the plot occurs for x={64,65} and appears to correspond to a shift in the storage of large floating point numbers. Again, there may be unexpected slowdowns in computations that can only be uncovered by such benchmarking efforts.
Lower and Upper Bounds
We have simplified the presentation of the “Big O” notation in this book. For example, when discussing the behavior of the Addition algorithm, which is linear with respect to its input size n, we argued that there exists some constant c > 0 such that t(n) ≤ c*n for all n > n0; recall that t(n) represents the actual running time of Addition. By this reasoning, we claim the performance of Addition is O(n). The careful reader will note that we could just as easily have used a function f(n)=c*2^n that grows more rapidly than c*n. Indeed, although it is technically accurate to claim that Addition is O(2^n), such a statement provides very little information; it would be like saying that you need no more than one week to perform a five-minute task. To explain why, consider the notation Ω(g(n)), which declares that g(n) is a lower bound for the actual running time t(n). One can often compute both an upper (O) and lower (Ω) bound for the execution time of an algorithm, in which case the appropriate notation to use is Θ(f(n)), which asserts that f(n) is asymptotically both an upper bound (as is the case for O(f(n))) and a lower bound for t(n).
We chose the more informal (and widely accepted) use of O(f(n)) to simplify the presentations and analyses. We ensure that when discussing algorithmic behavior, there is no better f′(n) that can be used to classify the algorithms we have identified as O(f(n)).
References
Bentley, Jon Louis and M. Douglas McIlroy, “Engineering a Sort Function,” Software—Practice and Experience, 23(11): 1249–1265, 1993, http://citeseer.ist.psu.edu/bentley93engineering.html.
Zuras, D. “More on Squaring and Multiplying Large Integers,” IEEE Transactions on Computers, 43(8): 899–908, 1994, http://doi.ieeecomputersociety.org/10.1109/12.295852.
[1] http://lxr.linux.no/linux+v2.6.11/fs/xfs/support/qsort.c