Part IV. Functions and Generators

Chapter 21. The Benchmarking Interlude

Now that we know about coding functions and iteration tools, we’re going to take a short side trip to put both of them to work. This chapter closes out the function part of this book with a larger case study that times the relative performance of the iteration tools we’ve met so far.

Along the way, this case study surveys Python’s code timing tools, discusses benchmarking techniques in general, and allows us to explore code that’s a bit more realistic and useful than most of what we’ve seen up to this point. We’ll also measure the speed of current Python implementations—a data point that may or may not be significant, depending on the type of code you write.

Finally, because this is the last chapter in this part of the book, we’ll close with the usual sets of “gotchas” and exercises to help you start coding the ideas you’ve read about. First, though, let’s have some fun with a tangible Python application.

Timing Iteration Alternatives

We’ve met quite a few iteration alternatives in this book. Like much in programming, they represent tradeoffs—in terms of both subjective factors like expressiveness, and more objective criteria such as performance. Part of your job as a programmer and engineer is selecting tools based on factors like these.

In terms of performance, I’ve mentioned a few times that list comprehensions sometimes have a speed advantage over for loop statements, and that map calls can be faster or slower than both depending on call patterns. The generator functions and expressions of the preceding chapter tend to be slightly slower than list comprehensions, though they minimize memory space requirements and don’t delay result generation.

All that is generally true today, but relative performance can vary over time because Python’s internals are constantly being changed and optimized, and code structure can influence speed arbitrarily. If you want to verify their performance for yourself, you need to time these alternatives on your own computer and your own version of Python.

Timing Module: Homegrown

Luckily, Python makes it easy to time code. For example, to get the total time taken to run multiple calls to a function with arbitrary positional arguments, the following first-cut function might suffice:

# File timer0.py
import time

def timer(func, *args):                     # Simplistic timing function
    start = time.clock()
    for i in range(1000):
        func(*args)
    return time.clock() - start             # Total elapsed time in seconds

This works—it fetches time values from Python’s time module, and subtracts the system start time from the stop time after running 1,000 calls to the passed-in function with the passed-in arguments. On my computer in Python 3.3:

>>> from timer0 import timer
>>> timer(pow, 2, 1000)                     # Time to call pow(2, 1000) 1000 times
0.00296260674205626
>>> timer(str.upper, 'spam')                # Time to call 'spam'.upper() 1000 times
0.0005165746166859719

Though simple, this timer is also fairly limited, and deliberately exhibits some classic mistakes in both function design and benchmarking. Among these, it:

§ Doesn’t support keyword arguments in the tested function call

§ Hardcodes the repetitions count

§ Charges the cost of range to the tested function’s time

§ Always uses time.clock, which might not be best outside Windows

§ Doesn’t give callers a way to verify that the tested function actually worked

§ Only gives total time, which might fluctuate on some heavily loaded machines

In other words, timing code is more complex than you might expect! To be more general and accurate, let’s expand this into still simple but more useful timer utility functions we can use both to see how the iteration alternatives stack up now, and to apply to other timing needs in the future. These functions are coded in a module file so they can be used in a variety of programs, and have docstrings giving some basic details that PyDoc can display on request—see Figure 15-2 in Chapter 15 for a screenshot of the documentation pages rendered for the timing modules we’re coding here:

# File timer.py
"""
Homegrown timing tools for function calls.
Does total time, best-of time, and best-of-totals time.
"""

import time, sys
timer = time.clock if sys.platform[:3] == 'win' else time.time

def total(reps, func, *pargs, **kargs):
    """
    Total time to run func() reps times.
    Returns (total time, last result)
    """
    repslist = list(range(reps))                # Hoist out, equalize 2.x, 3.x
    start = timer()                             # Or perf_counter/other in 3.3+
    for i in repslist:
        ret = func(*pargs, **kargs)
    elapsed = timer() - start
    return (elapsed, ret)

def bestof(reps, func, *pargs, **kargs):
    """
    Quickest func() among reps runs.
    Returns (best time, last result)
    """
    best = 2 ** 32                              # 136 years seems large enough
    for i in range(reps):                       # range usage not timed here
        start = timer()
        ret = func(*pargs, **kargs)
        elapsed = timer() - start               # Or call total() with reps=1
        if elapsed < best: best = elapsed       # Or add to list and take min()
    return (best, ret)

def bestoftotal(reps1, reps2, func, *pargs, **kargs):
    """
    Best of totals:
    (best of reps1 runs of (total of reps2 runs of func))
    """
    return bestof(reps1, total, reps2, func, *pargs, **kargs)

Operationally, this module implements both total time and best time calls, and a nested best of totals that combines the other two. In each, it times a call to any function with any positional and keyword arguments passed individually, by fetching the start time, calling the function, and subtracting the start time from the stop time. Points to notice about how this version addresses the shortcomings of its predecessor:

§ Python’s time module gives access to the current time, with precision that varies per platform. On Windows its clock function is claimed to give microsecond granularity and so is very accurate. Because the time function may be better on Unix, this script selects between them automatically based on the platform string in the sys module; it starts with “win” if running in Windows. See also the sidebar New Timer Calls in 3.3 on other time options in 3.3 and later not used here for portability; we will also be timing Python 2.X where these newer calls are not available, and their results on Windows appear similar in 3.3 in any event.

§ The range call is hoisted out of the timing loop in the total function, so its construction cost is not charged to the timed function in Python 2.X. In 3.X range is an iterable, so this step is neither required nor harmful, but we still run the result through list so its traversal cost is the same in both 2.X and 3.X. This doesn’t apply to the bestof function, since no range factors are charged to the test’s time.

§ The reps count is passed in as an argument, before the test function and its arguments, to allow repetition to vary per call.

§ Any number of both positional and keyword arguments are collected with starred-argument syntax, so they must be sent individually, not in a sequence or dictionary. If needed, callers can unpack argument collections into individual arguments with stars in the call, as done by the bestoftotal function at the end. See Chapter 18 for a refresher if this code doesn’t make sense.

§ The first function in this module returns total elapsed time for all calls in a tuple, along with the timed function’s final return value so callers can verify its operation.

§ The second function does the same, but returns the best (minimum) time among all calls instead of the total—more useful if you wish to filter out the impacts of other activity on your computer, but less so for tests that run too quickly to produce substantial runtimes.

§ To address the prior point, the last function in this file runs nested total tests within a best-of test, to get the best-of-totals time. The nested total operation can make runtimes more useful, but we still get the best-of filter. This function’s code may be easier to understand if you remember that every function is a passable object, even the testing functions themselves.

From a larger perspective, because these functions are coded in a module file, they become generally useful tools anywhere we wish to import them. Modules and imports were introduced in Chapter 3, and you’ll learn more about them in the next part of this book; for now, simply import the module and call the function to use one of this file’s timers. In simple usage, this module is similar to its predecessor, but will be more robust in larger contexts. In Python 3.3 again:

>>> import timer
>>> timer.total(1000, pow, 2, 1000)[0]            # Compare to timer0 results above
0.0029542985410557776
>>> timer.total(1000, str.upper, 'spam')          # Returns (time, last call's result)
(0.000504845391709686, 'SPAM')
>>> timer.bestof(1000, str.upper, 'spam')         # 1/1000 as long as total time
(4.887177027512735e-07, 'SPAM')
>>> timer.bestof(1000, pow, 2, 1000000)[0]
0.00393515497972885
>>> timer.bestof(50, timer.total, 1000, str.upper, 'spam')
(0.0005468751145372153, (0.0005004469323637295, 'SPAM'))
>>> timer.bestoftotal(50, 1000, str.upper, 'spam')
(0.000566912540591602, (0.0005195069228989269, 'SPAM'))

The last two calls here calculate the best-of-totals times—the lowest time among 50 runs, each of which computes the total time to call str.upper 1,000 times (roughly corresponding to the total times at the start of this listing). The function used in the last call is really just a convenience that maps to the call form preceding it; both return the best-of tuple, which embeds the last total call’s result tuple.

Compare these last two results to the following generator-based alternative:

>>> min(timer.total(1000, str.upper, 'spam') for i in range(50))
(0.0005155971812769167, 'SPAM')

Taking the min of an iteration of total results this way has a similar effect because the times in the result tuples dominate comparisons made by min (they are leftmost in the tuple). We could use this in our module too (and will in later variations); it varies slightly by omitting a very small overhead in the best-of function’s code and not nesting result tuples, though either result suffices for relative comparisons. As is, the best-of function must pick a high initial lowest time value—though 136 years is probably longer than most of the tests you’re likely to run!

>>> ((((2 ** 32) / 60) / 60) / 24) / 365          # Plus a few extra days
136.19251953323186
>>> ((((2 ** 32) // 60) // 60) // 24) // 365      # Floor: see Chapter 5
136

NEW TIMER CALLS IN 3.3

This section uses the time module’s clock and time calls because they apply to all readers of this book. Python 3.3 introduces new interfaces in this module that are designed to be more portable. Specifically, the behavior of this module’s clock and time calls varies per platform, but its new perf_counter and process_time functions have well-defined and platform-neutral semantics:

§ time.perf_counter() returns the value in fractional seconds of a performance counter, defined as a clock with the highest available resolution to measure a short duration. It includes time elapsed during sleep states and is system-wide.

§ time.process_time() returns the value in fractional seconds of the sum of the system and user CPU time of the current process. It does not include time elapsed during sleep, and is process-wide by definition.

For both of these calls, the reference point of the returned value is undefined, so that only the difference between the results of consecutive calls is valid. The perf_counter call can be thought of as wall time, and as of Python 3.3 is used by default for benchmarking in the timeit module discussed ahead; process_time gives CPU time portably.
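As a minimal sketch of that differencing idea—this assumes Python 3.3 or later, and its workload lines are purely illustrative:

import time

start = time.perf_counter()              # Wall-clock-style counter (3.3+)
time.sleep(0.25)                         # Sleep time is included here
wall = time.perf_counter() - start       # Only differences are meaningful

cpu_start = time.process_time()          # CPU time of this process (3.3+)
sum(x * x for x in range(100000))        # Some CPU-bound work
cpu = time.process_time() - cpu_start    # Excludes time spent sleeping

print(wall, cpu)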

The time.clock call is still usable on Windows today, as shown in this book. It is documented as being deprecated in 3.3’s manuals, but issues no warning when used there—meaning it may or may not become officially deprecated in later releases. If needed, you can detect a Python 3.3 or later with code like this, which I opted to not use for the sake of brevity and timer comparability:

if sys.version_info[0] >= 3 and sys.version_info[1] >= 3:
    timer = time.perf_counter                 # or process_time
else:
    timer = time.clock if sys.platform[:3] == 'win' else time.time

Alternatively, the following code would also add portability and insulate you from future deprecations, though it depends on exception topics we haven’t studied in full yet, and its choices may also make cross-version speed comparisons invalid—timers may differ in resolution!

try:
    timer = time.perf_counter                 # or process_time
except AttributeError:
    timer = time.clock if sys.platform[:3] == 'win' else time.time

If I were writing this book for Python 3.3+ readers only, I’d use the new and apparently improved calls here, and you should in your work too if they apply to you. The newer calls won’t work for users of any other Pythons, though, and that’s still the majority of the Python world today. It would be easier to pretend that the past doesn’t matter, but that would not only be evasive of reality, it might also be just plain rude.

Timing Script

Now, to time iteration tool speed (our original goal), run the following script—it uses the timer module we wrote to time the relative speeds of the list construction techniques we’ve studied:

# File timeseqs.py
"Test the relative speed of iteration tool alternatives."

import sys, timer                 # Import timer functions
reps = 10000
repslist = list(range(reps))      # Hoist out, list in both 2.X/3.X

def forLoop():
    res = []
    for x in repslist:
        res.append(abs(x))
    return res

def listComp():
    return [abs(x) for x in repslist]

def mapCall():
    return list(map(abs, repslist))           # Use list() here in 3.X only!
    # return map(abs, repslist)

def genExpr():
    return list(abs(x) for x in repslist)     # list() required to force results

def genFunc():
    def gen():
        for x in repslist:
            yield abs(x)
    return list(gen())                        # list() required to force results

print(sys.version)
for test in (forLoop, listComp, mapCall, genExpr, genFunc):
    (bestof, (total, result)) = timer.bestoftotal(5, 1000, test)
    print('%-9s: %.5f => [%s...%s]' %
          (test.__name__, bestof, result[0], result[-1]))

This script tests five alternative ways to build lists of results. As shown, its reported times reflect on the order of 10 million steps for each of the five test functions—each builds a list of 10,000 items 1,000 times. This process is repeated 5 times to get the best-of time for each of the 5 test functions, yielding a whopping 250 million total steps for the script at large (impressive but reasonable on most machines these days).

Notice how we have to run the results of the generator expression and function through the built-in list call to force them to yield all of their values; if we did not, in both 2.X and 3.X we would just produce generators that never do any real work. In Python 3.X only we must do the same for the map result, since it is now an iterable object as well; for 2.X, the list around map must be removed manually to avoid charging an extra list construction overhead per test (though its impact seems negligible in most tests).

In a similar way, the inner loops’ range result is hoisted out to the top of the module to remove its construction cost from total time, and wrapped in a list call so that its traversal cost isn’t skewed by being a generator in 3.X only (much as we did in the timer module too). This may be overshadowed by the cost of the inner iterations loop, but it’s best to remove as many variables as we can.

Also notice how the code at the bottom steps through a tuple of five function objects and prints the __name__ of each: as we’ve seen, this is a built-in attribute that gives a function’s name.[42]
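For reference, the attribute simply echoes the name given in the def statement—a quick interactive aside, not part of the timing script itself:

>>> def forLoop(): pass
>>> forLoop.__name__
'forLoop'
>>> abs.__name__                  # Built-ins carry the attribute too
'abs'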

Timing Results

When the script of the prior section is run under Python 3.3, I get these results on my Windows 7 laptop—map is slightly faster than list comprehensions, both are quicker than for loops, and generator expressions and functions place in the middle (times here are total time in seconds):

C:\code> c:\python33\python timeseqs.py
3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) [MSC v.1600 64 bit (AMD64)]
forLoop  : 1.33290 => [0...9999]
listComp : 0.69658 => [0...9999]
mapCall  : 0.56483 => [0...9999]
genExpr  : 1.08457 => [0...9999]
genFunc  : 1.07623 => [0...9999]

If you study this code and its output long enough, you’ll notice that generator expressions run slower than list comprehensions today. Although wrapping a generator expression in a list call makes it functionally equivalent to a square-bracketed list comprehension, the internal implementations of the two expressions appear to differ (though we’re also effectively timing the list call for the generator test):

return [abs(x) for x in repslist]          # 0.69 seconds
return list(abs(x) for x in repslist)      # 1.08 seconds: differs internally

Though the exact cause would require deeper analysis (and possibly source code study), this seems to make sense given that the generator expression must do extra work to save and restore its state during value production; the list comprehension does not, and runs quicker by a small constant here and in later tests.

Interestingly, when I ran this on Windows Vista under Python 3.0 for the fourth edition of this book, and on Windows XP with Python 2.5 for the third, the results were relatively similar—list comprehensions were nearly twice as fast as equivalent for loop statements, and map was slightly quicker than list comprehensions when mapping a function such as the abs (absolute value) built-in this way. Python 2.5’s absolute times were roughly four to five times slower than the current 3.3 output, but this likely reflects quicker laptops much more than any improvements in Python.

In fact, most of the Python 2.7 results for this script are slightly quicker than 3.3 on this same machine today—I removed the list call from the map test in the following to avoid creating the results list twice in that test, though it adds only a very small constant time if left in:

c:\code> c:\python27\python timeseqs.py
2.7.3 (default, Apr 10 2012, 23:24:47) [MSC v.1500 64 bit (AMD64)]
forLoop  : 1.24902 => [0...9999]
listComp : 0.66970 => [0...9999]
mapCall  : 0.57018 => [0...9999]
genExpr  : 0.90339 => [0...9999]
genFunc  : 0.90542 => [0...9999]

For comparison, following are the same tests’ speed results under the current PyPy, the optimized Python implementation discussed in Chapter 2, whose current 1.9 release implements the Python 2.7 language. PyPy is roughly 10X (an order of magnitude) quicker here; it will do even better when we revisit Python version comparisons later in this chapter using tools with different code structures (though it will lose on a few other tests as well):

c:\code> c:\PyPy\pypy-1.9\pypy.exe timeseqs.py
2.7.2 (341e1e3821ff, Jun 07 2012, 15:43:00)
[PyPy 1.9.0 with MSC v.1500 32 bit]
forLoop  : 0.10106 => [0...9999]
listComp : 0.05629 => [0...9999]
mapCall  : 0.10022 => [0...9999]
genExpr  : 0.17234 => [0...9999]
genFunc  : 0.17519 => [0...9999]

On PyPy alone, list comprehensions beat map in this test, but the fact that all of PyPy’s results are so much quicker today seems the larger point here. On CPython, map is still quickest so far.

The impact of function calls: map

Watch what happens, though, if we change this script to perform an inline operation on each iteration, such as addition, instead of calling a built-in function like abs (the omitted parts of the following file are the same as before, and I put list back in around map for testing on 3.3 only):

# File timeseqs2.py (differing parts)
...
def forLoop():
    res = []
    for x in repslist:
        res.append(x + 10)
    return res

def listComp():
    return [x + 10 for x in repslist]

def mapCall():
    return list(map((lambda x: x + 10), repslist))    # list() in 3.X only

def genExpr():
    return list(x + 10 for x in repslist)             # list() in 2.X + 3.X

def genFunc():
    def gen():
        for x in repslist:
            yield x + 10
    return list(gen())                                # list() in 2.X + 3.X
...

Now the need to call a user-defined function for the map call makes it slower than the for loop statements, despite the fact that the looping statements version is larger in terms of code—or equivalently, the removal of function calls may make the others quicker (more on this in an upcoming note). On Python 3.3:

c:\code> c:\python33\python timeseqs2.py
3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) [MSC v.1600 64 bit (AMD64)]
forLoop  : 1.35136 => [10...10009]
listComp : 0.73730 => [10...10009]
mapCall  : 1.68588 => [10...10009]
genExpr  : 1.10963 => [10...10009]
genFunc  : 1.11074 => [10...10009]

These results have also been consistent in CPython. The prior edition’s Python 3.0 results on a slower machine were again relatively similar, though about twice as slow due to test machine differences (Python 2.5 results on an even slower machine were again four to five times as slow as the current results).

Because the interpreter optimizes so much internally, performance analysis of Python code like this is a very tricky affair. Without numbers, though, it’s virtually impossible to guess which method will perform the best—the best you can do is time your own code, on your computer, with your version of Python.

In this case, what we can say for certain is that on this Python, using a user-defined function in map calls seems to slow performance substantially (though + may also be slower than a trivial abs), and that list comprehensions run quickest in this case (though slower than map in some others). List comprehensions seem consistently twice as fast as for loops, but even this must be qualified—the list comprehension’s relative speed might be affected by its extra syntax (e.g., if filters), Python changes, and usage modes we did not time here.

As I’ve mentioned before, however, performance should not be your primary concern when writing Python code—the first thing you should do to optimize Python code is to not optimize Python code! Write for readability and simplicity first, then optimize later, if and only if needed. It could very well be that any of the five alternatives is quick enough for the data sets your program needs to process; if so, program clarity should be the chief goal.

NOTE

For deeper truth, change this code to apply a simple user-defined function in all five iteration techniques timed. For instance (from timeseqs2B.py of the book’s examples):

def F(x): return x

def listComp():
    return [F(x) for x in repslist]

def mapCall():
    return list(map(F, repslist))

The results, in file timeseqs-results.txt, are then relatively similar to using a built-in function like abs—at least in CPython, map is quickest. More generally, among the five iteration techniques, map is fastest today if all five call any function, built in or not, but slowest when the others do not.

That is, map appears to be slower simply because it requires function calls, and function calls are relatively slow in general. Since map can’t avoid calling functions, it can lose simply by association! The other iteration tools win because they can operate without function calls. We’ll prove this finding in tests run under the timeit module ahead.
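The following is a minimal sketch of that claim using timeit directly; the statements and the F name here are illustrative, not drawn from the book’s test files:

import timeit

# Inline expression: no function call per iteration
inline = min(timeit.repeat("[x + 10 for x in range(1000)]",
                           number=1000, repeat=5))

# Same work routed through a trivial user-defined function call
called = min(timeit.repeat("[F(x) for x in range(1000)]",
                           setup="def F(x): return x + 10",
                           number=1000, repeat=5))

print(inline, called)    # The called form is typically slower: call overhead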

Timing Module Alternatives

The timing module of the preceding section works, but it could be a bit more user-friendly. Most obviously, its functions require passing in a repetitions count as a first argument, and provide no default for it—a minor point, perhaps, but less than ideal in a general-purpose tool. We could also leverage the min technique we saw earlier to simplify the return value slightly and remove a minor overhead charge.

The following implements an alternative timer module that addresses these points, allowing the repeat count to be passed in as a keyword argument named _reps:

# File timer2.py (2.X and 3.X)
"""
total(spam, 1, 2, a=3, b=4, _reps=1000) calls and times spam(1, 2, a=3, b=4)
_reps times, and returns total time for all runs, with final result.

bestof(spam, 1, 2, a=3, b=4, _reps=5) runs best-of-N timer to attempt to
filter out system load variation, and returns best time among _reps tests.

bestoftotal(spam, 1, 2, a=3, b=4, _reps1=5, _reps=1000) runs best-of-totals
test, which takes the best among _reps1 runs of (the total of _reps runs).
"""

import time, sys
timer = time.clock if sys.platform[:3] == 'win' else time.time

def total(func, *pargs, **kargs):
    _reps = kargs.pop('_reps', 1000)          # Passed-in or default reps
    repslist = list(range(_reps))             # Hoist range out for 2.X lists
    start = timer()
    for i in repslist:
        ret = func(*pargs, **kargs)
    elapsed = timer() - start
    return (elapsed, ret)

def bestof(func, *pargs, **kargs):
    _reps = kargs.pop('_reps', 5)
    best = 2 ** 32
    for i in range(_reps):
        start = timer()
        ret = func(*pargs, **kargs)
        elapsed = timer() - start
        if elapsed < best: best = elapsed
    return (best, ret)

def bestoftotal(func, *pargs, **kargs):
    _reps1 = kargs.pop('_reps1', 5)
    return min(total(func, *pargs, **kargs) for i in range(_reps1))

This module’s docstring at the top of the file describes its intended usage. It uses dictionary pop operations to remove the _reps argument from arguments intended for the test function and provide it with a default (it has an unusual name to avoid clashing with real keyword arguments meant for the function being timed).
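For reference, here is the pop-with-default idiom in isolation—a small interactive illustration of the technique, not code from the module itself:

>>> kargs = dict(c=3, d=4, _reps=10)
>>> kargs.pop('_reps', 1000)        # Remove and return _reps if present
10
>>> sorted(kargs)                   # Only the timed function's keywords remain
['c', 'd']
>>> kargs.pop('_reps', 1000)        # The default applies when the key is absent
1000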

Notice how the best of totals here uses the min and generator scheme we saw earlier instead of nested calls, in part because this simplifies results and avoids a minor time overhead in the prior version (whose code fetches best of time after total time has been computed), but also because it must support two distinct repetition keywords with defaults—total and bestof can’t both use the same argument name. Add argument prints in the code if it would help to trace its operation.

To test with this new timer module, you can change the timing scripts as follows, or use the precoded version in the book’s examples file timeseqs_timer2.py; the results are essentially the same as before (this is primarily just an API change), so I won’t list them again here:

import sys, timer2
...
for test in (forLoop, listComp, mapCall, genExpr, genFunc):
    (total, result) = timer2.bestoftotal(test, _reps1=5, _reps=1000)

    # Or:
    # (total, result) = timer2.bestoftotal(test)
    # (total, result) = timer2.bestof(test, _reps=5)
    # (total, result) = timer2.total(test, _reps=1000)
    # (bestof, (total, result)) = timer2.bestof(timer2.total, test, _reps=5)

    print('%-9s: %.5f => [%s...%s]' %
          (test.__name__, total, result[0], result[-1]))

You can also run a few interactive tests as we did for the original version—the results are again essentially the same as before, but we pass in the repetition counts as keywords that provide defaults if omitted; in Python 3.3:

>>> from timer2 import total, bestof, bestoftotal
>>> total(pow, 2, 1000)[0]                            # 2 ** 1000, 1K dflt reps
0.0029562534118596773
>>> total(pow, 2, 1000, _reps=1000)[0]                # 2 ** 1000, 1K reps
0.0029733585316193967
>>> total(pow, 2, 1000, _reps=1000000)[0]             # 2 ** 1000, 1M reps
1.2451676814889865
>>> bestof(pow, 2, 100000)[0]                         # 2 ** 100K, 5 dflt reps
0.0007550688578703557
>>> bestof(pow, 2, 1000000, _reps=30)[0]              # 2 ** 1M, best of 30
0.004040229286800923
>>> bestoftotal(str.upper, 'spam', _reps1=30, _reps=1000)    # Best of 30, tot of 1K
(0.0004945823198454491, 'SPAM')
>>> bestof(total, str.upper, 'spam', _reps=30)        # Nested calls work too
(0.0005463863968202531, (0.0004994694969298052, 'SPAM'))

To see how keywords are supported now, define a function with more arguments and pass some by name:

>>> def spam(a, b, c, d): return a + b + c + d

>>> total(spam, 1, 2, c=3, d=4, _reps=1000)
(0.0009730369554290519, 10)
>>> bestof(spam, 1, 2, c=3, d=4, _reps=1000)
(9.774353202374186e-07, 10)
>>> bestoftotal(spam, 1, 2, c=3, d=4, _reps1=1000, _reps=1000)
(0.00037289161070930277, 10)
>>> bestoftotal(spam, *(1, 2), _reps1=1000, _reps=1000, **dict(c=3, d=4))
(0.00037289161070930277, 10)

Using keyword-only arguments in 3.X

One last point on this thread: we can also make use of Python 3.X keyword-only arguments here to simplify the timer module’s code. As we learned in Chapter 18, keyword-only arguments are ideal for configuration options such as our functions’ _reps argument. They must be coded after a * and before a ** in the function header, and in a function call they must be passed by keyword and appear before the ** if used. The following is a keyword-only-based alternative to the prior module. Though simpler, it compiles and runs under Python 3.X only, not 2.X:

# File timer3.py (3.X only)
"""
Same usage as timer2.py, but uses 3.X keyword-only default arguments
instead of dict pops for simpler code. No need to hoist range() out
of tests in 3.X: always a generator in 3.X, and this can't run on 2.X.
"""

import time, sys
timer = time.clock if sys.platform[:3] == 'win' else time.time

def total(func, *pargs, _reps=1000, **kargs):
    start = timer()
    for i in range(_reps):
        ret = func(*pargs, **kargs)
    elapsed = timer() - start
    return (elapsed, ret)

def bestof(func, *pargs, _reps=5, **kargs):
    best = 2 ** 32
    for i in range(_reps):
        start = timer()
        ret = func(*pargs, **kargs)
        elapsed = timer() - start
        if elapsed < best: best = elapsed
    return (best, ret)

def bestoftotal(func, *pargs, _reps1=5, **kargs):
    return min(total(func, *pargs, **kargs) for i in range(_reps1))

This version is used the same way as the prior version and produces identical results, so I won’t relist its outputs on the same tests here; experiment on your own as you wish. If you do, pay attention to the argument ordering rules in calls; a variant of bestof that ran total, for instance, would call it like this:

(elapsed, ret) = total(func, *pargs, _reps=1, **kargs)

See Chapter 18 for more on keyword-only arguments in 3.X; they can simplify code for configurable tools like this one but are not backward compatible with 2.X Pythons. If you want to compare 2.X and 3.X speed, or support programmers using either Python line, the prior version is likely a better choice.

Also keep in mind that for trivial functions like some of those tested for the prior version, the costs of the timer’s code may sometimes be as significant as those of a simple timed function, so you should not take timer results too absolutely. The timer’s results can help you judge relative speeds of coding alternatives, though, and may be more meaningful for operations that run longer or are repeated often.

Other Suggestions

For more insight, try modifying the repetition counts used by these modules, or explore the alternative timeit module in Python’s standard library, which automates timing of code, supports command-line usage modes, and finesses some platform-specific issues—in fact, we’ll put it to work in the next section.

You might also want to look at the profile standard library module for a complete source code profiler tool. We’ll learn more about it in Chapter 36 in the context of development tools for large projects. In general, you should profile code to isolate bottlenecks before recoding and timing alternatives as we’ve done here.

You might try modifying or emulating the timing script to measure the speed of the 3.X and 2.7 set and dictionary comprehensions shown in the preceding chapter, and their for loop equivalents. Using them is less common in Python programs than building lists of results, so we’ll leave this task in the suggested exercise column (please, no wagering...); the next section will partly spoil the surprise.

Finally, keep the timing module we wrote here filed away for future reference—we’ll repurpose it to measure performance of alternative numeric square root operations in an exercise at the end of this chapter. If you’re interested in pursuing this topic further, we’ll also experiment with techniques for timing dictionary comprehensions versus for loops interactively in the exercises.


[42] A preview: notice how we must pass functions into the timer manually here. In Chapter 39 and Chapter 40 we’ll see decorator-based timer alternatives with which timed functions are called normally, but require extra “@” syntax where defined. Decorators may be more useful to instrument functions with timing logic when they are already being used within a larger system, and don’t as easily support the more isolated test call patterns assumed here—when decorated, every call to the function runs the timing logic, which is either a plus or minus depending on your goals.

Timing Iterations and Pythons with timeit

The preceding section used homegrown timing functions to compare code speed. As mentioned there, the standard library also ships with a module named timeit that can be used in similar ways, but offers added flexibility and may better insulate clients from some platform differences.

As usual in Python, it’s important to understand fundamental principles like those illustrated in the prior section. Python’s “batteries included” approach means you’ll usually find precoded options as well, though you still need to know the ideas underlying them to use them properly. Indeed, this module is a prime example of this—it seems to have had a history of being misused by people who don’t yet understand the principles it embodies. Now that we’ve learned the basics, though, let’s move ahead to a tool that can automate much of our work.

Basic timeit Usage

Let’s start with this module’s fundamentals before leveraging them in larger scripts. With timeit, tests are specified by either callable objects or statement strings; the latter can hold multiple statements if they use ; separators or \n characters for line breaks, and spaces or tabs to indent statements in nested blocks (e.g., \n\t). Tests may also give setup actions, and can be launched from both command lines and API calls, and from both scripts and the interactive prompt.

Interactive usage and API calls

For example, the timeit module’s repeat call returns a list giving the total time taken to run a test a number of times, for each of repeat runs—the min of this list yields the best time among the runs, and helps filter out system load fluctuations that can otherwise skew timing results artificially high.

The following shows this call in action, timing a list comprehension on two versions of CPython and the optimized PyPy implementation of Python described in Chapter 2 (it currently supports Python 2.7 code). The results here give the best total time in seconds among 5 runs that each execute the code string 1,000 times; the code string itself constructs a 1,000-item list of integers each time through (see Appendix B for the Windows launcher used for variety in the first two of these commands):

c:\code> py -3
Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) [MSC v.1600 64 bit...
>>> import timeit
>>> min(timeit.repeat(stmt="[x ** 2 for x in range(1000)]", number=1000, repeat=5))
0.5062382371756811

c:\code> py -2
Python 2.7.3 (default, Apr 10 2012, 23:24:47) [MSC v.1500 64 bit (AMD64)] on win32
>>> import timeit
>>> min(timeit.repeat(stmt="[x ** 2 for x in range(1000)]", number=1000, repeat=5))
0.0708020004193198

c:\code> c:\pypy\pypy-1.9\pypy.exe
Python 2.7.2 (341e1e3821ff, Jun 07 2012, 15:43:00)
[PyPy 1.9.0 with MSC v.1500 32 bit] on win32
>>>> import timeit
>>>> min(timeit.repeat(stmt="[x ** 2 for x in range(1000)]", number=1000, repeat=5))
0.0059330329674303905

You’ll notice that PyPy checks in at 10X faster than CPython 2.7 here, and a whopping 100X faster than CPython 3.3, despite the fact that PyPy is a potentially slower 32-bit build. This is a small artificial benchmark, of course, but seems arguably stunning nonetheless, and reflects a relative speed ranking that is generally supported by other tests run in this book (though as we’ll see, CPython still beats PyPy on some types of code).

This particular test measures the speed of both a list comprehension and integer math. The latter varies between lines: CPython 3.X has a single integer type, and CPython 2.X has both short and long integers. This may explain part of the size of the difference, but the results are valid nonetheless. Noninteger tests yield similar rankings (e.g., a floating-point test in the solutions to this part’s exercises), and integer math matters—the one and two order of magnitude (power of 10) speedups here will be realized by many real programs, because integers and iterations are ubiquitous in Python code.

These results also differ from the preceding section’s relative version speeds, where CPython 2.7 was slightly quicker than 3.3, and PyPy was 10X quicker overall, a figure affirmed by most other tests in this book too. Apart from the different type of code being timed here, the different coding structure inside timeit may have an effect too—for code strings like those tested here, timeit builds, compiles, and executes a function def statement string that embeds the test string, thereby avoiding a function call per inner loop. As we’ll see in the next section, though, this appears irrelevant from a relative-speed perspective.

Command-line usage

The timeit module has reasonable defaults and can also be run as a script, either by explicit filename or automatically located on the module search path with Python’s -m flag (see Appendix A). All the following run Python (a.k.a. CPython) 3.3. In this mode timeit reports the average time for a single -n loop, in either microseconds (labeled “usec”), milliseconds (“msec”), or seconds (“sec”); to compare results here to the total time values reported by other tests, multiply by the number of loops run—500 usec here * 1,000 loops is 500 msec, or half a second in total time:

c:\code> C:\python33\Lib\timeit.py -n 1000 "[x ** 2 for x in range(1000)]"
1000 loops, best of 3: 506 usec per loop

c:\code> python -m timeit -n 1000 "[x ** 2 for x in range(1000)]"
1000 loops, best of 3: 504 usec per loop

c:\code> py -3 -m timeit -n 1000 -r 5 "[x ** 2 for x in range(1000)]"
1000 loops, best of 5: 505 usec per loop

As an example, we can use command lines to verify that choice of timer call doesn’t impact cross-version speed comparisons run in this chapter so far—3.3 uses its new calls by default, and that might matter if timer precision differs widely. To prove that this is irrelevant, the following uses the -c flag to force timeit to use time.clock in all versions, an option that 3.3’s manuals call deprecated, but required to even the score with prior versions (I’m setting my system path to include PyPy here for command brevity):

c:\code> set PATH=%PATH%;C:\pypy\pypy-1.9

c:\code> py -3 -m timeit -n 1000 -r 5 -c "[x ** 2 for x in range(1000)]"
1000 loops, best of 5: 502 usec per loop

c:\code> py -2 -m timeit -n 1000 -r 5 -c "[x ** 2 for x in range(1000)]"
1000 loops, best of 5: 70.6 usec per loop

c:\code> pypy -m timeit -n 1000 -r 5 -c "[x ** 2 for x in range(1000)]"
1000 loops, best of 5: 5.44 usec per loop

C:\code> py -3 -m timeit -n 1000 -r 5 -c "[abs(x) for x in range(10000)]"
1000 loops, best of 5: 815 usec per loop

C:\code> py -2 -m timeit -n 1000 -r 5 -c "[abs(x) for x in range(10000)]"
1000 loops, best of 5: 700 usec per loop

C:\code> pypy -m timeit -n 1000 -r 5 -c "[abs(x) for x in range(10000)]"
1000 loops, best of 5: 61.7 usec per loop

These results are essentially the same as those for earlier tests in this chapter on the same types of code. When applying x ** 2, CPython 2.7 and PyPy are again 10X and 100X faster than CPython 3.3, respectively, showing that timer choice isn’t a factor. For the abs(x) we timed under the homegrown timer earlier (timeseqs.py), these two Pythons are faster than 3.3 by a small constant and 10X just as before, implying that timeit’s different code structure doesn’t impact relative comparisons—the type of code being tested fully determines the size of speed differences.

Subtle point: notice that the results of the last three of these tests, which mimic tests run for the homegrown timer earlier, are basically the same as before, but seem to incur a small net overhead for range usage differences—it was a prebuilt list formerly, but here is either a 3.X generator or a 2.X list built anew on each inner total loop. In other words, we’re not timing the exact same thing, but the relative speeds of the Pythons tested are the same.

Timing multiline statements

To time larger multiline sections of code in API call mode, use line breaks and tabs or spaces to satisfy Python’s syntax; code read from a source file will already satisfy it. Because you pass Python string objects to a Python function in this mode, there are no shell considerations, though be careful to escape nested quotes if needed. The following, for instance, times Chapter 13 loop alternatives in Python 3.3; you can use the same pattern to time the file-line-reader alternatives in Chapter 14:

c:\code> py -3
>>> import timeit
>>> min(timeit.repeat(number=10000, repeat=3,
        stmt="L = [1, 2, 3, 4, 5]\nfor i in range(len(L)): L[i] += 1"))
0.01397292797131814

>>> min(timeit.repeat(number=10000, repeat=3,
        stmt="L = [1, 2, 3, 4, 5]\ni=0\nwhile i < len(L):\n\tL[i] += 1\n\ti += 1"))
0.015452276471516813

>>> min(timeit.repeat(number=10000, repeat=3,
        stmt="L = [1, 2, 3, 4, 5]\nM = [x + 1 for x in L]"))
0.009464995838568635

To run multiline statements like these in command-line mode, appease your shell by passing each statement line as a separate argument, with whitespace for indentation—timeit concatenates all the lines together with a newline character between them, and later reindents for its own statement nesting purposes. Leading spaces may work better for indentation than tabs in this mode, and be sure to quote the code arguments if required by your shell:

c:\code> py -3 -m timeit -n 1000 -r 3 "L = [1,2,3,4,5]" "i=0" "while i < len(L):"
    " L[i] += 1" " i += 1"
1000 loops, best of 3: 1.54 usec per loop

c:\code> py -3 -m timeit -n 1000 -r 3 "L = [1,2,3,4,5]" "M = [x + 1 for x in L]"
1000 loops, best of 3: 0.959 usec per loop

Other usage modes: Setup, totals, and objects

The timeit module also allows you to provide setup code that is run in the main statement’s scope, but whose time is not charged to the main statement’s total—potentially useful for initialization code you wish to exclude from total time, such as imports of required modules, test function definition, and test data creation. Because they’re run in the same scope, any names created by setup code are available to the main test statement; names defined in the interactive shell generally are not.

To specify setup code, use a -s argument in command-line mode (or several of them for multiline setups) and a setup argument string in API call mode. This can focus tests more sharply, as in the following, which splits list initialization off to a setup statement to time just iteration. As a rule of thumb, though, the more code you include in a test statement, the more applicable its results will generally be to realistic code:

c:\code> python -m timeit -n 1000 -r 3 "L = [1,2,3,4,5]" "M = [x + 1 for x in L]"
1000 loops, best of 3: 0.956 usec per loop

c:\code> python -m timeit -n 1000 -r 3 -s "L = [1,2,3,4,5]" "M = [x + 1 for x in L]"
1000 loops, best of 3: 0.775 usec per loop

Here’s a setup example in API call mode: I used the following type of code to time the sort-based option in Chapter 18’s minimum value example—ordered ranges sort much faster than random numbers, and are faster sorted than scanned linearly in the example’s code under 3.3 (adjacent strings are concatenated here):

>>> from timeit import repeat
>>> min(repeat(number=1000, repeat=3,
        setup='from mins import min1, min2, min3\n'
              'vals=list(range(1000))',
        stmt='min3(*vals)'))
0.0387865921275079

>>> min(repeat(number=1000, repeat=3,
        setup='from mins import min1, min2, min3\n'
              'import random\nvals=[random.random() for i in range(1000)]',
        stmt='min3(*vals)'))
0.275656482278373

With timeit, you can also ask for just total time, use the module’s class API, time callable objects instead of strings, accept automatic loop counts, and use class-based techniques and additional command-line switches and API argument options we don’t have space to show here—consult Python’s library manual for more details:

c:\code> py -3
>>> import timeit
>>> timeit.timeit(stmt='[x ** 2 for x in range(1000)]', number=1000)      # Total time
0.5238125259325834
>>> timeit.Timer(stmt='[x ** 2 for x in range(1000)]').timeit(1000)       # Class API
0.5282652329644009
>>> timeit.repeat(stmt='[x ** 2 for x in range(1000)]', number=1000, repeat=3)
[0.5299034147194845, 0.5082454007998365, 0.5095136232504416]

>>> def testcase():
        y = [x ** 2 for x in range(1000)]      # Callable objects or code strings

>>> min(timeit.repeat(stmt=testcase, number=1000, repeat=3))
0.5073828140463377

Benchmark Module and Script: timeit

Rather than go into more details on this module, let’s study a program that deploys it to time both coding alternatives and Python versions. The following file, pybench.py, is set up to time a set of statements coded in scripts that import and use it, under either the version running its code or all Python versions named in a list. It uses some application-level tools described ahead. Because it mostly applies ideas we’ve already learned and is amply documented, though, I’m going to list this as mostly self-study material, and an exercise in reading Python code.

"""

pybench.py: Test speed of one or more Pythons on a set of simple

code-string benchmarks. A function, to allow stmts to vary.

This system itself runs on both 2.X and 3.X, and may spawn both.

Uses timeit to test either the Python running this script by API

calls, or a set of Pythons by reading spawned command-line outputs

(os.popen) with Python's -m flag to find timeit on module search path.

Replaces $listif3 with a list() around generators for 3.X and an

empty string for 2.X, so 3.X does same work as 2.X. In command-line

mode only, must split multiline statements into one separate quoted

argument per line so all will be run (else might run/time first line

only), and replace all \t in indentation with 4 spaces for uniformity.

Caveats: command-line mode (only) may fail if test stmt embeds double

quotes, quoted stmt string is incompatible with shell in general, or

command-line exceeds a length limit on platform's shell--use API call

mode or homegrown timer; does not yet support a setup statement: as is,

time of all statements in the test stmt are charged to the total time.

"""

import sys, os, timeit

defnum, defrep= 1000, 5 # May vary per stmt

def runner(stmts, pythons=None, tracecmd=False):

"""

Main logic: run tests per input lists, caller handles usage modes.

stmts: [(number?, repeat?, stmt-string)], replaces $listif3 in stmt

pythons: None=this python only, or [(ispy3?, python-executable-path)]

"""

print(sys.version)

for (number, repeat, stmt) in stmts:

number = number or defnum

repeat = repeat or defrep # 0=default

if not pythons:

# Run stmt on this python: API call

# No need to split lines or quote here

ispy3 = sys.version[0] == '3'

stmt = stmt.replace('$listif3', 'list' if ispy3 else '')

best = min(timeit.repeat(stmt=stmt, number=number, repeat=repeat))

print('%.4f [%r]' % (best, stmt[:70]))

else:

# Run stmt on all pythons: command line

# Split lines into quoted arguments

print('-' * 80)

print('[%r]' % stmt)

for (ispy3, python) in pythons:

stmt1 = stmt.replace('$listif3', 'list' if ispy3 else '')

stmt1 = stmt1.replace('\t', ' ' * 4)

lines = stmt1.split('\n')

args = ' '.join('"%s"' % line for line in lines)

cmd = '%s -m timeit -n %s -r %s %s' % (python, number, repeat, args)

print(python)

if tracecmd: print(cmd)

print('\t' + os.popen(cmd).read().rstrip())
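Because the command-line mode hinges on os.popen and sys.argv, here is that spawning idiom in isolation—a minimal sketch whose command string and flag are illustrative, not taken from pybench itself:

import os, sys

python = sys.executable                  # Or a full path such as C:\python27\python
cmd = '%s -m timeit -n 1000 "[x ** 2 for x in range(1000)]"' % python
output = os.popen(cmd).read()            # Spawn the shell command, read its stdout
print(output.rstrip())                   # e.g., "1000 loops, best of 3: ..."

if '-t' in sys.argv:                     # sys.argv: inspect command-line arguments
    print('command was:', cmd)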

This file is really only half the picture, though. Testing scripts use this module’s function, passing in concrete though variable lists of statements and Pythons to be tested, as appropriate for the usage mode desired. For example, the following script, pybench_cases.py, tests a handful of statements and Pythons, and allows command-line arguments to determine part of its operation: -a tests all listed Pythons instead of just one, and an added -t traces constructed command lines so you can see how multiline statements and indentation are handled per the command-line formats shown earlier (see both files’ docstrings for details):

"""

pybench_cases.py: Run pybench on a set of pythons and statements.

Select modes by editing this script or using command-line arguments (in

sys.argv): e.g., run a "C:\python27\python pybench_cases.py" to test just

one specific version on stmts, "pybench_cases.py -a" to test all pythons

listed, or a "py −3 pybench_cases.py -a -t" to trace command lines too.

"""

import pybench, sys

pythons = [ # (ispy3?, path)

(1, 'C:\python33\python'),

(0, 'C:\python27\python'),

(0, 'C:\pypy\pypy-1.9\pypy')

]

stmts = [ # (num,rpt,stmt)

(0, 0, "[x ** 2 for x in range(1000)]"), # Iterations

(0, 0, "res=[]\nfor x in range(1000): res.append(x ** 2)"), # \n=multistmt

(0, 0, "$listif3(map(lambda x: x ** 2, range(1000)))"), # \n\t=indent

(0, 0, "list(x ** 2 for x in range(1000))"), # $=list or ''

(0, 0, "s = 'spam' * 2500\nx = [s[i] for i in range(10000)]"), # String ops

(0, 0, "s = '?'\nfor i in range(10000): s += '?'"),

]

tracecmd = '-t' in sys.argv # -t: trace command lines?

pythons = pythons if '-a' in sys.argv else None # -a: all in list, else one?

pybench.runner(stmts, pythons, tracecmd)

Benchmark Script Results

Here is this script’s output when run to test a specific version (the Python running the script)—this mode uses direct API calls, not command lines, with total time listed in the left column, and the statement tested on the right. I’m again using the 3.3 Windows launcher in the first two of these tests to time CPython 3.3 and 2.7, and am running release 1.9 of the PyPy implementation in the third:

c:\code> py -3 pybench_cases.py
3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) [MSC v.1600 64 bit (AMD64)]
0.5015 ['[x ** 2 for x in range(1000)]']
0.5655 ['res=[]\nfor x in range(1000): res.append(x ** 2)']
0.6044 ['list(map(lambda x: x ** 2, range(1000)))']
0.5425 ['list(x ** 2 for x in range(1000))']
0.8746 ["s = 'spam' * 2500\nx = [s[i] for i in range(10000)]"]
2.8060 ["s = '?'\nfor i in range(10000): s += '?'"]

c:\code> py -2 pybench_cases.py
2.7.3 (default, Apr 10 2012, 23:24:47) [MSC v.1500 64 bit (AMD64)]
0.0696 ['[x ** 2 for x in range(1000)]']
0.1285 ['res=[]\nfor x in range(1000): res.append(x ** 2)']
0.1636 ['(map(lambda x: x ** 2, range(1000)))']
0.0952 ['list(x ** 2 for x in range(1000))']
0.6143 ["s = 'spam' * 2500\nx = [s[i] for i in range(10000)]"]
2.0657 ["s = '?'\nfor i in range(10000): s += '?'"]

c:\code> c:\pypy\pypy-1.9\pypy pybench_cases.py
2.7.2 (341e1e3821ff, Jun 07 2012, 15:43:00)
[PyPy 1.9.0 with MSC v.1500 32 bit]
0.0059 ['[x ** 2 for x in range(1000)]']
0.0102 ['res=[]\nfor x in range(1000): res.append(x ** 2)']
0.0099 ['(map(lambda x: x ** 2, range(1000)))']
0.0156 ['list(x ** 2 for x in range(1000))']
0.1298 ["s = 'spam' * 2500\nx = [s[i] for i in range(10000)]"]
5.5242 ["s = '?'\nfor i in range(10000): s += '?'"]

The following shows this script’s output when run to test multiple Python versions for each statement string. In this mode the script itself is run by Python 3.3, but it launches shell command lines that start other Pythons to run the timeit module on the test statement strings. This mode must split, format, and quote multiline statements for use in command lines according to timeit expectations and shell requirements.

This mode also relies on the -m Python command-line flag to locate timeit on the module search path and run it as a script, and the os.popen and sys.argv standard library tools to run a shell command and inspect command-line arguments, respectively. See Python manuals and other sources for more on these calls; os.popen is also mentioned briefly in the files coverage of Chapter 9, and demonstrated in the loops coverage in Chapter 13. Run with a -t flag to watch the command lines run:

c:\code> py -3 pybench_cases.py -a
3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) [MSC v.1600 64 bit (AMD64)]
--------------------------------------------------------------------------------
['[x ** 2 for x in range(1000)]']
C:\python33\python
        1000 loops, best of 5: 499 usec per loop
C:\python27\python
        1000 loops, best of 5: 71.4 usec per loop
C:\pypy\pypy-1.9\pypy
        1000 loops, best of 5: 5.71 usec per loop
--------------------------------------------------------------------------------
['res=[]\nfor x in range(1000): res.append(x ** 2)']
C:\python33\python
        1000 loops, best of 5: 562 usec per loop
C:\python27\python
        1000 loops, best of 5: 130 usec per loop
C:\pypy\pypy-1.9\pypy
        1000 loops, best of 5: 9.81 usec per loop
--------------------------------------------------------------------------------
['$listif3(map(lambda x: x ** 2, range(1000)))']
C:\python33\python
        1000 loops, best of 5: 599 usec per loop
C:\python27\python
        1000 loops, best of 5: 161 usec per loop
C:\pypy\pypy-1.9\pypy
        1000 loops, best of 5: 9.45 usec per loop
--------------------------------------------------------------------------------
['list(x ** 2 for x in range(1000))']
C:\python33\python
        1000 loops, best of 5: 540 usec per loop
C:\python27\python
        1000 loops, best of 5: 92.3 usec per loop
C:\pypy\pypy-1.9\pypy
        1000 loops, best of 5: 15.1 usec per loop
--------------------------------------------------------------------------------
["s = 'spam' * 2500\nx = [s[i] for i in range(10000)]"]
C:\python33\python
        1000 loops, best of 5: 873 usec per loop
C:\python27\python
        1000 loops, best of 5: 614 usec per loop
C:\pypy\pypy-1.9\pypy
        1000 loops, best of 5: 118 usec per loop
--------------------------------------------------------------------------------
["s = '?'\nfor i in range(10000): s += '?'"]
C:\python33\python
        1000 loops, best of 5: 2.81 msec per loop
C:\python27\python
        1000 loops, best of 5: 1.94 msec per loop
C:\pypy\pypy-1.9\pypy
        1000 loops, best of 5: 5.68 msec per loop

As you can see, in most of these tests, CPython 2.7 is still quicker than CPython 3.3, and PyPy is noticeably faster than both of them—except on the last test where PyPy is twice as slow as CPython, presumably due to memory management differences. On the other hand, timing results are often relative at best. In addition to other general timing caveats mentioned in this chapter:

§ timeit may skew results in ways beyond our scope to explore here (e.g., garbage collection).

§ There is a baseline overhead, which differs per Python version, that is ignored here (but appears trivial).

§ This script runs very small statements that may or may not reflect real-world code (but are still valid).

§ Results may occasionally vary in ways that seem random (using process time may help here).

§ All results here are highly prone to change over time (in each new Python release, in fact!).

In other words, you should draw your own conclusions from these numbers, and run these tests on your Pythons and machines for results more relevant to your needs. To time the baseline overhead of each Python, run timeit with no statement argument, or equivalently, with a pass statement.
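For instance, either of the following times just the harness itself; results are omitted here since they vary per machine and Python:

c:\code> python -m timeit -n 1000 -r 5 "pass"

>>> import timeit
>>> min(timeit.repeat(stmt='pass', number=1000, repeat=5))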

More Fun with Benchmarks

For more insight, try running the script on other Python versions and other statement test strings. The file pybench_cases2.py in this book’s examples distribution adds more tests to see how CPython 3.3 compares to 3.2, how PyPy’s 2.0 beta stacks up against its current release, and how additional use cases fare.

A win for map and a rare loss for PyPy

For example, the following tests in pybench_cases2.py measure the impact of charging other iteration operations with a function call, which improves map’s chances of winning the day per this chapter’s earlier note—map usually loses by its association with function calls in general:

# pybench_cases2.py
pythons += [
    (1, 'C:\python32\python'),
    (0, 'C:\pypy\pypy-2.0-beta1\pypy')]

stmts += [
    # Use function calls: map wins
    (0, 0, "[ord(x) for x in 'spam' * 2500]"),
    (0, 0, "res=[]\nfor x in 'spam' * 2500: res.append(ord(x))"),
    (0, 0, "$listif3(map(ord, 'spam' * 2500))"),
    (0, 0, "list(ord(x) for x in 'spam' * 2500)"),

    # Set and dicts
    (0, 0, "{x ** 2 for x in range(1000)}"),
    (0, 0, "s=set()\nfor x in range(1000): s.add(x ** 2)"),
    (0, 0, "{x: x ** 2 for x in range(1000)}"),
    (0, 0, "d={}\nfor x in range(1000): d[x] = x ** 2"),

    # Pathological: 300k digits
    (1, 1, "len(str(2**1000000))")]           # Pypy loses on this today

Here are the script's results for these statement tests on CPython 3.X, showing how map is quickest when function calls level the playing field (it lost earlier when the other tests ran an inline x ** 2):

c:\code> py -3 pybench_cases2.py

3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) [MSC v.1600 64 bit (AMD64)]

0.7237 ["[ord(x) for x in 'spam' * 2500]"]

1.3471 ["res=[]\nfor x in 'spam' * 2500: res.append(ord(x))"]

0.6160 ["list(map(ord, 'spam' * 2500))"]

1.1244 ["list(ord(x) for x in 'spam' * 2500)"]

0.5446 ['{x ** 2 for x in range(1000)}']

0.6053 ['s=set()\nfor x in range(1000): s.add(x ** 2)']

0.5278 ['{x: x ** 2 for x in range(1000)}']

0.5414 ['d={}\nfor x in range(1000): d[x] = x ** 2']

1.8933 ['len(str(2**1000000))']

As before, on these tests today 2.X clocks in faster than 3.X, and PyPy is faster still on all of these tests but the last, which it loses by a full order of magnitude (10X) even though it wins all the other tests here by a similar degree. However, if you run the file tests precoded in pybench_cases2.py, you'll see that PyPy also loses to CPython when reading files line by line, as for the following test tuple on the stmts list:

(0, 0, "f=open('C:/Python33/Lib/pdb.py')\nfor line in f: x=line\nf.close()"),

This test opens and reads a 60K, 1,675-line text file line by line using file iterators; its input loop presumably dominates overall test time. On this test, CPython 2.7 is twice as fast as 3.3, but PyPy is again an order of magnitude slower than CPython in general. You can find this case in the pybench_cases2 results files, or verify it interactively or by command line (this is just what pybench does internally):

c:\code> py -3 -m timeit -n 1000 -r 5 "f=open('C:/Python33/Lib/pdb.py')"

"for line in f: x=line" "f.close()"

>>> import timeit

>>> min(timeit.repeat(number=1000, repeat=5,

stmt="f=open('C:/Python33/Lib/pdb.py')\nfor line in f: x=line\nf.close()"))

For another example that measures both list comprehensions and PyPy’s current file speed, see the file listcomp-speed.txt in the book examples package; it uses direct PyPy command lines to run code from Chapter 14 with similar results: PyPy’s line input is slower today by roughly a factor of 10.

I’ll omit other Pythons’ output here both for space and because these findings could very well change by the time you read these words. As usual, different types of code can exhibit different types of performance. While PyPy may optimize much algorithmic code, it may or may not optimize yours. You can find additional results in the book’s examples package, but you may be better served by running these tests on your own to verify these findings today or observe their possibly different results in the future.

The impact of function calls revisited

As suggested earlier, map also wins when user-defined functions are applied: the following tests back up the earlier note's claim that map wins the race in CPython whenever its alternatives must apply a function call of any kind:

stmts = [

(0, 0, "def f(x): return x\n[f(x) for x in 'spam' * 2500]"),

(0, 0, "def f(x): return x\nres=[]\nfor x in 'spam' * 2500: res.append(f(x))"),

(0, 0, "def f(x): return x\n$listif3(map(f, 'spam' * 2500))"),

(0, 0, "def f(x): return x\nlist(f(x) for x in 'spam' * 2500)")]

c:\code> py -3 pybench_cases2.py

3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) [MSC v.1600 64 bit (AMD64)]

1.5400 ["def f(x): return x\n[f(x) for x in 'spam' * 2500]"]

2.0506 ["def f(x): return x\nres=[]\nfor x in 'spam' * 2500: res.append(f(x))"]

1.2489 ["def f(x): return x\nlist(map(f, 'spam' * 2500))"]

1.6526 ["def f(x): return x\nlist(f(x) for x in 'spam' * 2500)"]

Compare this with the preceding section’s ord tests; though user-defined functions may be slower than built-ins, the larger speed hit today seems to be functions in general, whether they are built-in or not. Notice that the total time here includes the cost of making a helper function, though only one for every 10,000 inner loop repetitions—a negligible factor per both common sense and additional tests run.
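If you want to verify the function-call effect outside the script, a quick interactive check with timeit gives the same relative ordering; this just mirrors two of the statements above, and the absolute numbers it returns will vary per machine and Python (take the minimum of each and compare):

>>> import timeit
>>> min(timeit.repeat(number=1000, repeat=5,
        stmt="def f(x): return x\n[f(x) for x in 'spam' * 2500]"))
>>> min(timeit.repeat(number=1000, repeat=5,
        stmt="def f(x): return x\nlist(map(f, 'spam' * 2500))"))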

Comparing techniques: Homegrown versus batteries

For perspective, let’s see how this section’s timeit-based results compare to the homegrown-based timer results of the prior section, by running the file timeseqs3.py in this book’s examples package—it uses the homegrown timer but performs the same x ** 2 operation and uses the same repetition counts as pybench_cases.py:

c:\code> py -3 timeseqs3.py

3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) [MSC v.1600 64 bit (AMD64)]

forLoop : 0.55022 => [0...998001]

listComp : 0.48787 => [0...998001]

mapCall : 0.59499 => [0...998001]

genExpr : 0.52773 => [0...998001]

genFunc : 0.52603 => [0...998001]

c:\code> py -3 pybench_cases.py

3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) [MSC v.1600 64 bit (AMD64)]

0.5015 ['[x ** 2 for x in range(1000)]']

0.5657 ['res=[]\nfor x in range(1000): res.append(x ** 2)']

0.6025 ['list(map(lambda x: x ** 2, range(1000)))']

0.5404 ['list(x ** 2 for x in range(1000))']

0.8711 ["s = 'spam' * 2500\nx = [s[i] for i in range(10000)]"]

2.8009 ["s = '?'\nfor i in range(10000): s += '?'"]

The homegrown timer results are very similar to this section's timeit-based pybench results, though the comparison isn't entirely apples-to-apples. The homegrown timer-based timeseqs3.py incurs a function call on each repetition of its totals loop and a slight overhead in the timer's best-of logic, but it also uses a prebuilt list instead of a 3.X range generator in its inner loop, which seems to make it slightly faster overall on comparable tests (and I'd call this example a "sanity check," but I'm not sure the term applies in benchmarking!).

Room for improvement: Setup

Like most software, this section’s program is open-ended and could be expanded arbitrarily. As one example, the files pybench2.py and pybench2_cases.py in the book’s examples package add support for timeit’s setup statement option described earlier, in both API call and command-line modes.

This feature was omitted initially for brevity, and frankly, because my tests didn’t seem to require it—timing more code gives a more complete picture when comparing Pythons, and setup actions cost the same when timing alternatives on a single Python. Even so, it’s sometimes useful to provide setup code that is run once in the tested code’s scope, but whose time is not charged to the statement’s total—a module import, object initialization, or helper function definition, for example.
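In the standard library's API call form, this is simply timeit's setup argument; as a minimal illustration (it uses one of the test cases coded in pybench2_cases.py ahead), the setup string is run once per timing run, but its time is excluded from the reported total:

>>> import timeit
>>> min(timeit.repeat(setup="L = [1, 2, 3, 4, 5]",
                      stmt="for i in range(len(L)): L[i] += 1",
                      number=1000, repeat=5))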

I won't list these two files in whole, but here are their important varying bits, as an example of software evolution at work. Like the tested statement, the setup code statement is passed as is in API call mode, but in command-line mode it is split on line breaks, space-indented, and passed with one -s argument per line ("$listif3" isn't used because setup code is not timed):

# pybench2.py

...

def runner(stmts, pythons=None, tracecmd=False):

for (number, repeat, setup, stmt) in stmts:

if not pythons:

...

best = min(timeit.repeat(

setup=setup, stmt=stmt, number=number, repeat=repeat))

else:

setup = setup.replace('\t', ' ' * 4)

setup = ' '.join('-s "%s"' % line for line in setup.split('\n'))

...

for (ispy3, python) in pythons:

...

cmd = '%s -m timeit -n %s -r %s %s %s' % (python, number, repeat, setup, args)

# pybench2_cases.py

import pybench2, sys

...

stmts = [ # (num,rpt,setup,stmt)

(0, 0, "", "[x ** 2 for x in range(1000)]"),

(0, 0, "", "res=[]\nfor x in range(1000): res.append(x ** 2)"),

(0, 0, "def f(x):\n\treturn x",

"[f(x) for x in 'spam' * 2500]"),

(0, 0, "def f(x):\n\treturn x",

"res=[]\nfor x in 'spam' * 2500:\n\tres.append(f(x))"),

(0, 0, "L = [1, 2, 3, 4, 5]", "for i in range(len(L)): L[i] += 1"),

(0, 0, "L = [1, 2, 3, 4, 5]", "i=0\nwhile i < len(L):\n\tL[i] += 1\n\ti += 1")]

...

pybench2.runner(stmts, pythons, tracecmd)

Run this script with the -a and -t command-line flags to see how command lines are constructed for setup code. For instance, the following test specification tuple generates the command line that follows it for 3.3; not nice to look at, perhaps, but sufficient to pass lines from Windows to timeit, to be concatenated with line breaks between and inserted into a generated timing function with appropriate reindentation:

(0, 0, "def f(x):\n\treturn x",

"res=[]\nfor x in 'spam' * 2500:\n\tres.append(f(x))")

C:\python33\python -m timeit -n 1000 -r 5 -s "def f(x):" -s " return x" "res=[]"

"for x in 'spam' * 2500:" " res.append(f(x))"

In API call mode, code strings are passed unchanged, because there’s no need to placate a shell, and embedded tabs and end-of-line characters suffice. Experiment on your own to uncover more about Python code alternatives’ speed. You may eventually run into shell limitations for larger sections of code in command-line mode, but both our homegrown timer and pybench’s timeit-based API call mode support more arbitrary code. Benchmarks can be great sport, but we’ll have to leave future improvements as suggested exercises.

Other Benchmarking Topics: pystones

This chapter has focused on code timing fundamentals that you can use on your own code, that apply to Python benchmarking in general, and that served as a common use case for developing larger examples for this book. Benchmarking Python is a broader and richer domain than so far implied, though. If you’re interested in pursuing this topic further, search the Web for links. Among the topics you’ll find:

§ pystone.py—a program designed for measuring Python speed across a range of code that ships with Python in its Lib\test directory

§ http://speed.python.org—a project site for coordinating work on common Python benchmarks

§ http://speed.pypy.org—the PyPy benchmarking site that the preceding bullet is partially emulating

The pystone test, for example, is based on a C language benchmark program that was translated to Python by Python's original creator, Guido van Rossum. It provides another way to measure the relative speeds of Python implementations, and seems to generally support our findings here:

c:\code> cd C:\python33\Lib\test

c:\Python33\Lib\test> py -3 pystone.py

Pystone(1.1) time for 50000 passes = 0.685303

This machine benchmarks at 72960.4 pystones/second

c:\Python33\Lib\test> cd c:\python27\lib\test

c:\Python27\Lib\test> py -2 pystone.py

Pystone(1.1) time for 50000 passes = 0.463547

This machine benchmarks at 107864 pystones/second

c:\Python27\Lib\test> c:\pypy\pypy-1.9\pypy pystone.py

Pystone(1.1) time for 50000 passes = 0.099975

This machine benchmarks at 500125 pystones/second

Since it's time to wrap up this chapter, this will have to suffice as independent confirmation of our tests' results. Analyzing the meaning of pystone's results is left as a suggested exercise; its code is not identical across 3.X and 2.X, but appears to differ today only in terms of print operations and the initialization of a global. Also keep in mind that benchmarking is just one of many aspects of Python code analysis; for pointers on options in related domains (e.g., testing), see Chapter 36's review of Python development tools.

Function Gotchas

Now that we’ve reached the end of the function story, let’s review some common pitfalls. Functions have some jagged edges that you might not expect. They’re all relatively obscure, and a few have started to fall away from the language completely in recent releases, but most have been known to trip up new users.

Local Names Are Detected Statically

As you know, Python classifies names assigned in a function as locals by default; they live in the function’s scope and exist only while the function is running. What you may not realize is that Python detects locals statically, when it compiles the def’s code, rather than by noticing assignments as they happen at runtime. This leads to one of the most common oddities posted on the Python newsgroup by beginners.

Normally, a name that isn’t assigned in a function is looked up in the enclosing module:

>>> X = 99

>>> def selector(): # X used but not assigned

print(X) # X found in global scope

>>> selector()

99

Here, the X in the function resolves to the X in the module. But watch what happens if you add an assignment to X after the reference:

>>> def selector():

print(X) # Does not yet exist!

X = 88 # X classified as a local name (everywhere)

# Can also happen for "import X", "def X"...

>>> selector()

UnboundLocalError: local variable 'X' referenced before assignment

You get the name usage error shown here, but the reason is subtle. Python reads and compiles this code when it’s typed interactively or imported from a module. While compiling, Python sees the assignment to X and decides that X will be a local name everywhere in the function. But when the function is actually run, because the assignment hasn’t yet happened when the print executes, Python says you’re using an undefined name. According to its name rules, it should say this; the local X is used before being assigned. In fact, any assignment in a function body makes a name local. Imports, =, nested defs, nested classes, and so on are all susceptible to this behavior.
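If you want to see this compile-time decision for yourself, the compiled code object attached to the function records which names it treats as locals; this is just an informal peek at an implementation detail, not something your programs should rely on:

>>> def selector():
        print(X)                          # Fails when called
        X = 88

>>> selector.__code__.co_varnames         # X was classified as local at def time
('X',)
>>> 'X' in selector.__code__.co_names     # So it is never looked up as a global
False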

The problem occurs because assigned names are treated as locals everywhere in a function, not just after the statements where they’re assigned. Really, the previous example is ambiguous: was the intention to print the global X and create a local X, or is this a real programming error? Because Python treats X as a local everywhere, it’s seen as an error; if you mean to print the global X, you need to declare it in a global statement:

>>> def selector():

global X # Force X to be global (everywhere)

print(X)

X = 88

>>> selector()

99

Remember, though, that this means the assignment also changes the global X, not a local X. Within a function, you can’t use both local and global versions of the same simple name. If you really meant to print the global and then set a local of the same name, you’d need to import the enclosing module and use module attribute notation to get to the global version:

>>> X = 99

>>> def selector():

import __main__ # Import enclosing module

print(__main__.X) # Qualify to get to global version of name

X = 88 # Unqualified X classified as local

print(X) # Prints local version of name

>>> selector()

99

88

Qualification (the .X part) fetches a value from a namespace object. The interactive namespace is a module called __main__, so __main__.X reaches the global version of X. If that isn’t clear, check out Chapter 17.

In recent versions, Python has improved on this story somewhat by issuing the more specific "unbound local" error message shown in the example listing for this case (it used to simply raise a generic name error), but the gotcha is still present in general.

Defaults and Mutable Objects

As noted briefly in Chapter 17 and Chapter 18, mutable values for default arguments can retain state between calls, though this is often unexpected. In general, default argument values are evaluated and saved once when a def statement is run, not each time the resulting function is later called. Internally, Python saves one object per default argument attached to the function itself.

That’s usually what you want—because defaults are evaluated at def time, it lets you save values from the enclosing scope, if needed (functions defined within loops by factories may even depend on this behavior—see ahead). But because a default retains an object between calls, you have to be careful about changing mutable defaults. For instance, the following function uses an empty list as a default value, and then changes it in place each time the function is called:

>>> def saver(x=[]): # Saves away a list object

x.append(1) # Changes same object each time!

print(x)

>>> saver([2]) # Default not used

[2, 1]

>>> saver() # Default used

[1]

>>> saver() # Grows on each call!

[1, 1]

>>> saver()

[1, 1, 1]

Some see this behavior as a feature—because mutable default arguments retain their state between function calls, they can serve some of the same roles as static local function variables in the C language. In a sense, they work much like global variables, but their names are local to the functions and so will not clash with names elsewhere in a program.

To other observers, though, this seems like a gotcha, especially the first time they run into it. There are better ways to retain state between calls in Python (e.g., using the nested scope closures we met in this part and the classes we will study in Part VI).

Moreover, mutable defaults are tricky to remember (and to understand at all). They depend upon the timing of default object construction. In the prior example, there is just one list object for the default value—the one created when the def is executed. You don’t get a new list every time the function is called, so the list grows with each new append; it is not reset to empty on each call.
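In fact, you can watch this happen: in 3.X the saved default is available on the function object's __defaults__ attribute, and it is the very same list object that grows across calls:

>>> def saver(x=[]):                 # Same as before
        x.append(1)
        print(x)

>>> saver.__defaults__               # The one list made at def time
([],)
>>> saver(); saver()
[1]
[1, 1]
>>> saver.__defaults__               # Same object, changed in place
([1, 1],)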

If that’s not the behavior you want, simply make a copy of the default at the start of the function body, or move the default value expression into the function body. As long as the value resides in code that’s actually executed each time the function runs, you’ll get a new object each time through:

>>> def saver(x=None):

if x is None: # No argument passed?

x = [] # Run code to make a new list each time

x.append(1) # Changes new list object

print(x)

>>> saver([2])

[2, 1]

>>> saver() # Doesn't grow here

[1]

>>> saver()

[1]

By the way, the if statement in this example could almost be replaced by the assignment x = x or [], which takes advantage of the fact that Python’s or returns one of its operand objects: if no argument was passed, x would default to None, so the or would return the new empty list on the right.

However, this isn’t exactly the same. If an empty list were passed in, the or expression would cause the function to extend and return a newly created list, rather than extending and returning the passed-in list like the if version. (The expression becomes [] or [], which evaluates to the new empty list on the right; see the section “Truth Tests” if you don’t recall why.) Real program requirements may call for either behavior.
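Here is that difference in action: with the or variant, passing in an empty list appends to a brand-new list instead of the caller's (the function name here is just illustrative):

>>> def saver(x=None):
        x = x or []          # [] is false, so a new list is used instead
        x.append(1)
        print(x)

>>> L = []
>>> saver(L)                 # Appends to a new list, not to L
[1]
>>> L                        # The passed-in list is unchanged
[]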

Today, another way to achieve the value retention effect of mutable defaults in a possibly less confusing way is to use the function attributes we discussed in Chapter 19:

>>> def saver():

saver.x.append(1)

print(saver.x)

>>> saver.x = []

>>> saver()

[1]

>>> saver()

[1, 1]

>>> saver()

[1, 1, 1]

The function name is global to the function itself, but it need not be declared because it isn’t changed directly within the function. This isn’t used in exactly the same way, but when coded like this, the attachment of an object to the function is much more explicit (and arguably less magical).

Functions Without returns

In Python functions, return (and yield) statements are optional. When a function doesn’t return a value explicitly, the function exits when control falls off the end of the function body. Technically, all functions return a value; if you don’t provide a return statement, your function returns the None object automatically:

>>> def proc(x):

print(x) # No return is a None return

>>> x = proc('testing 123...')

testing 123...

>>> print(x)

None

Functions such as this without a return are Python’s equivalent of what are called “procedures” in some languages. They’re usually invoked as statements, and the None results are ignored, as they do their business without computing a useful result.

This is worth knowing, because Python won’t tell you if you try to use the result of a function that doesn’t return one. As we noted in Chapter 11, for instance, assigning the result of a list append method won’t raise an error, but you’ll get back None, not the modified list:

>>> list = [1, 2, 3]

>>> list = list.append(4) # append is a "procedure"

>>> print(list) # append changes list in place

None

Chapter 15’s section Common Coding Gotchas discusses this more broadly. In general, any functions that do their business as a side effect are usually designed to be run as statements, not expressions.

Miscellaneous Function Gotchas

Here are two additional function-related gotchas—mostly reviews, but common enough to reiterate.

Enclosing scopes and loop variables: Factory functions

We described this gotcha in Chapter 17's discussion of enclosing function scopes, but as a reminder: when coding factory functions (a.k.a. closures), be careful about relying on enclosing function scope lookup for variables that are changed by enclosing loops—when a generated function is later called, all such references will remember the value of the last loop iteration in the enclosing function's scope. In this case, you must use defaults to save loop variable values instead of relying on automatic lookup in enclosing scopes. See Loop variables may require defaults, not scopes in Chapter 17 for more details on this topic.
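As a quick reminder of what this looks like in practice (this simply recaps Chapter 17's example), every function generated in the loop sees the final value of the loop variable unless a default argument saves its current value at creation time:

>>> def makeActions():
        acts = []
        for i in range(5):
            acts.append(lambda x: i ** x)        # All remember the last i: 4
        return acts

>>> makeActions()[0](2)                          # 4 ** 2, not 0 ** 2
16

>>> def makeActions():
        acts = []
        for i in range(5):
            acts.append(lambda x, i=i: i ** x)   # Defaults save i's current value
        return acts

>>> makeActions()[0](2)                          # Now 0 ** 2
0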

Hiding built-ins by assignment: Shadowing

Also in Chapter 17, we saw how it’s possible to reassign built-in names in a closer local or global scope; the reassignment effectively hides and replaces that built-in’s name for the remainder of the scope where the assignment occurs. This means you won’t be able to use the original built-in value for the name. As long as you don’t need the built-in value of the name you’re assigning, this isn’t an issue—many names are built in, and they may be freely reused. However, if you reassign a built-in name your code relies on, you may have problems. So either don’t do that, or use tools like PyChecker that can warn you if you do. The good news is that the built-ins you commonly use will soon become second nature, and Python’s error trapping will alert you early in testing if your built-in name is not what you think it is.
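For instance, assigning to list hides the built-in for the rest of the scope, and deleting the assignment uncovers it again; a short interactive demo (error output abbreviated):

>>> list('spam')             # The built-in is visible as usual
['s', 'p', 'a', 'm']
>>> list = [1, 2, 3]         # Assignment hides the built-in in this scope
>>> list('spam')
TypeError: 'list' object is not callable
>>> del list                 # Removing the name uncovers the built-in again
>>> list('spam')
['s', 'p', 'a', 'm']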

Chapter Summary

This chapter rounded out our look at functions and built-in iteration tools with a larger case study that measured the performance of iteration alternatives and Pythons, and closed with a review of common function-related mistakes to help you avoid pitfalls. The iteration story has one last sequel in Part VI, where we’ll learn how to code user-defined iterable objects that generate values with classes and __iter__, in Chapter 30’s operator overloading coverage.

This concludes the functions part of this book. In the next part, we will expand on what we already know about modules—files of tools that form the topmost organizational unit in Python, and the structure in which our functions always live. After that, we will explore classes, tools that are largely packages of functions with special first arguments. As we’ll see, user-defined classes can implement objects that tap into the iteration protocol, just like the generators and iterables we met here. In fact, everything we have learned in this part of the book will apply when functions pop up later in the context of class methods.

Before moving on to modules, though, be sure to work through this chapter’s quiz and the exercises for this part of the book, to practice what we’ve learned about functions here.

Test Your Knowledge: Quiz

1. What conclusions can you draw from this chapter about the relative speed of Python iteration tools?

2. What conclusions can you draw from this chapter about the relative speed of the Pythons timed?

Test Your Knowledge: Answers

1. In general, list comprehensions are usually the quickest of the bunch; map beats list comprehensions in Python only when all tools must call functions; for loops tend to be slower than comprehensions; and generator functions and expressions are slower than comprehensions by a constant factor. Under PyPy, some of these findings differ; map often turns in a different relative performance, for example, and list comprehensions always seem quickest, perhaps due to function-level optimizations.

At least that’s the case today on the Python versions tested, on the test machine used, and for the type of code timed—these results may vary if any of these three variables differ. Use the homegrown timer or standard library timeit to test your use cases for more relevant results. Also keep in mind that iteration is just one component of a program’s time: more code gives a more complete picture.

2. In general, PyPy 1.9 (implementing Python 2.7) is typically faster than CPython 2.7, and CPython 2.7 is often faster than CPython 3.3. In most cases timed, PyPy is some 10X faster than CPython, and CPython 2.7 is often a small constant factor faster than CPython 3.3. In cases that use integer math, CPython 2.7 can be 10X faster than CPython 3.3, and PyPy can be 100X faster than 3.3. In other cases (e.g., string operations and file iterators), PyPy can be slower than CPython by 10X, though timeit and memory management differences may influence some results. The pystone benchmark confirms these relative rankings, though the sizes of the differences it reports differ due to the code timed.

At least that’s the case today on the Python versions tested, on the test machine used, and for the type of code timed—these results may vary if any of these three variables differ. Use the homegrown timer or standard library timeit to test your use cases for more relevant results. This is especially true when timing Python implementations, which may be arbitrarily optimized in each new release.

Test Your Knowledge: Part IV Exercises

In these exercises, you’re going to start coding more sophisticated programs. Be sure to check the solutions in Part IV in Appendix D, and be sure to start writing your code in module files. You won’t want to retype these exercises if you make a mistake.

1. The basics. At the Python interactive prompt, write a function that prints its single argument to the screen and call it interactively, passing a variety of object types: string, integer, list, dictionary. Then, try calling it without passing any argument. What happens? What happens when you pass two arguments?

2. Arguments. Write a function called adder in a Python module file. The function should accept two arguments and return the sum (or concatenation) of the two. Then, add code at the bottom of the file to call the adder function with a variety of object types (two strings, two lists, two floating points), and run this file as a script from the system command line. Do you have to print the call statement results to see results on your screen?

3. varargs. Generalize the adder function you wrote in the last exercise to compute the sum of an arbitrary number of arguments, and change the calls to pass more or fewer than two arguments. What type is the return value sum? (Hints: a slice such as S[:0] returns an empty sequence of the same type as S, and the type built-in function can test types; but see the manually coded min examples in Chapter 18 for a simpler approach.) What happens if you pass in arguments of different types? What about passing in dictionaries?

4. Keywords. Change the adder function from exercise 2 to accept and sum/concatenate three arguments: def adder(good, bad, ugly). Now, provide default values for each argument, and experiment with calling the function interactively. Try passing one, two, three, and four arguments. Then, try passing keyword arguments. Does the call adder(ugly=1, good=2) work? Why? Finally, generalize the new adder to accept and sum/concatenate an arbitrary number of keyword arguments. This is similar to what you did in exercise 3, but you’ll need to iterate over a dictionary, not a tuple. (Hint: the dict.keys method returns a list you can step through with a for or while, but be sure to wrap it in a list call to index it in 3.X; dict.values may help here too.)

5. Dictionary tools. Write a function called copyDict(dict) that copies its dictionary argument. It should return a new dictionary containing all the items in its argument. Use the dictionary keys method to iterate (or, in Python 2.2 and later, step over a dictionary's keys without calling keys). Copying sequences is easy (X[:] makes a top-level copy); does this work for dictionaries, too? As explained in this exercise's solution, because dictionaries now come with similar tools, this and the next exercise are just coding exercises but still serve as representative function examples.

6. Dictionary tools. Write a function called addDict(dict1, dict2) that computes the union of two dictionaries. It should return a new dictionary containing all the items in both its arguments (which are assumed to be dictionaries). If the same key appears in both arguments, feel free to pick a value from either. Test your function by writing it in a file and running the file as a script. What happens if you pass lists instead of dictionaries? How could you generalize your function to handle this case, too? (Hint: see the type built-in function used earlier.) Does the order of the arguments passed in matter?

7. More argument-matching examples. First, define the following six functions (either interactively or in a module file that can be imported):

def f1(a, b): print(a, b)              # Normal args

def f2(a, *b): print(a, b)             # Positional varargs

def f3(a, **b): print(a, b)            # Keyword varargs

def f4(a, *b, **c): print(a, b, c)     # Mixed modes

def f5(a, b=2, c=3): print(a, b, c)    # Defaults

def f6(a, b=2, *c): print(a, b, c)     # Defaults and positional varargs

Now, test the following calls interactively, and try to explain each result; in some cases, you’ll probably need to fall back on the matching algorithm shown in Chapter 18. Do you think mixing matching modes is a good idea in general? Can you think of cases where it would be useful?

>>> f1(1, 2)

>>> f1(b=2, a=1)

>>> f2(1, 2, 3)

>>> f3(1, x=2, y=3)

>>> f4(1, 2, 3, x=2, y=3)

>>> f5(1)

>>> f5(1, 4)

>>> f6(1)

>>> f6(1, 3, 4)

8. Primes revisited. Recall the following code snippet from Chapter 13, which simplistically determines whether a positive integer is prime:

x = y // 2                          # For some y > 1
while x > 1:
    if y % x == 0:                  # Remainder
        print(y, 'has factor', x)
        break                       # Skip else
    x -= 1
else:                               # Normal exit
    print(y, 'is prime')

Package this code as a reusable function in a module file (y should be a passed-in argument), and add some calls to the function at the bottom of your file. While you’re at it, experiment with replacing the first line’s // operator with / to see how true division changes the / operator in Python 3.X and breaks this code (refer back to Chapter 5 if you need a reminder). What can you do about negatives, and the values 0 and 1? How about speeding this up? Your outputs should look something like this:

13 is prime

13.0 is prime

15 has factor 5

15.0 has factor 5.0

9. Iterations and comprehensions. Write code to build a new list containing the square roots of all the numbers in this list: [2, 4, 9, 16, 25]. Code this as a for loop first, then as a map call, then as a list comprehension, and finally as a generator expression. Use the sqrt function in the built-in math module to do the calculation (i.e., import math and say math.sqrt(x)). Of the four, which approach do you like best?

10. Timing tools. In Chapter 5, we saw three ways to compute square roots: math.sqrt(X), X ** .5, and pow(X, .5). If your programs run a lot of these, their relative performance might become important. To see which is quickest, repurpose the timeseqs.py script we wrote in this chapter to time each of these three tools. Use the bestof or bestoftotal functions in one of this chapter's timer modules to test (you can use either the original, the 3.X-only keyword-only variant, or the 2.X/3.X version, and may use Python's timeit module as well). You might also want to repackage the testing code in this script for better reusability—by passing a test functions tuple to a general tester function, for example (for this exercise a copy-and-modify approach is fine). Which of the three square root tools seems to run fastest on your machine and Python in general? Finally, how might you go about interactively timing the speed of dictionary comprehensions versus for loops?

11. Recursive functions. Write a simple recursive function named countdown that prints numbers as it counts down to zero. For example, a call countdown(5) will print: 5 4 3 2 1 stop. There's no obvious reason to code this with an explicit stack or queue, but what about a nonfunction approach? Would a generator make sense here?

12. Computing factorials. Finally, a computer science classic (but demonstrative nonetheless). We employed the notion of factorials in Chapter 20's coverage of permutations: N!, computed as N*(N-1)*(N-2)*...1. For instance, 6! is 6*5*4*3*2*1, or 720. Code and time four functions that, for a call fact(N), each return N!. Code these four functions (1) as a recursive countdown per Chapter 19; (2) using the functional reduce call per Chapter 19; (3) with a simple iterative counter loop per Chapter 13; and (4) using the math.factorial library tool per Chapter 20. Use Chapter 21's timeit to time each of your functions. What conclusions can you draw from your results?