Py Boxes: Modules, Packages, and Programs - Introducing Python (2014)

Chapter 5. Py Boxes: Modules, Packages, and Programs

During your bottom-up climb, you’ve progressed from built-in data types to constructing ever-larger data and code structures. In this chapter, you’ll finally get down to brass tacks and learn how to write realistic, large programs in Python.

Standalone Programs

Thus far, you’ve been writing and running code fragments such as the following within Python’s interactive interpreter:

>>> print("This interactive snippet works.")

This interactive snippet works.

Now let’s make your first standalone program. On your computer, create a file called test1.py containing this single line of Python code:

print("This standalone program works!")

Notice that there’s no >>> prompt, just a single line of Python code. Ensure that there is no indentation in the line before print.

If you’re running Python in a text terminal or terminal window, type the name of your Python interpreter followed by the program filename:

$ python test1.py

This standalone program works!

NOTE

You can save all of the interactive snippets that you’ve seen in this book so far to files and run them directly. If you’re cutting and pasting, ensure that you delete the initial >>> and ... prompts (including the final space).

Command-Line Arguments

On your computer, create a file called test2.py that contains these two lines:

import sys

print('Program arguments:', sys.argv)

Now, use your version of Python to run this program. Here’s how it might look in a Linux or Mac OS X terminal window using a standard shell program:

$ python test2.py

Program arguments: ['test2.py']

$ python test2.py tra la la

Program arguments: ['test2.py', 'tra', 'la', 'la']
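Because the first element of sys.argv is always the program name, a program usually works with the slice sys.argv[1:]. Here is a minimal sketch of that idea; the function name describe_arguments() is made up for this illustration:

```python
import sys


def describe_arguments(argv):
    """Return a one-line summary of the arguments after the program name."""
    args = argv[1:]  # argv[0] is the program name itself
    if not args:
        return "No arguments given"
    return "Got {} argument(s): {}".format(len(args), ", ".join(args))


# Summarize whatever this program was started with.
print(describe_arguments(sys.argv))
```

Passing the list in as a parameter, rather than reading sys.argv inside the function, makes the function easy to try out with any list of strings.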

Modules and the import Statement

We’re going to step up another level, creating and using Python code in more than one file. A module is just a file of Python code.

The text of this book is organized in a hierarchy: words, sentences, paragraphs, and chapters. Otherwise, it would be unreadable after a page or two. Code has a roughly similar bottom-up organization: data types are like words, statements are like sentences, functions are like paragraphs, and modules are like chapters. To continue the analogy, in this book, when I say that something will be explained in Chapter 8, in programming, that’s like referring to code in another module.

We refer to code of other modules by using the import statement. This makes the code and variables in the imported module available to your program.

Import a Module

The simplest use of the import statement is import module, where module is the name of another Python file, without the .py extension. Let’s simulate a weather station and print a weather report. One main program prints the report, and a separate module with a single function returns the weather description used by the report.

Here’s the main program (call it weatherman.py):

import report

description = report.get_description()

print("Today's weather:", description)

And here is the module (report.py):

def get_description():  # see the docstring below?
    """Return random weather, just like the pros"""
    from random import choice
    possibilities = ['rain', 'snow', 'sleet', 'fog', 'sun', 'who knows']
    return choice(possibilities)

If you have these two files in the same directory and instruct Python to run weatherman.py as the main program, it will access the report module and run its get_description() function. We wrote this version of get_description() to return a random result from a list of strings, so that’s what the main program will get back and print:

$ python weatherman.py

Today's weather: who knows

$ python weatherman.py

Today's weather: sun

$ python weatherman.py

Today's weather: sleet

We used imports in two different places:

§ The main program weatherman.py imported the module report.

§ In the module file report.py, the get_description() function imported the choice function from Python’s standard random module.

We also used imports in two different ways:

§ The main program called import report and then ran report.get_description().

§ The get_description() function in report.py called from random import choice and then ran choice(possibilities).

In the first case, we imported the entire report module but needed to use report. as a prefix to get_description(). After this import statement, everything in report.py is available to the main program, as long as we tack report. before its name. By qualifying the contents of a module with the module’s name, we avoid any nasty naming conflicts. There could be a get_description() function in some other module, and we would not call it by mistake.

In the second case, we’re within a function and know that nothing else named choice is here, so we imported the choice() function from the random module directly. We could have written the function like the following snippet, which returns random results:

def get_description():
    import random
    possibilities = ['rain', 'snow', 'sleet', 'fog', 'sun', 'who knows']
    return random.choice(possibilities)

Like many aspects of programming, pick the style that seems the most clear to you. The module-qualified name (random.choice) is safer but requires a little more typing.
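To see why the module-qualified name is the safer of the two, here is a small sketch. It assumes a hypothetical variable that happens to reuse the name choice later in the same file:

```python
import random
from random import choice

possibilities = ['rain', 'snow', 'sleet', 'fog', 'sun', 'who knows']

# Both names refer to the same function object.
same_function = random.choice is choice

# Rebinding the bare name hides the imported function...
choice = 'fog'  # hypothetical variable that reuses the name

# ...but the module-qualified name still reaches the real function.
description = random.choice(possibilities)
print(same_function, description in possibilities)
```

After the reassignment, calling choice(possibilities) would fail because choice is now a string, whereas random.choice is unaffected.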

These get_description() examples showed variations of what to import, but not where to do the importing: they all called import from inside the function. We could have imported random from outside the function:

>>> import random
>>> def get_description():
...     possibilities = ['rain', 'snow', 'sleet', 'fog', 'sun', 'who knows']
...     return random.choice(possibilities)
...
>>> get_description()
'who knows'
>>> get_description()
'rain'

You should consider importing from outside the function if the imported code might be used in more than one place, and from inside if you know its use will be limited. Some people prefer to put all their imports at the top of the file, just to make all the dependencies of their code explicit. Either way works.

Import a Module with Another Name

In our main weatherman.py program, we called import report. But what if you have another module with the same name or want to use a name that is more mnemonic or shorter? In such a situation, you can import using an alias. Let’s use the alias wr:

import report as wr

description = wr.get_description()

print("Today's weather:", description)

Import Only What You Want from a Module

With Python, you can import one or more parts of a module. Each part can keep its original name or you can give it an alias. First, let’s import get_description() from the report module with its original name:

from report import get_description

description = get_description()

print("Today's weather:", description)

Now, import it as do_it:

from report import get_description as do_it

description = do_it()

print("Today's weather:", description)

Module Search Path

Where does Python look for files to import? It uses a list of directory names and ZIP archive files stored in the standard sys module as the variable path. You can access and modify this list. Here’s the value of sys.path for Python 3.3 on my Mac:

>>> import sys
>>> for place in sys.path:
...     print(place)
...

/Library/Frameworks/Python.framework/Versions/3.3/lib/python33.zip
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/plat-darwin
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/lib-dynload
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages

That initial blank output line is the empty string '', which stands for the current directory. If '' is first in sys.path, Python looks in the current directory first when you try to import something: import report looks for report.py.

The first match will be used. This means that if you define a module named random and it’s in the search path before the standard library, you won’t be able to access the standard library’s random now.
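Because sys.path is an ordinary list, you can watch the search order at work. The following is only a sketch: the module name forecast_demo and its outlook() function are invented for the example, and the module is written to a temporary directory so that nothing else on your computer is touched.

```python
import importlib
import os
import sys
import tempfile

# Create a throwaway directory holding a tiny module.
# The name "forecast_demo" is invented for this sketch.
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, 'forecast_demo.py'), 'w') as module_file:
    module_file.write("def outlook():\n    return 'fog'\n")

# Put the directory at the front of the search path so it is checked first.
sys.path.insert(0, module_dir)
importlib.invalidate_caches()  # make sure the import system notices the new file

import forecast_demo

print(forecast_demo.outlook())
print(forecast_demo.__file__)  # the file that was actually imported
```

Printing a module's __file__ attribute is a quick way to check which file Python found when a name exists in more than one place on the path.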

Packages

We went from single lines of code, to multiline functions, to standalone programs, to multiple modules in the same directory. To allow Python applications to scale even more, you can organize modules into file hierarchies called packages.

Maybe we want different types of text forecasts: one for the next day and one for the next week. One way to structure this is to make a directory named sources, and create two modules within it: daily.py and weekly.py. Each has a function called forecast. The daily version returns a string, and the weekly version returns a list of seven strings.

Here’s the main program and the two modules. (The enumerate() function takes apart a list and feeds each item of the list to the for loop, adding a number to each item as a little bonus.)

Main program: boxes/weather.py.

from sources import daily, weekly

print("Daily forecast:", daily.forecast())
print("Weekly forecast:")
for number, outlook in enumerate(weekly.forecast(), 1):
    print(number, outlook)

Module 1: boxes/sources/daily.py.

def forecast():
    'fake daily forecast'
    return 'like yesterday'

Module 2: boxes/sources/weekly.py.

def forecast():
    """Fake weekly forecast"""
    return ['snow', 'more snow', 'sleet',
            'freezing rain', 'rain', 'fog', 'hail']

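You’ll need one more thing in the sources directory: a file named __init__.py. This can be empty; it marks the directory containing it as a regular package. (Python 3.3 and later can also import a directory without this file as a namespace package, but including the file keeps your intent explicit.)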

Run the main weather.py program to see what happens:

$ python weather.py

Daily forecast: like yesterday

Weekly forecast:

1 snow

2 more snow

3 sleet

4 freezing rain

5 rain

6 fog

7 hail

The Python Standard Library

One of Python’s prominent claims is that it has “batteries included”: a large standard library of modules that perform many useful tasks, and are kept separate to avoid bloating the core language. When you’re about to write some Python code, it’s often worthwhile to first check whether there’s a standard module that already does what you want. It’s surprising how often you encounter little gems in the standard library. Python also provides authoritative documentation for the modules, along with a tutorial. Doug Hellmann’s website Python Module of the Week and his book The Python Standard Library by Example (Addison-Wesley Professional) are also very useful guides.

Upcoming chapters in this book feature many of the standard modules that are specific to the Web, systems, databases, and so on. In this section, I’ll talk about some standard modules that have generic uses.

Handle Missing Keys with setdefault() and defaultdict()

You’ve seen that trying to access a dictionary with a nonexistent key raises an exception. Using the dictionary get() function to return a default value avoids an exception. The setdefault() function is like get(), but also assigns an item to the dictionary if the key is missing:

>>> periodic_table = {'Hydrogen': 1, 'Helium': 2}

>>> print(periodic_table)

{'Helium': 2, 'Hydrogen': 1}

If the key was not already in the dictionary, the new value is used:

>>> carbon = periodic_table.setdefault('Carbon', 12)

>>> carbon

12

>>> periodic_table

{'Helium': 2, 'Carbon': 12, 'Hydrogen': 1}

If we try to assign a different default value to an existing key, the original value is returned and nothing is changed:

>>> helium = periodic_table.setdefault('Helium', 947)

>>> helium

2

>>> periodic_table

{'Helium': 2, 'Carbon': 12, 'Hydrogen': 1}
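Because setdefault() returns the value stored under the key, a common use is to create a container on first sight of a key and add to it in the same expression. Here is a small sketch; the sample data and the name by_state are made up for illustration:

```python
# Sample (name, state at room temperature) pairs, for illustration only.
elements = [('Hydrogen', 'gas'), ('Helium', 'gas'), ('Carbon', 'solid')]

by_state = {}
for name, state in elements:
    # Create an empty list the first time a state appears, then append to it.
    by_state.setdefault(state, []).append(name)

print(by_state)
```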

defaultdict() is similar, but specifies the default value for any new key up front, when the dictionary is created. Its argument is a function. In this example, we pass the function int, which will be called as int() and return the integer 0:

>>> from collections import defaultdict

>>> periodic_table = defaultdict(int)

Now, any missing value will be an integer (int), with the value 0:

>>> periodic_table['Hydrogen'] = 1

>>> periodic_table['Lead']

0

>>> periodic_table

defaultdict(<class 'int'>, {'Lead': 0, 'Hydrogen': 1})

The argument to defaultdict() is a function that returns the value to be assigned to a missing key. In the following example, no_idea() is executed to return a value when needed:

>>> from collections import defaultdict
>>>
>>> def no_idea():
...     return 'Huh?'
...
>>> bestiary = defaultdict(no_idea)
>>> bestiary['A'] = 'Abominable Snowman'
>>> bestiary['B'] = 'Basilisk'
>>> bestiary['A']
'Abominable Snowman'
>>> bestiary['B']
'Basilisk'
>>> bestiary['C']
'Huh?'

You can use the functions int(), list(), or dict() to return default empty values for those types: int() returns 0, list() returns an empty list ([]), and dict() returns an empty dictionary ({}). If you omit the argument, the dictionary has no default-making function, and accessing a missing key raises a KeyError, just as with a normal dictionary.

By the way, you can use lambda to define your default-making function right inside the call:

>>> bestiary = defaultdict(lambda: 'Huh?')

>>> bestiary['E']

'Huh?'
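Passing list as the default-making function is handy for grouping. The sketch below uses made-up sample words and the hypothetical name by_letter; it also shows that only indexing creates a missing key, whereas a membership test with in does not:

```python
from collections import defaultdict

words = ['spam', 'sleet', 'eggs', 'snow', 'sun']  # sample data for illustration

by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)  # a missing key starts out as []

print(dict(by_letter))

# A membership test does not create a key; indexing does.
before = 'z' in by_letter      # False
value = by_letter['z']         # [] is created and stored under 'z'
after = 'z' in by_letter       # True
print(before, value, after)
```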

Using int is one way to make your own counter:

>>> from collections import defaultdict
>>> food_counter = defaultdict(int)
>>> for food in ['spam', 'spam', 'eggs', 'spam']:
...     food_counter[food] += 1
...
>>> for food, count in food_counter.items():
...     print(food, count)
...
eggs 1
spam 3

In the preceding example, if food_counter had been a normal dictionary instead of a defaultdict, Python would have raised an exception the first time we tried to increment food_counter[food] for each new food, because that element would not yet have been initialized. We would have needed to do some extra work, as shown here:

>>> dict_counter = {}
>>> for food in ['spam', 'spam', 'eggs', 'spam']:
...     if food not in dict_counter:
...         dict_counter[food] = 0
...     dict_counter[food] += 1
...
>>> for food, count in dict_counter.items():
...     print(food, count)
...
spam 3
eggs 1
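A normal dictionary can also do this counting with the get() function mentioned earlier, which supplies a default when the key is missing. A minimal sketch of that alternative:

```python
dict_counter = {}
for food in ['spam', 'spam', 'eggs', 'spam']:
    # get() returns 0 for a food we have not seen yet, so no test is needed.
    dict_counter[food] = dict_counter.get(food, 0) + 1

print(dict_counter)
```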

Count Items with Counter()

Speaking of counters, the standard library has one that does the work of the previous example and more:

>>> from collections import Counter

>>> breakfast = ['spam', 'spam', 'eggs', 'spam']

>>> breakfast_counter = Counter(breakfast)

>>> breakfast_counter

Counter({'spam': 3, 'eggs': 1})

The most_common() function returns all elements in descending order of count, or just the top count elements if given a count:

>>> breakfast_counter.most_common()

[('spam', 3), ('eggs', 1)]

>>> breakfast_counter.most_common(1)

[('spam', 3)]

You can combine counters. First, let’s see again what’s in breakfast_counter:

>>> breakfast_counter

Counter({'spam': 3, 'eggs': 1})

This time, we’ll make a new list called lunch, and a counter called lunch_counter:

>>> lunch = ['eggs', 'eggs', 'bacon']

>>> lunch_counter = Counter(lunch)

>>> lunch_counter

Counter({'eggs': 2, 'bacon': 1})

The first way we combine the two counters is by addition, using +:

>>> breakfast_counter + lunch_counter

Counter({'spam': 3, 'eggs': 3, 'bacon': 1})

As you might expect, you subtract one counter from another by using -. What’s for breakfast but not for lunch?

>>> breakfast_counter - lunch_counter

Counter({'spam': 3})

Okay, now what can we have for lunch that we can’t have for breakfast?

>>> lunch_counter - breakfast_counter

Counter({'bacon': 1, 'eggs': 1})

Similar to sets in Chapter 4, you can get common items by using the intersection operator &:

>>> breakfast_counter & lunch_counter

Counter({'eggs': 1})

The intersection picked the common element ('eggs') with the lower count. This makes sense: breakfast only offered one egg, so that’s the common count.

Finally, you can get all items by using the union operator |:

>>> breakfast_counter | lunch_counter

Counter({'spam': 3, 'eggs': 2, 'bacon': 1})

The item 'eggs' was again common to both. Unlike addition, union didn’t add their counts, but picked the one with the larger count.
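One detail of subtraction is worth a quick sketch: the - operator keeps only positive counts, whereas the Counter method subtract() changes a counter in place and keeps zero and negative counts. The names difference and remaining are chosen just for this example:

```python
from collections import Counter

breakfast_counter = Counter(['spam', 'spam', 'eggs', 'spam'])
lunch_counter = Counter(['eggs', 'eggs', 'bacon'])

# The - operator drops anything that is not positive.
difference = breakfast_counter - lunch_counter

# subtract() works in place and keeps zero and negative counts.
remaining = Counter(breakfast_counter)  # work on a copy
remaining.subtract(lunch_counter)

print(difference)
print(remaining['spam'], remaining['eggs'], remaining['bacon'])
```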

Order by Key with OrderedDict()

Many of the code examples in the early chapters of this book demonstrate that the order of keys in a dictionary is not predictable (this applies to Python versions before 3.7; later versions preserve insertion order): you might add keys a, b, and c in that order, but keys() might return c, a, b. Here’s a repurposed example from Chapter 1:

>>> quotes = {
...     'Moe': 'A wise guy, huh?',
...     'Larry': 'Ow!',
...     'Curly': 'Nyuk nyuk!',
...     }
>>> for stooge in quotes:
...     print(stooge)
...
Larry
Curly
Moe

An OrderedDict() remembers the order of key addition and returns them in the same order from an iterator. Try creating an OrderedDict from a sequence of (key, value) tuples:

>>> from collections import OrderedDict
>>> quotes = OrderedDict([
...     ('Moe', 'A wise guy, huh?'),
...     ('Larry', 'Ow!'),
...     ('Curly', 'Nyuk nyuk!'),
...     ])
>>>
>>> for stooge in quotes:
...     print(stooge)
...
Moe
Larry
Curly
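An OrderedDict also has a few order-aware extras. The sketch below tries two of them, move_to_end() and popitem(last=False), and shows that two OrderedDicts compare as equal only when their order matches; the variable names are chosen just for the example:

```python
from collections import OrderedDict

quotes = OrderedDict([
    ('Moe', 'A wise guy, huh?'),
    ('Larry', 'Ow!'),
    ('Curly', 'Nyuk nyuk!'),
])

quotes.move_to_end('Moe')            # Moe goes to the back of the line
order_after_move = list(quotes)

first = quotes.popitem(last=False)   # remove and return the oldest pair
print(order_after_move, first)

# Same pairs, different order.
a = OrderedDict([('a', 1), ('b', 2)])
b = OrderedDict([('b', 2), ('a', 1)])
ordered_equal = (a == b)             # order counts for OrderedDicts
plain_equal = (dict(a) == dict(b))   # order is ignored for plain dicts
print(ordered_equal, plain_equal)
```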

Stack + Queue == deque

A deque (pronounced deck) is a double-ended queue, which has features of both a stack and a queue. It’s useful when you want to add and delete items from either end of a sequence. Here, we’ll work from both ends of a word to the middle to see if it’s a palindrome. The function popleft() removes the leftmost item from the deque and returns it; pop() removes the rightmost item and returns it. Together, they work from the ends toward the middle. As long as the end characters match, it keeps popping until it reaches the middle:

>>> def palindrome(word):
...     from collections import deque
...     dq = deque(word)
...     while len(dq) > 1:
...         if dq.popleft() != dq.pop():
...             return False
...     return True
...
>>> palindrome('a')
True
>>> palindrome('racecar')
True
>>> palindrome('')
True
>>> palindrome('radar')
True
>>> palindrome('halibut')
False

I used this as a simple illustration of deques. If you really wanted a quick palindrome checker, it would be a lot simpler to just compare a string with its reverse. Python doesn’t have a reverse() function for strings, but it does have a way to reverse a string with a slice, as illustrated here:

>>> def another_palindrome(word):
...     return word == word[::-1]
...
>>> another_palindrome('radar')
True
>>> another_palindrome('halibut')
False
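The palindrome example only removed items. For completeness, here is a small sketch of adding to either end as well, using append() and appendleft(); the name line is just an example:

```python
from collections import deque

line = deque(['b', 'c'])
line.append('d')         # add on the right end
line.appendleft('a')     # add on the left end
snapshot = list(line)    # ['a', 'b', 'c', 'd']

last = line.pop()        # take 'd' from the right end
first = line.popleft()   # take 'a' from the left end
print(snapshot, first, last, list(line))
```

Using append() with pop() treats the deque as a stack, and append() with popleft() treats it as a queue.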

Iterate over Code Structures with itertools

itertools contains special-purpose iterator functions. Each returns one item at a time when called within a for … in loop, and remembers its state between calls.

chain() runs through its arguments as though they were a single iterable:

>>> import itertools
>>> for item in itertools.chain([1, 2], ['a', 'b']):
...     print(item)
...
1
2
a
b

cycle() is an infinite iterator, cycling through its arguments:

>>> import itertools
>>> for item in itertools.cycle([1, 2]):
...     print(item)
...
1
2
1
2
.
.
.

…and so on.
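If you want only part of an infinite iterator, one option is to wrap it in islice(), another itertools function, which stops after a given number of items. A minimal sketch:

```python
import itertools

# Take just the first five items from the endless cycle.
first_five = list(itertools.islice(itertools.cycle([1, 2]), 5))
print(first_five)
```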

accumulate() calculates accumulated values. By default, it calculates the sum:

>>> import itertools
>>> for item in itertools.accumulate([1, 2, 3, 4]):
...     print(item)
...
1
3
6
10

You can provide a function as the second argument to accumulate(), and it will be used instead of addition. The function should take two arguments and return a single result. This example calculates an accumulated product:

>>> import itertools
>>> def multiply(a, b):
...     return a * b
...
>>> for item in itertools.accumulate([1, 2, 3, 4], multiply):
...     print(item)
...
1
2
6
24

The itertools module has many more functions, notably some for combinations and permutations that can be time savers when the need arises.
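As a quick taste of those two, here is a sketch with a made-up list of toppings. combinations() ignores order, so each pair appears once; permutations() treats each ordering as distinct:

```python
import itertools

toppings = ['spam', 'eggs', 'bacon']  # sample data for illustration

# Every way to choose two toppings, ignoring order.
pairs = list(itertools.combinations(toppings, 2))

# Every way to arrange two toppings, where order matters.
orders = list(itertools.permutations(toppings, 2))

print(pairs)
print(len(orders))
```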

Print Nicely with pprint()

All of our examples have used print() (or just the variable name, in the interactive interpreter) to print things. Sometimes, the results are hard to read. We need a pretty printer such as pprint():

>>> from pprint import pprint
>>> from collections import OrderedDict
>>> quotes = OrderedDict([
...     ('Moe', 'A wise guy, huh?'),
...     ('Larry', 'Ow!'),
...     ('Curly', 'Nyuk nyuk!'),
...     ])
>>>

Plain old print() just dumps things out there:

>>> print(quotes)

OrderedDict([('Moe', 'A wise guy, huh?'), ('Larry', 'Ow!'), ('Curly', 'Nyuk nyuk!')])

However, pprint() tries to align elements for better readability:

>>> pprint(quotes)
{'Moe': 'A wise guy, huh?',
 'Larry': 'Ow!',
 'Curly': 'Nyuk nyuk!'}
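pprint() earns its keep with nested structures. The sketch below uses a made-up forecast dictionary, the width argument to control where lines wrap, and pformat(), which returns the formatted text as a string instead of printing it:

```python
from pprint import pformat, pprint

# Sample nested data, for illustration only.
forecast = {
    'daily': 'like yesterday',
    'weekly': ['snow', 'more snow', 'sleet',
               'freezing rain', 'rain', 'fog', 'hail'],
}

pprint(forecast, width=40)          # wrap lines at about 40 characters

text = pformat(forecast, width=40)  # same layout, returned as a string
print(len(text.splitlines()), "lines")
```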

More Batteries: Get Other Python Code

Sometimes, the standard library doesn’t have what you need, or doesn’t do it in quite the right way. There’s an entire world of open-source, third-party Python software. Good resources include:

§ PyPI (also known as the Cheese Shop, after an old Monty Python skit)

§ GitHub

§ Read the Docs

You can find many smaller code examples at ActiveState.

Almost all of the Python code in this book uses the standard Python installation on your computer, which includes all the built-ins and the standard library. External packages are featured in some places: I mentioned requests in Chapter 1, and have more details in Beyond the Standard Library: Requests. Appendix D shows how to install third-party Python software, along with many other nuts-and-bolts development details.

Things to Do

5.1. Create a file called zoo.py. In it, define a function called hours() that prints the string 'Open 9-5 daily'. Then, use the interactive interpreter to import the zoo module and call its hours() function.

5.2. In the interactive interpreter, import the zoo module as menagerie and call its hours() function.

5.3. Staying in the interpreter, import the hours() function from zoo directly and call it.

5.4. Import the hours() function as info and call it.

5.5. Make a dictionary called plain with the key-value pairs 'a': 1, 'b': 2, and 'c': 3, and then print it.

5.6. Make an OrderedDict called fancy from the same pairs listed in 5.5 and print it. Did it print in the same order as plain?

5.7. Make a defaultdict called dict_of_lists and pass it the argument list. Make the list dict_of_lists['a'] and append the value 'something for a' to it in one assignment. Print dict_of_lists['a'].