Learning Python (2013)
Part II. Types and Operations
Chapter 8. Lists and Dictionaries
Now that we’ve learned about numbers and strings, this chapter moves on to give the full story on Python’s list and dictionary object types—collections of other objects, and the main workhorses in almost all Python scripts. As you’ll see, both types are remarkably flexible: they can be changed in place, can grow and shrink on demand, and may contain and be nested in any other kind of object. By leveraging these types, you can build up and process arbitrarily rich information structures in your scripts.
Lists
The next stop on our built-in object tour is the Python list. Lists are Python’s most flexible ordered collection object type. Unlike strings, lists can contain any sort of object: numbers, strings, and even other lists. Also, unlike strings, lists may be changed in place by assignment to offsets and slices, list method calls, deletion statements, and more—they are mutable objects.
Python lists do the work of many of the collection data structures you might have to implement manually in lower-level languages such as C. Here is a quick look at their main properties. Python lists are:
Ordered collections of arbitrary objects
From a functional view, lists are just places to collect other objects so you can treat them as groups. Lists also maintain a left-to-right positional ordering among the items they contain (i.e., they are sequences).
Accessed by offset
Just as with strings, you can fetch a component object out of a list by indexing the list on the object’s offset. Because items in lists are ordered by their positions, you can also do tasks such as slicing and concatenation.
Variable-length, heterogeneous, and arbitrarily nestable
Unlike strings, lists can grow and shrink in place (their lengths can vary), and they can contain any sort of object, not just one-character strings (they’re heterogeneous). Because lists can contain other complex objects, they also support arbitrary nesting; you can create lists of lists of lists, and so on.
Of the category “mutable sequence”
In terms of our type category qualifiers, lists are mutable (i.e., can be changed in place) and can respond to all the sequence operations used with strings, such as indexing, slicing, and concatenation. In fact, sequence operations work the same on lists as they do on strings; the only difference is that sequence operations such as concatenation and slicing return new lists instead of new strings when applied to lists. Because lists are mutable, however, they also support other operations that strings don’t, such as deletion and index assignment operations, which change the lists in place.
Arrays of object references
Technically, Python lists contain zero or more references to other objects. Lists might remind you of arrays of pointers (addresses) if you have a background in some other languages. Fetching an item from a Python list is about as fast as indexing a C array; in fact, lists really are arrays inside the standard Python interpreter, not linked structures. As we learned in Chapter 6, though, Python always follows a reference to an object whenever the reference is used, so your program deals only with objects. Whenever you assign an object to a data structure component or variable name, Python always stores a reference to that same object, not a copy of it (unless you request a copy explicitly).
As a preview and reference, Table 8-1 summarizes common and representative list object operations. It is fairly complete for Python 3.3, but for the full story, consult the Python standard library manual, or run a help(list) or dir(list) call interactively for a complete list of list methods—you can pass in a real list, or the word list, which is the name of the list data type. The set of methods here is especially prone to change—in fact, two are new as of Python 3.3.
Table 8-1. Common list literals and operations
Operation |
Interpretation |
L = [] |
An empty list |
L = [123, 'abc', 1.23, {}] |
Four items: indexes 0..3 |
L = ['Bob', 40.0, ['dev', 'mgr']] |
Nested sublists |
L = list('spam') L = list(range(-4, 4)) |
List of an iterable’s items, list of successive integers |
L[i] L[i][j] L[i:j] len(L) |
Index, index of index, slice, length |
L1 + L2 L * 3 |
Concatenate, repeat |
for x in L: print(x) 3 in L |
Iteration, membership |
L.append(4) L.extend([5,6,7]) L.insert(i, X) |
Methods: growing |
L.index(X) L.count(X) |
Methods: searching |
L.sort() L.reverse() L.copy() L.clear() |
Methods: sorting, reversing, copying (3.3+), clearing (3.3+) |
L.pop(i) L.remove(X) del L[i] del L[i:j] L[i:j] = [] |
Methods, statements: shrinking |
L[i] = 3 L[i:j] = [4,5,6] |
Index assignment, slice assignment |
L = [x**2 for x in range(5)] list(map(ord, 'spam')) |
List comprehensions and maps (Chapter 4, Chapter 14, Chapter 20) |
When written down as a literal expression, a list is coded as a series of objects (really, expressions that return objects) in square brackets, separated by commas. For instance, the second row in Table 8-1 assigns the variable L to a four-item list. A nested list is coded as a nested square-bracketed series (row 3), and the empty list is just a square-bracket pair with nothing inside (row 1).[18]
Many of the operations in Table 8-1 should look familiar, as they are the same sequence operations we put to work on strings earlier—indexing, concatenation, iteration, and so on. Lists also respond to list-specific method calls (which provide utilities such as sorting, reversing, adding items to the end, etc.), as well as in-place change operations (deleting items, assignment to indexes and slices, and so forth). Lists have these tools for change operations because they are a mutable object type.
[18] In practice, you won’t see many lists written out like this in list-processing programs. It’s more common to see code that processes lists constructed dynamically (at runtime), from user inputs, file contents, and so on. In fact, although it’s important to master literal syntax, many data structures in Python are built by running program code at runtime.
Lists in Action
Perhaps the best way to understand lists is to see them at work. Let’s once again turn to some simple interpreter interactions to illustrate the operations in Table 8-1.
Basic List Operations
Because they are sequences, lists support many of the same operations as strings. For example, lists respond to the + and * operators much like strings—they mean concatenation and repetition here too, except that the result is a new list, not a string:
% python
>>> len([1, 2, 3]) # Length
3
>>> [1, 2, 3] + [4, 5, 6] # Concatenation
[1, 2, 3, 4, 5, 6]
>>> ['Ni!'] * 4 # Repetition
['Ni!', 'Ni!', 'Ni!', 'Ni!']
Although the + operator works the same for lists and strings, it’s important to know that it expects the same sort of sequence on both sides—otherwise, you get a type error when the code runs. For instance, you cannot concatenate a list and a string unless you first convert the list to a string (using tools such as str or % formatting) or convert the string to a list (the list built-in function does the trick):
>>> str([1, 2]) + "34" # Same as "[1, 2]" + "34"
'[1, 2]34'
>>> [1, 2] + list("34") # Same as [1, 2] + ["3", "4"]
[1, 2, '3', '4']
List Iteration and Comprehensions
More generally, lists respond to all the sequence operations we used on strings in the prior chapter, including iteration tools:
>>> 3 in [1, 2, 3] # Membership
True
>>> for x in [1, 2, 3]:
... print(x, end=' ') # Iteration (2.X uses: print x,)
...
1 2 3
We will talk more formally about for iteration and the range built-ins of Table 8-1 in Chapter 13, because they are related to statement syntax. In short, for loops step through items in any sequence from left to right, executing one or more statements for each item; range produces successive integers.
The last items in Table 8-1, list comprehensions and map calls, are covered in more detail in Chapter 14 and expanded on in Chapter 20. Their basic operation is straightforward, though—as introduced in Chapter 4, list comprehensions are a way to build a new list by applying an expression to each item in a sequence (really, in any iterable), and are close relatives to for loops:
>>> res = [c * 4 for c in 'SPAM'] # List comprehensions
>>> res
['SSSS', 'PPPP', 'AAAA', 'MMMM']
This expression is functionally equivalent to a for loop that builds up a list of results manually, but as we’ll learn in later chapters, list comprehensions are simpler to code and likely faster to run today:
>>> res = []
>>> for c in 'SPAM': # List comprehension equivalent
... res.append(c * 4)
...
>>> res
['SSSS', 'PPPP', 'AAAA', 'MMMM']
As also introduced briefly in Chapter 4, the map built-in function does similar work, but applies a function to items in a sequence and collects all the results in a new list:
>>> list(map(abs, [−1, −2, 0, 1, 2])) # Map a function across a sequence
[1, 2, 0, 1, 2]
Because we’re not quite ready for the full iteration story, we’ll postpone further details for now, but watch for a similar comprehension expression for dictionaries later in this chapter.
Indexing, Slicing, and Matrixes
Because lists are sequences, indexing and slicing work the same way for lists as they do for strings. However, the result of indexing a list is whatever type of object lives at the offset you specify, while slicing a list always returns a new list:
>>> L = ['spam', 'Spam', 'SPAM!']
>>> L[2] # Offsets start at zero
'SPAM!'
>>> L[−2] # Negative: count from the right
'Spam'
>>> L[1:] # Slicing fetches sections
['Spam', 'SPAM!']
One note here: because you can nest lists and other object types within lists, you will sometimes need to string together index operations to go deeper into a data structure. For example, one of the simplest ways to represent matrixes (multidimensional arrays) in Python is as lists with nested sublists. Here’s a basic 3 × 3 two-dimensional list-based array:
>>> matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
With one index, you get an entire row (really, a nested sublist), and with two, you get an item within the row:
>>> matrix[1]
[4, 5, 6]
>>> matrix[1][1]
5
>>> matrix[2][0]
7
>>> matrix = [[1, 2, 3],
... [4, 5, 6],
... [7, 8, 9]]
>>> matrix[1][1]
5
Notice in the preceding interaction that lists can naturally span multiple lines if you want them to because they are contained by a pair of brackets; the “...”s here are Python’s continuation line prompt (see Chapter 4 for comparable code without the “...”s, and watch for more on syntax in the next part of the book).
For more on matrixes, watch later in this chapter for a dictionary-based matrix representation, which can be more efficient when matrixes are largely empty. We’ll also continue this thread in Chapter 20 where we’ll write additional matrix code, especially with list comprehensions. For high-powered numeric work, the NumPy extension mentioned in Chapter 4 and Chapter 5 provides other ways to handle matrixes.
Changing Lists in Place
Because lists are mutable, they support operations that change a list object in place. That is, the operations in this section all modify the list object directly—overwriting its former value—without requiring that you make a new copy, as you had to for strings. Because Python deals only in object references, this distinction between changing an object in place and creating a new object matters; as discussed in Chapter 6, if you change an object in place, you might impact more than one reference to it at the same time.
Index and slice assignments
When using a list, you can change its contents by assigning to either a particular item (offset) or an entire section (slice):
>>> L = ['spam', 'Spam', 'SPAM!']
>>> L[1] = 'eggs' # Index assignment
>>> L
['spam', 'eggs', 'SPAM!']
>>> L[0:2] = ['eat', 'more'] # Slice assignment: delete+insert
>>> L # Replaces items 0,1
['eat', 'more', 'SPAM!']
Both index and slice assignments are in-place changes—they modify the subject list directly, rather than generating a new list object for the result. Index assignment in Python works much as it does in C and most other languages: Python replaces the single object reference at the designated offset with a new one.
Slice assignment, the last operation in the preceding example, replaces an entire section of a list in a single step. Because it can be a bit complex, it is perhaps best thought of as a combination of two steps:
1. Deletion. The slice you specify to the left of the = is deleted.
2. Insertion. The new items contained in the iterable object to the right of the = are inserted into the list on the left, at the place where the old slice was deleted.[19]
This isn’t what really happens, but it can help clarify why the number of items inserted doesn’t have to match the number of items deleted. For instance, given a list L of two or more items, an assignment L[1:2]=[4,5] replaces one item with two—Python first deletes the one-item slice at[1:2] (from offset 1, up to but not including offset 2), then inserts both 4 and 5 where the deleted slice used to be.
This also explains why the second slice assignment in the following is really an insert—Python replaces an empty slice at [1:1] with two items; and why the third is really a deletion—Python deletes the slice (the item at offset 1), and then inserts nothing:
>>> L = [1, 2, 3]
>>> L[1:2] = [4, 5] # Replacement/insertion
>>> L
[1, 4, 5, 3]
>>> L[1:1] = [6, 7] # Insertion (replace nothing)
>>> L
[1, 6, 7, 4, 5, 3]
>>> L[1:2] = [] # Deletion (insert nothing)
>>> L
[1, 7, 4, 5, 3]
In effect, slice assignment replaces an entire section, or “column,” all at once—even if the column or its replacement is empty. Because the length of the sequence being assigned does not have to match the length of the slice being assigned to, slice assignment can be used to replace (by overwriting), expand (by inserting), or shrink (by deleting) the subject list. It’s a powerful operation, but frankly, one that you may not see very often in practice. There are often more straightforward and mnemonic ways to replace, insert, and delete (concatenation, and the insert, pop, andremove list methods, for example), which Python programmers tend to prefer in practice.
On the other hand, this operation can be used as a sort of in-place concatenation at the front of the list—per the next section’s method coverage, something the list’s extend does more mnemonically at list end:
>>> L = [1]
>>> L[:0] = [2, 3, 4] # Insert all at :0, an empty slice at front
>>> L
[2, 3, 4, 1]
>>> L[len(L):] = [5, 6, 7] # Insert all at len(L):, an empty slice at end
>>> L
[2, 3, 4, 1, 5, 6, 7]
>>> L.extend([8, 9, 10]) # Insert all at end, named method
>>> L
[2, 3, 4, 1, 5, 6, 7, 8, 9, 10]
List method calls
Like strings, Python list objects also support type-specific method calls, many of which change the subject list in place:
>>> L = ['eat', 'more', 'SPAM!']
>>> L.append('please') # Append method call: add item at end
>>> L
['eat', 'more', 'SPAM!', 'please']
>>> L.sort() # Sort list items ('S' < 'e')
>>> L
['SPAM!', 'eat', 'more', 'please']
Methods were introduced in Chapter 7. In brief, they are functions (really, object attributes that reference functions) that are associated with and act upon particular objects. Methods provide type-specific tools; the list methods presented here, for instance, are generally available only for lists.
Perhaps the most commonly used list method is append, which simply tacks a single item (object reference) onto the end of the list. Unlike concatenation, append expects you to pass in a single object, not a list. The effect of L.append(X) is similar to L+[X], but while the former changesL in place, the latter makes a new list.[20] The sort method orders the list’s items here, but merits a section of its own.
More on sorting lists
Another commonly seen method, sort, orders a list in place; it uses Python standard comparison tests (here, string comparisons, but applicable to every object type), and by default sorts in ascending order. You can modify sort behavior by passing in keyword arguments—a special “name=value” syntax in function calls that specifies passing by name and is often used for giving configuration options.
In sorts, the reverse argument allows sorts to be made in descending instead of ascending order, and the key argument gives a one-argument function that returns the value to be used in sorting—the string object’s standard lower case converter in the following (though its newercasefold may handle some types of Unicode text better):
>>> L = ['abc', 'ABD', 'aBe']
>>> L.sort() # Sort with mixed case
>>> L
['ABD', 'aBe', 'abc']
>>> L = ['abc', 'ABD', 'aBe']
>>> L.sort(key=str.lower) # Normalize to lowercase
>>> L
['abc', 'ABD', 'aBe']
>>>
>>> L = ['abc', 'ABD', 'aBe']
>>> L.sort(key=str.lower, reverse=True) # Change sort order
>>> L
['aBe', 'ABD', 'abc']
The sort key argument might also be useful when sorting lists of dictionaries, to pick out a sort key by indexing each dictionary. We’ll study dictionaries later in this chapter, and you’ll learn more about keyword function arguments in Part IV.
NOTE
Comparison and sorts in 3.X: In Python 2.X, relative magnitude comparisons of differently typed objects (e.g., a string and a list) work—the language defines a fixed ordering among different types, which is deterministic, if not aesthetically pleasing. That is, the ordering is based on the names of the types involved: all integers are less than all strings, for example, because "int" is less than"str". Comparisons never automatically convert types, except when comparing numeric type objects.
In Python 3.X, this has changed: magnitude comparison of mixed types raises an exception instead of falling back on the fixed cross-type ordering. Because sorting uses comparisons internally, this means that [1, 2, 'spam'].sort() succeeds in Python 2.X but will raise an exception in Python 3.X. Sorting mixed-types fails by proxy.
Python 3.X also no longer supports passing in an arbitrary comparison function to sorts, to implement different orderings. The suggested workaround is to use the key=func keyword argument to code value transformations during the sort, and use the reverse=True keyword argument to change the sort order to descending. These were the typical uses of comparison functions in the past.
One warning here: beware that append and sort change the associated list object in place, but don’t return the list as a result (technically, they both return a value called None). If you say something like L=L.append(X), you won’t get the modified value of L (in fact, you’ll lose the reference to the list altogether!). When you use attributes such as append and sort, objects are changed as a side effect, so there’s no reason to reassign.
Partly because of such constraints, sorting is also available in recent Pythons as a built-in function, which sorts any collection (not just lists) and returns a new list for the result (instead of in-place changes):
>>> L = ['abc', 'ABD', 'aBe']
>>> sorted(L, key=str.lower, reverse=True) # Sorting built-in
['aBe', 'ABD', 'abc']
>>> L = ['abc', 'ABD', 'aBe']
>>> sorted([x.lower() for x in L], reverse=True) # Pretransform items: differs!
['abe', 'abd', 'abc']
Notice the last example here—we can convert to lowercase prior to the sort with a list comprehension, but the result does not contain the original list’s values as it does with the key argument. The latter is applied temporarily during the sort, instead of changing the values to be sorted altogether. As we move along, we’ll see contexts in which the sorted built-in can sometimes be more useful than the sort method.
Other common list methods
Like strings, lists have other methods that perform other specialized operations. For instance, reverse reverses the list in-place, and the extend and pop methods insert multiple items at and delete an item from the end of the list, respectively. There is also a reversed built-in function that works much like sorted and returns a new result object, but it must be wrapped in a list call in both 2.X and 3.X here because its result is an iterator that produces results on demand (more on iterators later):
>>> L = [1, 2]
>>> L.extend([3, 4, 5]) # Add many items at end (like in-place +)
>>> L
[1, 2, 3, 4, 5]
>>> L.pop() # Delete and return last item (by default: −1)
5
>>> L
[1, 2, 3, 4]
>>> L.reverse() # In-place reversal method
>>> L
[4, 3, 2, 1]
>>> list(reversed(L)) # Reversal built-in with a result (iterator)
[1, 2, 3, 4]
Technically, the extend method always iterates through and adds each item in an iterable object, whereas append simply adds a single item as is without iterating through it—a distinction that will be more meaningful by Chapter 14. For now, it’s enough to know that extend adds many items, and append adds one. In some types of programs, the list pop method is often used in conjunction with append to implement a quick last-in-first-out (LIFO) stack structure. The end of the list serves as the top of the stack:
>>> L = []
>>> L.append(1) # Push onto stack
>>> L.append(2)
>>> L
[1, 2]
>>> L.pop() # Pop off stack
2
>>> L
[1]
The pop method also accepts an optional offset of the item to be deleted and returned (the default is the last item at offset −1). Other list methods remove an item by value (remove), insert an item at an offset (insert), count the number of occurrences (count), and search for an item’soffset (index—a search for the index of an item, not to be confused with indexing!):
>>> L = ['spam', 'eggs', 'ham']
>>> L.index('eggs') # Index ofan object (search/find)
1
>>> L.insert(1, 'toast') # Insert at position
>>> L
['spam', 'toast', 'eggs', 'ham']
>>> L.remove('eggs') # Delete by value
>>> L
['spam', 'toast', 'ham']
>>> L.pop(1) # Delete by position
'toast'
>>> L
['spam', 'ham']
>>> L.count('spam') # Number of occurrences
1
Note that unlike other list methods, count and index do not change the list itself, but return information about its content. See other documentation sources or experiment with these calls interactively on your own to learn more about list methods.
Other common list operations
Because lists are mutable, you can use the del statement to delete an item or section in place:
>>> L = ['spam', 'eggs', 'ham', 'toast']
>>> del L[0] # Delete one item
>>> L
['eggs', 'ham', 'toast']
>>> del L[1:] # Delete an entire section
>>> L # Same as L[1:] = []
['eggs']
As we saw earlier, because slice assignment is a deletion plus an insertion, you can also delete a section of a list by assigning an empty list to a slice (L[i:j]=[]); Python deletes the slice named on the left, and then inserts nothing. Assigning an empty list to an index, on the other hand, just stores a reference to the empty list object in the specified slot, rather than deleting an item:
>>> L = ['Already', 'got', 'one']
>>> L[1:] = []
>>> L
['Already']
>>> L[0] = []
>>> L
[[]]
Although all the operations just discussed are typical, there may be additional list methods and operations not illustrated here. The method set, for example, may change over time, and in fact has in Python 3.3—its new L.copy() method makes a top-level copy of the list, much like L[:]and list(L), but is symmetric with copy in sets and dictionaries. For a comprehensive and up-to-date list of type tools, you should always consult Python’s manuals, Python’s dir and help functions (which we first met in Chapter 4), or one of the reference texts mentioned in the preface.
And because it’s such a common hurdle, I’d also like to remind you again that all the in-place change operations discussed here work only for mutable objects: they won’t work on strings (or tuples, discussed in Chapter 9), no matter how hard you try. Mutability is an inherent property of each object type.
[19] This description requires elaboration when the value and the slice being assigned overlap: L[2:5]=L[3:6], for instance, works fine because the value to be inserted is fetched before the deletion happens on the left.
[20] Unlike + concatenation, append doesn’t have to generate new objects, so it’s usually faster than + too. You can also mimic append with the clever slice assignments of the prior section: L[len(L):]=[X] is like L.append(X), and L[:0]=[X] is like appending at the front of a list. Both delete an empty slice and insert X, changing L in place quickly, like append. Both are arguably more complex than list methods, though. For instance, L.insert(0, X) can also append an item to the front of a list, and seems noticeably more mnemonic; L.insert(len(L), X) inserts one object at the end too, but unless you like typing, you might as well use L.append(X)!
Dictionaries
Along with lists, dictionaries are one of the most flexible built-in data types in Python. If you think of lists as ordered collections of objects, you can think of dictionaries as unordered collections; the chief distinction is that in dictionaries, items are stored and fetched by key, instead of by positional offset. While lists can serve roles similar to arrays in other languages, dictionaries take the place of records, search tables, and any other sort of aggregation where item names are more meaningful than item positions.
For example, dictionaries can replace many of the searching algorithms and data structures you might have to implement manually in lower-level languages—as a highly optimized built-in type, indexing a dictionary is a very fast search operation. Dictionaries also sometimes do the work of records, structs, and symbol tables used in other languages; can be used to represent sparse (mostly empty) data structures; and much more. Here’s a rundown of their main properties. Python dictionaries are:
Accessed by key, not offset position
Dictionaries are sometimes called associative arrays or hashes (especially by users of other scripting languages). They associate a set of values with keys, so you can fetch an item out of a dictionary using the key under which you originally stored it. You use the same indexing operation to get components in a dictionary as you do in a list, but the index takes the form of a key, not a relative offset.
Unordered collections of arbitrary objects
Unlike in a list, items stored in a dictionary aren’t kept in any particular order; in fact, Python pseudo-randomizes their left-to-right order to provide quick lookup. Keys provide the symbolic (not physical) locations of items in a dictionary.
Variable-length, heterogeneous, and arbitrarily nestable
Like lists, dictionaries can grow and shrink in place (without new copies being made), they can contain objects of any type, and they support nesting to any depth (they can contain lists, other dictionaries, and so on). Each key can have just one associated value, but that value can be acollection of multiple objects if needed, and a given value can be stored under any number of keys.
Of the category “mutable mapping”
You can change dictionaries in place by assigning to indexes (they are mutable), but they don’t support the sequence operations that work on strings and lists. Because dictionaries are unordered collections, operations that depend on a fixed positional order (e.g., concatenation, slicing) don’t make sense. Instead, dictionaries are the only built-in, core type representatives of the mapping category—objects that map keys to values. Other mappings in Python are created by imported modules.
Tables of object references (hash tables)
If lists are arrays of object references that support access by position, dictionaries are unordered tables of object references that support access by key. Internally, dictionaries are implemented as hash tables (data structures that support very fast retrieval), which start small and grow on demand. Moreover, Python employs optimized hashing algorithms to find keys, so retrieval is quick. Like lists, dictionaries store object references (not copies, unless you ask for them explicitly).
For reference and preview again, Table 8-2 summarizes some of the most common and representative dictionary operations, and is relatively complete as of Python 3.3. As usual, though, see the library manual or run a dir(dict) or help(dict) call for a complete list—dict is the name of the type. When coded as a literal expression, a dictionary is written as a series of key:value pairs, separated by commas, enclosed in curly braces.[21] An empty dictionary is an empty set of braces, and you can nest dictionaries by simply coding one as a value inside another dictionary, or within a list or tuple.
Table 8-2. Common dictionary literals and operations
Operation |
Interpretation |
D = {} |
Empty dictionary |
D = {'name': 'Bob', 'age': 40} |
Two-item dictionary |
E = {'cto': {'name': 'Bob', 'age': 40}} |
Nesting |
D = dict(name='Bob', age=40) D = dict([('name', 'Bob'), ('age', 40)]) D = dict(zip(keyslist, valueslist)) D = dict.fromkeys(['name', 'age']) |
Alternative construction techniques: keywords, key/value pairs, zipped key/value pairs, key lists |
D['name'] E['cto']['age'] |
Indexing by key |
'age' in D |
Membership: key present test |
D.keys() D.values() D.items() D.copy() D.clear() D.update(D2) D.get(key, default?) D.pop(key, default?) D.setdefault(key, default?) D.popitem() |
Methods: all keys, all values, all key+value tuples, copy (top-level), clear (remove all items), merge by keys, fetch by key, if absent default (or None), remove by key, if absent default (or error) fetch by key, if absent set default (or None), remove/return any (key, value) pair; etc. |
len(D) |
Length: number of stored entries |
D[key] = 42 |
Adding/changing keys |
del D[key] |
Deleting entries by key |
list(D.keys()) D1.keys() & D2.keys() |
Dictionary views (Python 3.X) |
D.viewkeys(), D.viewvalues() |
Dictionary views (Python 2.7) |
D = {x: x*2 for x in range(10)} |
Dictionary comprehensions (Python 3.X, 2.7) |
[21] As for lists, you might not see dictionaries coded in full using literals very often—programs rarely know all their data before they are run, and more typically extract it dynamically from users, files, and so on. Lists and dictionaries are grown in different ways, though. In the next section you’ll see that you often build up dictionaries by assigning to new keys at runtime; this approach fails for lists, which are commonly grown with append or extend instead.
Dictionaries in Action
As Table 8-2 suggests, dictionaries are indexed by key, and nested dictionary entries are referenced by a series of indexes (keys in square brackets). When Python creates a dictionary, it stores its items in any left-to-right order it chooses; to fetch a value back, you supply the key with which it is associated, not its relative position. Let’s go back to the interpreter to get a feel for some of the dictionary operations in Table 8-2.
Basic Dictionary Operations
In normal operation, you create dictionaries with literals and store and access items by key with indexing:
% python
>>> D = {'spam': 2, 'ham': 1, 'eggs': 3} # Make a dictionary
>>> D['spam'] # Fetch a value by key
2
>>> D # Order is "scrambled"
{'eggs': 3, 'spam': 2, 'ham': 1}
Here, the dictionary is assigned to the variable D; the value of the key 'spam' is the integer 2, and so on. We use the same square bracket syntax to index dictionaries by key as we did to index lists by offset, but here it means access by key, not by position.
Notice the end of this example—much like sets, the left-to-right order of keys in a dictionary will almost always be different from what you originally typed. This is on purpose: to implement fast key lookup (a.k.a. hashing), keys need to be reordered in memory. That’s why operations that assume a fixed left-to-right order (e.g., slicing, concatenation) do not apply to dictionaries; you can fetch values only by key, not by position. Technically, the ordering is pseudo-random—it’s not truly random (you might be able to decipher it given Python’s source code and a lot of time to kill), but it’s arbitrary, and might vary per release and platform, and even per interactive session in Python 3.3.
The built-in len function works on dictionaries, too; it returns the number of items stored in the dictionary or, equivalently, the length of its keys list. The dictionary in membership operator allows you to test for key existence, and the keys method returns all the keys in the dictionary. The latter of these can be useful for processing dictionaries sequentially, but you shouldn’t depend on the order of the keys list. Because the keys result can be used as a normal list, however, it can always be sorted if order matters (more on sorting and dictionaries later):
>>> len(D) # Number of entries in dictionary
3
>>> 'ham' in D # Key membership test alternative
True
>>> list(D.keys()) # Create a new list of D's keys
['eggs', 'spam', 'ham']
Observe the second expression in this listing. As mentioned earlier, the in membership test used for strings and lists also works on dictionaries—it checks whether a key is stored in the dictionary. Technically, this works because dictionaries define iterators that step through their keys lists automatically. Other types provide iterators that reflect their common uses; files, for example, have iterators that read line by line. We’ll discuss iterators more formally in Chapter 14 and Chapter 20.
Also note the syntax of the last example in this listing. We have to enclose it in a list call in Python 3.X for similar reasons—keys in 3.X returns an iterable object, instead of a physical list. The list call forces it to produce all its values at once so we can print them interactively, though this call isn’t required some other contexts. In 2.X, keys builds and returns an actual list, so the list call isn’t even needed to display a result; more on this later in this chapter.
Changing Dictionaries in Place
Let’s continue with our interactive session. Dictionaries, like lists, are mutable, so you can change, expand, and shrink them in place without making new dictionaries: simply assign a value to a key to change or create an entry. The del statement works here, too; it deletes the entry associated with the key specified as an index. Notice also the nesting of a list inside a dictionary in this example (the value of the key 'ham'). All collection data types in Python can nest inside each other arbitrarily:
>>> D
{'eggs': 3, 'spam': 2, 'ham': 1}
>>> D['ham'] = ['grill', 'bake', 'fry'] # Change entry (value=list)
>>> D
{'eggs': 3, 'spam': 2, 'ham': ['grill', 'bake', 'fry']}
>>> del D['eggs'] # Delete entry
>>> D
{'spam': 2, 'ham': ['grill', 'bake', 'fry']}
>>> D['brunch'] = 'Bacon' # Add new entry
>>> D
{'brunch': 'Bacon', 'spam': 2, 'ham': ['grill', 'bake', 'fry']}
Like lists, assigning to an existing index in a dictionary changes its associated value. Unlike lists, however, whenever you assign a new dictionary key (one that hasn’t been assigned before) you create a new entry in the dictionary, as was done in the previous example for the key 'brunch'. This doesn’t work for lists because you can only assign to existing list offsets—Python considers an offset beyond the end of a list out of bounds and raises an error. To expand a list, you need to use tools such as the append method or slice assignment instead.
More Dictionary Methods
Dictionary methods provide a variety of type-specific tools. For instance, the dictionary values and items methods return all of the dictionary’s values and (key,value) pair tuples, respectively; along with keys, these are useful in loops that need to step through dictionary entries one by one (we’ll start coding examples of such loops in the next section). As for keys, these two methods also return iterable objects in 3.X, so wrap them in a list call there to collect their values all at once for display:
>>> D = {'spam': 2, 'ham': 1, 'eggs': 3}
>>> list(D.values())
[3, 2, 1]
>>> list(D.items())
[('eggs', 3), ('spam', 2), ('ham', 1)]
In realistic programs that gather data as they run, you often won’t be able to predict what will be in a dictionary before the program is launched, much less when it’s coded. Fetching a nonexistent key is normally an error, but the get method returns a default value—None, or a passed-in default—if the key doesn’t exist. It’s an easy way to fill in a default for a key that isn’t present, and avoid a missing-key error when your program can’t anticipate contents ahead of time:
>>> D.get('spam') # A key that is there
2
>>> print(D.get('toast')) # A key that is missing
None
>>> D.get('toast', 88)
88
The update method provides something similar to concatenation for dictionaries, though it has nothing to do with left-to-right ordering (again, there is no such thing in dictionaries). It merges the keys and values of one dictionary into another, blindly overwriting values of the same key if there’s a clash:
>>> D
{'eggs': 3, 'spam': 2, 'ham': 1}
>>> D2 = {'toast':4, 'muffin':5} # Lots of delicious scrambled order here
>>> D.update(D2)
>>> D
{'eggs': 3, 'muffin': 5, 'toast': 4, 'spam': 2, 'ham': 1}
Notice how mixed up the key order is in the last result; again, that’s just how dictionaries work. Finally, the dictionary pop method deletes a key from a dictionary and returns the value it had. It’s similar to the list pop method, but it takes a key instead of an optional position:
# pop a dictionary by key
>>> D
{'eggs': 3, 'muffin': 5, 'toast': 4, 'spam': 2, 'ham': 1}
>>> D.pop('muffin')
5
>>> D.pop('toast') # Delete and return from a key
4
>>> D
{'eggs': 3, 'spam': 2, 'ham': 1}
# pop a list by position
>>> L = ['aa', 'bb', 'cc', 'dd']
>>> L.pop() # Delete and return from the end
'dd'
>>> L
['aa', 'bb', 'cc']
>>> L.pop(1) # Delete from a specific position
'bb'
>>> L
['aa', 'cc']
Dictionaries also provide a copy method; we’ll revisit this in Chapter 9, as it’s a way to avoid the potential side effects of shared references to the same dictionary. In fact, dictionaries come with more methods than those listed in Table 8-2; see the Python library manual, dir and help, or other reference sources for a comprehensive list.
NOTE
Your dictionary ordering may vary: Don’t be alarmed if your dictionaries print in a different order than shown here. As mentioned, key order is arbitrary, and might vary per release, platform, and interactive session in 3.3 (and quite possibly per day of the week, and phase of the moon!).
Most of the dictionary examples in this book reflect Python 3.3’s key ordering, but it has changed both since and prior to 3.0. Your Python’s key order may vary, but you’re not supposed to care anyhow: dictionaries are processed by key, not position. Programs shouldn’t rely on the arbitrary order of keys in dictionaries, even if shown in books.
There are extension types in Python’s standard library that maintain insertion order among their keys—see OrderedDict in the collections module—but they are hybrids that incur extra space and speed overheads to achieve their extra utility, and are not true dictionaries. In short, keys are kept redundantly in a linked list to support sequence operations.
As we’ll see in Chapter 9, this module also implements a namedtuple that allows tuple items to be accessed by both attribute name and sequence position—a sort of tuple/class/dictionary hybrid that adds processing steps and is not a core object type in any event. Python’s library manual has the full story on these and other extension types.
Example: Movie Database
Let’s look at a more realistic dictionary example. In honor of Python’s namesake, the following example creates a simple in-memory Monty Python movie database, as a table that maps movie release date years (the keys) to movie titles (the values). As coded, you fetch movie names by indexing on release year strings:
>>> table = {'1975': 'Holy Grail', # Key: Value
... '1979': 'Life of Brian',
... '1983': 'The Meaning of Life'}
>>>
>>> year = '1983'
>>> movie = table[year] # dictionary[Key] => Value
>>> movie
'The Meaning of Life'
>>> for year in table: # Same as: for year in table.keys()
... print(year + '\t' + table[year])
...
1979 Life of Brian
1975 Holy Grail
1983 The Meaning of Life
The last command uses a for loop, which we previewed in Chapter 4 but haven’t covered in detail yet. If you aren’t familiar with for loops, this command simply iterates through each key in the table and prints a tab-separated list of keys and their values. We’ll learn more about for loops in Chapter 13.
Dictionaries aren’t sequences like lists and strings, but if you need to step through the items in a dictionary, it’s easy—calling the dictionary keys method returns all stored keys, which you can iterate through with a for. If needed, you can index from key to value inside the for loop as you go, as was done in this code.
In fact, Python also lets you step through a dictionary’s keys list without actually calling the keys method in most for loops. For any dictionary D, saying for key in D works the same as saying the complete for key in D.keys(). This is really just another instance of the iteratorsmentioned earlier, which allow the in membership operator to work on dictionaries as well; more on iterators later in this book.
Preview: Mapping values to keys
Notice how the prior table maps year to titles, but not vice versa. If you want to map the other way—titles to years—you can either code the dictionary differently, or use methods like items that give searchable sequences, though using them to best effect requires more background information than we yet have:
>>> table = {'Holy Grail': '1975', # Key=>Value (title=>year)
... 'Life of Brian': '1979',
... 'The Meaning of Life': '1983'}
>>>
>>> table['Holy Grail']
'1975'
>>> list(table.items()) # Value=>Key (year=>title)
[('The Meaning of Life', '1983'), ('Holy Grail', '1975'), ('Life of Brian', '1979')]
>>> [title for (title, year) in table.items() if year == '1975']
['Holy Grail']
The last command here is in part a preview for the comprehension syntax introduced in Chapter 4 and covered in full in Chapter 14. In short, it scans the dictionary’s (key, value) tuple pairs returned by the items method, selecting keys having a specified value. The net effect is to indexbackward—from value to key, instead of key to value—useful if you want to store data just once and map backward only rarely (searching through sequences like this is generally much slower than a direct key index).
In fact, although dictionaries by nature map keys to values unidirectionally, there are multiple ways to map values back to keys with a bit of extra generalizable code:
>>> K = 'Holy Grail'
>>> table[K] # Key=>Value (normal usage)
'1975'
>>> V = '1975'
>>> [key for (key, value) in table.items() if value == V] # Value=>Key
['Holy Grail']
>>> [key for key in table.keys() if table[key] == V] # Ditto
['Holy Grail']
Note that both of the last two commands return a list of titles: in dictionaries, there’s just one value per key, but there may be many keys per value. A given value may be stored under multiple keys (yielding multiple keys per value), and a value might be a collection itself (supporting multiple values per key). For more on this front, also watch for a dictionary inversion function in Chapter 32’s mapattrs.py example—code that would surely stretch this preview past its breaking point if included here. For this chapter’s purposes, let’s explore more dictionary basics.
Dictionary Usage Notes
Dictionaries are fairly straightforward tools once you get the hang of them, but here are a few additional pointers and reminders you should be aware of when using them:
§ Sequence operations don’t work. Dictionaries are mappings, not sequences; because there’s no notion of ordering among their items, things like concatenation (an ordered joining) and slicing (extracting a contiguous section) simply don’t apply. In fact, Python raises an error when your code runs if you try to do such things.
§ Assigning to new indexes adds entries. Keys can be created when you write a dictionary literal (embedded in the code of the literal itself), or when you assign values to new keys of an existing dictionary object individually. The end result is the same.
§ Keys need not always be strings. Our examples so far have used strings as keys, but any other immutable objects work just as well. For instance, you can use integers as keys, which makes the dictionary look much like a list (when indexing, at least). Tuples may be used as dictionary keys too, allowing compound key values—such as dates and IP addresses—to have associated values. User-defined class instance objects (discussed in Part VI) can also be used as keys, as long as they have the proper protocol methods; roughly, they need to tell Python that their values are “hashable” and thus won’t change, as otherwise they would be useless as fixed keys. Mutable objects such as lists, sets, and other dictionaries don’t work as keys, but are allowed as values.
Using dictionaries to simulate flexible lists: Integer keys
The last point in the prior list is important enough to demonstrate with a few examples. When you use lists, it is illegal to assign to an offset that is off the end of the list:
>>> L = []
>>> L[99] = 'spam'
Traceback (most recent call last):
File "<stdin>", line 1, in ?
IndexError: list assignment index out of range
Although you can use repetition to preallocate as big a list as you’ll need (e.g., [0]*100), you can also do something that looks similar with dictionaries that does not require such space allocations. By using integer keys, dictionaries can emulate lists that seem to grow on offset assignment:
>>> D = {}
>>> D[99] = 'spam'
>>> D[99]
'spam'
>>> D
{99: 'spam'}
Here, it looks as if D is a 100-item list, but it’s really a dictionary with a single entry; the value of the key 99 is the string 'spam'. You can access this structure with offsets much like a list, catching nonexistent keys with get or in tests if required, but you don’t have to allocate space for all the positions you might ever need to assign values to in the future. When used like this, dictionaries are like more flexible equivalents of lists.
As another example, we might also employ integer keys in our first movie database’s code earlier to avoid quoting the year, albeit at the expense of some expressiveness (keys cannot contain nondigit characters):
>>> table = {1975: 'Holy Grail',
... 1979: 'Life of Brian', # Keys are integers, not strings
... 1983: 'The Meaning of Life'}
>>> table[1975]
'Holy Grail'
>>> list(table.items())
[(1979, 'Life of Brian'), (1983, 'The Meaning of Life'), (1975, 'Holy Grail')]
Using dictionaries for sparse data structures: Tuple keys
In a similar way, dictionary keys are also commonly leveraged to implement sparse data structures—for example, multidimensional arrays where only a few positions have values stored in them:
>>> Matrix = {}
>>> Matrix[(2, 3, 4)] = 88
>>> Matrix[(7, 8, 9)] = 99
>>>
>>> X = 2; Y = 3; Z = 4 # ; separates statements: see Chapter 10
>>> Matrix[(X, Y, Z)]
88
>>> Matrix
{(2, 3, 4): 88, (7, 8, 9): 99}
Here, we’ve used a dictionary to represent a three-dimensional array that is empty except for the two positions (2,3,4) and (7,8,9). The keys are tuples that record the coordinates of nonempty slots. Rather than allocating a large and mostly empty three-dimensional matrix to hold these values, we can use a simple two-item dictionary. In this scheme, accessing an empty slot triggers a nonexistent key exception, as these slots are not physically stored:
>>> Matrix[(2,3,6)]
Traceback (most recent call last):
File "<stdin>", line 1, in ?
KeyError: (2, 3, 6)
Avoiding missing-key errors
Errors for nonexistent key fetches are common in sparse matrixes, but you probably won’t want them to shut down your program. There are at least three ways to fill in a default value instead of getting such an error message—you can test for keys ahead of time in if statements, use a trystatement to catch and recover from the exception explicitly, or simply use the dictionary get method shown earlier to provide a default for keys that do not exist. Consider the first two of these previews for statement syntax we’ll begin studying in Chapter 10:
>>> if (2, 3, 6) in Matrix: # Check for key before fetch
... print(Matrix[(2, 3, 6)]) # See Chapters 10 and 12 for if/else
... else:
... print(0)
...
0
>>> try:
... print(Matrix[(2, 3, 6)]) # Try to index
... except KeyError: # Catch and recover
... print(0) # See Chapters 10 and 34 for try/except
...
0
>>> Matrix.get((2, 3, 4), 0) # Exists: fetch and return
88
>>> Matrix.get((2, 3, 6), 0) # Doesn't exist: use default arg
0
Of these, the get method is the most concise in terms of coding requirements, but the if and try statements are much more general in scope; again, more on these starting in Chapter 10.
Nesting in dictionaries
As you can see, dictionaries can play many roles in Python. In general, they can replace search data structures (because indexing by key is a search operation) and can represent many types of structured information. For example, dictionaries are one of many ways to describe the properties of an item in your program’s domain; that is, they can serve the same role as “records” or “structs” in other languages.
The following, for example, fills out a dictionary describing a hypothetical person, by assigning to new keys over time (if you are a Bob, my apologies for picking on your name in this book—it’s easy to type!):
>>> rec = {}
>>> rec['name'] = 'Bob'
>>> rec['age'] = 40.5
>>> rec['job'] = 'developer/manager'
>>>
>>> print(rec['name'])
Bob
Especially when nested, Python’s built-in data types allow us to easily represent structured information. The following again uses a dictionary to capture object properties, but it codes it all at once (rather than assigning to each key separately) and nests a list and a dictionary to represent structured property values:
>>> rec = {'name': 'Bob',
... 'jobs': ['developer', 'manager'],
... 'web': 'www.bobs.org/˜Bob',
... 'home': {'state': 'Overworked', 'zip': 12345}}
To fetch components of nested objects, simply string together indexing operations:
>>> rec['name']
'Bob'
>>> rec['jobs']
['developer', 'manager']
>>> rec['jobs'][1]
'manager'
>>> rec['home']['zip']
12345
Although we’ll learn in Part VI that classes (which group both data and logic) can be better in this record role, dictionaries are an easy-to-use tool for simpler requirements. For more on record representation choices, see also the upcoming sidebar Why You Will Care: Dictionaries Versus Lists, as well as its extension to tuples in Chapter 9 and classes in Chapter 27.
Also notice that while we’ve focused on a single “record” with nested data here, there’s no reason we couldn’t nest the record itself in a larger, enclosing database collection coded as a list or dictionary, though an external file or formal database interface often plays the role of top-level container in realistic programs:
db = []
db.append(rec) # A list "database"
db.append(other)
db[0]['jobs']
db = {}
db['bob'] = rec # A dictionary "database"
db['sue'] = other
db['bob']['jobs']
Later in the book we’ll meet tools such as Python’s shelve, which works much the same way, but automatically maps objects to and from files to make them permanent (watch for more in this chapter’s sidebar Why You Will Care: Dictionary Interfaces).
Other Ways to Make Dictionaries
Finally, note that because dictionaries are so useful, more ways to build them have emerged over time. In Python 2.3 and later, for example, the last two calls to the dict constructor (really, type name) shown here have the same effect as the literal and key-assignment forms above them:
{'name': 'Bob', 'age': 40} # Traditional literal expression
D = {} # Assign by keys dynamically
D['name'] = 'Bob'
D['age'] = 40
dict(name='Bob', age=40) # dict keyword argument form
dict([('name', 'Bob'), ('age', 40)]) # dict key/value tuples form
All four of these forms create the same two-key dictionary, but they are useful in differing circumstances:
§ The first is handy if you can spell out the entire dictionary ahead of time.
§ The second is of use if you need to create the dictionary one field at a time on the fly.
§ The third involves less typing than the first, but it requires all keys to be strings.
§ The last is useful if you need to build up keys and values as sequences at runtime.
We met keyword arguments earlier when sorting; the third form illustrated in this code listing has become especially popular in Python code today, since it has less syntax (and hence there is less opportunity for mistakes). As suggested previously in Table 8-2, the last form in the listing is also commonly used in conjunction with the zip function, to combine separate lists of keys and values obtained dynamically at runtime (parsed out of a data file’s columns, for instance):
dict(zip(keyslist, valueslist)) # Zipped key/value tuples form (ahead)
More on zipping dictionary keys in the next section. Provided all the key’s values are the same initially, you can also create a dictionary with this special form—simply pass in a list of keys and an initial value for all of the values (the default is None):
>>> dict.fromkeys(['a', 'b'], 0)
{'a': 0, 'b': 0}
Although you could get by with just literals and key assignments at this point in your Python career, you’ll probably find uses for all of these dictionary-creation forms as you start applying them in realistic, flexible, and dynamic Python programs.
The listings in this section document the various ways to create dictionaries in both Python 2.X and 3.X. However, there is yet another way to create dictionaries, available only in Python 3.X and 2.7: the dictionary comprehension expression. To see how this last form looks, we need to move on to the next and final section of this chapter.
WHY YOU WILL CARE: DICTIONARIES VERSUS LISTS
With all the objects in Python’s core types arsenal, some readers may be puzzled over the choice between lists and dictionaries. In short, although both are flexible collections of other objects, lists assign items to positions, and dictionaries assign them to more mnemonic keys. Because of this, dictionary data often carries more meaning to human readers. For example, the nested list structure in row 3 of Table 8-1 could be used to represent a record too:
>>> L = ['Bob', 40.5, ['dev', 'mgr']] # List-based "record"
>>> L[0]
'Bob'
>>> L[1] # Positions/numbers for fields
40.5
>>> L[2][1]
'mgr'
For some types of data, the list’s access-by-position makes sense—a list of employees in a company, the files in a directory, or numeric matrixes, for example. But a more symbolic record like this may be more meaningfully coded as a dictionary along the lines of row 2 in Table 8-2, with labeled fields replacing field positions (this is similar to a record we coded in Chapter 4):
>>> D = {'name': 'Bob', 'age': 40.5, 'jobs': ['dev', 'mgr']}
>>> D['name']
'Bob'
>>> D['age'] # Dictionary-based "record"
40.5
>>> D['jobs'][1] # Names mean more than numbers
'mgr'
For variety, here is the same record recoded with keywords, which may seem even more readable to some human readers:
>>> D = dict(name='Bob', age=40.5, jobs=['dev', 'mgr'])
>>> D['name']
'Bob'
>>> D['jobs'].remove('mgr')
>>> D
{'jobs': ['dev'], 'age': 40.5, 'name': 'Bob'}
In practice, dictionaries tend to be best for data with labeled components, as well as structures that can benefit from quick, direct lookups by name, instead of slower linear searches. As we’ve seen, they also may be better for sparse collections and collections that grow at arbitrary positions.
Python programmers also have access to the sets we studied in Chapter 5, which are much like the keys of a valueless dictionary; they don’t map keys to values, but can often be used like dictionaries for fast lookups when there is no associated value, especially in search routines:
>>> D = {}
>>> D['state1'] = True # A visited-state dictionary
>>> 'state1' in D
True
>>> S = set()
>>> S.add('state1') # Same, but with sets
>>> 'state1' in S
True
Watch for a rehash of this record representation thread in the next chapter, where we’ll see how tuples and named tuples compare to dictionaries in this role, as well as in Chapter 27, where we’ll learn how user-defined classes factor into this picture, combining both data and logic to process it.
Dictionary Changes in Python 3.X and 2.7
This chapter has so far focused on dictionary basics that span releases, but the dictionary’s functionality has mutated in Python 3.X. If you are using Python 2.X code, you may come across some dictionary tools that either behave differently or are missing altogether in 3.X. Moreover, 3.X coders have access to additional dictionary tools not available in 2.X, apart from two back-ports to 2.7.
Specifically, dictionaries in Python 3.X:
§ Support a new dictionary comprehension expression, a close cousin to list and set comprehensions
§ Return set-like iterable views instead of lists for the methods D.keys, D.values, and D.items
§ Require new coding styles for scanning by sorted keys, because of the prior point
§ No longer support relative magnitude comparisons directly—compare manually instead
§ No longer have the D.has_key method—the in membership test is used instead
As later back-ports from 3.X, dictionaries in Python 2.7 (but not earlier in 2.X):
§ Support item 1 in the prior list—dictionary comprehensions—as a direct back-port from 3.X
§ Support item 2 in the prior list—set-like iterable views—but do so with special method names D.viewkeys, D.viewvalues, D.viewitems); their nonview methods return lists as before
Because of this overlap, some of the material in this section pertains both to 3.X and 2.7, but is presented here in the context of 3.X extensions because of its origin. With that in mind, let’s take a look at what’s new in dictionaries in 3.X and 2.7.
Dictionary comprehensions in 3.X and 2.7
As mentioned at the end of the prior section, dictionaries in 3.X and 2.7 can also be created with dictionary comprehensions. Like the set comprehensions we met in Chapter 5, dictionary comprehensions are available only in 3.X and 2.7 (not in 2.6 and earlier). Like the longstanding list comprehensions we met briefly in Chapter 4 and earlier in this chapter, they run an implied loop, collecting the key/value results of expressions on each iteration and using them to fill out a new dictionary. A loop variable allows the comprehension to use loop iteration values along the way.
To illustrate, a standard way to initialize a dictionary dynamically in both 2.X and 3.X is to combine its keys and values with zip, and pass the result to the dict call. The zip built-in function is the hook that allows us to construct a dictionary from key and value lists this way—if you cannot predict the set of keys and values in your code, you can always build them up as lists and zip them together. We’ll study zip in detail in Chapter 13 and Chapter 14 after exploring statements; it’s an iterable in 3.X, so we must wrap it in a list call to show its results there, but its basic usage is otherwise straightforward:
>>> list(zip(['a', 'b', 'c'], [1, 2, 3])) # Zip together keys and values
[('a', 1), ('b', 2), ('c', 3)]
>>> D = dict(zip(['a', 'b', 'c'], [1, 2, 3])) # Make a dict from zip result
>>> D
{'b': 2, 'c': 3, 'a': 1}
In Python 3.X and 2.7, though, you can achieve the same effect with a dictionary comprehension expression. The following builds a new dictionary with a key/value pair for every such pair in the zip result (it reads almost the same in Python, but with a bit more formality):
>>> D = {k: v for (k, v) in zip(['a', 'b', 'c'], [1, 2, 3])}
>>> D
{'b': 2, 'c': 3, 'a': 1}
Comprehensions actually require more code in this case, but they are also more general than this example implies—we can use them to map a single stream of values to dictionaries as well, and keys can be computed with expressions just like values:
>>> D = {x: x ** 2 for x in [1, 2, 3, 4]} # Or: range(1, 5)
>>> D
{1: 1, 2: 4, 3: 9, 4: 16}
>>> D = {c: c * 4 for c in 'SPAM'} # Loop over any iterable
>>> D
{'S': 'SSSS', 'P': 'PPPP', 'A': 'AAAA', 'M': 'MMMM'}
>>> D = {c.lower(): c + '!' for c in ['SPAM', 'EGGS', 'HAM']}
>>> D
{'eggs': 'EGGS!', 'spam': 'SPAM!', 'ham': 'HAM!'}
Dictionary comprehensions are also useful for initializing dictionaries from keys lists, in much the same way as the fromkeys method we met at the end of the preceding section:
>>> D = dict.fromkeys(['a', 'b', 'c'], 0) # Initialize dict from keys
>>> D
{'b': 0, 'c': 0, 'a': 0}
>>> D = {k:0 for k in ['a', 'b', 'c']} # Same, but with a comprehension
>>> D
{'b': 0, 'c': 0, 'a': 0}
>>> D = dict.fromkeys('spam') # Other iterables, default value
>>> D
{'s': None, 'p': None, 'a': None, 'm': None}
>>> D = {k: None for k in 'spam'}
>>> D
{'s': None, 'p': None, 'a': None, 'm': None}
Like related tools, dictionary comprehensions support additional syntax not shown here, including nested loops and if clauses. Unfortunately, to truly understand dictionary comprehensions, we need to also know more about iteration statements and concepts in Python, and we don’t yet have enough information to address that story well. We’ll learn much more about all flavors of comprehensions (list, set, dictionary, and generator) in Chapter 14 and Chapter 20, so we’ll defer further details until later. We’ll also revisit the zip built-in we used in this section in more detail inChapter 13, when we explore for loops.
Dictionary views in 3.X (and 2.7 via new methods)
In 3.X the dictionary keys, values, and items methods all return view objects, whereas in 2.X they return actual result lists. This functionality is also available in Python 2.7, but in the guise of the special, distinct method names listed at the start of this section (2.7’s normal methods still return simple lists, so as to avoid breaking existing 2.X code); because of this, I’ll refer to this as a 3.X feature in this section.
View objects are iterables, which simply means objects that generate result items one at a time, instead of producing the result list all at once in memory. Besides being iterable, dictionary views also retain the original order of dictionary components, reflect future changes to the dictionary, and may support set operations. On the other hand, because they are not lists, they do not directly support operations like indexing or the list sort method, and do not display their items as a normal list when printed (they do show their components as of Python 3.1 but not as a list, and are still a divergence from 2.X).
We’ll discuss the notion of iterables more formally in Chapter 14, but for our purposes here it’s enough to know that we have to run the results of these three methods through the list built-in if we want to apply list operations or display their values. For example, in Python 3.3 (other version’s outputs may differ slightly):
>>> D = dict(a=1, b=2, c=3)
>>> D
{'b': 2, 'c': 3, 'a': 1}
>>> K = D.keys() # Makes a view object in 3.X, not a list
>>> K
dict_keys(['b', 'c', 'a'])
>>> list(K) # Force a real list in 3.X if needed
['b', 'c', 'a']
>>> V = D.values() # Ditto for values and items views
>>> V
dict_values([2, 3, 1])
>>> list(V)
[2, 3, 1]
>>> D.items()
dict_items([('b', 2), ('c', 3), ('a', 1)])
>>> list(D.items())
[('b', 2), ('c', 3), ('a', 1)]
>>> K[0] # List operations fail unless converted
TypeError: 'dict_keys' object does not support indexing
>>> list(K)[0]
'b'
Apart from result displays at the interactive prompt, you will probably rarely even notice this change, because looping constructs in Python automatically force iterable objects to produce one result on each iteration:
>>> for k in D.keys(): print(k) # Iterators used automatically in loops
...
b
c
a
In addition, 3.X dictionaries still have iterators themselves, which return successive keys—as in 2.X, it’s still often not necessary to call keys directly:
>>> for key in D: print(key) # Still no need to call keys() to iterate
...
b
c
a
Unlike 2.X’s list results, though, dictionary views in 3.X are not carved in stone when created—they dynamically reflect future changes made to the dictionary after the view object has been created:
>>> D = {'a': 1, 'b': 2, 'c': 3}
>>> D
{'b': 2, 'c': 3, 'a': 1}
>>> K = D.keys()
>>> V = D.values()
>>> list(K) # Views maintain same order as dictionary
['b', 'c', 'a']
>>> list(V)
[2, 3, 1]
>>> del D['b'] # Change the dictionary in place
>>> D
{'c': 3, 'a': 1}
>>> list(K) # Reflected in any current view objects
['c', 'a']
>>> list(V) # Not true in 2.X! - lists detached from dict
[3, 1]
Dictionary views and sets
Also unlike 2.X’s list results, 3.X’s view objects returned by the keys method are set-like and support common set operations such as intersection and union; values views are not set-like, but items results are if their (key, value) pairs are unique and hashable (immutable). Given that sets behave much like valueless dictionaries (and may even be coded in curly braces like dictionaries in 3.X and 2.7), this is a logical symmetry. Per Chapter 5, set items are unordered, unique, and immutable, just like dictionary keys.
Here is what keys views look like when used in set operations (continuing the prior section’s session); dictionary value views are never set-like, since their items are not necessarily unique or immutable:
>>> K, V
(dict_keys(['c', 'a']), dict_values([3, 1]))
>>> K | {'x': 4} # Keys (and some items) views are set-like
{'c', 'x', 'a'}
>>> V & {'x': 4}
TypeError: unsupported operand type(s) for &: 'dict_values' and 'dict'
>>> V & {'x': 4}.values()
TypeError: unsupported operand type(s) for &: 'dict_values' and 'dict_values'
In set operations, views may be mixed with other views, sets, and dictionaries; dictionaries are treated the same as their keys views in this context:
>>> D = {'a': 1, 'b': 2, 'c': 3}
>>> D.keys() & D.keys() # Intersect keys views
{'b', 'c', 'a'}
>>> D.keys() & {'b'} # Intersect keys and set
{'b'}
>>> D.keys() & {'b': 1} # Intersect keys and dict
{'b'}
>>> D.keys() | {'b', 'c', 'd'} # Union keys and set
{'b', 'c', 'a', 'd'}
Items views are set-like too if they are hashable—that is, if they contain only immutable objects:
>>> D = {'a': 1}
>>> list(D.items()) # Items set-like if hashable
[('a', 1)]
>>> D.items() | D.keys() # Union view and view
{('a', 1), 'a'}
>>> D.items() | D # dict treated same as its keys
{('a', 1), 'a'}
>>> D.items() | {('c', 3), ('d', 4)} # Set of key/value pairs
{('d', 4), ('a', 1), ('c', 3)}
>>> dict(D.items() | {('c', 3), ('d', 4)}) # dict accepts iterable sets too
{'c': 3, 'a': 1, 'd': 4}
See Chapter 5’s coverage of sets if you need a refresher on these operations. Here, let’s wrap up with three other quick coding notes for 3.X dictionaries.
Sorting dictionary keys in 3.X
First of all, because keys does not return a list in 3.X, the traditional coding pattern for scanning a dictionary by sorted keys in 2.X won’t work in 3.X:
>>> D = {'a': 1, 'b': 2, 'c': 3}
>>> D
{'b': 2, 'c': 3, 'a': 1}
>>> Ks = D.keys() # Sorting a view object doesn't work!
>>> Ks.sort()
AttributeError: 'dict_keys' object has no attribute 'sort'
To work around this, in 3.X you must either convert to a list manually or use the sorted call (introduced in Chapter 4 and covered in this chapter) on either a keys view or the dictionary itself:
>>> Ks = list(Ks) # Force it to be a list and then sort
>>> Ks.sort()
>>> for k in Ks: print(k, D[k]) # 2.X: omit outer parens in prints
...
a 1
b 2
c 3
>>> D
{'b': 2, 'c': 3, 'a': 1}
>>> Ks = D.keys() # Or you can use sorted() on the keys
>>> for k in sorted(Ks): print(k, D[k]) # sorted() accepts any iterable
... # sorted() returns its result
a 1
b 2
c 3
Of these, using the dictionary’s keys iterator is probably preferable in 3.X, and works in 2.X as well:
>>> D
{'b': 2, 'c': 3, 'a': 1} # Better yet, sort the dict directly
>>> for k in sorted(D): print(k, D[k]) # dict iterators return keys
...
a 1
b 2
c 3
Dictionary magnitude comparisons no longer work in 3.X
Secondly, while in Python 2.X dictionaries may be compared for relative magnitude directly with <, >, and so on, in Python 3.X this no longer works. However, you can simulate it by comparing sorted keys lists manually:
sorted(D1.items()) < sorted(D2.items()) # Like 2.X D1 < D2
Dictionary equality tests (e.g., D1 == D2) still work in 3.X, though. Since we’ll revisit this near the end of the next chapter in the context of comparisons at large, we’ll postpone further details here.
The has_key method is dead in 3.X: Long live in!
Finally, the widely used dictionary has_key key presence test method is gone in 3.X. Instead, use the in membership expression, or a get with a default test (of these, in is generally preferred):
>>> D
{'b': 2, 'c': 3, 'a': 1}
>>> D.has_key('c') # 2.X only: True/False
AttributeError: 'dict' object has no attribute 'has_key'
>>> 'c' in D # Required in 3.X
True
>>> 'x' in D # Preferred in 2.X today
False
>>> if 'c' in D: print('present', D['c']) # Branch on result
...
present 3
>>> print(D.get('c')) # Fetch with default
3
>>> print(D.get('x'))
None
>>> if D.get('c') != None: print('present', D['c']) # Another option
...
present 3
To summarize, the dictionary story changes substantially in 3.X. If you work in 2.X and care about 3.X compatibility (or suspect that you might someday), here are some pointers. Of the 3.X changes we’ve met in this section:
§ The first (dictionary comprehensions) can be coded only in 3.X and 2.7.
§ The second (dictionary views) can be coded only in 3.X, and with special method names in 2.7.
However, the last three techniques—sorted, manual comparisons, and in—can be coded in 2.X today to ease 3.X migration in the future.
WHY YOU WILL CARE: DICTIONARY INTERFACES
Dictionaries aren’t just a convenient way to store information by key in your programs—some Python extensions also present interfaces that look like and work the same as dictionaries. For instance, Python’s interface to DBM access-by-key files looks much like a dictionary that must be opened. You store and fetch strings using key indexes:
import dbm # Named anydbm in Python 2.X
file = dbm.open("filename") # Link to file
file['key'] = 'data' # Store data by key
data = file['key'] # Fetch data by key
In Chapter 28, you’ll see that you can store entire Python objects this way, too, if you replace dbm in the preceding code with shelve (shelves are access-by-key databases that store persistent Python objects, not just strings). For Internet work, Python’s CGI script support also presents a dictionary-like interface. A call to cgi.FieldStorage yields a dictionary-like object with one entry per input field on the client’s web page:
import cgi
form = cgi.FieldStorage() # Parse form data
if 'name' in form:
showReply('Hello, ' + form['name'].value)
Though dictionaries are the only core mapping type, all of these others are instances of mappings, and support most of the same operations. Once you learn dictionary interfaces, you’ll find that they apply to a variety of built-in tools in Python.
For another dictionary use case, see also Chapter 9’s upcoming overview of JSON—a language-neutral data format used for databases and data transfer. Python dictionaries, lists, and nested combinations of them can almost pass for records in this format as is, and may be easily translated to and from formal JSON text strings with Python’s json standard library module.
Chapter Summary
In this chapter, we explored the list and dictionary types—probably the two most common, flexible, and powerful collection types you will see and use in Python code. We learned that the list type supports positionally ordered collections of arbitrary objects, and that it may be freely nested and grown and shrunk on demand. The dictionary type is similar, but it stores items by key instead of by position and does not maintain any reliable left-to-right order among its items. Both lists and dictionaries are mutable, and so support a variety of in-place change operations not available for strings: for example, lists can be grown by append calls, and dictionaries by assignment to new keys.
In the next chapter, we will wrap up our in-depth core object type tour by looking at tuples and files. After that, we’ll move on to statements that code the logic that processes our objects, taking us another step toward writing complete programs. Before we tackle those topics, though, here are some chapter quiz questions to review.
Test Your Knowledge: Quiz
1. Name two ways to build a list containing five integer zeros.
2. Name two ways to build a dictionary with two keys, 'a' and 'b', each having an associated value of 0.
3. Name four operations that change a list object in place.
4. Name four operations that change a dictionary object in place.
5. Why might you use a dictionary instead of a list?
Test Your Knowledge: Answers
1. A literal expression like [0, 0, 0, 0, 0] and a repetition expression like [0] * 5 will each create a list of five zeros. In practice, you might also build one up with a loop that starts with an empty list and appends 0 to it in each iteration, with L.append(0). A list comprehension ([0 for i in range(5)]) could work here, too, but this is more work than you need to do for this answer.
2. A literal expression such as {'a': 0, 'b': 0} or a series of assignments like D = {}, D['a'] = 0, and D['b'] = 0 would create the desired dictionary. You can also use the newer and simpler-to-code dict(a=0, b=0) keyword form, or the more flexible dict([('a', 0), ('b', 0)]) key/value sequences form. Or, because all the values are the same, you can use the special form dict.fromkeys('ab', 0). In 3.X and 2.7, you can also use a dictionary comprehension: {k:0 for k in 'ab'}, though again, this may be overkill here.
3. The append and extend methods grow a list in place, the sort and reverse methods order and reverse lists, the insert method inserts an item at an offset, the remove and pop methods delete from a list by value and by position, the del statement deletes an item or slice, and index and slice assignment statements replace an item or entire section. Pick any four of these for the quiz.
4. Dictionaries are primarily changed by assignment to a new or existing key, which creates or changes the key’s entry in the table. Also, the del statement deletes a key’s entry, the dictionary update method merges one dictionary into another in place, and D.pop(key) removes a key and returns the value it had. Dictionaries also have other, more exotic in-place change methods not presented in this chapter, such as setdefault; see reference sources for more details.
5. Dictionaries are generally better when the data is labeled (a record with field names, for example); lists are best suited to collections of unlabeled items (such as all the files in a directory). Dictionary lookup is also usually quicker than searching a list, though this might vary per program.
All materials on the site are licensed Creative Commons Attribution-Sharealike 3.0 Unported CC BY-SA 3.0 & GNU Free Documentation License (GFDL)
If you are the copyright holder of any material contained on our site and intend to remove it, please contact our site administrator for approval.
© 2016-2025 All site design rights belong to S.Y.A.