Scopes - Functions and Generators - Learning Python (2013)

Part IV. Functions and Generators

Chapter 17. Scopes

Chapter 16 introduced basic function definitions and calls. As we saw, Python’s core function model is simple to use, but even simple function examples quickly led us to questions about the meaning of variables in our code. This chapter moves on to present the details behind Python’s scopes: the places where variables are defined and looked up. Like module files, scopes help prevent name clashes across your program’s code: names defined in one program unit don’t interfere with names in another.

As we’ll see, the place where a name is assigned in our code is crucial to determining what the name means. We’ll also find that scope usage can have a major impact on program maintenance effort; overuse of globals, for example, is a generally bad thing. On the plus side, we’ll learn that scopes can provide a way to retain state information between function calls, and offer an alternative to classes in some roles.

Python Scope Basics

Now that you’re ready to start writing your own functions, we need to get more formal about what names mean in Python. When you use a name in a program, Python creates, changes, or looks up the name in what is known as a namespace—a place where names live. When we talk about the search for a name’s value in relation to code, the term scope refers to a namespace: that is, the location of a name’s assignment in your source code determines the scope of the name’s visibility to your code.

Just about everything related to names, including scope classification, happens at assignment time in Python. As we’ve seen, names in Python spring into existence when they are first assigned values, and they must be assigned before they are used. Because names are not declared ahead of time, Python uses the location of the assignment of a name to associate it with (i.e., bind it to) a particular namespace. In other words, the place where you assign a name in your source code determines the namespace it will live in, and hence its scope of visibility.

Besides packaging code for reuse, functions add an extra namespace layer to your programs to minimize the potential for collisions among variables of the same name—by default, all names assigned inside a function are associated with that function’s namespace, and no other. This rule means that:

§ Names assigned inside a def can only be seen by the code within that def. You cannot even refer to such names from outside the function.

§ Names assigned inside a def do not clash with variables outside the def, even if the same names are used elsewhere. A name X assigned outside a given def (i.e., in a different def or at the top level of a module file) is a completely different variable from a name X assigned inside that def.

In all cases, the scope of a variable (where it can be used) is always determined by where it is assigned in your source code and has nothing to do with which functions call which. In fact, as we’ll learn in this chapter, variables may be assigned in three different places, corresponding to three different scopes:

§ If a variable is assigned inside a def, it is local to that function.

§ If a variable is assigned in an enclosing def, it is nonlocal to nested functions.

§ If a variable is assigned outside all defs, it is global to the entire file.

We call this lexical scoping because variable scopes are determined entirely by the locations of the variables in the source code of your program files, not by function calls.

For example, in the following module file, the X = 99 assignment creates a global variable named X (visible everywhere in this file), but the X = 88 assignment creates a local variable X (visible only within the def statement):

X = 99                   # Global (module) scope X

def func():
    X = 88               # Local (function) scope X: a different variable

Even though both variables are named X, their scopes make them different. The net effect is that function scopes help to avoid name clashes in your programs and help to make functions more self-contained program units—their code need not be concerned with names used elsewhere.
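As a minimal sketch of the two rules listed earlier, the following extends the example with a second local name. The name local_only and the visible_outside flag are hypothetical, added here only for illustration:

```python
X = 99                      # Global (module) scope X

def func():
    X = 88                  # Local X: a different variable from the global X
    local_only = 'spam'     # Exists only while func runs
    return X

result = func()             # 88: the local X is used inside the function

try:
    local_only              # Not visible outside the def
except NameError:
    visible_outside = False
else:
    visible_outside = True

print(result, X, visible_outside)   # 88 99 False
```

The global X is still 99 after the call, and the reference to local_only outside the def fails with a NameError.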

Scope Details

Before we started writing functions, all the code we wrote was at the top level of a module (i.e., not nested in a def), so the names we used either lived in the module itself or were built-ins predefined by Python (e.g., open). Technically, the interactive prompt is a module named __main__ that prints results and doesn’t save its code; in all other ways, though, it’s like the top level of a module file.

Functions, though, provide nested namespaces (scopes) that localize the names they use, such that names inside a function won’t clash with those outside it (in a module or another function). Functions define a local scope and modules define a global scope with the following properties:

§ The enclosing module is a global scope. Each module is a global scope—that is, a namespace in which variables created (assigned) at the top level of the module file live. Global variables become attributes of a module object to the outside world after imports but can also be used as simple variables within the module file itself.

§ The global scope spans a single file only. Don’t be fooled by the word “global” here—names at the top level of a file are global to code within that single file only. There is really no notion of a single, all-encompassing global file-based scope in Python. Instead, names are partitioned into modules, and you must always import a module explicitly if you want to be able to use the names its file defines. When you hear “global” in Python, think “module.”

§ Assigned names are local unless declared global or nonlocal. By default, all the names assigned inside a function definition are put in the local scope (the namespace associated with the function call). If you need to assign a name that lives at the top level of the module enclosing the function, you can do so by declaring it in a global statement inside the function. If you need to assign a name that lives in an enclosing def, as of Python 3.X you can do so by declaring it in a nonlocal statement.

§ All other names are enclosing function locals, globals, or built-ins. Names not assigned a value in the function definition are assumed to be enclosing scope locals, defined in a physically surrounding def statement; globals that live in the enclosing module’s namespace; or built-ins in the predefined built-ins module Python provides.

§ Each call to a function creates a new local scope. Every time you call a function, you create a new local scope—that is, a namespace in which the names created inside that function will usually live. You can think of each def statement (and lambda expression) as defining a new local scope, but the local scope actually corresponds to a function call. Because Python allows functions to call themselves to loop—an advanced technique known as recursion and noted briefly in Chapter 9 when we explored comparisons—each active call receives its own copy of the function’s local variables. Recursion is useful in functions we write as well, to process structures whose shapes can’t be predicted ahead of time; we’ll explore it more fully in Chapter 19.
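To see the last of these properties in action, here is a small sketch built around a hypothetical recursive function named countdown (not part of the original examples). Each active call keeps its own n and label, so the value returned by the outermost call is unaffected by the nested calls it makes:

```python
def countdown(n):
    label = 'call with n=%d' % n      # Local: a separate copy per active call
    if n > 0:
        countdown(n - 1)              # The nested call gets its own n and label
    return label                      # Still this call's value, not the nested one's

print(countdown(2))                   # call with n=2
```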

There are a few subtleties worth underscoring here. First, keep in mind that code typed at the interactive command prompt lives in a module, too, and follows the normal scope rules: names assigned there are global variables, accessible to the entire interactive session. You’ll learn more about modules in the next part of this book.

Also note that any type of assignment within a function classifies a name as local. This includes = statements, module names in import, function names in def, function argument names, and so on. If you assign a name in any way within a def, it will become a local to that function by default.

Conversely, in-place changes to objects do not classify names as locals; only actual name assignments do. For instance, if the name L is assigned to a list at the top level of a module, a statement L = X within a function will classify L as a local, but L.append(X) will not. In the latter case, we are changing the list object that L references, not L itself—L is found in the global scope as usual, and Python happily modifies it without requiring a global (or nonlocal) declaration. As usual, it helps to keep the distinction between names and objects clear: changing an object is not an assignment to a name.
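The difference between assigning a name and changing an object can be sketched as follows. The function names rebind and change_in_place are hypothetical, chosen only to label the two cases:

```python
L = [1, 2]                  # Global name L references a list

def rebind():
    L = [99]                # Assignment: creates a new local L, global untouched

def change_in_place():
    L.append(3)             # No assignment to L: the global is looked up and its object changed

rebind()
after_rebind = list(L)      # Copy of the global list after rebind runs
change_in_place()
print(after_rebind, L)      # [1, 2] [1, 2, 3]
```

Neither function needs a global declaration: the first never touches the global L at all, and the second only references it.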

Name Resolution: The LEGB Rule

If the prior section sounds confusing, it really boils down to three simple rules. With a def statement:

§ Name assignments create or change local names by default.

§ Name references search at most four scopes: local, then enclosing functions (if any), then global, then built-in.

§ Names declared in global and nonlocal statements map assigned names to enclosing module and function scopes, respectively.

In other words, all names assigned inside a function def statement (or a lambda, an expression we’ll meet later) are locals by default. Functions can freely use names assigned in syntactically enclosing functions and the global scope, but they must declare such nonlocals and globals in order to change them.

Python’s name-resolution scheme is sometimes called the LEGB rule, after the scope names:

§ When you use an unqualified name inside a function, Python searches up to four scopes—the local (L) scope, then the local scopes of any enclosing (E) defs and lambdas, then the global (G) scope, and then the built-in (B) scope—and stops at the first place the name is found. If the name is not found during this search, Python reports an error.

§ When you assign a name in a function (instead of just referring to it in an expression), Python always creates or changes the name in the local scope, unless it’s declared to be global or nonlocal in that function.

§ When you assign a name outside any function (i.e., at the top level of a module file, or at the interactive prompt), the local scope is the same as the global scope—the module’s namespace.

Because names must be assigned before they can be used (as we learned in Chapter 6), there are no automatic components in this model: assignments always determine name scopes unambiguously. Figure 17-1 illustrates Python’s four scopes. Note that the second scope lookup layer, E—the scopes of enclosing defs or lambdas—can technically correspond to more than one lookup level. This case only comes into play when you nest functions within functions, and is enhanced by the nonlocal statement in 3.X.[35]

Figure 17-1. The LEGB scope lookup rule. When a variable is referenced, Python searches for it in this order: in the local scope, in any enclosing functions’ local scopes, in the global scope, and finally in the built-in scope. The first occurrence wins. The place in your code where a variable is assigned usually determines its scope. In Python 3.X, nonlocal declarations can also force names to be mapped to enclosing function scopes, whether assigned or not.

Also keep in mind that these rules apply only to simple variable names (e.g., spam). In Parts V and VI, we’ll see that qualified attribute names (e.g., object.spam) live in particular objects and follow a completely different set of lookup rules than those covered here. References to attribute names following periods (.) search one or more objects, not scopes, and in fact may invoke something called inheritance in Python’s OOP model; more on this in Part VI of this book.
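As a rough illustration of the LEGB search order, the sketch below references one name from each of the four scopes inside a nested function. All of the names used (outer, inner, missing, and the variables) are hypothetical:

```python
X = 'global X'                      # G: module scope

def outer():
    Y = 'enclosing Y'               # E: local to outer, enclosing for inner
    def inner():
        Z = 'local Z'               # L: local to inner
        return Z, Y, X, len('abc')  # Found in L, E, G, and B, respectively
    return inner()

print(outer())                      # ('local Z', 'enclosing Y', 'global X', 3)

def missing():
    return undefined_name           # Assigned in none of the four scopes

try:
    missing()
except NameError as exc:
    print('NameError:', exc)        # The search failed in all four scopes
```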

Other Python scopes: Preview

Though obscure at this point in the book, there are technically three more scopes in Python—temporary loop variables in some comprehensions, exception reference variables in some try handlers, and local scopes in class statements. The first two of these are special cases that rarely impact real code, and the third falls under the LEGB umbrella rule.

Most statement blocks and other constructs do not localize the names used within them, with the following version-specific exceptions (whose variables are not available to, but also will not clash with, surrounding code, and which involve topics covered in full later):

§ Comprehension variables: the variable X used to refer to the current iteration item in a comprehension expression such as [X for X in I]. Because they might clash with other names and reflect internal state in generators, in 3.X, such variables are local to the expression itself in all comprehension forms: generator, list, set, and dictionary. In 2.X, they are local to generator expressions and set and dictionary comprehensions, but not to list comprehensions, which map their names to the scope outside the expression. By contrast, for loop statements never localize their variables to the statement block in any Python. See Chapter 20 for more details and examples.

§ Exception variables: the variable X used to reference the raised exception in a try statement handler clause such as except E as X. Because they might defer garbage collection’s memory recovery, in 3.X, such variables are local to that except block, and in fact are removed when the block is exited (even if you’ve used the same name earlier in your code!). In 2.X, these variables live on after the try statement. See Chapter 34 for additional information.

These contexts augment the LEGB rule, rather than modifying it. Variables assigned in a comprehension, for example, are simply bound to a further nested and special-case scope; other names referenced within these expressions follow the usual LEGB lookup rules.
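The following sketch shows the Python 3.X behavior of both special cases. It assumes 3.X only (the 2.X results differ, as described above), and its function names are hypothetical; the code is wrapped in functions simply to keep each experiment’s names separate:

```python
def comprehension_demo():
    X = 'outer'
    squares = [X ** 2 for X in range(3)]   # This X is local to the comprehension in 3.X
    return X, squares

def for_loop_demo():
    X = 'outer'
    for X in range(3):                     # for statements do not localize X
        pass
    return X

def exception_demo():
    X = 'assigned earlier'
    try:
        1 / 0
    except ZeroDivisionError as X:
        message = str(X)                   # Copy what is needed inside the block
    try:
        X                                  # In 3.X, X was removed on block exit
    except NameError:
        return message, 'X was removed'
    return message, 'X survived'

print(comprehension_demo())                # ('outer', [0, 1, 4])
print(for_loop_demo())                     # 2
print(exception_demo()[1])                 # X was removed
```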

It’s also worth noting that the class statement we’ll meet in Part VI creates a new local scope too for the names assigned inside the top level of its block. As for def, names assigned inside a class don’t clash with names elsewhere, and follow the LEGB lookup rule, where the class block is the “L” level. Like modules and imports, these names also morph into class object attributes after the class statement ends.

Unlike functions, though, class names are not created per call: class object calls generate instances, which inherit names assigned in the class and record per-object state as attributes. As we’ll also learn in Chapter 29, although the LEGB rule is used to resolve names used in both the top level of a class itself as well as the top level of method functions nested within it, classes themselves are skipped by scope lookups—their names must be fetched as object attributes. Because Python searches enclosing functions for referenced names, but not enclosing classes, the LEGB rule still applies to OOP code.
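Although classes are covered much later, a brief preview may make this rule more concrete. In the sketch below, which uses hypothetical names throughout, code at the top level of the class block sees the class’s own X, but a method nested in the class skips the class scope and finds the global X unless it goes through an object attribute:

```python
X = 'global'

class C:
    X = 'class'                 # Local to the class block; becomes attribute C.X
    Y = X + ' level'            # Code at the top of the class block sees the class's X

    def by_name(self):
        return X                # Class scope is skipped: finds the global X

    def by_attribute(self):
        return self.X           # Class names are fetched as object attributes

I = C()
print(C.Y, I.by_name(), I.by_attribute())   # class level global class
```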

Scope Example

Let’s step through a larger example that demonstrates scope ideas. Suppose we wrote the following code in a module file:

# Global scope
X = 99                # X and func assigned in module: global

def func(Y):          # Y and Z assigned in function: locals
    # Local scope
    Z = X + Y         # X is a global
    return Z

func(1)               # func in module: result=100

This module and the function it contains use a number of names to do their business. Using Python’s scope rules, we can classify the names as follows:

Global names: X, func

X is global because it’s assigned at the top level of the module file; it can be referenced inside the function as a simple unqualified variable without being declared global. func is global for the same reason; the def statement assigns a function object to the name func at the top level of the module.

Local names: Y, Z

Y and Z are local to the function (and exist only while the function runs) because they are both assigned values in the function definition: Z by virtue of the = statement, and Y because arguments are always passed by assignment.

The underlying rationale for this name-segregation scheme is that local variables serve as temporary names that you need only while a function is running. For instance, in the preceding example, the argument Y and the addition result Z exist only inside the function; these names don’t interfere with the enclosing module’s namespace (or any other function, for that matter). In fact, local variables are removed from memory when the function call exits, and objects they reference may be garbage-collected if not referenced elsewhere. This is an automatic, internal step, but it helps minimize memory requirements.

The local/global distinction also makes functions easier to understand, as most of the names a function uses appear in the function itself, not at some arbitrary place in a module. Also, because you can be sure that local names will not be changed by some remote function in your program, they tend to make programs easier to debug and modify. Functions are self-contained units of software.

The Built-in Scope

We’ve been talking about the built-in scope in the abstract, but it’s a bit simpler than you may think. Really, the built-in scope is just a built-in module called builtins, but you have to import builtins to query built-ins because the name builtins is not itself built in...

No, I’m serious! The built-in scope is implemented as a standard library module named builtins in 3.X, but that name itself is not placed in the built-in scope, so you have to import it in order to inspect it. Once you do, you can run a dir call to see which names are predefined. In Python 3.3 (see ahead for 2.X usage):

>>> import builtins

>>> dir(builtins)

['ArithmeticError', 'AssertionError', 'AttributeError', 'BaseException',

'BlockingIOError', 'BrokenPipeError', 'BufferError', 'BytesWarning',

...many more names omitted...

'ord', 'pow', 'print', 'property', 'quit', 'range', 'repr', 'reversed',

'round', 'set', 'setattr', 'slice', 'sorted', 'staticmethod', 'str', 'sum',

'super', 'tuple', 'type', 'vars', 'zip']

The names in this list constitute the built-in scope in Python; roughly the first half are built-in exceptions, and the second half are built-in functions. Also in this list are the special names None, True, and False, though they are treated as reserved words in 3.X. Because Python automatically searches this module last in its LEGB lookup, you get all the names in this list “for free”; that is, you can use them without importing any modules. Thus, there are really two ways to refer to a built-in function: by taking advantage of the LEGB rule, or by manually importing the builtins module:

>>> zip # The normal way

<class 'zip'>

>>> import builtins # The hard way: for customizations

>>> builtins.zip

<class 'zip'>

>>> zip is builtins.zip # Same object, different lookups

True

The second of these approaches is sometimes useful in advanced ways we’ll meet in this chapter’s sidebars.

Redefining built-in names: For better or worse

The careful reader might also notice that because the LEGB lookup procedure takes the first occurrence of a name that it finds, names in the local scope may override variables of the same name in both the global and built-in scopes, and global names may override built-ins. A function can, for instance, create a local variable called open by assigning to it:

def hider():
    open = 'spam'              # Local variable, hides built-in here
    ...
    open('data.txt')           # Error: this no longer opens a file in this scope!

However, this will hide the built-in function called open that lives in the built-in (outer) scope, such that the name open will no longer work within the function to open files—it’s now a string, not the opener function. This isn’t a problem if you don’t need to open files in this function, but triggers an error if you attempt to open through this name.

This can even occur more simply at the interactive prompt, which works as a global, module scope:

>>> open = 99 # Assign in global scope, hides built-in here too

Now, there is nothing inherently wrong with using a built-in name for variables of your own, as long as you don’t need the original built-in version. After all, if these were truly off limits, we would need to memorize the entire built-in names list and treat all its names as reserved. With over 140 names in this module in 3.3, that would be far too restrictive and daunting:

>>> len(dir(builtins)), len([x for x in dir(builtins) if not x.startswith('__')])

(148, 142)

In fact, there are times in advanced programming where you may really want to replace a built-in name by redefining it in your code—to define a custom open that verifies access attempts, for instance (see this chapter’s sidebar Breaking the Universe in Python 2.X for more on this thread).

Still, redefining a built-in name is often a bug, and a nasty one at that, because Python will not issue a warning message about it. Tools like PyChecker (see the Web) can warn you of such mistakes, but knowledge may be your best defense on this point: don’t redefine a built-in name you need. If you accidentally reassign a built-in name at the interactive prompt this way, you can either restart your session or run a del name statement to remove the redefinition from your scope, thereby restoring the original in the built-in scope.
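The del recovery step can be sketched as follows, written here as top-level module code in 3.X rather than as an interactive session. The built-in len is used as the victim purely as an example:

```python
import builtins

len = 99                        # Global len now hides the built-in
hidden = len is not builtins.len

del len                         # Removes the global name only
restored = len is builtins.len  # Lookup falls through to the built-in scope again

print(hidden, restored, len('spam'))   # True True 4
```

The del statement removes only the global name; the original in the built-in scope was never changed, so it becomes visible again.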

Note that functions can similarly hide global variables of the same name with locals, but this is more broadly useful, and in fact is much of the point of local scopes—because they minimize the potential for name clashes, your functions are self-contained namespace scopes:

X = 88                         # Global X

def func():
    X = 99                     # Local X: hides global, but we want this here

func()
print(X)                       # Prints 88: unchanged

Here, the assignment within the function creates a local X that is a completely different variable from the global X in the module outside the function. As one consequence, though, there is no way to change a name outside a function without adding a global (or nonlocal) declaration to the def, as described in the next section.

NOTE

Version skew note: Actually, the tongue twisting gets a bit worse. The Python 3.X builtins module used here is named __builtin__ in Python 2.X. In addition, the name __builtins__ (with the s) is preset in most global scopes, including the interactive session, to reference the module known as builtins in 3.X and __builtin__ in 2.X, so you can often use __builtins__ without an import but cannot run an import on that name itself; it’s a preset variable, not a module’s name.

That is, in 3.X builtins is __builtins__ is True after you import builtins, and in 2.X __builtin__ is __builtins__ is True after you import __builtin__. The upshot is that we can usually inspect the built-in scope by simply running dir(__builtins__) with no import in both 3.X and 2.X, but we are advised to use builtins for real work and customization in 3.X, and __builtin__ for the same in 2.X. Who said documenting this stuff was easy?

BREAKING THE UNIVERSE IN PYTHON 2.X

Here’s another thing you can do in Python that you probably shouldn’t—because the names True and False in 2.X are just variables in the built-in scope and are not reserved, it’s possible to reassign them with a statement like True = False. Don’t worry: you won’t actually break the logical consistency of the universe in so doing! This statement merely redefines the word True for the single scope in which it appears to return False. All other scopes still find the originals in the built-in scope.

For more fun, though, in Python 2.X you could say __builtin__.True = False, to reset True to False for the entire Python process. This works because there is only one built-in scope module in a program, shared by all its clients. Alas, this type of assignment has been disallowed in Python 3.X, because True and False are treated as actual reserved words, just like None. In 2.X, though, it sends IDLE into a strange panic state that resets the user code process (in other words, don’t try this at home, kids).

This technique can be useful, however, both to illustrate the underlying namespace model, and for tool writers who must change built-ins such as open to customized functions. By reassigning a function’s name in the built-in scope, you reset it to your customization for every module in the process. If you do, you’ll probably also need to remember the original version to call from your customization—in fact, we’ll see one way to achieve this for a custom open in the sidebar Why You Will Care: Customizing open after we’ve had a chance to explore nested scope closures and state retention options.

Also, note again that third-party tools such as PyChecker, and others such as PyLint, will warn about common programming mistakes, including accidental assignment to built-in names (this is usually known as “shadowing” a built-in in such tools). It’s not a bad idea to run your first few Python programs through tools like these to see what they point out.
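As a rough sketch of the tool-writer technique described in this sidebar, the code below swaps a customized open into the 3.X built-in scope, remembers the original so the customization can call it, and restores it afterward. This is only one possible arrangement, not necessarily the one used later in this book; the names logging_open, original_open, and opened are hypothetical, and os.devnull is used just to have something safe to open:

```python
import builtins
import os

original_open = builtins.open        # Remember the original to call it later
opened = []                          # Hypothetical log of requested filenames

def logging_open(file, *args, **kwargs):
    opened.append(file)              # Customization: record the access attempt
    return original_open(file, *args, **kwargs)

builtins.open = logging_open         # Every module's unqualified open now finds this
try:
    with open(os.devnull) as f:      # LEGB lookup reaches the built-in scope
        pass
finally:
    builtins.open = original_open    # Restore the original built-in

print(os.devnull in opened, builtins.open is original_open)   # True True
```

Because the change affects the entire process, restoring the original in a finally clause limits the customization to the code that needs it.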


[35] The scope lookup rule was called the “LGB rule” in the first edition of this book. The enclosing def “E” layer was added later in Python to obviate the task of passing in enclosing scope names explicitly with default arguments—a topic usually of marginal interest to Python beginners that we’ll defer until later in this chapter. Since this scope is now addressed by the nonlocal statement in Python 3.X, the lookup rule might be better named “LNGB” today, but backward compatibility matters in books, too. The present form of this acronym also does not account for the newer obscure scopes of some comprehensions and exception handlers, but acronyms longer than four letters tend to defeat their purpose!

The global Statement

The global statement and its nonlocal 3.X cousin are the only things that are remotely like declaration statements in Python. They are not type or size declarations, though; they are namespace declarations. The global statement tells Python that a function plans to change one or more global names—that is, names that live in the enclosing module’s scope (namespace).

We’ve talked about global in passing already. Here’s a summary:

§ Global names are variables assigned at the top level of the enclosing module file.

§ Global names must be declared only if they are assigned within a function.

§ Global names may be referenced within a function without being declared.

In other words, global allows us to change names that live outside a def at the top level of a module file. As we’ll see later, the nonlocal statement is almost identical but applies to names in the enclosing def’s local scope, rather than names in the enclosing module.

The global statement consists of the keyword global, followed by one or more names separated by commas. All the listed names will be mapped to the enclosing module’s scope when assigned or referenced within the function body. For instance:

X = 88                         # Global X

def func():
    global X
    X = 99                     # Global X: outside def

func()
print(X)                       # Prints 99

We’ve added a global declaration to the example here, such that the X inside the def now refers to the X outside the def; they are the same variable this time, so changing X inside the function changes the X outside it. Here is a slightly more involved example of global at work:

y, z = 1, 2                    # Global variables in module

def all_global():
    global x                   # Declare globals assigned
    x = y + z                  # No need to declare y, z: LEGB rule

Here, x, y, and z are all globals inside the function all_global. y and z are global because they aren’t assigned in the function; x is global because it was listed in a global statement to map it to the module’s scope explicitly. Without the global here, x would be considered local by virtue of the assignment.

Notice that y and z are not declared global; Python’s LEGB lookup rule finds them in the module automatically. Also, notice that x does not even exist in the enclosing module before the function runs; in this case, the first assignment in the function creates x in the module.
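To verify that the function itself creates x in the module, we can check the module’s namespace before and after the call. This sketch uses the built-in globals function to inspect the module scope; the no_declaration function and the name w are hypothetical additions for contrast:

```python
y, z = 1, 2                       # Global variables in module

def all_global():
    global x                      # x maps to the module scope
    x = y + z                     # y, z found by the LEGB rule

def no_declaration():
    w = y + z                     # w is local: discarded when the call ends

existed_before = 'x' in globals() # x has not been assigned yet
all_global()
no_declaration()
print(existed_before, x, 'w' in globals())   # False 3 False
```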

Program Design: Minimize Global Variables

Functions in general, and global variables in particular, raise some larger design questions. How should our functions communicate? Although some of these will become more apparent when you begin writing larger functions of your own, a few guidelines up front might spare you from problems later. In general, functions should rely on arguments and return values instead of globals, but I need to explain why.

By default, names assigned in functions are locals, so if you want to change names outside functions you have to write extra code (e.g., global statements). This is deliberate—as is common in Python, you have to say more to do the potentially “wrong” thing. Although there are times when globals are useful, variables assigned in a def are local by default because that is normally the best policy. Changing globals can lead to well-known software engineering problems: because the variables’ values are dependent on the order of calls to arbitrarily distant functions, programs can become difficult to debug, or to understand at all.

Consider this module file, for example, which is presumably imported and used elsewhere:

X = 99

def func1():
    global X
    X = 88

def func2():
    global X
    X = 77

Now, imagine that it is your job to modify or reuse this code. What will the value of X be here? Really, that question has no meaning unless it’s qualified with a point of reference in time—the value of X is timing-dependent, as it depends on which function was called last (something we can’t tell from this file alone).

The net effect is that to understand this code, you have to trace the flow of control through the entire program. And, if you need to reuse or modify the code, you have to keep the entire program in your head all at once. In this case, you can’t really use one of these functions without bringing along the other. They are dependent on—that is, coupled with—the global variable. This is the problem with globals: they generally make code more difficult to understand and reuse than code consisting of self-contained functions that rely on locals.

On the other hand, short of using tools like nested scope closures or object-oriented programming with classes, global variables are probably the most straightforward way in Python to retain shared state information—information that a function needs to remember for use the next time it is called. Local variables disappear when the function returns, but globals do not. As we’ll see later, other techniques can achieve this, too, and allow for multiple copies of the retained information, but they are generally more complex than pushing values out to the global scope for retention in simple use cases where this applies.

Moreover, some programs designate a single module to collect globals; as long as this is expected, it is not as harmful. Programs that use multithreading to do parallel processing in Python also commonly depend on global variables—they become shared memory between functions running in parallel threads, and so act as a communication device.[36]

For now, though, especially if you are relatively new to programming, avoid the temptation to use globals whenever you can—they tend to make programs difficult to understand and reuse, and won’t work for cases where one copy of saved data is not enough. Try to communicate with passed-in arguments and return values instead. Six months from now, both you and your coworkers may be happy you did.
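As a simple comparison of the two communication styles, consider the following sketch. The function names bump_global and bump are hypothetical; the first changes a global as a side effect, while the second relies only on its argument and return value:

```python
# Global-based version: the result depends on call order and on X elsewhere
X = 99

def bump_global():
    global X
    X = X + 1

# Argument-based version: everything the function needs is passed in
def bump(value):
    return value + 1          # No outside names are changed

bump_global()                 # X is now 100, changed as a side effect
count = bump(99)              # 100: the caller decides where the result goes
count = bump(count)           # 101
print(X, count)               # 100 101
```

The second version can be understood, tested, and reused on its own, because nothing outside the function determines or records its result.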

Program Design: Minimize Cross-File Changes

Here’s another scope-related design issue: although we can change variables in another file directly, we usually shouldn’t. Module files were introduced in Chapter 3 and are covered in more depth in the next part of this book. To illustrate their relationship to scopes, consider these two module files:

# first.py

X = 99 # This code doesn't know about second.py

# second.py

import first

print(first.X) # OK: references a name in another file

first.X = 88 # But changing it can be too subtle and implicit

The first defines a variable X, which the second prints and then changes by assignment. Notice that we must import the first module into the second file to get to its variable at all—as we’ve learned, each module is a self-contained namespace (package of variables), and we must import one module to see inside it from another. That’s the main point about modules: by segregating variables on a per-file basis, they avoid name collisions across files, in much the same way that local variables avoid name clashes across functions.

Really, though, in terms of this chapter’s topic, the global scope of a module file becomes the attribute namespace of the module object once it is imported. Importers automatically have access to all of the file’s global variables as attributes of that object.

After importing the first module, the second module prints its variable and then assigns it a new value. Referencing the module’s variable to print it is fine—this is how modules are linked together into a larger system normally. The problem with the assignment to first.X, however, is that it is far too implicit: whoever’s charged with maintaining or reusing the first module probably has no clue that some arbitrarily far-removed module on the import chain can change X out from under him or her at runtime. In fact, the second module may be in a completely different directory, and so difficult to notice at all.

Although such cross-file variable changes are always possible in Python, they are usually much more subtle than you will want. Again, this sets up too strong a coupling between the two files—because they are both dependent on the value of the variable X, it’s difficult to understand or reuse one file without the other. Such implicit cross-file dependencies can lead to inflexible code at best, and outright bugs at worst.

Here again, the best prescription is generally to not do this—the best way to communicate across file boundaries is to call functions, passing in arguments and getting back return values. In this specific case, we would probably be better off coding an accessor function to manage the change:

# first.py

X = 99

def setX(new):                     # Accessors make external changes explicit

    global X                       # And can manage access in a single place

    X = new

# second.py

import first

first.setX(88) # Call the function instead of changing directly

This requires more code and may seem like a trivial change, but it makes a huge difference in terms of readability and maintainability: when a person reading the first module by itself sees a function, that person will know that it is a point of interface and will expect X to change. In other words, it removes the element of surprise, which is rarely a good thing in software projects. Although we cannot prevent cross-file changes from happening, common sense dictates that they should be minimized unless widely accepted across the program.

NOTE

When we meet classes in Part VI, we’ll see similar techniques for coding attribute accessors. Unlike modules, classes can also intercept attribute fetches automatically with operator overloading, even when accessors aren’t used by their clients.

Other Ways to Access Globals

Interestingly, because global-scope variables morph into the attributes of a loaded module object, we can emulate the global statement by importing the enclosing module and assigning to its attributes, as in the following example module file. Code in this file imports the enclosing module, first by name, and then by indexing the sys.modules loaded modules table (more on this table in Chapter 22 and Chapter 25):

# thismod.py

var = 99 # Global variable == module attribute

def local():

    var = 0                        # Change local var

def glob1():

    global var                     # Declare global (normal)

    var += 1                       # Change global var

def glob2():

    var = 0                        # Change local var

    import thismod                 # Import myself

    thismod.var += 1               # Change global var

def glob3():

    var = 0                        # Change local var

    import sys                     # Import system table

    glob = sys.modules['thismod']  # Get module object (or use __name__)

    glob.var += 1                  # Change global var

def test():

    print(var)

    local(); glob1(); glob2(); glob3()

    print(var)

When run, this adds 3 to the global variable (only the first function does not impact it):

>>> import thismod

>>> thismod.test()

99

102

>>> thismod.var

102

This works, and it illustrates the equivalence of globals to module attributes, but it’s much more work than using the global statement to make your intentions explicit.

As we’ve seen, global allows us to change names in a module outside a function. It has a close relative named nonlocal that can be used to change names in enclosing functions, too—but to understand how that can be useful, we first need to explore enclosing functions in general.


[36] Multithreading runs function calls in parallel with the rest of the program and is supported by Python’s standard library modules _thread, threading, and queue (thread, threading, and Queue in Python 2.X). Because all threaded functions run in the same process, global scopes often serve as one form of shared memory between them (threads may share both names in global scopes, as well as objects in a process’s memory space). Threading is commonly used for long-running tasks in GUIs, to implement nonblocking operations in general and to maximize CPU capacity. It is also beyond this book’s scope; see the Python library manual, as well as the follow-up texts listed in the preface (such as O’Reilly’s Programming Python), for more details.

Scopes and Nested Functions

So far, I’ve omitted one part of Python’s scope rules on purpose, because it’s relatively uncommon to encounter it in practice. However, it’s time to take a deeper look at the letter E in the LEGB lookup rule. The E layer was added in Python 2.2; it takes the form of the local scopes of any and all enclosing functions. Enclosing scopes are sometimes also called statically nested scopes. Really, the nesting is a lexical one: nested scopes correspond to physically and syntactically nested code structures in your program’s source code text.

Nested Scope Details

With the addition of nested function scopes, variable lookup rules become slightly more complex. Within a function:

§ A reference (X) looks for the name X first in the current local scope (function); then in the local scopes of any lexically enclosing functions in your source code, from inner to outer; then in the current global scope (the module file); and finally in the built-in scope (the module builtins). global declarations make the search begin in the global (module file) scope instead.

§ An assignment (X = value) creates or changes the name X in the current local scope, by default. If X is declared global within the function, the assignment creates or changes the name X in the enclosing module’s scope instead. If, on the other hand, X is declared nonlocal within the function in 3.X (only), the assignment changes the name X in the closest enclosing function’s local scope.

Notice that the global declaration still maps variables to the enclosing module. When nested functions are present, variables in enclosing functions may be referenced, but they require 3.X nonlocal declarations to be changed.
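The reference rule can be traced in a few lines. In this sketch (the names outer, inner, X, Y, and Z are placeholders chosen for illustration), each name used inside inner is found at a different layer of the LEGB search:

```python
X = 'global X'                 # G: assigned at the top level of the module

def outer():
    Y = 'enclosing Y'          # E: local to outer, enclosing for inner

    def inner():
        Z = 'local Z'          # L: local to inner
        return Z, Y, X, len    # len is found last, in the built-in scope (B)

    return inner()

result = outer()
print(result[:3])              # ('local Z', 'enclosing Y', 'global X')
print(result[3] is len)        # True
```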

Nested Scope Examples

To clarify the prior section’s points, let’s illustrate with some real code. Here is what an enclosing function scope looks like (type this into a script file or at the interactive prompt to run it live):

X = 99 # Global scope name: not used

def f1():

    X = 88                         # Enclosing def local

    def f2():

        print(X)                   # Reference made in nested def

    f2()

f1() # Prints 88: enclosing def local

First off, this is legal Python code: the def is simply an executable statement, which can appear anywhere any other statement can—including nested in another def. Here, the nested def runs while a call to the function f1 is running; it generates a function and assigns it to the name f2, a local variable within f1’s local scope. In a sense, f2 is a temporary function that lives only during the execution of (and is visible only to code in) the enclosing f1.

But notice what happens inside f2: when it prints the variable X, it refers to the X that lives in the enclosing f1 function’s local scope. Because functions can access names in all physically enclosing def statements, the X in f2 is automatically mapped to the X in f1, by the LEGB lookup rule.

This enclosing scope lookup works even if the enclosing function has already returned. For example, the following code defines a function that makes and returns another function, and represents a more common usage pattern:

def f1():

    X = 88

    def f2():

        print(X)                   # Remembers X in enclosing def scope

    return f2                      # Return f2 but don't call it

action = f1() # Make, return function

action() # Call it now: prints 88

In this code, the call to action is really running the function we named f2 when f1 ran. This works because functions are objects in Python like everything else, and can be passed back as return values from other functions. Most importantly, f2 remembers the enclosing scope’s X in f1, even though f1 is no longer active—which leads us to the next topic.

Factory Functions: Closures

Depending on whom you ask, this sort of behavior is also sometimes called a closure or a factory function—the former describing a functional programming technique, and the latter denoting a design pattern. Whatever the label, the function object in question remembers values in enclosing scopes regardless of whether those scopes are still present in memory. In effect, they have attached packets of memory (a.k.a. state retention), which are local to each copy of the nested function created, and often provide a simple alternative to classes in this role.

A simple function factory

Factory functions (a.k.a. closures) are sometimes used by programs that need to generate event handlers on the fly in response to conditions at runtime. For instance, imagine a GUI that must define actions according to user inputs that cannot be anticipated when the GUI is built. In such cases, we need a function that creates and returns another function, with information that may vary per function made.

To illustrate this in simple terms, consider the following function, typed at the interactive prompt (and shown here without the “...” continuation-line prompts, per the presentation note ahead):

>>> def maker(N):

        def action(X):             # Make and return action

            return X ** N          # action retains N from enclosing scope

        return action

This defines an outer function that simply generates and returns a nested function, without calling it—maker makes action, but simply returns action without running it. If we call the outer function:

>>> f = maker(2) # Pass 2 to argument N

>>> f

<function maker.<locals>.action at 0x0000000002A4A158>

what we get back is a reference to the generated nested function—the one created when the nested def runs. If we now call what we got back from the outer function:

>>> f(3) # Pass 3 to X, N remembers 2: 3 ** 2

9

>>> f(4) # 4 ** 2

16

we invoke the nested function—the one called action within maker. In other words, we’re calling the nested function that maker created and passed back.

Perhaps the most unusual part of this, though, is that the nested function remembers integer 2, the value of the variable N in maker, even though maker has returned and exited by the time we call action. In effect, N from the enclosing local scope is retained as state information attached to the generated action, which is why we get back its argument squared when it is later called.

Just as important, if we now call the outer function again, we get back a new nested function with different state information attached. That is, we get the argument cubed instead of squared when calling the new function, but the original still squares as before:

>>> g = maker(3) # g remembers 3, f remembers 2

>>> g(4) # 4 ** 3

64

>>> f(4) # 4 ** 2

16

This works because each call to a factory function like this gets its own set of state information. In our case, the function we assign to name g remembers 3, and f remembers 2, because each has its own state information retained by the variable N in maker.
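If you want to see where this state lives, function objects expose it. The sketch below repeats the maker factory and then inspects the __closure__ and __code__ attributes of the functions it returns; treat the inspection as a peek at interpreter internals rather than something everyday code should depend on:

```python
def maker(N):
    def action(X):
        return X ** N                        # N is a free variable in action
    return action

f = maker(2)
g = maker(3)

print(f.__code__.co_freevars)                # ('N',)
print(f.__closure__[0].cell_contents)        # 2
print(g.__closure__[0].cell_contents)        # 3
```

Each returned function carries its own cell for N, which is why f and g do not interfere with each other.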

This is a somewhat advanced technique that you may not see very often in most code, and may be popular among programmers with backgrounds in functional programming languages. On the other hand, enclosing scopes are often employed by the lambda function-creation expressions we’ll expand on later in this chapter—because they are expressions, they are almost always nested within a def. For example, a lambda would serve in place of a def in our example:

>>> def maker(N):

        return lambda X: X ** N    # lambda functions retain state too

>>> h = maker(3)

>>> h(4) # 4 ** 3 again

64

For a more tangible example of closures at work, see the upcoming sidebar Why You Will Care: Customizing open. It uses similar techniques to store information for later use in an enclosing scope.

NOTE

Presentation note: In this chapter, I’ve started listing interactive examples without the “...” continuation-line prompts that may or may not appear in your interface (they do at the shell, but not in IDLE). This convention will be followed from this point on to make larger code examples a bit easier to cut and paste from an ebook or other. I’m assuming that by now you understand indentation rules and have had your fair share of typing Python code, and some functions and classes ahead may be too large for rote input.

I’m also listing more and more code alone or in files, and switching between these and interactive input arbitrarily; when you see a “>>>” prompt, the code is typed interactively, and can generally be cut and pasted into your Python shell if you omit the “>>>” itself. If this fails, you can still run by pasting line by line, or editing in a file.

Closures versus classes, round 1

To some, classes, described in full in Part VI of this book, may seem better at state retention like this, because they make their memory more explicit with attribute assignments. Classes also directly support additional tools that closure functions do not, such as customization by inheritance and operator overloading, and more naturally implement multiple behaviors in the form of methods. Because of such distinctions, classes may be better at implementing more complete objects.

Still, closure functions often provide a lighter-weight and viable alternative when retaining state is the only goal. They provide for per-call localized storage for data required by a single nested function. This is especially true when we add the 3.X nonlocal statement described ahead to allow enclosing scope state changes (in 2.X, enclosing scopes are read-only, and so have more limited uses).

From a broader perspective, there are multiple ways for Python functions to retain state between calls. Although the values of normal local variables go away when a function returns, values can be retained from call to call in global variables; in class instance attributes; in the enclosing scope references we’ve met here; and in argument defaults and function attributes. Some might add mutable default arguments to this list too (though others may wish they didn’t).

We’ll preview class-based alternatives and meet function attributes later in this chapter, and get the full story on arguments and defaults in Chapter 18. To help us judge how defaults compete on state retention, though, the next section gives enough of an introduction to get us started.

NOTE

Closures can also be created when a class is nested in a def: the values of the enclosing function’s local names are retained by references within the class, or one of its method functions. See Chapter 29 for more on nested classes. As we’ll see in later examples (e.g., Chapter 39’s decorators), the outer def in such code serves a similar role: it becomes a class factory, and provides state retention for the nested class.

Retaining Enclosing Scope State with Defaults

In early versions of Python (prior to 2.2), the sort of code in the prior section failed because nested defs did not do anything about scopes—a reference to a variable within f2 in the following would search only the local (f2), then global (the code outside f1), and then built-in scopes. Because it skipped the scopes of enclosing functions, an error would result. To work around this, programmers typically used default argument values to pass in and remember the objects in an enclosing scope:

def f1():

    x = 88

    def f2(x=x):                   # Remember enclosing scope X with defaults

        print(x)

    f2()

f1() # Prints 88

This coding style works in all Python releases, and you’ll still see this pattern in some existing Python code. In fact, it’s still required for loop variables, as we’ll see in a moment, which is why it remains worth studying today. In short, the syntax arg=val in a def header means that the argument arg will default to the value val if no real value is passed to arg in a call. This syntax is used here to explicitly assign enclosing scope state to be retained.

Specifically, in the modified f2 here, the x=x means that the argument x will default to the value of x in the enclosing scope—because the second x is evaluated before Python steps into the nested def, it still refers to the x in f1. In effect, the default argument remembers what x was in f1: the object 88.

That’s fairly complex, and it depends entirely on the timing of default value evaluations. In fact, the nested scope lookup rule was added to Python to make defaults unnecessary for this role: today, Python automatically remembers any values required in the enclosing scope for use in nested defs.
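The timing difference is easy to observe directly. In this sketch (the nested function names by_default and by_scope are made up for the demonstration), x is rebound after both nested defs have run; the default keeps the value it saw when its def ran, while the enclosing scope reference sees the later value:

```python
def f1():
    x = 88

    def by_default(x=x):       # Default evaluated now, while x is 88
        return x

    def by_scope():            # x is looked up later, when by_scope is called
        return x

    x = 99                     # Rebind x after both defs have run
    return by_default(), by_scope()

print(f1())                    # (88, 99)
```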

Of course, the best prescription for much code is simply to avoid nesting defs within defs, as it will make your programs much simpler—in the Pythonic view, flat is generally better than nested. The following is an equivalent of the prior example that avoids nesting altogether. Notice the forward reference in this code—it’s OK to call a function defined after the function that calls it, as long as the second def runs before the first function is actually called. Code inside a def is never evaluated until the function is actually called:

>>> def f1():

        x = 88                     # Pass x along instead of nesting

        f2(x)                      # Forward reference OK

>>> def f2(x):

        print(x)                   # Flat is still often better than nested!

>>> f1()

88

If you avoid nesting this way, you can almost forget about the nested scopes concept in Python. On the other hand, the nested functions of closure (factory) functions are fairly common in modern Python code, as are lambda functions—which almost naturally appear nested in defs and often rely on the nested scopes layer, as the next section explains.

Nested scopes, defaults, and lambdas

Although they see increasing use in defs these days, you may be more likely to care about nested function scopes when you start coding or reading lambda expressions. We’ve met lambda briefly and won’t cover it in depth until Chapter 19, but in short, it’s an expression that generates a new function to be called later, much like a def statement. Because it’s an expression, though, it can be used in places that def cannot, such as within list and dictionary literals.

Like a def, a lambda expression also introduces a new local scope for the function it creates. Thanks to the enclosing scopes lookup layer, lambdas can see all the variables that live in the functions in which they are coded. Thus, the following code—a variation on the factory we saw earlier—works, but only because the nested scope rules are applied:

def func():

    x = 4

    action = (lambda n: x ** n)    # x remembered from enclosing def

    return action

x = func()

print(x(2)) # Prints 16, 4 ** 2

Prior to the introduction of nested function scopes, programmers used defaults to pass values from an enclosing scope into lambdas, just as for defs. For instance, the following works on all Pythons:

def func():

    x = 4

    action = (lambda n, x=x: x ** n)   # Pass x in manually

    return action

Because lambdas are expressions, they naturally (and even normally) nest inside enclosing defs. Hence, they were perhaps the biggest initial beneficiaries of the addition of enclosing function scopes in the lookup rules; in most cases, it is no longer necessary to pass values into lambdas with defaults.

Loop variables may require defaults, not scopes

There is one notable exception to the rule I just gave (and a reason why I’ve shown you the otherwise dated default argument technique we just saw): if a lambda or def defined within a function is nested inside a loop, and the nested function references an enclosing scope variable that is changed by that loop, all functions generated within the loop will have the same value—the value the referenced variable had in the last loop iteration. In such cases, you must still use defaults to save the variable’s current value instead.

This may seem a fairly obscure case, but it can come up in practice more often than you may think, especially in code that generates callback handler functions for a number of widgets in a GUI—for instance, handlers for button-clicks for all the buttons in a row. If these are created in a loop, you may need to be careful to save state with defaults, or all your buttons’ callbacks may wind up doing the same thing.

Here’s an illustration of this phenomenon reduced to simple code: the following attempts to build up a list of functions that each remember the current variable i from the enclosing scope:

>>> def makeActions():

        acts = []

        for i in range(5):                    # Tries to remember each i

            acts.append(lambda x: i ** x)     # But all remember same last i!

        return acts

>>> acts = makeActions()

>>> acts[0]

<function makeActions.<locals>.<lambda> at 0x0000000002A4A400>

This doesn’t quite work, though—because the enclosing scope variable is looked up when the nested functions are later called, they all effectively remember the same value: the value the loop variable had on the last loop iteration. That is, when we pass a power argument of 2 in each of the following calls, we get back 4 to the power of 2 for each function in the list, because i is the same in all of them—4:

>>> acts[0](2) # All are 4 ** 2, 4=value of last i

16

>>> acts[1](2) # This should be 1 ** 2 (1)

16

>>> acts[2](2) # This should be 2 ** 2 (4)

16

>>> acts[4](2) # Only this should be 4 ** 2 (16)

16

This is the one case where we still have to explicitly retain enclosing scope values with default arguments, rather than enclosing scope references. That is, to make this sort of code work, we must pass in the current value of the enclosing scope’s variable with a default. Because defaults are evaluated when the nested function is created (not when it’s later called), each remembers its own value for i:

>>> def makeActions():

        acts = []

        for i in range(5):                         # Use defaults instead

            acts.append(lambda x, i=i: i ** x)     # Remember current i

        return acts

>>> acts = makeActions()

>>> acts[0](2) # 0 ** 2

0

>>> acts[1](2) # 1 ** 2

1

>>> acts[2](2) # 2 ** 2

4

>>> acts[4](2) # 4 ** 2

16

This seems an implementation artifact that is prone to change, and may become more important as you start writing larger programs. We’ll talk more about defaults in Chapter 18 and lambdas in Chapter 19, so you may also want to return and review this section later.[37]
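The default works because it gives each function its own stored value. For comparison only, the factory pattern from earlier in this chapter can do the same job, because every call to a factory creates a fresh enclosing scope. The following is a sketch under that assumption; the helper name make_power is hypothetical and is not part of the original listing:

```python
def make_power(i):                   # Hypothetical helper: new scope per call
    return lambda x: i ** x          # This i belongs to one make_power call

def makeActions():
    acts = []
    for i in range(5):
        acts.append(make_power(i))   # Each lambda gets its own enclosing i
    return acts

acts = makeActions()
print([act(2) for act in acts])      # [0, 1, 4, 9, 16]
```

This trades the extra default argument for an extra function; the default form shown above is the shorter of the two and remains the one you are most likely to meet in existing code.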

Arbitrary scope nesting

Before ending this discussion, we should note that scopes may nest arbitrarily, but only enclosing function def statements (not classes, described in Part VI) are searched when names are referenced:

>>> def f1():

        x = 99

        def f2():

            def f3():

                print(x)           # Found in f1's local scope!

            f3()

        f2()

>>> f1()

99

Python will search the local scopes of all enclosing defs, from inner to outer, after the referencing function’s local scope and before the module’s global scope or built-ins. However, this sort of code is even less likely to pop up in practice. Again, in Python, we say flat is better than nested, and this still holds generally true even with the addition of nested scope closures. Except in limited contexts, your life (and the lives of your coworkers) will generally be better if you minimize nested function definitions.
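The parenthetical about classes deserves a quick demonstration. In this sketch (the class name C and its method are placeholders), a class nested in a def assigns its own x, but a method's reference to x skips the class and finds the name in the enclosing function instead:

```python
def f1():
    x = 'f1 local'

    class C:
        x = 'class attribute'        # Lives in the class, not in a def scope

        def method(self):
            return x                 # Skips C; finds x in enclosing f1

    return C().method(), C.x

print(f1())                          # ('f1 local', 'class attribute')
```

To reach the class's own x from inside the method, the code would have to name it explicitly, as in self.x or C.x.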


[37] In the section Function Gotchas, we’ll also see that there is a similar issue with using mutable objects like lists and dictionaries for default arguments (e.g., def f(a=[])): because defaults are implemented as single objects attached to functions, mutable defaults retain state from call to call, rather than being initialized anew on each call. Depending on whom you ask, this is either considered a feature that supports another way to implement state retention, or a strange corner of the language; more on this at the end of Chapter 21.

The nonlocal Statement in 3.X

In the prior section we explored the way that nested functions can reference variables in an enclosing function’s scope, even if that function has already returned. It turns out that, in Python 3.X (though not in 2.X), we can also change such enclosing scope variables, as long as we declare them in nonlocal statements. With this statement, nested defs can have both read and write access to names in enclosing functions. This makes nested scope closures more useful, by providing changeable state information.

The nonlocal statement is similar in both form and role to global, covered earlier. Like global, nonlocal declares that a name will be changed in an enclosing scope. Unlike global, though, nonlocal applies to a name in an enclosing function’s scope, not the global module scope outside all defs. Also unlike global, nonlocal names must already exist in the enclosing function’s scope when declared—they can exist only in enclosing functions and cannot be created by a first assignment in a nested def.

In other words, nonlocal both allows assignment to names in enclosing function scopes and limits scope lookups for such names to enclosing defs. The net effect is a more direct and reliable implementation of changeable state information, for contexts that do not desire or need classes with attributes, inheritance, and multiple behaviors.

nonlocal Basics

Python 3.X introduces a new nonlocal statement, which has meaning only inside a function:

def func():

    nonlocal name1, name2, ...     # OK here

>>> nonlocal X

SyntaxError: nonlocal declaration not allowed at module level

This statement allows a nested function to change one or more names defined in a syntactically enclosing function’s scope. In Python 2.X, when one function def is nested in another, the nested function can reference any of the names defined by assignment in the enclosing def’s scope, but it cannot change them. In 3.X, declaring the enclosing scopes’ names in a nonlocal statement enables nested functions to assign and thus change such names as well.

This provides a way for enclosing functions to provide writeable state information, remembered when the nested function is later called. Allowing the state to change makes it more useful to the nested function (imagine a counter in the enclosing scope, for instance). In 2.X, programmers usually achieve similar goals by using classes or other schemes. Because nested functions have become a more common coding pattern for state retention, though, nonlocal makes it more generally applicable.

Besides allowing names in enclosing defs to be changed, the nonlocal statement also forces the issue for references—much like the global statement, nonlocal causes searches for the names listed in the statement to begin in the enclosing defs’ scopes, not in the local scope of the declaring function. That is, nonlocal also means “skip my local scope entirely.”

In fact, the names listed in a nonlocal must have been previously defined in an enclosing def when the nonlocal is reached, or an error is raised. The net effect is much like global: global means the names reside in the enclosing module, and nonlocal means they reside in an enclosing def. nonlocal is even more strict, though—scope search is restricted to only enclosing defs. That is, nonlocal names can appear only in enclosing defs, not in the module’s global scope or built-in scopes outside the defs.

The addition of nonlocal does not alter name reference scope rules in general; they still work as before, per the “LEGB” rule described earlier. The nonlocal statement mostly serves to allow names in enclosing scopes to be changed rather than just referenced. However, both global and nonlocal statements do tighten up and even restrict the lookup rules somewhat, when coded in a function:

§ global makes scope lookup begin in the enclosing module’s scope and allows names there to be assigned. Scope lookup continues on to the built-in scope if the name does not exist in the module, but assignments to global names always create or change them in the module’s scope.

§ nonlocal restricts scope lookup to just enclosing defs, requires that the names already exist there, and allows them to be assigned. Scope lookup does not continue on to the global or built-in scopes.

In Python 2.X, references to enclosing def scope names are allowed, but not assignment. However, you can still use classes with explicit attributes to achieve the same changeable state information effect as nonlocals (and you may be better off doing so in some contexts); globals and function attributes can sometimes accomplish similar goals as well. More on this in a moment; first, let’s turn to some working code to make this more concrete.

nonlocal in Action

On to some examples, all run in 3.X. References to enclosing def scopes work in 3.X as they do in 2.X: in the following, tester builds and returns the function nested, to be called later, and the state reference in nested maps to the local scope of tester using the normal scope lookup rules:

C:\code> c:\python33\python

>>> def tester(start):

        state = start              # Referencing nonlocals works normally

        def nested(label):

            print(label, state)    # Remembers state in enclosing scope

        return nested

>>> F = tester(0)

>>> F('spam')

spam 0

>>> F('ham')

ham 0

Changing a name in an enclosing def’s scope is not allowed by default, though; this is the normal case in 2.X as well:

>>> def tester(start):

        state = start

        def nested(label):

            print(label, state)

            state += 1             # Cannot change by default (never in 2.X)

        return nested

>>> F = tester(0)

>>> F('spam')

UnboundLocalError: local variable 'state' referenced before assignment

Using nonlocal for changes

Now, under 3.X, if we declare state in the tester scope as nonlocal within nested, we get to change it inside the nested function, too. This works even though tester has returned and exited by the time we call the returned nested function through the name F:

>>> def tester(start):

        state = start              # Each call gets its own state

        def nested(label):

            nonlocal state         # Remembers state in enclosing scope

            print(label, state)

            state += 1             # Allowed to change it if nonlocal

        return nested

>>> F = tester(0)

>>> F('spam') # Increments state on each call

spam 0

>>> F('ham')

ham 1

>>> F('eggs')

eggs 2

As usual with enclosing scope references, we can call the tester factory (closure) function multiple times to get multiple copies of its state in memory. The state object in the enclosing scope is essentially attached to the nested function object returned; each call makes a new, distinct state object, such that updating one function’s state won’t impact the other. The following continues the prior listing’s interaction:

>>> G = tester(42)                   # Make a new tester that starts at 42
>>> G('spam')
spam 42
>>> G('eggs')                        # My state information updated to 43
eggs 43
>>> F('bacon')                       # But F's is where it left off: at 3
bacon 3                              # Each call has different state information

In this sense, Python’s nonlocals are more functional than function locals typical in some other languages: in a closure function, nonlocals are per-call, multiple copy data.

Boundary cases

Though useful, nonlocals come with some subtleties to be aware of. First, unlike the global statement, nonlocal names really must have previously been assigned in an enclosing def’s scope when a nonlocal is evaluated, or else you’ll get an error—you cannot create them dynamically by assigning them anew in the enclosing scope. In fact, they are checked at function definition time before either an enclosing or nested function is called:

>>> def tester(start):
        def nested(label):
            nonlocal state           # Nonlocals must already exist in enclosing def!
            state = 0
            print(label, state)
        return nested

SyntaxError: no binding for nonlocal 'state' found

>>> def tester(start):
        def nested(label):
            global state             # Globals don't have to exist yet when declared
            state = 0                # This creates the name in the module now
            print(label, state)
        return nested

>>> F = tester(0)
>>> F('abc')
abc 0
>>> state
0

Second, nonlocal restricts the scope lookup to just enclosing defs; nonlocals are not looked up in the enclosing module’s global scope or the built-in scope outside all defs, even if they are already there:

>>> spam = 99
>>> def tester():
        def nested():
            nonlocal spam            # Must be in a def, not the module!
            print('Current=', spam)
            spam += 1
        return nested

SyntaxError: no binding for nonlocal 'spam' found

These restrictions make sense once you realize that Python would not otherwise generally know which enclosing scope to create a brand-new name in. In the prior listing, should spam be assigned in tester, or the module outside? Because this is ambiguous, Python must resolve nonlocals at function creation time, not function call time.
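One way to confirm that this check happens before any code runs is to hand the source text to the built-in compile function, which only translates code and never executes it. The following sketch assumes CPython 3.X; the name source and the '<demo>' filename label are arbitrary choices for this illustration:

```python
source = '''
spam = 99
def tester():
    def nested():
        nonlocal spam        # No enclosing def assigns spam
        spam += 1
    return nested
'''

try:
    compile(source, '<demo>', 'exec')    # Compiles only: no statement is run
    outcome = 'compiled'
except SyntaxError as exc:
    outcome = exc.msg

print(outcome)
```

The failure is reported even though neither tester nor nested is ever called, and even though the module-level assignment to spam never runs.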

Why nonlocal? State Retention Options

Given the extra complexity of nested functions, you might wonder what the fuss is about. Although it’s difficult to see in our small examples, state information becomes crucial in many programs. While functions can return results, their local variables won’t normally retain other values that must live on between calls. Moreover, many applications require such values to differ per context of use.

As mentioned earlier, there are a variety of ways to “remember” information across function and method calls in Python. While there are tradeoffs for all, nonlocal does improve this story for enclosing scope references—the nonlocal statement allows multiple copies of changeable state to be retained in memory. It addresses simple state-retention needs where classes may not be warranted and global variables do not apply, though function attributes can often serve similar roles more portably. Let’s review the options to see how they stack up.

State with nonlocal: 3.X only

As we saw in the prior section, the following code allows state to be retained and modified in an enclosing scope. Each call to tester creates a self-contained package of changeable information, whose names do not clash with any other part of the program:

>>> def tester(start):
        state = start                # Each call gets its own state
        def nested(label):
            nonlocal state           # Remembers state in enclosing scope
            print(label, state)
            state += 1               # Allowed to change it if nonlocal
        return nested

>>> F = tester(0)
>>> F('spam')                        # State visible within closure only
spam 0
>>> F.state
AttributeError: 'function' object has no attribute 'state'

We need to declare variables nonlocal only if they must be changed (other enclosing scope name references are automatically retained as usual), and nonlocal names are still not visible outside the enclosing function.

Unfortunately, this code works in Python 3.X only. If you are using Python 2.X, other options are available, depending on your goals. The next three sections present some alternatives. Some of the code in these sections uses tools we haven’t covered yet and is intended partially as preview, but we’ll keep the examples simple here so that you can compare and contrast along the way.

State with Globals: A Single Copy Only

One common prescription for achieving the nonlocal effect in 2.X and earlier is to simply move the state out to the global scope (the enclosing module):

>>> def tester(start):
        global state                 # Move it out to the module to change it
        state = start                # global allows changes in module scope
        def nested(label):
            global state
            print(label, state)
            state += 1
        return nested

>>> F = tester(0)
>>> F('spam')                        # Each call increments shared global state
spam 0
>>> F('eggs')
eggs 1

This works in this case, but it requires global declarations in both functions and is prone to name collisions in the global scope (what if “state” is already being used?). A worse, and more subtle, problem is that it only allows for a single shared copy of the state information in the module scope—if we call tester again, we’ll wind up resetting the module’s state variable, such that prior calls will see their state overwritten:

>>> G = tester(42)                   # Resets state's single copy in global scope
>>> G('toast')
toast 42
>>> G('bacon')
bacon 43
>>> F('ham')                         # But my counter has been overwritten!
ham 44

As shown earlier, when you are using nonlocal and nested function closures instead of global, each call to tester remembers its own unique copy of the state object.

State with Classes: Explicit Attributes (Preview)

The other prescription for changeable state information in 2.X and earlier is to use classes with attributes to make state information access more explicit than the implicit magic of scope lookup rules. As an added benefit, each instance of a class gets a fresh copy of the state information, as a natural byproduct of Python’s object model. Classes also support inheritance, multiple behaviors, and other tools.

We haven’t explored classes in detail yet, but as a brief preview for comparison, the following is a reformulation of the earlier tester/nested functions as a class, which records state in objects explicitly as they are created. To make sense of this code, you need to know that a def within a class like this works exactly like a normal def, except that the function’s self argument automatically receives the implied subject of the call (an instance object created by calling the class itself). The function named __init__ is run automatically when the class is called:

>>> class tester:                            # Class-based alternative (see Part VI)
        def __init__(self, start):           # On object construction,
            self.state = start               # save state explicitly in new object
        def nested(self, label):
            print(label, self.state)         # Reference state explicitly
            self.state += 1                  # Changes are always allowed

>>> F = tester(0)                            # Create instance, invoke __init__
>>> F.nested('spam')                         # F is passed to self
spam 0
>>> F.nested('ham')
ham 1

In classes, we save every attribute explicitly, whether it’s changed or just referenced, and they are available outside the class. As for nested functions and nonlocal, the class alternative supports multiple copies of the retained data:

>>> G = tester(42)                           # Each instance gets new copy of state
>>> G.nested('toast')                        # Changing one does not impact others
toast 42
>>> G.nested('bacon')
bacon 43
>>> F.nested('eggs')                         # F's state is where it left off
eggs 2
>>> F.state                                  # State may be accessed outside class
3

With just slightly more magic—which we’ll delve into later in this book—we could also make our class objects look like callable functions using operator overloading. __call__ intercepts direct calls on an instance, so we don’t need to call a named method:

>>> class tester:
        def __init__(self, start):
            self.state = start
        def __call__(self, label):           # Intercept direct instance calls
            print(label, self.state)         # So .nested() not required
            self.state += 1

>>> H = tester(99)
>>> H('juice')                               # Invokes __call__
juice 99
>>> H('pancakes')
pancakes 100

Don’t sweat the details in this code too much at this point in the book; it’s mostly a preview, intended for general comparison to closures only. We’ll explore classes in depth in Part VI, and will look at specific operator overloading tools like __call__ in Chapter 30. The point to notice here is that classes can make state information more obvious, by leveraging explicit attribute assignment instead of implicit scope lookups. In addition, class attributes are always changeable and don’t require a nonlocal statement, and classes are designed to scale up to implementing richer objects with many attributes and behaviors.

While using classes for state information is generally a good rule of thumb to follow, they might also be overkill in cases like this, where state is a single counter. Such trivial state cases are more common than you might think; in such contexts, nested defs are sometimes more lightweight than coding classes, especially if you’re not familiar with OOP yet. Moreover, there are some scenarios in which nested defs may actually work better than classes—stay tuned for the description of method decorators in Chapter 39 for an example that is far beyond this chapter’s already well-stretched scope!

State with Function Attributes: 3.X and 2.X

As a portable and often simpler state-retention option, we can also sometimes achieve the same effect as nonlocals with function attributes—user-defined names attached to functions directly. When you attach user-defined attributes to nested functions generated by enclosing factory functions, they can also serve as per-call, multiple copy, and writeable state, just like nonlocal scope closures and class attributes. Such user-defined attribute names won’t clash with names Python creates itself, and as for nonlocal, need be used only for state variables that must be changed; other scope references are retained and work normally.

Crucially, this scheme is portable—like classes, but unlike nonlocal, function attributes work in both Python 3.X and 2.X. In fact, they’ve been available since 2.1, much longer than 3.X’s nonlocal. Because factory functions make a new function on each call anyhow, this does not require extra objects—the new function’s attributes become per-call state in much the same way as nonlocals, and are similarly associated with the generated function in memory.

Moreover, function attributes allow state variables to be accessed outside the nested function, like class attributes; with nonlocal, state variables can be seen directly only within the nested def. If you need to access a call counter externally, it’s a simple function attribute fetch in this model.

Here’s a final version of our example based on this technique: it replaces a nonlocal with an attribute attached to the nested function. This scheme may not seem as intuitive to some at first glance; you access state through the function’s name instead of as simple variables, and must initialize after the nested def. Still, it’s far more portable, allows state to be accessed externally, and saves a line by not requiring a nonlocal declaration:

>>> def tester(start):
        def nested(label):
            print(label, nested.state)       # nested is in enclosing scope
            nested.state += 1                # Change attr, not nested itself
        nested.state = start                 # Initial state after func defined
        return nested

>>> F = tester(0)
>>> F('spam')                                # F is a 'nested' with state attached
spam 0
>>> F('ham')
ham 1
>>> F.state                                  # Can access state outside functions too
2

Because each call to the outer function produces a new nested function object, this scheme supports multiple copy per-call changeable data just like nonlocal closures and classes—a usage mode that global variables cannot provide:

>>> G = tester(42)                           # G has own state, doesn't overwrite F's
>>> G('eggs')
eggs 42
>>> F('ham')
ham 2
>>> F.state                                  # State is accessible and per-call
3
>>> G.state
43
>>> F is G                                   # Different function objects
False

This code relies on the fact that the function name nested is a local variable in the tester scope enclosing nested; as such, it can be referenced freely inside nested. This code also relies on the fact that changing an object in place is not an assignment to a name; when it increments nested.state, it is changing part of the object nested references, not the name nested itself. Because we’re not really assigning a name in the enclosing scope, no nonlocal declaration is required.

Function attributes are supported in both Python 3.X and 2.X; we’ll explore them further in Chapter 19. Importantly, we’ll see there that Python uses naming conventions in both 2.X and 3.X that ensure that the arbitrary names you assign as function attributes won’t clash with names related to internal implementation, making the namespace equivalent to a scope. Subjective factors aside, function attributes’ utility does overlap with the newer nonlocal in 3.X, making the latter technically redundant and far less portable.

State with mutables: Obscure ghost of Pythons past?

On a related note, it’s also possible to change a mutable object in the enclosing scope in 2.X and 3.X without declaring its name nonlocal. The following, for example, works the same as the previous version, is just as portable, and provides changeable per-call state:

def tester(start):
    def nested(label):
        print(label, state[0])       # Leverage in-place mutable change
        state[0] += 1                # Extra syntax, deep magic?
    state = [start]
    return nested

This leverages the mutability of lists, and like function attributes, relies on the fact that in-place object changes do not classify a name as local. This is perhaps more obscure than either function attributes or 3.X’s nonlocal, though—a technique that predates even function attributes, and seems to lie today somewhere on the spectrum from clever hack to dark magic! You’re probably better off using named function attributes than lists and numeric offsets this way, though this may show up in code you must use.

To summarize: globals, nonlocals, classes, and function attributes all offer changeable state-retention options. Globals support only single-copy shared data; nonlocals can be changed in 3.X only; classes require a basic knowledge of OOP; and both classes and function attributes provide portable solutions that allow state to be accessed directly from outside the stateful callable object itself. As usual, the best tool for your program depends upon your program’s goals.

We’ll revisit all the state options introduced here in Chapter 39 in a more realistic context—decorators, a tool that by nature involves multilevel state retention. State options have additional selection factors (e.g., performance), which we’ll have to leave unexplored here for space (we’ll learn how to time code speed in Chapter 21). For now, it’s time to move on to explore argument passing modes.

WHY YOU WILL CARE: CUSTOMIZING OPEN

For another example of closures at work, consider changing the built-in open call to a custom version, as suggested in this chapter’s earlier sidebar “Breaking the Universe in Python 2.X.” If the custom version needs to call the original, it must save it before changing it, and retain it for later use, a classic state retention scenario. Moreover, if we wish to support multiple customizations to the same function, globals won’t do: we need per-customizer state.

The following, coded for Python 3.X in file makeopen.py, is one way to achieve this (in 2.X, change the built-in scope name and prints). It uses a nested scope closure to remember a value for later use, without relying on global variables—which can clash and allow just one value, and without using a class—that may require more code than is warranted here:

import builtins

def makeopen(id):
    original = builtins.open
    def custom(*kargs, **pargs):
        print('Custom open call %r:' % id, kargs, pargs)
        return original(*kargs, **pargs)
    builtins.open = custom

To change open for every module in a process, this code reassigns it in the built-in scope to a custom version coded with a nested def, after saving the original in the enclosing scope so the customization can call it later. This code is also partially preview, as it relies on starred-argument forms to collect and later unpack arbitrary positional and keyword arguments meant for open, a topic coming up in the next chapter. Much of the magic here, though, is nested scope closures: the custom open found by the scope lookup rules retains the original for later use:

>>> F = open('script2.py')           # Call built-in open in builtins
>>> F.read()
'import sys\nprint(sys.path)\nx = 2\nprint(x ** 32)\n'

>>> from makeopen import makeopen    # Import open resetter function
>>> makeopen('spam')                 # Custom open calls built-in open

>>> F = open('script2.py')           # Call custom open in builtins
Custom open call 'spam': ('script2.py',) {}
>>> F.read()
'import sys\nprint(sys.path)\nx = 2\nprint(x ** 32)\n'

Because each customization remembers the former built-in scope version in its own enclosing scope, they can even be nested naturally in ways that global variables cannot support—each call to the makeopen closure function remembers its own versions of id and original, so multiple customizations may be run:

>>> makeopen('eggs')                 # Nested customizers work too!
>>> F = open('script2.py')           # Because each retains own state
Custom open call 'eggs': ('script2.py',) {}
Custom open call 'spam': ('script2.py',) {}
>>> F.read()
'import sys\nprint(sys.path)\nx = 2\nprint(x ** 32)\n'

As is, our function simply adds possibly nested call tracing to a built-in function, but the general technique may have other applications. A class-based equivalent to this may require more code because it would need to save the id and original values explicitly in object attributes—but requires more background knowledge than we yet have, so consider this a Part VI preview only:

import builtins

class makeopen:                      # See Part VI: call catches self()
    def __init__(self, id):
        self.id = id
        self.original = builtins.open
        builtins.open = self
    def __call__(self, *kargs, **pargs):
        print('Custom open call %r:' % self.id, kargs, pargs)
        return self.original(*kargs, **pargs)

The point to notice here is that classes may be more explicit but also may take extra code when state retention is the only goal. We’ll see additional closure use cases later, especially when exploring decorators in Chapter 39, where we’ll find that closures are actually preferred to classes in certain roles.
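Because makeopen changes open for the whole process, it can help to try it in a script that undoes the change when finished. The following is a minimal sketch under some assumptions not in the original example: the calls list and the saved name are hypothetical additions used only to record and reverse what happens, and a temporary file stands in for script2.py:

```python
import builtins
import os
import tempfile

calls = []                           # Hypothetical log of which customizers ran
saved = builtins.open                # Kept only so this demo can undo its changes

def makeopen(id):
    original = builtins.open         # Retained per call in the enclosing scope
    def custom(*kargs, **pargs):
        calls.append(id)
        print('Custom open call %r:' % id, kargs, pargs)
        return original(*kargs, **pargs)
    builtins.open = custom

fd, path = tempfile.mkstemp(suffix='.txt')    # Empty scratch file to open
os.close(fd)

try:
    makeopen('spam')
    makeopen('eggs')
    with open(path) as f:            # Runs eggs, then spam, then the real open
        text = f.read()
finally:
    builtins.open = saved            # Restore the built-in for the rest of the process
    os.remove(path)

print(calls)                         # ['eggs', 'spam']
```

The most recent customizer runs first because each one saved whatever open was in place when it was created.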

Chapter Summary

In this chapter, we studied one of two key concepts related to functions: scopes, which determine how variables are looked up when used. As we learned, variables are considered local to the function definitions in which they are assigned, unless they are specifically declared to be global or nonlocal. We also explored some more advanced scope concepts here, including nested function scopes and function attributes. Finally, we looked at some general design ideas, such as the need to avoid globals and cross-file changes.

In the next chapter, we’re going to continue our function tour with the second key function-related concept: argument passing. As we’ll find, arguments are passed into a function by assignment, but Python also provides tools that allow functions to be flexible in how items are passed. Before we move on, let’s take this chapter’s quiz to review the scope concepts we’ve covered here.

Test Your Knowledge: Quiz

1. What is the output of the following code, and why?

>>> X = 'Spam'
>>> def func():
        print(X)

>>> func()

2. What is the output of this code, and why?

>>> X = 'Spam'
>>> def func():
        X = 'NI!'

>>> func()
>>> print(X)

3. What does this code print, and why?

>>> X = 'Spam'
>>> def func():
        X = 'NI'
        print(X)

>>> func()
>>> print(X)

4. What output does this code produce? Why?

>>> X = 'Spam'
>>> def func():
        global X
        X = 'NI'

>>> func()
>>> print(X)

5. What about this code: what’s the output, and why?

>>> X = 'Spam'
>>> def func():
        X = 'NI'
        def nested():
            print(X)
        nested()

>>> func()
>>> X

6. How about this example: what is its output in Python 3.X, and why?

>>> def func():
        X = 'NI'
        def nested():
            nonlocal X
            X = 'Spam'
        nested()
        print(X)

>>> func()

7. Name three or more ways to retain state information in a Python function.

Test Your Knowledge: Answers

1. The output here is 'Spam', because the function references a global variable in the enclosing module (because it is not assigned in the function, it is considered global).

2. The output here is 'Spam' again because assigning the variable inside the function makes it a local and effectively hides the global of the same name. The print statement finds the variable unchanged in the global (module) scope.

3. It prints 'NI' on one line and 'Spam' on another, because the reference to the variable within the function finds the assigned local and the reference in the print statement finds the global.

4. This time it just prints 'NI' because the global declaration forces the variable assigned inside the function to refer to the variable in the enclosing global scope.

5. The output in this case is again 'NI' on one line and 'Spam' on another, because the print statement in the nested function finds the name in the enclosing function’s local scope, and the print at the end finds the variable in the global scope.

6. This example prints 'Spam', because the nonlocal statement (available in Python 3.X but not 2.X) means that the assignment to X inside the nested function changes X in the enclosing function’s local scope. Without this statement, this assignment would classify X as local to the nested function, making it a different variable; the code would then print 'NI' instead.

7. Although the values of local variables go away when a function returns, you can make a Python function retain state information by using shared global variables, enclosing function scope references within nested functions, or using default argument values. Function attributes can sometimes allow state to be attached to the function itself, instead of looked up in scopes. Another alternative, using classes and OOP, sometimes supports state retention better than any of the scope-based techniques because it makes it explicit with attribute assignments; we’ll explore this option in Part VI.
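The default-argument option mentioned in the last answer is not demonstrated in this part of the chapter, so here is a brief sketch of it. The names make_action and counter are hypothetical, chosen for this illustration. The first function saves a value from the enclosing scope as a default when its def runs; the second relies on the fact that a mutable default is created once and then shared by later calls:

```python
def make_action(n):
    def action(x, n=n):          # n's current value is saved when this def runs
        return x ** n
    return action

square = make_action(2)
cube = make_action(3)
print(square(4), cube(4))        # 16 64

def counter(label, state=[0]):   # The default list is built once, at def time
    print(label, state[0])
    state[0] += 1                # In-place change persists between calls

counter('spam')                  # spam 0
counter('ham')                   # ham 1
print(counter.__defaults__)      # ([2],)
```

The mutable-default form provides only a single shared copy of state per function, much like a global, so it is usually a poor fit when per-call copies are needed.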