Python Unlocked (2015)
Chapter 2. Namespaces and Classes
In the previous chapter, we covered how objects work. In this chapter, we will explore how objects are made available to code via references: how namespaces work, what modules are, and how they are imported. We will also cover topics related to classes, such as language protocols, MRO, and abstract classes. We will discuss the following topics:
· Namespaces
· Imports and modules
· Class multiple inheritance, MRO, super
· Protocols
· Abstract classes
How referencing objects works – namespaces
Key 1: Interrelations between objects.
Scope is the visibility of a name within a code block, and a namespace is a mapping from names to objects. Namespaces are important for maintaining localization and avoiding name collisions. Every module has a global namespace. A module stores the mapping from variable names to objects in its __dict__ attribute, which is a normal Python dictionary, along with the information needed to reload it, package information, and so on.
Every module's global namespace has an implicit reference to the builtins module; hence, objects in the builtins module are always available. We can also import other modules in the main script. When we use the syntax import modname, a mapping from modname to the module object is created in the global namespace of the current module. For import statements with syntax such as import modname as modrename, the mapping is created from the new name to the module object.
When the program starts, we are always in the __main__ module's global namespace, as it is the module that imports all others. When we import a variable from another module, only an entry for that variable is created in the current global namespace, pointing at the referenced object. Now, interestingly, if this variable references a function object, and the function uses a global variable, then that variable will be searched for in the global namespace of the module the function was defined in, not in the module into which we imported the function. This is possible because functions carry a __globals__ attribute that points to the __dict__ of their defining module, or in short, their module's namespace.
All modules that are loaded and referenced are cached in sys.modules; imported module names are simply bindings to objects in sys.modules. Let's define a new module with the name new.py:
k = 10

def foo():
    print(k)
By importing this module in the interactive session, we can see how global namespaces work. When this module is reloaded, its namespace dictionary is updated, not recreated. Hence, if you attach anything new from the outside of the module to it, it will survive reload:
>>> import importlib
>>> import new
>>> from new import foo
>>> import sys
>>> foo()
10
>>> new.foo()
10
>>> foo.__globals__ is sys.modules['new'].__dict__ # the namespace dictionary and the function's __globals__ are the same object
True
>>> foo.__globals__['k'] = 20 # changing global namespace dictionary
>>> new.do #attribute is not defined in the module
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: module 'new' has no attribute 'do'
>>> foo.__globals__['do'] = 22 # attaching an attribute to the module from outside it
>>> new.do
22
>>> foo() # we get the updated value of the global variable
20
>>> new.foo()
20
>>> importlib.reload(new) # reload repopulates the old module's dictionary
<module 'new' from '/tmp/new.py'>
>>> new.do # not removed by reload, as it was attached from outside the module
22
>>> new.foo() # values set by the module's own code are restored on reload
10
>>>
If we use functions defined in different modules to compose a class at runtime, for example with metaclasses or class decorators, this can bring surprises, as each function may be using a different global namespace.
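As a hypothetical sketch of this pitfall, the two modules below are built at runtime with types.ModuleType rather than as real files; each function keeps the globals of the module it was defined in, even after being composed into one class:
import types

mod_a = types.ModuleType('mod_a')
mod_b = types.ModuleType('mod_b')
exec("PREFIX = 'A'\ndef tag(self): return PREFIX", mod_a.__dict__)
exec("PREFIX = 'B'\ndef tag(self): return PREFIX", mod_b.__dict__)

# compose a class at runtime from functions defined in the two modules
Combined = type('Combined', (), {'tag_a': mod_a.tag, 'tag_b': mod_b.tag})
c = Combined()
print(c.tag_a(), c.tag_b())  # A B -- identical-looking code, different global namespaces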
Locals are simple: each function call gets its own copy of its local variables. Nonlocal declarations make variables defined in an outer (enclosing, not global) scope accessible to the current code block. In the following code examples, we can see how variables can be referenced in enclosed functions.
Code blocks can reference variables defined in enclosing scopes; hence, if a variable is not defined in a function but is defined in an enclosing function, we are able to read its value. However, if we assign to a variable anywhere in a code block, the compiler treats that variable as local throughout the block; reading it before the assignment then raises UnboundLocalError instead of fetching the outer value. We can specify that we want to work with the enclosing variable using the nonlocal keyword:
>>> # a variable in an enclosing scope can be referenced any level deep
...
>>> def f1():
...     v1 = "ohm"
...     def f2():
...         print("f2", v1)
...         def f3():
...             print("f3", v1)
...         f3()
...     f2()
...
>>> f1()
f2 ohm
f3 ohm
>>>
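If f2 assigned to v1 without declaring it nonlocal, the compiler would treat v1 as local throughout f2 and the earlier read would fail; a minimal sketch of that failure (same shape as above, with one added assignment):
>>> def f1():
...     v1 = "ohm"
...     def f2():
...         print("f2", v1)  # v1 is assigned below, so it is local here
...         v1 = "mho"
...     f2()
...
>>> f1()
Traceback (most recent call last):
  ...
UnboundLocalError: local variable 'v1' referenced before assignment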
>>> # a variable can be made nonlocal (bound in an outer scope), skipping one level of enclosing scope
...
>>> def f1():
...     v1 = "ohm"
...     def f2():
...         print("f2", v1)
...         def f3():
...             nonlocal v1
...             v1 = "mho"
...             print("f3", v1)
...         f3()
...         print("f2", v1)
...     f2()
...     print("f1", v1)
...
>>> f1()
f2 ohm
f3 mho
f2 mho
f1 mho
>>>
>>>
>>>
>>> # global can be declared at any level of nested function
...
>>> v2 = "joule"
>>>
>>> def f1():
...     def f2():
...         def f3():
...             global v2
...             v2 = "mho"
...             print("f3", v2)
...         f3()
...         print("f2", v2)
...     f2()
...     print("f1", v2)
...
>>> f1()
f3 mho
f2 mho
f1 mho
Because local variables are looked up without any dictionary lookup, accessing names inside a function with a small number of variables is faster than searching the global namespace. Along similar lines, we get a small speed boost if we pull objects that are referenced in loops into the function's local namespace:
In [6]: def fun():
   ...:     localsum = sum
   ...:     return localsum(localsum((a, a+1)) for a in range(1000))
   ...:
In [8]: def fun2():
   ...:     return sum(sum((a, a+1)) for a in range(1000))
   ...:
In [9]: %timeit fun2()
1000 loops, best of 3: 1.07 ms per loop
In [11]: %timeit fun()
1000 loops, best of 3: 983 µs per loop
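The reason is visible in the bytecode: local names are fetched with the LOAD_FAST opcode by array index, while globals and built-ins go through the slower LOAD_GLOBAL dictionary lookup. A quick way to confirm this (the exact listing varies across interpreter versions):
In [12]: import dis
In [13]: def local_user():
    ...:     localsum = sum          # one LOAD_GLOBAL here ...
    ...:     return localsum((1, 2)) # ... then LOAD_FAST for every later use
    ...:
In [14]: dis.dis(local_user)  # prints the bytecode listing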
Functions with state – closures
Key 2: Creating cheap state-remembering functions.
A closure is a function that has access to variables in an enclosing scope that has completed its execution. The referenced objects are kept alive for as long as the closure itself is alive. The main utility of such a setup is to cheaply retain some state, or to create specialized functions whose behavior depends on the initial setup:
>>> def getformatter(start, end):
...     def formatter(istr):
...         print("%s%s%s" % (start, istr, end))
...     return formatter
...
>>> formatter1 = getformatter("<", ">")
>>> formatter2 = getformatter("[", "]")
>>>
>>> formatter1("hello")
<hello>
>>> formatter2("hello")
[hello]
>>> formatter1.__closure__[0].cell_contents
'>'
>>> formatter1.__closure__[1].cell_contents
'<'
We can achieve the same thing by creating a class and using an instance object to save state. The benefit of closures is that variables are stored in the __closure__ tuple of cells and hence are fast to access, and less code is required to create a closure than a class:
>>> import timeit
>>> def formatter(st, en):
...     def fmt(inp):
...         return "%s%s%s" % (st, inp, en)
...     return fmt
...
>>> fmt1 = formatter("<", ">")
>>> fmt1("hello")
'<hello>'
>>> timeit.timeit(stmt="fmt1('hello')",
...               number=1000000, globals={'fmt1': fmt1})
0.3326794120075647
>>> class Formatter:
...     def __init__(self, st, en):
...         self.st = st
...         self.en = en
...     def __call__(self, inp):
...         return "%s%s%s" % (self.st, inp, self.en)
...
>>> fmt2 = Formatter("<", ">")
>>> fmt2("hello")
'<hello>'
>>> timeit.timeit(stmt="fmt2('hello')",
...               number=1000000, globals={'fmt2': fmt2})
0.5502702980011236
One such function is available in the standard library: functools.partial behaves like such a closure, creating a new callable that is always invoked with some predefined arguments:
>>> import functools
>>>
>>> def foo(*args, **kwargs):
...     print("foo with", args, kwargs)
...
>>> pfoo = functools.partial(foo, 10, 20, v1=23)
>>>
>>> foo(1, 2, 3, array=1)
foo with (1, 2, 3) {'array': 1}
>>> pfoo()
foo with (10, 20) {'v1': 23}
>>> pfoo(30, 40, array=12)
foo with (10, 20, 30, 40) {'v1': 23, 'array': 12}
Understanding import and modules
Key 3: Creating a custom loader for modules.
An import statement binds a reference to another module object in the current module's namespace. Importing consists of searching for the module, executing its code to create a module object, updating the cache (sys.modules), updating the importing module's namespace, and creating a reference to the newly imported module.
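We can observe the caching and binding steps directly in a session (math here is just a convenient stand-in for any module):
>>> import sys
>>> import math
>>> sys.modules['math'] is math # the bound name and the cache entry are the same object
True
>>> import math as m # a second import reuses the cache; only a new name is bound
>>> m is math
True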
The built-in __import__ function searches for and executes the module to create a module object. The importlib library contains the implementation and also provides a customizable interface to the import mechanism; various classes interact to get the job done. The __import__ function should return a module object. In the following example, we create a module finder that checks for modules in the path given as an argument during construction. Here, an empty file named notes.py should be present at the given path. We load the module, insert its module object in sys.modules, and add a function to the module's global namespace:
import os
import sys

class Spec:
    def __init__(self, name, loader, file=None, path=None,
                 cached=None, parent=None, has_location=False):
        self.name = name
        self.loader = loader
        self.origin = file
        self.submodule_search_locations = path
        self.cached = cached
        self.parent = parent
        self.has_location = has_location

class Finder:
    def __init__(self, path):
        self.path = path

    def find_spec(self, name, path, target=None):
        print("find spec name:%s path:%s target:%s" % (name, path, target))
        return Spec(name, self, path)

    def load_module(self, fullname):
        print("loading module", fullname)
        if fullname + '.py' in os.listdir(self.path):
            import builtins
            mod = type(os)  # the module type
            modobject = mod(fullname)
            modobject.__builtins__ = builtins

            def foo():
                print("hii i am foo")

            modobject.__dict__['too'] = foo  # attach the function to the module's namespace
            sys.modules[fullname] = modobject
            modobject.__spec__ = 'a dummy spec'
            modobject.__name__ = fullname
            modobject.__file__ = 'created by our loader'
            return modobject

sys.meta_path.append(Finder(r'/tmp'))
import notes
notes.too()
Output:
find spec name:notes path:None target:None
loading module notes
hii i am foo
Customizing imports
If a module has an __all__ attribute, only the names specified by this iterable will be imported by from module import *. Let's assume that we created a module named mymod.py, as follows:
__all__ = ('hulk', 'k')

k = 10

def hulk():
    print("i am hulk")

def spidey():
    print("i am spidey")
We will not be able to import spidey from mymod as it is not included in __all__:
>>> from mymod import *
>>>
>>> hulk()
i am hulk
>>> k
10
>>> spidey()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'spidey' is not defined
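When __all__ is absent, from module import * skips names that begin with an underscore. A quick sketch, assuming a hypothetical module mymod2.py containing:
_private = 1
public = 2
Importing everything from it binds only public:
>>> from mymod2 import *
>>> public
2
>>> _private
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name '_private' is not defined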
Class inheritance
We have already discussed how instances and classes are created, and how attributes are accessed on a class. Let's dive deeper into how this works with multiple base classes. When an attribute is looked up on an instance, its type is searched, and if the type inherits from a number of classes, they are all searched as well, in a defined order known as the Method Resolution Order (MRO). This order plays an important role in determining which method is found in cases of multiple inheritance and diamond-shaped inheritance.
Method resolution order
Key 4: Understanding MRO.
Methods are searched for in the base classes of a class in a predefined manner; this sequence is known as the method resolution order. In Python 3, when an attribute is not found in a class, it is searched for in all of that class's base classes; if it is still not found, the base classes of the base classes are searched, and so on until the base classes are exhausted. (Python 3 computes the exact order using the C3 linearization algorithm.) This is similar to how, if we have a question, we first ask our parents, then our uncles and aunts (base classes at the same level), and only if we still have no answer do we approach our grandparents. The following code snippet shows this sequence:
>>> class GrandParent:
...     def do(self):
...         print("Grandparent do called")
...
>>> class Father(GrandParent):
...     def do(self):
...         print("Father do called")
...
>>> class Mother(GrandParent):
...     def do(self):
...         print("Mother do called")
...
>>> class Child(Father, Mother):
...     def do(self):
...         print("Child do called")
...
>>> c = Child()  # calls the method defined on the class itself
>>> c.do()
Child do called
>>> del Child.do  # if a method is not defined, it is searched for in the bases
>>> c.do()  # Father's method
Father do called
>>> c.__class__.__bases__ = (c.__class__.__bases__[1], c.__class__.__bases__[0])  # swap the order of the bases
>>> c.do()  # Mother's method
Mother do called
>>> del Mother.do
>>> c.do()  # Father's method
Father do called
>>> del Father.do
>>> c.do()
Grandparent do called
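You can inspect the exact order Python computed, the C3 linearization, through the class's __mro__ attribute. A quick sketch with a fresh diamond hierarchy (the names A through D are illustrative only):
>>> class A: pass
...
>>> class B(A): pass
...
>>> class C(A): pass
...
>>> class D(B, C): pass
...
>>> D.__mro__
(<class '__main__.D'>, <class '__main__.B'>, <class '__main__.C'>, <class '__main__.A'>, <class 'object'>)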
Super's superpowers
Key 5: Getting a superclass's methods without naming the superclass.
We mostly create subclasses to specialize methods or to add new functionality. Often, a new method is, say, 80% the same as one in the base class; it is then natural to call the base class's method for that portion of the functionality and add the extra functionality in the subclass's method. To call a method of a superclass, we can either access it via the class name, or use super, like this:
>>> class GrandParent:
...     def do(self):
...         print("Grandparent do called")
...
>>> class Father(GrandParent):
...     def do(self):
...         print("Father do called")
...
>>> class Mother(GrandParent):
...     def do(self):
...         print("Mother do called")
...
>>> class Child(Father, Mother):
...     def do(self):
...         print("Child do called")
...
>>> c = Child()
>>> c.do()
Child do called
>>> class Child(Father, Mother):
...     def do(self):
...         print("Child do called")
...         super().do()
...
>>> c = Child()
>>> c.do()
Child do called
Father do called
>>> print("Father and child super calling")
Father and child super calling
>>> class Father(GrandParent):
...     def do(self):
...         print("Father do called")
...         super().do()
...
>>> class Child(Father, Mother):
...     def do(self):
...         print("Child do called")
...         super().do()
...
>>> c = Child()
>>> c.do()
Child do called
Father do called
Mother do called
>>> print("Father and Mother super calling")
Father and Mother super calling
>>> class Mother(GrandParent):
...     def do(self):
...         print("Mother do called")
...         super().do()
...
>>> class Father(GrandParent):
...     def do(self):
...         print("Father do called")
...         super().do()
...
>>> class Child(Father, Mother):
...     def do(self):
...         print("Child do called")
...         super().do()
...
>>> c = Child()
>>> c.do()
Child do called
Father do called
Mother do called
Grandparent do called
>>> print(Child.__mro__)
(<class '__main__.Child'>, <class '__main__.Father'>, <class '__main__.Mother'>, <class '__main__.GrandParent'>, <class 'object'>)
Using language protocols in classes
All objects that provide a specific functionality have certain methods that facilitate that behavior. For example, you can create an object of type worker and expect it to have submit_work(function, kwargs) and is_completed() methods; we can then expect any object with these methods to be usable as a worker in any part of the application. Similarly, the Python language defines some methods that are needed to add a certain functionality to an object; if an object possesses these methods, it has that functionality.
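A minimal sketch of this duck-typed worker idea (submit_work and is_completed are the hypothetical names from the paragraph above, not a standard API):
>>> class SyncWorker:
...     def __init__(self):
...         self._completed = False
...     def submit_work(self, function, kwargs):
...         self.result = function(**kwargs)  # run the job immediately
...         self._completed = True
...     def is_completed(self):
...         return self._completed
...
>>> def run_job(worker):  # accepts any object with the two worker methods
...     worker.submit_work(lambda x: x * 2, {'x': 21})
...     print("completed:", worker.is_completed())
...
>>> run_job(SyncWorker())
completed: True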
We will discuss two very important protocols: the iteration protocol and the context manager protocol.
Iteration protocol
For the iteration protocol, an object must possess an __iter__ method. If it does, we can use the object anywhere an iterable is expected. When we use the object in a for loop or pass it to the iter built-in function, we are calling its __iter__ method. This method returns another (or the same) object that is responsible for maintaining the index during iteration; the returned object must have a __next__ method that provides the next value in the sequence and raises StopIteration when the sequence is exhausted. In the following code snippet, the BooksIterState objects retain the index that is used for iteration. If the Books __iter__ method returned self, it would be difficult to maintain the state index when the object is accessed from two loops at once:
>>> class BooksIterState:
...     def __init__(self, books):
...         self.books = books
...         self.index = 0
...     def __next__(self):
...         if self.index >= len(self.books._data):
...             raise StopIteration
...         else:
...             tmp = self.books._data[self.index]
...             self.index += 1
...             return tmp
...
>>> class Books:
...     def __init__(self, data):
...         self._data = data
...     def __iter__(self):
...         return BooksIterState(self)
...
>>> ii = iter(Books(["don quixote","lord of the flies","great expectations"]))
>>> next(ii)
'don quixote'
>>> for i in Books(["don quixote","lord of the flies","great expectations"]):
...     print(i)
...
don quixote
lord of the flies
great expectations
>>> next(ii)
'lord of the flies'
>>> next(ii)
'great expectations'
>>> next(ii)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 7, in __next__
StopIteration
>>>
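If __iter__ is written as a generator function, Python creates this per-iteration state object for us; a brief sketch of an equivalent Books (same behavior, less code):
>>> class Books:
...     def __init__(self, data):
...         self._data = data
...     def __iter__(self):
...         yield from self._data  # each call returns a fresh generator with its own index
...
>>> list(Books(["don quixote", "lord of the flies"]))
['don quixote', 'lord of the flies']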
Context manager protocol
Objects that provide a context for execution work like try...finally statements. If an object has __enter__ and __exit__ methods, it can be used as a replacement for try...finally. The most common uses are releasing locks and resources, or flushing and closing files. In the following example, we create a Ctx class to serve as a context manager:
>>> class Ctx:
...     def __enter__(self):
...         print("entering")
...         return "do some work"
...     def __exit__(self, exception_type,
...                  exception_value,
...                  exception_traceback):
...         print("exit")
...         if exception_type is not None:
...             print("error", exception_type)
...         return True  # a true return value suppresses the exception
...
>>> with Ctx() as k:
...     print(k)
...     raise KeyError
...
entering
do some work
exit
error <class 'KeyError'>
We can also use the contextmanager decorator from contextlib to easily create context managers, as shown in the following code:
>>> import contextlib
>>> @contextlib.contextmanager
... def ctx():
...     try:
...         print("start")
...         yield "so some work"
...     except KeyError:
...         print("error")
...     print("done")
...
>>> with ctx() as k:
...     print(k)
...     raise KeyError
...
start
so some work
error
done
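The exception raised in the with body is delivered to the generator at the yield point, which is why the except clause above catches the KeyError. To guarantee cleanup for any exception, wrap the yield in try...finally; a small sketch:
>>> @contextlib.contextmanager
... def managed():
...     resource = "a resource"  # acquire the resource here
...     try:
...         yield resource
...     finally:
...         print("cleanup always runs")  # runs even if the with body raises
...
>>> with managed() as r:
...     print("using", r)
...
using a resource
cleanup always runs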
There are other methods that one should know about, such as __str__, __add__, and __getitem__, which define various functionalities of objects. The language reference's data model page lists them all; you should read it at least once to know what methods are available: https://docs.python.org/3/reference/datamodel.html#special-method-names.
Using abstract classes
Key 6: Making interfaces for conformity.
Abstract classes are available via the standard abc library package. They are useful for defining interfaces and common functionality. An abstract class can implement a portion of the interface and make the rest of the API mandatory for subclasses by declaring the remaining methods abstract. Also, classes can be turned into subclasses of an abstract class simply by registering them (see the sketch after the example below). Abstract classes are useful for making a set of classes conform to a single interface. Here is how to use them: the Worker class defines an interface with two methods, do and is_busy, which each type of worker must implement; ApiWorker is an implementation of this interface:
>>> from abc import ABCMeta, abstractmethod
>>> class Worker(metaclass=ABCMeta):
...     @abstractmethod
...     def do(self, func, args, kwargs):
...         """ work on function """
...     @abstractmethod
...     def is_busy(self):
...         """ tell if busy """
...
>>> class ApiWorker(Worker):
...     def __init__(self):
...         self._busy = False
...     def do(self, func, args=[], kwargs={}):
...         self._busy = True
...         res = func(*args, **kwargs)
...         self._busy = False
...         return res
...     def is_busy(self):
...         return self._busy
...
>>> apiworker = ApiWorker()
>>> print(apiworker.do(lambda x: x + 1, (1,)))
2
>>> print(apiworker.is_busy())
False
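Registration, mentioned earlier, turns an unrelated class into a virtual subclass without inheritance; a short sketch reusing the Worker ABC from the example above:
>>> class OutsideWorker:  # note: does not inherit from Worker
...     def do(self, func, args, kwargs):
...         return func(*args, **kwargs)
...     def is_busy(self):
...         return False
...
>>> Worker.register(OutsideWorker)  # register as a virtual subclass
<class '__main__.OutsideWorker'>
>>> issubclass(OutsideWorker, Worker)
True
>>> isinstance(OutsideWorker(), Worker)
True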
Summary
We have now seen how to manipulate namespaces and how to create custom module-loading classes. We can use multiple inheritance to create mixin classes, where each mixin provides a new functionality to the subclass. The context manager and iterator protocols are very useful constructs for writing clean code, and abstract classes help us set up API contracts for classes.
In the next chapter, we will cover the functions and utilities that are available to us from a standard Python installation.