
Chapter 3. Functions, Iterators, and Generators

The core of functional programming is the use of pure functions to map values from the input domain to the output range. A pure function has no side effects, a relatively easy threshold for us to achieve in Python.

Avoiding side effects also means reducing our dependence on variable assignment to maintain the state of our computations. We can't purge the assignment statement from the Python language, but we can reduce our dependence on stateful objects. This means we need to choose among the available Python built-in data structures to select those that don't require stateful operations.

This chapter will present several Python features from a functional viewpoint, as follows:

· Pure Functions, free of side effects

· Functions as objects that can be passed as arguments or returned as results

· The use of Python strings using object-oriented suffix notation and prefix notation

· Using tuples and namedtuples as a way to create stateless objects

· Using iterable collections as our primary design tool for functional programming

We'll look at generators and generator expressions, since these are ways to work with collections of objects. As we noted in Chapter 2, Introducing Some Functional Features, there are some boundary issues while trying to replace all generator expressions with recursions. Python imposes a recursion limit, and doesn't automatically handle TCO: we must optimize recursions manually using a generator expression.

We'll write generator expressions that will perform the following tasks:

· Conversions

· Restructuring

· Complex calculations

We'll make a quick survey of many of the built-in Python collections, and how we can work with collections while pursuing a functional paradigm. This might change our approach to working with lists, dicts, and sets. Writing functional Python encourages us to focus on tuples and immutable collections. In the next chapter, we'll emphasize more functional ways to work with specific kinds of collections.

Writing pure functions

A pure function has no side effects: there are no global changes to variables. If we avoid the global statement, we will almost meet this threshold. We also need to avoid changing the state of mutable objects. We'll look at a number of ways of ensuring both of these aspects of pure functions. A reference to a value in the Python global namespace (via a free variable) is something we can rework into a proper parameter. In most cases, it's quite easy.

Here is an example of a function that depends on a global value:

def some_function(a, b, t):
    return a + b*t + global_adjustment

We can refactor this function to make the global_adjustment variable into a proper parameter. We would need to change each reference to this function, which might have a large ripple effect through a complex application. A global reference will be visible as a free variable in the body of a function. There will be neither a parameter nor an assignment for this variable, making it reasonably clear that it's global.
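A minimal sketch of the refactored version might look as follows (the parameter name adjustment is illustrative):

def some_function(a, b, t, adjustment):
    # The former global_adjustment value is now an explicit parameter,
    # so the function depends only on its arguments.
    return a + b*t + adjustment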

There are many internal Python objects that are stateful. Instances of the file class, and all file-like objects, are examples of stateful objects in common use. We observe that the most commonly used stateful objects in Python generally behave as context managers. Not all developers make use of the available context managers, but many objects implement the required interface. In a few cases, stateful objects don't completely implement the context manager interface; in these cases, there's often a close() method. We can use the contextlib.closing() function to provide these objects with the proper context manager interface.
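As a brief sketch of the contextlib.closing() technique, consider a hypothetical stateful class that offers only a close() method:

from contextlib import closing

class LegacySource:
    # Hypothetical stateful object: it has close(), but no __enter__()/__exit__().
    def __init__(self, name):
        self.name = name
    def read(self):
        return "data from " + self.name
    def close(self):
        self.name = None

# closing() supplies the context manager interface, so the object
# is closed automatically at the end of the with statement.
with closing(LegacySource("example")) as source:
    print(source.read())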

We can't easily eliminate all stateful Python objects, except in small programs. Therefore, we must manage state while still exploiting the strengths of functional design. Toward this end, we should always use the with statement to encapsulate stateful file objects into a well-defined scope.

Tip

Always use file objects in a with context.

We should always avoid global file objects, global database connections, and the associated state issues. The global file object is a very common pattern for handling open files. We might have a function as shown in the following command snippet:

def open_files(iname, oname):
    # An anti-pattern: two global file objects shared across the application.
    global ifile, ofile
    ifile = open(iname, "r")
    ofile = open(oname, "w")

Given this context, numerous other functions can use the ifile and ofile variables, hoping they properly refer to the global files, which are left open for the application to use.

This is not a very good design, and we need to avoid it. The files should be proper parameters to functions, and the open files should be nested in a with statement to assure that their stateful behavior is handled properly.
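A minimal sketch of that design might look as follows (the function and file names are illustrative, and an input.txt file is assumed to exist):

def copy_upper(ifile, ofile):
    # Both files are explicit parameters; no global state is involved.
    for line in ifile:
        ofile.write(line.upper())

with open("input.txt") as ifile, open("output.txt", "w") as ofile:
    copy_upper(ifile, ofile)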

This design pattern also applies to databases. A database connection object can generally be provided as a formal argument to the application's functions. This is contrary to the way some popular web frameworks work that rely on a global database connection in an effort to make the database a transparent feature of the application. Additionally, a multithreaded web server might not benefit from sharing a single database connection. This suggests that there are some benefits of a hybrid approach that uses functional design with a few isolated stateful features.

Functions as first-class objects

It shouldn't come as a surprise that Python functions are first-class objects. In Python, functions are objects with a number of attributes. The reference manual lists a number of special member names that apply to functions. Since functions are objects with attributes, we can extract a function's docstring or its name using special attributes such as __doc__ or __name__. We can also extract the body of the function via the __code__ attribute. In compiled languages, this introspection is relatively complex because of the source information that needs to be retained. In Python, it's quite simple.
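For example, a short interactive session shows some of this introspection:

>>> def example(a, b):
...     "Combine two values."
...     return a + b
...
>>> example.__name__
'example'
>>> example.__doc__
'Combine two values.'
>>> example.__code__.co_varnames
('a', 'b')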

We can assign functions to variables, pass functions as arguments, and return functions as values. We can easily use these techniques to write higher-order functions.

Since functions are objects, Python already has many features required to be a functional programming language.

Additionally, a callable object helps us to create functions that are first-class objects. We can even consider the callable class definition as a higher-order function. We do need to be judicious in how we use the __init__() method of a callable object; we should avoid setting stateful instance variables. One common application is to use an __init__() method to create objects that fit the Strategy design pattern.

A class following the Strategy design pattern depends on another object to provide an algorithm or parts of an algorithm. This allows us to inject algorithmic details at runtime, rather than compiling the details into the class.

Here is an example of a callable object with an embedded Strategy object:

import collections

class Mersenne1(collections.Callable):
    def __init__(self, algorithm):
        self.pow2 = algorithm
    def __call__(self, arg):
        return self.pow2(arg) - 1

This class uses __init__() to save a reference to another function. We're not creating any stateful instance variables.

The function given as a Strategy object must raise 2 to the given power. The three candidate objects that we can plug into this class are as follows:

def shifty(b):
    return 1 << b

def multy(b):
    if b == 0: return 1
    return 2*multy(b-1)

def faster(b):
    if b == 0: return 1
    if b % 2 == 1: return 2*faster(b-1)
    t = faster(b//2)
    return t*t

The shifty() function raises 2 to the desired power using a left shift of the bits. The multy() function uses a naive recursive multiplication. The faster() function uses a divide and conquer strategy that will perform approximately log2(b) multiplications instead of b multiplications.

We can create instances of our Mersenne1 class with an embedded strategy algorithm, as follows:

m1s = Mersenne1(shifty)
m1m = Mersenne1(multy)
m1f = Mersenne1(faster)

This shows how we can define alternative functions that produce the same result but use different algorithms.
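For example, a quick check shows that the three strategies agree on the Mersenne number for an exponent of 17:

>>> m1s(17)
131071
>>> m1m(17)
131071
>>> m1f(17)
131071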

Tip

Python allows us to compute M89 = 2**89 - 1, since this doesn't even come close to the recursion limits in Python. This is quite a large prime number, with 27 digits.

Using strings

Since Python strings are immutable, they're an excellent example of functional programming objects. The Python str class has a number of methods, all of which produce a new string as the result. These methods are pure functions with no side effects.

The syntax for string methods is postfix, whereas most functions use prefix notation. This means that complex string operations can be hard to read when they're commingled with conventional functions.

When scraping data from a web page, we might have a cleaner function that applies a number of transformations to a string to clean up the punctuation and return a Decimal object for use by the rest of the application. This will involve a mixture of prefix and postfix syntax.

It might look like the following command snippet:

from decimal import Decimal, InvalidOperation

def clean_decimal(text):
    if text is None: return text
    try:
        return Decimal(text.replace("$", "").replace(",", ""))
    except InvalidOperation:
        return text

This function does two replacements on the string to remove $ and , string values. The resulting string is used as an argument to the Decimal class constructor, which returns the desired object.
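For example, the function cleans up a typical currency string and passes non-numeric text through unchanged:

>>> clean_decimal("$1,234.56")
Decimal('1234.56')
>>> clean_decimal("(n/a)")
'(n/a)'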

To make this more consistent, we can consider defining our own prefix functions for the string method functions, as follows:

def replace(data, a, b):
    return data.replace(a, b)

This can allow us to use Decimal(replace(replace(text, "$", ""), ",", "")) with consistent-looking prefix syntax. In this case, we simply rearrange the existing argument values, allowing us an additional technique. We can do this for trivial cases, such as the following:

>>> replace = str.replace
>>> replace("$12.45", "$", "")
'12.45'

It's not clear if this kind of consistency is a significant improvement over the mixed prefix and postfix notation. The issue with functions of multiple arguments is that the arguments wind up in various places in the expression.

A slightly better approach might be to define a more meaningful prefix function to strip punctuation, such as the following command snippet:

def remove(str, chars):
    if chars:
        return remove(str.replace(chars[0], ""), chars[1:])
    return str

This function will recursively remove each of the characters in the chars parameter. We can use it as Decimal(remove(text, "$,")) to make the intent of our string cleanup more clear.

Using tuples and namedtuples

Since Python tuples are immutable objects, they're another excellent example of objects suitable for functional programming. A Python tuple has very few method functions, so almost everything is done through functions using prefix syntax. There are a number of use cases for tuples, particularly when working with list-of-tuple, tuple-of-tuple and generator-of-tuple constructs.

Of course, namedtuples add an essential feature to a tuple: a name that we can use instead of an index. We can exploit namedtuples to create objects that are accretions of data. This allows us to write pure functions based on stateless objects, yet keep data bound into tidy object-like packages.

We'll almost always use tuples (and namedtuples) in the context of a collection of values. If we're working with single values, or a tidy group of exactly two values, we'll usually use named parameters to a function. When working with collections, however, we might need to have iterable-of-tuples or iterable-of-namedtuple constructs.

The decision to use a tuple or namedtuple object is entirely a matter of convenience. We might be working with a sequence of values as a three-tuple of the form (number, number, number), assuming that the triple is in red, green, and blue order.

We can use functions to pick a three-tuple apart, as shown in the following command snippet:

red = lambda color: color[0]
green = lambda color: color[1]
blue = lambda color: color[2]

Or, we might introduce the following command line:

Color = namedtuple("Color", ("red", "green", "blue", "name"))

This allows us to use item.red instead of red(item).
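For example, a brief interactive sketch (the values are illustrative):

>>> from collections import namedtuple
>>> Color = namedtuple("Color", ("red", "green", "blue", "name"))
>>> c = Color(255, 0, 0, "pure red")
>>> c.red
255
>>> c.name
'pure red'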

The functional programming application of tuples centers on the iterable-of-tuple design pattern. We'll look closely at a few iterable-of-tuple techniques. We'll look at the namedtuple techniques in Chapter 7, Additional Tuple Techniques.

Using generator expressions

We've shown some examples of generator expressions already. We'll show many more later in the chapter. We'll introduce some more sophisticated generator techniques in this section.

We need to mention a small bit of Python syntax here. It's common to see generator expressions used to create the list or dict literals via a list comprehension or a dict comprehension. For our purposes, a list display (or comprehension) is just one use of generator expressions. We can try to make a distinction between generator expressions outside a display and generator expressions inside a display, but there's nothing to be gained by this. The syntax is the same except for the enclosing punctuation and the semantics are indistinguishable.

A display includes the enclosing literal syntax: [x**2 for x in range(10)]; this example is a list comprehension, which creates a list object from the enclosed generator expression. In this section, we're going to focus on the generator expression. We'll occasionally create a display as part of demonstrating how the generator works. Displays have the disadvantage of creating (potentially large) collection objects. A generator expression is lazy and creates objects only as required.

We have to provide two important caveats on generator expressions, as follows:

· Generators appear to be sequence-like, except when we use a function such as len() that needs to know the size of the collection.

· Generators can be used only once. After that, they appear empty.

Here is a generator function that we'll use for some examples:

import math

def pfactorsl(x):
    if x % 2 == 0:
        yield 2
        if x//2 > 1:
            yield from pfactorsl(x//2)
        return
    for i in range(3, int(math.sqrt(x)+.5)+1, 2):
        if x % i == 0:
            yield i
            if x//i > 1:
                yield from pfactorsl(x//i)
            return
    yield x

We're locating the prime factors of a number. If the number, x, is even, we'll yield 2 and then recursively yield all factors of x÷2.

For odd numbers, we'll step through odd values greater than or equal to 3, to locate a candidate factor of the number. When we locate a factor, we'll yield that factor, i, and then recursively yield all factors of x÷i.

In the event that we can't locate a factor, the number must be prime, so we can yield that.

We handle 2 as a special case to cut the number of iterations in half. All prime numbers, except 2, are odd.

We've used one important for loop in addition to recursion. This allows us to easily handle numbers that have as many as 1,000 factors. (Such a number is at least as large as 2**1,000, a number with over 300 digits.) Since the for variable, i, is not used outside the indented body of the loop, the stateful nature of the i variable won't lead to confusion if we make any changes to the body of the loop.

In effect, we've done tail-call optimization: the recursive calls that would count from 3 to √x have been replaced by a for loop. The for loop saves us from deeply recursive calls that test every single number in the range.

The remaining yield from statements exist merely to consume the results of a recursive call, which is itself iterable.

Tip

In a recursive generator function, be careful of the return statement.

Do not use the following command line:

return recursive_iter(args)

It returns only a generator object; it doesn't evaluate the function to return the generated values. Use either of the following:

for result in recursive_iter(args):
    yield result

Or, use the following:

yield from recursive_iter(args)

As an alternative, the following command is a more purely recursive version:

def pfactorsr(x):
    def factor_n(x, n):
        if n*n > x:
            yield x
            return
        if x % n == 0:
            yield n
            if x//n > 1:
                yield from factor_n(x//n, n)
        else:
            yield from factor_n(x, n+2)
    if x % 2 == 0:
        yield 2
        if x//2 > 1:
            yield from pfactorsr(x//2)
        return
    yield from factor_n(x, 3)

We defined an internal recursive function, factor_n(), to test factors, n, in the range 3 ≤ n ≤ √x. If the candidate factor, n, is outside the range, then x is prime. Otherwise, we'll see if n is a factor of x. If so, we'll yield n and all factors of x÷n. If n is not a factor, we'll evaluate the function recursively using n+2. This recursion to test each candidate value of n can be optimized into a for loop, as shown in the previous example.

The outer function handles some edge cases. As with other prime-related processing, we handle 2 as a special case. For even numbers, we'll yield 2 and then evaluate pfactorsr() recursively for x÷2. All other prime factors must be odd numbers greater than or equal to 3. We'll evaluate the factor_n() function starting with 3 to test these other candidate prime factors.

Tip

The purely recursive function can only locate prime factors of numbers up to about 4,000,000. Above this, Python's recursion limit will be reached.

Exploring the limitations of generators

We noted that there are some limitations of generator expressions and generator functions. The limitations can be observed by executing the following command snippet:

>>> from ch02_ex4 import *
>>> pfactorsl(1560)
<generator object pfactorsl at 0x1007b74b0>
>>> list(pfactorsl(1560))
[2, 2, 2, 3, 5, 13]
>>> len(pfactorsl(1560))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object of type 'generator' has no len()

In the first example, we saw that generator functions are not strict. They're lazy, and don't have a proper value until we consume the generator functions. This isn't a limitation, per se; this is the whole reason that generator expressions fit with functional programming in Python.

In the second example, we materialized a list object from the generator function. This is handy for seeing the output and writing unit test cases.

In the third example, we saw one limitation of generator functions: there's no len().

The other limitation of generator functions is that they can only be used once. For example, look at the following command snippet:

>>> result = pfactorsl(1560)
>>> sum(result)
27
>>> sum(result)
0

The first evaluation of the sum() function consumed the generator. The second evaluation found the generator empty. We can only consume the values once.

Generators have a stateful life in Python. While they're very nice for some aspects of functional programming, they're not quite perfect.

We can try to use the itertools.tee() method to overcome the once-only limitation. We'll look at this in depth in Chapter 8, The Itertools Module. Here is a quick example of its usage:

import itertools

def limits(iterable):
    max_tee, min_tee = itertools.tee(iterable, 2)
    return max(max_tee), min(min_tee)

We created two clones of the parameter iterable, max_tee and min_tee. These clones can be consumed independently, a pleasant feature that allows us to do very flexible combinations of functions. We can consume the two clones to get the maxima and minima from the iterable.
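For example, assuming the pfactorsl() function shown earlier is available:

>>> limits(pfactorsl(1560))
(13, 2)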

While appealing, we'll see that this doesn't work out well in the long run. Once consumed, an iterable will not provide any more values. When we want to compute multiple kinds of reductions—for example, sums, counts, minimums, maximums—we need to design with this one-pass-only limitation in mind.

Combining generator expressions

The essence of functional programming comes from the ways we can easily combine generator expressions and generator functions to create very sophisticated composite processing sequences. When working with generator expressions, we can combine generators in several ways.

One common way to combine generator functions is when we create a composite function. We might have a generator that computes (f(x) for x in range()). If we want to compute g(f(x)), we have several ways to combine two generators.

We can tweak the original generator expression as follows:

g_f_x = (g(f(x)) for x in range())

While technically correct, this defeats any idea of reuse. Rather than reusing an expression, we rewrite it.

We can also substitute one expression within another expression, as follows:

g_f_x = (g(y) for y in (f(x) for x in range()))

This has the advantage of allowing us to use simple substitution. We can revise this slightly to emphasize reuse, using the following commands:

f_x = (f(x) for x in range())
g_f_x = (g(y) for y in f_x)

This has the advantage of leaving the initial expression, (f(x) for x in range()), essentially untouched. All we did was assign the expression to a variable.

The resulting composite function is also a generator expression, which is also lazy. This means that extracting the next value from g_f_x will extract one value from f_x, which will extract one value from the source range() function.
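Here is a small, concrete sketch of the composition, with f() and g() as illustrative functions:

def f(x):
    return x + 1

def g(y):
    return y * 2

f_x = (f(x) for x in range(5))
g_f_x = (g(y) for y in f_x)

# Nothing is computed until the outermost generator is consumed.
print(list(g_f_x))  # [2, 4, 6, 8, 10]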

Cleaning raw data with generator functions

One of the tasks that arise in exploratory data analysis is cleaning up raw source data. This is often done as a composite operation applying several scalar functions to each piece of input data to create a usable data set.

Let's look at a simplified set of data. This data is commonly used to show techniques in exploratory data analysis. It's called Anscombe's Quartet, and it comes from the article, Graphs in Statistical Analysis, by F. J. Anscombe that appeared in American Statistician in 1973. Following are the first few rows of a downloaded file with this dataset:

Anscombe's quartet
I           II          III         IV
x     y     x     y     x     y     x     y
10.0  8.04  10.0  9.14  10.0  7.46  8.0   6.58
8.0   6.95  8.0   8.14  8.0   6.77  8.0   5.76
13.0  7.58  13.0  8.74  13.0  12.74 8.0   7.71

Sadly, we can't trivially process this with the csv module. We have to do a little bit of parsing to extract the useful information from this file. Since the data is properly tab-delimited, we can use the csv.reader() function to iterate through the various rows. We can define a data iterator as follows:

import csv

def row_iter(source):
    return csv.reader(source, delimiter="\t")

We simply wrapped a file in a csv.reader function to create an iterator over rows. We can use this iterator in the following context:

with open("Anscombe.txt") as source:

print( list(row_iter(source)) )

The problem with this is that the first three items in the resulting iterable aren't data. The Anscombe's quartet file looks as follows when opened:

[["Anscombe's quartet"], ['I', 'II', 'III', 'IV'], ['x', 'y', 'x', 'y', 'x', 'y', 'x', 'y'],

We need to filter these rows from the iterable. Here is a function that will neatly excise three expected title rows, and return an iterator over the remaining rows:

def head_split_fixed(row_iter):
    title = next(row_iter)
    assert len(title) == 1 and title[0] == "Anscombe's quartet"
    heading = next(row_iter)
    assert len(heading) == 4 and heading == ['I', 'II', 'III', 'IV']
    columns = next(row_iter)
    assert len(columns) == 8 and columns == ['x', 'y', 'x', 'y', 'x', 'y', 'x', 'y']
    return row_iter

This function plucks three rows from the iterable. It asserts that each row has an expected value. If the file doesn't meet these basic expectations, it's a symptom that the file was damaged or perhaps our analysis is focused on the wrong file.

Since both the row_iter() and the head_split_fixed() functions expect an iterable as an argument value, they can be trivially combined as follows:

with open("Anscombe.txt") as source:

print( list(head_split_fixed(row_iter(source))))

We've simply applied one iterator to the results of another iterator. In effect, this defines a composite function. We're not done, of course; we still need to convert the string values to float values, and we also need to pick apart the four parallel series of data in each row.

The final conversions and data extractions are more easily done with higher-order functions such as map() and filter(). We'll return to those in Chapter 5, Higher-order Functions.

Using lists, dicts, and sets

A Python sequence object, like a list, is iterable. However, it has some additional features. We'll think of it as a materialized iterable. We've used the tuple() function in several examples to collect the output of a generator expression or generator function into a single tuple object. We can also materialize a sequence to create a list object.

In Python, a list display offers simple syntax to materialize a generator: we just add the [] brackets. This is ubiquitous to the point where the distinction between generator expression and list comprehension is a subtlety of little practical importance.

The following is an example to enumerate the cases:

>>> range(10)
range(0, 10)
>>> [range(10)]
[range(0, 10)]
>>> [x for x in range(10)]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> list(range(10))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

The first example shows the bare range object.

Tip

The range(10) object is lazy; it won't produce the 10 values until evaluated in a context that iterates through them.

The second example shows a list composed of a single range object. To expand this into individual values, we'd have to use nested loops, something like [x for gen in [range(10)] for x in gen].

The third example shows a list comprehension built from a generator expression that includes a range object. The range(10) object is evaluated by the generator expression, x for x in range(10). The resulting values are collected into a list object.

We can also use the list() function to build a list from an iterable or a generator expression. This also works for set(), tuple(), and dict().

Tip

The list(range(10)) expression evaluates the lazy range object into a list of values. The [range(10)] list literal does not.

While there's shorthand syntax for list, dict, and set using [] and {}, there's no shorthand syntax for a tuple. To materialize a tuple, we must use the tuple() function. For this reason, it often seems most consistent to use the list(), tuple(), and set() functions as the preferred syntax.

In the data cleansing example, we used a composite function to create a list of four tuples. The function looked as follows:

with open("Anscombe.txt") as source:

data = head_split_fixed(row_iter(source))

print(list(data))

We assigned the results of the composite function to a name, data. The data looks as follows:

[['10.0', '8.04', '10.0', '9.14', '10.0', '7.46', '8.0', '6.58'],
 ['8.0', '6.95', '8.0', '8.14', '8.0', '6.77', '8.0', '5.76'],
 ...
 ['5.0', '5.68', '5.0', '4.74', '5.0', '5.73', '8.0', '6.89']]

We need to do a little bit more processing to make this useful. First, we need to pick pairs of columns from each eight-column row. We can select a pair of columns with a function, as shown in the following command snippet:

from collections import namedtuple

Pair = namedtuple("Pair", ("x", "y"))

def series(n, row_iter):
    for row in row_iter:
        yield Pair(*row[n*2:n*2+2])

This function picks two adjacent columns based on a number between 0 and 3. It creates a namedtuple object from those two columns. This allows us to pick the x or y value from each row.

We can now create a tuple-of-tuples collection as follows:

with open("Anscombe.txt") as source:

data = tuple(head_split_fixed(row_iter(source)))

sample_I= tuple(series(0,data))

sample_II= tuple(series(1,data))

sample_III= tuple(series(2,data))

sample_IV= tuple(series(3,data))

We applied the tuple() function to a composite function based on the head_split_fixed() and row_iter() functions. This will create an object that we can reuse in several other functions. If we don't materialize a tuple object, only the first sample will have any data. After that, the source iterator will be exhausted and all other attempts to access it would yield empty sequences.

The series() function will pick pairs of items to create the Pair objects. Again, we applied an overall tuple() function to materialize the resulting tuple-of-namedtuple sequences so that we can do further processing on each one.

The sample_I sequence looks like the following command snippet:

(Pair(x='10.0', y='8.04'), Pair(x='8.0', y='6.95'),
 Pair(x='13.0', y='7.58'), Pair(x='9.0', y='8.81'),
 ...
 Pair(x='5.0', y='5.68'))

The other three sequences are similar in structure. The values, however, are quite different.

The final thing we'll need to do is create proper numeric values from the strings that we've accumulated so that we can compute some statistical summary values. We can apply the float() function conversion as the last step. There are many alternative places to apply the float() function, and we'll look at some choices in Chapter 5, Higher-order Functions.

Here is an example describing the usage of float() function:

mean = sum(float(pair.y) for pair in sample_I)/len(sample_I)

This will provide the mean of the y value in each Pair object. We can gather a number of statistics as follows:

for subset in sample_I, sample_II, sample_III, sample_IV:
    mean = sum(float(pair.y) for pair in subset)/len(subset)
    print(mean)

We computed a mean for the y values in each pair built from the source data. We created a common tuple-of-namedtuple structure so that we can have reasonably clear references to members of the source dataset. Using pair.y is a bit less obscure than pair[1].

To reduce memory use—and increase performance—we prefer to use generator expressions and functions as much as possible. These iterate through collections in a lazy (or non-strict) manner, computing values only when required. Since iterators can only be used once, we're sometimes forced to materialize a collection as a tuple (or list) object. Materializing a collection costs memory and time, so we do it reluctantly.

Programmers familiar with Clojure can match Python's lazy generators with the lazy-seq and lazy-cat functions. The idea is that we can specify a potentially infinite sequence, but only take values from it as needed.

Using stateful mappings

Python offers several stateful collections; the various mappings include the dict class and a number of related mappings defined in the collections module. We need to emphasize the stateful nature of these mappings and use them carefully.

For our purposes in learning functional programming techniques in Python, there are two use cases for mapping: a stateful dictionary that accumulates a mapping and a frozen dictionary. In the first example of this chapter, we showed a frozen dictionary that was used by the ElementTree.findall() method. Python doesn't provide an easy-to-use definition of an immutable mapping. The collections.abc.Mapping abstract class is immutable but it's not something we can use trivially. We'll dive into details in Chapter 6, Recursions and Reductions.

Instead of the formality of using the collections.abc.Mapping abstract class, we can fall back on confirming that the variable ns_map appears exactly once on the left side of an assignment statement, methods such as ns_map.update() or ns_map.pop() are never used, and the del statement isn't used with map items.

The stateful dictionary can be further decomposed into two typical use cases; they are as follows:

· A dictionary built once and never updated. In this case, we will exploit the hashed keys feature of the dict class to optimize performance. We can create a dictionary from any iterable sequence of (key, value) two tuples via dict( sequence ).

· A dictionary built incrementally. This is an optimization we can use to avoid materializing and sorting a list object. We'll look at this in Chapter 6, Recursions and Reductions. We'll look at the collections.Counter class as a sophisticated reduction. Incremental building is particularly helpful for memoization. We'll defer memoization until Chapter 16, Optimizations and Improvements.

The first example, building a dictionary once, stems from an application with three operating phases: gather some input, create a dict object, and then process input based on the mappings in the dictionary. As an example of this kind of application, we might be doing some image processing and have a specific palette of colors, represented by names and (R, G, B) tuples. If we use the GNU Image Manipulation Program (GIMP) GNU General Public License (GPL) file format, the color palette might look like the following command snippet:

GIMP Palette
Name: Small
Columns: 3
#
0 0 0 Black
255 255 255 White
238 32 77 Red
28 172 120 Green
31 117 254 Blue

The details of parsing this file are the subject of Chapter 6, Recursions and Reductions. What's important is the results of the parsing.

First, we'll assume that we're using the following Color namedtuple:

from collections import namedtuple

Color = namedtuple("Color", ("red", "green", "blue", "name"))

Second, we'll assume that we have a parser that produces an iterable of Color objects. If we materialize it as a tuple, it would look like the following:

(Color(red=239, green=222, blue=205, name='Almond'),
 Color(red=205, green=149, blue=117, name='Antique Brass'),
 Color(red=253, green=217, blue=181, name='Apricot'),
 Color(red=197, green=227, blue=132, name='Yellow Green'),
 Color(red=255, green=174, blue=66, name='Yellow Orange'))

In order to locate a given color name quickly, we will create a frozen dictionary from this sequence. This is not the only way to get fast lookups of a color by name. We'll look at another option later.

To create a mapping from a tuple, we will use the process(wrap(iterable)) design pattern. The following command shows how we can create the color name mapping:

name_map = dict((c.name, c) for c in sequence)

Here, the sequence variable is the iterable of the Color objects shown previously. The wrap() element of the design pattern simply transforms each Color object, c, into the two-tuple (c.name, c). The process() element of the design uses dict() initialization to create a mapping from name to Color. The resulting dictionary looks as follows:

{'Caribbean Green': Color(red=28, green=211, blue=162, name='Caribbean Green'),
 'Peach': Color(red=255, green=207, blue=171, name='Peach'),
 'Blizzard Blue': Color(red=172, green=229, blue=238, name='Blizzard Blue'),

The order is not guaranteed, so you may not see Caribbean Green first.

Now that we've materialized the mapping, we can use this dict() object in some later processing for repeated transformations from color name to (R, G, B) color numbers. The lookup will be blazingly fast because a dictionary does a rapid transformation from key to hash value followed by lookup in the dictionary.
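A quick sketch of the lookup, assuming name_map was built from the palette shown previously:

>>> name_map['Peach']
Color(red=255, green=207, blue=171, name='Peach')
>>> name_map['Peach'].red
255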

Using the bisect module to create a mapping

In the previous example, we created a dict mapping to achieve a fast mapping from a color name to a Color object. This isn't the only choice; we can use the bisect module instead. Using the bisect module means that we have to create a sorted object, which we can then search. To be perfectly compatible with the dict mapping, we can use collections.abc.Mapping as the base class.

The dict mapping uses a hash to locate items almost immediately. However, this requires allocating a fairly large block of memory. The bisect mapping does a search, which doesn't require as much memory, but performance can't be described as immediate.

A static mapping class looks like the following command snippet:

import bisect
from collections.abc import Mapping

class StaticMapping(Mapping):
    def __init__(self, iterable):
        # Sort the pairs by key so that positions in _keys and _data line up.
        self._data = tuple(sorted(iterable))
        self._keys = tuple(key for key, _ in self._data)
    def __getitem__(self, key):
        ix = bisect.bisect_left(self._keys, key)
        if ix != len(self._keys) and self._keys[ix] == key:
            return self._data[ix][1]
        # Mapping's mixin methods (get(), __contains__()) expect KeyError.
        raise KeyError("{0!r} not found".format(key))
    def __iter__(self):
        return iter(self._keys)
    def __len__(self):
        return len(self._keys)

This class extends the abstract superclass collections.abc.Mapping. It provides an initialization and implementations for three functions missing from the abstract definition. The __getitem__() method uses the bisect.bisect_left() function to search the collection of keys. If the key is found, the appropriate value is returned. The __iter__() method returns an iterator, as required by the superclass. The __len__() method, similarly, provides the required length of the collection.

Another option is to start with the source code for the collections.OrderedDict class, change the superclass to Mapping instead of MutableMapping, and remove all of the methods that implement mutability. For more details on which methods to keep and which to discard, refer to the Python Standard Library, section 8.4.1.

Visit the following link for more details:

https://docs.python.org/3.3/library/collections.abc.html#collections-abstract-base-classes

This class might not seem to embody too many functional programming principles. Our goal here is to support a larger application that minimizes the use of stateful variables. This class saves a static collection of key-value pairs. As an optimization, it materializes two objects.

An application that creates an instance of this class is using a materialized object to perform rapid lookups of the keys. The superclass does not support updates to the object. The collection, as a whole, is stateless. It's not as fast as the built-in dict class, but it uses less memory and, through the formality of being a subclass of Mapping, we can be assured that this object is not used to contain a processing state.
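A brief usage sketch, with illustrative key-value pairs:

>>> colors = StaticMapping([("Red", (238, 32, 77)), ("Blue", (31, 117, 254))])
>>> colors["Blue"]
(31, 117, 254)
>>> len(colors)
2
>>> "Black" in colors
False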

Using stateful sets

Python offers several stateful collections, including the set collection. For our purposes, there are two use cases for a set: a stateful set that accumulates items, and a frozenset that is used to optimize searches for an item.

We can create a frozenset from an iterable in the same way we create a tuple object from an iterable, using the frozenset(some_iterable) function; this will create a structure that has the advantage of a very fast in operator. This can be used in an application that gathers data, creates a set, and then uses that frozenset to process some other data items.

We might have a set of colors that we will use as a kind of chroma-key: we will use this color to create a mask that will be used to combine two images. Pragmatically, a single color isn't appropriate but a small set of very similar colors works best. In this case, we'll examine each pixel of an image file to see if the pixel is in the chroma-key set or not. For this kind of processing, the chroma-key colors are loaded into a frozenset before processing the target images. For more information, read about chroma-key processing from the following link:

http://en.wikipedia.org/wiki/Chroma_key
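A minimal sketch of the idea, with illustrative RGB values for the key colors:

# Build the frozenset once, before scanning the pixels of the target image.
chroma_key = frozenset([(0, 255, 0), (0, 250, 5), (5, 255, 10)])

def mask(pixels):
    # Yield True for each pixel that belongs to the chroma-key set.
    return (pixel in chroma_key for pixel in pixels)

image = [(0, 255, 0), (12, 34, 56), (5, 255, 10)]
print(list(mask(image)))  # [True, False, True]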

As with mappings—specifically the Counter class—there are some algorithms that can benefit from a memoized set of values. Some functions benefit from memoization because a function is a mapping between domain values and range values, a job for which a mapping works nicely. A few algorithms benefit from a memoized set, which is stateful and grows as data is processed.

We'll return to memoization in Chapter 16, Optimizations and Improvements.

Summary

In this chapter, we looked closely at writing pure functions: free of side effects. The bar is low here, since Python forces us to use the global statement to write impure functions. We looked at generator functions and how we can use these as the backbone of functional programming.

We also examined the built-in collection classes to show how they're used in the functional paradigm. While the general idea behind functional programming is to limit the use of stateful variables, the collection objects are generally stateful and, for many algorithms, also essential. Our goal is to be judicious in our use of Python's nonfunctional features.

In the next two chapters, we'll look at higher-order functions: functions that accept functions as arguments as well as returning functions. We'll start with an exploration of the built-in higher-order functions. In later chapters, we'll look at techniques for defining our own higher-order functions. We'll also look at the itertools and functools modules and their higher-order functions in later chapters.