Introducing Some Functional Features - Functional Python Programming (2015)

Chapter 2. Introducing Some Functional Features

Most of the features of functional programming are already first-class parts of Python. Our goal in writing functional Python is to shift our focus away from imperative (procedural or object-oriented) techniques to the maximum extent possible.

We'll look at each of the following functional programming topics:

· First-class and higher-order functions, as well as pure functions, which are free from side effects.

· Immutable data.

· Strict and non-strict evaluation. We can also call this eager vs. lazy evaluation.

· Recursion instead of an explicit loop state.

· Functional type systems.

This should reiterate some concepts from the first chapter. Firstly, that purely functional programming avoids the complexities of explicit state maintained via variable assignment. Secondly, that Python is not a purely functional language.

We don't offer a rigorous definition of functional programming. Instead, we'll locate some common features that are indisputably important. We'll steer clear of the blurry edges.

First-class functions

Functional programming is often succinct and expressive. One way to achieve this is by providing functions as arguments and return values for other functions. We'll look at numerous examples of manipulating functions.

For this to work, functions must be first-class objects in the runtime environment. In programming languages such as C, a function is not a runtime object. In Python, however, functions are objects that are created (usually) by the def statement and can be manipulated by other Python functions. We can also create a function as a callable object or by assigning a lambda to a variable.

Here's how a function definition creates an object with attributes:

>>> def example(a, b, **kw):

... return a*b

...

>>> type(example)

<class 'function'>

>>> example.__code__.co_varnames

('a', 'b', 'kw')

>>> example.__code__.co_argcount

2

We've created an object, example, that is of class function(). This object has numerous attributes. The __code__ object associated with the function object has attributes of its own. The implementation details aren't important. What is important is that functions are first-class objects, and can be manipulated just like all other objects. We previously displayed the values of two of the many attributes of a function object.
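Because functions are ordinary objects, they can be bound to new names, stored in data structures, and called through those references. The names below are our own illustration of this first-class status:

```python
def example(a, b, **kw):
    return a * b

# A function object can be bound to another name...
product = example

# ...stored in a data structure...
operations = {"multiply": example}

# ...and retrieved and called later, just like any other object.
result = operations["multiply"](6, 7)
```

Nothing special happens at the call site: `operations["multiply"](6, 7)` retrieves the function object and applies it, exactly as `example(6, 7)` would.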

Pure functions

To be expressive, a function used in a functional programming design will be free from the confusion created by side effects. Using pure functions can also allow some optimizations by changing evaluation order. The big win, however, stems from pure functions being conceptually simpler and much easier to test.

To write a pure function in Python, we have to write local-only code. This means we have to avoid the global statements. We need to look closely at any use of nonlocal; while it is a side effect in another scope, it's confined to a nested function definition. This is an easy standard to meet. Pure functions are a common feature of Python programs.
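As a quick sketch of the distinction, compare a local-only function with one coupled to a global; the names here are purely illustrative:

```python
adjustment = 2  # module-level state

def impure(a):
    # Depends on the global above; the result can change if
    # adjustment is reassigned elsewhere. Not a pure function.
    return a + adjustment

def pure(a, adjustment):
    # Every input arrives as an argument: the same arguments
    # always produce the same result.
    return a + adjustment
```

The pure version is trivially testable in isolation; the impure version can only be tested after arranging the right global state.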

There isn't a trivial way to guarantee a Python function is free from side effects. It is easy to carelessly break the pure function rule. If we ever want to check our ability to follow this rule, we could write a function that uses the dis module to scan a given function's __code__.co_code compiled code for global references. It could also report on the use of internal closures via the __code__.co_freevars tuple. This is a rather complex solution to a rare problem; we won't pursue it further.
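For the curious, the core of such a scan might look like this rough sketch. The helper name is our own, and note that references to builtins such as len() also show up as global loads, so this is a coarse screen, not a proof of purity:

```python
import dis

def global_names(func):
    """Report the names a function loads from the global scope.

    Builtins also appear as LOAD_GLOBAL instructions, so a
    nonempty result is a hint, not a verdict of impurity.
    """
    return {
        instr.argval
        for instr in dis.get_instructions(func)
        if instr.opname == "LOAD_GLOBAL"
    }

def pure(a, b):
    return a * b

counter = 0
def impure(a):
    return a + counter

# pure() loads no globals; impure() loads the name "counter".
```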

A Python lambda is a pure function. While this isn't a highly recommended style, it's certainly possible to create pure functions via lambda values.

Here's a function created by assigning lambda to a variable:

>>> mersenne = lambda x: 2**x-1

>>> mersenne(17)

131071

We created a pure function using lambda and assigned this to the variable mersenne. This is a callable object with a single argument value that returns a single value. Because lambdas can't contain assignment statements, they're generally pure functions and suitable for functional programming.

Higher-order functions

We can achieve expressive, succinct programs using higher-order functions. These are functions that accept a function as an argument or return a function as a value. We can use higher-order functions as a way to create composite functions from simpler functions.
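As a sketch of building a composite from simpler functions, here is a hypothetical compose() helper of our own devising:

```python
def compose(f, g):
    """A higher-order function: accepts two functions and
    returns a new composite function computing f(g(x))."""
    def composite(x):
        return f(g(x))
    return composite

# Build the mersenne calculation from two simpler pieces:
# raising 2 to a power, then subtracting one.
mersenne = compose(lambda x: x - 1, lambda x: 2 ** x)
```

The resulting mersenne() behaves like the lambda shown earlier in this chapter, but it was assembled from parts rather than written as a single expression.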

Consider the Python max() function. We can provide a function as an argument and modify how the max() function behaves.

Here's some data we might want to process:

>>> year_cheese = [(2000, 29.87), (2001, 30.12), (2002, 30.6), (2003, 30.66),(2004, 31.33), (2005, 32.62), (2006, 32.73), (2007, 33.5), (2008, 32.84), (2009, 33.02), (2010, 32.92)]

We can apply the max() function as follows:

>>> max(year_cheese)

(2010, 32.92)

The default behavior is to simply compare each tuple in the sequence. This will return the tuple with the largest value in position 0.

Since the max() function is a higher-order function, we can provide another function as an argument. In this case, we'll use lambda as the function; this is used by the max() function, as follows:

>>> max(year_cheese, key=lambda yc: yc[1])

(2007, 33.5)

In this example, the max() function applies the supplied lambda and returns the tuple with the largest value in position 1.

Python provides a rich collection of higher-order functions. We'll see examples of each of Python's higher-order functions in later chapters, primarily in Chapter 5, Higher-order Functions. We'll also see how we can easily write our own higher-order functions.

Immutable data

Since we're not using variables to track the state of a computation, our focus needs to stay on immutable objects. We can make extensive use of tuples and namedtuples to provide more complex data structures that are immutable.
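As a sketch, the (year, cheese) pairs used in the max() examples in this chapter could be given immutable, named structure with a namedtuple:

```python
from collections import namedtuple

# An immutable record type: fields can be read but never reassigned.
YearCheese = namedtuple("YearCheese", ["year", "cheese"])

year_cheese = [YearCheese(2000, 29.87), YearCheese(2007, 33.5)]

# Named fields make the key function more readable than yc[1].
peak = max(year_cheese, key=lambda yc: yc.cheese)
```

Attempting `peak.cheese = 0` raises AttributeError; the record is as immutable as a plain tuple, while remaining self-describing.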

The idea of immutable objects is not foreign to Python. There can be a performance advantage to using immutable tuples instead of more complex mutable objects. In some cases, the benefits come from rethinking the algorithm to avoid the costs of object mutation.

We will avoid class definitions (almost) entirely. It can seem like anathema to avoid objects in an Object-Oriented Programming (OOP) language. Functional programming simply doesn't need stateful objects. We'll see this throughout this book. There are reasons for defining callable objects; it is a tidy way to provide a namespace for closely-related functions, and it supports a pleasant level of configurability.

We'll look at a common design pattern that works well with immutable objects: the wrapper() function. A list of tuples is a fairly common data structure. We will often process this list of tuples in one of the two following ways:

· Using Higher-order Functions: As shown earlier, we provided lambda as an argument to the max() function: max(year_cheese, key=lambda yc: yc[1])

· Using the Wrap-Process-Unwrap pattern: In a functional context, we should call this the unwrap(process(wrap(structure))) pattern

For example, look at the following command snippet:

>>> max(map(lambda yc: (yc[1],yc), year_cheese))

(33.5, (2007, 33.5))

>>> _[1]

(2007, 33.5)

This fits the three-part pattern, although it might not be obvious how well it fits.

First, we wrap, using map(lambda yc: (yc[1], yc), year_cheese). This will transform each item into a two-tuple with a key followed by the original item. In this example, the comparison key is merely yc[1].

Second, we do the processing using the max() function. Since each piece of data has been simplified to a two-tuple with position zero used for comparison, we don't really need the higher-order function feature of the max() function. The default behavior of the max() function is exactly what we require.

Finally, we unwrap using the subscript [1]. This will pick the second element of the two-tuple selected by the max() function.

This kind of wrap and unwrap is so common that some languages have special functions with names like fst() and snd() that we can use as a function prefix instead of a syntactic suffix of [0] or [1]. We can use this idea to modify our wrap-process-unwrap example, as follows:

snd = lambda x: x[1]

snd(max(map(lambda yc: (yc[1], yc), year_cheese)))

We defined a snd() function to pick the second item from a tuple. This provides us with an easier-to-read version of unwrap(process(wrap())). We used map(lambda..., year_cheese) to wrap our raw data items. We used the max() function as the processing and, finally, the snd() function to extract the second item from the tuple.

In Chapter 13, Conditional Expressions and the Operator Module, we'll look at some alternatives to lambda functions like fst() and snd().

Strict and non-strict evaluation

Functional programming's efficiency stems, in part, from being able to defer a computation until it's required. The idea of lazy or non-strict evaluation is very helpful. It's so helpful that Python already offers this feature.

In Python, the logical expression operators and and or, as well as the if-else conditional expression, are all non-strict. We sometimes call them short-circuit operators because they don't need to evaluate all arguments to determine the resulting value.

The following command snippet shows the and operator's non-strict feature:

>>> 0 and print("right")

0

>>> True and print("right")

right

When we execute the preceding command snippet, the left-hand side of the and operator is equivalent to False; the right-hand side is not evaluated. When the left-hand side is equivalent to True, the right-hand side is evaluated.

Other parts of Python are strict. Outside the logical operators, an expression is evaluated eagerly from left-to-right. A sequence of statement lines is also evaluated strictly in order. Literal lists and tuples require eager evaluation.

When a class is created, the method functions are defined in a strict order. In the case of a class definition, the method functions are collected into a dictionary (by default) and order is not maintained after they're created. If we provide two methods with the same name, the second one is retained because of the strict evaluation order.
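We can see this replacement directly; the class here is purely illustrative:

```python
class Example:
    def method(self):
        return "first"

    def method(self):
        # Defined later in the strict evaluation order, so this
        # binding replaces the earlier one in the class dictionary.
        return "second"
```

Only the second definition survives; the first is silently discarded when the class body finishes executing.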

Python's generator expressions and generator functions, however, are lazy. These expressions don't create all possible results immediately. It's difficult to see this without explicitly logging the details of a calculation. Here is a version of the range() function that has the side effect of showing the numbers it creates:

>>> def numbers():

... for i in range(1024):

... print( "=", i )

... yield i

If this function were eager, it would create all 1,024 numbers. Since it's lazy, it only creates numbers as requested.

Note

The older Python 2 range() function was eager and created an actual list object with all of the requested numbers. Python 2 has an xrange() function that is lazy and matches the semantics of the Python 3 range() function.

We can use this noisy numbers() function in a way that will show lazy evaluation. We'll write a function that evaluates some, but not all, of the values from this iterator:

>>> def sum_to(n):

... sum = 0

... for i in numbers():

... if i == n: break

... sum += i

... return sum

The sum_to() function will not evaluate the entire result of the numbers() function. It will break after only consuming a few values from the numbers() function. We can see this consumption of values in the following log:

>>> sum_to(5)

= 0

= 1

= 2

= 3

= 4

= 5

10

As we'll see later, Python generator functions have some properties that make them a little awkward for simple functional programming. Specifically, a generator can only be used once in Python. We have to be cautious how we use the lazy Python generator expressions.
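The single-use property is easy to demonstrate with a small generator expression:

```python
squares = (x * x for x in range(4))

first_pass = list(squares)   # consumes the generator
second_pass = list(squares)  # the generator is now exhausted
```

The second list() call yields an empty list rather than a fresh copy of the values; a new generator must be created to iterate again.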

Recursion instead of an explicit loop state

Functional programs don't rely on loops and the associated overhead of tracking the state of loops. Instead, functional programs try to rely on the much simpler approach of recursive functions. In some languages, the programs are written as recursions, but Tail-Call Optimization (TCO) by the compiler changes them to loops. We'll introduce some recursion here and examine it closely in Chapter 6, Recursions and Reductions.

We'll look at a simple iteration to test a number for being prime. A prime number is a natural number, evenly divisible by only 1 and itself. We can create a naïve and poorly performing algorithm to determine if a number has any factors between two and the number. This algorithm has the advantage of simplicity; it works acceptably for solving Project Euler problems. Read up on Miller-Rabin primality tests for a much better algorithm.

We'll use the term coprime to mean that two numbers have only 1 as their common factor. The numbers 2 and 3, for example, are coprime. The numbers 6 and 9, however, are not coprime because they have 3 as a common factor.

If we want to know whether a number, n, is prime, we actually ask this: is the number n coprime to all prime numbers, p, such that p² ≤ n? We can simplify this using all integers, p, in the range 2 ≤ p < 1+√n.

Sometimes, it helps to formalize this as follows:

prime(n) = ∀p (2 ≤ p < 1+√n → n mod p ≠ 0)

The expression could look as follows in Python:

not any(n%p==0 for p in range(2,int(math.sqrt(n))+1))

A more direct conversion from the mathematical formalism to Python would use all(n%p != 0...). Both versions are non-strict: all() can terminate early when a False value is found, just as not any() can terminate early when a True value is found.
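We can watch this early termination by recording each candidate divisor that actually gets tested; the helper and the list are our own illustration:

```python
import math

tested = []

def divides(n, p):
    tested.append(p)  # record each trial divisor
    return n % p == 0

n = 33
composite = any(divides(n, p) for p in range(2, int(math.sqrt(n)) + 1))
# any() stops as soon as it sees a True value: after finding the
# factor 3, the remaining candidates 4 and 5 are never evaluated.
```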

This simple expression has a for loop inside it: it's not a pure example of stateless functional programming. We can reframe this into a function that works with a collection of values. We can ask whether the number, n, is coprime to every value in the range [2, 1+√n). This uses the symbols [ and ) to show a half-open interval: the lower value is included, and the upper value is not included. This is typical behavior of the Python range() function. We will also restrict ourselves to the domain of natural numbers. The square root values, for example, are implicitly truncated to integers.

We can think of the definition of prime as the following:

prime(n) = coprime(n, [2, 1+√n))

When defining a recursive function over a simple range of values, the base case can be an empty range. A nonempty range is handled recursively by processing one value combined with a range that's narrower by one value. We might formalize it as follows:

coprime(n, [a, b)) = True, if a = b

coprime(n, [a, b)) = (n mod a ≠ 0) ∧ coprime(n, [a+1, b)), if a < b

This version is relatively easy to confirm by examining the two cases, which are given as follows:

· If the range is empty, a = b, we evaluate something like coprime(131071, [363, 363)). The range contains no values, so the return is a trivial True.

· If the range is not empty, we ask something like coprime(131071, [2, 363)). This decomposes into (131071 mod 2 ≠ 0) ∧ coprime(131071, [3, 363)). For this example, we can see that the first clause is True, and we'll evaluate the second clause recursively.

As an exercise for the reader: this recursion can be redefined to count down instead of up, using [a,b-1) in the second case.
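One possible solution to this exercise, sketched with a function name of our own choosing:

```python
def coprime_down(n, a, b):
    """True if n has no factor in the half-open interval [a, b),
    consuming the interval from the top via [a, b-1)."""
    if a == b:
        return True   # an empty interval is trivially satisfied
    if n % (b - 1) == 0:
        return False  # a factor found at the top of the interval
    return coprime_down(n, a, b - 1)
```

Each recursive step narrows the interval by one from the upper end instead of the lower end; the base case and the logical structure are otherwise identical to the count-up version.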

As a side note, some folks like to think of the empty interval as a ≥ b, not a = b. This is needless, since a is incremented by 1 and we can easily guarantee that a ≤ b, initially. There's no way for a to somehow leap past b by some error in the function; we don't need to over-specify the rules for an empty interval.

Here is a Python code snippet that implements this definition of prime:

def isprimer(n):
    def isprime(k, coprime):
        """Is k relatively prime to the value coprime?"""
        if k < coprime*coprime: return True
        if k % coprime == 0: return False
        return isprime(k, coprime+2)
    if n < 2: return False
    if n == 2: return True
    if n % 2 == 0: return False
    return isprime(n, 3)

This shows a recursive definition of an isprime() function. The half-open interval [3, 1+√n) is reduced to just the low-end argument, a, which is renamed coprime in this function to clarify its purpose. The base case is implemented as k < coprime*coprime; the range of values from coprime to 1+math.sqrt(n) would be empty.

The non-strict and operation is implemented by splitting it out into a separate if statement, if k % coprime == 0. The return statement is the recursive call with a different coprime test value.

Because the recursive call is at the tail end of the function, this is an example of tail recursion.

This function is embedded in a function that establishes the boundary condition that n is an odd number greater than 2. There's no point in testing any even number for being prime, since 2 is the only even prime.

What's important in this example is that the two cases of this recursive function are quite simple to design. Making the range of values an explicit argument to the internal isprime() function allows us to call the function recursively with argument values that reflect a steadily shrinking interval.

While this is often extremely succinct and very expressive, we have to be a little cautious about using recursion in Python. There are two problems that arise. They are stated as follows:

· Python imposes a recursion limit to detect recursive functions with improperly defined base cases

· Python doesn't have a compiler that does Tail-Call Optimization (TCO)

The default recursion limit is 1,000, which is adequate for many algorithms. It's possible to change this with the sys.setrecursionlimit() function. It's not wise to raise this arbitrarily since it might lead to exceeding the OS memory limitations and crashing the Python interpreter.
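For example, the limit can be inspected and adjusted as follows (the exact numbers depend on the platform):

```python
import sys

# Inspect the current limit before changing anything.
previous = sys.getrecursionlimit()

# Raise it modestly; large, arbitrary increases risk exhausting
# OS stack memory and crashing the interpreter.
sys.setrecursionlimit(previous + 500)
```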

If we try the recursive isprimer() function on a number over 1,000,000, we'll run afoul of the recursion limit. If we used a somehow smarter isprimer() function that only checked prime factors instead of all factors, we'd be stopped at the 1,000th prime number, 7,919, limiting our prime testing to numbers below 62,710,561.

Some functional programming languages can optimize simple recursive functions such as our isprimer() function. An optimizing compiler can transform the recursive evaluation of isprime(n, coprime+2) into a low-overhead loop. The optimization tends to make a hash of call stacks; debugging optimized programs becomes difficult. Python doesn't perform this optimization. Performance and memory are sacrificed for clarity and simplicity.

In Python, when we use a generator expression instead of a recursive function, we essentially do the tail-call optimization manually. We don't rely on a compiler for some functional language to do this optimization.

Here is TCO done as a generator expression:

import math

def isprime(p):
    if p < 2: return False
    if p == 2: return True
    if p % 2 == 0: return False
    return not any(p % x == 0 for x in range(3, int(math.sqrt(p)) + 1, 2))

This function includes many of the functional programming principles, but it uses a generator expression instead of a pure recursion.

Tip

We'll often optimize a purely recursive function to use an explicit for loop in a generator expression.

This algorithm is slow for large primes. For composite numbers, the function often returns a value quickly. If used on a value such as the Mersenne prime 2**61-1, it will take a few minutes to show that this is prime. Clearly, the slowness comes from checking 1,518,500,249 individual candidate factors.

Functional type systems

Some functional programming languages such as Haskell and Scala are statically compiled, and depend on declared types for functions and their arguments. In order to provide the kind of flexibility Python already has, these languages have sophisticated type matching rules so that a generic function can be written, which works for a variety of related types.

In Object-Oriented Python, we often use the class inheritance hierarchy instead of sophisticated function type matching. We rely on Python to dispatch an operator to a proper method based on simple name matching rules.

Since Python already has the desired levels of flexibility, the type matching rules for a compiled functional language aren't relevant. Indeed, we could argue that the sophisticated type matching is a workaround imposed by static compilation. Python doesn't need this workaround because it's a dynamic language.

In some cases, we might have to resort to using isinstance(a, tuple) to detect whether an argument value is a tuple or an individual value. This will be as rare in functional programs as it is in object-oriented programs.
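A sketch of that rare case, with a name of our own choosing:

```python
def scale(value, factor=2):
    """Scale a single number, or every element of a tuple of numbers.

    The isinstance() check lets one function accept either shape
    of argument; this kind of type inspection should stay rare."""
    if isinstance(value, tuple):
        return tuple(v * factor for v in value)
    return value * factor
```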

Familiar territory

One of the ideas that emerge from the previous list of topics is that most functional programming is already present in Python. Indeed, most functional programming is already a very typical and common part of Object-Oriented Programming.

As a very specific example, a fluent Application Program Interface (API) is a very clear example of functional programming. If we take time to create a class with return self in each method function, we can use it as follows:

some_object.foo().bar().yet_more()

We can just as easily write several closely-related functions that work as follows:

yet_more(bar(foo(some_object)))

We've switched the syntax from traditional object-oriented suffix notation to a more functional prefix notation. Python uses both notations freely, often using a prefix version of a special method name. For example, the len() function is generally implemented by the class.__len__() special method.
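The two notations can be sketched side by side; the class and the functions here are hypothetical:

```python
class Fluent:
    """A hypothetical fluent API: each method returns self,
    so calls chain in object-oriented suffix notation."""
    def __init__(self):
        self.calls = []
    def foo(self):
        self.calls.append("foo")
        return self
    def bar(self):
        self.calls.append("bar")
        return self

# Suffix (object-oriented) notation:
suffix_style = Fluent().foo().bar()

# The closely-related prefix (functional) versions:
def foo(obj):
    obj.calls.append("foo")
    return obj

def bar(obj):
    obj.calls.append("bar")
    return obj

prefix_style = bar(foo(Fluent()))
```

Both spellings apply the same operations in the same order; only the syntax differs.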

Of course, the implementation of the class shown above might involve a highly stateful object. Even then, a small change in viewpoint might reveal a functional approach that can lead to more succinct or more expressive programming.

The point is not that imperative programming is broken in some way, or that functional programming offers such a vastly superior technology. The point is that functional programming leads to a change in viewpoint that can—in many cases—be very helpful.

Saving some advanced concepts

We will set some more advanced concepts aside for consideration in later chapters. These concepts are part of the implementation of a purely functional language. Since Python isn't purely functional, our hybrid approach won't require deep consideration of these topics.

We will identify these up-front for the benefit of folks who already know a functional language such as Haskell and are learning Python. The underlying concerns are present in all programming languages but we'll tackle them differently in Python. In many cases, we can and will drop into imperative programming rather than use a strictly functional approach.

The topics are as follows:

· Referential transparency: When looking at lazy evaluation and the various kinds of optimization that are possible in a compiled language, the idea of multiple routes to the same object is important. In Python, this isn't as important because there aren't any relevant compile-time optimizations.

· Currying: The type systems will employ currying to reduce multiple-argument functions to single-argument functions. We'll look at currying in some depth in Chapter 11, Decorator Design Techniques.

· Monads: These are purely functional constructs that allow us to structure a sequential pipeline of processing in a flexible way. In some cases, we'll resort to imperative Python to achieve the same end. We'll also leverage the elegant PyMonad library for this. We'll defer this to Chapter 14, The PyMonad Library.

Summary

In this chapter, we've identified a number of features that characterize the functional programming paradigm. We started with first-class and higher-order functions. The idea is that a function can be an argument to a function or the result of a function. When functions become the object of additional programming, we can write some extremely flexible and generic algorithms.

The idea of immutable data is sometimes odd in an imperative and object-oriented programming language such as Python. When we start to focus on functional programming, however, we see a number of ways that state changes can be confusing or unhelpful. Using immutable objects can be a helpful simplification.

Python focuses on strict evaluation: all sub-expressions are evaluated from left-to-right through the statement. Python, however, does perform some non-strict evaluation. The or, and, and if-else logical operators are non-strict: not all subexpressions are necessarily evaluated. Similarly, a generator function is also non-strict. We can also call this eager versus lazy evaluation. Python is generally eager, but we can leverage generator functions to create lazy evaluation.

While functional programming relies on recursion instead of explicit loop state, Python imposes some limitations here. Because of the stack limitation and the lack of an optimizing compiler, we're forced to manually optimize recursive functions. We'll return to this topic in Chapter 6, Recursions and Reductions.

Although many functional languages have sophisticated type systems, we'll rely on Python's dynamic type resolution. In some cases, it means we'll have to write manual coercion among types. It might also mean that we'll have to create class definitions to handle very complex situations. For the most part, however, Python's built-in rules will work very elegantly.

In the next chapter, we'll look at the core concepts of pure functions and how these fit with Python's built-in data structures. Given this foundation, we can look at higher-order functions available in Python and how we can define our own higher-order functions.