Cython in Depth - Cython (2015)

Cython (2015)

Chapter 3. Cython in Depth

Readability counts.

Special cases aren’t special enough to break the rules.

Although practicality beats purity.

— T. Peters “The Zen of Python”

The preceding chapters covered what Cython is, why we would want to use it, and how we can compile and run Cython code. With that knowledge in hand, it is time to explore the Cython language in depth.

The first two sections of this chapter cover the deeper reasons why Cython works as well as it does to speed up Python code. These sections are useful to help form a mental model of how Cython works, but are not necessary to understand the what of Cython’s syntax, which comprises the remaining sections.

For those interested in why Cython works, it can be attributed to two differences: runtime interpretation versus ahead-of-time compilation, and dynamic versus static typing.

Interpreted Versus Compiled Execution

To better understand how and why Cython improves the performance of Python code, it is useful to compare how the Python runtime runs Python code with how an operating system runs compiled C code.

Before being run, Python code is automatically compiled to Python bytecode. Bytecodes are fundamental instructions to be executed, or interpreted, by the Python virtual machine (VM). Because the VM abstracts away all platform-specific details, Python bytecode can be generated on one platform and run anywhere else. It is up to the VM to translate each high-level bytecode into one or more lower-level operations that can be executed by the operating system and, ultimately, the CPU. This virtualized design is common and very flexible, bringing with it many benefits—first among them is not having to fuss with picky compilers! The primary downside is that the VM is slower than running natively compiled code.

On the C side of the fence, there is no VM or interpreter, and there are no high-level bytecodes. C code is translated, or compiled, directly to machine code by a compiler. This machine code is incorporated into an executable or compiled library. It is tailored to a specific platform and architecture, it can be run directly by a CPU, and it is as low-level as it gets.

There is a way to bridge the divide between the bytecode-executing VM and machine code–executing CPU: the Python interpreter can run compiled C code directly and transparently to the end user. The C code must be compiled into a specific kind of dynamic library known as an extension module. These modules are full-fledged Python modules, but the code inside of them has been precompiled into machine code by a standard C compiler. When running code in an extension module, the Python VM no longer interprets high-level bytecodes, but instead runs machine code directly. This removes the interpreter’s performance overhead while any operation inside this extension module is running.

How does Cython fit in? As we saw in Chapter 2, we can use the cython and standard C compilers to translate Cython source code into a compiled platform-specific extension module. Whenever Python runs anything inside an extension module, it is running compiled code, so no interpreter overhead can slow things down.

How big of a difference does interpretation versus direct execution make? It can vary widely, depending on the Python code in question, but usually we can expect around a 10 to 30 percent speedup from converting Python code into an equivalent extension module.

Cython gives us this speedup for free, and we are glad to take it. But the real performance improvements come from replacing Python’s dynamic dispatch with static typing.

Dynamic Versus Static Typing

Another important difference between high-level languages like Python, Ruby, Tcl, and JavaScript and low-level languages like C, C++, and Java is that the former are dynamically typed, while the latter are statically typed. Statically typed languages require the type of a variable to be fixed at compile time. Often we can accomplish this by explicitly declaring the type of a variable, or, when possible, the compiler can automatically infer a variable’s type. In either case, in the context where it is used, a variable has that type and only that type.

What benefits does static typing bring? Besides compile-time type checking, compilers use static typing to generate fast machine code that is tailored to that specific type.

Dynamically typed languages place no restrictions on a variable’s type: the same variable can start out as an integer and end up as a string, or a list, or an instance of a custom Python object, for example. Dynamically typed languages are typically easier to write because the user does not have to explicitly declare variables’ types, with the tradeoff that type-related errors are caught at runtime.

When running a Python program, the interpreter spends most of its time figuring out what low-level operation to perform, and extracting the data to give to this low-level operation. Given Python’s design and flexibility, the Python interpreter always has to determine the low-level operation in a completely general way, because a variable can have any type at any time. This is known as dynamic dispatch, and for many reasons, fully general dynamic dispatch is slow.[5]

For example, consider what happens when the Python runtime evaluates a + b:

1. The interpreter inspects the Python object referred to by a for its type, which requires at least one pointer lookup at the C level.

2. The interpreter asks the type for an implementation of the addition method, which may require one or more additional pointer lookups and internal function calls.

3. If the method in question is found, the interpreter then has an actual function it can call, implemented either in Python or in C.

4. The interpreter calls the addition function and passes in a and b as arguments.

5. The addition function extracts the necessary internal data from a and b, which may require several more pointer lookups and conversions from Python types to C types. If successful, only then can it perform the actual operation that adds a and b together.

6. The result then must be placed inside a (perhaps new) Python object and returned. Only then is the operation complete.

The situation for C is very different. Because C is compiled and statically typed, the C compiler can determine at compile time what low-level operations to perform and what low-level data to pass as arguments. At runtime, a compiled C program skips nearly all steps that the Python interpreter must perform. For something like a + b with a and b both being fundamental numeric types, the compiler generates a handful of machine code instructions to load the data into registers, add them, and store the result.

What is the takeaway? A compiled C program spends nearly all its time calling fast C functions and performing fundamental operations. Because of the restrictions a statically typed language places on its variables, a compiler generates faster, more specialized instructions that are tailored to its data. Given this efficiency, is it any wonder that a language like C can be hundreds, or even thousands, of times faster than Python for certain operations?

The primary reason Cython yields such impressive performance boosts is that it brings static typing to a dynamic language. Static typing transforms runtime dynamic dispatch into type-optimized machine code.

Before Cython (and Cython’s predecessor, Pyrex), we could only benefit from static typing by reimplementing our Python code in C. Cython makes it easy to keep our Python code as is and tap into C’s static type system. The first and most important Cython-specific keyword we will learn is cdef, which is our gateway to C’s performance.

Static Type Declaration with cdef

Dynamically typed variables in Cython come for free: we simply assign to a variable to initialize it and use it as we would in Python:[6]

a = [x+1 for x inrange(12)]

b = a

a[3] = 42.0

assert b[3] == 42.0

a = 13

assert isinstance(b, list)

In Cython, untyped dynamic variables behave exactly like Python variables. The assignment b = a allows both a and b to access the same list object created on the first line in the preceding example. Modifying the list via a[3] = 42 modifies the same list referenced by b, so the assertion holds true. The assignment a = 13 leaves b referring to the original list object, while a is now referring to a Python integer object. This reassignment to a changes a’s type, which is perfectly valid Python code.

To statically type variables in Cython, we use the cdef keyword with a type and the variable name. For example:

cdef int i

cdef int j

cdef float k

Using these statically typed variables looks just like Python (or C) code:

j = 0

i = j

k = 12.0

j = 2 * k

assert i != j

NOTE

The important difference between dynamic variables and static variables is that static variables with C types have C semantics, which changes the behavior of assignment. It also means these variables follow C coercion and casting rules.

In the previous example, i = j copies the integer data at j to the memory location reserved for i. This means that i and j refer to independent entities, and can evolve separately.

As with C, we can declare several variables of the same type at once:

cdef int i, j, k

cdef float price, margin

Also, we can provide an optional initial value:

cdef int i = 0

cdef long int j = 0, k = 0

cdef float price = 0.0, margin = 1.0

Inside a function, cdef statements are indented and the static variables declared are local to that function. All of these are valid uses of cdef to declare local variables in a function integrate:

def integrate(a, b, f):

cdef int i

cdef int N=2000

cdef float dx, s=0.0

dx = (b-a)/N

for i inrange(N):

s += f(a+i*dx)

return s * dx

An equivalent way to declare multiple variables is by means of a cdef block, which groups the declarations in an indented region:

def integrate(a, b, f):

cdef:

int i

int N=2000

float dx, s=0.0

# ...

This groups long lists of cdef declarations nicely, and we will use both forms throughout this book.

WHAT ABOUT STATIC AND CONST?

The C static keyword is used to declare a variable whose lifetime extends to the entire lifetime of a program. It is not a valid Cython keyword, so we cannot declare C static variables in Cython. The C const keyword declares an unmodifiable identifier. Cython supports the const keyword, but it is not very useful in the context of this chapter. If we try to declare N asconst, for example, we will get a compilation error (“Error compiling Cython file […] Assignment to const N”). We will see in Chapters 7 and 8 where Cython’s const support becomes useful.

We can declare any kind of variable that C supports. Table 3-1 gives examples using cdef for the more common C types.

Table 3-1. Various cdef declarations

C type

Cython cdef statement

Pointers

cdef int *p

cdef void **buf

Stack-allocated C arrays

cdef int arr[10]

cdef double points[20][30]

typedefed aliased types

cdef size_t len

Compound types (structs and unions)

cdef tm time_struct

cdef int_short_union_t hi_lo_bytes

Function pointers

cdef void (*f)(int, double)

Cython supports the full range of C declarations, even the cryptic arrays-of-pointers-to-function-pointers-that-return-function-pointers tongue twisters. For example, to declare a function that takes a function pointer as its only argument and returns another function pointer, we could say:

cdef int (*signal(int (*f)(int))(int)

It is not immediately apparent how to make use of the signal function in Cython, but we will see later how C function pointers enter the picture with callbacks. Cython does not limit the C-level types that we can use, which is especially useful when we are wrapping external C libraries.

Automatic Type Inference in Cython

Static typing with cdef is not the only way to statically type variables in Cython. Cython also performs automatic type inference for untyped variables in function and method bodies. By default, Cython infers variable types only when doing so cannot change the semantics of the code.

Consider the following simple function:

def automatic_inference():

i = 1

d = 2.0

c = 3+4j

r = i * d + c

return r

In this example, Cython types the literals 1 and 3+4j and the variables i, c, and r as general Python objects. Even though these types have obvious corresponding C types, Cython conservatively assumes that the integer i may not be representable as a C long, so types it as a Python object with Python semantics. Automatic inference is able to infer that the 2.0 literal, and hence the variable d, are C doubles and proceeds accordingly. To the end user, it is as if d is a regular Python object, but Cython treats it as a C double for performance.

By means of the infer_types compiler directive (see Compiler Directives), we can give Cython more leeway to infer types in cases that may possibly change semantics—for example, when integer addition may result in overflow.

To enable type inference for a function, we can use the decorator form of infer_types:

cimport cython

@cython.infer_types(True)

def more_inference():

i = 1

d = 2.0

c = 3+4j

r = i * d + c

return r

Because infer_types is enabled for more_inference, the variable i is typed as a C long; d is a double, as before, and both c and r are C-level complex variables (more on complex variables in Table 3-2 and Complex types). When enabling infer_types, we are taking responsibility to ensure that integer operations do not overflow and that semantics do not change from the untyped version. The infer_types directive can be enabled at function scope or globally, making it easy to test whether it changes the results of the code base, and whether it makes a difference in performance.

C Pointers in Cython

As we saw in Table 3-1, declaring C pointers in Cython uses C syntax and semantics:

cdef int *p_int

cdef float** pp_float = NULL

As with C, the asterisk can be declared adjacent to the type or to the variable, although the pointerness is associated with the variable, not the type.

This means that to declare multiple pointers on a single line we have to use an asterisk with each variable declared, like so:

cdef int *a, *b

If we instead use:

cdef int *a, b

this declares an integer pointer a, and a nonpointer integer b! In recent versions, Cython issues a warning when compiling error-prone declarations such as these.

Dereferencing pointers in Cython is different than in C. Because the Python language already uses the *args and **kwargs syntax to allow arbitrary positional and keyword arguments and to support function argument unpacking, Cython does not support the *a syntax to dereference a C pointer. Instead, we index into the pointer at location 0 to dereference a pointer in Cython. This syntax also works to dereference a pointer in C, although that’s rare.

For example, suppose we have a golden_ratio C double and a p_double C pointer:

cdef double golden_ratio

cdef double *p_double

We can assign golden_ratio’s address to p_double using the address-of operator, &:

p_double = &golden_ratio

We can now assign to golden_ratio through p_double using our indexing-at-zero-to-dereference syntax:

p_double[0] = 1.618

print golden_ratio

# => 1.618

And we can access p_double’s referent the same way:

print p_double[0]

# => 1.618

Alternatively, we can use the cython.operator.dereference function-like operator to dereference a pointer. We access this operator by cimporting from the special cython namespace, which is covered in detail in Chapter 6:

from cython cimport operator

print operator.dereference(p_double)

# => 1.618

This form is not frequently used.

Another difference between Cython and C arises when we are using pointers to structs. (We will cover Cython’s struct support in depth later in this chapter.) In C, if p_st is a pointer to a struct typedef:

st_t *p_st = make_struct();

then to access a struct member a inside p_st, we use arrow syntax:

int a_doubled = p_st->a + p_st->a;

Cython, however, uses dot access whether we have a nonpointer struct variable or a pointer to a struct:

cdef st_t *p_st = make_struct()

cdef int a_doubled = p_st.a + p_st.a

Wherever we use the arrow operator in C, we use the dot operator in Cython, and Cython will generate the proper C-level code.

Mixing Statically and Dynamically Typed Variables

Cython allows assignments between statically and dynamically typed variables. This fluid blending of static and dynamic is a powerful feature that we will use in several instances: it allows us to use dynamic Python objects for the majority of our code base, and easily convert them into fast, statically typed analogues for the performance-critical sections.

To illustrate, say we have several (static) C ints we want to group into a (dynamic) Python tuple. The C code to create and initialize this tuple using the Python/C API is straightforward but tedious, requiring dozens of lines of code, with a significant amount of error checking. In Cython, the obvious way to do it just works:

cdef int a, b, c

# ...Calculations using a, b, and c...

tuple_of_ints = (a, b, c)

This code is trivial, boring even. The point to emphasize here is that a, b, and c are statically typed integers, and Cython allows the creation of a dynamically typed Python tuple literal with them. We can then assign that tuple to the dynamically typed tuple_of_ints variable. The simplicity of this example is part of Cython’s power and beauty: we can just create a tuple of C ints in the obvious way without further thought. We want conceptually simple things like this to be simple, and that is what Cython provides.

This example works because there is an obvious correspondence between C ints and Python ints, so Python can transform things automatically for us. This example would not work as is if a, b, and c were, for example, C pointers. In that case we would have to dereference them before putting them into the tuple, or use another strategy.

Table 3-2 gives the full list of correspondences between built-in Python types and C or C++ types.

Table 3-2. Type correspondence between built-in Python types and C or C++ types

Python type(s)

C type(s)

bool

bint

int
long

[unsigned] char
[unsigned] short
[unsigned] int
[unsigned] long
[unsigned] long long

float

float
double
long double

complex

float complex
double complex

bytes
str
unicode

char *
std::string (C++)

dict

struct

There are several points worth mentioning regarding Table 3-2, which we’ll cover next.

The bint type

The bint Boolean integer type is an int at the C level and is converted to and from a Python bool. It has the standard C interpretation of truthiness: zero is False, and nonzero is True.

Integral type conversions and overflow

In Python 2, a Python int is stored as a C long, and a Python long has unlimited precision. In Python 3, all int objects are unlimited precision.

When converting integral types from Python to C, Cython generates code that checks for overflow. If the C type cannot represent the Python integer, a runtime OverflowError is raised.

There are related Boolean overflowcheck and overflowcheck.fold compiler directives (see Compiler Directives) that will catch overflow errors when we are working with C integers. If overflowcheck is set to True, Cython will raise an OverflowError for overflowing C integer arithmetic operations. The overflowcheck.fold directive, when set, may help remove some overhead when overflowcheck is enabled.

Floating-point type conversions

A Python float is stored as a C double. Converting a Python float to a C float may truncate to 0.0 or positive or negative infinity, according to IEEE 754 conversion rules.

Complex types

The Python complex type is stored as a C struct of two doubles.

Cython has float complex and double complex C-level types, which correspond to the Python complex type. The C types have the same interface as the Python complex type, but use efficient C-level operations. This includes the real and imag attributes to access the real and imaginary components, the conjugate method to create the complex conjugate of a number, and efficient operations for addition, subtraction, multiplication, and division.

The C-level complex type is compatible with the C99 _Complex type or the C++ std::complex templated class.

bytes type

The Python bytes type converts to and from a char * or std::string automatically.

str and unicode types

The c_string_type and c_string_encoding compiler directives need to be set (see ???) to allow str or unicode types to convert to and from a char * or std::string.

Statically Declaring Variables with a Python Type

Until now, we have used cdef to statically declare variables with a C type. It is also possible to use cdef to statically declare variables with a Python type. We can do this for the built-in types like list, tuple, and dict; extension types like NumPy arrays; and many others.

Not all Python types can be statically declared: they must be implemented in C and Cython must have access to the declaration. The built-in Python types already satisfy these requirements, and declaring them is straightforward. For example:

cdef list particles, modified_particles

cdef dict names_from_particles

cdef str pname

cdef set unique_particles

The variables in this example are full Python objects. Under the hood, Cython declares them as C pointers to some built-in Python struct type. They can be used like ordinary Python variables, but are constrained to their declared type:

# ...initialize names_from_particles...

particles = list(names_from_particles.keys())

Dynamic variables can be initialized from statically declared Python types:

other_particles = particles

del other_particles[0]

Here, deleting the 0th element via other_particles will delete the 0th element of particles as well, since they are referring to the same list.

One difference between other_particles and particles is that particles can only ever refer to Python list objects, while other_particles can refer to any Python type. Cython will enforce the constraint on particles at compile time and at runtime.

NOTE

In cases where Python built-in types like int or float have the same name as a C type, the C type takes precedence. This is almost always what we want.

When we are adding, subtracting, or multiplying scalars, the operations have Python semantics (including automatic Python long coercion for large values) when the operands are dynamically typed Python objects. They have C semantics (i.e., the result may overflow for limited-precision integer types) when the operands are statically typed C variables.

Division and modulus (i.e., computing the remainder) deserve special mention. C and Python have markedly different behavior when computing the modulus with signed integer operands: C rounds toward zero, while Python rounds toward infinity. For example, -1 % 5 evaluates to 4 with Python semantics; with C semantics, however, it evaluates to -1. When dividing two integers, Python always checks the denominator and raises a ZeroDivisionError when it is zero, while C has no such safeguards in place.

Following the principle of least astonishment, Cython uses Python semantics by default for division and modulus even when the operands are statically typed C scalars. To obtain C semantics, we can use the cdivision compiler directive (see Compiler Directives), either at the global module level, or in a directive comment:

# cython: cdivision=True

or at the function level with a decorator:

cimport cython

@cython.cdivision(True)

def divides(int a, int b):

return a / b

or within a function with a context manager:

cimport cython

def remainder(int a, int b):

with cython.cdivision(True):

return a % b

Note that when we are dividing C integers with cdivision(True), if the denominator is zero, the result may lead to undefined behavior (i.e., anything from hard crashes to corrupted data).

Cython also has the cdivision_warnings compiler directive (which has a default value of False). When cdivision_warnings is True, Cython emits a runtime warning whenever division (or modulo) is performed with negative operands.

Static Typing for Speed

It may seem odd at first that Cython allows static declaration of variables with built-in Python types. Why not just use Python’s dynamic typing as usual? The answer points to a general Cython principle: the more static type information we provide, the better Cython can optimize the result. As always, there are exceptions to this rule, but it is more often true than not. For instance, this line of code simply appends a Particle object to a dynamic dynamic_particles variable:

dynamic_particles = make_particles(...)

# ...

dynamic_particles.append(Particle())

# ...

The cython compiler will generate code that can handle any Python object, and tests at runtime if dynamic_particles is a list. If it is not, as long as it has an append method that takes an argument, this code will run. Under the hood, the generated code first looks up the appendattribute on the dynamic_particles object (using PyObject_GetAttr), and then calls that method using the completely general PyObject_Call Python/C API function. This essentially emulates what the Python interpreter would do when running equivalent Python bytecode.

Suppose we statically declare a static_particles Python list and use it instead:

cdef list static_particles = make_particles(...)

# ...

static_particles.append(Particle())

# ...

Now Cython can generate specialized code that directly calls either the PyList_SET_ITEM or the PyList_Append function from the C API. This is what PyObject_Call in the previous example ends up calling anyway, but static typing allows Cython to remove dynamic dispatch onstatic_particles.

Cython currently supports several built-in statically declarable Python types, including:

§ type, object

§ bool

§ complex

§ basestring, str, unicode, bytes, bytearray

§ list, tuple, dict, set, frozenset

§ array

§ slice

§ date, time, datetime, timedelta, tzinfo

More types may be supported in future releases.

Python types that have direct C counterparts—like int, long, and float—are not included in the preceding list. It turns out that it is not straightforward to statically declare and use PyIntObjects, PyLongObjects, or PyFloatObjects in Cython; fortunately, the need to do so is rare. We just declare regular C ints, longs, floats, and doubles and let Cython do the automatic conversion to and from Python for us.

NOTE

A Python float corresponds to a C double. For this reason, C doubles are preferred whenever conversions to and from Python are used to ensure no clipping of values or loss of precision.

In Python 2, a Python int (more precisely, a PyIntObject at the C level) stores its value internally as a C long. So a C long is the preferred integral data type to ensure maximal compatibility with Python.

Python also has a PyLongObject at the C level to represent arbitrarily sized integers. In Python 2, these are exposed as the long type, and if an operation with PyIntObject overflows, a PyLongObject results.

In Python 3, at the C level, all integers are PyLongObjects.

Cython properly converts between C integral types and these Python integer types in a language-agnostic way, and raises an OverflowError when a conversion is not possible.

When we work with Python objects in Cython, whether statically declared or dynamic, Cython still manages all aspects of the object for us, which includes the tedium of reference counting.

Reference Counting and Static String Types

One of Python’s major features is automatic memory management. CPython implements this via straightforward reference counting, with an automatic garbage collector that runs periodically to clean up unreachable reference cycles.

Cython handles all reference counting for us, ensuring a Python object (whether statically typed or dynamic) is finalized when its reference count reaches zero.

CPython’s automatic memory management has certain implications when mixing static and dynamic variables in Cython. Say, for instance, we have two Python bytes objects b1 and b2, and we want to extract the underlying char pointer after adding them together:

b1 = b"All men are mortal."

b2 = b"Socrates is a man."

cdef char *buf = b1 + b2

The b1 + b2 expression is a temporary Python bytes object, and the assignment attempts to extract that temporary object’s char pointer using Cython’s automatic conversion rules. Because the result of the addition is a temporary object, the preceding example cannot work—the temporary result of the addition is deleted immediately after it is created, so the char buffer cannot refer to a valid Python object. Fortunately, Cython is able to catch the error and issue a compilation error.

Once understood, the right way to accomplish what we want is straightforward—just use a temporary Python variable, either dynamically typed:

tmp = s1 + s2

cdef char *buf = tmp

or statically typed:

cdef bytes tmp = s1 + s2

cdef char *buf = tmp

These cases are not common. It is an issue here only because a C-level object is referring to data that is managed by a Python object. Because the Python object owns the underlying string, the C char * buffer has no way to tell Python that it has another (non-Python) reference. We have to create a temporary bytes object so that Python does not delete the string data, and we must ensure that the temporary object is maintained as long as the C char * buffer is required. The other C types listed in Table 3-2 are all value types, not pointer types. For these types, the Python data is copied during assignment (C semantics), allowing the C variable to evolve separately from the Python object used to initialize it.

Just as Cython understands both dynamic Python variables and static C variables, it also understands functions in both languages, and allows us to use either kind.

Cython’s Three Kinds of Functions

Much of what we have learned about dynamic and static variables applies to functions as well. Python and C functions have some common attributes: they both (usually) have a name, take zero or more arguments, and can return new values or objects when called. But Python functions are more flexible and powerful. Python functions are first-class citizens, meaning that they are objects with state and behavior. This abstraction is very useful.

A Python function can be

§ created both at import time and dynamically at runtime;

§ created anonymously with the lambda keyword;

§ defined inside another function (or other nested scope);

§ returned from other functions;

§ passed as an argument to other functions;

§ called with positional or keyword arguments;

§ defined with default values.

C functions have minimal call overhead, making them orders of magnitude faster than Python functions. A C function

§ can be passed as an argument to other functions (but doing so is much more cumbersome than in Python);

§ cannot be defined inside another function;

§ has a statically assigned name that is not modifiable;

§ takes arguments only by position;

§ does not support default values for parameters.

All of the power and flexibility of Python functions comes at a cost: Python functions are several orders of magnitude slower than C functions—even functions that take no arguments.

Cython supports both Python and C functions and allows them to call each other in a natural and straightforward way, all in the same source file.

Python Functions in Cython with the def Keyword

Cython supports regular Python functions defined with the def keyword, and they work as we would expect. For example, consider a recursive py_fact function that recursively computes the factorial of its argument:

def py_fact(n):

"""Computes n!"""

if n <= 1:

return 1

return n * py_fact(n - 1)

This simple Python function is valid Cython code. In Cython, the n argument is a dynamic Python variable, and py_fact must be passed a Python object when called. py_fact is used the same way regardless of whether it is defined in pure Python or defined in Cython and imported from an extension module.

We can compile the py_fact example using any of the methods described in Chapter 2. If we put the py_fact function in a file named fact.pyx, we can easily compile it on the fly using pyximport from an interactive prompt (here, IPython):

In [1]: import pyximport

In [2]: pyximport.install()

Out[2]: (None, <pyximport.pyximport.PyxImporter at 0x101c65690>)

In [3]: import fact

We can now access and use fact.py_fact:

In [4]: fact.py_fact?

Type: builtin_function_or_method

String Form:<built-in function py_fact>

Docstring: Computes n!

In [5]: fact.py_fact(20)

Out[5]: 2432902008176640000

Let’s define a pure-Python version of py_fact in the interpreter for comparison:

In [7]: def interpreted_fact(n):

...: """Computes n!"""

...: if n <= 1:

...: return 1

...: return n * interpreted_fact(n - 1)

...:

We can compare their runtimes with the handy IPython %timeit magic:

In [8]: %timeit interpreted_fact(20)

100000 loops, best of 3: 4.24 µs per loop

In [9]: %timeit fact.py_fact(20)

1000000 loops, best of 3: 1.78 µs per loop

The py_fact function runs approximately two times faster with Cython for small input values on this system, although the speedup depends on a number of factors. The source of the speedup is the removal of interpretation overhead and the reduced function call overhead in Cython.

With respect to usage, interpreted_fact and the Cython-compiled py_fact are identical. With respect to implementation, these two functions have some important differences. The Python version has type function, while the Cython version has typebuiltin_function_or_method. The Python version has several attributes available to it—such as __name__—that are modifiable, while the Cython version is not modifiable. The Python version, when called, executes bytecodes with the Python interpreter, while the Cython version runs compiled C code that calls into the Python/C API, bypassing bytecode interpretation entirely.

Factorials grow very quickly. One nice feature of Python integers is that they can represent arbitrarily large values (memory constraints), and can therefore represent values that C integral types cannot. These large integers are very convenient, but that convenience comes at the cost of performance.

We can tell Cython to type n as a C integral type and possibly gain a performance improvement, with the understanding that we are now working with limited-precision integers that may overflow (more on handling overflow later).

Let’s define a new function, typed_fact, inside our fact.pyx file:

def typed_fact(long n):

"""Computes n!"""

if n <= 1:

return 1

return n * typed_fact(n - 1)

Here, we statically type n. Because n is a function argument, we omit the cdef keyword. When we call typed_fact from Python, Cython will convert the Python object argument to a C long, raising an appropriate exception (TypeError or OverflowError) if it cannot.

When defining any function in Cython, we may mix dynamically typed Python object arguments with statically typed arguments. Cython allows statically typed arguments to have default values, and statically typed arguments can be passed positionally or by keyword.

In this case, statically typing typed_fact’s argument does not improve performance over py_fact. Because typed_fact is a Python function, its return value is a Python integer object, not a statically typed C long. When computing n * typed_fact(n - 1), Cython has to generate lots of code to extract the underlying C long from the Python integer returned from typed_fact, multiply it by the statically typed n, and pack that result into a new Python integer, which is then returned. All this packing and unpacking leads to essentially the same code paths taken by thepy_fact function we saw earlier.

So how do we improve performance? We could translate this into a loop rather than a recursive function, but we will hold off on that for now. What we would like to do is tell Cython, “Here is a C long; compute its factorial without creating any Python integers, and I’ll make a Python integer out of that result to return.” Essentially, we want a pure C function to do all the hard work using only C function calls and statically typed C data. We can then trivially convert the result to a Python integer and return that. This is a perfect fit for Cython’s cdef function.

C Functions in Cython with the cdef Keyword

When used to define a function, the cdef keyword creates a function with C-calling semantics. A cdef function’s arguments and return type are typically statically typed, and they can work with C pointer objects, structs, and other C types that cannot be automatically coerced to Python types. It is helpful to think of a cdef function as a C function that is defined with Cython’s Python-like syntax.

A cdef version of the factorial function would look something like:

cdef long c_fact(long n):

"""Computes n!"""

if n <= 1:

return 1

return n * c_fact(n - 1)

Its definition is very similar to typed_fact, the primary difference being the long return type.

Careful inspection of c_fact in the preceding example reveals that the argument type and return type are statically declared, and no Python objects are used; hence, no conversions from Python types to C types are necessary. Calling the c_fact function is as efficient as calling a pure-C function, so the function call overhead is minimal. Nothing prevents us from declaring and using Python objects and dynamic variables in cdef functions, or accepting them as arguments. But cdef functions are typically used when we want to get as close to C as possible without writing C code directly.

Cython allows cdef functions to be defined alongside Python def functions in the same Cython source file. The optional return type of a cdef function can be any static type we have seen, including pointers, structs, C arrays, and static Python types like list or dict. We can also have a return type of void. If the return type is omitted, then it defaults to object.

A function declared with cdef can be called by any other function—def or cdef—inside the same Cython source file (we will see in Chapter 6 how to relax this constraint). However, Cython does not allow a cdef function to be called from external Python code. Because of this restriction,cdef functions are typically used as fast auxiliary functions to help def functions do their job.

If we want to use c_fact from Python code outside this extension module, we need a minimal def function that calls c_fact internally:

def wrap_c_fact(n):

"""Computes n!"""

return c_fact(n)

We get a nice speedup for our efforts: wrap_c_fact(20) is about 10 times faster than typed_fact(20) and py_fact(20), both of which have significant Python overhead.

Unfortunately, the wrap_c_fact function comes with some limitations. One limitation is that wrap_c_fact and its underlying c_fact are restricted to C integral types only, and do not have the benefit of Python’s unlimited-precision integers. In practice, this means that wrap_c_factgives erroneous results for arguments larger than some small value, depending on how large an unsigned long is on our system. For typical 8-byte C longs, wrap_c_fact(21) yields invalid results. One option to partially address this limitation while maintaining Cython’s performance would be to use doubles rather than integral types.

This is a general issue when we are working with Python and C, and is not specific to Cython: Python objects and C types do not always map to each other perfectly, and we have to be aware of C’s limitations.

Combining def and cdef Functions with cpdef

There is a third kind of function, declared with the cpdef keyword, that is a hybrid of def and cdef. A cpdef function combines features from both of the other kinds of functions and addresses many of their limitations. In the previous section we made the cdef function c_fact available to Python by writing a def wrapper function, wrap_c_fact, that simply forwards its arguments on to c_fact and returns its result. A single cpdef function gives us these two functions automatically: we get a C-only version of the function and a Python wrapper for it, both with the same name. When we call the function from Cython, we call the C-only version; when we call the function from Python, the wrapper is called. In this way, cpdef functions combine the accessibility of def functions with the performance of cdef functions.

To continue with our example, let us define a cpdef function cp_fact to see how we can clean up the wrap_c_fact and c_fact combo:

cpdef long cp_fact(long n):

"""Computes n!"""

if n <= 1:

return 1

return n * cp_fact(n - 1)

Our cp_fact provides the speed of c_fact and the Python accessibility of py_fact, all in one place. Its performance is identical to that of wrap_c_fact; that is, about 10 times faster than py_fact.

INLINE CDEF AND CPDEF FUNCTIONS

C and C++ support an optional inline keyword to suggest that the compiler replace the so-declared function with its body wherever it is called, thereby further removing call overhead. The compiler is free to ignore inline.

Cython supports the inline keyword for cdef and cpdef functions—we simply place inline after the cdef or cpdef keyword:

cdef inline long c_fact(long a):

# ...

Cython passes this modifier through to the generated C or C++ code.

The inline modifier, when judiciously used, can yield performance improvements, especially for small inlined functions called in deeply nested loops, for example.

A cpdef function has one limitation, due to the fact that it does double duty as both a Python and a C function: its arguments and return types have to be compatible with both Python and C types. Any Python object can be represented at the C level (e.g., by using a dynamically typed argument, or by statically typing a built-in type), but not all C types can be represented in Python. So, we cannot use void, C pointers, or C arrays indiscriminately as the argument types or return type of cpdef functions. Table 3-2 may be useful here.

Functions and Exception Handling

A def function always returns some sort of PyObject pointer at the C level. This invariant allows Cython to correctly propagate exceptions from def functions without issue. Cython’s other two function types—cdef and cpdef—may return a non-Python type, which makes some other exception-indicating mechanism necessary.

For example, suppose we have a cpdef function that divides integers, and therefore must consider what to do when the denominator is zero:

cpdef int divide_ints(int i, int j):

return i / j

If we call divide_ints with j=0, a ZeroDivisionError exception will be set, but there is no way for divide_ints to communicate this to its caller:

In [1]: import pyximport; pyximport.install()

Out[1]: (None, <pyximport.pyximport.PyxImporter at 0x101c7d650>)

In [2]: from division import divide_ints

In [3]: divide_ints(1, 1)

Out[3]: 1

In [4]: divide_ints(1, 0)

Exception ZeroDivisionError: 'integer division or modulo by zero'

in 'division.divide_ints' ignored

Out[4]: 0

Note that even though Python detects the ZeroDivisionError, the warning message indicates that it was ignored, and the call to divide_ints(1, 0) returns an erroneous value of 0.

To correctly propagate this exception, Cython provides an except clause to allow a cdef or cpdef function to communicate to its caller that a Python exception has or may have occurred during its execution:

cpdef int divide_ints(int i, int j) except? -1:

return i / j

Because we modified the Cython source, we must restart the Python (or IPython) interpreter; otherwise, we cannot access our modified version of divide_ints:

In [1]: import pyximport; pyximport.install()

Out[1]: (None, <pyximport.pyximport.PyxImporter at 0x101c67690>)

In [2]: from division import divide_ints

In [3]: divide_ints(1, 0)

Traceback (most recent call last):

File "<ipython-input-3-27c79d4283e7>", line 1, in <module>

divide_ints(1, 0)

File "division.pyx", line 1, in division.divide_ints (...)

cpdef int divide_ints(int i, int j) except? -1:

File "division.pyx", line 2, in division.divide_ints (...)

return i / j

ZeroDivisionError: integer division or modulo by zero

We see that the exception is now correctly propagated and is no longer ignored.

The except? -1 clause allows the return value -1 to act as a possible sentinel that an exception has occurred. If divide_ints ever returns -1, Cython checks if the global exception state has been set, and if so, starts unwinding the stack. We do not have to set the return value to -1ourselves when an exception occurs; Cython does this for us automatically. The value -1 here is arbitrary: we could have used a different integer literal that is within the range of values for the return type.

In this example we use a question mark in the except clause because -1 might be a valid result from divide_ints, in which case no exception state will be set. If there is a return value that always indicates an error has occurred without ambiguity, then the question mark can be omitted. Alternatively, to have Cython check if an exception has been raised regardless of return value, we can use the except * clause instead. This will incur some overhead.

Functions and the embedsignature Compiler Directive

When working with a pure-Python function, we can easily see its signature when using IPython’s introspection:

In [11]: interpreted_fact?

Type: function

String Form:<function interpreted_fact at 0x101c711b8>

File: [...]

Definition: interpreted_fact(n)

Docstring: Computes n!

IPython calls the signature of interpreted_fact the definition.

Cython-compiled def and cpdef functions do have a standard docstring, but do not include a signature by default:

In [12]: fact.py_fact?

Type: builtin_function_or_method

String Form:<built-in function py_fact>

Docstring: Computes n!

We can instruct Cython to inject the compiled function’s Python signature into the docstring with the embedsignature compiler directive (see Compiler Directives).

When embedsignature is set to True, we see the signature for py_fact in the output:

In [3]: fact.py_fact?

Type: builtin_function_or_method

String Form:<built-in function py_fact>

Docstring:

py_fact(n)

Computes n!

This can be helpful to know the argument names, their default values, the order in which arguments are passed in, and more.

GENERATED C CODE

The cython compiler outputs either a C or a C++ source file. The generated code is highly optimized, and the variable names are modified from the original. For these reasons, it is not particularly easy to read.

For a very simple Cython function called mult, defined in mult.pyx, let’s see a little bit of the generated source. Let’s first compile a fully dynamic version:

def mult(a, b):

return a * b

We place this function in mult.pyx and call cython to generate mult.c:

$ cython mult.pyx

Looking at mult.c, we see it is several thousand lines long. Some of this is extension module boilerplate, and most is support code that is not actually used for trivial functions like this. Cython generates embedded comments to indicate what C code corresponds to each line of the original Cython source.

Let’s look at the generated C code that computes a + b:

/* "mult.pyx":3

*

* def mult(a, b):

* return a * b # <<<<<<<<<<<<<<

*/

__pyx_t_1 = PyNumber_Multiply(__pyx_v_a, __pyx_v_b);

if (unlikely(!__pyx_t_1)) {

__pyx_filename = __pyx_f[0];

__pyx_lineno = 3;

__pyx_clineno = __LINE__;

goto __pyx_L1_error;

}

We see that the generated code is calling the PyNumber_Multiply function from the Python/C API, which is the most general way to multiply any two objects in Python (not just numbers, despite the name). The types of the __pyx_v_a and __pyx_v_b variables are PyObject*. This code will work for any objects that support multiplication, and will raise an exception otherwise.

Let’s add static typing to mult:

def mult(int a, int b):

return a * b

The generated source code now does C-level multiplication of C integers, which will have much better performance:

/* "mult.pyx":3

*

* def mult(int a, int b):

* return a * b # <<<<<<<<<<<<<<

*/

__pyx_t_1 = __Pyx_PyInt_From_int((__pyx_v_a * __pyx_v_b));

/* etc. */

The __pyx_v_a and __pyx_v_b variables are now declared as ints, as we would expect with our changed declaration, and Cython now computes the product of a and b by generating a call to __Pyx_PyInt_From_int, which is a thin wrapper around the Python/C API function PyInt_FromLong.

A more convenient way to check the generated code is found in Chapter 9, which covers compile-time options that generate an annotated source file. These annotated files help us determine in a high-level way whether Cython is generating the fastest possible code.

Type Coercion and Casting

Both C and Python have well-defined rules for coercion between numeric types. Because statically typed numeric types in Cython are C types, C coercion rules apply here as well.

Explicit casting between types is common in C, especially when we’re dealing with C pointers. Cython provides a casting operator that is very similar to C’s casting operator, except that it replaces parentheses with angle brackets. A simple cast from a void * to an int * would look like:

cdef int *ptr_i = <int*>v

For this example, the cython compiler generates the C equivalent:

int *ptr_i = (int*)v;

Explicit casting in C is not checked, providing total control over type representation. For example, it is possible—but not recommended—to create a function print_address that prints the memory address of a Python object, which should be equivalent to the object’s identity as returned by the id built-in function:

def print_address(a):

cdef void *v = <void*>a

cdef long addr = <long>v

print "Cython address:", addr

print "Python id :", id(a)

We can try out print_address on systems where sizeof(void*) equals sizeof(long):

In [1]: import pyximport; pyximport.install()

Out[1]: (None, <pyximport.pyximport.PyxImporter at 0x101c64290>)

In [2]: import casting

In [3]: casting.print_address(1)

Cython address: 4298191640

Python id : 4298191640

We can use casting with Python extension types, either built-in or types that we define ourselves (Chapter 5). A somewhat contrived example:

def cast_to_list(a):

cdef list cast_list = <list>a

print type(a)

print type(cast_list)

cast_list.append(1)

In this example, we take a Python object of any type and cast it to a static list. Cython will treat cast_list as a list at the C level, and will call either PyList_SET_ITEM or PyList_Append on it for the last line. This will succeed as long as the argument is a list or a subtype, and will raise a nasty SystemError exception otherwise. Such bare casts are appropriate only when we are certain that the object being cast has a compatible type.

When we are less than certain and want Cython to check the type before casting, we can use the checked casting operator instead:

def safe_cast_to_list(a):

cdef list cast_list = <list?>a

print type(a)

print type(cast_list)

cast_list.append(1)

This version of the function will raise a saner TypeError when a is not a list or a subtype at casting time.

Casting also comes into play when we are working with base and derived classes in an extension type hierarchy. See Chapter 5 for more on extension types with Cython.

Declaring and Using structs, unions, and enums

Cython also understands how to declare, create, and manipulate C structs, unions, and enums. For the un-typedefed C struct or union declaration:

struct mycpx {

int a;

float b;

};

union uu {

int a;

short b, c;

};

the equivalent Cython declarations are:

cdef struct mycpx:

float real

float imag

cdef union uu:

int a

short b, c

Cython’s syntax for struct and union declarations uses cdef and an indented block for the struct or union members. This is another case where Cython blends Python with C: it uses Python-like blocks to define C-level constructs.

We can combine struct and union declarations with ctypedef, which creates a new type alias for the struct or union:

ctypedef struct mycpx:

float real

float imag

ctypedef union uu:

int a

short b, c

To declare a variable with the struct type, simply use cdef, and use the struct type as you would any other type:

cdef mycpx zz

The declaration of zz is the same whether the struct was declared with cdef or ctypedef.

We can initialize a struct in three ways:

§ We can use struct literals:

§ cdef mycpx a = mycpx(3.1415, -1.0)

cdef mycpx b = mycpx(real=2.718, imag=1.618034)

Note the use of function-like syntax, including keyword-like argument support. This is another instance where Cython blends Python and C++ constructs.

§ The struct fields can be assigned by name individually:

§ cdef mycpx zz

§ zz.real = 3.1415

zz.imag = -1.0

For initialization, struct literals are more convenient, but direct assignment can be used to update an individual field.

§ Lastly, structs can be assigned from a Python dictionary:

cdef mycpx zz = {'real': 3.1415, 'imag': -1.0}

This uses Cython’s automatic conversion to do the individual assignments automatically. Note that this involves more Python overhead.

Nested and anonymous inner struct or union declarations are not supported. It is necessary to un-nest the declarations and to provide dummy names when necessary. For example, this nested C struct declaration:

struct nested {

int outer_a;

struct _inner {

int inner_a;

} inner;

};

can be declared in Cython like this:

cdef struct _inner:

int inner_a

cdef struct nested:

int outer_a

_inner inner

We can initialize a nested struct on a field-by-field basis or by assigning to a nested dictionary that matches the structure of nested:

cdef nested n = {'outer_a': 1, 'inner': {'inner_a': 2}}

To define an enum, we can define the members on separate lines, or on one line separated with commas:

cdef enum PRIMARIES:

RED = 1

YELLOW = 3

BLUE = 5

cdef enum SECONDARIES:

ORANGE, GREEN, PURPLE

An enum can be declared with either ctypedef or cdef, as in the preceding examples, like a struct or union.

Anonymous enums are useful to declare global integer constants:

cdef enum:

GLOBAL_SEED = 37

Structs, unions, and enums will be used more frequently when we interface with external code in Chapters 7 and 8.

Type Aliasing with ctypedef

Another C feature that Cython supports is type aliasing with the ctypedef keyword. This is used in a similar way to C’s typedef statement, and is essential when interfacing with external code that uses typedef aliases. We will see more of ctypedef in Chapters 7 and 8.

Here’s a simple example:

ctypedef double real

ctypedef long integral

def displacement(real d0, real v0, real a, real t):

"""Calculates displacement under constant acceleration."""

cdef real d = d0 + (v0 * t) + (0.5 * a * t**2)

return d

In this example, the ctypedef aliases allow us to switch the precision of the calculation from double precision to single precision by changing a single line of the program. Cython is able to convert between Python numeric types and these ctypedef type aliases without difficulty.

The ctypedef feature is particularly useful for C++, when typedef aliases can significantly shorten long templated types. A ctypedef statement must occur at file scope, and cannot be used inside a function (or other local) scope to declare a local type name. The typedef is passed through to the generated source code.

FUSED TYPES AND GENERIC PROGRAMMING

Cython has a novel typing feature, known as fused types, that allows us to refer to several related types with a single type definition. As of this writing, fused types are experimental, and their syntax and semantics may change in future releases. We will therefore cover just the basics here. We will also mention them where relevant in later chapters.

Cython provides three built-in fused types that we can use directly: integral, floating, and numeric. All are accessed via the special cython namespace, which must be cimported (see Chapter 6).

The integral fused type groups together the C short, int, and long scalar types. The floating fused type groups the float and double C types, and numeric—the most general—groups all integral and floating types along with float complex and double complex. Let’s look at an example to make fused types more concrete.

Consider the following implementation of max for integral values:

from cython cimport integral

cpdef integral integral_max(integral a, integral b):

return a if a >= b else b

Because we’ve used cython.integral as the argument and return type, Cython creates three versions of integral_max: one for a and b both shorts, one for them both ints, and one for them both longs. Cython will use the long version when we call integral_max from Python. When we call integral_max from other Cython code, Cython checks the argument types at compile time to determine which version of integral_max to use.

For example, these three uses of integral_max from Cython are allowed:

cdef allowed():

print integral_max(<short>1, <short>2)

print integral_max(<int>1, <int>2)

print integral_max(<long>5, <long>10)

But we cannot mix specializations for the same fused type from other Cython code; doing so generates a compile-time error, as Cython does not have a version of integral_max to dispatch:

cdef not_allowed():

print integral_max(<short>1, <int>2)

print integral_max(<int>1, <long>2)

Trying to pass in a float or double to integral_max will result in a compile-time error if we’re doing so from Cython, and will result in a TypeError if we’re doing so from Python.

It would be nice to generalize integral_max to support floats and doubles as well. We cannot use the cython.numeric fused type to do so, because complex numbers are not comparable. But we can create our own fused type to group the integral and floating C types. This uses the ctypedef fused statement:

cimport cython

ctypedef fused integral_or_floating:

cython.short

cython.int

cython.long

cython.float

cython.double

cpdef integral_or_floating generic_max(integral_or_floating a,

integral_or_floating b):

return a if a >= b else b

The generic_max function now has five specializations, one for each C type included in the ctypedef fused block, and can therefore handle floating arguments as well as integral arguments.

If a function or method uses a fused type, at least one of its arguments must be declared with that fused type, to allow Cython to determine the actual function specialization to dispatch to at compile time or runtime. Provided at least one argument has a fused type, the function or method can have local variables of the fused type as well.

Fused types—and their associated generic functions—have several other features, some of which we will point out in Chapters 8 and 10. Currently the most significant limitation of fused types is that they cannot be used for extension type attributes (Chapter 5). We do not go into full depth on fused types because this feature is still in its infancy. Please refer to Cython’s online documentation for the most up-to-date material on fused types.

Cython for Loops and while Loops

Python for and while loops are flexible and high level; their syntax is natural and reads like pseudocode. Cython supports for and while loops without modification. Because loops, by nature, often occupy the majority of a program’s runtime, it is worth keeping in mind some pointers to ensure Cython can translate Python looping constructs into efficient C analogues.

Consider the common Python for loop over a range:

n = 100

# ...

for i inrange(n):

# ...

If the index variable i and range argument n are dynamically typed, Cython may not be able to generate a fast C for loop. We can easily fix that by typing i and n:

cdef unsigned int i, n = 100

for i inrange(n):

# ...

The static typing ensures Cython generates efficient C code:

for (i=0; i<n; ++i) {

/* ... */

}

Cython is often able to infer types and generate fast loops automatically, but not always. The following guidelines will help Cython generate efficient loops.

Guidelines for Efficient Loops

When looping over a range call, we should type the range argument as a C integer:

cdef int N

# ...

for i inrange(N):

# ...

Cython will automatically type the loop index variable i as an int as well, provided we do not use the index in an expression in the loop body. If we do use i in an expression, Cython cannot automatically infer whether the operation will overflow, and conservatively refuses to infer a C integer type.

If we are certain the expression will not cause integer overflow, we should statically type the index variable as well:

cdef int i, N

for i inrange(N):

a[i] = i + 1

When looping over a container (list, tuple, dict, etc.), statically typing the loop indexing variable may introduce more overhead, depending on the situation. For efficient loops over containers, consider converting the container to a C++ equivalent container (Chapter 8) or using typed memoryviews (Chapter 10) instead.

These guidelines will likely reduce loop overhead. We will learn more about optimizing loop bodies we cover Cython’s NumPy support and typed memoryviews in Chapter 10.

To ensure efficient while loops, we must make the loop condition expression efficient. This may involve using typed variables and cdef functions. Simple while True loops with an internal break are efficiently translated to C automatically.

Loop Example

Say we want to smooth a one-dimensional array by updating each element with the average of that point with its immediate neighbors. A Python version (ignoring endpoints) would be:

n = len(a) - 1

# "a" is a list or array of Python floats.

for i inrange(1, n):

a[i] = (a[i-1] + a[i] + a[i+1]) / 3.0

Because we have to access the i-1 and i+1 elements on each iteration, we cannot iterate through a directly. This example is almost in a Cython-friendly format. We only need to add some minimal typing information for Cython to generate a fast loop:

cdef unsigned int i, n = len(a) - 1

for i inrange(1, n):

a[i] = (a[i-1] + a[i] + a[i+1]) / 3.0

Peeking at the generated source, we find that the for statement in the preceding example is translated into:

for (i = 1; i < n; i += 1) {

/* ... */

}

In this case, because we use i in indexing expressions, it is essential that we statically type the indexing variable. Typing n is, however, optional; the following version is just as efficient (but perhaps slightly more difficult to read):

cdef unsigned int i

for i inrange(1, len(a) - 1):

a[i] = (a[i-1] + a[i] + a[i+1]) / 3.0

Performance-wise, the Cython code with the extra typing information is consistently two to three times faster than the untyped equivalent.

The Cython Preprocessor

Cython has a DEF keyword that creates a macro, which is a compile-time symbolic constant akin to #define C preprocessor symbolic macros. These can be useful for giving meaningful names to magic numbers, allowing them to be updated and changed in a single location. They are textually substituted with their value at compile time.

For example:

DEF E = 2.718281828459045

DEF PI = 3.141592653589793

def feynmans_jewel():

"""Returns e**(i*pi) + 1. Should be ~0.0"""

return E ** (1j * PI) + 1.0

DEF constants must resolve at compile time and are restricted to simple types. They can be made up of literal integrals, floating-point numbers, strings, predefined DEF variables, calls to a set of predefined functions, or expressions involving these types and other DEF variables.

The set of predefined compile-time names, listed in Table 3-3, corresponds to what is returned by os.uname.

Table 3-3. Predefined compile-time names

Predefined DEF variable

Meaning

UNAME_SYSNAME

Operating system name

UNAME_RELEASE

Operating system release

UNAME_VERSION

Operating system version

UNAME_MACHINE

Machine hardware name

UNAME_NODENAME

Name on network

The constants, functions, and types available for defining a DEF constant are summarized in Table 3-4.

Table 3-4. DEF constants, functions, and types

Kind

Options

Constants

None, True, False

Built-in functions

abs, chr, cmp, divmod, enumerate, hash, hex, len, map, max, min, oct, ord, pow, range, reduce, repr, round, sum, xrange, zip

Built-in types

bool, complex, dict, float, int, list, long, slice, str, tuple

Remember that the righthand side of a DEF declaration must ultimately evaluate to an int, float, or string object. The cython compiler will yield an error if it does not.

Like the C preprocessor, cython also supports conditional compilation with the all-caps IF-ELIF-ELSE compile-time statement. This can appear anywhere a normal Python statement or declaration can, and it can use any value that is valid in that context. IF statements can be nested. The types they use are not restricted like DEF constants, and they determine truth and falsehood according to Python semantics.

Taking an example from Cython’s documentation, say we want to branch based on the OS we are on:

IF UNAME_SYSNAME == "Windows":

# ...Windows-specific code...

ELIF UNAME_SYSNAME == "Darwin":

# ...Mac-specific code...

ELIF UNAME_SYSNAME == "Linux":

# ...Linux-specific code...

ELSE:

# ...other OS...

The last area to cover is Cython’s support for Python 2 and Python 3.

Bridging the Python 2 and Python 3 Divide

As we learned in Chapter 2, cython generates a C source file that is compiled into an extension module with a specific version of Python. Conveniently, we can write our Cython .pyx file using either Python 2 or Python 3 syntax. The generated C source file is compatible with either Python 2 or Python 3. This means any Cython code can be compiled for either Python 2 or Python 3 runtimes.

NOTE

Python 3 changed both the Python language and the C API in nontrivial ways. Python 2 extension modules can be particularly difficult to port to Python 3, given the language (C) and the lack of automatic conversion tools. Cython’s ability to generate a single extension module that can be compiled, unmodified, for either Python 2 or Python 3 can remove much of the pain and tedium of porting version 2 extension code to version 3.

By default, Cython assumes the source language version (the version of Python in the .pyx or .py file) uses Python 2 syntax and semantics. This can be set explicitly with the -2 and -3 flags at compile time, the latter changing the default behavior to Python 3 syntax and semantics.

For example, in Python 2 print is a statement, whereas in Python 3 it is a function. If we have the following file named einstein.pyx:

import sys

print("If facts don't fit the theory, change the facts.", file=sys.stderr)

it will not compile assuming Python 2 syntax. So, we must pass in the -3 flag to set Python 3 syntax:

$ cython -3 einstein.pyx

NOTE

The -2 and -3 cython compiler flags are necessary only if a language construct has different semantics in the respective language version.

The resulting einstein.c file can be compiled against the Python 2 or Python 3 runtime. With Python 2, the resulting extension module will run as if the print function were instead the Python 2 print statement. This feature allows us to use a specific Python version for the .pyx source, and distribute the extension module source file to anyone, regardless of the version of Python being used to run the extension module.

NOTE

Cython decouples the .pyx language version from the runtime version, nicely managing the Python 2 and Python 3 language divide for us.

Besides decoupling the source and runtime language versions, Cython supports the unicode_literals, print_function, and division imports from __future__ to bring Python 3 semantics into Python 2.

String types were significantly changed in Python 3, and deserve special mention. Cython has several features to manage string types in a version-agnostic way.

str, unicode, bytes, and All That

Python 2 and Python 3 handle strings and string types differently. Both have a string type that represents a sequence of 8-bit characters, and both have a string type that represents a sequence of variable-width characters. They are named differently in each implementation.

Because Cython straddles the Python 2 and Python 3 divide, it handles strings and string types in a way that allows it to generate code that is compatible with Python 2 or Python 3. This means that Cython string types differ from Python 2 strings and Python 3 strings. Several points of note:

§ The bytes type is the same for all versions, and Cython supports bytes as is.

§ Cython’s str type is equivalent to bytes when run with Python 2, and is equivalent to the Unicode str type when run with Python 3.

§ The Cython unicode type is identical to the unicode type when run with Python 2, and is equivalent to the str type when run with Python 3.

§ The Cython basestring type is a base type for all string types on both versions, useful for type checking with isinstance.

§ By default, Cython does not allow implicit conversion between unicode strings and data buffers; it requires setting a compiler directive (see next points) or explicit encoding and decoding to convert between the different types.

§ Cython provides the global c_string_type compiler directive to set the type of an implicit conversion from char * (or from std::string in C++). The directive can take the value bytes, str, or unicode.

§ Cython also provides the global c_string_encoding compiler directive to control the encoding used when implicitly converting char * or std::string to a unicode object. The directive can take the name of any valid Unicode encoding (ascii, utf-8, etc.). It can also take the value default, which is utf-8 in Python 3 and ascii in Python 2. The only allowed encoding to convert a unicode object to char * is default or ascii.

§ Dynamically typed string variables typically just work, and the cython compiler will notify us when an explicit encoding or decoding operation is required.

§ Statically typed Cython str variables can be difficult to use without the c_string_type and c_string_encoding directives, since str in Cython can be equivalent to either bytes in Python 2 or unicode in Python 3. The cython compiler will yield errors or warnings when assigning to a statically typed str object without explicitly encoding the righthand side. It is often better to statically type strings in Cython with the unambiguous bytes and unicode types.

§ The C char * type and the C++ string type are automatically compatible with the bytes type.

More information on working with string types in Cython can be found in Cython’s included documentation.

Summary

This chapter covers the core Cython language features in depth; we will build on these features in future chapters. Because these features are fundamental to Cython, many online examples of their usage can be found via straightforward searches.

CYTHON’S ADOPTION

Given that Cython is in some sense an auxiliary language, it is rare to have a project entirely or even primarily written in it. Nevertheless, it is a full-fledged language with its own syntax and idioms. Searching GitHub for all Cython files, we found approximately 15,000 source files spread over thousands of repositories as of mid-2014.

Cython’s use is so pervasive that a complete catalog of all projects using it would be impossible. But we can survey several foundational projects in the Python ecosystem that use Cython. Some of these projects use it in an auxiliary fashion, to bring in an external random number generation library or speed up a small performance-critical component. Others, like Sage, have Cython at their core.

Some prominent projects that use Cython, and their respective lines of Cython code as of September 2014, are summarized in Table 3-5.

Table 3-5. Cython’s SLOC in foundational Python projects

Project

Lines of Cython

Sage

477,000

NumPy

5,000

SciPy

24,000

Pandas

27,000

scikit-learn

15,000

scikit-image

11,000

MPI4Py

12,000

PETSc4Py

18,000

lxml

22,000

yt

18,000

Given the pervasiveness of projects like NumPy, SciPy, Pandas, scikit-learn, and scikit-image, Cython code is used directly or indirectly by millions of end users, developers, analysts, engineers, and scientists.[7]

If the Pareto principle is to be believed, then roughly 80 percent of the runtime in a library is due to just 20 percent of the code. For a Python project to see major performance improvements, it need only convert a small fraction of its code base from Python to Cython.

It is no accident that the most active Cython projects have a data analysis and scientific computing bent. Cython shines in these domains for several reasons:

§ Cython can wrap existing C, C++, and Fortran libraries efficiently and easily, providing access to existing functionality that is already optimized and debugged.

§ Memory- and CPU-bound Python computations perform much better when translated into a statically typed language.

§ When dealing with large data sets, having control over the precise data types and data structures at a low level can yield efficient storage and improved performance when compared to Python’s built-in data structures.

§ Cython can share homogeneous and contiguous arrays with C, C++, and Fortran libraries and make them easily accessible to Python via NumPy arrays.

But Cython is not a one-trick pony. It can speed up general Python code, including data structure–intensive algorithms. For example, lxml, a widely used high-performance XML parser, uses Cython extensively. It is not under the scientific computing umbrella, but Cython works just as well here.

Cython allows us to choose exactly where on the high level Python–to–low level C spectrum we would like to program.


[5] For an in-depth and quantitative explication of Python’s interpreter and dynamic dispatch performance, see Brandon Rhodes’s PyCon 2014 talk “The Day of the EXE Is Upon Us.”

[6] To follow along with the examples in this chapter, please see https://github.com/cythonbook/examples.

[7] Cython itself has approximately 100,000 monthly PyPI downloads, and together, NumPy, SciPy, Pandas, and lxml have more than 1 million monthly PyPI downloads. NumPy alone has several million direct downloads per year (not accounting for installations via prepackaged distributions).