Organizing Cython Code - Cython (2015)


Chapter 6. Organizing Cython Code

Namespaces are one honking great idea—let’s do more of those!

— T. Peters “The Zen of Python”

Python provides modules and packages to help organize a project. This allows us to group functions, classes, and variables into logical units, making a project easier to understand and navigate. Modules and packages also make it easier to reuse code. In Python, we use the import statement to access functions, objects, and classes inside other modules and packages.

Cython also allows us to break up our project into several modules. It fully supports the import statement, which has the same meaning as in Python. This allows us, at runtime, to access Python objects defined in external pure-Python modules or Python-accessible objects defined in other extension modules.

If that were the end of the story, it would not allow two Cython modules to access each other’s cdef or cpdef functions, ctypedefs, or structs, and it would not allow C-level access to other extension types.

To address this, Cython provides three file types that help organize the Cython-specific and C-level parts of a project. Until now we have been working with Cython source files with a .pyx extension, known as implementation files. Here we will see how these files work with a new Cython file type called definition files, which have a .pxd extension. We will also look at the third Cython file type, with a .pxi extension; these are called include files.

In addition to the three file types, Cython has a cimport statement that provides compile-time access to C-level constructs, and it looks for these constructs’ declarations inside definition (.pxd) files.

This chapter covers the details of the cimport statement; the interrelationship between .pyx files, .pxd files, and .pxi files; and how to use them all to structure larger Cython projects. With the cimport statement and the three file types, we have the tools to effectively organize our Cython projects without compromising performance.

Cython Implementation (.pyx) and Definition (.pxd) Files

We have been working with implementation files all along. As noted earlier, an implementation file typically has the extension .pyx, although we can treat a pure-Python file with the extension .py as an implementation file as well. If we have a small Cython project and no other code needs to access C-level constructs in it, then a single implementation file is sufficient. But as soon as we want to share its C-level constructs, we need to create a definition file.

Suppose we have an implementation file, simulator.pyx, meant to run some sort of physical simulation—we keep the details intentionally vague. Inside simulator.pyx we find the following:

§ A ctypedef

§ A cdef class named State to hold the simulation state

§ Two def functions, setup and output, to initialize the simulation and to report or visualize the results

§ Two cpdef functions, run and step, to drive the simulation and to advance one time step

An outline of our implementation file is:[12]

ctypedef double real_t

cdef class State:
    cdef:
        unsigned int n_particles
        real_t *x
        real_t *vx

    def __cinit__(...):
        # ...

    def __dealloc__(...):
        # ...

    cpdef real_t momentum(self):
        # ...

def setup(input_fname):
    # ...

cpdef run(State st):
    # ...calls step function repeatedly...

cpdef int step(State st, real_t timestep):
    # ...advance st one time step...

def output(State st):
    # ...

The State extension type has the regular __cinit__ and __dealloc__ methods for allocation and deallocation, a cpdef method called momentum, and perhaps other def methods not listed here.

Because everything is in one file, all functions have access to the C-level attributes of the simulation state, so there is no Python overhead when we are accessing or manipulating it. Because step is a cpdef function, when run calls it, it can access its fast C implementation, bypassing its slower Python wrapper.

As we develop the simulation, the simulator.pyx extension module gains more functionality and becomes harder to maintain. To make it modular, we need to break it up into logical subcomponents.

To do so, first we need to create a simulator.pxd definition file. In it we place the declarations of C-level constructs that we wish to share:

ctypedef double real_t

cdef class State:
    cdef:
        unsigned int n_particles
        real_t *x
        real_t *vx

    cpdef real_t momentum(self)

cpdef run(State st)

cpdef int step(State st, real_t timestep)

Because a definition file is meant for compile-time access, we put only C-level declarations in it. Python-only declarations, such as def functions, are not allowed, and placing them here is a compile-time error. Those functions are accessible at runtime, so they are declared and defined only inside the implementation file.

Our implementation file, simulator.pyx, also needs to change. The simulator.pxd and simulator.pyx files, because they have the same base name, are treated as one namespace by Cython. We cannot repeat any of the simulator.pxd declarations in the implementation file, as doing so would be a compilation error.

DECLARATIONS AND DEFINITIONS

What makes something a Cython declaration as opposed to a Cython definition? Syntactically, a declaration for a function or method includes everything for the function or method’s signature: the declaration type (cdef or cpdef); the function or method’s name; and everything in the argument list, including the parentheses. It does not include the terminating colon. For a cdef class, the declaration includes the cdef class line (colon included) as well as the extension type’s name, all attribute declarations, and all method declarations.

A Cython definition is everything required for that construct’s implementation. The definition for a function or method repeats the declaration as part of the definition (i.e., the implementation); the definition for a cdef class does not redeclare the attribute declarations.
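The syntactic rule for functions and methods can be mimicked with a few lines of plain Python. This is only a toy sketch: the helper name declaration_of is hypothetical, it handles a single signature line, and it deliberately rejects cdef class lines, whose declarations keep their colon.

```python
def declaration_of(definition_line):
    """Derive a declaration from the first line of a cdef/cpdef function definition."""
    line = definition_line.strip()
    if line.startswith("cdef class "):
        # Extension type declarations keep the colon; not handled by this toy.
        raise ValueError("class declarations are not handled here")
    if not line.startswith(("cdef ", "cpdef ")):
        # def functions are Python-level and have no place in a .pxd file.
        raise ValueError("only cdef and cpdef signatures can be declared")
    # The declaration is the signature without the terminating colon.
    return line.rstrip(":").rstrip()

step_decl = declaration_of("cpdef int step(State st, real_t timestep):")
```

Here step_decl matches the line that appears in simulator.pxd, while passing a def line such as the one for setup raises ValueError.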

Our implementation file is now:

cdef class State:
    def __cinit__(...):
        # ...

    def __dealloc__(...):
        # ...

    cpdef real_t momentum(self):
        # ...

def setup(input_fname):
    # ...

cpdef run(State st):
    # ...calls step function repeatedly...

cpdef int step(State st, real_t timestep):
    # ...advance st one time step...

def output(State st):
    # ...

The ctypedef and the State type’s attributes have been moved to the definition file, so they are removed from the implementation file. The definitions of all objects, whether C level or Python level, go inside the implementation file. The def functions and methods remain. When compiling simulator.pyx, the Cython compiler will automatically detect the simulator.pxd definition file and use its declarations.
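The same-base-name pairing can be pictured with a small Python sketch. It models only the naming rule described here, in a temporary directory; the helper matching_pxd is hypothetical, and the compiler's real lookup is more involved than this.

```python
from pathlib import Path
import tempfile

def matching_pxd(pyx_path):
    """Return the same-named .pxd next to a .pyx file, or None if absent."""
    candidate = Path(pyx_path).with_suffix(".pxd")
    return candidate if candidate.exists() else None

with tempfile.TemporaryDirectory() as tmp:
    pyx = Path(tmp, "simulator.pyx")
    pyx.touch()
    before = matching_pxd(pyx)            # no definition file yet
    Path(tmp, "simulator.pxd").touch()
    after = matching_pxd(pyx)             # now the pair is found
    found_name = after.name
```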

What belongs inside a definition file? Essentially, anything that is meant to be publicly accessible to other Cython modules at the C level. This includes:

§ C type declarations—ctypedef, struct, union, or enum (Chapter 7)

§ Declarations for external C or C++ libraries (i.e., cdef extern blocks—Chapters 7 and 8)

§ Declarations for cdef and cpdef module-level functions

§ Declarations for cdef class extension types

§ The cdef attributes of extension types

§ Declarations for cdef and cpdef methods

§ The implementation of C-level inline functions and methods

A definition file cannot contain:

§ Implementations of Python or non-inline C functions or methods

§ Python class definitions (i.e., regular classes)

§ Executable Python code outside of IF or DEF macros

What functionality does our .pxd file provide? Now an external implementation file can use the cimport statement to access every C-level construct of simulator.pyx that is declared in simulator.pxd.

The cimport Statement

Suppose another version of the simulation—in a separate improved_simulator.pyx implementation file—wants to work with our simulator, using the same setup and step functions but a different run function, and needs to subclass our State extension type:

from simulator cimport State, step, real_t
from simulator import setup as sim_setup

cdef class NewState(State):
    cdef:
        # ...extra attributes...

    def __cinit__(self, ...):
        # ...

    def __dealloc__(self):
        # ...

def setup(fname):
    # ...call sim_setup and tweak things slightly...

cpdef run(State st):
    # ...improved run that uses simulator.step...

Inside improved_simulator.pyx, the first line uses the cimport statement to access the State extension type, the step cpdef function, and the real_t ctypedef. This access is at the C level and occurs at compile time. The cimport statement looks for the simulator.pxd definition file, and only the declarations there are cimportable. This is in contrast to the second line in the file, which uses the import statement to access the setup def function from the simulator extension module. The import statement works at the Python level and the import occurs at runtime.

The cimport statement has the same syntax as the import statement. We can cimport the .pxd filename and use it as a module-like namespace:

cimport simulator

# ...

cdef simulator.State st = simulator.State(params)

cdef simulator.real_t dt = 0.01

simulator.step(st, dt)

We can provide an alias when cimporting the definition file:

cimport simulator as sim

# ...

cdef sim.State st = sim.State(params)

cdef sim.real_t dt = 0.01

sim.step(st, dt)

We can also provide an alias to specific cimported declarations with the as clause:

from simulator cimport State as sim_state, step as sim_step

All of these forms of cimport should be familiar from Python’s import statement.
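For comparison, here are the corresponding forms in plain Python. The standard-library math module is used purely as a stand-in; it is not part of the simulator example. The spelling is parallel, but these names are bound at runtime, whereas their cimport counterparts are resolved at compile time.

```python
# Plain-Python counterparts of the cimport forms shown above.
import math                        # whole module as a namespace
import math as m                   # module under an alias
from math import sqrt as py_sqrt   # one name under an alias

# All three spellings reach the same function object at runtime.
same_function = math.sqrt is m.sqrt is py_sqrt
root = py_sqrt(16.0)
```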

It is a compile-time error to cimport a Python-level object like the setup function. Conversely, it is a compile-time error to import a C-only declaration like real_t. We are allowed to import or cimport the State extension type or the step cpdef function, although cimport is recommended. If we were to import rather than cimport extension types or cpdef functions, we would have Python-only access. This blocks access to any private attributes or cdef methods, and cpdef methods and functions use the slower Python wrapper.

A definition file can contain cdef extern blocks. It is useful to group such declarations inside their own .pxd files for use elsewhere. Doing so provides a useful namespace to help disambiguate where a function is declared.

For example, the Mersenne Twister random-number generator (RNG) header file has a few functions that we can declare inside a _mersenne_twister.pxd definition file:

cdef extern from "mt19937ar.h":
    # initializes mt[N] with a seed
    void init_genrand(unsigned long s)

    # generates a random number on [0,0xffffffff]-interval
    unsigned long genrand_int32()

    # generates a random number on [0,0x7fffffff]-interval
    long genrand_int31()

    # generates a random number on [0,1]-real-interval
    double genrand_real1()

    # generates a random number on [0,1)-real-interval
    double genrand_real2()

    # generates a random number on (0,1)-real-interval
    double genrand_real3()

    # generates a random number on [0,1) with 53-bit resolution
    double genrand_res53()

Now any implementation file can simply cimport the necessary function:

from _mersenne_twister cimport init_genrand, genrand_real3

or, using an alias:

cimport _mersenne_twister as mt

mt.init_genrand(42)
for i in range(len(x)):
    x[i] = mt.genrand_real1()

Several definition files come packaged with Cython itself.

Predefined Definition Files

Conveniently, Cython comes with several predefined definition files for often-used C, C++, and Python header files. These are grouped into definition file packages and are located in the Includes directory underneath the main Cython source directory. There is a package for the C standard library, named libc, that contains .pxd files for the stdlib, stdio, math, string, and stdint header files, among others. There is also a libcpp declaration package with .pxd files for common C++ standard template library (STL) containers such as string, vector, list, map, pair, and set. On the Python side, the cpython declaration package has .pxd files for the C header files found in the CPython source distribution, providing easy access to Python/C API functions from Cython. The last declaration package we will mention here is numpy, which provides access to the NumPy/C API. It is covered in Chapter 10.
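To see which declaration packages a particular installation provides, we can look for the Includes directory from Python. The sketch below assumes that an installed Cython keeps Includes inside its package directory; the helper names are hypothetical, and both functions fall back gracefully when Cython is not installed.

```python
import importlib.util
from pathlib import Path

def cython_includes_dir():
    """Return Cython's Includes directory, or None if it cannot be located."""
    spec = importlib.util.find_spec("Cython")
    if spec is None or not spec.submodule_search_locations:
        return None
    for location in spec.submodule_search_locations:
        candidate = Path(location) / "Includes"
        if candidate.is_dir():
            return candidate
    return None

def declaration_packages():
    """List the sub-directories of Includes (empty if Cython is absent)."""
    includes = cython_includes_dir()
    if includes is None:
        return []
    return sorted(p.name for p in includes.iterdir() if p.is_dir())
```

Calling declaration_packages() on a machine with Cython installed should list directory names such as the libc, libcpp, and cpython packages described above; the exact contents depend on the Cython version.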

Common patterns using cimport and their effects are described next.

Using cimport with a module in a package

from libc cimport math

math.sin(3.14)

The from ... cimport ... pattern used here cimports the module-like math namespace from the libc package, and allows dotted access to the C functions declared in the C standard library’s math.h header.

Using cimport with an object from a dotted module name

from libc.math cimport sin

sin(3.14)

This form allows cimporting the C sin function from libc.math in a Python-like way, but it is important to remember that the call to sin will call the fast C version.

Multiple named cimports

from libc.stdlib cimport rand, srand, qsort, malloc, free

cdef int *a = <int*>malloc(10 * sizeof(int))

This cimports several C functions declared in the C standard library’s stdlib.h header.

Using cimport with an alias

from libc.string cimport memcpy as c_memcpy

In this form, we can use c_memcpy as an alias for memcpy.

Using cimport with C++ STL template classes

from libcpp.vector cimport vector

cdef vector[int] *vi = new vector[int](10)

Cython supports cimporting C++ classes from the C++ STL.

If we import and cimport different functions with the same name, Cython will issue a compile-time error. For example, the following is not valid:

from libc.math cimport sin

from math import sin

It is simple to fix with an alias, however:

from libc.math cimport sin as csin

from math import sin as pysin
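Plain Python is more permissive about name clashes, which is one more reason to keep the aliasing habit. In the sketch below, the standard-library math and cmath modules are stand-ins chosen for illustration: a second import of the same name silently rebinds it instead of raising an error, while aliases keep both functions reachable.

```python
from math import sqrt
from cmath import sqrt   # silently rebinds `sqrt`; Python raises no error

shadowed = sqrt(-1)      # the cmath version wins, so the result is complex

# Aliases keep both functions available under distinct names.
from math import sqrt as real_sqrt
from cmath import sqrt as complex_sqrt

real_result = real_sqrt(4.0)
complex_result = complex_sqrt(-4)
```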

It is possible to import and cimport namespace-like objects (modules or Cython packages) that have the same name, although this is not recommended, for sanity’s sake. So, Cython allows the following:

# compile-time access to functions from math.h

from libc cimport math

# runtime access to the math module

import math

def call_sin(x):
    # which `sin()` does this call?
    return math.sin(x)

In the preceding example, it is not immediately obvious that call_sin will call the sin function from the C standard library, and not the sin function from Python’s math built-in module. It is better to rename one of the imports to make explicit which math namespace is intended:

from libc cimport math as cmath

import math as pymath

def call_csin(x):
    return cmath.sin(x)

def call_pysin(x):
    return pymath.sin(x)

Definition files have some similarities to C (and C++) header files:

§ They both declare C-level constructs for use by external code.

§ They both allow us to break up what would be one large file into several components.

§ They both declare the public C-level interface for an implementation.

C and C++ access header files via the #include preprocessor command, which essentially does a dumb source-level inclusion of the named header file. Cython’s cimport statement is more intelligent and less error prone: we can think of it as a compile-time import statement that works with namespaces.

Cython’s predecessor, Pyrex, did not have the cimport statement, and instead had an include statement for source-level inclusion of an external include file. Cython also supports the include statement and include files, which are used in several Cython projects.

Include Files and the include Statement

Suppose we have an extension type that we want available on all major platforms, but it must be implemented differently on different platforms. This scenario may arise due to, for example, filesystem incompatibilities, or wrapping different APIs in a consistent way. Our goal is to abstract away these differences and to provide a consistent interface in a transparent way. Include files and the include statement provide one way to accomplish our nice platform-independent design goals.

We place three different implementations of the extension type in three .pxi files: linux.pxi, darwin.pxi, and windows.pxi. One of the three will be selected and used at compile time. To pull everything together, inside interface.pyx we have the following code, using the IF compile-time statement:

IF UNAME_SYSNAME == "Linux":
    include "linux.pxi"
ELIF UNAME_SYSNAME == "Darwin":
    include "darwin.pxi"
ELIF UNAME_SYSNAME == "Windows":
    include "windows.pxi"

This example does a source-level inclusion of one of the .pxi files.
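The selection logic can be mirrored in ordinary Python, which is handy for checking which branch a given machine would take. The sketch assumes that UNAME_SYSNAME carries the same value that Python's platform.system() reports; the helper pxi_for_platform and the mapping are hypothetical and simply restate the three branches.

```python
import platform

# Hypothetical mapping that restates the three IF/ELIF branches.
PXI_BY_SYSNAME = {
    "Linux": "linux.pxi",
    "Darwin": "darwin.pxi",
    "Windows": "windows.pxi",
}

def pxi_for_platform(sysname=None):
    """Return the include file for a system name, or None if no branch matches."""
    if sysname is None:
        sysname = platform.system()   # e.g. "Linux", "Darwin", "Windows"
    return PXI_BY_SYSNAME.get(sysname)
```

Like the IF block, which has no ELSE branch, an unrecognized platform selects nothing.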

CAUTION

Using include twice with the same source file may lead to compilation errors due to duplicated definitions or implementations, so take care to use include correctly.

Even though the include statement is indented inside the IF block, the inserted code will not retain this extra indentation level. The include statement can appear in any scope and the indentation level will be adjusted accordingly.
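A toy model of source-level inclusion may help make the idea concrete. The sketch below is not how Cython implements include, and it does not model the IF statement at all; it only shows text being pasted in at the scope of the include line. The function expand_includes, the file name methods.pxi, and the Wrapper class are all hypothetical.

```python
import re

# Matches a line consisting only of: <indent>include "<file name>"
INCLUDE_LINE = re.compile(r'^(\s*)include\s+"([^"]+)"\s*$')

def expand_includes(source, files):
    """Replace each include line with the named file's text, re-indented.

    `files` maps a file name to its contents and stands in for the filesystem.
    """
    out = []
    for line in source.splitlines():
        match = INCLUDE_LINE.match(line)
        if match is None:
            out.append(line)
            continue
        indent, name = match.groups()
        # Paste the included text at the indentation of the include line.
        for included in files[name].splitlines():
            out.append(indent + included if included else included)
    return "\n".join(out)

files = {"methods.pxi": "def hello(self):\n    return 'hi'"}
source = 'cdef class Wrapper:\n    include "methods.pxi"'
expanded = expand_includes(source, files)
```

After the call, expanded holds the class line followed by the method from methods.pxi, indented to sit inside the class body.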

Some older Cython projects use include in place of cimport. For new code, it is recommended to use cimport with definition files rather than include with include files, except when source-level inclusion is what is desired.

With definition files, include files, and implementation files at our command, we can adapt Cython as needed to any Python or C code base.

Organizing and Compiling Cython Modules Inside Python Packages

A great feature of Cython is that it allows us to incrementally convert Python code to Cython code as performance and profiling dictate. This approach allows the external interface to remain unchanged while the overall performance significantly improves.

Let’s take a different approach to our simulation example. Suppose we start with a Python package simulator with the following structure:

simulator
├── __init__.py
├── main.py
├── core
│   ├── __init__.py
│   ├── core.py
│   └── sim_state.py
├── plugins
│   ├── __init__.py
│   ├── plugin0.py
│   └── plugin1.py
└── utils
    ├── __init__.py
    ├── config.py
    └── output.py

The focus for this example is not the internal details of the simulator modules; it’s how Cython modules can access compile-time declarations and work easily within the framework of a Python project.

Suppose we have profiled the simulator and determined that the core.py, sim_state.py, and plugin0.py modules need to be converted into Cython extension modules for performance. All other modules can remain pure Python for flexibility.

The sim_state.py module contains the State class that we will convert into an extension type. The core.py module contains two functions, run and step, that we will convert to cpdef functions. The plugin0.py module contains a run function that we will also convert to a cpdef function.

The first step is to convert the .py modules into implementation files and extract their public Cython declarations into definition files. Because components are spread out in different packages and subpackages, we must remember to use the proper qualified names for importing.

The sim_state.pxd file contains just the declarations for a ctypedef and the cdef class State:

ctypedef double real_t

cdef class State:
    cdef:
        unsigned int n_particles
        real_t *x
        real_t *vx

    cpdef real_t momentum(self)

All cpdef functions will take a State instance, and they need C-level access. So, all modules will have to cimport the State declaration from the appropriate definition file.

The core.pxd file declares the run and step cpdef functions:

from simulator.core.sim_state cimport State, real_t

cpdef int run(State, list plugins=None)

cpdef step(State st, real_t dt)

The cimport is absolute, using the fully qualified name to access the sim_state definition file for clarity.
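A fully qualified cimport name corresponds to a definition file's location within the package tree. The sketch below shows only that name-to-path correspondence; the helper pxd_path_for is hypothetical and says nothing about where the compiler actually searches.

```python
from pathlib import PurePosixPath

def pxd_path_for(qualified_name):
    """Map a dotted cimport name to a package-relative .pxd path."""
    parts = qualified_name.split(".")
    return PurePosixPath(*parts).with_suffix(".pxd")

sim_state_pxd = str(pxd_path_for("simulator.core.sim_state"))
```

Here sim_state_pxd is the path of sim_state.pxd relative to the directory that contains the simulator package.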

Lastly, the plugin0.pxd file declares its own run cpdef function that takes a State instance:

from simulator.core.sim_state cimport State

cpdef run(State st)

The main.py file—still pure Python, like everything inside the utils subpackage—pulls everything together:

from simulator.utils.config import setup_params

from simulator.utils.output import output_state

from simulator.core.sim_state import State

from simulator.core.core import run

from simulator.plugins import plugin0

def main(fname):
    params = setup_params(fname)
    state = State(params)
    output_state(state)
    run(state, plugins=[plugin0.run])
    output_state(state)

The main.py module remains unchanged after our conversion to Cython, as do any other pure-Python modules in the project. Cython allows us to surgically replace individual components with extension modules, and the rest of a project remains as is.

To run this simulation, we first have to compile the Cython source into extension modules. We can use pyximport for on-the-fly compilation during development and testing:

In [1]: import pyximport; pyximport.install()

Out[1]: (None, <pyximport.pyximport.PyxImporter at 0x101c67650>)

In [2]: from simulator.main import main

The import statement here imported all extension modules, and pyximport compiled them for us automatically. We now call main, passing in a parameter file:

In [3]: main("params.txt")

simulator.utils.config.setup_params('dummy.params')

simulator.utils.output.output(State(n_particles=100000))

state.momentum() == 0.0

running simulator.core.run(State(n_particles=100000))

simulator.plugins.plugin0.run(State(n_particles=100000))

simulator.utils.output.output(State(n_particles=100000))

state.momentum() == 300000.0

The output is simply indicating that everything is running as it should. We see output for the simulation setup, for the initial state, and for running the core.run function, which in turn calls the plugin’s run function and the step function. Lastly, the final simulation state is output.

Using pyximport here to compile our simulator on the fly is fine for quick development. To create a distributable compiled package, we will want to use a distutils script or another build system to manage the compilation and packaging for us.

For a package like simulator, the cythonize function from the Cython.Build package can handle all the details for us. A minimal setup.py script for simulator is:

from distutils.core import setup

from Cython.Build import cythonize

setup(name="simulator",
      packages=["simulator", "simulator.core",
                "simulator.utils", "simulator.plugins"],
      ext_modules=cythonize("**/*.pyx"),
      )

We call cythonize with a glob pattern to recursively search all directories for .pyx implementation files and compile them as needed. Using cythonize with distutils in this way is flexible and powerful—it will automatically detect when a .pyx file has changed and recompile as needed. Further, it will detect interdependencies between implementation and definition files and recompile all dependent implementation files.
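To get a feel for what a recursive pattern like this one picks up, we can try the same pattern with Python's standard glob module on a throwaway directory tree. This is only an illustration: cythonize performs its own pattern expansion, the helper find_pyx is hypothetical, and the file names loosely mirror the package layout above.

```python
import glob
import os
import tempfile

def find_pyx(root):
    """Return .pyx paths under `root`, relative to it, using a recursive glob."""
    matches = glob.glob(os.path.join(root, "**", "*.pyx"), recursive=True)
    return sorted(os.path.relpath(m, root).replace(os.sep, "/") for m in matches)

# Build a temporary tree with three .pyx files and one pure-Python module.
with tempfile.TemporaryDirectory() as root:
    for rel in ("simulator/core/core.pyx",
                "simulator/core/sim_state.pyx",
                "simulator/plugins/plugin0.pyx",
                "simulator/main.py"):
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()
    found = find_pyx(root)
```

The pure-Python main.py is skipped, and the three .pyx files are found regardless of how deeply they are nested.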

Summary

Cython’s three file types, in conjunction with the cimport and include statements, allow us to organize Cython code into separate modules and packages, without sacrificing performance. This allows Cython to expand beyond speeding up isolated extension modules, and allows it to scale to full-fledged projects. We can use the techniques in this chapter to speed up select Python modules after profiling indicates the need, or we can use them to design and organize an entire project that uses Cython as the primary language.


[12] To follow along with the examples in this chapter, please see https://github.com/cythonbook/examples.