
Chapter 6. Building an Object Mapper

Atop the SQL expression language, SQLAlchemy provides an object-relational mapper (ORM). The purpose of an ORM is to provide a convenient way to store your application data objects in a relational database. Generally, an ORM will provide a way to define the method of storing your object in the database. This chapter focuses on the SQLAlchemy methods that do this.

Introduction to ORMs

ORMs provide methods of updating the database by using your application objects. For instance, to update a column in a mapped table in SQLAlchemy, you merely have to update the object, and SQLAlchemy will take care of making sure that the change is reflected in the database. ORMs also allow you to construct application objects based on database queries. Chapter 7 will focus on how to use SQLAlchemy’s ORM to update and query objects in the database.

Design Concepts in the ORM

There are two major patterns used in the ORM that you should become familiar with in order to understand how best to use it. These are the data mapper pattern and the unit of work pattern.

The data mapper pattern

In the data mapper pattern (shown in Figure 6-1), database tables, views, and other “selectable” objects are mapped onto “plain old Python objects” (POPOs) by “mapper” objects. This is different from the “active record” pattern (shown in Figure 6-2), where the objects themselves are responsible for mapping themselves to database views. The data mapper pattern can, of course, be used to emulate the active record pattern by merging the mapper with the application objects.


Figure 6-1. Data mapper pattern


Figure 6-2. Active record pattern

One benefit of using the data mapper pattern as implemented in SQLAlchemy is that it allows the database design to be decoupled from the object hierarchy. In SQLAlchemy, this decoupling can be nearly complete: you can define your classes in one module and your tables in another with no references from one to the other. The mapping can then be performed by a third module, which imports the other two modules and instantiates the Mapper objects, which do the work of mapping the selectables to your objects.
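As a minimal sketch of this three-module arrangement (the module layout and names here are illustrative, not part of the book’s example schema):

# In a real project, these three sections would live in separate
# modules (e.g., model.py, tables.py, and mapping.py); they are
# shown together here for brevity.

# model.py -- plain Python classes, with no SQLAlchemy imports
class Region(object):
    pass

# tables.py -- schema definitions only
from sqlalchemy import MetaData, Table, Column, Integer, Unicode
metadata = MetaData()
region_table = Table(
    'region', metadata,
    Column('id', Integer, primary_key=True),
    Column('name', Unicode(255)))

# mapping.py -- the only module that references both of the others
from sqlalchemy.orm import mapper
mapper(Region, region_table)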

The unit of work pattern

The second major pattern used in the SQLAlchemy ORM is the unit of work pattern. In this pattern, when you make a change to an object, the database is not updated immediately. Instead, SQLAlchemy tracks changes to your objects in a session object, and then flushes all your changes at once in a single “unit of work.” This has the advantage of generally improving performance by reducing the number of round-trips to the database.

The alternative to the unit of work pattern, of course, is to update the database as soon as a mapped object property changes. This can lead to a very “chatty” application, but it does have the advantage of keeping your objects in sync with the database, which can be handy if you wish to execute queries before flushing the objects you’ve modified back out to the database.

To alleviate this concern, SQLAlchemy actually provides an “autoflush” feature on the session object that will take care of flushing the session before any queries are performed on it. As long as you use an autoflushing session and execute all queries through the session, you generally do not need to worry about inconsistencies between your objects in memory and the database on disk.
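For example, an autoflushing session class can be created as follows (a sketch using the sessionmaker() function, which is covered in the next chapter):

from sqlalchemy.orm import sessionmaker

# autoflush=True causes the session to flush pending changes
# before executing each query issued through it
Session = sessionmaker(autoflush=True)
session = Session()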

Warning

Of course, if you use the SQL expression layer of SQLAlchemy, you can get your in-memory objects out-of-sync with the database, so some care needs to be taken when mixing ORM-level semantics with SQL-level semantics in the same transaction.

Declaring Object Mappers

In order to use the SQLAlchemy ORM, we need three things: a database schema defined on a MetaData object, an object model (no special preparation of the object model is required for use by SQLAlchemy), and a mapper configuration. In this section, we will use the following schema, designed to maintain information about a retail product catalog:

level_table = Table(  # 1

'level', metadata,

Column('id', Integer, primary_key=True),

Column('parent_id', None, ForeignKey('level.id')),

Column('name', String(20)))

category_table = Table(  # 2

'category', metadata,

Column('id', Integer, primary_key=True),

Column('level_id', None, ForeignKey('level.id')),

Column('parent_id', None, ForeignKey('category.id')),

Column('name', String(20)))

product_table = Table(

'product', metadata,

Column('sku', String(20), primary_key=True),

Column('msrp', Numeric))

product_summary_table = Table(  # 3

'product_summary', metadata,

Column('sku', None, ForeignKey('product.sku'), primary_key=True),

Column('name', Unicode(255)),

Column('description', Unicode))

product_category_table = Table(  # 4

'product_category', metadata,

Column('product_id', None, ForeignKey('product.sku'), primary_key=True),

Column('category_id', None, ForeignKey('category.id'), primary_key=True))

region_table = Table(

'region', metadata,

Column('id', Integer, primary_key=True),

Column('name', Unicode(255)))

store_table = Table(

'store', metadata,

Column('id', Integer, primary_key=True),

Column('region_id', None, ForeignKey('region.id')),

Column('name', Unicode(255)))

product_price_table = Table(  # 5

'product_price', metadata,

Column('sku', None, ForeignKey('product.sku'), primary_key=True),

Column('store_id', None, ForeignKey('store.id'), primary_key=True),

Column('price', Numeric, default=0))

1

This is a “level” used in categorizing a product in a hierarchy. In our example, we will use the levels “Gender”, “Department”, “Class”, and “Subclass”.

2

These are the individual categories within a level. In our example, for instance, within the “Gender” level, we have “Men”, “Women”, “Children”, and “Unisex.”

3

This table contains auxiliary information about products that may or may not be present for each product.

4

This table links the product table with the category table. A product should generally have one category per level.

5

This table lists the retail price for each product at each store location.

The application object model in the following listing is extremely basic. In a real application, the classes would probably have additional methods defined for performing domain-specific operations:

class Level(object):

def __init__(self, name, parent=None):

self.name = name

self.parent = parent

def __repr__(self):

return '<Level %s>' % self.name

class Category(object):

def __init__(self, name, level, parent=None):

self.name = name

self.level = level

self.parent = parent

def __repr__(self):

return '<Category %s.%s>' % (self.level.name, self.name)

class Product(object):

def __init__(self, sku, msrp, summary=None):

self.sku = sku

self.msrp = msrp

self.summary = summary

self.categories = []

self.prices = []

def __repr__(self):

return '<Product %s>' % self.sku

class ProductSummary(object):

def __init__(self, name, description):

self.name = name

self.description = description

def __repr__(self):

return '<ProductSummary %s>' % self.name

class Region(object):

def __init__(self, name):

self.name = name

def __repr__(self):

return '<Region %s>' % self.name

class Store(object):

def __init__(self, name):

self.name = name

def __repr__(self):

return '<Store %s>' % self.name

class Price(object):

def __init__(self, product, store, price):

self.product = product

self.store = store

self.price = price

def __repr__(self):

return '<Price %s at %s for $%.2f>' % (

self.product.sku, self.store.name, self.price)

Basic Object Mapping

Now that we have the basic schema and object model in place, we can start exploring how to map objects. The region_table is one of the simplest tables, so we will start there. The following example demonstrates mapping the region_table to the Region class, and also illustrates the alterations that SQLAlchemy performs on the Region class during mapping:

>>> print dir(Region)

['__class__', '__delattr__', '__dict__', '__doc__',

... '__getattribute__', '__hash__', '__init__',

... '__module__','__new__', '__reduce__', '__reduce_ex__',

... '__repr__', '__setattr__', '__str__', '__weakref__']

>>> mapper(Region, region_table)

<sqlalchemy.orm.mapper.Mapper object at 0x2af4d7004310>

>>> print dir(Region)

['__class__', '__delattr__', '__dict__', '__doc__',

... '__getattribute__', '__hash__', '__init__', '__module__',

... '__new__', '__reduce__', '__reduce_ex__', '__repr__',

... '__setattr__', '__str__', '__weakref__',

... '_sa_attribute_manager', 'c', 'id', 'name']

>>> print Region.id

<sqlalchemy.orm.mapper._CompileOnAttr object at 0x2af4d70046d0>

>>> print Region.name

<sqlalchemy.orm.mapper._CompileOnAttr object at 0x2af4d7004790>

>>> print Region.c.id

region.id

>>> print Region.c.name

region.name

Note

It is possible to make SQLAlchemy “forget” all the mappings that have been declared by invoking the clear_mappers() function. This feature can be useful when prototyping various mappers within the interactive shell, as it will let you remap classes to try out different strategies.
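For example, we can unmap and then remap the Region class as follows (a sketch):

from sqlalchemy.orm import mapper, clear_mappers

mapper(Region, region_table)
clear_mappers()               # Region is no longer mapped...
mapper(Region, region_table,  # ...so we can try a different strategy
       column_prefix='_')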

As shown previously, the mapper() function has added a few attributes to our class. The attributes we’re interested in are c, id, and name. The c attribute is a proxy for the region_table’s c attribute, and allows access to all the columns of the region_table.

The id and name attributes are actually class properties that track access to these attributes to synchronize them with the database later. These are mapped because the default behavior of the SQLAlchemy mapper is to provide a property for each column in the selectable mapped, and the region_table has two columns, id and name.

Note that we can still use the object just as if it had not been mapped (unless, of course, we were relying on existing properties id and name, or an existing attribute c):

>>> r0 = Region(name="Northeast")

>>> r1 = Region(name="Southwest")

>>> print r0

<Region Northeast>

>>> print r1

<Region Southwest>

The difference now is that these objects can be loaded or saved to the database using a session object (covered in more detail in the next chapter):

>>> Session = sessionmaker()

>>> session = Session()

>>>

>>> session.save(r0)

>>> session.save(r1)

>>> metadata.bind.echo = True

>>> print r0.id

None

>>> print r1.id

None

>>> session.flush()

2007-10-13 12:47:07,621 INFO sqlalchemy.engine.base.Engine.0x..90

... BEGIN

2007-10-13 12:47:07,623 INFO sqlalchemy.engine.base.Engine.0x..90

... INSERT INTO region (name) VALUES (?)

2007-10-13 12:47:07,623 INFO sqlalchemy.engine.base.Engine.0x..90

... ['Northeast']

2007-10-13 12:47:07,625 INFO sqlalchemy.engine.base.Engine.0x..90

... INSERT INTO region (name) VALUES (?)

2007-10-13 12:47:07,625 INFO sqlalchemy.engine.base.Engine.0x..90

... ['Southwest']

>>> print r0.id

1

>>> print r1.id

2

Note how SQLAlchemy automatically inserted the region names we specified into the database, and then populated the mapped id attribute based on the synthetic key value generated by the database. We can also update mapped properties once an object has been saved to the database:

>>> r0.name = 'Northwest'

>>> session.flush()

2007-10-13 12:47:53,879 INFO sqlalchemy.engine.base.Engine.0x..90

... UPDATE region SET name=? WHERE region.id = ?

2007-10-13 12:47:53,879 INFO sqlalchemy.engine.base.Engine.0x..90

... ['Northwest', 1]

Customizing Property Mapping

The basic mapping that SQLAlchemy performs is useful, but what if we have a property or function that conflicts with the way SQLAlchemy wants to map columns? Or what if we just want to customize the columns mapped by SQLAlchemy? Fortunately, SQLAlchemy provides a rich set of ways to customize how properties are mapped onto your classes.

Using include_properties and exclude_properties

The simplest case is where we want to restrict the properties mapped. In this case, we can use include_properties to map only the columns specified:

>>> print dir(Region)

['__class__', '__delattr__', '__dict__', '__doc__',

... '__getattribute__', '__hash__', '__init__', '__module__',

... '__new__', '__reduce__', '__reduce_ex__', '__repr__',

... '__setattr__', '__str__', '__weakref__']

>>> mapper(Region, region_table, include_properties=['id'])

<sqlalchemy.orm.mapper.Mapper object at 0x2ba1a7ca3310>

>>> print dir(Region)

['__class__', '__delattr__', '__dict__', '__doc__',

... '__getattribute__', '__hash__', '__init__', '__module__',

... '__new__', '__reduce__', '__reduce_ex__', '__repr__',

... '__setattr__', '__str__', '__weakref__',

... '_sa_attribute_manager', 'c', 'id']

We can also use exclude_properties to specify columns to be excluded:

>>> mapper(Region, region_table, exclude_properties=['id'])

<sqlalchemy.orm.mapper.Mapper object at 0x2ba1a7ca34d0>

Customizing the name of the mapped column

If we want to map all the columns to properties with a particular prefix, we can use the column_prefix keyword argument:

>>> mapper(Region, region_table, column_prefix='_')

<sqlalchemy.orm.mapper.Mapper object at 0x2aecf62d5310>

>>> print dir(Region)

['__class__', '__delattr__', '__dict__', '__doc__',

... '__getattribute__', '__hash__', '__init__', '__module__',

... '__new__', '__reduce__', '__reduce_ex__', '__repr__',

... '__setattr__', '__str__', '__weakref__', '_id', '_name',

... '_sa_attribute_manager', 'c']

We can also customize the mapped property names on a column-by-column basis using the properties parameter:

>>> mapper(Region, region_table, properties=dict(

... region_name=region_table.c.name,

... region_id=region_table.c.id))

<sqlalchemy.orm.mapper.Mapper object at 0x2b37165b8310>

>>> print dir(Region)

['__class__', '__delattr__', '__dict__', '__doc__',

... '__getattribute__', '__hash__', '__init__', '__module__',

... '__new__', '__reduce__', '__reduce_ex__', '__repr__',

... '__setattr__', '__str__', '__weakref__',

... '_sa_attribute_manager', 'c', 'region_id', 'region_name']

Using synonyms

SQLAlchemy provides certain functions and methods (covered in the next chapter) that expect mapped property names as keyword arguments. This can be cumbersome to use if we have mapped the column names to other property names (perhaps to allow for user-defined getters and setters). To alleviate the burden of using the actual property names, SQLAlchemy provides the synonym() function to allow a name to be used “as if” it were a real property. Suppose, for instance, that we wish to verify that all region names end in “Region”. We might use the following approach:

>>> class Region(object):

... def __init__(self, name):

... self.name = name

... def __repr__(self):

... return '<Region %s>' % self.name

... def _get_name(self):

... return self._name

... def _set_name(self, value):

... assert value.endswith('Region'), \

... 'Region names must end in "Region"'

... self._name = value

... name=property(_get_name, _set_name)

...

>>> mapper(Region, region_table, column_prefix='_', properties=dict(

... name=synonym('_name')))  # 1

<sqlalchemy.orm.mapper.Mapper object at 0x2b2f953ff4d0>

>>>

>>> s0 = Region('Southeast')  # 2

Traceback (most recent call last):

...

AssertionError: Region names must end in "Region"

>>> s0 = Region('Southeast Region')

>>> session.save(s0)

>>> session.flush()

>>> session.clear()

>>>

>>> q = session.query(Region)

>>> print q.filter_by(name='Southeast Region').first()  # 3

<Region Southeast Region>

>>> print s0.name

Southeast Region

1

This defines the synonym “name” to be usable in all SQLAlchemy functions where “_name” is usable.

2

Here, we tried to create an object with an invalid name and were rejected.

3

Using the synonym, we can still select regions by name without abusing the private attribute.

If you wish to create a property that is a true proxy for the original mapped property (so you don’t have to write the getter and setter), you can use synonym(name, proxy=True) to define it.
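For example, a sketch of the Region mapping using a proxying synonym, with no getter or setter defined on the class:

mapper(Region, region_table, column_prefix='_', properties=dict(
    # 'name' reads and writes the mapped '_name' property directly
    name=synonym('_name', proxy=True)))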

Mapping subqueries

In some cases, we may wish to create a property that is a combination of a few columns or the result of a subquery. For instance, suppose we wanted to map the product_table, providing a property that will yield the average price of the product across all stores. To do this, we use the column_property() function:

>>> average_price = select(

... [func.avg(product_price_table.c.price)],

... product_price_table.c.sku==product_table.c.sku)\

... .as_scalar() \

... .label('average_price')

>>> print average_price

(SELECT avg(product_price.price)

FROM product_price, product

WHERE product_price.sku = product.sku) AS average_price

>>> mapper(Product, product_table, properties=dict(

... average_price=column_property(average_price)))

<sqlalchemy.orm.mapper.Mapper object at 0x2b6b9d5336d0>

>>> metadata.bind.echo = True

>>> p = session.query(Product).get('123')

2007-10-06 18:47:27,289 INFO sqlalchemy.engine.base.Engine.0x..90.

... SELECT (SELECT avg(product_price.price)

FROM product_price

WHERE product_price.sku = product.sku) AS average_price,

... product.sku AS product_sku, product.msrp AS product_msrp

FROM product

WHERE product.sku = ? ORDER BY product.oid

LIMIT 1 OFFSET 0

2007-10-06 18:47:27,290 INFO sqlalchemy.engine.base.Engine.0x..90.

... ['123']

>>> print p.sku, p.msrp, p.average_price

123 12.34 12.34

Mapping composite values

The SQLAlchemy ORM also provides for creating properties from a group of columns. To use this feature, we must create a custom class to store the composite value. That class must have a constructor that accepts column values as positional arguments (to create the object from the database result) and a method __composite_values__() that returns a list or tuple representing the state of the object in the order of the columns that map to it. The custom class should also support equality comparisons via the __eq__() and __ne__() methods.

For instance, suppose we have a mapping database that stores route segments in the following table:

segment_table = Table(

'segment', metadata,

Column('id', Integer, primary_key=True),

Column('lat0', Float),

Column('long0', Float),

Column('lat1', Float),

Column('long1', Float))

In this case, our application expects RouteSegments to have a beginning and an ending MapPoint object, defined as follows:

class RouteSegment(object):

def __init__(self, begin, end):

self.begin = begin

self.end = end

def __repr__(self):

return '<Route %s to %s>' % (self.begin, self.end)

class MapPoint(object):

def __init__(self, lat, long):

self.coords = lat, long

def __composite_values__(self):

return self.coords

def __eq__(self, other):

return self.coords == other.coords

def __ne__(self, other):

return self.coords != other.coords

def __repr__(self):

return '(%s lat, %s long)' % self.coords

We can then map the class and use it with the composite() function:

>>> mapper(RouteSegment, segment_table, properties=dict(

... begin=composite(MapPoint,

... segment_table.c.lat0,

... segment_table.c.long0),

... end=composite(MapPoint,

... segment_table.c.lat1, segment_table.c.long1)))

<sqlalchemy.orm.mapper.Mapper object at 0x2b13e58a5450>

>>>

>>> work=MapPoint(33.775562,-84.29478)

>>> library=MapPoint(34.004313,-84.452062)

>>> park=MapPoint(33.776868,-84.389785)

>>> routes = [

... RouteSegment(work, library),

... RouteSegment(work, park),

... RouteSegment(library, work),

... RouteSegment(library, park),

... RouteSegment(park, library),

... RouteSegment(park, work)]

>>> for rs in routes:

... session.save(rs)

...

>>> session.flush()

>>>

>>> q = session.query(RouteSegment)

>>> print RouteSegment.begin==work

segment.lat0 = ? AND segment.long0 = ?

>>> q = q.filter(RouteSegment.begin==work)

>>> for rs in q:

... print rs

...

<Route (33.775562 lat, -84.29478 long) to (34.004313 lat, -84.452062

... long)>

<Route (33.775562 lat, -84.29478 long) to (33.776868 lat, -84.389785

... long)>

By default, SQLAlchemy generates an equality comparator that generates SQL to compare all mapped columns for use in methods such as filter(), shown previously. If you want to provide custom comparison operators, you can do so by implementing a subclass of PropComparator:

class MapPointComparator(PropComparator):

def __lt__(self, other):

return and_(*[a<b for a, b in

zip(self.prop.columns,

other.__composite_values__())])

mapper(RouteSegment, segment_table, properties=dict(

begin=composite(MapPoint,

segment_table.c.lat0, segment_table.c.long0,

comparator=MapPointComparator),

end=composite(MapPoint,

segment_table.c.lat1, segment_table.c.long1,

comparator=MapPointComparator)))

Eager versus deferred loading

In some cases, it may not be efficient to retrieve all properties of an object at object creation time. For instance, if the table being mapped has a BLOB column that is needed only infrequently in the mapped object, it may be more efficient to retrieve that column only when the property is accessed. In SQLAlchemy, this is referred to as “deferred column loading,” and is accomplished by mapping a property to the deferred() function.

In our product catalog schema, for instance, suppose we have an image stored for each product in a BLOB column:

product_table = Table(

'product', metadata,

Column('sku', String(20), primary_key=True),

Column('msrp', Numeric),

Column('image', BLOB))

In this case, we can map the image column as a deferred column:

mapper(Product, product_table, properties=dict(

image=deferred(product_table.c.image)))

Now, if we select a product, we can observe that SQLAlchemy delays loading the deferred column until its mapped property is actually accessed:

>>> metadata.bind.echo=True

>>> q = session.query(Product)

>>> prod=q.get_by(sku='123')

2007-10-08 11:27:45,582 INFO sqlalchemy.engine.base.Engine.0x..d0

... SELECT product.sku AS product_sku, product.msrp AS product_msrp

FROM product

WHERE product.sku = ? ORDER BY product.oid

LIMIT 1 OFFSET 0

2007-10-08 11:27:45,583 INFO sqlalchemy.engine.base.Engine.0x..d0

... ['123']

>>> print prod.image

2007-10-08 11:27:45,589 INFO sqlalchemy.engine.base.Engine.0x..d0

... SELECT product.image AS product_image

FROM product

WHERE product.sku = ?

2007-10-08 11:27:45,589 INFO sqlalchemy.engine.base.Engine.0x..d0

... [u'123']

abcdef

We can also mark multiple deferred columns to be members of a “group” of deferred columns, so that they are all loaded when any column in the group is accessed:

product_table = Table(

'product', metadata,

Column('sku', String(20), primary_key=True),

Column('msrp', Numeric),

Column('image1', Binary),

Column('image2', Binary),

Column('image3', Binary))

mapper(Product, product_table, properties=dict(

image1=deferred(product_table.c.image1, group='images'),

image2=deferred(product_table.c.image2, group='images'),

image3=deferred(product_table.c.image3, group='images')))

If the default deferred behavior is not desired, columns can be individually deferred or undeferred at query creation time by using the defer() and undefer() functions along with the options() method of the Query object (described more completely in the next chapter).
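For example, assuming the deferred image mapping shown earlier, the deferral can be overridden for a single query (a sketch):

from sqlalchemy.orm import defer, undefer

# load the image column eagerly for this query only
q = session.query(Product).options(undefer('image'))

# or defer a normally loaded column for this query only
q = session.query(Product).options(defer('msrp'))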

Mapping Arbitrary Selectables

It is worth noting that, although we have been mapping tables in our examples, it is possible to map any “selectable” object in SQLAlchemy. This includes tables, and the result of the select(), join(), union(), intersect(), and except_() functions or methods (and their *_all() variants). For instance, we may wish to map the result of joining the product table with the product summary table to a single object:

q = product_table.join(

product_summary_table,

product_table.c.sku==product_summary_table.c.sku).alias('full_product')

class FullProduct(object): pass

mapper(FullProduct, q)

Other mapper() Parameters

The mapper() function takes a number of keyword arguments, listed next.

entity_name=None

A string to associate with a nonprimary mapper (see the non_primary parameter description for more detail) that allows it to be distinguished from the primary mapper in session methods such as save() and query().

always_refresh=False

Whenever a query returns an object corresponding to an in-memory object, overwrite the in-memory object’s fields with the fields from the query if this flag is True. This will overwrite any changes to the in-memory object, and so using the populate_existing() method on Query objects is preferred over this parameter.

allow_column_override=False

Allow a relation() property to be defined with the same name as a mapped column (the column will not be mapped in this case). Otherwise, the name conflict will generate an exception.

allow_null_pks=False

When using a composite primary key in the mapped selectable, this flag allows some (but not all) of the primary key columns to be NULL. Otherwise, any NULL value in any primary key column will cause the row to be skipped when constructing objects.

batch=True

Allow the save operations of multiple objects to be batched together for efficiency (for instance, saving all the sku columns of multiple Products). If False, each object will be completely created in the database before moving on to the next object.

column_prefix=None

A string that will be used to prefix all automatically mapped column property names. This is ignored on all explicitly named properties.

concrete=False

If True, indicates the use of concrete table inheritance (covered in detail in Chapter 8).

extension=None

Either a MapperExtension or a list of MapperExtensions to be applied on all operations on this mapper (covered in detail later in this chapter in the section “Extending Mappers”).

inherits=None

Another mapper that will serve as the “parent” when using mapper inheritance (covered in detail in Chapter 8).

inherit_condition=None

The method of joining tables in joined table inheritance (covered in detail in Chapter 8).

inherit_foreign_keys=None

The “foreign” side of the inherit_condition parameter.

order_by=None

The default ordering for entities when selecting from this mapper.

non_primary=False

When True, specifies that this is a nonprimary mapper. For any mapped class, only one primary mapper can be registered. When you create an instance of the class and save it to the database, the primary mapper alone determines how that object will be saved. Nonprimary mappers are useful for loading objects in a different way than the primary mapper does (e.g., from a different table, with a different set of columns, etc.). Any number of non_primary mappers may be defined for a class.

polymorphic_on=None

Column that identifies which class/mapper should be used when using inheritance for a particular row (covered in detail in Chapter 8).

polymorphic_identity=None

Value stored in the polymorphic_on parameter to identify this mapper in an inheritance relationship (covered in detail in Chapter 8).

polymorphic_fetch='union'

The method used to fetch subclasses using joined-table inheritance, either ‘union’, ‘select’, or ‘deferred’ (covered in detail in Chapter 8).

properties=None

Dictionary of properties to be mapped onto the class (in addition to automatically mapped properties).

include_properties=None

List of properties to map onto the class (columns in the mapped table but not referenced in this list will not be mapped automatically).

exclude_properties=None

List of properties not to map onto the class (columns in the mapped table will be mapped automatically unless they are in this list).

primary_key=None

List of columns that define the primary key for the selectable being mapped. (By default, this is the primary key of the table being mapped, but this behavior can be overridden with this parameter.)

select_table=None

The selectable used to select instances of the mapped class. Generally used with polymorphic loading (covered in detail in Chapter 8).

version_id_col=None

An integer column on the mapped selectable that is used to keep a version ID of the data in that row. Each save will increment this version number. If the version number is changed between the time when the object is selected and when it is flushed, then a ConcurrentModificationError is thrown.

Declaring Relationships Between Mappers

Although the features that SQLAlchemy provides for mapping tables and other selectables to classes are powerful in their own right, SQLAlchemy also allows you to model relationships between tables as simple Python lists and properties using the relation() function in the properties parameter of the mapper() function.

Basic Relationships

The three main relationships modeled by SQLAlchemy are 1:N, M:N, and 1:1 (which is actually a special case of 1:N). In a 1:N relationship, one table (the “N” side) generally has a foreign key to another table (the “1” side). In M:N, two tables (the “primary” tables) are related via a secondary, “join” table that has foreign keys into both primary tables. A 1:1 relationship is simply a 1:N relationship where there is only one “N”-side row with a foreign key to any particular “1”-side row.

1:N relations

To model each type of relationship, SQLAlchemy uses the relation() function in the properties dict of the mapper. In many cases, SQLAlchemy is able to infer the proper join condition for 1:N relations. For instance, since the stores in our data model are members of regions (a 1:N relationship region:store), we can model this on our Region class as follows:

>>> mapper(Store, store_table)

<sqlalchemy.orm.mapper.Mapper object at 0x2b794eb2f610>

>>> mapper(Region, region_table, properties=dict(

... stores=relation(Store)))  # 1

<sqlalchemy.orm.mapper.Mapper object at 0x2b794eb3af90>

>>> rgn = session.query(Region).get(1)

2007-10-13 12:59:47,876 INFO sqlalchemy.engine.base.Engine.0x..90

... SELECT region.id AS region_id, region.name AS region_name

FROM region

WHERE region.id = ? ORDER BY region.oid

LIMIT 1 OFFSET 0

2007-10-13 12:59:47,877 INFO sqlalchemy.engine.base.Engine.0x..90

... [1]

>>> s0 = Store(name='3rd and Juniper')

>>> rgn.stores.append(s0)  # 2

2007-10-13 13:00:06,339 INFO sqlalchemy.engine.base.Engine.0x..90

... SELECT store.id AS store_id, store.region_id AS store_region_id,

... store.name AS store_name

FROM store

WHERE ? = store.region_id ORDER BY store.oid

2007-10-13 13:00:06,339 INFO sqlalchemy.engine.base.Engine.0x..90

... [1]

>>> session.flush()  # 3

2007-10-13 13:00:14,344 INFO sqlalchemy.engine.base.Engine.0x..90

... INSERT INTO store (region_id, name) VALUES (?, ?)

2007-10-13 13:00:14,345 INFO sqlalchemy.engine.base.Engine.0x..90

... [1, '3rd and Juniper']

1

SQLAlchemy is able to infer the 1:N relation type by the foreign key relationship between region_table and store_table.

2

Adding a store to the region is as simple as appending it to the property. Generally, when working at the ORM level, it is not necessary to worry about foreign keys. The SELECT statement is necessary for SQLAlchemy to retrieve the initial contents of the “stores” property.

3

SQLAlchemy automatically infers that a new store must be inserted with the region_id properly set.

In some cases, SQLAlchemy is unable to infer the proper join condition (for instance, when there are multiple foreign key relations between the two tables). In this case, we can simply use the primaryjoin parameter to the relation() function:

mapper(Region, region_table, properties=dict(

stores=relation(Store,

primaryjoin=(store_table.c.region_id

==region_table.c.id))))

M:N relations

It is often useful to model many-to-many (M:N) type relations between objects. In the database, this is accomplished by the use of an association or join table. In the following schema, the relation between the product_table and the category_table is a many-to-many:

category_table = Table(

'category', metadata,

Column('id', Integer, primary_key=True),

Column('level_id', None, ForeignKey('level.id')),

Column('parent_id', None, ForeignKey('category.id')),

Column('name', String(20)))

product_table = Table(

'product', metadata,

Column('sku', String(20), primary_key=True),

Column('msrp', Numeric))

product_category_table = Table(

'product_category', metadata,

Column('product_id', None, ForeignKey('product.sku'), primary_key=True),

Column('category_id', None, ForeignKey('category.id'), primary_key=True))

In SQLAlchemy, we can model this relationship with the relation() function and the secondary parameter:

>>> mapper(Category, category_table, properties=dict(

... products=relation(Product,

... secondary=product_category_table)))

<sqlalchemy.orm.mapper.Mapper object at 0xee6810>

>>> mapper(Product, product_table, properties=dict(

... categories=relation(Category,

... secondary=product_category_table)))

<sqlalchemy.orm.mapper.Mapper object at 0xee6d10>

>>>

>>> session.query(Product).get('123').categories

2007-10-15 20:06:17,375 INFO sqlalchemy.engine.base.Engine.0x..d0

... SELECT category.id AS category_id, category.level_id AS

... category_level_id, category.parent_id AS category_parent_id,

... category.name AS category_name

FROM category, product_category

WHERE ? = product_category.product_id AND category.id =

... product_category.category_id ORDER BY product_category.oid

2007-10-15 20:06:17,375 INFO sqlalchemy.engine.base.Engine.0x..d0

... [u'123']

[]

As in the case of the 1:N join, we can also explicitly specify the join criteria by using the primaryjoin (the join condition between the table being mapped and the join table) and the secondaryjoin (the join condition between the join table and the table being related to) parameters:

mapper(Category, category_table, properties=dict(

products=relation(Product, secondary=product_category_table,

primaryjoin=(product_category_table.c.category_id

== category_table.c.id),

secondaryjoin=(product_category_table.c.product_id

== product_table.c.sku))))

mapper(Product, product_table, properties=dict(

categories=relation(Category, secondary=product_category_table,

primaryjoin=(product_category_table.c.product_id

== product_table.c.sku),

secondaryjoin=(product_category_table.c.category_id

== category_table.c.id))))

1:1 relations

SQLAlchemy also supports 1:1 mappings as a type of 1:N mappings. This is modeled in our schema with the product_table and the product_summary_table:

product_table = Table(

'product', metadata,

Column('sku', String(20), primary_key=True),

Column('msrp', Numeric))

product_summary_table = Table(

'product_summary', metadata,

Column('sku', None, ForeignKey('product.sku'), primary_key=True),

Column('name', Unicode(255)),

Column('description', Unicode))

Note in particular the foreign key relationship between product_table and product_summary_table. This relationship allows, in SQL, many product_summary_table rows to exist for one product_table row. If left to its own devices, then, SQLAlchemy will assume that this is a 1:N join:

>>> mapper(ProductSummary, product_summary_table)

<sqlalchemy.orm.mapper.Mapper object at 0xeee150>

>>> mapper(Product, product_table, properties=dict(

... summary=relation(ProductSummary)))

<sqlalchemy.orm.mapper.Mapper object at 0xef0410>

>>>

>>> prod = session.query(Product).get('123')

>>> print prod.summary

[]

To avoid this, we simply specify uselist=False to the relation() function:

>>> mapper(ProductSummary, product_summary_table)

<sqlalchemy.orm.mapper.Mapper object at 0xef5c90>

>>> mapper(Product, product_table, properties=dict(

... summary=relation(ProductSummary, uselist=False)))

<sqlalchemy.orm.mapper.Mapper object at 0xef88d0>

>>>

>>> prod = session.query(Product).get('123')

>>> print prod.summary

None

Using BackRefs

In most cases, when mapping a relation between two tables, we want to create a property on both classes. We can certainly do this in SQLAlchemy by using two relation() calls, one for each mapper, but this is verbose and potentially leads to the two properties becoming out-of-sync with each other. To eliminate these problems, SQLAlchemy provides the backref parameter to the relation() function:

>>> mapper(ProductSummary, product_summary_table)

<sqlalchemy.orm.mapper.Mapper object at 0xfbba10>

>>> mapper(Product, product_table, properties=dict(

... summary=relation(ProductSummary, uselist=False,

... backref='product')))

<sqlalchemy.orm.mapper.Mapper object at 0xee7dd0>

>>>

>>> prod = session.query(Product).get('123')

>>> prod.summary = ProductSummary(name="Fruit", description="Some Fruit")

>>> print prod.summary

<ProductSummary Fruit>

>>> print prod.summary.product

<Product 123>

>>> print prod.summary.product is prod

True

Note in particular that SQLAlchemy automatically updated the backref property. This is particularly useful in many-to-many (M:N) relations. For instance, to model an M:N relation, we could use the relation() function twice, but the two properties would not remain synchronized with each other. Note the incorrect behavior in the following example:

>>> mapper(Level, level_table, properties=dict(

... categories=relation(Category, backref='level')))

<sqlalchemy.orm.mapper.Mapper object at 0x1044d90>

>>> mapper(Category, category_table, properties=dict(

... products=relation(Product,

... secondary=product_category_table)))

<sqlalchemy.orm.mapper.Mapper object at 0x104a8d0>

>>> mapper(Product, product_table, properties=dict(

... categories=relation(Category,

... secondary=product_category_table)))

<sqlalchemy.orm.mapper.Mapper object at 0x104aed0>

>>> lvl = Level(name='Department')

>>> cat = Category(name='Produce', level=lvl)

>>> session.save(lvl)

>>> prod = session.query(Product).get('123')

>>> print prod.categories

[]

>>> print cat.products

[]

>>> prod.categories.append(cat)

>>> print prod.categories

[<Category Department.Produce>]

>>> print cat.products

[]

If we declare a backref on the products property, however, the two lists are kept in sync:

>>> mapper(Level, level_table, properties=dict(

... categories=relation(Category, backref='level')))

<sqlalchemy.orm.mapper.Mapper object at 0x107cf90>

>>> mapper(Category, category_table, properties=dict(

... products=relation(Product, secondary=product_category_table,

... backref='categories')))

<sqlalchemy.orm.mapper.Mapper object at 0x107c350>

>>> mapper(Product, product_table)

<sqlalchemy.orm.mapper.Mapper object at 0x104f110>

>>> lvl = Level(name='Department')

>>> cat = Category(name='Produce', level=lvl)

>>> session.save(lvl)

>>> prod = session.query(Product).get('123')

>>> print prod.categories

[]

>>> print cat.products

[]

>>> prod.categories.append(cat)

>>> print prod.categories

[<Category Department.Produce>]

>>> print cat.products

[<Product 123>]

Rather than specifying just the backref’s name, we can also use the SQLAlchemy-provided backref() function. This function allows us to pass along arguments to the relation that is created by the backref. For instance, if we wanted to declare the product property on the ProductSummary class rather than declaring the summary property on the Product class, we could use backref() with uselist=False as follows:

mapper(ProductSummary, product_summary_table, properties=dict(

product=relation(Product,

backref=backref('summary', uselist=False))))

mapper(Product, product_table)

Using a Self-Referential Mapper

It is sometimes useful to have a relation() map from one object to another object of the same class. This is referred to as self-referential mapping. For instance, in our schema, each row of the level_table has a parent_id column referring to another level_table row:

level_table = Table(

'level', metadata,

Column('id', Integer, primary_key=True),

Column('parent_id', None, ForeignKey('level.id')),

Column('name', String(20)))

To specify the parent-child relationship between different levels, we can use the relation() function with a little extra work. When there is a relation specified with a self-referential foreign key constraint, SQLAlchemy assumes that the relation will be a 1:N relation. If we want to get only the “children” property working, then the mapper setup is as simple as the following:

mapper(Level, level_table, properties=dict(

children=relation(Level)))

However, we would also like to get the backref to the parent working as well. For this, we need to specify the “remote side” of the backref. In the case of the “parent” attribute, the “local side” is the parent_id column, and the “remote side” is the id column. To specify the remote side of a relation (or backref), we use the remote_side parameter:

>>> mapper(Level, level_table, properties=dict(

... children=relation(Level,

... backref=backref('parent',

... remote_side=[level_table.c.id]))))

<sqlalchemy.orm.mapper.Mapper object at 0x1050990>

>>>

>>> l0 = Level('Gender')

>>> l1 = Level('Department', parent=l0)

>>> session.save(l0)

>>> session.flush()

2007-10-19 10:23:53,861 INFO sqlalchemy.engine.base.Engine.0x..50

... INSERT INTO level (parent_id, name) VALUES (?, ?)

2007-10-19 10:23:53,862 INFO sqlalchemy.engine.base.Engine.0x..50

... [None, 'Gender']

2007-10-19 10:23:53,875 INFO sqlalchemy.engine.base.Engine.0x..50

... INSERT INTO level (parent_id, name) VALUES (?, ?)

2007-10-19 10:23:53,876 INFO sqlalchemy.engine.base.Engine.0x..50

... [1, 'Department']

We could, of course, specify the relation “in reverse” as well:

mapper(Level, level_table, properties=dict(

parent=relation(Level, remote_side=[level_table.c.parent_id],

backref='children')))

Note that a list is used for the remote_side parameter to allow for compound foreign keys in the relation.

Cascading Changes to Related Objects

It is often the case, particularly in 1:N relations, that you want to cascade the changes on one object to another “child” object. For instance, in the previous schema, if we delete a row from the product_table, we would also want to delete it from the product_summary_table. In many cases, this can be handled natively by the database using ON DELETE CASCADE in SQL or the ondelete parameter in the Table() definition. In some cases, however, the underlying database may not support cascading deletes natively. For circumstances such as these, SQLAlchemy provides the cascade parameter to relation()s and backref()s.
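As a sketch of the native approach mentioned above, the cascade is declared on the ForeignKey in the Table() definition, so the database deletes the summary row itself:

product_summary_table = Table(
    'product_summary', metadata,
    Column('sku', None,
           # the database removes this row when its product is deleted
           ForeignKey('product.sku', ondelete='CASCADE'),
           primary_key=True),
    Column('name', Unicode(255)),
    Column('description', Unicode))

When the database handles the cascade natively like this, the corresponding relation() is typically given passive_deletes=True (described later in this chapter) so that SQLAlchemy does not also try to manage the child rows itself.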

The cascade parameter is specified as a string composed of a comma-separated list of keywords that specify which session operations should cascade onto the related objects. In the following list, the “parent” object is the one that has the relation as a property. The “child” object is the object that it is related to. For instance, in the following relation, the Region object is the “parent”, and the related Store objects are the “children”.

mapper(Region, region_table, properties=dict(

stores=relation(Store)))

All of the cascade values in the following list refer to various functions that are performed by the Session object (covered in more detail in Chapter 7). The default value for the cascade parameter on a relation is "save-update,merge". A usage sketch follows the list.

all

Specifies that all options should be enabled except delete-orphan.

delete

When the parent object is marked for deletion via session.delete(), mark the child(ren) as well.

save-update

When the parent object is attached to the session, attach the child(ren) as well. (Attachment to a session generally happens by calling the save(), update(), or save_or_update() methods on the Session object.)

refresh-expire

When the parent object is refreshed (reloaded from the database) or expired (marked as expired, to be refreshed if any properties are subsequently read), refresh or expire the child(ren) as well.

merge

When the parent object is merged, then merge the child(ren) as well. Merging is the process of taking an object and copying its state onto a persistent instance of that object that is managed by the session.

expunge

When the parent object is expunged from the session (removing all references to the object from the session, the opposite of save_or_update()), expunge the child(ren) as well.

delete-orphan

When the child object is removed from the relation (by reassigning a 1:1 or N:1 relation or by removing it from the list in a 1:N or M:N relation), mark the child object for deletion. (This operation is referred to as “orphaning” the child object by removing its relation to its parent.)
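For example, a sketch using the 1:1 product/summary mapping from earlier in this chapter, where deleting a product should also delete its summary, and detaching a summary from its product should delete the orphaned summary:

mapper(Product, product_table, properties=dict(
    summary=relation(ProductSummary, uselist=False,
                     # cascade all session operations to the summary,
                     # and delete any summary detached from its product
                     cascade='all, delete-orphan')))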

Other relation() and backref() Parameters

The relation(argument, secondary=None, **kwargs) and backref(name, **kwargs) functions also take a number of other parameters, specified in the following list of arguments. relation() and backref() take the same keyword arguments.

backref (relation() only)

Either the name of the property to be used for the reverse relationship, or an invocation of the backref() function to customize the backreference further.

cascade

String of comma-separated cascade values (for more detail, see the list of cascade values in the preceding section).

collection_class

The class or function used to build a list-holding object (used to store 1:N and M:N relations). See the section “Using custom collections in relations” for more detail.

foreign_keys

List of columns that are used as the “foreign keys” in the relation, if no actual foreign keys are present. Always used in conjunction with explicit primaryjoin and/or secondaryjoin parameters.

join_depth=None

When non-None, this limits the depth an eager-loading join will traverse with a self-referential mapper. The join_depth specifies the maximum number of times the same mapper can be present along a join branch before eager loading is stopped. The default value of None stops the traversal of an eager join when it encounters the first duplicate mapper.

lazy=True

Specifies how related items should be loaded. The options are:

True (default)

Load items when the property is first accessed.

False

Load the items eagerly when the parent is fetched, using a JOIN or LEFT OUTER JOIN.

None

SQLAlchemy will never automatically load the related items. This is used for write-only properties or properties that are populated in some other way.

'dynamic'

Returns a Query object when reading the property and supports writes only through the append() and remove() methods. This option allows partial results to be fetched lazily. This option is mainly used in backrefs. To use dynamic loading on a forward relation, use the dynamic_loader() function in place of relation().

order_by

List of ClauseElements specifying the ordering that should be applied when loading a 1:N or M:N relation.

passive_deletes=False

When True, indicates that the database will automatically cascade deletes (either by deleting the child row or by setting its foreign key to NULL, whichever is appropriate). This prevents the default SQLAlchemy behavior of loading related objects from the database to either set them to deleted or to set their foreign key column to NULL.

post_update=False

If True, this property will be handled by a separate statement whenever inserting, updating, or deleting the parent row. If False, SQLAlchemy will attempt to update the row along with all its relations in a single statement, something that is impossible to do when there is a cyclical set of foreign key relationships.

Attempting to insert, update, or delete such a cyclical set will raise a “cyclical dependency” exception when flush()ing the session. Setting post_update to True on one of the relations in the cycle will “break” it and allow flushing to proceed.

primaryjoin

The ClauseElement that specifies how to join the parent row to the child row (in a 1:N, N:1, or 1:1 relation) or the association table row (in an M:N relation). If not specified, SQLAlchemy will infer a relationship based on the foreign key relationships between the tables involved in the relation.

remote_side

In a self-referential relationship, the column or columns that form the “remote side” (i.e., the “child side”) of the relationship.

secondary

In an M:N relationship, this argument specifies the join table used to create the relation. Note that, if you are using SQLAlchemy’s ability to do M:N relationships, the join table should only be used to join the two tables together, not to store auxiliary properties. If you need to use the intermediate join table to store additional properties of the relation, you should use two 1:N relations instead.

secondaryjoin

The ClauseElement that specifies how to join the association table row to the child row in an M:N relation. If not specified, SQLAlchemy will infer a relationship based on the foreign key relationships between the tables involved in the relation.

uselist=True

If False, forces SQLAlchemy to use a scalar to represent a 1:N relationship (thus modeling a 1:1 relationship).

viewonly=False

If True, tells SQLAlchemy that the relation is suitable only for read operations. This allows greater flexibility in the join conditions (normally, these must be fairly straightforward in order for SQLAlchemy to determine how to persist the relation). Updates to a relation marked as viewonly will not have any effect on the flush process.

Using custom collections in relations

When you specify a relation() that implements a one-to-many or many-to-many join, SQLAlchemy uses a collection to implement the property on the mapped object. By default, this collection is a list. In order to implement appropriate cascade and backref behavior, however, SQLAlchemy must instrument the class, tracking additions and removals of objects to and from the collection. This happens via the CollectionAdapter class, which is used by SQLAlchemy to link the class that implements the collection with the attribute on the mapped object.

To complicate matters further, SQLAlchemy provides the collection_class parameter, which allows you to customize the implementation of list-like relationships. If you specify a collection_class value of the built-in types of list, dict, set, or any subclass of these types, SQLAlchemy will automatically apply the appropriate instrumentation to track changes. For instance, if we wish to use a set to track the changes to the stores attribute in a Region, we could simply write the following:

mapper(Region, region_table, properties=dict(

stores=relation(Store, collection_class=set)))

In some cases, SQLAlchemy can even instrument custom collection classes that are not derived from Python’s built-in collection types by inspecting the class definition and determining whether it is list-like, set-like, or dict-like. This inference is not perfect, however, so SQLAlchemy provides two methods to override it. The first is the __emulates__ class attribute. If you supply a built-in type as the value for this attribute, SQLAlchemy will assume that your custom collection class is “like” the type you name. So, to implement a collection that is set-like but includes a list-like append() method, we could do the following:

class SetAndListLike(object):

__emulates__ = set

def __init__(self):

self._c = set()

def append(self, o):

self._c.add(o)

def remove(self, o):

self._c.remove(o)

def __iter__(self):

return iter(self._c)

The second method for overriding the collection_class inference mechanism is by using the SQLAlchemy-provided collection decorators, which are available as attributes of the collection class in the sqlalchemy.orm.collections module. In the previous example, for instance, SQLAlchemy will correctly infer the usage of remove() and __iter__(), but because append() is not normally used in set-like objects, it will not be instrumented. To force SQLAlchemy to instrument this method, we can use collection.appender:

from sqlalchemy.orm.collections import collection

class SetAndListLike(object):

__emulates__ = set

def __init__(self):

self._c = set()

@collection.appender

def append(self, o):

self._c.add(o)

def remove(self, o):

self._c.remove(o)

def __iter__(self):

return iter(self._c)

The following decorators are available for manually instrumenting your custom collection class:

appender(cls, fn)

This decorator marks the decorated function as a “collection appender.” The decorated function should take one positional argument: the value to add to the collection.

remover(cls, fn)

This decorator marks the decorated function as a “collection remover.” The decorated function should take one positional argument: the value to remove from the collection.

iterator(cls, fn)

This decorator marks the decorated function as a “collection iterator.” The decorated function should take no arguments and return an iterator over all collection members.

internally_instrumented(cls, fn)

This decorator prevents other decorators from being applied to the decorated function. This is useful to prevent “recognized” method names such as append() from being automatically decorated.

on_link(cls, fn)

This decorator marks the decorated function as a “linked to attribute” event handler. This event handler is called when the collection class is linked to the CollectionAdapter that, in turn, is linked to the relation attribute. The decorated function should take one positional argument: the CollectionAdapter being linked (or None if the adapter is being unlinked). This might be useful if you wish to perform some setup on the mapped class or relation when your custom collection is initially linked.

adds(cls, arg)

This decorator factory is used to create decorators that function as “collection appenders.” The one argument to the factory is an indicator of which parameter to the decorated function should be added to the collection. This argument may be specified as either an integer (representing the position number of a positional argument) or a string (indicating the name of the parameter).

replaces(cls, arg)

This decorator factory is used to create decorators that function as “collection replacers.” The one argument to the factory is an indicator of which parameter to the decorated function should be added to the collection. This argument may be specified as either an integer (representing the position number of a positional argument) or a string (indicating the name of the parameter). The return value from the decorated function, if any, is used as the value to be removed from the collection.

removes(cls, arg)

This decorator factory is used to create decorators that function as “collection removers.” The one argument to the factory is an indicator of which parameter to the decorated function should be removed from the collection. This argument may be specified as either an integer (representing the position number of a positional argument) or a string (indicating the name of the parameter).

removes_return(cls)

This decorator factory is used to create decorators that function as “collection removers.” The value that is returned from the decorated function is the value that SQLAlchemy will consider to be removed from the collection. This is useful for implementing a list-like pop() method, for instance.
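For instance, a sketch of a minimal list-like collection that uses several of these decorators, including a pop() method instrumented via removes_return():

from sqlalchemy.orm.collections import collection

class InstrumentedList(object):
    def __init__(self):
        self._items = []
    @collection.appender
    def append(self, item):
        self._items.append(item)
    @collection.remover
    def remove(self, item):
        self._items.remove(item)
    @collection.iterator
    def __iter__(self):
        return iter(self._items)
    @collection.removes_return()
    def pop(self):
        # the returned value is treated as removed from the collection
        return self._items.pop()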

One common use case is using a dict to represent a relation. This presents a problem over using sets and lists, however, as dicts require key values. The sqlalchemy.orm.collections module provides the following helpers for just this purpose:

column_mapped_collection(mapping_spec)

Return a collection class that will be keyed by the mapping_spec, which may be either a column from the related table, or a list of columns from the related table.

attribute_mapped_collection(attr_name)

Return a collection class that will be keyed by the attr_name, which is the name of an attribute on the related class.

mapped_collection(keyfunc)

Return a collection class that will be keyed by the value returned from the supplied keyfunc function. keyfunc takes as its single parameter the related object and returns a key value.

To use a dictionary that is keyed by the store name in our Region class, for instance, we could either use the column:

mapper(Region, region_table, properties=dict(

stores=relation(Store,

collection_class=column_mapped_collection(store_table.c.name))))

or the attribute:

mapper(Region, region_table, properties=dict(

stores=relation(Store,

collection_class=attribute_mapped_collection('name'))))

If you wish to determine the key value to be used in some other way, you can also use the SQLAlchemy-supplied MappedCollection class as a base class for your custom dict-like classes. MappedCollection takes a keyfunc parameter in its constructor just like the mapped_collection() function.
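For example, a sketch of such a subclass that keys each Store by a normalized (lowercased) name; the normalization rule is purely illustrative:

from sqlalchemy.orm.collections import MappedCollection

class NameKeyedStores(MappedCollection):
    def __init__(self):
        # key each related Store by its lowercased name
        MappedCollection.__init__(
            self, keyfunc=lambda store: store.name.lower())

mapper(Region, region_table, properties=dict(
    stores=relation(Store, collection_class=NameKeyedStores)))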

Extending Mappers

Although the mapper function, combined with the various property creation functions, is extremely powerful, it is sometimes useful to extend the functionality of a mapper. To that end, SQLAlchemy provides the MapperExtension class, which can be extended to provide mapper behavior modification via a series of hooks. Multiple MapperExtensions can be registered on a mapper, allowing a chain of responsibility for modifying the mapper behavior. MapperExtensions are registered either in the mapper() function call via the extension parameter, or by passing an extension() argument to the options() method in queries (covered in Chapter 7).

Each hook should return either orm.EXT_CONTINUE or orm.EXT_STOP. (Any other value will be interpreted by SQLAlchemy as orm.EXT_STOP.) If orm.EXT_CONTINUE is returned, processing continues, either to the next MapperExtension or by the mapper itself. If orm.EXT_STOP is returned, then the mapper will not call any other extensions in the chain.

Some of the useful hooks in MapperExtension are described in the following list; a short example follows the list:

before_delete(self, mapper, connection, instance)

Called with an object instance before that instance is deleted.

before_insert(self, mapper, connection, instance)

Called with an object instance before that instance is inserted.

before_update(self, mapper, connection, instance)

Called with an object instance before that instance is updated.

after_delete(self, mapper, connection, instance)

Called with an object instance after that instance is deleted.

after_insert(self, mapper, connection, instance)

Called with an object instance after that instance is inserted.

after_update(self, mapper, connection, instance)

Called with an object instance after that instance is updated.

append_result(self, mapper, selectcontext, row, instance, result, **flags)

Called just before an object instance is appended to a result list. Returning anything other than EXT_CONTINUE will prevent the instance from being appended to the result.

create_instance(self, mapper, selectcontext, row, class_)

Called when a new object is about to be created from a row. If EXT_CONTINUE is returned, normal object creation will take place. Any other value is presumed to be the object instance created by the MapperExtension.

get(self, query, *args, **kwargs)

Overrides the get() method of the Query object if anything other than EXT_CONTINUE is returned.

get_session(self)

Called to retrieve a Session instance with which to register a new object.

load(self, query, *args, **kwargs)

Used to override the load() method of the Query object, if anything other than EXT_CONTINUE is returned.

populate_instance(self, mapper, selectcontext, row, instance, **flags)

Called when a new object is about to have its attributes populated. If EXT_CONTINUE is returned, normal attribute population will take place. Any other value will prevent attribute population by SQLAlchemy.

translate_row(self, mapper, context, row)

Called before rows are converted to instances, allowing the row to be transformed. The new row (or the original, unmodified row) must be returned from this method.

instrument_class(self, mapper, class_)

Called at class instrumentation time.

init_instance(self, mapper, class_, oldinit, instance, args, kwargs)

Called when initializing an instance (as part of the constructor call).

init_failed(self, mapper, class_, oldinit, instance, args, kwargs)

Called when instance initialization fails (when the constructor raises an unhandled exception).
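As a short example of how these hooks are used, the following sketch defines an extension that maintains audit timestamps. It is illustrative only: it assumes the mapped table has created and modified columns, which are not part of this chapter's examples.

from datetime import datetime
from sqlalchemy.orm import MapperExtension, EXT_CONTINUE

class AuditExtension(MapperExtension):
    # Hypothetical extension; assumes "created" and "modified"
    # columns exist on the mapped table.
    def before_insert(self, mapper, connection, instance):
        instance.created = instance.modified = datetime.now()
        return EXT_CONTINUE   # let other extensions and the mapper proceed
    def before_update(self, mapper, connection, instance):
        instance.modified = datetime.now()
        return EXT_CONTINUE

# registered at mapping time via the extension parameter
mapper(Product, product_table, extension=AuditExtension())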

ORM Partitioning Strategies

Sometimes you want to use the ORM to map objects that may exist in multiple databases. SQLAlchemy provides support for “vertical” partitioning (placing different kinds of objects or different tables in different databases) as well as “horizontal” partitioning, also called “sharding” (partitioning the rows of a single table across multiple databases).

Vertical Partitioning

In vertical partitioning, different mapped classes are retrieved from different database servers. In the following example, we create product_table in one in-memory sqlite database and product_summary_table in another:

engine1 = create_engine('sqlite://')
engine2 = create_engine('sqlite://')

metadata = MetaData()

product_table = Table(
    'product', metadata,
    Column('sku', String(20), primary_key=True),
    Column('msrp', Numeric))

product_summary_table = Table(
    'product_summary', metadata,
    Column('sku', None, ForeignKey('product.sku'), primary_key=True),
    Column('name', Unicode(255)),
    Column('description', Unicode))

product_table.create(bind=engine1)
product_summary_table.create(bind=engine2)

stmt = product_table.insert()
engine1.execute(
    stmt,
    [dict(sku="123", msrp=12.34),
     dict(sku="456", msrp=22.12),
     dict(sku="789", msrp=41.44)])

stmt = product_summary_table.insert()
engine2.execute(
    stmt,
    [dict(sku="123", name="Shoes", description="Some Shoes"),
     dict(sku="456", name="Pants", description="Some Pants"),
     dict(sku="789", name="Shirts", description="Some Shirts")])

Now, we can create and map the Product and ProductSummary classes:

class Product(object):
    def __init__(self, sku, msrp, summary=None):
        self.sku = sku
        self.msrp = msrp
        self.summary = summary
    def __repr__(self):
        return '<Product %s>' % self.sku

class ProductSummary(object):
    def __init__(self, name, description):
        self.name = name
        self.description = description
    def __repr__(self):
        return '<ProductSummary %s>' % self.name

clear_mappers()
mapper(ProductSummary, product_summary_table, properties=dict(
    product=relation(Product,
        backref=backref('summary', uselist=False))))
mapper(Product, product_table)

Finally, we configure the session to load the Product class from engine1 and ProductSummary from engine2:

>>> Session = sessionmaker(binds={Product:engine1,
...                               ProductSummary:engine2})
>>> session = Session()
>>> engine1.echo = engine2.echo = True
>>> session.query(Product).all()
2007-11-17 14:32:20,890 INFO sqlalchemy.engine.base.Engine.0x..90 BEGIN
2007-11-17 14:32:20,895 INFO sqlalchemy.engine.base.Engine.0x..90 SELECT product.sku AS product_sku, product.msrp AS product_msrp
FROM product ORDER BY product.oid
2007-11-17 14:32:20,895 INFO sqlalchemy.engine.base.Engine.0x..90 []
[<Product 123>, <Product 456>, <Product 789>]
>>> session.query(ProductSummary).all()
2007-11-17 14:32:20,900 INFO sqlalchemy.engine.base.Engine.0x..10 BEGIN
2007-11-17 14:32:20,901 INFO sqlalchemy.engine.base.Engine.0x..10 SELECT product_summary.sku AS product_summary_sku, product_summary.name AS product_summary_name, product_summary.description AS product_summary_description
FROM product_summary ORDER BY product_summary.oid
2007-11-17 14:32:20,902 INFO sqlalchemy.engine.base.Engine.0x..10 []
[<ProductSummary Shoes>, <ProductSummary Pants>, <ProductSummary Shirts>]

Note that the appropriate engine is invoked depending on which class is being queried, completely transparently to the user.
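Note also that the binds dictionary is not limited to mapped classes as keys; as far as the session's binding machinery is concerned, mappers and Table objects can generally serve as keys as well, which is handy when the classes are not yet imported at configuration time. An equivalent configuration keyed by table might look like this sketch:

# equivalent configuration, keyed by table rather than by class
Session = sessionmaker(binds={product_table: engine1,
                              product_summary_table: engine2})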

Horizontal Partitioning

In horizontal partitioning, or “sharding,” the database schema (or part of it) is replicated across multiple databases (“shards”). This essentially means that some rows of a mapped table will be loaded from one database and some from another. To use sharding, you must provide functions that identify which database to access in various situations. These functions are supplied when the session is constructed; first, sessionmaker() is called with a class_ parameter specifying that we will be creating a ShardedSession:

Session = sessionmaker(class_=ShardedSession)

The first function that must be provided is shard_chooser(mapper, instance, clause=None). This function is responsible for returning a “shard ID” that should contain the row for the given mapper and instance. The ID may be based on the instance’s properties, or it may simply be the result of a round-robin selection scheme. If it is not based on attributes of the instance, shard_chooser() should modify the instance in some way to mark it as participating in the returned shard.

The next function that must be provided is id_chooser(query, ident). This function, when presented with a query and a tuple of identity values (the primary key of the mapped class), should return a list of shard IDs where the objects sought by the query might reside. In a round-robin implementation, all of the shard IDs might be returned. In other implementations, the shard ID might be inferred from the ident parameter.

The final function that must be provided when using sharding is the query_chooser(query) function, which should return a list of shard IDs where results for a given query might be found. Note that both id_chooser() and query_chooser() may simply return a list of all the shard IDs, in which case each shard will be searched for the results of the query.

In the following example, we will create a sharded implementation of the product database where products are stored according to the first digit of their SKU. If the first digit is even, the products are stored in engine1; otherwise they are stored in engine2. All other types of objects will be stored in engine2:

engine1 = create_engine('sqlite://')
engine2 = create_engine('sqlite://')

metadata = MetaData()

product_table = Table(
    'product', metadata,
    Column('sku', String(20), primary_key=True),
    Column('msrp', Numeric))

metadata.create_all(bind=engine1)
metadata.create_all(bind=engine2)

class Product(object):
    def __init__(self, sku, msrp):
        self.sku = sku
        self.msrp = msrp
    def __repr__(self):
        return '<Product %s>' % self.sku

clear_mappers()
product_mapper = mapper(Product, product_table)

def shard_chooser(mapper, instance, clause=None):
    if mapper is not product_mapper:
        return 'odd'
    if (instance.sku
            and instance.sku[0].isdigit()
            and int(instance.sku[0]) % 2 == 0):
        return 'even'
    else:
        return 'odd'

def id_chooser(query, ident):
    if query.mapper is not product_mapper:
        return ['odd']
    # key on the first digit of the SKU, matching shard_chooser
    if (ident
            and ident[0][0].isdigit()
            and int(ident[0][0]) % 2 == 0):
        return ['even']
    return ['odd']

def query_chooser(query):
    return ['even', 'odd']

Session = sessionmaker(class_=ShardedSession)
session = Session(
    shard_chooser=shard_chooser,
    id_chooser=id_chooser,
    query_chooser=query_chooser,
    shards=dict(even=engine1,
                odd=engine2))

Now we can create some products, save them to the database, verify their partitioning at the SQL layer, and observe that the session’s Query object correctly merges results from both databases:

>>> products = [Product('%d%d%d' % (i,i,i), 0.0)
...             for i in range(10)]
>>> for p in products:
...     session.save(p)
...
>>> session.flush()
>>>
>>> for row in engine1.execute(product_table.select()):
...     print row
...
(u'000', Decimal("0"))
(u'222', Decimal("0"))
(u'444', Decimal("0"))
(u'666', Decimal("0"))
(u'888', Decimal("0"))
>>> for row in engine2.execute(product_table.select()):
...     print row
...
(u'111', Decimal("0"))
(u'333', Decimal("0"))
(u'555', Decimal("0"))
(u'777', Decimal("0"))
(u'999', Decimal("0"))


>>> session.query(Product).all()
[<Product 000>, <Product 222>, <Product 444>, <Product 666>, <Product 888>,
 <Product 111>, <Product 333>, <Product 555>, <Product 777>, <Product 999>]
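
Finally, because id_chooser() can narrow a primary-key lookup to a single shard, a Query.get() by SKU should consult only the matching engine; for example, we would expect the following lookup to touch only engine1:

>>> session.query(Product).get('222')
<Product 222>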