Inheritance Mapping - Essential SQLAlchemy, 2nd Edition (2010)

Essential SQLAlchemy, 2nd Edition (2010)

Chapter 8. Inheritance Mapping

In this chapter, you will learn the different methods of mapping object-oriented inheritance to relational database tables. You will learn how to use different methods of inheritance mapping with SQLAlchemy, as well as how to use inheritance in the presence of mapped relations between classes.

Overview of Inheritance Mapping

No object-relational mapper would be complete without some method of mapping object-oriented inheritance hierarchies to SQL tables, and so SQLAlchemy provides rich support for modeling inheritance. Inheritance is typically modeled in SQL in one of three ways: single table inheritance, concrete table inheritance, or joined table inheritance.

For the purposes of illustrating SQLAlchemy’s support for the various types of inheritance modeling, we will use a simple inheritance hierarchy that models products, including clothing products and accessories. This hierarchy is illustrated in Figure 8-1 and is implemented by the following Python code:

class Product(object):

def __init__(self, sku, msrp):

self.sku = sku

self.msrp = msrp

def __repr__(self):

return '<%s %s>' % (

self.__class__.__name__, self.sku)

class Clothing(Product):

def __init__(self, sku, msrp, clothing_info):

Product.__init__(self, sku, msrp)

self.clothing_info = clothing_info

class Accessory(Product):

def __init__(self, sku, msrp, accessory_info):

Product.__init__(self, sku, msrp)

self.accessory_info = accessory_info

Sample inheritance hierarchy

Figure 8-1. Sample inheritance hierarchy

Single Table Inheritance Mapping

In single table inheritance, a single table is used to represent all the different types in the class hierarchy, as shown in Figure 8-2.

Single table inheritance mapping (unmapped columns masked)

Figure 8-2. Single table inheritance mapping (unmapped columns masked)

In our preceding example, this table might be defined as follows:

product_table = Table(

'product', metadata,

Column('sku', String(20), primary_key=True),

Column('msrp', Numeric),

Column('clothing_info', String),

Column('accessory_info', String),

Column('product_type', String(1), nullable=False))

Notice that we have constructed a table that contains columns for all of the attributes across the entire hierarchy we wish to model. This means that we incur some overhead for all of the classes in the hierarchy in each row. Although this doesn’t cause too many problems with the simple hierarchy we are using in this example, the space overhead can become significant with larger and more attribute-rich hierarchies.

Also note that we have introduced a new column, the 'product_type' column. This column holds the “polymorphic identity” of each row, so named because it allows SQLAlchemy to return the appropriate class from a query on the parent object. The polymorphic identity is used by SQLAlchemy to determine what type of object is contained in the row. SQLAlchemy supports using any data type desired to hold this information; here we use a single character. 'P' will represent a Product (the parent class), 'C' will represent a Clothing product, and 'A' will represent an Accessory product.

To map this table onto our inheritance hierarchy, we will use some new keyword arguments to the mapper() function, namely polymorphic_on, inherits, and polymorphic_identity:

mapper(

Product, product_table,

polymorphic_on=product_table.c.product_type,

polymorphic_identity='P')

mapper(Clothing, inherits=Product,

polymorphic_identity='C')

mapper(Accessory, inherits=Product,

polymorphic_identity='A')

The polymorphic_on parameter identifies which column contains the polymorphic identity of each row. The polymorphic_identity parameter identifies the value that should be present in that column to tell SQLAlchemy to use this particular mapper, and the inherits parameter tells SQLAlchemy to retrieve all other parameters and properties from the named mapper.

Once we have defined the mappers, we can insert some data and perform some queries:

>>> # Create some products

... products = [

... # Some parent class products

... Product('123', 11.22),

... Product('456', 33.44),

... # Some clothing

... Clothing('789', 123.45, "Nice Pants"),

... Clothing('111', 125.45, "Nicer Pants"),

... # Some accessories

... Accessory('222', 24.99, "Wallet"),

... Accessory('333', 14.99, "Belt") ]

>>>

>>> Session = sessionmaker()

>>> session = Session()

>>> for p in products: session.save(p)

...

>>> session.flush()

>>> session.clear()

>>>

>>> metadata.bind.echo = True

>>>

>>> print session.query(Product).all()

2007-11-19 14:35:55,244 INFO sqlalchemy.engine.base.Engine.0x..90

... SELECT product.sku AS product_sku, product.msrp AS product_msrp,

... product.clothing_info AS product_clothing_info,

... product.accessory_info AS product_accessory_info,

... product.product_type AS product_product_type

FROM product ORDER BY product.oid

2007-11-19 14:35:55,245 INFO sqlalchemy.engine.base.Engine.0x..90 []

[<Product 123>, <Product 456>, <Clothing 789>, <Clothing 111>,

... <Accessory 222>, <Accessory 333>]

>>> print session.query(Clothing).all()

2007-11-19 14:35:55,259 INFO sqlalchemy.engine.base.Engine.0x..90

... SELECT product.sku AS product_sku, product.msrp AS product_msrp,

... product.clothing_info AS product_clothing_info,

... product.accessory_info AS product_accessory_info,

... product.product_type AS product_product_type

FROM product

WHERE product.product_type IN (?) ORDER BY product.oid

2007-11-19 14:35:55,259 INFO sqlalchemy.engine.base.Engine.0x..90

... ['C']

[<Clothing 789>, <Clothing 111>]

>>> print session.query(Accessory).all()

2007-11-19 14:35:55,274 INFO sqlalchemy.engine.base.Engine.0x..90

... SELECT product.sku AS product_sku, product.msrp AS product_msrp,

... product.clothing_info AS product_clothing_info,

... product.accessory_info AS product_accessory_info,

... product.product_type AS product_product_type

FROM product

WHERE product.product_type IN (?) ORDER BY product.oid

2007-11-19 14:35:55,274 INFO sqlalchemy.engine.base.Engine.0x..90

... ['A']

[<Accessory 222>, <Accessory 333>]

Note in particular that SQLAlchemy generated appropriate queries (through filtering based on product_type) based on whether we were selecting from a parent class or a child class.

Also note how SQLAlchemy was able to create appropriate objects based on the polymorphic identity column (which SQLAlchemy generated itself when flushing the instances). If we inspect the table at the SQL level, we will see the 'type_' column populated just as we expect:

>>> metadata.bind.echo = False

>>> for row in product_table.select().execute():

... print row

...

(u'123', Decimal("11.22"), None, None, u'P')

(u'456', Decimal("33.44"), None, None, u'P')

(u'789', Decimal("123.45"), u'Nice Pants', None, u'C')

(u'111', Decimal("125.45"), u'Nicer Pants', None, u'C')

(u'222', Decimal("24.99"), None, u'Wallet', u'A')

(u'333', Decimal("14.99"), None, u'Belt', u'A')

Aside from the space overhead, there is one problem in using single table inheritance mapping: the mapper will try to map all the columns of the single table unless you manually specify columns to map at each level of the inheritance hierarchy via the include_columns or exclude_columnsarguments to the mapper. For instance, if we try to get the clothing_info for a nonclothing product, SQLAlchemy will not complain:

>>> print session.query(Accessory)[0].clothing_info

None

This problem is alleviated in the concrete table and joined table inheritance mappings, which each use a different table for each class in the hierarchy.

Concrete Table Inheritance Mapping

In concrete table inheritance mapping, we use a separate table for each class in the inheritance hierarchy, with each table containing all the columns necessary to represent the object in its entirety, as shown in Figure 8-3.

Concrete table inheritance mapping

Figure 8-3. Concrete table inheritance mapping

So, for the product hierarchy in our example, we would define the following tables in this way:

product_table = Table(

'product', metadata,

Column('sku', String(20), primary_key=True),

Column('msrp', Numeric))

clothing_table = Table(

'clothing', metadata,

Column('sku', String(20), primary_key=True),

Column('msrp', Numeric),

Column('clothing_info', String))

accessory_table = Table(

'accessory', metadata,

Column('sku', String(20), primary_key=True),

Column('msrp', Numeric),

Column('accessory_info', String))

Note that in concrete table inheritance, each table contains exactly the amount of data that is required to implement its class; there is no wasted space, unlike single table inheritance. Also note that there is no longer a need for the “polymorphic identity” column, as SQLAlchemy knows thatClothing objects are created from the clothing_table, Accessory objects from the accessory_table, etc.

The mapper configuration is likewise straightforward:

mapper(Product, product_table)

mapper(Clothing, clothing_table)

mapper(Accessory, accessory_table)

In fact, as far as SQLAlchemy is concerned, we aren’t modeling inheritance at all! We’ve just persisted three classes which happen to have an inheritance relationship that is completely ignored by SQLAlchemy. Unfortunately, in doing so, we have lost the ability to query polymorphically. For instance, we may wish to retrieve the Product with sku '222'. Without some extra work, we’d have to query each of the classes in the inheritance hierarchy. Luckily, SQLAlchemy provides support for polymorphic loading if we do a little extra work in the mapper configuration.

The first thing we need to do is get a selectable that yields something like what the single table select yielded. SQLAlchemy provides a utility function polymorphic_union() which provides just such a selectable. To use it, we simply supply a dict object whose keys are the old polymorphic identities and whose values are the tables in the inheritance hierarchy:

>>> punion = polymorphic_union(

... dict(P=product_table,

... C=clothing_table,

... A=accessory_table),

... 'type_')

>>>

>>> print punion

SELECT accessory.sku, CAST(NULL AS TEXT) AS clothing_info,

... accessory.msrp, accessory.accessory_info, 'A' AS type_

FROM accessory UNION ALL SELECT product.sku, CAST(NULL AS TEXT) AS

... clothing_info, product.msrp, CAST(NULL AS TEXT) AS

... accessory_info, 'P' AS type_

FROM product UNION ALL SELECT clothing.sku, clothing.clothing_info,

... clothing.msrp, CAST(NULL AS TEXT) AS accessory_info, 'C' AS

... type_

FROM clothing

>>>

Now, we have a nicely labeled selectable that can be selected from, just as in the single table inheritance. To complete the mapping, we need to let the mappers know about the union and the inheritance relationship:

mapper(

Product, product_table, select_table=punion,

polymorphic_on=punion.c.type_,

polymorphic_identity='P')

mapper(Clothing, clothing_table, inherits=Product,

polymorphic_identity='C',

concrete=True)

mapper(Accessory, accessory_table, inherits=Product,

polymorphic_identity='A',

concrete=True)

Here, we have specified a different table for selects (the polymorphic_union⁠(⁠ ⁠) result) and let SQLAlchemy know to use concrete table inheritance in the child classes. Otherwise, the mapper configuration is identical to the single table inheritance. Now, assuming we have created the objects in the database as we did previously, we can perform polymorphic loads as follows:

>>> session.query(Product).get('222')

2007-11-19 15:13:55,727 INFO sqlalchemy.engine.base.Engine.0x..50

... SELECT p_union.accessory_info AS p_union_accessory_info,

... p_union.type_ AS p_union_type_, p_union.sku AS p_union_sku,

... p_union.clothing_info AS p_union_clothing_info, p_union.msrp AS

... p_union_msrp

FROM (SELECT accessory.sku AS sku, CAST(NULL AS TEXT) AS

... clothing_info, accessory.msrp AS msrp, accessory.accessory_info

... AS accessory_info, 'A' AS type_

FROM accessory UNION ALL SELECT product.sku AS sku, CAST(NULL AS

... TEXT) AS clothing_info, product.msrp AS msrp, CAST(NULL AS TEXT)

... AS accessory_info, 'P' AS type_

FROM product UNION ALL SELECT clothing.sku AS sku,

... clothing.clothing_info AS clothing_info, clothing.msrp AS msrp,

... CAST(NULL AS TEXT) AS accessory_info, 'C' AS type_

FROM clothing) AS p_union

WHERE p_union.sku = ? ORDER BY p_union.oid

2007-11-19 15:13:55,737 INFO sqlalchemy.engine.base.Engine.0x..50

... ['222']

<Accessory 222>

Joined Table Inheritance Mapping

Joined table inheritance is perhaps the closest to directly mapping the inheritance hierarchy to the database. In joined table inheritance mapping, as in concrete table inheritance mapping, a distinct table is used to map each class. Unlike concrete inheritance mapping, however, each table contains only the columns the attributes added, allowing the row in the “parent” table to take care of inherited attributes, as shown in Figure 8-4.

Joined table inheritance mapping

Figure 8-4. Joined table inheritance mapping

The total set of attributes required to represent an instance are then retrieved by joining along the inheritance hierarchy. In our product database, this would have the following declaration:

product_table = Table(

'product', metadata,

Column('sku', String(20), primary_key=True),

Column('msrp', Numeric),

Column('product_type', String(1), nullable=False))

clothing_table = Table(

'clothing', metadata,

Column('sku', None, ForeignKey('product.sku'),

primary_key=True),

Column('clothing_info', String))

accessory_table = Table(

'accessory', metadata,

Column('sku', None, ForeignKey('product.sku'),

primary_key=True),

Column('accessory_info', String))

Notice that we have reintroduced the 'product_type' polymorphic identity column from single table inheritance mapping. In joined table inheritance, this column is only required on the “root” table of the inheritance hierarchy, again to let SQLAlchemy know what type of object to create in a polymorphic load.

The mappers we build are almost identical to the ones we used in the single table inheritance mapping, except that each mapper references a distinct table, whereas all the mappers shared a table in the single-table inheritance case:

mapper(

Product, product_table,

polymorphic_on=product_table.c.product_type,

polymorphic_identity='P')

mapper(Clothing, clothing_table, inherits=Product,

polymorphic_identity='C')

mapper(Accessory, accessory_table, inherits=Product,

polymorphic_identity='A')

We can now perform polymorphic selects just as before:

>>> metadata.bind.echo = True

>>> session.query(Product).all()

2007-11-19 19:51:11,985 INFO sqlalchemy.engine.base.Engine.0x..d0

... SELECT product.sku AS product_sku, product.msrp AS product_msrp,

... product.product_type AS product_product_type

FROM product ORDER BY product.oid

2007-11-19 19:51:11,985 INFO sqlalchemy.engine.base.Engine.0x..d0 []

2007-11-19 19:51:11,989 INFO sqlalchemy.engine.base.Engine.0x..d0

... SELECT accessory.sku AS accessory_sku, accessory.accessory_info

... AS accessory_accessory_info

FROM accessory

WHERE ? = accessory.sku

2007-11-19 19:51:11,990 INFO sqlalchemy.engine.base.Engine.0x..d0

... [u'222']

2007-11-19 19:51:11,991 INFO sqlalchemy.engine.base.Engine.0x..d0

... SELECT accessory.sku AS accessory_sku, accessory.accessory_info

... AS accessory_accessory_info

FROM accessory

WHERE ? = accessory.sku

2007-11-19 19:51:11,991 INFO sqlalchemy.engine.base.Engine.0x..d0

... [u'333']

2007-11-19 19:51:11,993 INFO sqlalchemy.engine.base.Engine.0x..d0

... SELECT clothing.sku AS clothing_sku, clothing.clothing_info AS

... clothing_clothing_info

FROM clothing

WHERE ? = clothing.sku

2007-11-19 19:51:11,993 INFO sqlalchemy.engine.base.Engine.0x..d0

... [u'789']

2007-11-19 19:51:11,994 INFO sqlalchemy.engine.base.Engine.0x..d0

... SELECT clothing.sku AS clothing_sku, clothing.clothing_info AS

... clothing_clothing_info

FROM clothing

WHERE ? = clothing.sku

2007-11-19 19:51:11,995 INFO sqlalchemy.engine.base.Engine.0x..d0

... [u'111']

[<Product 123>, <Product 456>, <Clothing 789>, <Clothing 111>,

... <Accessory 222>, <Accessory 333>]

As you can see, the various types of products are selected from their tables appropriately. Note, however, that the single query() call yielded not one, but five SELECT statements. This is due to the fact that SQLAlchemy must perform an auxiliary query for each row that represents a child object. The next section shows how we can improve performance in this situation.

Optimizing Performance with Joined Table Inheritance Mapping

As shown previously, the default query strategy for joined table inheritance mapping requires one query to the database to retrieve the “parent” row, and one additional query to retrieve each “child” row. Although this is bandwidth-efficient for small fetches (since only the columns that are actually required are returned from the database), the latency of additional queries can incur significant performance overheads, especially when retrieving large result sets.

There are two main strategies for addressing these performance concerns: deferring the child table loads and using a join with the select_table parameter to the mapper⁠(⁠ ⁠) function.

Using deferred loading

If the child attributes will not be accessed, or will not be accessed frequently, then the child table’s select statements can be deferred until a mapped attribute is accessed. In the previous example, for instance, if we were displaying a table with only the sku and msrp columns, we could eliminate the extra selects by using the polymorphic_fetch parameter to the mapper⁠(⁠ ⁠) function:

mapper(

Product, product_table,

polymorphic_on=product_table.c.product_type,

polymorphic_identity='P',

polymorphic_fetch='deferred')

mapper(Clothing, clothing_table, inherits=Product,

polymorphic_identity='C')

mapper(Accessory, accessory_table, inherits=Product,

polymorphic_identity='A')

Now, when we iterate over all the Products, we see that the auxiliary queries have been eliminated:

>>> session.clear()

>>> session.query(Product).all()

2007-11-19 21:25:44,320 INFO sqlalchemy.engine.base.Engine.0x..d0

... SELECT product.sku AS product_sku, product.msrp AS product_msrp,

... product.product_type AS product_product_type

FROM product ORDER BY product.oid

2007-11-19 21:25:44,321 INFO sqlalchemy.engine.base.Engine.0x..d0 []

[<Product 123>, <Product 456>, <Clothing 789>, <Clothing 111>,

... <Accessory 222>, <Accessory 333>]

If we access one of the child attributes, then the secondary select executes to retrieve that value:

>>> prod=session.get(Product, '789')

>>> print prod.clothing_info

2007-11-19 21:27:11,856 INFO sqlalchemy.engine.base.Engine.0x..d0

... SELECT clothing.sku AS clothing_sku, clothing.clothing_info AS

... clothing_clothing_info

FROM clothing

WHERE ? = clothing.sku

2007-11-19 21:27:11,856 INFO sqlalchemy.engine.base.Engine.0x..d0

... [u'789']

Nice Pants

Using select_table

Although using deferred polymorphic fetching alleviates some of the performance problems with joined table inheritance, it still does not help in the case where you need attributes from the child table. In this case, you can simply use the select_table parameter with the mapper(), similar to the way we used it with concrete table inheritance and the polymorphic_union() function. In this case, however, because of the foreign key relationships between parent and child tables, we can simply use an outerjoin():

pjoin = product_table

pjoin = pjoin.outerjoin(clothing_table)

pjoin = pjoin.outerjoin(accessory_table)

mapper(

Product, product_table,

polymorphic_on=product_table.c.product_type,

polymorphic_identity='P',

select_table=pjoin)

mapper(Clothing, clothing_table, inherits=Product,

polymorphic_identity='C')

mapper(Accessory, accessory_table, inherits=Product,

polymorphic_identity='A')

Now, when we iterate over all Products, we have access to all attributes of all child classes in a single query:

>>> session.clear()

>>> for prod in session.query(Product):

... if hasattr(prod, 'clothing_info'):

... print '%s : %s' % (prod, prod.clothing_info)

... elif hasattr(prod, 'accessory_info'):

... print '%s : %s' % (prod, prod.accessory_info)

... else:

... print prod

...

2007-11-19 21:35:11,193 INFO sqlalchemy.engine.base.Engine.0x..d0

... SELECT product.sku AS product_sku, clothing.sku AS clothing_sku,

... accessory.sku AS accessory_sku, product.msrp AS product_msrp,

... product.product_type AS product_product_type,

... clothing.clothing_info AS clothing_clothing_info,

... accessory.accessory_info AS accessory_accessory_info

FROM product LEFT OUTER JOIN clothing ON product.sku = clothing.sku

... LEFT OUTER JOIN accessory ON product.sku = accessory.sku ORDER

... BY product.oid

2007-11-19 21:35:11,194 INFO sqlalchemy.engine.base.Engine.0x..d0 []

<Product 123>

<Product 456>

<Clothing 789> : Nice Pants

<Clothing 111> : Nicer Pants

<Accessory 222> : Wallet

<Accessory 333> : Belt

Relations and Inheritance

In the cases of single table and joined table inheritance mapping, relations “just work” in SQLAlchemy. In particular, it is possible for a mapped class to declare a relation to a class that is part of an inheritance hierarchy (a “polymorphic class”), and have that relation comprise instances of various child classes. This setup is shown in the following listing, where inventory information is added to our schema:

store_table = Table(

'store', metadata,

Column('id', Integer, primary_key=True),

Column('name', String))

inventory_table = Table(

'inventory', metadata,

Column('store_id', None, ForeignKey('store.id')),

Column('product_id', None, ForeignKey('product.sku')),

Column('quantity', Integer, default=0)

class Store(object): pass

class Inventory(object): pass

mapper(Store, store_table, properties=dict(

inventory=relation(Inventory, backref='store')))

mapper(Inventory, inventory_table, properties=dict(

product=relation(Product, backref='inventory')))

It is also possible to declare relations on a polymorphic class at any level of the inheritance hierarchy, and those relations will be inherited by the child classes. In the previous example, for instance, the Clothing and Accessory classes inherit the backref to their Inventory records.

In concrete table inheritance, mapping relations to a “parent class” is more difficult because there is no unique table to join to. For instance, it is possible to implement one-to-many and one-to-one joins where the polymorphic class has a foreign key into another table. As an example, if we introduced a “vendor” table identifying the manufacturer of all products, we could relate it to the Product hierarchy as follows:

vendor_table = Table(

'vendor', metadata,

Column('id', Integer, primary_key=True),

Column('name', String))

product_table = Table(

'product', metadata,

Column('sku', String(20), primary_key=True),

Column('msrp', Numeric),

Column('vendor_id', None, ForeignKey('vendor.id'))

clothing_table = Table(

'clothing', metadata,

Column('sku', String(20), primary_key=True),

Column('msrp', Numeric),

Column('vendor_id', None, ForeignKey('vendor.id'),

Column('clothing_info', String))

accessory_table = Table(

'accessory', metadata,

Column('sku', String(20), primary_key=True),

Column('msrp', Numeric),

Column('vendor_id', None, ForeignKey('vendor.id'),

Column('accessory_info', String))

punion = polymorphic_union(

dict(P=product_table,

C=clothing_table,

A=accessory_table),

'type_')

mapper(

Product, product_table, select_table=punion,

polymorphic_on=punion.c.type_,

polymorphic_identity='P')

mapper(Clothing, clothing_table, inherits=Product,

polymorphic_identity='C',

concrete=True)

mapper(Accessory, accessory_table, inherits=Product,

polymorphic_identity='A',

concrete=True)

class Vendor(object): pass

mapper(Vendor, vendor_table, properties=dict(

products=relation(Product)))

The main limitation with relations and concrete table inheritance mapping is that relations from the polymorphic classes (rather than to them, as shown previously) are not inherited and must therefore be configured individually for each mapper. This includes all many-to-many relations, as the secondary join condition (and probably the secondary table as well) is different depending on which child class is being related to.

Nonpolymorphic Inheritance

All of the inheritance relationships shown so far were implemented using SQLAlchemy’s polymorphic loading. If polymorphic loading is not desired, either because of its overhead or because you always know what types of classes you will be fetching, it is possible to use nonpolymorphic loading by omitting all of the polymorphic_* parameters from the mappers.

Nonpolymorphic loading will always return the type of object being selected in the case of a query (never the child class, as polymorphic loading does). Relations to nonpolymorphic classes also apply only to the actual class being mapped, not to its descendants. Polymorphic loading is much more flexible than nonpolymorphic loading, and therefore should probably be selected unless the performance overhead is prohibitive.