SqlSoup: An Automatic Mapper for SQLAlchemy - Essential SQLAlchemy, 2nd Edition (2010)

Essential SQLAlchemy, 2nd Edition (2010)

Chapter 10. SqlSoup: An Automatic Mapper for SQLAlchemy

This chapter describes SqlSoup, an extension to SQLAlchemy that provides automatic mapping of introspected tables. You will learn how to use SqlSoup to map to an existing database and how to perform queries and updates. Finally, the chapter will describe the pros and cons of using SQLSoup, Elixir, or “bare” SQLAlchemy in your application.

Introduction to SqlSoup

If Elixir is ideally suited for blue sky, legacy-free development, SqlSoup is ideally suited for connecting to legacy databases. In fact, SqlSoup provides no method of defining a database schema through tables, classes, and mappers; it uses extensive autoloading to build the SQLAlchemy constructs (Tables, classes, and mapper⁠(⁠)s) automatically from an existing database.

To illustrate the uses of SQLAlchemy in this chapter, we will use the following SQLAlchemy-created schema. Note that, unlike in previous chapters, we will be saving the test database in an on-disk SQLite database rather than using an in-memory database, to illustrate the fact that SqlSoup relies entirely on auto loading:

from sqlalchemy import *

engine = create_engine('sqlite:///chapter10.db')

metadata = MetaData(engine)

product_table = Table(

'product', metadata,

Column('sku', String(20), primary_key=True),

Column('msrp', Numeric))

store_table = Table(

'store', metadata,

Column('id', Integer, primary_key=True),

Column('name', Unicode(255)))

product_price_table = Table(

'product_price', metadata,

Column('sku', None, ForeignKey('product.sku'), primary_key=True),

Column('store_id', None, ForeignKey('store.id'), primary_key=True),

Column('price', Numeric, default=0))

metadata.create_all()

stmt = product_table.insert()

stmt.execute([dict(sku="123", msrp=12.34),

dict(sku="456", msrp=22.12),

dict(sku="789", msrp=41.44)])

stmt = store_table.insert()

stmt.execute([dict(name="Main Store"),

dict(name="Secondary Store")])

stmt = product_price_table.insert()

stmt.execute([dict(store_id=1, sku="123"),

dict(store_id=1, sku="456"),

dict(store_id=1, sku="789"),

dict(store_id=2, sku="123"),

dict(store_id=2, sku="456"),

dict(store_id=2, sku="789")])

In order to use SqlSoup, we must first create an instance of the SqlSoup class. This instance must be created either with an existing MetaData instance as its first argument, or with the same arguments as SQLAlchemy’s MetaData class. In our case, we will pass in a database URI to use in autoloading the tables:

>>> from sqlalchemy.ext.sqlsoup import SqlSoup

>>> db = SqlSoup('sqlite:///chapter10.db')

If we wish to restrict the set of tables loaded to a particular schema (in databases that support this), we can specify it by setting the as db.schema attribute. Because we’re using SQLite, there is no need to specify a schema.

To access the tables we’ve defined in the database, simply use attribute access from the SqlSoup instance we’ve created:

>>> print db.product.all()

[MappedProduct(sku='123',msrp=Decimal("12.34")),

... MappedProduct(sku='456',msrp=Decimal("22.12")),

... MappedProduct(sku='789',msrp=Decimal("41.44"))]

>>> print db.product.get('123')

MappedProduct(sku='123',msrp=Decimal("12.34"))

Note that there was no mapper or table setup required to retrieve the objects (other than when we first created the database!). The following sections describe in more detail how you can use SqlSoup.

Using SqlSoup for ORM-Style Queries and Updates

You may have noticed in the previous section that when we queried the db.product table, rather than being served with RowProxy objects as in regular SQLAlchemy, we were served with MappedProduct instances. This is because technically we’re not selecting from the product table; we’re selecting from the automatically created and mapped MappedProduct class, created from the product table.

The MappedProduct class provides a basic mapping of the columns of the table to the properties of the class. It also provides a query property, similar to the Elixir query property, which provides access to a session query for the MappedProduct. It also provides insert(), delete(), and update() methods for modifying the underlying data. To create a new product, for instance, we can do the following:

>>> newprod = db.product.insert(sku='111', msrp=22.44)

>>> db.flush()

>>> db.clear()

>>> db.product.all()

[MappedProduct(sku='123',msrp=Decimal("12.34")),

... MappedProduct(sku='456',msrp=Decimal("22.12")),

... MappedProduct(sku='789',msrp=Decimal("41.44")),

... MappedProduct(sku='111',msrp=Decimal("22.44"))]

You may have noticed in the previous example that we accessed the session-like methods flush() and clear() on the SqlSoup instance. SqlSoup strives to provide a rich set of functionality with a limited set of interfaces, namely the SqlSoup instance and automatically mapped classes. As such, the SqlSoup instance provides several session-like functions as well as providing access to the automatically mapped classes:

bind(attribute)

The underlying Engine or Connectable for this SqlSoup instance.

schema(attribute)

Use the specified schema name for auto loading and automatically mapping tables.

clear(self )

Call the underlying contextual session’s clear() method.

delete(self, *args, **kwargs)

Call the underlying contextual session’s delete() method with the specified arguments.

flush(self)

Call the underlying contextual session’s flush() method.

join(self, *args, *kwargs)

Call SQLAlchemy’s join() function with the specified arguments and return an automatically mapped object corresponding to rows of the generated join.

map(self, selectable, *kwargs)

Automatically map an arbitrary selectable, returning the generated mapped class.

with_labels(self, item)

Add labels to the columns of item (generally a join) based on the name of their table of origin. This is useful when mapping joins between two tables with the same column names.

You may have also noticed that the MappedProduct class provides some query-like methods. In fact, the MappedProduct class (and other automatically mapped classes) uses some __getattr__() magic to forward all unrecognized attribute and method access to its query attribute. Automatically mapped classes also provide some data manipulation functions for use in updating the underlying table:

c (attribute)

The c attribute of the underlying table.

query (attribute)

An ORM-based query object on the automatically mapped class.

_table (attribute)

The underlying selectable to this automatically mapped object. Useful when dropping to the SQL layer in SqlSoup queries.

column_name (attribute)

The SQLAlchemy property object of the automatically mapped column.

delete(cls, *args, **kwargs) (classmethod)

Execute a delete() on the underlying table with the given arguments.

insert(cls, **kwargs) (classmethod)

Execute an insert() on the underlying table with the given arguments, and return a newly constructed instance of the automatically mapped class.

update(cls, whereclause=None, values=None, **kwargs) (classmethod)

Execute an update() on the underlying table with the given arguments.

SqlSoup and Relations

The short story on SqlSoup and SQLAlchemy relation()s is that they are not supported. Although SqlSoup can make reasonable assumptions about how to automatically map columns to classes, inferring the correct relations, relation names, and relation options is currently beyond its capabilities. SqlSoup does, however, fully support manually creating joins between tables and mapping the resulting selectable object. This feature is covered next in Joins with SqlSoup.”

Joins with SqlSoup

The SqlSoup object provides a join() method, described briefly in the list under the section Using SqlSoup for ORM-Style Queries and Updates,” earlier in this chapter. This method is actually just a thin wrapper on the SQLAlchemy join() function that creates an automatically mapped class for the resulting selectable. To join between the product and product_price tables, for example, we could use the following code, taking care to use the isouter=True to ensure we get a LEFT OUTER JOIN:

>>> join1 = db.join(db.product, db.product_price, isouter=True)

>>> join1.all()

[MappedJoin(sku='123',msrp=Decimal("12.34"),store_id=1,

... price=Decimal("0")),

... MappedJoin(sku='123',msrp=Decimal("12.34"),store_id=2,

... price=Decimal("0")),

... MappedJoin(sku='456',msrp=Decimal("22.12"),store_id=1,

... price=Decimal("0")),

... MappedJoin(sku='456',msrp=Decimal("22.12"),store_id=2,

... price=Decimal("0")),

... MappedJoin(sku='789',msrp=Decimal("41.44"),store_id=1,

... price=Decimal("0")),

... MappedJoin(sku='789',msrp=Decimal("41.44"),store_id=2,

... price=Decimal("0")),

... MappedJoin(sku='111',msrp=Decimal("22.44"),store_id=None,price=None)]

In order to chain the join object to other tables, just use the join() method again:

>>> join2 = db.join(join1, db.store, isouter=True)

>>> join2.all()

[MappedJoin(sku='123',msrp=Decimal("12.34"),store_id=1,

... price=Decimal("0"),id=1,name='Main Store'),

... MappedJoin(sku='123',msrp=Decimal("12.34"),store_id=2,

... price=Decimal("0"),id=2,name='Secondary Store'),

... MappedJoin(sku='456',msrp=Decimal("22.12"),store_id=1,

... price=Decimal("0"),id=1,name='Main Store'),

... MappedJoin(sku='456',msrp=Decimal("22.12"),store_id=2,

... price=Decimal("0"),id=2,name='Secondary Store'),

... MappedJoin(sku='789',msrp=Decimal("41.44"),store_id=1,

... price=Decimal("0"),id=1,name='Main Store'),

... MappedJoin(sku='789',msrp=Decimal("41.44"),store_id=2,

... price=Decimal("0"),id=2,name='Secondary Store'),

... MappedJoin(sku='111',msrp=Decimal("22.44"),store_id=None,price=None,

... id=None,name=None)]

In some cases, it’s nice to label the columns according to their table of origin. To accomplish this, use the with_labels() SqlSoup method:

>>> join3 = db.with_labels(join1)

>>> join3.first()

MappedJoin(product_sku='123',product_msrp=Decimal("12.34"),

... product_price_sku='123',product_price_store_id=1,

... product_price_price=Decimal("0"))

>>> db.with_labels(join2).first()

MappedJoin(product_sku='123',product_msrp=Decimal("12.34"),

... product_price_sku='123',product_price_store_id=1,

... product_price_price=Decimal("0"),store_id=1,store_name='Main

... Store')

It is also possible to label a mapped table and then use the labeled table in joins:

>>> labelled_product = db.with_labels(db.product)

>>> join4 = db.join(labelled_product, db.product_price,

... isouter=True)

>>> join4.first()

MappedJoin(product_sku='123',product_msrp=Decimal("12.34"),sku='123',

... store_id=1,price=Decimal("0"))

Note that the columns from db.product are labeled, whereas the columns from db.product_price are not.

Mapping Arbitrary Selectables

Simple tables and joins are supported in SqlSoup, but what about mapping more complex selectables? The automatically mapping machinery of SqlSoup is actually exposed via the SqlSoup map() method. For instance, if we wished to add a column for the average price of a product over all the stores in which it is sold, we might write the following SQL-layer SQLAlchemy query:

>>> db.clear()

>>>

>>> from sqlalchemy import *

>>>

>>> join5 = db.join(db.product, db.product_price)

>>>

>>> s = select([db.product._table,

... func.avg(join5.c.price).label('avg_price')],

... from_obj=[join5._table],

... group_by=[join5.c.sku])

>>> s = s.alias('products_with_avg_price')

>>> products_with_avg_price = db.map(s, primary_key=[join5.c.sku])

>>> products_with_avg_price.all()

[MappedJoin(sku='123',msrp=Decimal("12.34"),avg_price=0.0),

... MappedJoin(sku='456',msrp=Decimal("22.12"),avg_price=0.0),

... MappedJoin(sku='789',msrp=Decimal("41.44"),avg_price=0.0)]

>>>

>>> db.product_price.first().price = 50.00

>>> db.flush()

>>> db.clear()

>>> products_with_avg_price.all()

[MappedJoin(sku='123',msrp=Decimal("12.34"),avg_price=25.0),

... MappedJoin(sku='456',msrp=Decimal("22.12"),avg_price=0.0),

... MappedJoin(sku='789',msrp=Decimal("41.44"),avg_price=0.0)]

A common usage pattern is to add such mapped selectables to the SqlSoup instance for access in other parts of the application:

>>> db.products_with_avg_price = products_with_avg_price

There’s no magic here; this is just Python’s ability to declare new, ad-hoc attributes on existing objects. Do note that if you happen to add an attribute with the same name as a table in your database, SqlSoup will not be able to access that table until you remove the new attribute.

Directly Accessing the Session

Although SqlSoup provides access to most session-oriented functionality through the SqlSoup object, it is possible to access the underlying SQLAlchemy contextual session through the global SqlSoup object objectstore:

>>> from sqlalchemy.ext.sqlsoup import objectstore

>>> session = objectstore.current

>>> print session

<sqlalchemy.orm.session.Session object at 0x2ae69954f210>

Using SqlSoup for SQL-Level Inserts, Updates, and Deletes

As mentioned in the list of automatically mapped class attributes and methods, mapped classes contain insert(), update(), and delete() methods. These are just thin wrappers around the corresponding methods on the underlying table. If we wanted to set the price for all products in all stores to their MSRP, for instance, we could do the following:

>>> msrp=select([db.product.c.msrp],

... db.product.sku==db.product_price.sku)

>>> db.product_price.update(

... values=dict(price=msrp))

>>> db.product_price.all()

[MappedProduct_price(sku='123',store_id=1,price=Decimal("12.34")),

... MappedProduct_price(sku='456',store_id=1,price=Decimal("22.12")),

... MappedProduct_price(sku='789',store_id=1,price=Decimal("41.44")),

... MappedProduct_price(sku='123',store_id=2,price=Decimal("12.34")),

... MappedProduct_price(sku='456',store_id=2,price=Decimal("22.12")),

... MappedProduct_price(sku='789',store_id=2,price=Decimal("41.44"))]

We can similarly use the insert() and delete() method to perform SQL-level inserts and deletes.

When to Use SqlSoup Versus Elixir Versus “Bare” SQLAlchemy

As we’ve discussed before, SqlSoup is useful when it’s necessary to use an existing database, whereas Elixir is most useful when Elixir is the primary definition of the schema. This section compares SqlSoup and Elixir with “Bare” SQLAlchemy and gives the advantages and disadvantages of each.

SqlSoup Pros and Cons

Generally speaking, SqlSoup has the following pros and cons in comparison to “base” SQLAlchemy:

Succinct usage

SqlSoup requires very little code to get started: just a database URI, and you’re ready to go. Raw SQLAlchemy is much more verbose, requiring setup for tables, mappers, and mapped classes. Even if you’re using autoloading with SQLAlchemy, it still requires you to set up your mappers and mapped classes if you wish to use the ORM.

Ad-hoc queries and mappers

Due to the ease of setting up SqlSoup, it is much more convenient to create queries and mappings from joins and other selectable objects.

Rich relation support

Because SqlSoup’s method of inference does not support SQLAlchemy relation()s, it is not well-suited to schemas where it is more convenient to use mapped properties to implement relations between tables.

Adding behavior to mapped objects

Because SqlSoup creates its own automatically mapped classes, it is inconvenient to have domain logic from your application attached as methods. In this regard, SqlSoup mapped objects are little more than “smart rows” allowing convenient access to the database, with little ability to model domain objects.

Flexibility

The convenience of SqlSoup generally comes at the cost of flexibility. Various SQLAlchemy ORM-level features, such as synonyms and relations, are either unsupported or not well supported.

Elixir Pros and Cons

Generally speaking, Elixir has the following pros and cons in comparison to “base” SQLAlchemy:

Succinct usage

Although not as terse as SqlSoup, Elixir generally requires less code than raw SQLAlchemy to implement similar functionality. This is particularly true when using the associable and versioned extensions, for instance.

Rapid model development

Because Elixir generally sees itself as the keeper of the schema, it can be more aggressive in what types of schema it supports. When using the associable extension, for instance, it is possible to create auxiliary tables with a single DSL line of code. This allows complex schema to be developed rapidly when your application is first being written.

Clear separation of concerns

Due to the data mapper pattern used in SQLAlchemy (rather than the active record pattern used in Elixir), it is clear where the database schema resides (in theTable⁠(⁠) classes), where the application logic resides (in the mapped classes), and where the mapping occurs (in the mapper()configuration). Elixir puts all this information into the Entity classes, making it a bit more difficult to separate these concerns.

Ability to use or migrate existing schemas

Elixir’s aggressiveness in defining new tables and columns implicitly based on DSL statements in the Entity classes can make it challenging to use with an existing database. In such a situation, it’s important to be aware of what schema changes are implied by each change to theEntity classes and/or to have access to a schema migration tool that can assist in migrating existing databases.

Flexibility

Raw SQLAlchemy’s win over Elixir is much more limited than its win over SqlSoup, mainly because Elixir provides convenient ways to “drop into” the underlying SQLAlchemy tables, mappers, and classes. SQLAlchemy still wins on flexibility over Elixir, however, as it is, in fact, necessary to drop into regular SQLAlchemy to model some things when using Elixir.