Running Queries and Updates - Essential SQLAlchemy, 2nd Edition (2010)

Essential SQLAlchemy, 2nd Edition (2010)

Chapter 5. Running Queries and Updates

SQLAlchemy provides a rich Pythonic interface for constructing SQL updates and queries, known as the SQL Expression Language. This language is based around the concept of an SQL statement, which represents some database-independent SQL syntax that may have one or more bind variables, and that can be executed on an SQL Engine or other Connectable. This chapter introduces the various kinds of data manipulation supported by SQLAlchemy (SQL INSERT, UPDATE, and DELETE) and performed on the query interface (SQL SELECT).

Inserts, Updates, and Deletes

Insert, Update, and Delete constructs are created in SQLAlchemy via the Table methods insert, update, and delete, or via the insert, update, and delete functions. The functionality of these data manipulation language (DML) constructs is equivalent, regardless of whether they are constructed via methods or functions; the distinction is a question of style more than substance.

Although each DML construct has its own particulars regarding construction, they all end up generating a Statement. We can inspect the SQL text corresponding to the statement by printing it out:

>>> metadata=MetaData()

>>>

>>> simple_table = Table(

... 'simple', metadata,

... Column('id', Integer, primary_key=True),

... Column('col1', Unicode(20)))

>>>

>>> stmt = simple_table.insert()

>>> print stmt

INSERT INTO simple (id, col1) VALUES (:id, :col1)

Note in the previous example that SQLAlchemy has created bind parameters for each of the columns in the table we created in the insert statement. We can examine the bind parameters in a statement by compiling the statement and looking at its params attribute:

>>> compiled_stmt = stmt.compile()

>>> print compiled_stmt.params

ClauseParameters:{'id': None, 'col1': None}

To execute the statement, we can directly execute it on an Engine, or we can bind the MetaData used to construct the statement and use the MetaData’s engine:

>>> engine = create_engine('sqlite://')

>>> simple_table.create(bind=engine)

>>> engine.execute(stmt, col1="Foo")

<sqlalchemy.engine.base.ResultProxy object at 0x2b3210b00f10>

>>> metadata.bind = engine

>>> stmt.execute(col1="Bar")

<sqlalchemy.engine.base.ResultProxy object at 0x2b3210b020d0>

Note that the bind parameter values are supplied to the execute() method as keyword parameters. These parameters can either be supplied either directly to the execute() method or in the statement construction phase:

>>> stmt = simple_table.insert(values=dict(col1="Initial value"))

>>> print stmt

INSERT INTO simple (col1) VALUES (?)

>>> compiled_stmt = stmt.compile()

>>> print compiled_stmt.params

ClauseParameters:{'col1': 'Initial value'}

If parameters are supplied in the statement construction and the execute() call, the parameters supplied with the execute() call override those supplied when creating the statement.

Insert Statements

The Insert construct is perhaps the simplest. In order to create an Insert statement, you can use the Table.insert() method or the insert() function. (The method is actually just a wrapper for the function.) The insert takes two arguments: the table into which a row is being inserted, and an optional dictionary of values to be inserted. Each key in the dictionary represents a column and may be either the metadata Column object or its string identifier. The values provided can be one of the following:

§ A literal Python value to be inserted.

§ An SQL expression to be inserted, such as func.current_timestamp⁠(⁠ ⁠), which will create the SQL INSERT INTO simple2 (col1, col2) VALUES (?, current_timestamp).

§ A Select statement (covered later in this chapter). In this case, the value to be inserted is provided by a subquery.

If we wish to insert multiple rows into the table, we can create an insert statement and execute it multiple times with different bind parameters:

>>> stmt = simple_table.insert()

>>> stmt.execute(col1="First value")

<sqlalchemy.engine.base.ResultProxy object at 0xd0a490>

>>> stmt.execute(col1="Second value")

<sqlalchemy.engine.base.ResultProxy object at 0xd0a050>

>>> stmt.execute(col1="Third value")

<sqlalchemy.engine.base.ResultProxy object at 0xd0a3d0>

It is also possible to use the DB-API’s executemany() to insert multiple rows in one database call. To do this, simply provide an list (or other iterable) of binding dictionaries to the execute() method on the statement or engine:

>>> stmt.execute([dict(col1="Fourth Value"),

... dict(col1="Fifth Value"),

... dict(col1="Sixth Value")])

<sqlalchemy.engine.base.ResultProxy object at 0xd0a310>

Update Statements

Update statements are similar to inserts, except that they can specify a “where” clause that indicates which rows to update. Like insert statements, update statements can be created by either the update() function or the update() method on the table being updated. The only parameters to the update() function are the table being updated (omitted if using the update() method), the where clause, and the values to be set.

The where clause of the update() query can be a SQL clause object (covered later in this chapter) or a text string specifying the update condition. In order to update every row of a table, you can simply leave off the where clause. To update this simple table, we can execute the following statement:

>>> stmt = simple_table.update(

... whereclause=text("col1='First value'"),

... values=dict(col1='1st Value'))1

>>> stmt.execute()

<sqlalchemy.engine.base.ResultProxy object at 0xc77910>

>>> stmt = simple_table.update(text("col1='Second value'"))

>>> stmt.execute(col1='2nd Value') 2

...

<sqlalchemy.engine.base.ResultProxy object at 0xc77950>

>>> stmt = simple_table.update(text("col1='Third value'"))

>>> print stmt

UPDATE simple SET id=?, col1=? WHERE col1='Third value'

... 3

>>> engine.echo = True

>>> stmt.execute(col1='3rd value')

2007-09-25 08:57:11,253 INFO sqlalchemy.engine.base.Engine.0x..d0

... UPDATE simple SET col1=? WHERE col1='Third value'

2007-09-25 08:57:11,254 INFO sqlalchemy.engine.base.Engine.0x..d0

... ['3rd value']

2007-09-25 08:57:11,255 INFO sqlalchemy.engine.base.Engine.0x..d0

... COMMIT

<sqlalchemy.engine.base.ResultProxy object at 0xc77990>

1

Here, we create an UPDATE statement, complete with both values to update and a where clause.

2

Here, the where clause is bound when the statement is created, but the actual values to be updated are passed to the execute() method.

3

Note that prior to execution, the SQL has a bind parameter for the id column, but when the statement is executed, id is omitted because no value was provided for it.

Correlated update statements can also be generated using the SQL expression language. A correlated update is an update whose values are provided by a select statement. Suppose that we have a product catalog with the schema in the following listing, and the data in Tables 5-1 through 5-3:

product_table = Table(

'product', metadata,

Column('sku', String(20), primary_key=True),

Column('msrp', Numeric))

store_table = Table(

'store', metadata,

Column('id', Integer, primary_key=True),

Column('name', Unicode(255)))

product_price_table = Table(

'product_price', metadata,

Column('sku', None, ForeignKey('product.sku'), primary_key=True),

Column('store_id', None, ForeignKey('store.id'), primary_key=True),

Column('price', Numeric, default=0))

Table 5-1. Contents of product table

sku

msrp

"123"

12.34

"456"

22.12

"789"

41.44

Table 5-2. Contents of store table

id

name

1

"Main Store"

2

"Secondary Store"

Table 5-3. Contents of product_price table (initial)

sku

store_id

price

"123"

1

0

"456"

1

0

"789"

1

0

"123"

2

0

"456"

2

0

"789"

2

0

If we wish to globally set the price for all products in all stores to their MSRP price, we could execute the following update:

>>> msrp = select(

... [product_table.c.msrp],

... product_table.c.sku==product_price_table.c.sku,

... limit=1)

>>> stmt = product_price_table.update(

... values=dict(price=msrp))

>>> stmt.execute()

2007-09-26 10:05:17,184 INFO sqlalchemy.engine.base.Engine.0x..d0

... UPDATE product_price SET price=(SELECT product.msrp

FROM product

WHERE product.sku = product_price.sku

LIMIT 1 OFFSET 0)

2007-09-26 10:05:17,184 INFO sqlalchemy.engine.base.Engine.0x..d0

... []

2007-09-26 10:05:17,185 INFO sqlalchemy.engine.base.Engine.0x..d0

... COMMIT

<sqlalchemy.engine.base.ResultProxy object at 0xd0e510>

This would cause the updated product_price_table to contain the values in Table 5-4.

Table 5-4. Contents of product_price_table (after update)

sku

store_id

price

"123"

1

12.34

"456"

1

22.12

"789"

1

41.44

"123"

2

12.34

"456"

2

22.12

"789"

2

41.44

Delete Statements

The Delete construct is used to delete data from the database. To create a Delete construct, you can use either the delete() function or the delete() method on the table from which you are deleting data. Unlike insert() and update(), delete() takes no values parameter, only an optional where clause (omitting the where clause will delete all rows from the table). To delete all rows from the product_price table for sku 123, in the previous section, for instance, we would execute the code as shown here:

>>> stmt = product_price_table.delete(

... text("sku='123'"))

>>> stmt.execute()

2007-09-27 19:22:51,612 INFO sqlalchemy.engine.base.Engine.0x..d0

... DELETE FROM product_price WHERE sku='123'

2007-09-27 19:22:51,612 INFO sqlalchemy.engine.base.Engine.0x..d0

... []

2007-09-27 19:22:51,613 INFO sqlalchemy.engine.base.Engine.0x..d0

... COMMIT

<sqlalchemy.engine.base.ResultProxy object at 0xd92510>

Queries

The real power of the SQL expression language is in its query interface. This includes the actual queries (SQL “SELECT” statements) as well as the syntax for specifying “WHERE” clauses (which may be used in UPDATEs and DELETEs, as well).

The goal of the SQL expression language, like the goal of SQLAlchemy in general, is to provide functionality that doesn’t “get in your way” when you need to be more specific about the SQL you need. In that vein, you can always use the Text construct (used previously in the UPDATE and DELETE examples) to specify the exact SQL text you would like to use. For most operations, however, the SQL expression language makes for a succinct, secure, and less error-prone way of expressing your queries.

Basic Query Construction

SQLAlchemy makes simple SQL queries easy to express, while also enabling the construction of quite complex queries in a straightforward manner. This section describes the basic building blocks of query construction in SQLAlchemy.

The select⁠(⁠ ⁠) function versus the select⁠(⁠ ⁠) method

Like the DML statements INSERT, UPDATE, and DELETE, SELECT statements can be generated using either a function or a Table method. Unlike the DML statements, however, there is a minor difference in functionality between the select() function and the Table.select()method. The select() function requires you to specify which columns you want in your result set. So, to select one column from the product_table shown previously, you could use the select() function:

>>> stmt = select([product_table.c.sku])

>>> for row in stmt.execute():

... print row

...

(u'123',)

(u'456',)

(u'789',)

To select all columns from the product_table, you would use the Table.select() method:

>>> stmt = product_table.select()

>>> for row in stmt.execute():

... print row

...

(u'123', Decimal("12.34"))

(u'456', Decimal("22.12"))

(u'789', Decimal("41.44"))

To achieve the same result using the select() function, simply provide the table in lieu of columns:

>>> stmt = select([product_table])

>>> for row in stmt.execute():

... print row

...

(u'123', Decimal("12.34"))

(u'456', Decimal("22.12"))

(u'789', Decimal("41.44"))

The actual parameters used by select() are listed next. They are discussed in more detail later in the chapter.

columns=None

A list of ClauseElement structures to be returned from the query.

bind=None

An engine on a connectable object on which to execute the statement. If this is omitted, an engine binding will be inferred from referenced columns and/or tables, if possible.

whereclause=None

A ClauseElement expression used to for the WHERE clause.

from_obj=[]

A list of Tables or other selectable objects that will be used to form the FROM clause. If this is not specified, the FROM clause is inferred from the tables referenced in other clauses.

order_by=None

A list of ClauseElements used to construct the ORDER BY clause.

group_by=None

A list of ClauseElements used to construct the GROUP BY clause.

having=None

A ClauseElement used to construct the HAVING clause.

distinct=False

Adds a DISTINCT qualifier to the column list in the SELECT statement.

for_update=False

Adds a FOR UPDATE qualifier to the SELECT statement. Some databases support other values for this parameter, such as MySQL, which supports "read" (translating to LOCK IN SHARE MODE), or Oracle, which supports "nowait" (translating to FOR UPDATE NOWAIT).

limit=None

The numerical limit for the number of rows returned. Typically this uses the LIMIT clause, but SQLAlchemy provides some support for LIMIT even when the underlying database does not support it directly.

offset=None

The numerical offset for the starting row that is returned. Typically this uses the OFFSET clause, but SQLAlchemy provides some support for OFFSET even when the underlying database does not support it directly.

correlate=True

Indicates that this SELECT statement is to be “correlated” with its enclosing SELECT statement if it is used as a subquery. In particular, any selectables present in both this statement’s from_obj list and the enclosing statement’s from_obj list will be omitted from this statement’s FROM clause.

use_labels=False

Generates unique labels for each column in the columns list, to ensure there are no name collisions.

prefixes=None

A list of ClauseElements to be included directly after the SELECT keyword in the generated SQL. This is used for dialect-specific SQL extensions, to insert text between the SELECT keyword and the column list.

Result set objects

Thus far, we have glossed over the return value of the execute() method on SQL statements, showing only that it is possible to iterate over this value and receive tuple-like objects. In fact, SQLAlchemy provides an object, defined in the ResultProxy class, to allow cursor-like access to the results of a query. Some of the useful methods and attributes available on the ResultProxy object are summarized here:

fetchone ()

Fetch one result from the cursor.

fetchmany (size=None)

Fetch several results from the cursor (if size is omitted, fetch all results).

fetchall ()

Fetch all results from the cursor.

__iter__ ()

Return an iterator through the result set.

close ()

Close the ResultProxy, as well as the underlying cursor. This method is called automatically when all result rows are exhausted.

scalar ()

Fetch the first column of the first row, and close the result set (useful for queries such as “SELECT DATETIME('NOW')”).

rowcount (valid only for DML statements)

Return the number of rows updated, deleted, or inserted by the statement.

The “rows” returned from a ResultProxy object, either via the fetch*() methods or iteration, is actually a RowProxy object. As we have seen previously, it supports a tuple-like interface. We can also retrieve columns from the RowProxy object through its dict-like interface or itsobject-like interface:

>>> result = select([product_table]).execute()

>>> row = result.fetchone()

>>> print row

(u'123', 12.34)

>>> print row[0]

123

>>> print row['sku']

123

>>> print row[product_table.c.sku]

123

>>> print row.sku

123

>>> print row.items()

[('sku', u'123'), ('msrp', 12.34)]

>>> print row.keys()

['sku', 'msrp']

>>> print row.values()

[u'123', 12.34]

>>> print row.has_key('msrp')

True

>>> print row.has_key('price')

False

Operators and functions in WHERE clauses

To actually construct a SELECT statement with a WHERE clause, we can use either the Text construct (as shown previously) or the SQL expression language. The easiest way to use the SQL expression language to generate a WHERE clause is to use SQLAlchemy-provided operator overloading on the Column class:

>>> x = product_table.c.sku=="123"

>>> print type(x)

<class 'sqlalchemy.sql._BinaryExpression'>

>>> print x

product.sku = ?

>>> stmt=product_table.select(product_table.c.sku=="123")

>>> print stmt

SELECT product.sku, product.msrp

FROM product

WHERE product.sku = ? 1

...

>>> print stmt.execute().fetchall()

2007-09-30 16:34:44,800 INFO sqlalchemy.engine.base.Engine.0x..10

... SELECT product.sku, product.msrp

FROM product

WHERE product.sku = ?

2007-09-30 16:34:44,800 INFO sqlalchemy.engine.base.Engine.0x..10

... ['123']2

[(u'123', 12.34)]

1

Note that the “123” literal has been replaced by a “?” placeholder. This is an example of SQLAlchemy using a bind parameter. By using bind parameters, SQLAlchemy ensures that the entire SQL string passed to the database driver was constructed by SQLAlchemy, and that it is safe from SQL-injection attacks. (Of course, this can be subverted via the Text construct, which passes whatever the programmer specifies to the database driver.)

2

Here, SQLAlchemy provides the value of the bind parameter to the database driver directly.

All SQLAlchemy-provided operators generate a ClauseElement-derived object as a result of the operation. ClauseElements provide the overloaded operators (and other SQL-constructing features) of the SQL expression language. This allows complex SQL expressions to be built up from complex Python expressions. SQLAlchemy provides overloading for most of the standard Python operators. This includes all the standard comparison operators (==, !=, <, >, <=, >=). Note in particular the conversion of “== None” to “IS NULL”.

>>> print product_price_table.c.price == 12.34

product_price.price = ?

>>> print product_price_table.c.price != 12.34

product_price.price != ?

>>> print product_price_table.c.price < 12.34

product_price.price < ?

>>> print product_price_table.c.price > 12.34

product_price.price > ?

>>> print product_price_table.c.price <= 12.34

product_price.price <= ?

>>> print product_price_table.c.price >= 12.34

product_price.price >= ?

>>> print product_price_table.c.price == None

product_price.price IS NULL

Support is also provided for the arithmetic operators (+, -, *, /, and %), with special support for database-independent string concatenation:

>>> print product_price_table.c.price + 14.44

product_price.price + ?

>>> expr = product_table.c.sku + "-sku"

>>> print expr

product.sku || ?

>>> from sqlalchemy.databases.mysql import MySQLDialect

>>> print expr.compile(dialect=MySQLDialect())

product.sku + %s

Arbitrary SQL operators (such as MySQL’s NULL-safe equality operator, <=>) are also supported via the op() method on ClauseElements:

>>> print product_table.c.sku.op('my_new_operator')(

... product_table.c.msrp)

product.sku my_new_operator product.msrp

SQLAlchemy also provides for use of the SQL boolean operators AND, OR, and NOT, as well as the LIKE operator for comparing strings. The bitwise logical operators &, |, and ~ are used to implement AND, OR, and NOT, while the like() method on ClauseElements is used to implement LIKE. Special care must be taken when using the AND, OR, and NOT overloads because of Python operator precendence rules. For instance, & binds more closely than <, so when you write A < B & C < D, what you are actually writing is A < (B&C) < D, which is probably not what you intended. You can also use the SQLAlchemy-provided functions and_, or_, and not_ to represent AND, OR, and NOT if you prefer:

>>> print (product_table.c.msrp > 10.00) & (product_table.c.msrp <

... 20.00)

product.msrp > ? AND product.msrp < ?

>>> print and_(product_table.c.msrp > 10.00,

... product_table.c.msrp < 20.00)

product.msrp > ? AND product.msrp < ?

>>> print product_table.c.sku.like('12%')

product.sku LIKE ?

>>> print ~((product_table.c.msrp > 10.00) &

... (product_table.c.msrp < 20.00))

NOT (product.msrp > ? AND product.msrp < ?)

>>> print not_(and_(product_table.c.msrp > 10.00,

... product_table.c.msrp < 20.00))

NOT (product.msrp > ? AND product.msrp < ?)

SQLAlchemy also provides for the use of arbitrary SQL functions via the func variable, which generates functions using attribute access. You can also use the special function func._ to add parentheses around a subexpression if necessary:

>>> print func.now()

now()

>>> print func.current_timestamp

current_timestamp

>>> print func.abs(product_table.c.msrp)

abs(product.msrp)

>>> print func._(text('a=b'))

(a=b)

SQLAlchemy provides several other useful methods on ClauseElements, summarized here:

between(cleft, cright)

Produces a BETWEEN clause like column BETWEEN cleft AND cright.

distinct ()

Adds a DISTINCT modifier like DISTINCT column.

startswith(other)

Produces the clause column LIKE 'other%'.

endswith (other)

Produces the clause column LIKE '%other‘.

in_ (*other)

Produces an IN clause like column IN (other[0], other[1], ...). other can also be a subquery.

like (other)

Produces a LIKE clause like column LIKE other.

op (operator)

Produces an arbitrary operator like column operator.

label (name)

Produces an AS construct for the column (a column alias) like column AS name.

Using custom bind parameters

Up to this point, SQLAlchemy has been automatically creating bind parameters whenever we used a literal expression in the SQL query language. It is also possible to generate a custom bind parameter. This might be useful, for instance, if you wanted to generate a statement without knowinga priori what values would be used to bind the statement. You can also use this to speed up your queries when you have many statements that are identical except for the bind parameter values. (The Python overhead for executing each query is lower in such cases, and some database servers will cache the execution plan, making the server-side processing faster as well.) Using the schema introduced earlier in this chapter, we might generate a statement that selects the price for a given product using the following code:

>>> stmt = select([product_table.c.msrp],

... whereclause=product_table.c.sku==bindparam('sku'))

>>> print stmt

SELECT product.msrp

FROM product

WHERE product.sku = ?

>>> print stmt.compile().get_params()

ClauseParameters:{'sku': None}

>>> print stmt.execute(sku='123').fetchall()

[(12.34,)]

>>> print stmt.execute(sku='456').fetchall()

[(22.120000000000001,)]

>>> print stmt.execute(sku='789').scalar()

41.44

The actual bindparam() parameters are summarized here:

key

Either a string representing the bind parameter name or a Column object (which will be used to generate a bind parameter name). This name is used in the execute() call to provide a value for the bind parameter.

value=None

The default value for this bind parameter (If no value is supplied in the execute() call, this value will be used instead.) If no value is supplied here or in the execute() call, an exception is raised.

type=None

A TypeEngine object representing the type of the bind parameter. The TypeEngine is used to format the value provided to the bind parameter using the TypeEngine’s convert_bind_param() method.

shortname=None

An alias for the bind parameter (this name can be used in the execute() call instead of the key parameter). This can be useful if the key name is cumbersome, as when using a Column object.

unique=False

Generate a unique name for the bind parameter based on the key. This can be useful for ensuring there are no unintended name collisions. This is typically used along with the value parameter.

Using literal text in queries

We have already briefly seen the use of the text() in constructing customized SQL strings. In fact, even when we want to use custom SQL strings, we rarely need to use the text() function; SQLAlchemy can infer the need for it automatically in most cases. For instance, if we wanted to select the price for SKU “123”, we could simply write:

>>> stmt = select(['product.msrp'],

... from_obj=['product'],

... whereclause="product.sku=='123'")

>>> print stmt

SELECT product.msrp

FROM product

WHERE product.sku=='123'

>>> print metadata.bind.execute(stmt).fetchall()

[(12.34,)]

>>> stmt2 = select([text('product.msrp')],

... from_obj=[text('product')],

... whereclause=text("product.sku=='123'"))

>>> print str(stmt2) == str(stmt)

True

We can use bind parameters with text() by using the “named colon” format (:name) for the bind parameters. We can also bind the clause constructed to a particular engine using the bind parameter to the text() function.

>>> stmt = text("SELECT product.msrp FROM product WHERE

... product.sku==:sku",

... bind=metadata.bind)

>>> print stmt

SELECT product.msrp FROM product WHERE product.sku==?

>>> print stmt.compile().get_params()

ClauseParameters:{'sku': None}

>>> print stmt.execute(sku='456').fetchall()

[(22.120000000000001,)]

The actual parameters of the text() function are summarized here:

text

The string with the SQL text to be constructed. Bind parameters can be used with the :parameter syntax.

bind=None

The engine to which to bind the constructed ClauseElement. Useful when constructing a statement entirely out of text() objects.

bindparams=None

A list of bindparam( ⁠) objects to be used to define the types and/or values of the bind parameters used.

typemap=None

A dictionary mapping column names used in a SELECT statement to TypeEngines. Used to convert result set values to Python objects.

Ordering and grouping results, returning distinct values

SQLAlchemy supports the use of the ORDER BY, GROUP BY, HAVING, and UNIQUE clauses of SQL queries via the order_by, group_by, having, and unique parameters of the select() function and method.

The Difference Between WHERE and HAVING

Both the WHERE clause in SQL and the HAVING clause restrict results to those results matching a given SQL expression. The difference is that HAVING is always accompanied by grouping (typically via the GROUP BY clause), and the HAVING clause filters the results after they are grouped, whereas the WHERE clause filters the rows before they are grouped. WHERE clauses therefore can’t reference the results of aggregation functions such as SUM or COUNT, but the HAVING clause can.

If we wanted to see the products in our database listed by price, for instance, we could use the following query:

>>> stmt = product_table.select(order_by=[product_table.c.msrp])

>>> print stmt

SELECT product.sku, product.msrp

FROM product ORDER BY product.msrp

>>> print stmt.execute().fetchall()

[(u'123', 12.34), (u'456', 22.120000000000001), (u'789',

... 41.439999999999998)]

>>> stmt =

... product_table.select(order_by=[desc(product_table.c.msrp)])

>>> print stmt

SELECT product.sku, product.msrp

FROM product ORDER BY product.msrp DESC

>>> print stmt.execute().fetchall()

[(u'789', 41.439999999999998), (u'456', 22.120000000000001),

... (u'123', 12.34)]

We could use the grouping provided by group_by (possibly filtered by having) to retrieve how many stores carry each product:

>>> stmt = select([product_price_table.c.sku,

... func.count(product_price_table.c.store_id)],

... group_by=[product_price_table.c.sku])

>>> print stmt

SELECT product_price.sku, count(product_price.store_id)

FROM product_price GROUP BY product_price.sku

>>> print stmt.execute().fetchall()

[(u'456', 2), (u'789', 2)]

>>>

>>> stmt = select([product_price_table.c.sku,

... func.count(product_price_table.c.store_id)],

... group_by=[product_price_table.c.sku],

... having=func.count(product_price_table.c.store_id)

... > 2)

>>> print stmt

SELECT product_price.sku, count(product_price.store_id)

FROM product_price GROUP BY product_price.sku

HAVING count(product_price.store_id) > ?

>>> print stmt.execute().fetchall()

[]

We have already seen how we can use the distinct() method on ClauseElements to specify that a column should be distinct in a result set. SQLAlchemy also provides support for selecting only distinct rows in a result set via the distinct parameter to select().

>>> stmt = select([product_price_table.c.sku,

... product_price_table.c.price])

>>> print stmt

SELECT product_price.sku, product_price.price

FROM product_price

>>> print stmt.execute().fetchall()

[(u'456', 22.120000000000001), (u'789', 41.439999999999998),

... (u'456', 22.120000000000001), (u'789', 41.439999999999998)]

>>> stmt = select([product_price_table.c.sku,

... product_price_table.c.price],

... distinct=True)

>>> print stmt

SELECT DISTINCT product_price.sku, product_price.price

FROM product_price

>>> print stmt.execute().fetchall()

[(u'456', 22.120000000000001), (u'789', 41.439999999999998)]

Limiting results returned

One common operation when working with large data sets is the use of the OFFSET and LIMIT clauses to return only a subset of data from a cursor. SQLAlchemy supports OFFSET and LIMIT (even in databases without direct support) through the use of offset and limit with theselect() function and method:

>>> stmt = product_table.select()

>>> print stmt.execute().fetchall()

[(u'123', 12.34), (u'456', 22.120000000000001), (u'789',

... 41.439999999999998)]

>>> stmt = product_table.select(offset=1, limit=1)

>>> print stmt

SELECT product.sku, product.msrp

FROM product

LIMIT 1 OFFSET 1

>>> print stmt.execute().fetchall()

[(u'456', 22.120000000000001)]

Limiting and offsetting is done after ordering and grouping, so you can use this construct to provide a “paged” view of sorted data. This can be very useful, for instance, when displaying sortable data on a web form.

Using the “generative” query interface

Up until this point, we have been using the select() function and method as a query constructor, generating a complete SQL statement as a result of the select() call. SQLAlchemy also supports a “generative” interface for the select() function and method that allows us to build up the query, one piece at a time. For instance, suppose we have a product table with the following defintion:

product_table = Table(

'product', metadata,

Column('id', Integer, primary_key=True),

Column('sku', String(20), unique=True),

Column('manufacturer', Unicode(255)),

Column('department', Unicode(255)),

Column('category', Unicode(255)),

Column('class', Unicode(255)),

Column('subclass', Unicode(255)))

Now, suppose we have a user interface that displays all the “product” records in the system, optionally filtered by various criteria (manufacturer, department, etc.). We might write the following function to return the filtered user list:

def get_prods(manufacturer=None,

department=None,

category=None,

class_=None,

subclass=None,

offset=None,

limit=None):

where_parts = []

if manufacturer is not None:

where_parts.append(product_table.c.manufacturer

== manufacturer)

if department is not None:

where_parts.append(product_table.c.department

== department)

if category is not None:

where_parts.append(product_table.c.category

== category)

if class_ is not None:

where_parts.append(product_table.c.class_

== class_)

if subclass is not None:

where_parts.append(product_table.c.subclass

== subclass)

whereclause=and_(*where_parts)

query = product_table.select(whereclause,

offset=offset, limit=limit)

return query

We can use arbitrary filters, and the appropriate SQL WHERE clause will automatically be constructed for us automatically:

>>> q = get_prods()

>>> print q

SELECT product.id, product.sku, product.manufacturer,

... product.department, product.category, product.class,

... product.subclass

FROM product

>>> q = get_prods(manufacturer="Neon")

>>> print q

SELECT product.id, product.sku, product.manufacturer,

... product.department, product.category, product.class,

... product.subclass

FROM product

WHERE product.manufacturer = ?

>>> q = get_prods(manufacturer="Neon", department="Auto")

>>> print q

SELECT product.id, product.sku, product.manufacturer,

... product.department, product.category, product.class,

... product.subclass

FROM product

WHERE product.manufacturer = ? AND product.department = ?

The generative interface allows us to rewrite the previous function as the following:

def get_prods(manufacturer=None,

department=None,

category=None,

class_=None,

subclass=None,

offset=None,

limit=None):

query = product_table.select()

if manufacturer is not None:

query = query.where(product_table.c.manufacturer

== manufacturer)

if department is not None:

query = query.where(product_table.c.department

== department)

if category is not None:

query = query.where(product_table.c.category

== category)

if class_ is not None:

query = query.where(product_table.c.class_

== class_)

if subclass is not None:

query = query.where(product_table.c.subclass

== subclass)

query = query.offset(offset)

query = query.limit(limit)

return query

Although the two functions have the same functionality, the second one (using the generative interface) is more flexible. Suppose we wanted to refactor the original function into multiple parts, with each part potentially adding a different filtering criterion. In that case, we would need to pass a where_parts list through all the intermediate functions. In the generative approach, all the information about the query is “wrapped up” in the query itself, allowing us to build up a query piecemeal in several different functions, without passing anything around but the query itself.

The generative interface actually consists of a set of methods on the statement constructed by the select() function or method. Those methods are summarized next. Note that none of these functions actually modify the query object in place; rather, they return a new query object with the new condition applied:

where(whereclause)

Add a constraint to the WHERE clause. All constraints added this way will be AND-ed together to create the whole WHERE clause.

order_by(*clauses)

Generate an ORDER BY clause (or append the given clauses to an existing ORDER BY clause).

group_by(*clauses)

Generate a GROUP BY clause (or append the given clauses to an existing GROUP BY clause).

having(having)

Generate a HAVING clause (or add to an existing HAVING clause). Like where(), the final statement’s HAVING clause will be all of the clauses added via this function, AND-ed together.

select_from(fromclause)

Generate a FROM clause or append to the existing one.

limit(limit)

Equivalent to the limit parameter in the select() function or method.

offset(offset)

Equivalent to the offset parameter in the select() function or method.

column(column)

Add a column to the list of columns being selected.

distinct ()

Equivalent to passing distinct=True to the select() function or method.

count(whereclause=None, **params)

Generate a statement that will count the rows that would be returned from the query, optionally with a whereclause and additional params to be passed to the generated SELECT COUNT(...) statement.

apply_labels ()

Equivalent to use_labels=True in the select() function/method.

prefix_with(clause)

Append a prefix to the generated SQL. (A prefix is inserted immediately after the SELECT keyword, as in the prefixes parameter to select().)

replace_selectable(old, alias)

Replace every occurrence of old with the alias alias. (Aliasing is covered in more detail in later in this chapter, Using aliases”). This can be useful when it is necessary to modify a query to use an alias when that query was originally written to use a reference to the actual table, for instance.

union(other, **kwargs)

Return an UNION with this selectable and another (covered in more detail later under Joins and Set Operations”).

union_all(other, **kwargs)

Return an UNION ALL with this selectable and another (covered in more detail later under Joins and Set Operations”).

intersect(other, **kwargs)

Return an INTERSECT with this selectable and another (covered in more detail later under Joins and Set Operations”).

intersect_all(other, **kwargs)

Return an INTERSECT ALL with this selectable and another (covered in more detail under Joins and Set Operations”).

except_(other, **kwargs)

Return an EXCEPT with this selectable and another (covered in more detail under Joins and Set Operations”).

except_all(other, **kwargs)

Return an EXCEPT ALL with this selectable and another (covered in more detail under Joins and Set Operations”).

join(right, *args, **kwargs)

Return a INNER JOIN between this selectable and another (covered in more detail under Joins and Set Operations”).

outerjoin(right, *args, **kwargs)

Return a LEFT OUTER JOIN between this selectable and another (covered in more detail under Joins and Set Operations”).

as_scalar ()

Allows the query to be embedded in a column list of an enclosing query.

label(name)

Label the result of this query with name for use in the column list of an enclosing query. Also implies as_scalar().

correlate(fromclause)

Specify a table on which to correlate, or use None to disable SQLAlchemy’s auto-correlation on embedded subqueries.

select(whereclauses, **params)

Generate an enclosing SELECT statment that selects all columns of this select.

Joins and Set Operations

In addition to the interface for selecting, filtering, sorting, and grouping on SELECT statements from single tables, SQLAlchemy provides full support for operations that combine multiple tables or other selectables (JOINs), as well as set operations on selectables (UNION, INTERSECT, and EXCEPT).

Joining selectables

To join two selectables (in tables or other select statements) together, SQLAlchemy provides the join() (implementing INNER JOIN) and outerjoin() (implementing OUTER JOIN) functions, as well as join() and outerjoin() methods on all selectables. The only difference between the *join() methods and the *join() functions is that the methods implicitly use self as the lefthand side of the join.

If you are familiar with the JOIN constructs in SQL, then you are used to specifyingthe ON clause of the JOIN. For instance, to select all stores where the price of a product is different than its MSRP, you might write the following SQL:

SELECT store.name

FROM store

JOIN product_price ON store.id=product_price.store_id

JOIN product ON product_price.sku=product.sku

WHERE product.msrp != product_price.price;

Notice how we had to specify the join criteria for each of the joins in the statement. Wouldn’t it be nice if the database could infer the ON clauses based on the foreign key constraints? Well, SQLAlchemy does this automatically:

>>> from_obj = store_table.join(product_price_table)

... .join(product_table)

>>> query = store_table.select()

>>> query = query.select_from(from_obj)

>>> query = query.where(product_table.c.msrp

... != product_price_table.c.price)

>>> print query

SELECT store.id, store.name

FROM store JOIN product_price ON store.id = product_price.store_id

... JOIN product ON product.sku = product_price.sku

WHERE product.msrp != product_price.price

In some cases, we are not using the JOINed table to filter results, but we would like to see the results from a JOINed table alongside results from the table we are using. In this case, we can either use the select() function or use the column() method of the query object:

>>> print query.column('product.sku')

SELECT store.id, store.name, product.sku

FROM store JOIN product_price ON store.id = product_price.store_id

... JOIN product ON product.sku = product_price.sku

WHERE product.msrp != product_price.price

>>> query2 = select([store_table, product_table.c.sku],

... from_obj=[from_obj],

... whereclause=(product_table.c.msrp

... !=product_price_table.c.price))

>>> print query2

SELECT store.id, store.name, product.sku

FROM store JOIN product_price ON store.id = product_price.store_id

... JOIN product ON product.sku = product_price.sku

WHERE product.msrp != product_price.price

But what if we want to return results that may not have matching rows in the JOINed table? For this, we use the outerjoin function/method:

>>> from_obj = store_table.outerjoin(product_price_table)

>>> from_obj = from_obj.outerjoin(product_table)

>>> query = store_table.select()

>>> query = query.select_from(from_obj)

>>> query = query.column('product.msrp')

>>> print query

SELECT store.id, store.name, product.msrp

FROM store LEFT OUTER JOIN product_price

... ON store.id = product_price.store_id

LEFT OUTER JOIN product

... ON product.sku = product_price.sku

In this case, if there is not a matching entry in the product_price table or the product table, then the query will insert None for the msrp column.

Although SQLAlchemy can automatically infer the correct join condition most of the time, support is also provided for custom ON clauses via the onclause argument to join() and outerjoin(), a ClauseElement specifying the join condition.

Set operations (UNION, INTERSECT, EXCEPT)

The SQL language and SQLAlchemy also support set operations on selectables. For instance, you may wish to retrieve the union of results from two queries (those rows satisfying either or both queries), the intersection (those rows satisfying both queries), or the difference (those rows satisfying the first query but not the second). For these functions, SQL provides the UNION, INTERSECT, and EXCEPT clauses, as well as the related UNION ALL, INTERSECT ALL, and EXCEPT ALL clauses (although the INTERSECT and EXCEPT clauses are not supported on all databases).

To support these constructs, SQLAlchemy provides the union(), union_all(), intersect(), intersect_all(), except_(), and except_all() functions and selectable methods. Like the *join() methods, the set-oriented methods are simply the corresponding functions with the first parameter bound to itself. Suppose we wanted to select all the products with prices greater than $10 but less than $20. One way we could do this is with the following simple query:

>>> query = product_table.select(and_(product_table.c.msrp > 10.00 ,

... product_table.c.msrp < 20.00))

>>> print query

SELECT product.sku, product.msrp

FROM product

WHERE product.msrp > ? AND product.msrp < ?

>>> for r in query.execute():

... print r

...

(u'123', Decimal("12.34"))

We could rewrite this query as an INTERSECT using the intersect() function:

>>> query0 = product_table.select(product_table.c.msrp > 10.00)

>>> query1 = product_table.select(product_table.c.msrp < 20.00)

>>> query = intersect(query0, query1)

>>> print query

SELECT product.sku, product.msrp

FROM product

WHERE product.msrp > ? INTERSECT SELECT product.sku, product.msrp

FROM product

WHERE product.msrp < ?

>>> for r in query.execute():

... print r

(u'123', Decimal("12.34"))

Using aliases

When using joins, it is often necessary to refer to a table more than once. In SQL, this is accomplished by using aliases in the query. For instance, suppose we have the following (partial) schema that tracks the reporting structure within an organization:

employee_table = Table(

'employee', metadata,

Column('id', Integer, primary_key=True),

Column('manager', None, ForeignKey('employee.id')),

Column('name', String(255)))

Now, suppose we want to select all the employees managed by an employee named Fred. In SQL, we might write the following:

SELECT employee.name

FROM employee, employee AS manager

WHERE employee.manager_id = manager.id

AND manager.name = 'Fred'

SQLAlchemy also allows the use of aliasing selectables in this type of situation via the alias() function or method:

>>> manager = employee_table.alias('mgr')

>>> stmt = select([employee_table.c.name],

... and_(employee_table.c.manager_id==manager.c.id,

... manager.c.name=='Fred'))

>>> print stmt

SELECT employee.name

FROM employee, employee AS mgr

WHERE employee.manager_id = mgr.id AND mgr.name = ?

SQLAlchemy can also choose the alias name automatically, which is useful for guaranteeing that there are no name collisions:

>>> manager = employee_table.alias()

>>> stmt = select([employee_table.c.name],

... and_(employee_table.c.manager_id==manager.c.id,

... manager.c.name=='Fred'))

>>> print stmt

SELECT employee.name

FROM employee, employee AS employee_1

WHERE employee.manager_id = employee_1.id AND employee_1.name = ?

Subqueries

SQLAlchemy provides rich support for subqueries (using a query inside another query). We have already seen one type of subquery in the use of the join and in set operation support. SQLAlchemy also allows subqueries to appear in the column list of a select statement, in the right hand side of the SQL IN operator (using the SQLAlchemy-provided in_() method on ClauseElements), and as an argument to the from_obj parameter on the select() function.

Embedding subqueries in the column list

In order to embed a subquery in a column list, we need to use the as_scalar() method on the inner query to indicate that the query will return a single value. For instance, if we want to retrieve the number of stores that offer each product, we could use the following query:

>>> subquery = select(

... [func.count(product_price_table.c.sku)],

... product_price_table.c.sku==product_table.c.sku)

>>> print subquery

SELECT count(product_price.sku)

FROM product_price, product

WHERE product_price.sku = product.sku

>>> stmt = select([product_table.c.sku,

... product_table.c.msrp,

... subquery.as_scalar()])

>>> print stmt

SELECT product.sku, product.msrp, (SELECT count(product_price.sku)

FROM product_price

WHERE product_price.sku = product.sku)

FROM product

>>> for row in stmt.execute():

... print row

...

(u'123', Decimal("12.34"), 0)

(u'456', Decimal("22.12"), 2)

(u'789', Decimal("41.44"), 2)

Correlated versus uncorrelated subqueries

You may have noticed in the previous example that when SQLAlchemy inserted the subquery into the main query, it left out the product table in the subquery’s FROM list. This is because SQLAlchemy attempts to correlate subqueries with outer queries whenever they reference the same table. To disable this behavior, you can use the correlate() method on the subquery to manually specify a FROM clause to remove from the subquery, or, by passing None, to disable correlation in the subquery:

>>> stmt = select([product_table.c.sku,

... product_table.c.msrp,

... subquery.correlate(None).as_scalar()])

>>> print stmt

SELECT product.sku, product.msrp, (SELECT count(product_price.sku)

FROM product_price, product

WHERE product_price.sku = product.sku)

FROM product

>>> for row in stmt.execute():

... print row

...

(u'123', Decimal("12.34"), 4)

(u'456', Decimal("22.12"), 4)

(u'789', Decimal("41.44"), 4)

Because the inner query is uncorrelated, rather than totaling the number of stores that carry the given product, the query repeatedly calculates the number of rows in the product_price table with any valid SKU.

Embedding subqueries in an IN clause

It is often useful in SQL to embed subqueries in an IN clause of another query. SQLAlchemy provides support for this as well, allowing you to specify a selectable as an argument for the ClauseElement’s in_() method. For instance, if we wanted to retrieve all the employees whose names start with “Ted” and who do not have a manager, we could write the query as follows:

>>> subquery = select([employee_table.c.id],

... employee_table.c.manager_id==None)

>>> stmt = employee_table.select(

... and_(employee_table.c.id.in_(subquery),

... employee_table.c.name.like('Ted%')))

>>> print stmt

SELECT employee.id, employee.manager_id, employee.name

FROM employee

WHERE employee.id IN (SELECT employee.id

FROM employee

WHERE employee.manager_id IS NULL) AND employee.name LIKE ?

Embedding subqueries in the FROM clause

It is sometimes useful to generate a SQL query in multiple stages by using a subquery in the FROM clause of another query (and continuing this nesting if necessary). SQLAlchemy provides support for such subqueries by allowing you to specify any list of selectables (not just Table objects) to the from_obj parameter of the select(). If we follow this pattern, then the previous query could be rewritten as follows:

>>> subquery =

... employee_table.select(employee_table.c.manager_id==None)

>>> stmt = select([subquery.c.id, subquery.c.manager_id,

... subquery.c.name],

... whereclause=subquery.c.name.like('Ted%'),

... from_obj=[subquery])

>>> print stmt

SELECT id, manager_id, name

FROM (SELECT employee.id AS id, employee.manager_id AS manager_id,

... employee.name AS name

FROM employee

WHERE employee.manager_id IS NULL)

WHERE name LIKE ?