Engines and MetaData - Essential SQLAlchemy, 2nd Edition (2010)

Chapter 3. Engines and MetaData

This chapter introduces SQLAlchemy’s Engine and MetaData classes. The Engine class provides database connectivity, including a connection pool with various strategies for acquiring connections from the pool. The MetaData class maintains information about your database schema, including any tables and indexes defined. In this chapter, you will learn how to define a new database schema using MetaData as well as how to connect a MetaData instance to an existing schema.

Engines and Connectables

The SQLAlchemy-provided Engine class is responsible for managing the connection to the database. It does this by incorporating a database connection pool and a database-specific Dialect layer to translate the SQL expression language (Chapter 5) into database-specific SQL.

To get started using an Engine, you use the create_engine() function:

# Create a connection to a SQLite in-memory database
engine = create_engine('sqlite://')

# Create a connection to a SQLite on-disk database "data.sqlite"
engine = create_engine('sqlite:///data.sqlite')

# Create a connection to a PostgreSQL database
engine = create_engine('postgres://rick:foo@localhost:5432/pg_db')

# Create a connection to a MySQL database
engine = create_engine('mysql://localhost/mysql_db')

# Create a connection to an Oracle database (via TNS)
engine = create_engine('oracle://rick:foo@oracle_tns')

# Create a connection to an Oracle database (without a TNS name)
engine = create_engine('oracle://rick:foo@localhost:1521/oracle_sid')

The first argument to create_engine() is an RFC-1738-style URL specifying the database. The general form of the URL is driver://username:password@host:port/database. Of course, the various database drivers interpret these URLs in slightly different ways, as illustrated here. It is also possible to pass additional arguments to the low-level DB-API driver created by SQLAlchemy, either via a URL query string:

url = 'postgres://rick:foo@localhost/pg_db?arg1=foo&arg2=bar'
engine = create_engine(url)

or via the connect_args parameter to create_engine():

engine = create_engine('postgres://rick:foo@localhost/pg_db',
                       connect_args=dict(arg1='foo', arg2='bar'))
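Either way, the URL itself follows the driver://username:password@host:port/database shape described above. As a quick illustration (using only the Python standard library, not the parser SQLAlchemy uses internally), the pieces of such a URL can be pulled apart like this:

```python
from urllib.parse import urlparse, parse_qs

# Decompose a driver://username:password@host:port/database?args URL
url = 'postgres://rick:foo@localhost:5432/pg_db?arg1=foo&arg2=bar'
parts = urlparse(url)

print(parts.scheme)           # driver: 'postgres'
print(parts.username)         # 'rick'
print(parts.password)         # 'foo'
print(parts.hostname)         # 'localhost'
print(parts.port)             # 5432
print(parts.path)             # '/pg_db' (database name, with leading slash)
print(parse_qs(parts.query))  # extra DB-API args: {'arg1': ['foo'], 'arg2': ['bar']}
```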

If you want complete control over the connection creation process, you can even pass a function (or other callable object) that returns a DB-API connection to create_engine() in the creator argument:

import psycopg

def connect_pg():
    return psycopg.connect(user='rick', host='localhost')

engine = create_engine('postgres://', creator=connect_pg)

The full set of keyword arguments accepted by create_engine() is listed here:

connect_args

A dictionary of options to be passed to the DB-API’s connect() method. The default is {}.

convert_unicode

Indicates whether the engine should convert all unicode values into raw byte strings before going into the database, and convert raw byte strings to unicode coming out of result sets. This can be useful, for instance, when dealing with a database server or schema that does not provide unicode support natively. The default is False.

creator

A callable that returns a DB-API connection. The default is None.

echo

A flag that tells SQLAlchemy to echo all statements and bind parameter values to its logger. The default is False.

echo_pool

A flag that tells SQLAlchemy to log all connection pool checkins and checkouts. The default is False.

encoding

Specifies the encoding to use in all translations between raw byte strings and Python unicode objects. The default is 'utf-8'.

module

Specifies which module to use when a database implementation can use more than one (such as PostgreSQL and Oracle). The default is None.

pool

Use an existing connection pool rather than creating a new one. The default is None.

poolclass

If the engine is creating its own connection pool, the class (a subclass of sqlalchemy.pool.Pool) to use when constructing the pool object. If no pool class is specified, sqlalchemy.pool.QueuePool will be used for all database drivers except for SQLite, which uses the sqlalchemy.pool.SingletonThreadPool. The default is None.

max_overflow

The number of connections to allow the connection pool to overflow to (only applicable with the QueuePool). The default is 10.

pool_size

The number of connections to keep alive in the connection pool (only applicable to the QueuePool and SingletonThreadPool pool classes). The default is 5.

pool_recycle

Close and reopen connections after this number of seconds of inactivity, or, if -1 (the default), disable connection recycling. This is useful if the database server times out connections after a period of inactivity, as MySQL does.

pool_timeout

The number of seconds to wait when getting a connection from the pool before giving up (applicable only to QueuePool connection pools). The default is 30.

strategy

Selects an alternate implementation of the engine; the only current strategies are 'plain' and 'threadlocal'. 'threadlocal' reuses connections for multiple statements within a thread; 'plain' (the default) uses a new connection for each statement.

threaded

Used only by cx_Oracle, makes the engine threadsafe. If this is not required, performance might be improved by setting this parameter to False.

use_ansi

Used only by Oracle to correct for a quirk of Oracle versions 8 and earlier when handling LEFT OUTER JOINs.

use_oids

Used only by PostgreSQL to enable the column name "oid" (object ID).

Configuring SQLAlchemy Logging

SQLAlchemy uses the Python standard library logging module to log various actions. The echo and echo_pool arguments to create_engine() and the echo_uow flag used on Session objects simply adjust the log levels of these loggers.

One useful debugging strategy is to add a logfile for a particular class of operations that SQLAlchemy is performing. For instance, to capture all of the engine-related operations, we could set up the logger as follows:

import logging

handler = logging.FileHandler('sqlalchemy.engine.log')
handler.level = logging.DEBUG
logging.getLogger('sqlalchemy.engine').addHandler(handler)

The loggers used with SQLAlchemy are listed next. Note that several of these loggers deal with material covered in later chapters (in particular, the sqlalchemy.orm.* loggers):

§ sqlalchemy.engine—control SQL echoing. logging.INFO logs SQL query output, logging.DEBUG logs result sets as well.

§ sqlalchemy.pool—control connection pool logging. logging.INFO logs checkins and checkouts.

§ sqlalchemy.orm—control logging of ORM functions. logging.INFO logs configurations and unit of work dumps.

· sqlalchemy.orm.attributes—logs instrumented attribute operations.

· sqlalchemy.orm.mapper—logs mapper configurations and operations.

· sqlalchemy.orm.unitofwork—logs unit of work operations, including dependency graphs.

· sqlalchemy.orm.strategies—logs relation loader operations (lazy and eager loads).

· sqlalchemy.orm.sync—logs synchronization of attributes from one object to another during a flush.
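One caveat about the handler setup shown earlier: a handler only sees the records its logger lets through, and a logger inherits the root logger's WARNING threshold by default, so the logger's own level must also be lowered. A standard-library-only sketch (an in-memory stream stands in for the FileHandler, which works the same way):

```python
import io
import logging

# Capture engine-level messages in memory
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setLevel(logging.DEBUG)

logger = logging.getLogger('sqlalchemy.engine')
logger.setLevel(logging.INFO)   # INFO echoes SQL; DEBUG adds result rows
logger.addHandler(handler)

# Any record emitted on this logger now reaches the handler
logger.info('SELECT user_name FROM tf_user')
print(stream.getvalue())
```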

Database Connections and ResultProxys

Although the Engine is the normal method of performing database operations, SQLAlchemy does make the lower-level Connection object available through the connect() method on the engine, as shown in the following example:

conn = engine.connect()
result = conn.execute('select user_name, email_address from tf_user')
for row in result:
    print 'User name: %s Email address: %s' % (
        row['user_name'], row['email_address'])
conn.close()

The Connection object is actually an instance of the sqlalchemy.engine.Connection class, which serves as a proxy for the particular DB-API connection object. The result object is an instance of the sqlalchemy.engine.ResultProxy class, which has many features in common with a database cursor.

Both Engines and Connections are implementations of the Connectable interface, which has two important methods: connect(), which in the case of a Connection simply returns itself, and execute(), which executes some SQL and generates a ResultProxy. Therefore, most SQLAlchemy functions that take an Engine as a parameter (usually named bind) can also take a Connection, and vice versa.

The ResultProxy object has several useful methods and attributes for returning information about the query:

__iter__ ()

Allows iteration over a result proxy, generating RowProxy objects

fetchone ()

Fetches the next RowProxy object from the ResultProxy

fetchall ()

Fetches all RowProxy objects at once

scalar ()

Fetches the next row from the cursor and treats it as a scalar (i.e., not a RowProxy)

keys

List of the column names in the result set

rowcount

The total number of rows in the result set

close ()

Closes the ResultProxy, possibly returning the underlying Connection to the pool

The RowProxy object generated by the ResultProxy provides several useful methods that allow you to retrieve the row’s data as a tuple, a dictionary, or an object:

__getattr__ ()

Provides access to data via row.column_name

__getitem__ ()

Provides access to data via row[column_name] or row[column_position]

keys ()

Provides a list of all the column names in the row

values ()

Provides a list of all the values in the row

items ()

Provides a list of (column name, value) tuples for the row
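If you have not worked with dict-like row objects before, the access pattern is easy to preview with the standard library's sqlite3.Row, which supports the same name- and position-based lookup (though not __getattr__, values(), or items()). This is only an analogy, not SQLAlchemy's RowProxy itself:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.row_factory = sqlite3.Row   # rows gain name- and position-based access
conn.execute('CREATE TABLE tf_user (user_name TEXT, email_address TEXT)')
conn.execute("INSERT INTO tf_user VALUES ('rick', 'rick@example.com')")

row = conn.execute('SELECT user_name, email_address FROM tf_user').fetchone()
print(row['user_name'])   # by name, like RowProxy.__getitem__
print(row[1])             # by position
print(list(row.keys()))   # column names, like RowProxy.keys()
conn.close()
```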

Connection Pooling

SQLAlchemy provides the connection pool as an easy and efficient way to manage connections to the database. Normally, you don’t need to worry about the connection pool because it is automatically managed by the Engine class. The connection pool can, however, be used on its own to manage regular DB-API connections. If you wish to manage such a pool, you could do the following:

from sqlalchemy import pool
import psycopg2

psycopg = pool.manage(psycopg2)
connection = psycopg.connect(database='mydb',
                             user='rick', password='foo')

The pool.manage() call sets up a connection pool (the exact object is an instance of sqlalchemy.pool.DBProxy). The connect() method then works just as the Engine’s connect() method, returning a proxy for the DB-API connection from the managed connection pool. When the connection proxy is garbage collected, the underlying DB-API connection is returned to the connection pool.

By default, the connect() method returns the same connection object if it is called multiple times in a given thread (the same “threadlocal” strategy used by the Engine). To specify that the pool should generate a new connection each time that connect() is called, pass use_threadlocal=False to the pool.manage() function.
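The threadlocal behavior amounts to caching one connection per thread. A minimal sketch of the idea, using threading.local and the standard library's sqlite3 in place of a managed DB-API module (this is an illustration of the strategy, not SQLAlchemy's implementation):

```python
import sqlite3
import threading

_local = threading.local()

def connect():
    # Hand back this thread's cached connection, creating it on first use
    if not hasattr(_local, 'conn'):
        _local.conn = sqlite3.connect(':memory:')
    return _local.conn

# Repeated calls in one thread return the very same connection object
first, second = connect(), connect()
print(first is second)   # True
```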

If you wish to use a particular connection pool class instead of the DBProxy as shown previously, SQLAlchemy provides the ability to directly instantiate the pool:

from sqlalchemy import pool
import psycopg2
import sqlite3

def getconn_pg():
    c = psycopg2.connect(database='mydb', user='rick',
                         password='foo')
    return c

def getconn_sl():
    c = sqlite3.connect('devdata.sqlite')
    return c

pool_pg = pool.QueuePool(getconn_pg, use_threadlocal=True)

# SQLite requires use of the SingletonThreadPool
pool_sl = pool.SingletonThreadPool(getconn_sl)

Some of the various pool types that are available in the sqlalchemy.pool module are:

AssertionPool

Allows only one connection to be checked out at a time and raises an AssertionError when this constraint is violated.

NullPool

Does no pooling; instead, actually opens and closes the underlying DB-API connection on each checkout/checkin of a connection.

QueuePool

Maintains a fixed-size connection pool. This is the default connection pool class used for non-SQLite connections.

SingletonThreadPool

Maintains a single connection per thread. It is used with sqlite because this database driver does not handle using a single connection in multiple threads well.

StaticPool

Maintains a single connection that is returned for all connection requests.
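The checkout/checkin cycle these classes implement can be sketched in a few lines with the standard library's queue and sqlite3 modules. This toy class only mimics QueuePool's fixed-size behavior (no overflow, no timeouts, no threadlocal reuse); it is not how SQLAlchemy's pools are implemented:

```python
import queue
import sqlite3

class MiniQueuePool:
    """Toy fixed-size pool mimicking QueuePool's checkout/checkin cycle."""

    def __init__(self, creator, pool_size=5):
        self._pool = queue.Queue(maxsize=pool_size)
        for _ in range(pool_size):
            self._pool.put(creator())   # pre-create the pooled connections

    def connect(self):
        # Block until a pooled connection is free (cf. pool_timeout)
        return self._pool.get()

    def release(self, conn):
        # Return the connection for reuse rather than closing it
        self._pool.put(conn)

mini_pool = MiniQueuePool(lambda: sqlite3.connect(':memory:'), pool_size=2)
conn = mini_pool.connect()
conn.execute('SELECT 1')
mini_pool.release(conn)
```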

MetaData

SQLAlchemy provides the MetaData class, which collects objects that describe tables, indexes, and other schema-level objects. Before using any of the higher-level features of SQLAlchemy, such as the SQL query language and the ORM, the schema of the database must be described using metadata. In some cases, you can reflect the structure of schema items into the MetaData from the database. In this case, you need only specify the name of the entity, and its structure will be loaded from the database directly.

Getting Started with MetaData

To create a new MetaData object, you simply call its constructor, possibly with information about how to connect to the database. If the constructor is called with no arguments, it is considered to be unbound; if it is called with either an Engine or a SQL connection URI, it is considered bound. Shortcuts are available to bound MetaData and to objects within a bound MetaData to facilitate the execution of statements against the bound engine. Most of the time you will probably use a bound MetaData object. However, it is sometimes useful to use an unbound MetaData if you need to connect to multiple database servers, where each server contains the same database schema.

The various ways to construct MetaData objects are illustrated in the following examples:

# Create an unbound MetaData
unbound_meta = MetaData()

# Create an Engine and bind the MetaData to it
db1 = create_engine('sqlite://')
unbound_meta.bind = db1

# Create an engine and then a bound MetaData
db2 = create_engine('sqlite:///test1.db')
bound_meta1 = MetaData(db2)

# Create a bound MetaData with an implicitly created engine
bound_meta2 = MetaData('sqlite:///test2.db')

Note that you are never required to bind the MetaData object; all operations that rely on a database connection can also be executed by passing the Engine explicitly as the keyword parameter bind. This is referred to as explicit execution. If a MetaData instance is bound, then the bind parameter can be omitted from method calls that rely on the database connection. This is referred to as implicit execution. The “bound-ness” of a MetaData object is shared by all Tables, Indexes, and Sequences in the MetaData, so a Table attached to a bound MetaData, for instance, would be able to create itself via:

table.create()

whereas a Table in an unbound MetaData would need to supply a bind parameter:

table.create(bind=some_engine_or_connection)

Defining Tables

The most common use of the MetaData object is in defining the tables in your schema. To define tables in the MetaData, you use the Table and Column classes as shown in the following example:

from sqlalchemy import *
from datetime import datetime

metadata = MetaData()

user_table = Table(
    'tf_user', metadata,
    Column('id', Integer, primary_key=True),
    Column('user_name', Unicode(16), unique=True, nullable=False),
    Column('email_address', Unicode(255), unique=True, nullable=False),
    Column('password', Unicode(40), nullable=False),
    Column('first_name', Unicode(255), default=''),
    Column('last_name', Unicode(255), default=''),
    Column('created', DateTime, default=datetime.now))

Unlike some other database mapping libraries, SQLAlchemy fully supports the use of composite and noninteger primary and foreign keys:

brand_table = Table(
    'brand', metadata,
    Column('id', Integer, primary_key=True),
    Column('name', Unicode(255), unique=True, nullable=False))

product_table = Table(
    'product', metadata,
    Column('brand_id', Integer, ForeignKey('brand.id'),
           primary_key=True),
    Column('sku', Unicode(80), primary_key=True))

style_table = Table(
    'style', metadata,
    Column('brand_id', Integer, primary_key=True),
    Column('sku', Unicode(80), primary_key=True),
    Column('code', Unicode(80), primary_key=True),
    ForeignKeyConstraint(['brand_id', 'sku'],
                         ['product.brand_id', 'product.sku']))

To actually create a table, you can call the create() method on it. Here, we will create the style table on an in-memory SQLite database and view the generated SQL:

>>> style_table.create(bind=create_engine('sqlite://', echo=True))
2007-08-25 08:05:44,396 INFO sqlalchemy.engine.base.Engine.0x..50
CREATE TABLE style (
    brand_id INTEGER NOT NULL,
    sku VARCHAR(80) NOT NULL,
    code VARCHAR(80) NOT NULL,
    PRIMARY KEY (brand_id, sku, code),
    FOREIGN KEY(brand_id, sku) REFERENCES product (brand_id, sku)
)
2007-08-25 08:05:44,396 INFO sqlalchemy.engine.base.Engine.0x..50 None
2007-08-25 08:05:44,397 INFO sqlalchemy.engine.base.Engine.0x..50 COMMIT

We see that the composite primary key and foreign key constraints are correctly generated. Although the foreign key constraints are ignored by SQLite, it is still useful to generate them, as SQLAlchemy can use this information to perform joins automatically based on the foreign key relationships between tables.
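You can verify the composite-key behavior by replaying equivalent DDL through the standard library's sqlite3 module (a plain DB-API illustration, independent of SQLAlchemy): the (brand_id, sku, code) triple must be unique as a whole, while any two of the three columns may repeat:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('''
    CREATE TABLE style (
        brand_id INTEGER NOT NULL,
        sku VARCHAR(80) NOT NULL,
        code VARCHAR(80) NOT NULL,
        PRIMARY KEY (brand_id, sku, code)
    )''')
conn.execute("INSERT INTO style VALUES (1, 'ABC-123', 'red')")
conn.execute("INSERT INTO style VALUES (1, 'ABC-123', 'blue')")  # ok: code differs
try:
    conn.execute("INSERT INTO style VALUES (1, 'ABC-123', 'red')")
except sqlite3.IntegrityError as err:
    print(err)   # the duplicate composite key is rejected
```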

The Table constructor, Table.__init__(self, name, metadata, *args, **kwargs), takes the following arguments:

name

The table name as known by the database (may be combined with the schema parameter).

metadata

The MetaData object to which to attach this table.

*args

The series of Column and Constraint objects to define for this table.

schema

The schema name for this table, if required by the database. In **kwargs, the default is None.

autoload

Indicates whether to reflect the columns from the database. In **kwargs, the default is False.

autoload_with

The Connectable used to autoload the columns. In **kwargs, the default is None.

include_columns

The list of column names (strings) to be reflected if autoload=True. If None, all columns are reflected. If not None, any columns omitted from the list will not be represented on the reflected Table object. In **kwargs, the default is None.

mustexist

Indicates that the table must already be defined elsewhere in the Python application (as part of this MetaData). An exception is raised if this is not true. In **kwargs, the default is False.

useexisting

Directs SQLAlchemy to use the previous Table definition for this table name if it exists elsewhere in the application. (SQLAlchemy disregards the rest of the constructor arguments if this is True.) In **kwargs, the default is False.

owner

Specifies the owning user of the table. This is useful for some databases (such as Oracle) to help with table reflection. In **kwargs, the default is None.

quote

Forces the table name to be escaped and quoted before being sent to the database (useful for table names that conflict with SQL keywords, for example). In **kwargs, the default is False.

quote_schema

Forces the schema name to be escaped and quoted before being sent to the database. In **kwargs, the default is False.

The Table constructor also supports database-specific keyword arguments. For instance, the MySQL driver supports the mysql_engine argument to specify the storage engine (e.g., 'InnoDB' or 'MyISAM').

Table reflection

Tables can also be defined using reflection from an existing database. This requires a database connection, and so either a bound MetaData must be used, or a Connectable must be supplied via the autoload_with parameter:

db = create_engine('sqlite:///devdata.sqlite')
brand_table = Table('brand', metadata, autoload=True,
                    autoload_with=db)

You can also override the reflected columns if necessary. This can be particularly useful when specifying custom column data types, or when the database’s introspection facilities fail to identify certain constraints:

brand_table = Table('brand', metadata,
                    Column('name', Unicode(255)),  # override the reflected type
                    autoload=True)

If you want to reflect the entire database schema, you may do so by specifying reflect=True in the MetaData constructor. Of course, in this case, the MetaData must be created as a bound MetaData. When reflecting an entire schema in this way, the individual tables can be accessed via the MetaData’s tables attribute:

db = create_engine('sqlite:///devdata.sqlite')
metadata = MetaData(bind=db, reflect=True)
brand_table = metadata.tables['brand']

You can also use the reflect() method of the MetaData to load the schema. MetaData.reflect(bind=None, schema=None, only=None) takes the following arguments:

bind

A Connectable used to access the database; required only when the MetaData is unbound. The default is None.

schema

Specifies an alternate schema from which to reflect tables. The default is None.

only

Directs the MetaData to load only a subset of the available tables. This can be specified either as a sequence of the names to be loaded or as a boolean callable that will be called for each available table with the parameters only(metadata, table name). If the callable returns True, the table will be reflected. The default is None.

The MetaData constructor itself has the definition MetaData.__init__(bind=None, reflect=None).

Column Definitions

The Column constructor Column.__init__(self, name, type_, *args, **kwargs) takes the following arguments:

name

The name of the column as it is known by the database.

type_

The TypeEngine for this column. This can also be None if the column is a ForeignKey, in which case the type will be the same as the referenced column.

*args

Constraint, ForeignKey, ColumnDefault, and Sequence objects that apply to the column.

key

An alias for this column. If specified, the column will be identified everywhere in Python by this name rather than by its SQL-native name. In **kwargs, the default is None.

primary_key

If True, marks the column as part of the primary key. (Alternatively, the Table can have a PrimaryKeyConstraint defined.) In **kwargs, the default is False.

nullable

If set to False, this does not allow None as a value for the column. In **kwargs, the default is True, unless the column is a primary key.

default

A Python callable or a SQL expression language construct specifying a default value for this column. Note that this is an active (Python-generated) default when a callable is specified; the SQL has the generated value inserted as a literal. In **kwargs, the default is None.

index

Indicates that the column is indexed (with an autogenerated index name). Alternatively, use an Index object in the table declaration instead. In **kwargs, the default is False.

unique

Indicates that the column has a unique constraint. Alternatively, use a UniqueConstraint object in the table declaration instead. In **kwargs, the default is False.

onupdate

Specifies an active default value (generated by SQLAlchemy rather than the database server) to be used when updating (but not inserting) a row in the table. In **kwargs, the default is None.

autoincrement

Indicates that integer-based primary keys should have autoincrementing behavior. This is applicable only if the column has no default value and is a type or subtype of Integer. In **kwargs, the default is True.

quote

This forces the column name to be escaped and quoted before being sent to the database (useful for column names that conflict with SQL keywords, for example). In **kwargs, the default is False.

Constraints

SQLAlchemy also supports a variety of constraints, both at the column level and at the table level. All constraints are derived from the Constraint class, and take an optional name parameter.

Note

If the name is not specified, SQLAlchemy auto-generates a suitable name if necessary.

Primary keys

The usual way to declare primary key columns is to specify primary_key=True in the Column constructor:

product_table = Table(
    'product', metadata,
    Column('brand_id', Integer, ForeignKey('brand.id'),
           primary_key=True),
    Column('sku', Unicode(80), primary_key=True))

You can also specify primary keys using the PrimaryKeyConstraint object:

product_table = Table(
    'product', metadata,
    Column('brand_id', Integer, ForeignKey('brand.id')),
    Column('sku', Unicode(80)),
    PrimaryKeyConstraint('brand_id', 'sku', name='prikey'))

To see the SQL generated to create such a table, we can create it on the in-memory SQLite database:

>>> product_table.create(bind=create_engine('sqlite://', echo=True))
2007-08-25 14:26:56,706 INFO sqlalchemy.engine.base.Engine.0x..d0
CREATE TABLE product (
    brand_id INTEGER,
    sku VARCHAR(80),
    CONSTRAINT prikey PRIMARY KEY (brand_id, sku),
    FOREIGN KEY(brand_id) REFERENCES brand (id)
)
2007-08-25 14:26:56,706 INFO sqlalchemy.engine.base.Engine.0x..d0 None
2007-08-25 14:26:56,707 INFO sqlalchemy.engine.base.Engine.0x..d0 COMMIT

Foreign keys

Foreign keys are references from a row in one table to a row in another table. The usual way to specify simple (noncompound) foreign keys is by passing a ForeignKey object to the Column constructor. The ForeignKey constructor ForeignKey.__init__(self, column, constraint=None, use_alter=False, name=None, onupdate=None, ondelete=None) takes the following parameters:

column

Either a Column object or a database-recognized string, such as tablename.columnname or schemaname.tablename.columnname, that specifies the referenced column.

constraint

The owning ForeignKeyConstraint, if any. If left unspecified, a new ForeignKeyConstraint will be created and added to the parent table. The default is None.

use_alter

Use an ALTER TABLE command to create the constraint (passed along to the owning ForeignKeyConstraint). Otherwise, the constraint will be created in the CREATE TABLE statement. The default is False.

name

The name of the constraint (passed along to the owning ForeignKeyConstraint). The default is None.

onupdate

Generates an ON UPDATE clause in the SQL for the constraint (e.g., onupdate='CASCADE' would generate “ON UPDATE CASCADE” to cascade changes in the referenced columns to the foreign key). Commonly supported values for ON UPDATE are RESTRICT (raise an error), CASCADE (shown previously), SET NULL (set the column to NULL), and SET DEFAULT (set the column to its passive default). The default for this parameter is None. Not supported for all database drivers/backends.

ondelete

Generates an ON DELETE clause in the SQL for the constraint (e.g., ondelete='CASCADE' would generate “ON DELETE CASCADE” to cascade deletions of the referenced row to the row with the foreign key). The default is None. Not supported for all database drivers/backends.
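The effect of ondelete='CASCADE' is easy to demonstrate with the standard library's sqlite3 module. (Unlike the SQLite builds current when this chapter was written, recent SQLite releases do enforce foreign keys, but only after PRAGMA foreign_keys is switched on.) This is a plain-SQL illustration of the generated clause, not SQLAlchemy itself:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('PRAGMA foreign_keys = ON')   # SQLite ignores FKs without this
conn.execute('CREATE TABLE brand (id INTEGER PRIMARY KEY, name TEXT)')
conn.execute('''
    CREATE TABLE product (
        sku TEXT PRIMARY KEY,
        brand_id INTEGER REFERENCES brand (id) ON DELETE CASCADE
    )''')
conn.execute("INSERT INTO brand VALUES (1, 'Acme')")
conn.execute("INSERT INTO product VALUES ('ABC-123', 1)")

# Deleting the brand cascades to the referencing product row
conn.execute('DELETE FROM brand WHERE id = 1')
print(conn.execute('SELECT COUNT(*) FROM product').fetchone()[0])   # 0
```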

If you need to reference a compound primary key, SQLAlchemy provides the ForeignKeyConstraint class for increased flexibility. To use the ForeignKeyConstraint, simply pass a list of columns in the local table (the compound foreign key) and a list of columns in the referenced table (the compound primary key):

style_table = Table(
    'style', metadata,
    Column('brand_id', Integer, primary_key=True),
    Column('sku', Unicode(80), primary_key=True),
    Column('code', Unicode(80), primary_key=True),
    ForeignKeyConstraint(
        ['brand_id', 'sku'],
        ['product.brand_id', 'product.sku']))

The ForeignKeyConstraint constructor ForeignKeyConstraint.__init__(self, columns, refcolumns, name=None, onupdate=None, ondelete=None, use_alter=False) takes the same parameters as the ForeignKey constructor except for columns and refcolumns:

columns

Either a list of Column objects or a list of database-recognized strings (such as tablename.columnname or schemaname.tablename.columnname) that specifies the columns in the local table that make up the compound foreign key

refcolumns

Either a list of Column objects or a list of database-recognized strings (such as tablename.columnname or schemaname.tablename.columnname) that specifies the referenced columns in the remote table (the compound primary key)

UNIQUE constraints

UniqueConstraint is a more flexible version of specifying unique=True in the Column definition, as it allows multiple columns to participate in a uniqueness constraint:

product_table = Table(
    'product', metadata,
    Column('id', Integer, primary_key=True),
    Column('brand_id', Integer, ForeignKey('brand.id')),
    Column('sku', Unicode(80)),
    UniqueConstraint('brand_id', 'sku'))

The SQL generated is just as we would expect:

>>> product_table.create(bind=create_engine('sqlite://', echo=True))
2007-08-25 13:55:19,450 INFO sqlalchemy.engine.base.Engine.0x..50
CREATE TABLE product (
    id INTEGER NOT NULL,
    brand_id INTEGER,
    sku VARCHAR(80),
    PRIMARY KEY (id),
    FOREIGN KEY(brand_id) REFERENCES brand (id),
    UNIQUE (brand_id, sku)
)
2007-08-25 13:55:19,450 INFO sqlalchemy.engine.base.Engine.0x..50 None
2007-08-25 13:55:19,451 INFO sqlalchemy.engine.base.Engine.0x..50 COMMIT

CHECK constraints

CheckConstraints can also be specified, either at the column level (in which case they should refer only to the column on which they are defined) or at the Table level (in which case they may refer to any column in the table). CheckConstraints are specified with a text constraint that will be passed directly through to the underlying database implementation, so care should be taken if you want to maintain database independence in the presence of CheckConstraints. MySQL and SQLite, in particular, do not actively support such constraints.

For instance, if you wanted to validate that payments were always positive amounts, you might create a payment table similar to the following:

payment_table = Table(
    'payment', metadata,
    Column('amount', Numeric(10,2), CheckConstraint('amount > 0')))

To see the SQL generated, we can execute the table creation statements on SQLite (recognizing that SQLite will not enforce the CHECK constraint):

>>> payment_table.create(bind=create_engine('sqlite://', echo=True))
2007-08-25 14:13:13,132 INFO sqlalchemy.engine.base.Engine.0x..90
CREATE TABLE payment (
    amount NUMERIC(10, 2) CHECK (amount > 0)
)
2007-08-25 14:13:13,133 INFO sqlalchemy.engine.base.Engine.0x..90 None
2007-08-25 14:13:13,133 INFO sqlalchemy.engine.base.Engine.0x..90 COMMIT

You can also generate CHECK constraints involving multiple columns:

>>> discount_table = Table(
...     'discount', metadata,
...     Column('original', Numeric(10,2), CheckConstraint('original > 0')),
...     Column('discounted', Numeric(10,2), CheckConstraint('discounted > 0')),
...     CheckConstraint('discounted < original', name='check_constraint_1'))
>>>
>>> discount_table.create(bind=create_engine('sqlite://', echo=True))
2007-08-25 14:17:57,600 INFO sqlalchemy.engine.base.Engine.0x..d0
CREATE TABLE discount (
    original NUMERIC(10, 2) CHECK (original > 0),
    discounted NUMERIC(10, 2) CHECK (discounted > 0),
    CONSTRAINT check_constraint_1 CHECK (discounted < original)
)
2007-08-25 14:17:57,601 INFO sqlalchemy.engine.base.Engine.0x..d0 None
2007-08-25 14:17:57,602 INFO sqlalchemy.engine.base.Engine.0x..d0 COMMIT

Defaults

SQLAlchemy provides several methods of generating default values for columns when inserting and updating rows. These default values fall into one of two categories: active defaults or passive defaults.

Active defaults

Active defaults are values that are generated by SQLAlchemy and then sent to the database in a separate statement. Active defaults include constants, Python callables, SQL expressions (including function calls) to be executed before the insert or update, or a pre-executed sequence. In all of these cases, SQLAlchemy manages the generation of the default value and the statement that actually sends the default to the database.

Active defaults are divided into two classes: insert defaults and the update defaults, which are specified separately (allowing a different default on insert and update, if that is desired). To specify an insert default, use the default parameter when creating the Column object. default can be a constant, a Python callable, an SQL expression, or an SQL sequence. For instance, to record the time at which a user record was created, you might use the following:

from datetime import datetime

user_table = Table(

'tf_user', MetaData(),

Column('id', Integer, primary_key=True),

Column('user_name', Unicode(16), unique=True, nullable=False),

Column('password', Unicode(40), nullable=False),

Column('first_name', Unicode(255), default=''),

Column('last_name', Unicode(255), default=''),

Column('created_apptime', DateTime, default=datetime.now),

Column('created_dbtime', DateTime,

default=func.current_timestamp()),

Column('modified', DateTime, onupdate=datetime.now))

Here, we have created several defaults with constants, as well as two “created” defaults. One is the standard library function datetime.now(), and the other is the SQL function CURRENT_TIMESTAMP. The created_apptime column, upon insertion, will contain the current time on the application’s machine, whereas the created_dbtime column will contain the database server’s current time. The SQL generated is illustrative:

>>> e=create_engine('sqlite://', echo=True)

>>> user_table.create(bind=e)

2007-08-25 14:52:17,595 INFO sqlalchemy.engine.base.Engine.0x..50

CREATE TABLE tf_user (

id INTEGER NOT NULL,

user_name VARCHAR(16) NOT NULL,

password VARCHAR(40) NOT NULL,

first_name VARCHAR(255),

last_name VARCHAR(255),

created_apptime TIMESTAMP,

created_dbtime TIMESTAMP,

modified TIMESTAMP,

PRIMARY KEY (id),

UNIQUE (user_name)

) [1]

2007-08-25 14:52:17,596 INFO sqlalchemy.engine.base.Engine.0x..50

... None

2007-08-25 14:52:17,597 INFO sqlalchemy.engine.base.Engine.0x..50

... COMMIT

>>>

>>> e.execute(user_table.insert(), user_name='rick', password='foo')

2007-08-25 14:52:17,604 INFO sqlalchemy.engine.base.Engine.0x..50

... SELECT current_timestamp [2]

2007-08-25 14:52:17,605 INFO sqlalchemy.engine.base.Engine.0x..50 []

2007-08-25 14:52:17,606 INFO sqlalchemy.engine.base.Engine.0x..50

... INSERT INTO tf_user (user_name, password, first_name,

... last_name, created_apptime, created_dbtime) VALUES (?,

... ?, ?, ?, ?, ?)

2007-08-25 14:52:17,606 INFO sqlalchemy.engine.base.Engine.0x..50

... ['rick', 'foo', '', '', '2007-08-25 14:52:17.604140',

... u'2007-08-25 18:52:17'] [3]

2007-08-25 14:52:17,607 INFO sqlalchemy.engine.base.Engine.0x..50

... COMMIT

<sqlalchemy.engine.base.ResultProxy object at 0x2aff31673690>

>>> e.execute(user_table.update(user_table.c.user_name=='rick'),

... password='secret')

2007-08-25 15:01:48,804 INFO sqlalchemy.engine.base.Engine.0x..50

... UPDATE tf_user SET password=?, modified=? [4]

... WHERE tf_user.user_name = ?

2007-08-25 15:01:48,805 INFO sqlalchemy.engine.base.Engine.0x..50

... ['secret', '2007-08-25 15:01:48.774391', 'rick']

2007-08-25 15:01:48,805 INFO sqlalchemy.engine.base.Engine.0x..50

... COMMIT

<sqlalchemy.engine.base.ResultProxy object at 0x2adaf2551e50>

>>>

[1]

The SQL generated for the table creation had no reference to the default values. This is because these values were active defaults, as opposed to the passive defaults covered in the next section.

[2]

The current_timestamp is selected from the database for use in the insert statement.

[3]

Two different timestamps are sent to the database, one for created_apptime, and one for created_dbtime. In this case, the application machine’s native Python time resolution is greater than the current_timestamp provided by SQLite.

[4]

Though we did not specify an update to the modified column, SQLAlchemy provides an update value based on the onupdate parameter of the column definition.
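The active-default behavior described above can be condensed into a short, self-contained sketch. The table and column names here are hypothetical, it runs against an in-memory SQLite engine, and it uses the modern connection-based execution style rather than the engine.execute() style shown in this chapter’s transcripts:

```python
from datetime import datetime
from sqlalchemy import (MetaData, Table, Column, Integer, String,
                        DateTime, create_engine, func, select)

metadata = MetaData()

# Hypothetical table showing the three common flavors of insert default:
note_table = Table(
    'note', metadata,
    Column('id', Integer, primary_key=True),
    Column('status', String(20), default='draft'),           # constant
    Column('created_app', DateTime, default=datetime.now),   # Python callable
    Column('created_db', DateTime,
           default=func.current_timestamp()),                # SQL expression
)

engine = create_engine('sqlite://')
metadata.create_all(bind=engine)

# Insert a row supplying no values at all; every default fires
with engine.begin() as conn:
    conn.execute(note_table.insert())

with engine.connect() as conn:
    row = conn.execute(select(note_table)).fetchone()

print(row.status)  # 'draft'
```

Note that the constant and the Python callable are evaluated by SQLAlchemy on the client side, while the SQL expression is rendered into the INSERT statement for the database to evaluate.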

Passive defaults

Passive defaults are default values provided by the database itself. If a column is marked with a PassiveDefault instance, then the column will have a database-level default value, and SQLAlchemy will make the Engine aware of the passive default. The Engine will, in turn, mark the ResultProxy as having passive default values. The ResultProxy is inspected by the object-relational mapping system to determine whether to refetch the row after an insert to get the default column values.

We can enhance the previous example by providing a passive default for the created_dbtime column:

from sqlalchemy import *

from datetime import datetime

user_table = Table(

'tf_user', MetaData(),

Column('id', Integer, primary_key=True),

Column('user_name', Unicode(16), unique=True, nullable=False),

Column('password', Unicode(40), nullable=False),

Column('first_name', Unicode(255), default=''),

Column('last_name', Unicode(255), default=''),

Column('created_apptime', DateTime, default=datetime.now),

Column('created_dbtime', DateTime, PassiveDefault('current_timestamp')),

Column('modified', DateTime, onupdate=datetime.now))

Again, it is illustrative to see the creation and manipulation SQL:

>>> e=create_engine('sqlite://', echo=True)

>>> user_table.create(bind=e)

2007-08-25 15:50:49,912 INFO sqlalchemy.engine.base.Engine.0x..50

CREATE TABLE tf_user (

id INTEGER NOT NULL,

user_name VARCHAR(16) NOT NULL,

password VARCHAR(40) NOT NULL,

first_name VARCHAR(255),

last_name VARCHAR(255),

created_apptime TIMESTAMP,

created_dbtime TIMESTAMP DEFAULT current_timestamp,

modified TIMESTAMP,

PRIMARY KEY (id),

UNIQUE (user_name)

) [1]

2007-08-25 15:50:49,912 INFO sqlalchemy.engine.base.Engine.0x..50

... None

2007-08-25 15:50:49,913 INFO sqlalchemy.engine.base.Engine.0x..50

... COMMIT

>>>

>>> rs = e.execute(user_table.insert(), user_name='rick',

... password='foo')

2007-08-25 15:50:49,930 INFO sqlalchemy.engine.base.Engine.0x..50

... INSERT INTO tf_user (user_name, password, first_name,

... last_name, created_apptime) VALUES (?, ?, ?, ?, ?) [2]

2007-08-25 15:50:49,931 INFO sqlalchemy.engine.base.Engine.0x..50

... ['rick', 'foo', '', '', '2007-08-25 15:50:49.930339']

2007-08-25 15:50:49,932 INFO sqlalchemy.engine.base.Engine.0x..50

... COMMIT

>>> if rs.lastrow_has_defaults(): [3]

... prikey = rs.last_inserted_ids()

... row = e.execute(user_table.select(

... user_table.c.id == prikey[0])).fetchone()

... print 'Created at', row.created_dbtime

...

2007-08-25 15:50:50,966 INFO sqlalchemy.engine.base.Engine.0x..50

... SELECT tf_user.id, tf_user.user_name, tf_user.password,

... tf_user.first_name, tf_user.last_name, tf_user.created_apptime,

... tf_user.created_dbtime, tf_user.modified

FROM tf_user

WHERE tf_user.id = ?

2007-08-25 15:50:50,966 INFO sqlalchemy.engine.base.Engine.0x..50

... [1]

Created at 2007-08-25 19:50:49

[1]

The SQL generated for the table creation does contain a reference to the default created_dbtime, unlike the active default example.

[2]

The created_dbtime is not provided to the database in the insert statement; it will be provided by the database itself.

[3]

The result set is flagged as having a passive default via the lastrow_has_defaults() function, and so we recognize the need to fetch the row back from the database.

Caution

PostgreSQL does not support passive defaults for primary keys. This is because SQLAlchemy does not use PostgreSQL OIDs to determine the identity of inserted rows (OIDs are in fact disabled by default in PostgreSQL version 8), and psycopg2's cursor.lastrowid only returns OIDs. Thus, the only way to know the primary key of a row being inserted is to provide it as an active default.
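One sketch of the workaround (hypothetical names): attach an explicit Sequence to the primary key column, which makes it an active default that SQLAlchemy can pre-execute on PostgreSQL. The Sequence object itself is covered in more detail later in this chapter; here we only verify that the sequence becomes the column's default generator, so no live database is needed:

```python
from sqlalchemy import MetaData, Table, Column, Integer, Unicode, Sequence

metadata = MetaData()

# Give the primary key an explicit Sequence so SQLAlchemy fetches the
# next id itself (an active default) rather than relying on a
# database-side passive default.
pg_user_table = Table(
    'pg_user', metadata,
    Column('id', Integer, Sequence('pg_user_id_seq'), primary_key=True),
    Column('user_name', Unicode(16), unique=True, nullable=False),
)

# The Sequence is installed as the id column's default generator
print(pg_user_table.c.id.default.name)
```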

Defining Indexes

Once your database grows to a certain size, you will probably need to consider adding indexes to your tables to speed up certain selects. The easiest way to index a column is to simply specify index=True when defining the Column:

user_table = Table(

'tf_user', MetaData(),

Column('id', Integer, primary_key=True),

Column('user_name', Unicode(16), unique=True, nullable=False,

... index=True),

Column('password', Unicode(40), nullable=False),

Column('first_name', Unicode(255), default=''),

Column('last_name', Unicode(255), default='', index=True))

In this case, the index will be created with an auto-generated name. If a column is defined with both index=True and unique=True, then the UNIQUE constraint is created on the index rather than on the column. The SQL generated for the previous table definition is illustrative:

>>> e = create_engine('sqlite://', echo=True)

>>> user_table.create(bind=e)

2007-08-25 16:30:36,542 INFO sqlalchemy.engine.base.Engine.0x..90

CREATE TABLE tf_user (

id INTEGER NOT NULL,

user_name VARCHAR(16) NOT NULL,

password VARCHAR(40) NOT NULL,

first_name VARCHAR(255),

last_name VARCHAR(255),

PRIMARY KEY (id)

)

2007-08-25 16:30:36,542 INFO sqlalchemy.engine.base.Engine.0x..90

... None

2007-08-25 16:30:36,543 INFO sqlalchemy.engine.base.Engine.0x..90

... COMMIT

2007-08-25 16:30:36,544 INFO sqlalchemy.engine.base.Engine.0x..90

... CREATE UNIQUE INDEX ix_tf_user_user_name ON tf_user

... (user_name)

2007-08-25 16:30:36,544 INFO sqlalchemy.engine.base.Engine.0x..90

... None

2007-08-25 16:30:36,545 INFO sqlalchemy.engine.base.Engine.0x..90

... COMMIT

2007-08-25 16:30:36,546 INFO sqlalchemy.engine.base.Engine.0x..90

... CREATE INDEX ix_tf_user_last_name ON tf_user (last_name)

2007-08-25 16:30:36,546 INFO sqlalchemy.engine.base.Engine.0x..90

... None

2007-08-25 16:30:36,547 INFO sqlalchemy.engine.base.Engine.0x..90

... COMMIT

The Index object

Although the index=True syntax is convenient in column definition, SQLAlchemy also provides an independent Index object, which can be used to:

§ Define indexes on multiple columns

§ Define named indexes

§ Create indexes independently of the table (useful when adding an index to an existing table)

To create an index using the Index object, simply instantiate the object using the column attributes of the table.c object:

i = Index('idx_name', user_table.c.first_name,

... user_table.c.last_name,

unique=True)

If the index is defined before the table is created, then the index will be created along with the table. Otherwise, you can create the index independently via its own create() function:

>>> i = Index('idx_name', user_table.c.first_name,

... user_table.c.last_name,

... unique=True)

>>> i.create(bind=e)

2007-08-25 16:30:36,566 INFO sqlalchemy.engine.base.Engine.0x..90

... CREATE UNIQUE INDEX idx_name ON tf_user (first_name, last_name)

2007-08-25 16:30:36,566 INFO sqlalchemy.engine.base.Engine.0x..90

... None

2007-08-25 16:30:36,567 INFO sqlalchemy.engine.base.Engine.0x..90

... COMMIT

Index("idx_name", Column('first_name', Unicode(length=255),

... default=ColumnDefault('')),

... Column('last_name',Unicode(length=255),

... default=ColumnDefault('')), unique=True)

Creating Explicit Sequences

In our examples up to this point, to generate a unique integer key for inserted rows, we have simply specified that the table’s primary key was an integer value. In this case, SQLAlchemy does what is generally The Right Thing: it either generates a column with an auto-incrementing data type (AUTOINCREMENT, SERIAL, etc.) if one is available in the Dialect being used, or, if an auto-incrementing data type is not available (as in the case of PostgreSQL and Oracle), it implicitly generates a sequence and fetches values from that sequence.
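A quick, hedged check of this implicit behavior (hypothetical table, in-memory SQLite): insert two rows into a table whose primary key is a plain Integer column and observe the generated keys. Note that inserted_primary_key is the modern accessor; the 0.4-era API shown in this chapter's transcripts used ResultProxy.last_inserted_ids():

```python
from sqlalchemy import (MetaData, Table, Column, Integer, Unicode,
                        create_engine)

metadata = MetaData()
item_table = Table(
    'item', metadata,
    Column('id', Integer, primary_key=True),  # dialect picks the strategy
    Column('name', Unicode(255)),
)

engine = create_engine('sqlite://')
metadata.create_all(bind=engine)

# Neither insert supplies an id; the database generates them
with engine.begin() as conn:
    r1 = conn.execute(item_table.insert().values(name='first'))
    r2 = conn.execute(item_table.insert().values(name='second'))

print(r1.inserted_primary_key[0], r2.inserted_primary_key[0])  # 1 2
```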

SQLAlchemy also provides for the explicit use of a Sequence object to generate default values for columns (not just primary keys). To use such a sequence, simply add it to the parameter list of the Column object:

brand_table = Table(

'brand', metadata,

Column('id', Integer, Sequence('brand_id_seq'), primary_key=True),

Column('name', Unicode(255), unique=True, nullable=False))

The SQL generated to create this table is:

>>> e = create_engine('postgres://postgres:password@localhost/test',

... echo=True)

>>>

>>> brand_table.create(bind=e)

2007-08-25 18:25:40,624 INFO sqlalchemy.engine.base.Engine.0x..d0

... CREATE SEQUENCE brand_id_seq

2007-08-25 18:25:40,624 INFO sqlalchemy.engine.base.Engine.0x..d0

... None

2007-08-25 18:25:40,630 INFO sqlalchemy.engine.base.Engine.0x..d0

... COMMIT

2007-08-25 18:25:40,634 INFO sqlalchemy.engine.base.Engine.0x..d0

CREATE TABLE brand (

id INTEGER,

name VARCHAR(255) NOT NULL,

UNIQUE (name)

)

2007-08-25 18:25:40,635 INFO sqlalchemy.engine.base.Engine.0x..d0

... None

2007-08-25 18:25:40,659 INFO sqlalchemy.engine.base.Engine.0x..d0

... COMMIT

The parameters accepted by the Sequence constructor, Sequence.__init__(name, start=None, increment=None, optional=False, quote=False, for_update=False), are as follows:

name

The name of the sequence to be created.

start

The initial value of the sequence being created (default None). This may be ignored, depending on the Dialect.

increment

The increment value of the sequence being created (default None). This may be ignored, depending on the Dialect.

optional

If True, this specifies that the sequence should be used only if it is necessary (e.g., if no other method of generating autoincrementing columns is possible). The default is False.

quote

This forces the sequence name to be escaped and quoted before being sent to the database (useful for names that conflict with SQL keywords, for example). The default is False.

for_update

Uses the sequence when updating the row, not just when inserting. The default is False.
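One way to see how these parameters translate into DDL, without a live database, is to compile the CREATE SEQUENCE statement for a specific dialect. This sketch assumes a PostgreSQL target and a hypothetical sequence name; exactly which parameters are honored depends on the Dialect in use:

```python
from sqlalchemy import Sequence
from sqlalchemy.schema import CreateSequence
from sqlalchemy.dialects import postgresql

# Hypothetical sequence using the start and increment parameters
seq = Sequence('brand_id_seq', start=1000, increment=1)

# Render the DDL for PostgreSQL without connecting to a database
ddl = str(CreateSequence(seq).compile(dialect=postgresql.dialect()))
print(ddl)
```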

MetaData Operations

SQLAlchemy uses the MetaData object internally for several purposes, particularly inside the object relational mapper (ORM), which is covered in Chapter 6. MetaData can also be used in connection with Engine and other Connectable instances to create or drop tables, indexes, and sequences from the database.

Binding MetaData

As mentioned previously, MetaData can be bound to a database Engine. This is done in one of three ways:

§ Specify the Engine URI in the MetaData constructor

§ Specify an actual Engine or other Connectable object in the MetaData constructor

§ Assign the bind attribute of an “unbound” MetaData to an Engine or other Connectable

The various ways of binding MetaData are illustrated in the following examples:

# Create a bound MetaData with an implicitly created engine

bound_meta2 = MetaData('sqlite:///test2.db')

# Create an engine and then a bound MetaData

db2 = create_engine('sqlite:///test1.db')

bound_meta1 = MetaData(db2)

# Create an unbound MetaData

unbound_meta = MetaData()

# Create an Engine and bind the MetaData to it

db1 = create_engine('sqlite://')

unbound_meta.bind = db1

Binding the MetaData object to an engine allows the MetaData and the objects attached to it (Tables, Indexes, Sequences, etc.) to perform database operations without explicitly specifying an Engine:

from sqlalchemy import *

metadata = MetaData('sqlite://')

user_table = Table(

'tf_user', metadata,

Column('id', Integer, primary_key=True),

Column('user_name', Unicode(16), unique=True, nullable=False,

... index=True),

Column('password', Unicode(40), nullable=False),

Column('first_name', Unicode(255), default=''),

Column('last_name', Unicode(255), default='', index=True))

user_table.create() # we can omit the bind parameter

Create/drop MetaData and schema objects

Bound and unbound MetaData objects can create and drop schema objects either by using the create() and drop() methods on the objects, or by using the MetaData methods create_all() and drop_all(). The schema objects’ (Table, Index, and Sequence) create() and drop() methods take the following keyword parameters:

bind

The Engine on which to execute the schema item creation (default is None).

checkfirst

Add an IF NOT EXISTS or IF EXISTS clause, whichever is appropriate to the SQL generated (not supported for Indexes). The default is False.

The MetaData object itself supports the following arguments to its create_all() and drop_all() methods:

bind

The Engine on which to execute the operation. The default is None.

tables

The Table objects to create/drop. If not specified, create/drop all schema items known to the MetaData. The default is None.

checkfirst

Add an IF NOT EXISTS or IF EXISTS clause (whichever is appropriate to the SQL generated). The default is False.
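A brief sketch of these methods (hypothetical table names, in-memory SQLite): the second create_all() call is harmless because checkfirst=True checks for each table before emitting DDL, and drop_all() with the tables argument drops only the listed tables:

```python
from sqlalchemy import (MetaData, Table, Column, Integer, create_engine,
                        inspect)

metadata = MetaData()
t1 = Table('t1', metadata, Column('id', Integer, primary_key=True))
t2 = Table('t2', metadata, Column('id', Integer, primary_key=True))

engine = create_engine('sqlite://')

metadata.create_all(bind=engine, checkfirst=True)  # creates t1 and t2
metadata.create_all(bind=engine, checkfirst=True)  # no-op: both exist

metadata.drop_all(bind=engine, tables=[t2])        # drop only t2

print(inspect(engine).get_table_names())  # ['t1']
```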

Adapt Tables from one MetaData to another

A table that has been created against one MetaData can be adapted to another MetaData via the Table.tometadata(metadata, schema=None) method. This can be useful when working with the same schema in more than one Engine because it allows you to have bound MetaData and Tables for both engines. You can also use the MetaData.table_iterator() method to reflect an entire schema into another engine, for example:

meta1 = MetaData('postgres://postgres:password@localhost/test',

... reflect=True)

meta2 = MetaData('sqlite://')

for table in meta1.table_iterator():

table.tometadata(meta2)

meta2.create_all()