Essential SQLAlchemy, 2nd Edition (2010)
Chapter 3. Engines and MetaData
This chapter introduces SQLAlchemy’s Engine and MetaData classes. The Engine class provides database connectivity, including a connection pool with various strategies for acquiring connections from the pool. The MetaData class maintains information about your database schema, including any tables and indexes defined. In this chapter, you will learn how to define a new database schema using MetaData as well as how to connect a MetaData instance to an existing schema.
Engines and Connectables
The SQLAlchemy-provided Engine class is responsible for managing the connection to the database. It does this by incorporating a database connection pool and a database-specific Dialect layer to translate the SQL expression language (Chapter 5) into database-specific SQL.
To get started using an Engine, you use the create_engine() function:
# Create a connection to a SQLite in-memory database
engine = create_engine('sqlite://')
# Create a connection to a SQLite on-disk database "data.sqlite"
engine = create_engine('sqlite:///data.sqlite')
# Create a connection to a PostgreSQL database
engine = create_engine('postgres://rick:foo@localhost:5432/pg_db')
# Create a connection to a MySQL database
engine = create_engine('mysql://localhost/mysql_db')
# Create a connection to an Oracle database (via TNS)
engine = create_engine('oracle://rick:foo@oracle_tns')
# Create a connection to an Oracle database (without a TNS name)
engine = create_engine('oracle://rick:foo@localhost:1521/oracle_sid')
The first argument to create_engine() is an RFC-1738-style URL specifying the database. The general form of the URL is driver://username:password@host:port/database. Of course, the various database drivers interpret these URLs in slightly different ways, as illustrated here. It is also possible to pass additional arguments to the low-level DB-API driver created by SQLAlchemy via either a URL query string:
url='postgres://rick:foo@localhost/pg_db?arg1=foo&arg2=bar'
engine = create_engine(url)
or via the connect_args parameter to create_engine():
engine = create_engine('postgres://rick:foo@localhost/pg_db',
connect_args=dict(arg1='foo', arg2='bar'))
If you wish complete control over the connection creation process, you can even pass a function (or other callable object) that returns a DB-API connection to create_engine() in the creator argument:
import psycopg2

def connect_pg():
    return psycopg2.connect(user='rick', host='localhost')

engine = create_engine('postgres://', creator=connect_pg)
The full set of keyword arguments accepted by create_engine() is listed here:
connect_args
A dictionary of options to be passed to the DB-API’s connect() method. The default is {}.
convert_unicode
Indicates whether the engine should convert all unicode values into raw byte strings before going into the database, and convert raw byte strings to unicode coming out of result sets. This can be useful, for instance, when dealing with a database server or schema that does not provide unicode support natively. The default is False.
creator
A callable that returns a DB-API connection. The default is None.
echo
A flag that tells SQLAlchemy to echo all statements and bind parameter values to its logger. The default is False.
echo_pool
A flag that tells SQLAlchemy to log all connection pool checkins and checkouts. The default is False.
encoding
Specifies the encoding to use in all translations between raw byte strings and Python unicode objects. The default is 'utf-8'.
module
Specifies which module to use when a database implementation can use more than one (such as PostgreSQL and Oracle). The default is None.
pool
Use an existing connection pool rather than creating a new one. The default is None.
poolclass
If the engine is creating its own connection pool, the class (a subclass of sqlalchemy.pool.Pool) to use when constructing the pool object. If no pool class is specified, sqlalchemy.pool.QueuePool will be used for all database drivers except for SQLite, which uses the sqlalchemy.pool.SingletonThreadPool. The default is None.
max_overflow
The number of connections to allow the connection pool to overflow to (only applicable with the QueuePool). The default is 10.
pool_size
The number of connections to keep alive in the connection pool (only applicable to the QueuePool and SingletonThreadPool pool classes). The default is 5.
pool_recycle
Close and reopen connections after this number of seconds of inactivity, or, if -1 (the default), disable connection recycling. This is useful if the database server times out connections after a period of inactivity, as MySQL does.
pool_timeout
The number of seconds to wait when getting a connection from the pool before giving up (applicable only to QueuePool connection pools). The default is 30.
strategy
Selects an alternate implementation of the engine; the only current strategies are 'plain' and 'threadlocal'. 'threadlocal' reuses connections for multiple statements within a thread; 'plain' (the default) uses a new connection for each statement.
threaded
Used only by cx_Oracle, makes the engine threadsafe. If this is not required, performance might be improved by setting this parameter to False.
use_ansi
Used only by Oracle to correct for a quirk of Oracle versions 8 and earlier when handling LEFT OUTER JOINs.
use_oids
Used only by PostgreSQL to enable the column name "oid" (object ID).
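To see how several of these keyword arguments fit together, here is a small sketch against an in-memory SQLite database. Recent SQLAlchemy releases require raw SQL strings to be wrapped in text(); the timeout entry in connect_args is SQLite's own connect() parameter and is shown purely for illustration.

from sqlalchemy import create_engine, text

# Combine several create_engine() keyword arguments; 'timeout' is an
# argument understood by the SQLite DB-API's connect() function.
engine = create_engine(
    'sqlite://',
    echo=False,                # don't log SQL statements
    pool_recycle=3600,         # recycle idle connections hourly
    connect_args={'timeout': 30})

conn = engine.connect()
print(conn.execute(text('SELECT 1')).scalar())
conn.close()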
Configuring SQLAlchemy Logging
SQLAlchemy uses the Python standard library logging module to log various actions. The echo and echo_pool arguments to create_engine() and the echo_uow flag used on Session objects all affect the regular loggers.
One useful debugging strategy is to add a logfile for a particular class of operations that SQLAlchemy is performing. For instance, to capture all of the engine-related operations, we could set up the logger as follows:
import logging
handler = logging.FileHandler('sqlalchemy.engine.log')
handler.level = logging.DEBUG
logging.getLogger('sqlalchemy.engine').addHandler(handler)
The loggers used with SQLAlchemy are listed next. Note that several of these loggers deal with material covered in later chapters (in particular, the sqlalchemy.orm.* loggers):
- sqlalchemy.engine: controls SQL echoing. logging.INFO logs SQL query output; logging.DEBUG logs result sets as well.
- sqlalchemy.pool: controls connection pool logging. logging.INFO logs checkins and checkouts.
- sqlalchemy.orm: controls logging of ORM functions. logging.INFO logs configurations and unit of work dumps.
  - sqlalchemy.orm.attributes: logs instrumented attribute operations.
  - sqlalchemy.orm.mapper: logs mapper configurations and operations.
  - sqlalchemy.orm.unitofwork: logs unit of work operations, including dependency graphs.
  - sqlalchemy.orm.strategies: logs relation loader operations (lazy and eager loads).
  - sqlalchemy.orm.sync: logs synchronization of attributes from one object to another during a flush.
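Building on the handler setup above, the following self-contained sketch uses only the standard library logging module to capture output from the sqlalchemy.engine logger in an in-memory stream. The log record is emitted manually here for illustration; a real Engine created with echo=True would emit comparable records for you.

import io
import logging

# Capture 'sqlalchemy.engine' log output in an in-memory stream.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setLevel(logging.INFO)

logger = logging.getLogger('sqlalchemy.engine')
logger.setLevel(logging.INFO)
logger.addHandler(handler)

# With a real Engine, SQL statements would arrive here; we emit a
# record manually to show the handler wiring.
logger.info('SELECT 1')

print(stream.getvalue().strip())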
Database Connections and ResultProxys
Although the Engine is the normal method of performing database operations, SQLAlchemy does make the lower-level Connection object available through the connect() method on the engine, as shown in the following example:
conn = engine.connect()
result = conn.execute('select user_name, email_address from tf_user')
for row in result:
    print 'User name: %s  Email address: %s' % (
        row['user_name'], row['email_address'])
conn.close()
The Connection object is actually an instance of the sqlalchemy.engine.Connection class, which serves as a proxy for the particular DB-API connection object. The result object is an instance of the sqlalchemy.engine.ResultProxy class, which has many features in common with a database cursor.
Both Engines and Connections are implementations of the Connectable interface, which has two important methods: connect(), which in the case of a Connection simply returns itself, and execute(), which executes some SQL and generates a ResultProxy. Therefore, most SQLAlchemy functions that take an Engine as a parameter (usually named bind) can also take a Connection, and vice versa.
The ResultProxy object has several useful methods and attributes for returning information about the query:
__iter__ ()
Allows iteration over a result proxy, generating RowProxy objects
fetchone ()
Fetches the next RowProxy object from the ResultProxy
fetchall ()
Fetches all RowProxy objects at once
scalar ()
Fetches the next row from the cursor and treats it as a scalar (i.e., not a RowProxy)
keys
List of the column names in the result set
rowcount
The total number of rows in the result set
close ()
Closes the ResultProxy, possibly returning the underlying Connection to the pool
The RowProxy object generated by the ResultProxy provides several useful methods that allow you to retrieve data, such as a tuple, dictionary, or object:
__getattr__ ()
Provides access to data via object.column_name
__getitem__ ()
Provides access to data via object[column_name] or object[column_position]
keys ()
Provides a list of all the column names in the row
values ()
Provides a list of all the values in the row
items ()
Provides a list of (column name, value) tuples for the row
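The following sketch exercises these accessors end to end. Note that current SQLAlchemy releases rename ResultProxy and RowProxy to CursorResult and Row, make keys() a method, and require raw SQL strings to be wrapped in text(); the behavior shown is otherwise the same as described above.

from sqlalchemy import create_engine, text

engine = create_engine('sqlite://')
conn = engine.connect()
conn.execute(text(
    "CREATE TABLE tf_user (user_name TEXT, email_address TEXT)"))
conn.execute(text(
    "INSERT INTO tf_user VALUES ('rick', 'rick@example.com')"))

result = conn.execute(text(
    'SELECT user_name, email_address FROM tf_user'))
names = list(result.keys())   # column names in the result set
row = result.fetchone()

print(names)
print(row.user_name)          # attribute-style access
print(row[1])                 # position-style access
conn.close()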
Connection Pooling
SQLAlchemy provides the connection pool as an easy and efficient way to manage connections to the database. Normally, you don't need to worry about the connection pool because it is automatically managed by the Engine class. The connection pool can, however, be used on its own to manage regular DB-API connections. If you wish to manage such a pool, you could do the following:
from sqlalchemy import pool
import psycopg2
psycopg = pool.manage(psycopg2)
connection = psycopg.connect(database='mydb', user='rick',
                             password='foo')
The pool.manage() call sets up a connection pool (the exact object is an instance of sqlalchemy.pool.DBProxy). The connect() method then works just as the Engine’s connect() method, returning a proxy for the DB-API connection from the managed connection pool. When the connection proxy is garbage collected, the underlying DB-API connection is returned to the connection pool.
By default, the connect() method returns the same connection object if it is called multiple times in a given thread (the same “threadlocal” strategy used by the Engine). To specify that the pool should generate a new connection each time that connect() is called, pass use_threadlocal=False to the pool.manage() function.
If you wish to use a particular connection pool class instead of the DBProxy as shown previously, SQLAlchemy provides the ability to directly instantiate the pool:
from sqlalchemy import pool
import psycopg2
import sqlite3

def getconn_pg():
    c = psycopg2.connect(database='mydb', user='rick',
                         password='foo')
    return c

def getconn_sl():
    c = sqlite3.connect('devdata.sqlite')
    return c

pool_pg = pool.QueuePool(getconn_pg, use_threadlocal=True)

# SQLite requires use of the SingletonThreadPool
pool_sl = pool.SingletonThreadPool(getconn_sl)
Some of the various pool types that are available in the sqlalchemy.pool module are:
AssertionPool
Allows only one connection to be checked out at a time and raises an AssertionError when this constraint is violated.
NullPool
Does no pooling; instead, actually opens and closes the underlying DB-API connection on each checkout/checkin of a connection.
QueuePool
Maintains a fixed-size connection pool. This is the default connection pool class used for non-SQLite connections.
SingletonThreadPool
Maintains a single connection per thread. It is used with SQLite because this database driver does not handle using a single connection in multiple threads well.
StaticPool
Maintains a single connection that is returned for all connection requests.
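To make the checkout/checkin life cycle concrete, here is a toy fixed-size pool in the spirit of QueuePool, built only from the standard library. This is a sketch of the idea, not SQLAlchemy's actual implementation.

import queue
import sqlite3

class MiniQueuePool:
    """A toy fixed-size pool illustrating checkout/checkin, loosely
    modeled on sqlalchemy.pool.QueuePool (not the real thing)."""

    def __init__(self, creator, pool_size=5, timeout=30):
        self._q = queue.Queue(maxsize=pool_size)
        for _ in range(pool_size):
            self._q.put(creator())
        self._timeout = timeout

    def connect(self):
        # Block up to `timeout` seconds waiting for a free connection.
        return self._q.get(timeout=self._timeout)

    def checkin(self, conn):
        # Return a connection to the pool for reuse.
        self._q.put(conn)

pool = MiniQueuePool(lambda: sqlite3.connect(':memory:'), pool_size=2)
c = pool.connect()
print(c.execute('SELECT 1').fetchone())
pool.checkin(c)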
MetaData
SQLAlchemy provides the MetaData class, which collects objects that describe tables, indexes, and other schema-level objects. Before using any of the higher-level features of SQLAlchemy, such as the SQL query language and the ORM, the schema of the database must be described using metadata. In some cases, you can reflect the structure of schema items into the MetaData from the database. In this case, you need only specify the name of the entity, and its structure will be loaded from the database directly.
Getting Started with MetaData
To create a new MetaData object, you simply call its constructor, possibly with information about how to connect to the database. If the constructor is called with no arguments, the MetaData is considered to be unbound; if it is called with either an Engine or a SQL connection URI, it is considered bound. Shortcuts are available to bound MetaData and to objects within a bound MetaData to facilitate the execution of statements against the bound engine. Most of the time you will probably use a bound MetaData object. However, it is sometimes useful to use an unbound MetaData if you need to connect to multiple database servers, where each server contains the same database schema.
The various ways to construct MetaData objects are illustrated in the following examples:
# Create an unbound MetaData
unbound_meta = MetaData()

# Create an Engine and bind the MetaData to it
db1 = create_engine('sqlite://')
unbound_meta.bind = db1

# Create an Engine and then a bound MetaData
db2 = create_engine('sqlite:///test1.db')
bound_meta1 = MetaData(db2)

# Create a bound MetaData with an implicitly created Engine
bound_meta2 = MetaData('sqlite:///test2.db')
Note that you are never required to bind the MetaData object; all operations that rely on a database connection can also be executed by passing the Engine explicitly as the keyword parameter bind. This is referred to as explicit execution. If a MetaData instance is bound, then the bind parameter can be omitted from method calls that rely on the database connection. This is referred to as implicit execution. The “bound-ness” of a MetaData object is shared by all Tables, Indexes, and Sequences in the MetaData, so a Table attached to a bound MetaData, for instance, would be able to create itself via:
table.create()
whereas a Table in an unbound MetaData would need to supply a bind parameter:
table.create(bind=some_engine_or_connection)
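Here is a runnable sketch of the explicit-execution style. Note that recent SQLAlchemy releases have removed bound MetaData entirely, so the explicit bind= form shown below is the one that still works today; the inspect() helper used to verify the result is part of the modern API.

from sqlalchemy import (MetaData, Table, Column, Integer, Unicode,
                        create_engine, inspect)

metadata = MetaData()              # unbound: no Engine attached
brand_table = Table(
    'brand', metadata,
    Column('id', Integer, primary_key=True),
    Column('name', Unicode(255)))

engine = create_engine('sqlite://')
brand_table.create(bind=engine)    # explicit execution: bind at call time
print(inspect(engine).has_table('brand'))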
Defining Tables
The most common use of the MetaData object is in defining the tables in your schema. To define tables in the MetaData, you use the Table and Column classes as shown in the following example:
from sqlalchemy import *
from datetime import datetime
metadata = MetaData()

user_table = Table(
    'tf_user', metadata,
    Column('id', Integer, primary_key=True),
    Column('user_name', Unicode(16), unique=True, nullable=False),
    Column('email_address', Unicode(255), unique=True, nullable=False),
    Column('password', Unicode(40), nullable=False),
    Column('first_name', Unicode(255), default=''),
    Column('last_name', Unicode(255), default=''),
    Column('created', DateTime, default=datetime.now))
Unlike some other database mapping libraries, SQLAlchemy fully supports the use of composite and noninteger primary and foreign keys:
brand_table = Table(
    'brand', metadata,
    Column('id', Integer, primary_key=True),
    Column('name', Unicode(255), unique=True, nullable=False))

product_table = Table(
    'product', metadata,
    Column('brand_id', Integer, ForeignKey('brand.id'),
           primary_key=True),
    Column('sku', Unicode(80), primary_key=True))

style_table = Table(
    'style', metadata,
    Column('brand_id', Integer, primary_key=True),
    Column('sku', Unicode(80), primary_key=True),
    Column('code', Unicode(80), primary_key=True),
    ForeignKeyConstraint(['brand_id', 'sku'],
                         ['product.brand_id', 'product.sku']))
To actually create a table, you can call the create() method on it. Here, we will create the style table on an in-memory SQLite database and view the generated SQL:
>>> style_table.create(bind=create_engine('sqlite://', echo=True))
2007-08-25 08:05:44,396 INFO sqlalchemy.engine.base.Engine.0x..50
CREATE TABLE style (
brand_id INTEGER NOT NULL,
sku VARCHAR(80) NOT NULL,
code VARCHAR(80) NOT NULL,
PRIMARY KEY (brand_id, sku, code),
FOREIGN KEY(brand_id, sku) REFERENCES product (brand_id, sku)
)
2007-08-25 08:05:44,396 INFO sqlalchemy.engine.base.Engine.0x..50
... None
2007-08-25 08:05:44,397 INFO sqlalchemy.engine.base.Engine.0x..50
... COMMIT
We see that the composite primary key and foreign key constraints are correctly generated. Although the foreign key constraints are ignored by SQLite, it is still useful to generate them, as SQLAlchemy can use this information to perform joins automatically based on the foreign key relationships between tables.
The Table constructor, Table.__init__(self, name, metadata, *args, **kwargs), takes the following arguments:
name
The table name as known by the database (may be combined with the schema parameter).
metadata
The MetaData object to which to attach this table.
*args
The series of Column and Constraint objects to define for this table.
schema
The schema name for this table, if required by the database. In **kwargs, the default is None.
autoload
Indicates whether to reflect the columns from the database. In **kwargs, the default is False.
autoload_with
The Connectable used to autoload the columns. In **kwargs, the default is None.
include_columns
The list of column names (strings) to be reflected if autoload=True. If None, all columns are reflected. If not None, any columns omitted from the list will not be represented on the reflected Table object. In **kwargs, the default is None.
mustexist
Indicates that the table must already be defined elsewhere in the Python application (as part of this MetaData). An exception is raised if this is not true. In **kwargs, the default is False.
useexisting
Directs SQLAlchemy to use the previous Table definition for this table name if it exists elsewhere in the application. (SQLAlchemy disregards the rest of the constructor arguments if this is True.) In **kwargs, the default is False.
owner
Specifies the owning user of the table. This is useful for some databases (such as Oracle) to help with table reflection. In **kwargs, the default is None.
quote
Forces the table name to be escaped and quoted before being sent to the database (useful for table names that conflict with SQL keywords, for example). In **kwargs, the default is False.
quote_schema
Forces the schema name to be escaped and quoted before being sent to the database. In **kwargs, the default is False.
The Table constructor also supports database-specific keyword arguments. For instance, the MySQL driver supports the mysql_engine argument to specify the backend storage engine ('InnoDB' or 'MyISAM', for instance).
Table reflection
Tables can also be defined using reflection from an existing database. This requires a database connection, and so either a bound MetaData must be used, or a Connectable must be supplied via the autoload_with parameter:
db = create_engine('sqlite:///devdata.sqlite')
brand_table = Table('brand', metadata, autoload=True,
                    autoload_with=db)
You can also override the reflected columns if necessary. This can be particularly useful when specifying custom column data types, or when the database's introspection facilities fail to identify certain constraints:
brand_table = Table('brand', metadata,
    Column('name', Unicode(255)),  # override the reflected type
    autoload=True)
If you want to reflect the entire database schema, you may do so by specifying reflect=True in the metadata constructor. Of course, in this case, the MetaData must be created as a bound MetaData. When reflecting an entire schema in this way, the individual tables can be accessed via the MetaData’s tables attribute:
db = create_engine('sqlite:///devdata.sqlite')
metadata = MetaData(bind=db, reflect=True)
brand_table = metadata.tables['brand']
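A runnable sketch of reflection via the reflect() method follows. The MetaData(bind=..., reflect=True) constructor form belongs to older releases and has since been removed, whereas reflect() remains available; the brand table is created first here so there is something to reflect.

from sqlalchemy import (MetaData, Table, Column, Integer, Unicode,
                        create_engine)

db = create_engine('sqlite://')

# Set up a table so the database has something to reflect.
setup_meta = MetaData()
Table('brand', setup_meta,
      Column('id', Integer, primary_key=True),
      Column('name', Unicode(255)))
setup_meta.create_all(bind=db)

# Reflect the existing schema into a fresh MetaData.
metadata = MetaData()
metadata.reflect(bind=db)
brand_table = metadata.tables['brand']
print(sorted(brand_table.c.keys()))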
You can also use the reflect() method of the MetaData to load the schema. MetaData.reflect(bind=None, schema=None, only=None) takes the following arguments:
bind
A Connectable used to access the database; required only when the MetaData is unbound. The default is None.
schema
Specifies an alternate schema from which to reflect tables. The default is None.
only
Directs the MetaData to load only a subset of the available tables. This can be specified either as a sequence of the names to be loaded or as a boolean callable that will be called for each available table with the parameters only(metadata, table name). If the callable returns True, the table will be reflected. The default is None.
The MetaData constructor itself has the definition MetaData.__init__(bind=None, reflect=False).
Column Definitions
The Column constructor Column.__init__(self, name, type_, *args, **kwargs) takes the following arguments:
name
The name of the column as it is known by the database.
type_
The TypeEngine for this column. This can also be None if the column is a ForeignKey, in which case the type will be the same as the referenced column.
*args
Constraint, ForeignKey, ColumnDefault, and Sequence objects that apply to the column.
key
An alias for this column. If specified, the column will be identified everywhere in Python by this name rather than by its SQL-native name. In **kwargs, the default is None.
primary_key
If True, marks the column as part of the primary key. (Alternatively, the Table can have a PrimaryKeyConstraint defined.) In **kwargs, the default is False.
nullable
If set to False, None is not allowed as a value for the column. In **kwargs, the default is True, unless the column is a primary key.
default
A Python callable or a SQL expression language construct specifying a default value for this column. Note that this is an active (Python-generated) default when a callable is specified; the SQL has the generated value inserted as a literal. In **kwargs, the default is None.
index
Indicates that the column is indexed (with an autogenerated index name). Alternatively, use an Index object in the table declaration instead. In **kwargs, the default is False.
unique
Indicates that the column has a unique constraint. Alternatively, use a UniqueConstraint object in the table declaration instead. In **kwargs, the default is False.
onupdate
Specifies an active default value (generated by SQLAlchemy rather than the database server) to be used when updating (but not inserting) a row in the table. In **kwargs, the default is None.
autoincrement
Indicates that integer-based primary keys should have autoincrementing behavior. This is applicable only if the column has no default value and is a type or subtype of Integer. In **kwargs, the default is True.
quote
This forces the column name to be escaped and quoted before being sent to the database (useful for column names that conflict with SQL keywords, for example). In **kwargs, the default is False.
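As a short sketch of the key parameter (the page table and its columns are invented for illustration), the column keeps its SQL name in the database while Python code addresses it by the alias:

from sqlalchemy import MetaData, Table, Column, Integer, Unicode

metadata = MetaData()
page_table = Table(
    'page', metadata,
    Column('id', Integer, primary_key=True),
    # 'key' aliases the column: Python says .title, SQL says page_title
    Column('page_title', Unicode(255), key='title',
           nullable=False, index=True))

print(page_table.c.title.name)   # the SQL-side name is still 'page_title'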
Constraints
SQLAlchemy also supports a variety of constraints, both at the column level and at the table level. All constraints are derived from the Constraint class, and take an optional name parameter.
Note
If the name is not specified, SQLAlchemy auto-generates a suitable name if necessary.
Primary keys
The usual way to declare primary key columns is to specify primary_key=True in the Column constructor:
product_table = Table(
    'product', metadata,
    Column('brand_id', Integer, ForeignKey('brand.id'),
           primary_key=True),
    Column('sku', Unicode(80), primary_key=True))
You can also specify primary keys using the PrimaryKeyConstraint object:
product_table = Table(
    'product', metadata,
    Column('brand_id', Integer, ForeignKey('brand.id')),
    Column('sku', Unicode(80)),
    PrimaryKeyConstraint('brand_id', 'sku', name='prikey'))
To see the SQL generated to create such a table, we can create it on the in-memory SQLite database:
>>> product_table.create(bind=create_engine('sqlite://', echo=True))
2007-08-25 14:26:56,706 INFO sqlalchemy.engine.base.Engine.0x..d0
CREATE TABLE product (
brand_id INTEGER,
sku VARCHAR(80),
CONSTRAINT prikey PRIMARY KEY (brand_id, sku),
FOREIGN KEY(brand_id) REFERENCES brand (id)
)
2007-08-25 14:26:56,706 INFO sqlalchemy.engine.base.Engine.0x..d0
... None
2007-08-25 14:26:56,707 INFO sqlalchemy.engine.base.Engine.0x..d0
... COMMIT
Foreign keys
Foreign keys are references from a row in one table to a row in another table. The usual way to specify simple (noncompound) foreign keys is by passing a ForeignKey object to the Column constructor. The ForeignKey constructor ForeignKey.__init__(self, column, constraint=None, use_alter=False, name=None, onupdate=None, ondelete=None) takes the following parameters:
column
Either a Column object or a database-recognized string, such as tablename.columnname or schemaname.tablename.columnname, that specifies the referenced column.
constraint
The owning ForeignKeyConstraint, if any. If left unspecified, a new ForeignKeyConstraint will be created and added to the parent table. The default is None.
use_alter
Use an ALTER TABLE command to create the constraint (passed along to the owning ForeignKeyConstraint). Otherwise, the constraint will be created in the CREATE TABLE statement. The default is False.
name
The name of the constraint (passed along to the owning ForeignKeyConstraint). The default is None.
onupdate
Generates an ON UPDATE clause in the SQL for the constraint (e.g., onupdate='CASCADE' would generate “ON UPDATE CASCADE” to cascade changes in the referenced columns to the foreign key). Commonly supported values for ON UPDATE are RESTRICT (raise an error), CASCADE (shown previously), SET NULL (set the column to NULL), and SET DEFAULT (set the column to its passive default). The default for this parameter is None. Not supported for all database drivers/backends.
ondelete
Generates an ON DELETE clause in the SQL for the constraint (e.g., ondelete='CASCADE' would generate “ON DELETE CASCADE” to cascade deletions of the referenced row to the row with the foreign key). The default is None. Not supported for all database drivers/backends.
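To see ondelete and onupdate in the generated DDL without touching a database, the CREATE TABLE statement can be compiled directly (CreateTable comes from sqlalchemy.schema; the brand/product tables echo the earlier examples):

from sqlalchemy import MetaData, Table, Column, Integer, ForeignKey
from sqlalchemy.schema import CreateTable

metadata = MetaData()
brand_table = Table(
    'brand', metadata,
    Column('id', Integer, primary_key=True))
product_table = Table(
    'product', metadata,
    Column('id', Integer, primary_key=True),
    # Cascade deletions and key updates from brand to product.
    Column('brand_id', Integer,
           ForeignKey('brand.id', ondelete='CASCADE',
                      onupdate='CASCADE')))

ddl = str(CreateTable(product_table))
print(ddl)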
If you need to reference a compound primary key, SQLAlchemy provides the ForeignKeyConstraint class for increased flexibility. To use the ForeignKeyConstraint, simply pass a list of columns in the local table (the compound foreign key) and a list of columns in the referenced table (the compound primary key):
style_table = Table(
    'style', metadata,
    Column('brand_id', Integer, primary_key=True),
    Column('sku', Unicode(80), primary_key=True),
    Column('code', Unicode(80), primary_key=True),
    ForeignKeyConstraint(
        ['brand_id', 'sku'],
        ['product.brand_id', 'product.sku']))
The ForeignKeyConstraint constructor ForeignKeyConstraint.__init__(self, columns, refcolumns, name=None, onupdate=None, ondelete=None, use_alter=False) takes the same parameters as the ForeignKey constructor except for columns and refcolumns:
columns
Either a list of Column objects or a list of database-recognized strings (such as tablename.columnname or schemaname.tablename.columnname) specifying the columns in the local table that make up the compound foreign key
refcolumns
Either a list of Column objects or a list of database-recognized strings (such as tablename.columnname or schemaname.tablename.columnname) specifying the columns in the remote table that make up the referenced compound primary key
UNIQUE constraints
UniqueConstraint is a more flexible version of specifying unique=True in the Column definition, as it allows multiple columns to participate in a uniqueness constraint:
product_table = Table(
    'product', metadata,
    Column('id', Integer, primary_key=True),
    Column('brand_id', Integer, ForeignKey('brand.id')),
    Column('sku', Unicode(80)),
    UniqueConstraint('brand_id', 'sku'))
The SQL generated is just as we would expect:
>>> product_table.create(bind=create_engine('sqlite://', echo=True))
2007-08-25 13:55:19,450 INFO sqlalchemy.engine.base.Engine.0x..50
CREATE TABLE product (
id INTEGER NOT NULL,
brand_id INTEGER,
sku VARCHAR(80),
PRIMARY KEY (id),
FOREIGN KEY(brand_id) REFERENCES brand (id),
UNIQUE (brand_id, sku)
)
2007-08-25 13:55:19,450 INFO sqlalchemy.engine.base.Engine.0x..50
... None
2007-08-25 13:55:19,451 INFO sqlalchemy.engine.base.Engine.0x..50
COMMIT
CHECK constraints
CheckConstraints can also be specified, either at the column level (in which case they should refer only to the column on which they are defined), or at the Table level (in which case they may refer to any column in the table). CheckConstraints are specified with a text constraint that is passed directly through to the underlying database implementation, so care should be taken if you want to maintain database independence in the presence of CheckConstraints. MySQL and SQLite, in particular, do not actively support such constraints.
For instance, if you wanted to validate that payments were always positive amounts, you might create a payment table similar to the following:
payment_table = Table(
    'payment', metadata,
    Column('amount', Numeric(10, 2), CheckConstraint('amount > 0')))
To see the SQL generated, we can execute the table creation statements on SQLite (recognizing that SQLite will not enforce the CHECK constraint):
>>> payment_table.create(bind=create_engine('sqlite://', echo=True))
2007-08-25 14:13:13,132 INFO sqlalchemy.engine.base.Engine.0x..90
CREATE TABLE payment (
amount NUMERIC(10, 2) CHECK (amount > 0)
)
2007-08-25 14:13:13,133 INFO sqlalchemy.engine.base.Engine.0x..90
... None
2007-08-25 14:13:13,133 INFO sqlalchemy.engine.base.Engine.0x..90
... COMMIT
You can also generate CHECK constraints involving multiple columns:
>>> discount_table = Table(
...     'discount', metadata,
...     Column('original', Numeric(10, 2),
...            CheckConstraint('original > 0')),
...     Column('discounted', Numeric(10, 2),
...            CheckConstraint('discounted > 0')),
...     CheckConstraint('discounted < original',
...                     name='check_constraint_1'))
>>>
>>> discount_table.create(bind=create_engine('sqlite://', echo=True))
2007-08-25 14:17:57,600 INFO sqlalchemy.engine.base.Engine.0x..d0
CREATE TABLE discount (
original NUMERIC(10, 2) CHECK (original > 0),
discounted NUMERIC(10, 2) CHECK (discounted > 0),
CONSTRAINT check_constraint_1 CHECK (discounted < original)
)
2007-08-25 14:17:57,601 INFO sqlalchemy.engine.base.Engine.0x..d0
... None
2007-08-25 14:17:57,602 INFO sqlalchemy.engine.base.Engine.0x..d0
... COMMIT
Defaults
SQLAlchemy provides several methods of generating default values for columns when inserting and updating rows. These default values fall into one of two categories: active defaults or passive defaults.
Active defaults
Active defaults are values that are generated by SQLAlchemy and then sent to the database in a separate statement. Active defaults include constants, Python callables, SQL expressions (including function calls) to be executed before the insert or update, or a pre-executed sequence. In all of these cases, SQLAlchemy manages the generation of the default value and the statement that actually sends the default to the database.
Active defaults are divided into two classes: insert defaults and update defaults, which are specified separately (allowing a different default on insert and update, if that is desired). To specify an insert default, use the default parameter when creating the Column object. default can be a constant, a Python callable, a SQL expression, or a SQL sequence. For instance, to record the time at which a user record was created, you might use the following:
from sqlalchemy import *
from datetime import datetime

user_table = Table(
    'tf_user', MetaData(),
    Column('id', Integer, primary_key=True),
    Column('user_name', Unicode(16), unique=True, nullable=False),
    Column('password', Unicode(40), nullable=False),
    Column('first_name', Unicode(255), default=''),
    Column('last_name', Unicode(255), default=''),
    Column('created_apptime', DateTime, default=datetime.now),
    Column('created_dbtime', DateTime,
           default=func.current_timestamp()),
    Column('modified', DateTime, onupdate=datetime.now))
Here, we have created several defaults with constants, as well as two “created” defaults. One is the standard library function datetime.now(), and the other is the SQL function CURRENT_TIMESTAMP. The created_apptime column, upon insertion, will contain the current time on the application's machine, whereas the created_dbtime column will contain the database server's current time. The SQL generated is illustrative:
>>> e=create_engine('sqlite://', echo=True)
>>> user_table.create(bind=e)
2007-08-25 14:52:17,595 INFO sqlalchemy.engine.base.Engine.0x..50
CREATE TABLE tf_user (
id INTEGER NOT NULL,
user_name VARCHAR(16) NOT NULL,
password VARCHAR(40) NOT NULL,
first_name VARCHAR(255),
last_name VARCHAR(255),
created_apptime TIMESTAMP,
created_dbtime TIMESTAMP,
modified TIMESTAMP,
PRIMARY KEY (id),
UNIQUE (user_name)
)
2007-08-25 14:52:17,596 INFO sqlalchemy.engine.base.Engine.0x..50
... None
2007-08-25 14:52:17,597 INFO sqlalchemy.engine.base.Engine.0x..50
... COMMIT
>>>
>>> e.execute(user_table.insert(), user_name='rick', password='foo')
2007-08-25 14:52:17,604 INFO sqlalchemy.engine.base.Engine.0x..50
... SELECT current_timestamp
2007-08-25 14:52:17,605 INFO sqlalchemy.engine.base.Engine.0x..50 []
2007-08-25 14:52:17,606 INFO sqlalchemy.engine.base.Engine.0x..50
... INSERT INTO tf_user (user_name, password, first_name,
... last_name, created_apptime, created_dbtime) VALUES (?,
... ?, ?, ?, ?, ?)
2007-08-25 14:52:17,606 INFO sqlalchemy.engine.base.Engine.0x..50
... ['rick', 'foo', '', '', '2007-08-25 14:52:17.604140',
... u'2007-08-25 18:52:17']
2007-08-25 14:52:17,607 INFO sqlalchemy.engine.base.Engine.0x..50
... COMMIT
<sqlalchemy.engine.base.ResultProxy object at 0x2aff31673690>
>>> e.execute(user_table.update(user_table.c.user_name=='rick'),
... password='secret')
2007-08-25 15:01:48,804 INFO sqlalchemy.engine.base.Engine.0x..50
... UPDATE tf_user SET password=?, modified=?
... WHERE
... tf_user.user_name = ?
2007-08-25 15:01:48,805 INFO sqlalchemy.engine.base.Engine.0x..50
... ['secret', '2007-08-25 15:01:48.774391', 'rick']
2007-08-25 15:01:48,805 INFO sqlalchemy.engine.base.Engine.0x..50
... COMMIT
<sqlalchemy.engine.base.ResultProxy object at 0x2adaf2551e50>
>>>
The SQL generated for the table creation had no reference to the default values. This is because these values were active defaults, as opposed to the passive defaults covered in the next section.
The current_timestamp is selected from the database for use in the insert statement.
Two different timestamps are sent to the database, one for created_apptime, and one for created_dbtime. In this case, the application machine’s native Python time resolution is greater than the current_timestamp provided by SQLite.
Though we did not specify an update to the modified column, SQLAlchemy provides an update value based on the onupdate parameter of the column definition.
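The bookkeeping SQLAlchemy performs for active defaults can be sketched in plain Python. This is a minimal, SQLAlchemy-free illustration of the rule the examples above rely on: at INSERT time, a constant default is used as-is, while a callable default is invoked to produce a fresh value (SQL-expression defaults, which SQLAlchemy pre-executes against the database, are not modeled here). The function name resolve_default is hypothetical, not part of the SQLAlchemy API.

```python
from datetime import datetime

def resolve_default(default):
    """Mimic how an active insert default is resolved: callables
    (like datetime.now) are invoked per-row; constants (like '')
    are used as-is. This is a sketch, not SQLAlchemy internals."""
    return default() if callable(default) else default

# A constant default, as used for first_name/last_name above:
first_name = resolve_default('')

# A callable default, as used for created_apptime above -- each
# insert gets a newly computed timestamp:
created_apptime = resolve_default(datetime.now)
```

This is why created_apptime differs between rows inserted at different times, while first_name is always the same empty string unless overridden.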
Passive defaults
Passive defaults are default values provided by the database itself. If a column is marked with a PassiveDefault instance, then the column will have a database-level default value, and SQLAlchemy will make the Engine aware of the passive default. The Engine will, in turn, mark the ResultProxy as having passive default values. The ResultProxy is actually inspected by the object-relational mapping system to determine whether to refetch the row after an insert to get the default column values.
We can enhance the previous example by providing a passive default for the created_dbtime column:
from sqlalchemy import *
from datetime import datetime
user_table = Table(
'tf_user', MetaData(),
Column('id', Integer, primary_key=True),
Column('user_name', Unicode(16), unique=True, nullable=False),
Column('password', Unicode(40), nullable=False),
Column('first_name', Unicode(255), default=''),
Column('last_name', Unicode(255), default=''),
Column('created_apptime', DateTime, default=datetime.now),
Column('created_dbtime', DateTime, PassiveDefault('current_timestamp')),
Column('modified', DateTime, onupdate=datetime.now))
Again, it is illustrative to see the creation and manipulation SQL:
>>> e=create_engine('sqlite://', echo=True)
>>> user_table.create(bind=e)
2007-08-25 15:50:49,912 INFO sqlalchemy.engine.base.Engine.0x..50
CREATE TABLE tf_user (
id INTEGER NOT NULL,
user_name VARCHAR(16) NOT NULL,
password VARCHAR(40) NOT NULL,
first_name VARCHAR(255),
last_name VARCHAR(255),
created_apptime TIMESTAMP,
created_dbtime TIMESTAMP DEFAULT current_timestamp,
modified TIMESTAMP,
PRIMARY KEY (id),
UNIQUE (user_name)
)
2007-08-25 15:50:49,912 INFO sqlalchemy.engine.base.Engine.0x..50
... None
2007-08-25 15:50:49,913 INFO sqlalchemy.engine.base.Engine.0x..50
... COMMIT
>>>
>>> rs = e.execute(user_table.insert(), user_name='rick',
... password='foo')
2007-08-25 15:50:49,930 INFO sqlalchemy.engine.base.Engine.0x..50
... INSERT INTO tf_user (user_name, password, first_name,
... last_name, created_apptime) VALUES (?, ?, ?, ?, ?)
2007-08-25 15:50:49,931 INFO sqlalchemy.engine.base.Engine.0x..50
... ['rick', 'foo', '', '', '2007-08-25 15:50:49.930339']
2007-08-25 15:50:49,932 INFO sqlalchemy.engine.base.Engine.0x..50
... COMMIT
>>> if rs.lastrow_has_defaults():
... prikey = rs.last_inserted_ids()
... row = e.execute(user_table.select(
... user_table.c.id == prikey[0])).fetchone()
... print 'Created at', row.created_dbtime
...
2007-08-25 15:50:50,966 INFO sqlalchemy.engine.base.Engine.0x..50
... SELECT tf_user.id, tf_user.user_name, tf_user.password,
... tf_user.first_name, tf_user.last_name, tf_user.created_apptime,
... tf_user.created_dbtime, tf_user.modified
FROM tf_user
WHERE tf_user.id = ?
2007-08-25 15:50:50,966 INFO sqlalchemy.engine.base.Engine.0x..50
... [1]
Created at 2007-08-25 19:50:49
The SQL generated for the table creation does contain a reference to the default created_dbtime, unlike the active default example.
The created_dbtime is not provided to the database in the insert statement; it will be provided by the database itself.
The result set is flagged as having a passive default via the lastrow_has_defaults() function, and so we recognize the need to fetch the row back from the database.
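What the passive default amounts to at the SQL level can be reproduced with the standard library’s sqlite3 module alone, with no SQLAlchemy involved. This sketch mirrors the tf_user example: the DDL carries a DEFAULT clause, the INSERT omits the column, and the database fills in the value, which we then read back.

```python
import sqlite3

conn = sqlite3.connect(':memory:')

# The DEFAULT clause in the DDL is the "passive" default -- it lives
# in the database, matching the CREATE TABLE output shown above.
conn.execute("""
    CREATE TABLE tf_user (
        id INTEGER PRIMARY KEY,
        user_name TEXT NOT NULL,
        created_dbtime TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    )""")

# The INSERT never mentions created_dbtime; the database supplies it.
conn.execute("INSERT INTO tf_user (user_name) VALUES ('rick')")

# To see the defaulted value, the row must be fetched back -- the
# same refetch SQLAlchemy performs after lastrow_has_defaults().
row = conn.execute(
    "SELECT user_name, created_dbtime FROM tf_user").fetchone()
```

This also makes the refetch requirement concrete: the application never computed created_dbtime, so the only way to learn it is a SELECT after the INSERT.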
Caution
PostgreSQL does not support passive defaults for primary keys. This is because SQLAlchemy does not use PostgreSQL OIDs to determine the identity of inserted rows (OIDs are actually disabled by default in PostgreSQL version 8), and psycopg2’s cursor.lastrowid only returns OIDs. Thus, the only way to know the primary key of a row being inserted is to provide it as an active default.
Defining Indexes
Once your database grows to a certain size, you will probably need to consider adding indexes to your tables to speed up certain selects. The easiest way to index a column is to simply specify index=True when defining the Column:
user_table = Table(
'tf_user', MetaData(),
Column('id', Integer, primary_key=True),
Column('user_name', Unicode(16), unique=True, nullable=False,
index=True),
Column('password', Unicode(40), nullable=False),
Column('first_name', Unicode(255), default=''),
Column('last_name', Unicode(255), default='', index=True))
In this case, the index will be created with an auto-generated name. If a column is defined with both index=True and unique=True, then the UNIQUE constraint is created on the index rather than on the column. The SQL generated for the previous table definition is illustrative:
>>> e = create_engine('sqlite://', echo=True)
>>> user_table.create(bind=e)
2007-08-25 16:30:36,542 INFO sqlalchemy.engine.base.Engine.0x..90
CREATE TABLE tf_user (
id INTEGER NOT NULL,
user_name VARCHAR(16) NOT NULL,
password VARCHAR(40) NOT NULL,
first_name VARCHAR(255),
last_name VARCHAR(255),
PRIMARY KEY (id)
)
2007-08-25 16:30:36,542 INFO sqlalchemy.engine.base.Engine.0x..90
... None
2007-08-25 16:30:36,543 INFO sqlalchemy.engine.base.Engine.0x..90
... COMMIT
2007-08-25 16:30:36,544 INFO sqlalchemy.engine.base.Engine.0x..90
... CREATE UNIQUE INDEX ix_tf_user_user_name ON tf_user
... (user_name)
2007-08-25 16:30:36,544 INFO sqlalchemy.engine.base.Engine.0x..90
... None
2007-08-25 16:30:36,545 INFO sqlalchemy.engine.base.Engine.0x..90
... COMMIT
2007-08-25 16:30:36,546 INFO sqlalchemy.engine.base.Engine.0x..90
... CREATE INDEX ix_tf_user_last_name ON tf_user (last_name)
2007-08-25 16:30:36,546 INFO sqlalchemy.engine.base.Engine.0x..90
... None
2007-08-25 16:30:36,547 INFO sqlalchemy.engine.base.Engine.0x..90
... COMMIT
The Index object
Although the index=True syntax is convenient in column definition, SQLAlchemy also provides an independent Index object, which can be used to:
§ Define indexes on multiple columns
§ Define named indexes
§ Create indexes independently of the table (useful when adding an index to an existing table)
To create an index using the Index object, simply instantiate the object using the column attributes of the table.c object:
i = Index('idx_name', user_table.c.first_name,
user_table.c.last_name,
unique=True)
If the index is defined before the table is created, then the index will be created along with the table. Otherwise, you can create the index independently via its own create() function:
>>> i = Index('idx_name', user_table.c.first_name,
... user_table.c.last_name,
... unique=True)
>>> i.create(bind=e)
2007-08-25 16:30:36,566 INFO sqlalchemy.engine.base.Engine.0x..90
... CREATE UNIQUE INDEX idx_name ON tf_user (first_name, last_name)
2007-08-25 16:30:36,566 INFO sqlalchemy.engine.base.Engine.0x..90
... None
2007-08-25 16:30:36,567 INFO sqlalchemy.engine.base.Engine.0x..90
... COMMIT
Index("idx_name", Column('first_name', Unicode(length=255),
... default=ColumnDefault('')),
... Column('last_name',Unicode(length=255),
... default=ColumnDefault('')), unique=True)
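The effect of the multicolumn unique index created above can be demonstrated at the SQL level with the stdlib sqlite3 module directly. This is a sketch, reusing the idx_name and column names from the example: the index is created with its own DDL statement (the equivalent of Index.create()), and a duplicate (first_name, last_name) pair is then rejected by the database.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE tf_user (first_name TEXT, last_name TEXT)")

# The named, multicolumn, unique index -- the same DDL SQLAlchemy
# emitted for Index('idx_name', ..., unique=True) above.
conn.execute(
    "CREATE UNIQUE INDEX idx_name ON tf_user (first_name, last_name)")

conn.execute("INSERT INTO tf_user VALUES ('rick', 'copeland')")
try:
    # Same pair of values: the unique index rejects the row.
    conn.execute("INSERT INTO tf_user VALUES ('rick', 'copeland')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Note that uniqueness applies to the column pair as a whole: a second 'rick' with a different last_name would still be accepted.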
Creating Explicit Sequences
In our examples up to this point, to generate a unique integer key for inserted rows, we have simply specified that the table’s primary key was an integer value. In this case, SQLAlchemy does what is generally The Right Thing: it either generates a column with an auto-incrementing data type (AUTOINCREMENT, SERIAL, etc.) if one is available in the Dialect being used, or, if an auto-incrementing data type is not available (as in the case of PostgreSQL and Oracle), it implicitly generates a sequence and fetches values from that sequence.
SQLAlchemy also provides for the explicit use of a Sequence object to generate default values for columns (not just primary keys). To use such a sequence, simply add it to the parameter list of the Column object:
brand_table = Table(
'brand', metadata,
Column('id', Integer, Sequence('brand_id_seq'), primary_key=True),
Column('name', Unicode(255), unique=True, nullable=False))
The SQL generated to create this table is:
>>> e = create_engine('postgres://postgres:password@localhost/test',
... echo=True)
>>>
>>> brand_table.create(bind=e)
2007-08-25 18:25:40,624 INFO sqlalchemy.engine.base.Engine.0x..d0
... CREATE SEQUENCE brand_id_seq
2007-08-25 18:25:40,624 INFO sqlalchemy.engine.base.Engine.0x..d0
... None
2007-08-25 18:25:40,630 INFO sqlalchemy.engine.base.Engine.0x..d0
... COMMIT
2007-08-25 18:25:40,634 INFO sqlalchemy.engine.base.Engine.0x..d0
CREATE TABLE brand (
id INTEGER,
name VARCHAR(255) NOT NULL,
UNIQUE (name)
)
2007-08-25 18:25:40,635 INFO sqlalchemy.engine.base.Engine.0x..d0
... None
2007-08-25 18:25:40,659 INFO sqlalchemy.engine.base.Engine.0x..d0
... COMMIT
The parameters accepted by the Sequence constructor Sequence.__init__(name, start=None, increment=None, optional=False, quote=False, for_update=False) are as follows:
name
The name of the sequence to be created.
start
The initial value of the sequence being created (default None). This may be ignored, depending on the Dialect.
increment
The increment value of the sequence being created (default None). This may be ignored, depending on the Dialect.
optional
If True, this specifies that the sequence should be used only if it is necessary (e.g., if no other method of generating autoincrementing columns is possible). The default is False.
quote
This forces the sequence name to be escaped and quoted before being sent to the database (useful for names that conflict with SQL keywords, for example). The default is False.
for_update
Uses the sequence when updating the row, not just when inserting. The default is False.
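The name, start, and increment semantics just listed can be sketched in plain Python. This SimpleSequence class is purely illustrative (not a SQLAlchemy class): a real sequence is a database object whose counter persists across connections and restarts, which is exactly what a Python counter cannot provide.

```python
class SimpleSequence:
    """An in-memory sketch of sequence semantics. Real sequences
    live in the database (CREATE SEQUENCE brand_id_seq) and keep
    counting across sessions; this object forgets on restart."""

    def __init__(self, name, start=1, increment=1):
        self.name = name
        self._next = start
        self._increment = increment

    def next_value(self):
        # Each call hands out the current value and advances the
        # counter, like SELECT nextval('brand_id_seq') in PostgreSQL.
        value = self._next
        self._next += self._increment
        return value

seq = SimpleSequence('brand_id_seq', start=10, increment=5)
values = [seq.next_value() for _ in range(3)]  # 10, 15, 20
```

Because the database owns the counter, concurrent sessions drawing from the same sequence never receive the same value, a guarantee this sketch does not attempt to model.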
MetaData Operations
SQLAlchemy uses the MetaData object internally for several purposes, particularly inside the object relational mapper (ORM), which is covered in Chapter 6. MetaData can also be used in connection with Engine and other Connectable instances to create or drop tables, indexes, and sequences from the database.
Binding MetaData
As mentioned previously, MetaData can be bound to a database Engine. This is done in one of three ways:
§ Specify the Engine URI in the MetaData constructor
§ Specify an actual Engine or other Connectable object in the MetaData constructor
§ Assign the bind attribute of an “unbound” MetaData to an Engine or other Connectable
The various ways of binding MetaData are illustrated in the following examples:
# Create a bound MetaData with an implicitly created engine
bound_meta2 = MetaData('sqlite:///test2.db')
# Create an engine and then a bound MetaData
db2 = create_engine('sqlite:///test1.db')
bound_meta1 = MetaData(db2)
# Create an unbound MetaData
unbound_meta = MetaData()
# Create an Engine and bind the MetaData to it
db1 = create_engine('sqlite://')
unbound_meta.bind = db1
Binding the MetaData object to an engine allows the MetaData and the objects attached to it (Tables, Indexes, Sequences, etc.) to perform database operations without explicitly specifying an Engine:
from sqlalchemy import *
metadata = MetaData('sqlite://')
user_table = Table(
'tf_user', metadata,
Column('id', Integer, primary_key=True),
Column('user_name', Unicode(16), unique=True, nullable=False,
... index=True),
Column('password', Unicode(40), nullable=False),
Column('first_name', Unicode(255), default=''),
Column('last_name', Unicode(255), default='', index=True))
user_table.create() # we can omit the bind parameter
Create/drop MetaData and schema objects
Bound and unbound MetaData objects can create and drop schema objects either by using the create() and drop() methods on the objects, or by using the MetaData methods create_all() and drop_all(). The schema objects’ (Table, Index, and Sequence) create() and drop() methods take the following keyword parameters:
bind
The Engine on which to execute the schema item creation (default is None).
checkfirst
Add an IF NOT EXISTS or IF EXISTS clause, whichever is appropriate to the SQL generated (not supported for Indexes). The default is False.
The MetaData object itself supports the following arguments to its create_all() and drop_all() methods:
bind
The Engine on which to execute the operation. The default is None.
tables
The Table objects to create/drop. If not specified, create/drop all schema items known to the MetaData. The default is None.
checkfirst
Add an IF NOT EXISTS or IF EXISTS clause (whichever is appropriate to the SQL generated). The default is False.
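At the SQL level, checkfirst=True corresponds to guarding the DDL so that re-running it is harmless. A sketch with the stdlib sqlite3 module, using SQLite's IF NOT EXISTS clause directly:

```python
import sqlite3

conn = sqlite3.connect(':memory:')

# The guarded form of the DDL -- roughly what checkfirst=True buys
# you: creating a table that already exists is a no-op, not an error.
ddl = "CREATE TABLE IF NOT EXISTS tf_user (id INTEGER PRIMARY KEY)"
conn.execute(ddl)
conn.execute(ddl)  # second run succeeds silently

tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")]
```

Without the IF NOT EXISTS clause (checkfirst=False, the default), the second execution would raise an error because tf_user already exists.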
Adapt Tables from one MetaData to another
A table that has been created against one MetaData can be adapted to another MetaData via the Table.tometadata(self, metadata, schema=None) method. This can be useful when working with the same schema in more than one Engine because it allows you to have bound MetaData and Tables for both engines. You can also use the MetaData.table_iterator() method to reflect an entire schema into another engine, for example:
meta1 = MetaData('postgres://postgres:password@localhost/test',
reflect=True)
meta2 = MetaData('sqlite://')
for table in meta1.table_iterator():
    table.tometadata(meta2)
meta2.create_all()