Generating and Using Sequences - MySQL Cookbook (2007)

MySQL Cookbook (2007)

Chapter 11. Generating and Using Sequences

Introduction

A sequence is a set of integers (1, 2, 3, ...) that are generated in order on demand. Sequences are frequently used in databases because many applications require each row in a table to contain a unique value, and sequences provide an easy way to generate them. This chapter describes how to use sequences in MySQL. It covers the following topics:

Using AUTO_INCREMENT columns to create sequences

The AUTO_INCREMENT column is MySQL’s mechanism for generating a sequence over a set of rows. Each time you create a row in a table that contains an AUTO_INCREMENT column, MySQL automatically generates the next value in the sequence as the column’s value. This value serves as a unique identifier, making sequences an easy way to create items such as customer ID numbers, shipping package waybill numbers, invoice or purchase order numbers, bug report IDs, ticket numbers, or product serial numbers.

Retrieving sequence values

For many applications, it’s not enough just to create sequence values. It’s also necessary to determine the sequence value for a just-inserted row. A web application may need to redisplay to a user the contents of a row created from the contents of a form just submitted by the user. The value may also need to be retrieved so it can be stored in rows of a related table.

Resequencing techniques

This topic describes how to renumber a sequence that has holes in it due to row deletions, and also discusses reasons to avoid resequencing. Other topics include starting sequences at values other than 1 and adding a sequence column to a table that doesn’t have one.

Using an AUTO_INCREMENT column to create multiple sequences

In many cases, the AUTO_INCREMENT column in a table is independent of other columns, and its values increment throughout the table in a single monotonic sequence. However, if you create a multiple-column index that contains an AUTO_INCREMENT column, you can use it to generate multiple sequences. For example, if you run a bulletin board that categorizes messages into topics, you can number messages sequentially within each topic by tying an AUTO_INCREMENT column to a topic indicator column.

Managing multiple simultaneous AUTO_INCREMENT values

Special care is necessary when you need to keep track of multiple sequence values. This can occur when you issue a set of statements that affect a single table or when creating rows in multiple tables that each have an AUTO_INCREMENT column. This topic describes what to do in these cases.

Using single-row sequence generators

Sequences also can be used as counters. For example, if you serve banner ads on your web site, you might increment a counter for each impression (that is, for each time you serve an ad). The counts for a given ad form a sequence, but because the count itself is the only value of interest, there is no need to generate a new row to record each impression. MySQL provides a solution for this problem using a mechanism that enables a sequence to be easily generated within a single table row over time. To store multiple counters in the table, add a column that identifies each counter uniquely. The same mechanism also enables creation of sequences that increase by values other than one, by nonuniform values, or even by negative increments.

Numbering query output rows sequentially

This topic suggests ways to generate display-only sequences for the purpose of numbering the rows of output from a query.

The engines for most database systems provide sequence-generation capabilities, although the implementations tend to be engine-dependent. That’s true for MySQL as well, so the material in this section is almost completely MySQL-specific, even at the SQL level. In other words, the SQL for generating sequences is itself nonportable, even if you use an API such as DBI or JDBC that provides an abstraction layer. Abstract interfaces may help you process SQL statements portably, but they don’t make nonportable SQL portable.

Scripts related to the examples shown in this chapter are located in the sequences directory of the recipes distribution. For scripts that create the tables used here, look in the tables directory.

Creating a Sequence Column and Generating Sequence Values

Problem

You want to include a sequence column in a table.

Solution

Use an AUTO_INCREMENT column.

Discussion

This section provides the basic background on how AUTO_INCREMENT columns work, beginning with a short example that demonstrates the sequence-generation mechanism. The illustration centers around a bug-collection scenario: your son (eight-year-old Junior) is assigned the task of collecting insects for a class project at school. For each insect, Junior is to record its name (“ant,” “bee,” and so forth), and its date and location of collection. You have expounded the benefits of MySQL for record-keeping to Junior since his early days, so upon your arrival home from work that day, he immediately announces the necessity of completing this project and then, looking you straight in the eye, declares that it’s clearly a task for which MySQL is well-suited. Who are you to argue? So the two of you get to work. Junior already collected some specimens after school while waiting for you to come home and has recorded the following information in his notebook:

Name

Date

Origin

millipede

2006-09-10

driveway

housefly

2006-09-10

kitchen

grasshopper

2006-09-10

front yard

stink bug

2006-09-10

front yard

cabbage butterfly

2006-09-10

garden

ant

2006-09-10

back yard

ant

2006-09-10

back yard

millbug

2006-09-10

under rock

Looking over Junior’s notes, you’re pleased to see that even at his tender age, he has learned to write dates in ISO format. However, you also notice that he’s collected a millipede and a millbug, neither of which actually are insects. You decide to let this pass for the moment; Junior forgot to bring home the written instructions for the project, so at this point it’s unclear whether these specimens are acceptable.

As you consider how to create a table to store this information, it’s apparent that you need at least name, date, and origin columns corresponding to the types of information that Junior is required to record:

CREATE TABLE insect

(

name VARCHAR(30) NOT NULL, # type of insect

date DATE NOT NULL, # date collected

origin VARCHAR(30) NOT NULL # where collected

);

However, those columns are not enough to make the table easy to use. Note that the records collected thus far are not unique; both ants were collected at the same time and place. If you put the information into an insect table that has the structure just shown, neither ant row can be referred to individually, because there’s nothing to distinguish them from one another. Unique IDs would be helpful to make the rows distinct and to provide values that make each row easy to refer to. An AUTO_INCREMENT column is good for this purpose, so a better insect table has a structure like this:

CREATE TABLE insect

(

id INT UNSIGNED NOT NULL AUTO_INCREMENT,

PRIMARY KEY (id),

name VARCHAR(30) NOT NULL, # type of insect

date DATE NOT NULL, # date collected

origin VARCHAR(30) NOT NULL # where collected

);

Go ahead and create the insect table using this second definition. In Choosing the Data Type for a Sequence Column, we’ll discuss the specifics of why the id column is declared the way it is.

Now that you have an AUTO_INCREMENT column, you want to use it to generate new sequence values. One of the useful properties of an AUTO_INCREMENT column is that you don’t have to assign its values yourself: MySQL does so for you. There are two ways to generate newAUTO_INCREMENT values, demonstrated here using the id column of the insect table. First, you can explicitly set the id column to NULL. The following statement inserts the first four of Junior’s specimens into the insect table this way:

mysql>INSERT INTO insect (id,name,date,origin) VALUES

-> (NULL,'housefly','2006-09-10','kitchen'),

-> (NULL,'millipede','2006-09-10','driveway'),

-> (NULL,'grasshopper','2006-09-10','front yard'),

-> (NULL,'stink bug','2006-09-10','front yard');

Second, you can omit the id column from the INSERT statement entirely. In MySQL, you can create new rows without explicitly specifying values for columns that have a default value. MySQL assigns the default value to each missing column automatically, and the default for anAUTO_INCREMENT column happens to be the next sequence number. Thus, you can insert rows into the insect table without naming the id column at all. This statement adds Junior’s other four specimens to the insect table that way:

mysql>INSERT INTO insect (name,date,origin) VALUES

-> ('cabbage butterfly','2006-09-10','garden'),

-> ('ant','2006-09-10','back yard'),

-> ('ant','2006-09-10','back yard'),

-> ('millbug','2006-09-10','under rock');

Whichever method you use, MySQL determines the next sequence number for each row and assigns it to the id column, as you can verify for yourself:

mysql>SELECT * FROM insect ORDER BY id;

+----+-------------------+------------+------------+

| id | name | date | origin |

+----+-------------------+------------+------------+

| 1 | housefly | 2006-09-10 | kitchen |

| 2 | millipede | 2006-09-10 | driveway |

| 3 | grasshopper | 2006-09-10 | front yard |

| 4 | stink bug | 2006-09-10 | front yard |

| 5 | cabbage butterfly | 2006-09-10 | garden |

| 6 | ant | 2006-09-10 | back yard |

| 7 | ant | 2006-09-10 | back yard |

| 8 | millbug | 2006-09-10 | under rock |

+----+-------------------+------------+------------+

As Junior collects more specimens, you can add more rows to the table and they’ll be assigned the next values in the sequence (9, 10, ...).

The concept underlying AUTO_INCREMENT columns is simple enough in principle: each time you create a new row, MySQL generates the next number in the sequence and assigns it to the row. But there are certain subtleties to know about, as well as differences in how AUTO_INCREMENTsequences are handled for different storage engines. By being aware of these issues, you can use sequences more effectively and avoid surprises. For example, if you explicitly set the id column to a non-NULL value, one of two things happens:

§ If the value is already present in the table, an error occurs if the column cannot contain duplicates. For the insect table, the id column is a PRIMARY KEY, so duplicates are not allowed:

§ mysql>INSERT INTO insect (id,name,date,origin) VALUES

§ -> (3,'cricket','2006-09-11','basement');

ERROR 1062 (23000): Duplicate entry '3' for key 1

§ If the value is not present in the table, MySQL inserts the row using that value. In addition, if the value is larger than the current sequence counter, the table’s counter is reset to the value plus one. The insect table at this point has sequence values 1 through 8. If you insert a new row with the id column set to 20, that becomes the new maximum value. Subsequent inserts that automatically generate id values will begin at 21. The values 9 through 19 become unused, resulting in a gap in the sequence.

The next recipe looks in more detail at how to define AUTO_INCREMENT columns and how they behave.

Choosing the Data Type for a Sequence Column

Problem

You want to know more about how to define a sequence column.

Solution

Use the guidelines given here.

Discussion

You should follow certain guidelines when creating an AUTO_INCREMENT column. As an illustration, consider how the id column in the insect table was declared:

id INT UNSIGNED NOT NULL AUTO_INCREMENT,

PRIMARY KEY (id)

The AUTO_INCREMENT keyword informs MySQL that it should generate successive sequence numbers for the column’s values, but the other information is important, too:

§ INT is the column’s base data type. You need not necessarily useINT, but the column must be one of the integer types:TINYINT, SMALLINT, MEDIUMINT, INT, or BIGINT. It’s important to remember that AUTO_INCREMENT is a column attribute that should be applied only to integer types.

§ The column is declared as UNSIGNED to disallow negative values. This is not a requirement for an AUTO_INCREMENT column. However, there is no reason to allow negative values because sequences consist only of positive integers (normally beginning at 1). Furthermore, not declaring the column to be UNSIGNED cuts the range of your sequence in half. For example, TINYINT has a range of –128 to 127. Sequences include only positive values, so the range of a TINYINT sequence would be 1 to 127. The range of a TINYINT UNSIGNED column is 0 to 255, which increases the upper end of the sequence to 255. The maximum sequence value is determined by the specific integer type used, so you should choose a type that is big enough to hold the largest value you’ll need. The maximum unsigned value of each integer type is shown in the following table, which you can use to select an appropriate type.

Data type

Maximum unsigned value

TINYINT

255

SMALLINT

65,535

MEDIUMINT

16,777,215

INT

4,294,967,295

BIGINT

18,446,744,073,709,551,615

§ Sometimes people omit UNSIGNED so that they can create rows that contain negative numbers in the sequence column. (Using –1 to signify “has no ID” is an instance of this.) This is a bad idea. MySQL makes no guarantees about how negative numbers will be treated in anAUTO_INCREMENT column, so you’re playing with fire if you try to use them. For example, if you resequence the column, you’ll find that all your negative values get turned into positive sequence numbers.

§ AUTO_INCREMENT columns cannot contain NULL values, so id is declared as NOTNULL. (It’s true that you can specify NULL as the column value when you insert a new row, but for an AUTO_INCREMENT column, that really means “generate the next sequence value.”) MySQL automatically defines AUTO_INCREMENT columns as NOTNULL if you forget to.

§ AUTO_INCREMENT columns must be indexed. Normally, because a sequence column exists to provide unique identifiers, you use a PRIMARYKEY or UNIQUE index to enforce uniqueness. Tables can have only one PRIMARYKEY, so if the table already has some other PRIMARYKEY column, you can declare an AUTO_INCREMENT column to have a UNIQUE index instead:

§ id INT UNSIGNED NOT NULL AUTO_INCREMENT,

UNIQUE (id)

If the AUTO_INCREMENT column is the only column in the PRIMARY KEY or UNIQUE index, you can declare it as such in the column definition rather than in a separate clause. For example, these definitions are equivalent:

id INT UNSIGNED NOT NULL AUTO_INCREMENT,

PRIMARY KEY (id)

id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY

As are these:

id INT UNSIGNED NOT NULL AUTO_INCREMENT UNIQUE

id INT UNSIGNED NOT NULL AUTO_INCREMENT,

UNIQUE (id)

Using a separate clause to specify the index helps to emphasize that it’s not, strictly speaking, part of the column definition.

When you create a table that contains an AUTO_INCREMENT column, it’s also important to consider which storage engine to use (MyISAM, InnoDB, and so forth). The engine affects behaviors such as reuse of values that are deleted from the top of the sequence and whether you can set the initial sequence value. In general, MyISAM is the best storage engine for tables that contain AUTO_INCREMENT columns because it offers the greatest flexibility for sequence management. This will become apparent in the rest of the chapter.

The Effect of Row Deletions on Sequence Generation

Problem

You want to know what happens to a sequence when you delete rows from a table that contains an AUTO_INCREMENT column.

Solution

It depends on which rows you delete and on the storage engine.

Discussion

We have thus far considered how sequence values in an AUTO_INCREMENT column are generated for circumstances where rows are only added to a table. But it’s unrealistic to assume that rows will never be deleted. What happens to the sequence then?

Refer again to Junior’s bug-collection project, for which you currently have an insect table that looks like this:

mysql>SELECT * FROM insect ORDER BY id;

+----+-------------------+------------+------------+

| id | name | date | origin |

+----+-------------------+------------+------------+

| 1 | housefly | 2006-09-10 | kitchen |

| 2 | millipede | 2006-09-10 | driveway |

| 3 | grasshopper | 2006-09-10 | front yard |

| 4 | stink bug | 2006-09-10 | front yard |

| 5 | cabbage butterfly | 2006-09-10 | garden |

| 6 | ant | 2006-09-10 | back yard |

| 7 | ant | 2006-09-10 | back yard |

| 8 | millbug | 2006-09-10 | under rock |

+----+-------------------+------------+------------+

That’s about to change because after Junior remembers to bring home the written instructions for the project, you read through them and discover two things that affect the insect table’s contents:

§ Specimens should include only insects, not other insect-like creatures such as millipedes and millbugs.

§ The purpose of the project is to collect as many different specimens as possible, not just as many specimens as possible. This means that only one ant row is allowed.

These instructions require that a few rows be removed from the insect table—specifically those with id values 2 (millipede), 8 (millbug), and 7 (duplicate ant). Thus, despite Junior’s evident disappointment at the reduction in the size of his collection, you instruct him to remove those rows by issuing a DELETE statement:

mysql>DELETE FROM insect WHERE id IN (2,8,7);

This statement illustrates one reason why it’s useful to have unique ID values: they enable you to specify any row unambiguously. The ant rows are identical except for the id value. Without that column in the insect table, it would be more difficult to delete just one of them.

After the unsuitable rows have been removed, the resulting table contents become:

mysql>SELECT * FROM insect ORDER BY id;

+----+-------------------+------------+------------+

| id | name | date | origin |

+----+-------------------+------------+------------+

| 1 | housefly | 2006-09-10 | kitchen |

| 3 | grasshopper | 2006-09-10 | front yard |

| 4 | stink bug | 2006-09-10 | front yard |

| 5 | cabbage butterfly | 2006-09-10 | garden |

| 6 | ant | 2006-09-10 | back yard |

+----+-------------------+------------+------------+

The sequence in the id column now has a hole (row 2 is missing) and the values 7 and 8 at the top of the sequence are no longer present. How do these deletions affect future insert operations? What sequence number will the next new row get?

Removing row 2 created a gap in the middle of the sequence. This has no effect on subsequent inserts, because MySQL makes no attempt to fill in holes in a sequence. On the other hand, deleting rows 7 and 8 removes values at the top of the sequence, and the effect of this depends on the storage engine:

§ With BDB tables, the next sequence number always is the maximum integer currently present in the column plus one. If you delete rows containing values at the top of the sequence, those values will be reused. (Thus, after deleting rows with values 7 and 8, the next inserted row will be assigned the value 7.)

§ For MyISAM or InnoDB tables, values are not reused. The next sequence number is the smallest positive integer that has not previously been used. (For a sequence that stands at 8, the next row gets a value of 9 even if you delete rows 7 and 8 first.) If you require strictly monotonic sequences, you should use one of these storage engines.

If a table uses an engine that differs in value-reuse behavior from the behavior you require, use ALTER TABLE to change the table to a more appropriate engine. For example, if you want to change a BDB table to be a MyISAM table (to prevent sequence values from being reused after rows are deleted), do this:

ALTER TABLEtbl_name ENGINE = MyISAM;

If you don’t know what engine a table uses, consult INFORMATION_SCHEMA or use SHOW TABLE STATUS or SHOW CREATE TABLE to find out. For example, the following statement indicates that insect is a MyISAM table:

mysql>SELECT ENGINE FROM INFORMATION_SCHEMA.TABLES

-> WHERE TABLE_SCHEMA = 'cookbook' AND TABLE_NAME = 'insect';

+--------+

| ENGINE |

+--------+

| MyISAM |

+--------+

NOTE

In this chapter, you can assume that if a table’s definition has no explicit storage engine, it’s a MyISAM table.

If you want to clear out a table and reset the sequence counter, use TRUNCATE TABLE:

TRUNCATE TABLEtbl_name;

Retrieving Sequence Values

Problem

After creating a row that includes a new sequence number, you want to find out what that number is.

Solution

Issue a SELECT LAST_INSERT_ID() statement. If you’re writing a program, your MySQL API may provide a way to get the value directly without issuing a statement.

Discussion

Many applications need to determine the AUTO_INCREMENT value of a newly created row. For example, if you get ambitious and write a web-based frontend for entering rows into Junior’s insect table, you might have the application display each new row nicely formatted in a new page immediately after you hit the Submit button. To do this, you need to know the new id value so that you can retrieve the proper row. Another common situation in which the AUTO_INCREMENT value is needed occurs when you’re using multiple tables: after inserting a row in a master table, typically, you’ll need its ID so that you can create rows in other related tables that refer to the master row. (Using AUTO_INCREMENT Values to Relate Tables shows how to relate multiple tables using sequence numbers.)

When you generate a new AUTO_INCREMENT value, you can get the value from the server by issuing a statement that invokes the LAST_INSERT_ID() function. In addition, many MySQL APIs provide a client-side mechanism for making the value available without issuing another statement. This recipe discusses both methods and provides a comparison of their characteristics.

Using LAST_INSERT_ID() to obtain AUTO_INCREMENT values

The obvious (but incorrect) way to determine a new row’s AUTO_INCREMENT value is based on the fact that when MySQL generates the value, it becomes the largest sequence number in the column. Thus, you might try using the MAX() function to retrieve it:

SELECT MAX(id) FROM insect;

This is unreliable because it doesn’t take into account the multithreaded nature of the MySQL server. The SELECT statement does indeed return the maximum id value from the table, but it may not be the value that you generated. Suppose that you insert a row that generates an id value of 9. If another client inserts a row before you issue the SELECT statement, MAX(id) returns 10, not 9. Methods for solving this problem include grouping the INSERT and SELECT statements as a transaction or locking the table, but MySQL provides a LAST_INSERT_ID() function as a simpler way to obtain the proper value. It returns the most recent AUTO_INCREMENT value that you generated during the time you’ve been connected to the server, regardless of what other clients are doing. For example, you can insert a row into the insect table and then retrieve its id value like this:

mysql>INSERT INTO insect (name,date,origin)

-> VALUES('cricket','2006-09-11','basement');

mysql> SELECT LAST_INSERT_ID();

+------------------+

| LAST_INSERT_ID() |

+------------------+

| 9 |

+------------------+

Or you can use the new value to retrieve the entire row, without even knowing what the id is:

mysql>INSERT INTO insect (name,date,origin)

-> VALUES('moth','2006-09-14','windowsill');

mysql> SELECT * FROM insect WHERE id = LAST_INSERT_ID();

+----+------+------------+------------+

| id | name | date | origin |

+----+------+------------+------------+

| 10 | moth | 2006-09-14 | windowsill |

+----+------+------------+------------+

CAN OTHER CLIENTS CHANGE THE VALUE RETURNED BY LAST_INSERT_ID()?

You might ask this question if you’re concerned about the possibility of getting the wrong value from LAST_INSERT_ID() when other clients happen to generate AUTO_INCREMENT values at about the same time you do. There’s nothing to worry about. The value returned by LAST_INSERT_ID() is maintained by the server on a connection-specific basis. This property is important because it prevents clients from interfering with each other. When you generate an AUTO_INCREMENT value, LAST_INSERT_ID() returns that specific value, even when other clients generate new rows in the same table in the meantime. This behavior is by design.

Using API-specific methods to obtain AUTO_INCREMENT values

LAST_INSERT_ID() is an SQL function, so you can use it from within any client that understands how to issue SQL statements. On the other hand, you do have to issue a separate statement to get its value. If you’re writing your own programs, you may have another choice. Many MySQL interfaces include an API-specific extension that returns the AUTO_INCREMENT value without issuing another statement. Most of our APIs have this capability.

Perl

Use the mysql_insertid attribute to obtain the AUTO_INCREMENT value generated by a statement. This attribute is accessed through either a database handle or a statement handle, depending on how you issue the statement. The following example references it through the database handle:

$dbh->do ("INSERT INTO insect (name,date,origin)

VALUES('moth','2006-09-14','windowsill')");

my $seq = $dbh->{mysql_insertid};

If you’re using prepare() and execute(), access mysql_insertid as a statement handle attribute:

my $sth = $dbh->prepare ("INSERT INTO insect (name,date,origin)

VALUES('moth','2006-09-14','windowsill')");

$sth->execute ();

my $seq = $sth->{mysql_insertid};

If you find that the value of the mysql_insertid attribute is always undefined or zero, you probably have an old version of DBD::mysql that doesn’t support it. Try using the insertid attribute instead. (insertid is available only as a database handle attribute.)

Ruby

The Ruby DBI driver for MySQL exposes the client-side AUTO_INCREMENT value using the database handle func method that returns driver-specific values:

dbh.do("INSERT INTO insect (name,date,origin)

VALUES('moth','2006-09-14','windowsill')")

seq = dbh.func(:insert_id)

PHP

The native PHP interface for MySQL includes a function that returns the most recent AUTO_INCREMENT value, but the PEAR DB interface does not. On the other hand, PEAR DB does have its own sequence generation mechanism that you can use instead. See the PEAR documentation for details.

Python

The MySQLdb driver for DB-API provides an insert_id() connection object method for getting the sequence value after you execute a statement that generates an AUTO_INCREMENT value:

cursor = conn.cursor ()

cursor.execute ("""

INSERT INTO insect (name,date,origin)

VALUES('moth','2006-09-14','windowsill')

""")

seq = conn.insert_id ()

Java

The MySQL Connector/J JDBC driver provides a getLastInsertID() method for obtaining AUTO_INCREMENT values. It can be used with either Statement or PreparedStatement objects. This example uses a Statement:

Statement s = conn.createStatement ();

s.executeUpdate ("INSERT INTO insect (name,date,origin)"

+ " VALUES('moth','2006-09-14','windowsill')");

long seq = ((com.mysql.jdbc.Statement) s).getLastInsertID ();

s.close ();

Note that because getLastInsertID() is driver-specific, you access it by casting the Statement object to the com.mysql.jdbc.Statement type. If you’re using a PreparedStatement object, cast it to the com.mysql.jdbc.PreparedStatement type instead:

PreparedStatement s = conn.prepareStatement (

"INSERT INTO insect (name,date,origin)"

+ " VALUES('moth','2006-09-14','windowsill')");

s.executeUpdate ();

long seq = ((com.mysql.jdbc.PreparedStatement) s).getLastInsertID ();

s.close ();

Server-side and client-side sequence value retrieval compared

As mentioned earlier, the value of LAST_INSERT_ID() is maintained on a connection-specific basis on the server side of the MySQL connection. By contrast, the API-specific methods for accessing AUTO_INCREMENT values directly are implemented on the client side. Server-side and client-side sequence value retrieval methods have some similarities, but also some differences.

All methods, both server-side and client-side, require that you must access the AUTO_INCREMENT value using the same MySQL connection that was used to generate the value in the first place. If you generate an AUTO_INCREMENT value, and then disconnect from the server and reconnect before attempting to access the value, you’ll get zero. Within a given connection, the persistence of AUTO_INCREMENT values can be much longer on the server side of the connection:

§ After you issue a statement that generates an AUTO_INCREMENT value, the value remains available through LAST_INSERT_ID() even if you issue other statements, as long as none of those statements generate an AUTO_INCREMENT value.

§ The sequence value available on the client side typically is set for every statement, not just those that generate AUTO_INCREMENT values. If you issue an INSERT statement that generates a new value and then issue some other statement before accessing the client-side sequence value, it probably will have been set to zero. The precise behavior varies among APIs, but if you use the following general guideline, you should be safe: when a statement generates a sequence value that you won’t be using immediately, save the value in a variable that you can refer to later. Otherwise, you may find that the sequence value has been wiped out when you do try to access it.

Renumbering an Existing Sequence

Problem

You have gaps in a sequence column, and you want to resequence it.

Solution

Don’t bother. Or at least don’t do so without a good reason, of which there are very few.

Discussion

If you insert rows into a table that has an AUTO_INCREMENT column and never delete any of them, values in the column form an unbroken sequence. But if you delete rows, the sequence begins to have holes in it. For example, Junior’s insect table currently looks something like this, with gaps in the sequence (assuming that you’ve inserted the cricket and moth rows shown in the preceding section on retrieving sequence values):

mysql>SELECT * FROM insect ORDER BY id;

+----+-------------------+------------+------------+

| id | name | date | origin |

+----+-------------------+------------+------------+

| 1 | housefly | 2006-09-10 | kitchen |

| 3 | grasshopper | 2006-09-10 | front yard |

| 4 | stink bug | 2006-09-10 | front yard |

| 5 | cabbage butterfly | 2006-09-10 | garden |

| 6 | ant | 2006-09-10 | back yard |

| 9 | cricket | 2006-09-11 | basement |

| 10 | moth | 2006-09-14 | windowsill |

+----+-------------------+------------+------------+

MySQL won’t attempt to eliminate these gaps by filling in the unused values when you insert new rows. People who don’t like this behavior tend to resequence AUTO_INCREMENT columns periodically to eliminate the holes. The next few recipes show how to do that. It’s also possible to extend the range of an existing sequence, add a sequence column to a table that doesn’t currently have one, force deleted values at the top of a sequence to be reused, or specify an initial sequence value when creating or resequencing a table.

Before you decide to resequence an AUTO_INCREMENT column, consider whether you really want or need to do so. It’s unnecessary in most cases. In fact, renumbering a sequence sometimes can cause you real problems. For example, you should not resequence a column containing values that are referenced by another table. Renumbering the values destroys their correspondence to values in the other table, making it impossible to properly relate rows in the two tables to each other.

Here are reasons that I have seen advanced for resequencing a column:

Aesthetics

Sometimes the desire to renumber a column is for aesthetic reasons. People seem to prefer unbroken sequences to sequences with holes in them. If this is why you want to resequence, there’s probably not much I can say to convince you otherwise. Nevertheless, it’s not a particularly good reason.

Performance

The impetus for resequencing may stem from the notion that doing so “compacts” a sequence column by removing gaps and enables MySQL to run statements more quickly. This is not true. MySQL doesn’t care whether there are holes, and there is no performance gain to be had by renumbering an AUTO_INCREMENT column. In fact, resequencing affects performance negatively in the sense that the table remains locked while MySQL performs the operation—which may take a nontrivial amount of time for a large table. Other clients can read from the table while this is happening, but clients that are trying to insert new rows must wait until the operation is complete.

Running out of numbers

The upper limit of a sequence column is determined by the column’s data type. If an AUTO_INCREMENT sequence is approaching the upper limit of its data type, renumbering packs the sequence and frees up more values at the top. This may be a legitimate reason to resequence a column, but it is still unnecessary in many cases to do so. You may be able to expand the column’s range to increase its upper limit without changing the values stored in the column. (See Extending the Range of a Sequence Column.)

If you’re determined to resequence a column, despite my advice not to, it’s easy to do: drop the column from the table; then put it back. MySQL will renumber the values in the column in unbroken sequence. The following example shows how to renumber the id values in the insect table using this technique:

mysql>ALTER TABLE insect DROP id;

mysql> ALTER TABLE insect

-> ADD id INT UNSIGNED NOT NULL AUTO_INCREMENT FIRST,

-> ADD PRIMARY KEY (id);

The first ALTER TABLE statement gets rid of the id column (and as a result also drops the PRIMARY KEY, because the column to which it refers is no longer present). The second statement restores the column to the table and establishes it as the PRIMARY KEY. (The FIRST keyword places the column first in the table, which is where it was originally. Normally, ADD puts columns at the end of the table.) When you add an AUTO_INCREMENT column to a table, MySQL automatically numbers all the rows consecutively, so the resulting contents of the insect table look like this:

mysql>SELECT * FROM insect ORDER BY id;

+----+-------------------+------------+------------+

| id | name | date | origin |

+----+-------------------+------------+------------+

| 1 | housefly | 2006-09-10 | kitchen |

| 2 | grasshopper | 2006-09-10 | front yard |

| 3 | stink bug | 2006-09-10 | front yard |

| 4 | cabbage butterfly | 2006-09-10 | garden |

| 5 | ant | 2006-09-10 | back yard |

| 6 | cricket | 2006-09-11 | basement |

| 7 | moth | 2006-09-14 | windowsill |

+----+-------------------+------------+------------+

One problem with resequencing a column using separate ALTER TABLE statements is that the table will be without that column for the interval between the two operations. This might cause difficulties for other clients that try to access the table during that time. To prevent this from happening, perform both operations with a single ALTER TABLE statement:

mysql>ALTER TABLE insect

-> DROP id,

-> ADD id INT UNSIGNED NOT NULL AUTO_INCREMENT FIRST;

MySQL permits multiple actions to be done with ALTER TABLE (something not true for all database systems). However, notice that this multiple-action statement is not simply a concatenation of the two single-action ALTER TABLE statements. The difference is that it is unnecessary to reestablish the PRIMARY KEY: MySQL doesn’t drop it unless the indexed column is missing after all the actions specified in the ALTER TABLE statement have been performed.

Extending the Range of a Sequence Column

Problem

You want to avoid resequencing a column, but you’re running out of room for new sequence numbers.

Solution

Check whether you can make the column UNSIGNED , or change the column to use a larger integer type.

Discussion

Resequencing an AUTO_INCREMENT column changes the contents of potentially every row in the table. It’s often possible to avoid this by extending the range of the column, which changes the table’s structure rather than its contents:

§ If the data type is signed, make it UNSIGNED, and you’ll double the range of available values. Suppose that you have an id column that currently is defined like this:

id MEDIUMINT NOT NULL AUTO_INCREMENT

The upper range of a signed MEDIUMINT column is 8,388,607. This can be increased to 16,777,215 by making the column UNSIGNED with ALTER TABLE:

ALTER TABLEtbl_name MODIFY id MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT;

§ If your column is already UNSIGNED and it is not already the largest integer type (BIGINT), converting it to a larger type increases its range. You can use ALTER TABLE for this, too. For example, the id column in the previous example can be converted from MEDIUMINT to BIGINT like so:

ALTER TABLEtbl_name MODIFY id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT;

Choosing the Data Type for a Sequence Column includes a table that shows the ranges for each integer data type. You might find it helpful in assessing which type to use.

Reusing Values at the Top of a Sequence

Problem

You’ve deleted rows at the top end of your sequence. Can you avoid resequencing the column but still reuse the values that have been deleted?

Solution

Yes, use ALTER TABLE to reset the sequence counter. MySQL will generate new sequence numbers beginning with the value that is one larger than the current maximum in the table.

Discussion

If you have removed rows only from the top of the sequence, those that remain will still be in order with no gaps. (For example, if you have rows numbered 1 to 100 and you remove the rows with numbers 91 to 100, the remaining rows are still in unbroken sequence from 1 to 90.) In this special case, it’s unnecessary to renumber the column. Instead, just tell MySQL to resume the sequence beginning with the value one larger that the highest existing sequence number. For BDB tables, that’s the default behavior anyway, so the deleted values are reused with no additional action on your part. For MyISAM or InnoDB tables, issue the following statement:

ALTER TABLEtbl_name AUTO_INCREMENT = 1;

This causes MySQL to reset the sequence counter down as far as it can for creating new rows in the future.

You can use ALTER TABLE to reset the sequence counter if a sequence column contains gaps in the middle, but doing so still will reuse only values deleted from the top of the sequence. It will not eliminate the gaps. Suppose that you have a table with sequence values from 1 to 10 and then delete the rows for values 3, 4, 5, 9, and 10. The maximum remaining value is 8, so if you use ALTER TABLE to reset the sequence counter, the next row will be given a value of 9, not 3. To resequence a table and eliminate the gaps as well, see Renumbering an Existing Sequence.

Ensuring That Rows Are Renumbered in a Particular Order

Problem

You resequenced a column, but MySQL didn’t number the rows the way you want.

Solution

Select the rows into another table, using an ORDER BY clause to place them in the order you want, and let MySQL number them as it performs the operation. Then the rows will be numbered according to the sort order.

Discussion

When you resequence an AUTO_INCREMENT column, MySQL is free to pick the rows from the table in any order, so it won’t necessarily renumber them in the order that you expect. This doesn’t matter at all if your only requirement is that each row have a unique identifier. But you might have an application for which it’s important that the rows be assigned sequence numbers in a particular order. For example, you may want the sequence to correspond to the order in which rows were created, as indicated by a TIMESTAMP column. To assign numbers in a particular order, use this procedure:

1. Create an empty clone of the table (see Cloning a Table).

2. Copy rows from the original into the clone using INSERT INTO ... SELECT. Copy all columns except the sequence column, using an ORDER BY clause to specify the order in which rows are copied (and thus assigned sequence numbers).

3. Drop the original table and rename the clone to have the original table’s name.

4. If the table is a large MyISAM table and has multiple indexes, it will be more efficient to create the new table initially with no indexes except the one on the AUTO_INCREMENT column. Then copy the original table into the new table and add the remaining indexes afterward.

An alternative procedure:

1. Create a new table that contains all the columns of the original table except the AUTO_INCREMENT column.

2. Use INSERT INTO ... SELECT to copy the non-AUTO_INCREMENT columns from the original table into the new table.

3. Delete the rows from the original table, and reset the sequence counter to 1 if necessary.

4. Copy rows from the new table back to the original table, using an ORDER BY clause to sort rows into the order in which you want sequence numbers assigned. MyISAM will assign sequence values to the AUTO_INCREMENT column.

Starting a Sequence at a Particular Value

Problem

Sequences start at 1, but you want to use a different starting value.

Solution

Add an AUTO_INCREMENT clause to your CREATE TABLE statement when you create the table. If the table has already been created, use an ALTER TABLE statement to set the starting value.

Discussion

By default, AUTO_INCREMENT sequences start at one:

mysql>CREATE TABLE t

-> (id INT UNSIGNED NOT NULL AUTO_INCREMENT, PRIMARY KEY (id));

mysql> INSERT INTO t (id) VALUES(NULL);

mysql> INSERT INTO t (id) VALUES(NULL);

mysql> INSERT INTO t (id) VALUES(NULL);

mysql> SELECT id FROM t ORDER BY id;

+----+

| id |

+----+

| 1 |

| 2 |

| 3 |

+----+

For MyISAM or InnoDB tables, you can begin the sequence at a specific initial value n by including an AUTO_INCREMENT = n clause at the end of the CREATE TABLE statement:

mysql>CREATE TABLE t

-> (id INT UNSIGNED NOT NULL AUTO_INCREMENT, PRIMARY KEY (id))

-> AUTO_INCREMENT = 100;

mysql> INSERT INTO t (id) VALUES(NULL);

mysql> INSERT INTO t (id) VALUES(NULL);

mysql> INSERT INTO t (id) VALUES(NULL);

mysql> SELECT id FROM t ORDER BY id;

+-----+

| id |

+-----+

| 100 |

| 101 |

| 102 |

+-----+

Alternatively, you can create the table and then set the initial sequence value with ALTER TABLE:

mysql>CREATE TABLE t

-> (id INT UNSIGNED NOT NULL AUTO_INCREMENT, PRIMARY KEY (id));

mysql> ALTER TABLE t AUTO_INCREMENT = 100;

mysql> INSERT INTO t (id) VALUES(NULL);

mysql> INSERT INTO t (id) VALUES(NULL);

mysql> INSERT INTO t (id) VALUES(NULL);

mysql> SELECT id FROM t ORDER BY id;

+-----+

| id |

+-----+

| 100 |

| 101 |

| 102 |

+-----+

To start a sequence at n for storage engines other than MyISAM or InnoDB, you can use a trick: insert a “fake” row with sequence value n -1, and then delete it after inserting one or more “real” rows. The following example illustrates how to start a sequence at 100 for a BDB table:

mysql>CREATE TABLE t

-> (id INT UNSIGNED NOT NULL AUTO_INCREMENT, PRIMARY KEY (id))

-> ENGINE = BDB;

mysql> INSERT INTO t (id) VALUES(99);

mysql> INSERT INTO t (id) VALUES(NULL);

mysql> INSERT INTO t (id) VALUES(NULL);

mysql> INSERT INTO t (id) VALUES(NULL);

mysql> DELETE FROM t WHERE id = 99;

mysql> SELECT * FROM t ORDER BY id;

+-----+

| id |

+-----+

| 100 |

| 101 |

| 102 |

+-----+

Remember that if you empty a table completely with TRUNCATE TABLE, the sequence may be reset to begin with 1, even for storage engines that normally do not reuse sequence values (The Effect of Row Deletions on Sequence Generation). In this case, you should reinitialize the sequence value explicitly after clearing the table if you don’t want it to begin with 1.

Sequencing an Unsequenced Table

Problem

You forgot to include a sequence column when you created a table. Is it too late to sequence the table rows?

Solution

No, just add an AUTO_INCREMENT column using ALTER TABLE. MySQL will create the column and number the rows automatically.

Discussion

To add a sequence to a table that doesn’t currently contain one, use ALTER TABLE to create an AUTO_INCREMENT column. Suppose that you have a table t that contains name and age columns, but no sequence column:

+----------+------+

| name | age |

+----------+------+

| boris | 47 |

| clarence | 62 |

| abner | 53 |

+----------+------+

You can add a sequence column named id to the table as follows:

mysql>ALTER TABLE t

-> ADD id INT NOT NULL AUTO_INCREMENT,

-> ADD PRIMARY KEY (id);

mysql> SELECT * FROM t ORDER BY id;

+----------+------+----+

| name | age | id |

+----------+------+----+

| boris | 47 | 1 |

| clarence | 62 | 2 |

| abner | 53 | 3 |

+----------+------+----+

MySQL numbers the rows for you automatically. It’s not necessary to assign the values yourself. Very handy.

By default, ALTER TABLE adds new columns to the end of the table. To place a column at a specific position, use FIRST or AFTER at the end of the ADD clause. The following ALTER TABLE statements are similar to the one just shown, but place the id column first in the table or after thename column, respectively:

ALTER TABLE t

ADD id INT NOT NULL AUTO_INCREMENT FIRST,

ADD PRIMARY KEY (id);

ALTER TABLE t

ADD id INT NOT NULL AUTO_INCREMENT AFTER name,

ADD PRIMARY KEY (id);

For MyISAM or InnoDB tables, you can specify the initial value for a new sequence column by including an AUTO_INCREMENT = n clause in the ALTER TABLE statement:

mysql>ALTER TABLE t

-> ADD id INT NOT NULL AUTO_INCREMENT FIRST,

-> ADD PRIMARY KEY (id),

-> AUTO_INCREMENT = 100;

mysql> SELECT * FROM t ORDER BY id;

+-----+----------+------+

| id | name | age |

+-----+----------+------+

| 100 | boris | 47 |

| 101 | clarence | 62 |

| 102 | abner | 53 |

+-----+----------+------+

Using an AUTO_INCREMENT Column to Create Multiple Sequences

Problem

You need to have sequencing behavior that is more complex than a single sequence of values. You need to tie different sequences to the values in other columns of the table.

Solution

Link the AUTO_INCREMENT column to those other columns, making them all part of the same index.

Discussion

When an AUTO_INCREMENT column is the only column in a PRIMARY KEY or UNIQUE index, it generates a single sequence 1, 2, 3, ... in which successive values increase by one each time you add a row, regardless of the contents of the rest of the row. For MyISAM or BDB tables, it’s possible to create an index that combines an AUTO_INCREMENT column with other columns to generate multiple sequences within a single table.

Here’s how it works: let’s say that Junior develops such a passion for bug collecting that he decides to keep it up even after the school project has been completed—except that when freed from the constraints of the teacher’s instructions, he’s perfectly content to include insect-like bugs such as millipedes, and even to collect multiple instances of any given creature. Junior happily goes outside and collects more specimens over the next few days:

Name

Date

Origin

ant

2006-10-07

kitchen

millipede

2006-10-07

basement

beetle

2006-10-07

basement

ant

2006-10-07

front yard

ant

2006-10-07

front yard

honeybee

2006-10-08

back yard

cricket

2006-10-08

garage

beetle

2006-10-08

front yard

termite

2006-10-09

kitchen woodwork

cricket

2006-10-10

basement

termite

2006-10-11

bathroom woodwork

honeybee

2006-10-11

garden

cricket

2006-10-11

garden

ant

2006-10-11

garden

After recording this information, he’s ready to enter it into the database but wants to number each kind of bug separately (ant 1, ant 2, ..., beetle 1, beetle 2, ..., cricket 1, cricket 2, and so forth). To that end, you look over the data (noting with some alarm Junior’s discovery of termites in the house and making a mental note to call the exterminator), and then design a bug table for Junior that looks like this:

CREATE TABLE bug

(

id INT UNSIGNED NOT NULL AUTO_INCREMENT,

name VARCHAR(30) NOT NULL, # type of bug

date DATE NOT NULL, # date collected

origin VARCHAR(30) NOT NULL, # where collected

PRIMARY KEY (name, id)

);

This is very similar to the insect table, but has one significant difference: the PRIMARY KEY comprises two columns, not one. As a result, the id column will behave somewhat differently than for the insect table. If the new set of specimens is entered into the bug table in the order in which Junior wrote them down, here’s what the resulting table looks like:

mysql>SELECT * FROM bug;

+----+-----------+------------+-------------------+

| id | name | date | origin |

+----+-----------+------------+-------------------+

| 1 | ant | 2006-10-07 | kitchen |

| 1 | millipede | 2006-10-07 | basement |

| 1 | beetle | 2006-10-07 | basement |

| 2 | ant | 2006-10-07 | front yard |

| 3 | ant | 2006-10-07 | front yard |

| 1 | honeybee | 2006-10-08 | back yard |

| 1 | cricket | 2006-10-08 | garage |

| 2 | beetle | 2006-10-08 | front yard |

| 1 | termite | 2006-10-09 | kitchen woodwork |

| 2 | cricket | 2006-10-10 | basement |

| 2 | termite | 2006-10-11 | bathroom woodwork |

| 2 | honeybee | 2006-10-11 | garden |

| 3 | cricket | 2006-10-11 | garden |

| 4 | ant | 2006-10-11 | garden |

+----+-----------+------------+-------------------+

Looking at the table that way, it appears that the id values are being assigned at random—but they’re not. Sort the table by name and id, and it becomes clear how MySQL assigns the values. Specifically, MySQL creates a separate id sequence for each distinct name value:

mysql>SELECT * FROM bug ORDER BY name, id;

+----+-----------+------------+-------------------+

| id | name | date | origin |

+----+-----------+------------+-------------------+

| 1 | ant | 2006-10-07 | kitchen |

| 2 | ant | 2006-10-07 | front yard |

| 3 | ant | 2006-10-07 | front yard |

| 4 | ant | 2006-10-11 | garden |

| 1 | beetle | 2006-10-07 | basement |

| 2 | beetle | 2006-10-08 | front yard |

| 1 | cricket | 2006-10-08 | garage |

| 2 | cricket | 2006-10-10 | basement |

| 3 | cricket | 2006-10-11 | garden |

| 1 | honeybee | 2006-10-08 | back yard |

| 2 | honeybee | 2006-10-11 | garden |

| 1 | millipede | 2006-10-07 | basement |

| 1 | termite | 2006-10-09 | kitchen woodwork |

| 2 | termite | 2006-10-11 | bathroom woodwork |

+----+-----------+------------+-------------------+

When you create a multiple-column AUTO_INCREMENT index, note the following points:

§ The order in which the CREATE TABLE statement defines the indexed columns does not matter. What is significant is the order in which the index definition names the columns. The AUTO_INCREMENT column must be named last, or the multiple-sequence mechanism will not work.

§ A PRIMARY KEY cannot contain NULL values, but a UNIQUE index can. If any of the non-AUTO_INCREMENT columns to be indexed might contain NULL values, you should create a UNIQUE index rather than a PRIMARY KEY.

For the bug table, the AUTO_INCREMENT index has two columns. The same technique can be extended to more than two columns, but the basic concept is the same: for an n-column index where the last one is an AUTO_INCREMENT column, MySQL generates an independent sequence for each unique combination of values in the non-AUTO_INCREMENT columns.

MySQL’s mechanism for multiple-column sequences can be easier to use than logically equivalent single-column values. Recall that in Sorting by Fixed-Length Substrings, we used a housewares table that contained rows with three-part product ID values composed of a three-character category abbreviation, a five-digit serial number, and a two-character code indicating country of manufacture:

+------------+------------------+

| id | description |

+------------+------------------+

| DIN40672US | dining table |

| KIT00372UK | garbage disposal |

| KIT01729JP | microwave oven |

| BED00038SG | bedside lamp |

| BTH00485US | shower stall |

| BTH00415JP | lavatory |

+------------+------------------+

The table was used in that chapter to demonstrate how to break apart the id values into their constituent parts and sort them separately, using LEFT(), MID(), and RIGHT(). That led to some fairly ugly ORDER BY clauses, and an issue that I didn’t even bring up in that chapter was the question of just how to generate the serial numbers in the middle of the values.

Sometimes you can replace this kind of multiple-part column with separate columns that are tied together as an AUTO_INCREMENT index. For example, another way to manage houseware id values like this is to represent them using category, serial, and country columns and tie them together in a PRIMARY KEY with the serial number as an AUTO_INCREMENT column. This causes serial numbers to increment independently for each combination of category and country. To create the table from scratch, you’d write the CREATE TABLE statement like this:

CREATE TABLE housewares

(

category VARCHAR(3) NOT NULL,

serial INT UNSIGNED NOT NULL AUTO_INCREMENT,

country VARCHAR(2) NOT NULL,

description VARCHAR(255),

PRIMARY KEY (category, country, serial)

);

Alternatively, assuming you have the original housewares table already created in the form used in the earlier chapter, you can convert it to the new structure “in place” as follows:

mysql>ALTER TABLE housewares

-> ADD category VARCHAR(3) NOT NULL FIRST,

-> ADD serial INT UNSIGNED NOT NULL AUTO_INCREMENT AFTER category,

-> ADD country VARCHAR(2) NOT NULL AFTER serial,

-> ADD PRIMARY KEY (category, country, serial);

mysql> UPDATE housewares SET category = LEFT(id,3);

mysql> UPDATE housewares SET serial = MID(id,4,5);

mysql> UPDATE housewares SET country = RIGHT(id,2);

mysql> ALTER TABLE housewares DROP id;

mysql> SELECT * FROM housewares;

+----------+--------+---------+------------------+

| category | serial | country | description |

+----------+--------+---------+------------------+

| DIN | 40672 | US | dining table |

| KIT | 372 | UK | garbage disposal |

| KIT | 1729 | JP | microwave oven |

| BED | 38 | SG | bedside lamp |

| BTH | 485 | US | shower stall |

| BTH | 415 | JP | lavatory |

+----------+--------+---------+------------------+

With the id values split into their separate parts, sorting operations become easier to specify because you can refer to individual columns directly rather than by pulling out substrings of the original id column. You can also make sorting more efficient by adding additional indexes for theserial and country columns. But a problem remains: how to display each product ID as a single string rather than as three separate values? Do that with CONCAT():

mysql>SELECT category, serial, country,

-> CONCAT(category,LPAD(serial,5,'0'),country) AS id

-> FROM housewares ORDER BY category, country, serial;

+----------+--------+---------+------------+

| category | serial | country | id |

+----------+--------+---------+------------+

| BED | 38 | SG | BED00038SG |

| BTH | 415 | JP | BTH00415JP |

| BTH | 485 | US | BTH00485US |

| DIN | 40672 | US | DIN40672US |

| KIT | 1729 | JP | KIT01729JP |

| KIT | 372 | UK | KIT00372UK |

+----------+--------+---------+------------+

You can even eliminate the need for LPAD() by declaring serial to be a zero-filled column for which values are displayed using five digits:

mysql>ALTER TABLE housewares

-> MODIFY serial INT(5) UNSIGNED ZEROFILL NOT NULL AUTO_INCREMENT;

Then MySQL supplies the leading zeros automatically, and the CONCAT() expression becomes simpler:

mysql>SELECT category, serial, country,

-> CONCAT(category,serial,country) AS id

-> FROM housewares ORDER BY category, country, serial;

+----------+--------+---------+------------+

| category | serial | country | id |

+----------+--------+---------+------------+

| BED | 00038 | SG | BED00038SG |

| BTH | 00415 | JP | BTH00415JP |

| BTH | 00485 | US | BTH00485US |

| DIN | 40672 | US | DIN40672US |

| KIT | 01729 | JP | KIT01729JP |

| KIT | 00372 | UK | KIT00372UK |

+----------+--------+---------+------------+

This example illustrates an important principle: you might think about values one way (id values as single strings), but that doesn’t mean you must necessarily represent them in the database that way. If an alternative representation (separate columns) is more efficient or easier to work with, it may well be worth using—even if you must reformat the underlying columns for display purposes to give them the appearance people expect.

If formatting multiple column values into an identifier involves complex calculations or you simply want to hide the details from applications, define a stored function that takes the relevant column values as arguments and returns the identifier. For example:

CREATE FUNCTION houseware_id(category VARCHAR(3),

serial INT UNSIGNED,

country VARCHAR(2))

RETURNS VARCHAR(10) DETERMINISTIC

RETURN CONCAT(category,LPAD(serial,5,'0'),country);

Use the function as follows. The result is the same as before, but the caller need not know how the identifiers are constructed:

mysql>SELECT category, serial, country,

-> houseware_id(category,serial,country) AS id

-> FROM housewares;

+----------+--------+---------+------------+

| category | serial | country | id |

+----------+--------+---------+------------+

| BED | 38 | SG | BED00038SG |

| BTH | 415 | JP | BTH00415JP |

| BTH | 485 | US | BTH00485US |

| DIN | 40672 | US | DIN40672US |

| KIT | 1729 | JP | KIT01729JP |

| KIT | 372 | UK | KIT00372UK |

+----------+--------+---------+------------+

For more information about writing stored functions, see Chapter 16.

Managing Multiple Simultaneous AUTO_INCREMENT Values

Problem

You’re working with two or more tables that contain AUTO_INCREMENT columns, and you’re having a hard time keeping track of the sequence values generated for each table.

Solution

Save the values in user-defined variables for later. If you’re using statements from within a program, save the sequence values in program variables. Alternatively, you might be able to issue the statements using separate connection or statement objects to keep them from getting mixed up.

Discussion

As described in Retrieving Sequence Values, the LAST_INSERT_ID() server-side sequence value indicator function is set each time a statement generates an AUTO_INCREMENT value, whereas client-side sequence indicators may be reset for every statement. What if you issue a statement that generates an AUTO_INCREMENT value, but you don’t want to refer to that value until after issuing a second statement that also generates an AUTO_INCREMENT value? In this case, the original value no longer will be accessible, either through LAST_INSERT_ID() or as a client-side value. To retain access to it, you should save the value first before issuing the second statement. There are several ways to do this:

§ At the SQL level, you can save the value in a user-defined variable after issuing a statement that generates an AUTO_INCREMENT value:

§ INSERT INTOtbl_name (id,...) VALUES(NULL,...);

SET @saved_id = LAST_INSERT_ID();

Then you can issue other statements without regard to their effect on LAST_INSERT_ID(). To use the original AUTO_INCREMENT value in a subsequent statement, refer to the @saved_id variable.

§ At the API level, you can save the AUTO_INCREMENT value in an API language variable. This can be done either by saving the value returned from LAST_INSERT_ID() or from any API-specific extension that might be available.

§ A third technique can be used from within APIs that enables you to maintain separate client-side AUTO_INCREMENT values. For example, statement handles in Perl have a mysql_insertid attribute, and the attribute value for one handle is unaffected by activity on another. In Java, use separate Statement or PreparedStatement objects.

Using AUTO_INCREMENT Values to Relate Tables

Problem

You’re using sequence values from one table as keys in a second table so that you can relate rows in the two tables to each other. But the associations aren’t being set up properly.

Solution

You’re probably not inserting rows in the proper order, or you’re losing track of the sequence values. Change the insertion order, or save the sequence values so that you can refer to them when you need them.

Discussion

Be careful with AUTO_INCREMENT values that are used to generate ID values in a master table if you also store those values in detail table rows for the purpose of linking the detail rows to the proper master table row. This kind of situation is quite common. Suppose that you have aninvoice table listing invoice information for customer orders, and an inv_item table listing the individual items associated with each invoice. Here, invoice is the master table and inv_item is the detail table. To uniquely identify each order, the invoice table could contain anAUTO_INCREMENT column inv_id. You’d also store the appropriate invoice number in each inv_item table row so that you can tell which invoice it goes with. The tables might look something like this:

CREATE TABLE invoice

(

inv_id INT UNSIGNED NOT NULL AUTO_INCREMENT,

PRIMARY KEY (inv_id),

date DATE NOT NULL

# ... other columns could go here

# ... (customer ID, shipping address, etc.)

);

CREATE TABLE inv_item

(

inv_id INT UNSIGNED NOT NULL, # invoice ID (from invoice table)

INDEX (inv_id),

qty INT, # quantity

description VARCHAR(40) # description

);

For these kinds of table relationships, it’s typical to insert a row into the master table first (to generate the AUTO_INCREMENT value that identifies the row), and then insert the detail rows using LAST_INSERT_ID() to obtain the master row ID. For example, if a customer buys a hammer, three boxes of nails, and (in anticipation of finger-bashing with the hammer) a dozen bandages, the rows pertaining to the order can be inserted into the two tables like so:

INSERT INTO invoice (inv_id,date)

VALUES(NULL,CURDATE());

INSERT INTO inv_item (inv_id,qty,description)

VALUES(LAST_INSERT_ID(),1,'hammer');

INSERT INTO inv_item (inv_id,qty,description)

VALUES(LAST_INSERT_ID(),3,'nails, box');

INSERT INTO inv_item (inv_id,qty,description)

VALUES(LAST_INSERT_ID(),12,'bandage');

The first INSERT adds a row to the invoice master table and generates a new AUTO_INCREMENT value for its inv_id column. The following INSERT statements each add a row to the inv_item detail table, using LAST_INSERT_ID() to get the invoice number. This associates the detail rows with the proper master row.

What if you need to process multiple invoices? There’s a right way and a wrong way to enter the information. The right way is to insert all the information for the first invoice and then proceed to the next. The wrong way is to add all the master rows into the invoice table and then add all the detail rows to the inv_item table. If you do that, all the detail rows in the inv_item table will contain the AUTO_INCREMENT value from the most recently entered invoice row. Thus, all will appear to be part of the same invoice, and rows in the two tables won’t have the proper associations.

If the detail table contains its own AUTO_INCREMENT column, you must be even more careful about how you add rows to the tables. Suppose that you want to number the rows in the inv_item table sequentially for each order. The way to do that is to create a multiple-columnAUTO_INCREMENT index that generates a separate sequence for the items in each invoice. (Using an AUTO_INCREMENT Column to Create Multiple Sequences discusses this type of index.) Create the inv_item table as follows, using a PRIMARY KEY that combines the inv_id column with an AUTO_INCREMENT column, seq:

CREATE TABLE inv_item

(

inv_id INT UNSIGNED NOT NULL, # invoice ID (from invoice table)

seq INT UNSIGNED NOT NULL AUTO_INCREMENT,

PRIMARY KEY (inv_id, seq),

qty INT, # quantity

description VARCHAR(40) # description

);

The inv_id column enables each inv_item row to be associated with the proper invoice table row, just as with the original table structure. In addition, the index causes the seq values for the items in each invoice to be numbered sequentially starting at 1. However, now that both tables contain an AUTO_INCREMENT column, you cannot enter information for an invoice the same way as before. To see why it doesn’t work, try it:

INSERT INTO invoice (inv_id,date)

VALUES(NULL,CURDATE());

INSERT INTO inv_item (inv_id,qty,description)

VALUES(LAST_INSERT_ID(),1,'hammer');

INSERT INTO inv_item (inv_id,qty,description)

VALUES(LAST_INSERT_ID(),3,'nails, box');

INSERT INTO inv_item (inv_id,qty,description)

VALUES(LAST_INSERT_ID(),12,'bandage');

These statements are the same as before, but now behave somewhat differently due to the change in the inv_item table structure. The INSERT into the invoice table works properly. So does the first INSERT into the inv_item table; LAST_INSERT_ID() returns the inv_id value from the master row in the invoice table. However, this INSERT also generates its own AUTO_INCREMENT value (for the seq column), which changes the value of LAST_INSERT_ID() and causes the master row inv_id value to be “lost.” The result is that subsequent inserts into the inv_itemstore the preceding row’s seq value into the inv_id column. This causes the second and following rows to have incorrect inv_id values.

To avoid this difficulty, save the sequence value generated by the insert into the master table and use the saved value for the inserts into the detail table. To save the value, you can use a user-defined variable in SQL or a variable maintained by your program.

Use a user-defined variable

Save the master row AUTO_INCREMENT value in a user-defined variable for use when inserting the detail rows:

INSERT INTO invoice (inv_id,date)

VALUES(NULL,CURDATE());

SET @inv_id = LAST_INSERT_ID();

INSERT INTO inv_item (inv_id,qty,description)

VALUES(@inv_id,1,'hammer');

INSERT INTO inv_item (inv_id,qty,description)

VALUES(@inv_id,3,'nails, box');

INSERT INTO inv_item (inv_id,qty,description)

VALUES(@inv_id,12,'bandage');

Use an API variable

This method is similar to the previous one, but applies only from within an API. Insert the master row, and then save the AUTO_INCREMENT value into an API variable for use when inserting detail rows. For example, in Ruby, you can access the AUTO_INCREMENT using the database handle insert_id attribute, so the invoice-entry procedure looks something like this:

dbh.do("INSERT INTO invoice (inv_id,date) VALUES(NULL,CURDATE())")

inv_id = dbh.func(:insert_id)

sth = dbh.prepare("INSERT INTO inv_item (inv_id,qty,description)

VALUES(?,?,?)")

sth.execute(inv_id, 1, "hammer")

sth.execute(inv_id, 3, "nails, box")

sth.execute(inv_id, 12, "bandage")

Using Sequence Generators as Counters

Problem

You’re interested only in counting events, so there’s no point in creating a table row for each sequence value.

Solution

Use a sequence-generation mechanism that uses just one row per counter.

Discussion

AUTO_INCREMENTcolumns are useful for generating sequences across a set of individual rows. But for some applications, you’re interested only in a count of the number of times an event occurs, and there’s no value in creating a separate row for each event. Instances include web page or banner ad hit counters, a count of items sold, or the number of votes in a poll. For such applications, you need only a single row to hold the count as it changes over time. MySQL provides a mechanism for this that enables counts to be treated like AUTO_INCREMENT values so that you can not only increment the count, but retrieve the updated value easily.

To count a single type of event, you can use a trivial table with a single row and column. For example, if you’re selling copies of a book, you can create a table to record sales for it like this:

CREATE TABLE booksales (copies INT UNSIGNED);

However, if you’re counting sales for multiple book titles, that method won’t work so well. You certainly don’t want to create a separate single-row counting table per book. Instead, you can count them all within a single table if you include a column that provides a unique identifier for each book. The following table, booksales, does this using a title column for the book title in addition to a copies column that records the number of copies sold:

CREATE TABLE booksales

(

title VARCHAR(60) NOT NULL, # book title

copies INT UNSIGNED NOT NULL, # number of copies sold

PRIMARY KEY (title)

);

To record sales for a given book, different approaches are possible:

§ Initialize a row for the book with a copies value of 0:

INSERT INTO booksales (title,copies) VALUES('The Greater Trumps',0);

Then increment the copies value for each sale:

UPDATE booksales SET copies = copies+1 WHERE title = 'The Greater Trumps';

This method requires that you remember to initialize a row for each book or the UPDATE will fail.

§ Use INSERT with ON DUPLICATE KEY UPDATE, which initializes the row with a count of 1 for the first sale and increments the count for subsequent sales:

§ INSERT INTO booksales (title,copies)

§ VALUES('The Greater Trumps',1)

ON DUPLICATE KEY UPDATE copies = copies+1;

This is simpler because you can use the same statement for initializing and updating the sales count.

To retrieve the sales count (so that you can display a message to the customer such as “you just purchased copy n of this book”), issue a SELECT query for the same book title:

SELECT copies FROM booksales WHERE title = 'The Greater Trumps';

Unfortunately, this is not quite correct. Suppose that between the times when you update and retrieve the count, some other person buys a copy of the book (and thus increments the copies value). Then the SELECT statement won’t actually produce the value you incremented the sales count to, but rather its most recent value. In other words, other clients can affect the value before you have time to retrieve it. This is similar to the problem discussed earlier that can occur if you try to retrieve the most recent AUTO_INCREMENT value from a column by invoking MAX( col_name )rather than LAST_INSERT_ID().

There are ways around this (such as by grouping the two statements as a transaction or by locking the table), but MySQL provides a different solution based on LAST_INSERT_ID(). If you call LAST_INSERT_ID() with an expression argument, MySQL treats it like an AUTO_INCREMENTvalue. To use this feature with the booksales table, modify the count-incrementing statement slightly:

INSERT INTO booksales (title,copies)

VALUES('The Greater Trumps',LAST_INSERT_ID(1))

ON DUPLICATE KEY UPDATE copies = LAST_INSERT_ID(copies+1);

The statement uses the LAST_INSERT_ID( expr ) construct both to initialize and to increment the count. With an expression argument to LAST_INSERT_ID(), MySQL treats the expression like an AUTO_INCREMENT value. Then you can invoke LAST_INSERT_ID() with no argument to retrieve the value:

SELECT LAST_INSERT_ID();

By setting and retrieving the copies column this way, you can always get back the value that you set it to, even if some other client has updated it in the meantime. If you’re issuing the INSERT statement from within an API that provides a mechanism for fetching the most recentAUTO_INCREMENT value directly, you need not even issue the SELECT query. For example, in Python, you can update a count and get the new value using the insert_id() method:

cursor = conn.cursor ()

cursor.execute ("""

INSERT INTO booksales (title,copies)

VALUES('The Greater Trumps',LAST_INSERT_ID(1))

ON DUPLICATE KEY UPDATE copies = LAST_INSERT_ID(copies+1)

""")

count = conn.insert_id ()

In Java, the operation looks like this:

Statement s = conn.createStatement ();

s.executeUpdate (

"INSERT INTO booksales (title,copies)"

+ " VALUES('The Greater Trumps',LAST_INSERT_ID(1))"

+ " ON DUPLICATE KEY UPDATE copies = LAST_INSERT_ID(copies+1)");

long count = ((com.mysql.jdbc.Statement) s).getLastInsertID ();

s.close ();

The use of LAST_INSERT_ID( expr ) for sequence generation has certain other properties that differ from true AUTO_INCREMENT sequences:

§ AUTO_INCREMENT values increment by one each time, whereas counter values generated by LAST_INSERT_ID(expr) can be incremented by whatever value you want. For example, to produce the sequence 10, 20, 30, ..., increment the count by 10 each time. You need not even increment the counter by the same value each time. If you sell a dozen copies of a book rather than a single copy, update its sales count as follows:

§ INSERT INTO booksales (title,copies)

§ VALUES('The Greater Trumps',LAST_INSERT_ID(12))

ON DUPLICATE KEY UPDATE copies = LAST_INSERT_ID(copies+12);

§ You can start the sequence at any integer, including negative values. It’s also possible to produce decreasing sequences by using a negative increment. (For a column that is used to generate a sequence that includes negative values, you should omit UNSIGNED from the column definition.)

§ To reset a counter, simply set it to the desired value. Suppose that you want to report to book buyers the sales for the current month, rather than the total sales (for example, to display messages like “you’re the nth buyer this month”). To clear the counters to zero at the beginning of each month, run this statement:

UPDATE booksales SET copies = 0;

§ One property that’s not so desirable is that the value generated by LAST_INSERT_ID( expr ) is not uniformly available via client-side retrieval methods under all circumstances. You can get it after UPDATE or INSERT statements, but not for SET statements. If you generate a value as follows (in Ruby), the client-side value returned by insert_id will be 0, not 48:

§ dbh.do("SET @x = LAST_INSERT_ID(48)")

seq = dbh.func(:insert_id)

To get the value in this case, ask the server for it:

seq = dbh.select_one("SELECT LAST_INSERT_ID()")[0]

See Also

Web Page Access Counting revisits the single-row sequence-generation mechanism, where it serves as the basis for implementing web page hit counters.

Generating Repeating Sequences

Problem

You need to create a sequence that contains cycles.

Solution

Generate a sequence, and produce the cyclic elements using the division and modulo operators.

Discussion

Some sequence-generation problems require values that go through cycles. Suppose that you’re manufacturing items such as pharmaceutical products or automobile parts, and you must be able to track them by lot number if manufacturing problems are discovered later that require items sold within a particular lot to be recalled. Suppose also that you pack and distribute items 12 units to a box and 6 boxes to a case. In this situation, item identifiers are three-part values: the unit number (with a value from 1 to 12), the box number (with a value from 1 to 6), and a lot number (with a value from 1 to whatever the highest case number happens to be currently).

This item-tracking problem appears to require that you maintain three counters, so you might think about generating the next identifier value using an algorithm like this:

retrieve most recently used case, box, and unit numbers

unit = unit + 1 # increment unit number

if (unit > 12) # need to start a new box?

{

unit = 1 # go to first unit of next box

box = box + 1

}

if (box > 6) # need to start a new case?

{

box = 1 # go to first box of next case

case = case + 1

}

store new case, box, and unit numbers

You could indeed implement an algorithm that way. However, it’s also possible simply to assign each item a sequence number identifier and derive the corresponding case, box, and unit numbers from it. The identifier can come from an AUTO_INCREMENT column or a single-row sequence generator. The formulas for determining the case, box, and unit numbers for any item from its sequence number look like this:

unit_num = ((seq - 1) % 12) + 1

box_num = (int ((seq - 1) / 12) % 6) + 1

case_num = int ((seq - 1)/(6 * 12)) + 1

The following table illustrates the relationship between some sample sequence numbers and the corresponding case, box, and unit numbers:

seq

case

box

unit

1

1

1

1

12

1

1

12

13

1

2

1

72

1

6

12

73

2

1

1

144

2

6

12

Numbering Query Output Rows Sequentially

Problem

You want to number the rows of a query result.

Solution

If you’re writing your own program, just add the row numbers yourself.

Discussion

A type of sequence that has nothing to do with the contents of your database is to number output rows from a query. When working within an API, you can number the rows by maintaining a counter and displaying its current value with each row’s contents. Here is an example in Python, using the insects table. It displays a simple numbered list of the distinct values in the origin column of the table:

cursor = conn.cursor ()

cursor.execute ("SELECT DISTINCT origin FROM insect")

count = 1

for row in cursor.fetchall ():

print count, row[0]

count = count + 1

cursor.close ()

See Also

The mysql program provides no explicit row-numbering facilities, although you can use a user-defined variable to include an extra row number column in a query’s output. Another way to produce results that may be suitable for your purposes is to filter mysql output through another program that adds row numbers. Numbering Query Output Lines describes these techniques.