Data Manipulation Statements and Functions - SQL Statements and Functions - MySQL in a Nutshell (2008)

MySQL in a Nutshell (2008)

Part II. SQL Statements and Functions

Chapter 6. Data Manipulation Statements and Functions

This chapter explains SQL statements in MySQL related to data manipulation: adding, changing, and deleting data, as well as retrieving selected data. Statements that create and alter databases and tables are covered in the previous chapter. In essence, this chapter covers SQL statements used when manipulating the data itself, not when developing a database. The two modes involve fairly distinct mindsets and are sometimes conducted by different people.

Statements and Functions Grouped by Characteristics

The following SQL statements are covered in this chapter.

Data Manipulation Statements

Here is a list of SQL statements and clauses used in MySQL for data manipulation:

DELETE, DO, EXPLAIN, HANDLER, HELP, INSERT, JOIN, LIMIT, LOAD DATA INFILE, REPLACE, SELECT, SET, SHOW ERRORS, SHOW WARNINGS, TRUNCATE, UNION, UPDATE, USE.

Transaction Statements

Transactions are a set of SQL statements that the server has to execute as a unit: either all succeed or all fail. If the server detects that all have succeeded, it commits the transaction; if any statement fails, the server rolls back the previous statements. Transactions are supported by the InnoDB, BDB, and NDB Cluster storage engines, as well as some new storage engines for MySQL that are under development. Statements that manipulate transactions are ignored if executed against a storage engine that doesn’t support transactions, notably MyISAM.

The following is a list of SQL statements that are specifically related to transactions. They work only with tables that use a transactional storage engine (e.g., InnoDB, BDB, and NDB Cluster):

BEGIN, COMMIT, RELEASE SAVEPOINT, ROLLBACK, ROLLBACK TO SAVEPOINT, SAVEPOINT, SET TRANSACTION, START TRANSACTION, XA.

Related Functions

The following functions are also covered in this chapter because they relate to data manipulation. They are explained at the end of this chapter:

ANALYSE(), BENCHMARK(), DATABASE(), FOUND_ROWS(), LAST_INSERT_ID(), ROW_COUNT(), SCHEMA().

Statements and Clauses in Alphabetical Order

The following is a list of MySQL statements and clauses related to data manipulation, in alphabetical order. To understand how this book presents SQL syntax and describes SQL statements, as well as for information related to examples, please see the introduction to Part II. Many of the examples in this chapter involve the activities of the departments of a fictitious company: its human resources department and employee data, its sales department and client contact information, and its internal IT department’s work requests.

Name

BEGIN

Synopsis

BEGIN [WORK]

Use this statement to start a transaction. Transaction statements are currently supported by the InnoDB, NDB Cluster, and BDB storage engines and are ignored if used with MyISAM tables. The WORK keyword is optional. Don’t confuse the BEGIN statement with the BEGIN...ENDcompound statement used in stored procedures and triggers (see Chapter 9). To eliminate confusion on this point, it’s recommended you use the alias START TRANSACTION instead of BEGIN.

A transaction is permanently recorded when the session issues a COMMIT statement, starts another transaction, or terminates the connection. You can reverse a transaction by issuing a ROLLBACK statement if the transaction has not yet been committed. See the explanations of COMMIT andROLLBACK later in this chapter for more information on transactions. The SAVEPOINT and ROLLBACK TO SAVEPOINT statements may also be useful.

Here is an example of the BEGIN statement’s use in context:

BEGIN;

INSERT DATA INFILE '/tmp/customer_orders.sql'

INTO TABLE orders;

COMMIT;

In this example, if there is a problem after the batch of orders is inserted into the orders table, the ROLLBACK statement could be issued instead of the COMMIT statement shown here. ROLLBACK would remove the data imported by the INSERT DATA INFILE statement.

Name

COMMIT

Synopsis

COMMIT [WORK] [AND [NO] CHAIN] [[NO] RELEASE]

Use this statement to commit transactions, which are SQL statements that have changed data and have been entered into MySQL but are not yet saved. Transaction statements are currently supported by the InnoDB, NDB Cluster, and BDB storage engines and are ignored if used with MyISAM tables.

If AUTOCOMMIT is enabled, it must be disabled for this statement to be meaningful. You can disable it explicitly with the statement:

SET AUTOCOMMIT = 0;

Normally, AUTOCOMMIT is disabled by a START TRANSACTION statement and reinstated with the COMMIT statement.

The WORK keyword is optional and has no effect on the results. It’s available for compatibility with its counterpart, BEGIN WORK. Use the AND CHAIN clause to complete one transaction and start another, thus making it unnecessary to use START TRANSACTION again. Use the AND RELEASEclause to end the current client session after completing the transaction.

Add the keyword NO to indicate explicitly that a new transaction is not to begin (when used with CHAIN) or that the client session is not to end (when used with RELEASE). This is necessary only when the system variable completion_type is set so that the server assumes that a COMMITstatement indicates the start of another transaction or releases a session.

Here is a basic example of this statement:

START TRANSACTION;

LOCK TABLES orders WRITE;

INSERT DATA INFILE '/tmp/customer_orders.sql'

INTO TABLE orders;

SELECT ...;

COMMIT;

UNLOCK TABLES;

In this example, after inserting a batch of orders into the orders table, an administrator enters a series of SELECT statements to check the integrity of the data. They are omitted here to save space. If there is a problem, the ROLLBACK statement could be issued rather than the COMMITstatement shown here. ROLLBACK would remove the data imported by the INSERT DATA INFILE statement.

The following statements also cause a transaction to be committed: ALTER EVENT, ALTER FUNCTION, ALTER PROCEDURE, ALTER TABLE, BEGIN, CREATE DATABASE, CREATE EVENT, CREATE FUNCTION, CREATE INDEX, CREATE PROCEDURE, CREATE TABLE, DROP DATABASE,DROP EVENT, DROP FUNCTION, DROP INDEX, DROP PROCEDURE, DROP TABLE, LOAD DATA INFILE, LOCK TABLES, RENAME TABLE, SET AUTOCOMMIT=1, START TRANSACTION, TRUNCATE, and UNLOCK TABLES.

Name

DELETE

Synopsis

DELETE [LOW_PRIORITY] [QUICK] [IGNORE] FROM table

[WHERE condition]

[ORDER BY column [ASC|DESC][,...]]

[LIMIT row_count]

DELETE [LOW_PRIORITY] [QUICK] [IGNORE] table[, table]

FROM table[,...]

[WHERE condition]

DELETE [LOW_PRIORITY] [QUICK] [IGNORE] FROM table[, table]

USING table[,...]

[WHERE condition]

Use this statement to delete rows of data from a given table. Three basic syntax structures are allowed. The first one shown here is restricted to a single table, whereas the other two can handle multiple tables. For all three, the LOW_PRIORITY keyword instructs the server to wait until there are no queries on the table named before deleting rows. This keyword works only with storage engines that allow table-level locking (i.e., MyISAM, MEMORY, MERGE). The QUICK keyword can be used with MyISAM tables to make deletions faster by not merging leaves in the index’s tree.The IGNORE keyword instructs MySQL to continue even if it encounters errors. You can retrieve error messages afterward with the SHOW WARNINGS statement.

Use the WHERE clause to specify which rows are to be deleted based on a given condition. You can use the DELETE statement in conjunction with the JOIN clause, which is explained later in this chapter.

Here is a simple example of this statement:

DELETE LOW_PRIORITY FROM workreq

WHERE client_id = '1076'

AND status <> 'DONE';

In this example, the client 1076 has closed its account, and management has decided just to delete all of its incomplete work requests. If a WHERE clause is not given, all of the rows for the table would be deleted permanently.

If you want to delete all of the data in a table, you can use this statement without the WHERE clause, but it’s slow because deletions are performed one row at a time. The same result can be obtained faster with the TRUNCATE statement. However, the TRUNCATE statement doesn’t return the number of rows deleted, so use DELETE if that’s important to you.

To delete only a certain number of rows in a table, use the LIMIT clause to specify the number of rows to delete. To delete a specific number of rows for a particular range of column values, use the ORDER BY clause along with the LIMIT clause. For example, suppose an account executive informs the database administrator that the last four work requests she entered for a particular client (1023) need to be deleted. The database administrator could enter the following to delete those rows:

DELETE FROM workreq

WHERE client_id = '1023'

ORDER BY request_date DESC

LIMIT 4;

In this example, the rows are first ordered by the date of the work request, in descending order (latest date first). Additional columns may be given in a comma-separated list for the ordering. The LIMIT clause is used to limit the number of deletions to the first four rows of the results of theWHERE clause and the ORDER BY clause.

The second syntax for this statement allows other tables to be referenced. In the first example shown here, the database administrator wants to delete rows representing a particular client from the work request table, but she doesn’t know the client account number. However, she knows the client’s name begins with Cole, so she could enter the following to delete the records:

DELETE workreq FROM workreq, clients

WHERE workreq.client_id = clients.client_id

AND client_name LIKE 'Cole%';

In this example, the table in which rows will be deleted is given after the DELETE keyword. It’s also given in the list of tables in the FROM clause, which specifies the table from which information will be obtained to determine the rows to delete. The two tables are joined in the WHERE clause on the client identification number column in each. Using the LIKE keyword, the selection of rows is limited to clients with a name beginning with Cole. Incidentally, if more than one client has a name beginning with Cole, the rows for all will be deleted from the work request table. You can delete rows in more than one table with a single statement by listing the tables in a comma-separated list after the DELETE keyword. For example, suppose that we decide to delete not only the work requests for the client, but also the row for the client in the clients table:

DELETE workreq, clients FROM workreq, clients

WHERE workreq.clientid = clients.clientid

AND client_name LIKE 'Cole%';

Notice that the only syntactical difference between this statement and the one in the previous example is that this statement lists both tables for which rows are to be deleted after the DELETE keyword and before the FROM clause. Deletions are permanent, so take care which tables you list for deletion.

The third syntax operates in the same way as the second one, but it offers a couple of keywords that may be preferred for clarity. If the previous example were entered with this third syntax, it would look like this:

DELETE FROM workreq USING workreq, clients

WHERE workreq.clientid = clients.clientid

AND client_name LIKE 'Cole%';

Notice that the table from which rows will be deleted is listed in the FROM clause. The tables that the statement will search for information to determine which rows to delete are listed in the USING clause. The results of statements using this syntax structure and those using the previous one are the same. It’s just a matter of style preference and compatibility with other database systems.

Although MySQL will eventually reuse space allocated for deleted rows, you can compact a table that has had many rows deleted by using the OPTIMIZE TABLE statement or the myisamchk utility.

Name

DO

Synopsis

DO expression[,...] | (statement)

This statement suppresses the display of an expression’s results. Multiple expressions may be given in a comma-separated list. As of version 4.1 of MySQL, subqueries may be given. Here is an example:

DO (SET @company = 'Van de Lay Industries' );

This statement creates the @company variable with the value given, but without displaying any results.

Name

EXPLAIN

Synopsis

EXPLAIN table

EXPLAIN [EXTENDED|PARTITIONS] SELECT...

Use this statement to display information about the columns of a given table or the handling of a SELECT statement. For the first usage, the statement is synonymous with the DESCRIBE and SHOW COLUMNS statements. For the latter usage, EXPLAIN shows which index the statement will use and, when multiple tables are queried, the order in which the tables are used. This can be helpful in determining the cause of a slow query. Here is an example involving a simple subquery in which we are retrieving a list of our top clients and counting the number of work request tickets they’ve generated, and then querying those results to order them by the number of tickets:

EXPLAIN

SELECT * FROM

(SELECT client_name, COUNT(*) AS tickets

FROM work_req

JOIN clients USING(client_id)

WHERE client_type = 1

AND DATEDIFF(NOW(), request_date) < 91

GROUP BY client_id) AS derived1

ORDER BY tickets DESC;

*************************** 1. row ***************************

id: 1

select_type: PRIMARY

table: <derived2>

type: ALL

possible_keys: NULL

key: NULL

key_len: NULL

ref: NULL

rows: 8

Extra: Using filesort

*************************** 2. row ***************************

id: 2

select_type: DERIVED

table: clients

type: ALL

possible_keys: PRIMARY

key: NULL

key_len: NULL

ref: NULL

rows: 94

Extra: Using where; Using temporary; Using filesort

*************************** 3. row ***************************

id: 2

select_type: DERIVED

table: work_req

type: ref

possible_keys: client_id,workreq_date_key

key: workreq_date_key

key_len: 5

ref: company_database.clients.client_id

rows: 1

Extra: Using where; Using index

We can discern plenty from these results, such as which indexes were used, if any. For example, the possible_keys field in the third row lists the indexes that might have been used to find the data, whereas the key field indicates that the index workreq_date_key was actually used. (That index covers the client_id and request_date columns.) If the possible_keys field showed a value of NULL, then no index was used or could have been used. This would indicate that you should consider adding an index to the table.

Basically, this statement tells you what MySQL does when it executes the given SQL statement. It doesn’t tell you what to do differently to improve performance. For that, you will need to use your judgment. See Table 6-1 for a list of possible select_types.

Table 6-1. select_type for EXPLAIN statement results

Type

General meaning

SIMPLE

Indicates a simple SELECT statement, without a subquery or a UNION.

PRIMARY

When using a subquery, this is the main SELECT statement.

UNION

When using a UNION, this is not the first SELECT statement.

DEPENDENT UNION

When using a UNION, this is not the first SELECT statement that is dependent on the main query.

UNION RESULT

The result of a UNION.

SUBQUERY

The first SELECT statement in a subquery.

DEPENDENT SUBQUERY

The first SELECT statement in a subquery that is dependent on the main query.

DERIVED

The table derived from the subquery.

UNCACHEABLE SUBQUERY

Indicates a subquery in which the results cannot be cached and therefore must be reevaluated for each row of the main query.

UNCACHEABLE UNION

The UNION of a subquery in which the results cannot be cached and therefore must be reevaluated for each row of the main query.

Name

HANDLER

Synopsis

HANDLER table OPEN [AS handle]

HANDLER handle READ index { = | >= | <= | < } (value,...)

[WHERE condition] [LIMIT ...]

HANDLER handle READ index {FIRST|NEXT|PREV|LAST}

[WHERE condition] [LIMIT ...]

HANDLER handle READ {FIRST|NEXT}

[WHERE condition] [LIMIT ...]

HANDLER handle CLOSE

A handle provides direct access to a table, as opposed to working from a results set. Handles can be faster than SELECT statements when reading large numbers of rows from a table. MyISAM and InnoDB tables currently support handlers.

A handle is usable only by the session (connection thread) that established it. The table is still accessible by other sessions, though, and is not locked by this statement. Because of this, and because the method provides direct table access, the data in the table can change and even be incomplete as the handler performs successive reads.

Create a handler by issuing a HANDLER statement with the OPEN clause to establish a handle for the table, much like a file handle in a programming language such as Perl. The AS clause and handle name are optional. If an alias is not given, the table name is used as the handler name for subsequent HANDLER statements.

You can then use HANDLER statement formats with READ clauses to read data from a table. Finish by issuing HANDLER with a CLOSE clause.

Here are a couple of basic examples of the HANDLER statement:

HANDLER clients OPEN AS clients_handle;

HANDLER clients_handle READ FIRST;

The first line creates the table handle called clients_handle, based on the clients table. The next SQL statement retrieves the first row of data from the table. The result of this statement is the same as running a SELECT to retrieve all columns of the table and then picking off the first row in the results set. To continue retrieving results in the same way as a results set from a SELECT, issue the following:

HANDLER clients_handle READ NEXT;

Every time the statement is run with the NEXT keyword, the pointer is advanced and the next row in the table is displayed until the end of the table is reached. To retrieve more than one row, you can use the LIMIT clause like this:

HANDLER clients_handle READ NEXT LIMIT 3;

This statement displays the next three rows from the table.

The WHERE clause may be used with a HANDLER...READ statement in the same way as with the SELECT statement. Here is an example:

HANDLER clients_handle READ FIRST

WHERE state = 'MA' LIMIT 5;

This statement displays the first five rows in which the client is located in the state of Massachusetts. Note that no ORDER BY clause is available for HANDLER...READ statements. Therefore, the first five rows are based on the order in which they are stored in the table.

To extract data based on an index, use one of the READ clauses that specify indexes. Here is an example like the previous one, but with the addition of an index:

HANDLER clients_handle READ cid PREV

WHERE state = 'MA' LIMIT 2;

This example retrieves two rows matching the condition of the WHERE clause; the rows come from the previous batch of rows displayed thanks to the PREV keyword. Performance could benefit from the use of the cid index, if it was based on the state column. To retrieve the next set of rows using this syntax, replace PREV with NEXT.

The LAST keyword searches for and retrieves rows starting from the last row of the table. Here is another example using an index:

HANDLER clients_handle READ name = ('NeumeyerGera');

The name index is a combination of the name_last and the name_first column, but only the first four characters of the first name are used by the index. Given the sample database used for this book, this statement displays the row for the client Gerard Neumeyer. The values for each column may be separated with commas (e.g., 'Neumeyer', 'Gera'), or spliced together as shown. This feature, a condition for a multicolumn index, would be a difficult contortion with a SELECT statement.

Name

HELP

Synopsis

HELP [{'command | reserve_word'}]

You can use this statement to access built-in documentation. Enter HELP alone to display a list of MySQL commands for which you may display documentation. Typing HELP contents displays a table of contents for this internal documentation. For quick reference, you can also give an SQL statement or clause:

HELP SELECT;

This displays the syntax for the SELECT statement along with a brief description of some of the clauses. Similarly, entering HELP SHOW gives you a list of SQL statements beginning with SHOW.

Name

INSERT

Synopsis

INSERT [LOW_PRIORITY|DELAYED|HIGH_PRIORITY] [IGNORE]

[INTO] table

SET column={expression|DEFAULT}, ...

[ON DUPLICATE KEY UPDATE column=expression, ...]

INSERT [LOW_PRIORITY|DELAYED|HIGH_PRIORITY] [IGNORE]

[INTO] table [(column, ...)]

VALUES ({expression|DEFAULT},...),(...),...

[ON DUPLICATE KEY UPDATE column=expression, ...]

INSERT [LOW_PRIORITY|HIGH_PRIORITY] [IGNORE]

[INTO] table [(column, ...)]

SELECT...

[ON DUPLICATE KEY UPDATE column=expression, ...]

Use this statement to add rows of data to a table. The first format shown can insert only one row of data per statement. The second format can handle one or more rows in a single statement. The columns and their order are specified once, but values for multiple rows may be given. Each row of values is to be contained in its own set of parentheses, separated by commas. The third format inserts columns copied from rows in other tables. Explanations of the specifics of each type of statement, their various clauses and keywords, and examples of their uses follow in the next three subsections of this SQL statement.

A few parameters are common to two formats, and a few are common to all formats.

You can use the LOW_PRIORITY keyword to instruct the server to wait until all other queries related to the table in which data is to be added are finished before running the INSERT statement. When the table is free, it is locked for the INSERT statement and will prevent concurrent inserts.

The DELAYED keyword is available for the first two syntaxes and indicates the same priority status, but it releases the client so that other queries may be run and so that the connection may be terminated. A DELAYED query that returns without an error message does not guarantee that the inserts will take place; it confirms only that the query is received by the server to be processed. If the server crashes, the data additions may not be executed when the server restarts and the user won’t be informed of the failure. To confirm a DELAYED insert, the user must check the table later for the inserted content with a SELECT statement. The DELAYED option works only with MyISAM and InnoDB tables. It’s also not applicable when the ON DUPLICATE KEY UPDATE clause is used.

Use the HIGH_PRIORITY keyword to override a --low-priority-updates server option and to disable concurrent inserts.

The IGNORE keyword instructs the server to ignore any errors encountered and suppress the error messages. In addition, for multiple row insertions, the statement continues to insert rows after encountering errors on previous rows. Warnings are generated that the user can display with theSHOW WARNINGS statement.

The INTO keyword is optional and only for compatibility with other database engines.

The DEFAULT keyword can be given for a column for the first two syntax formats to instruct the server to use the default value for the column. You can set the default value either with the CREATE TABLE statement when the table is created or with the ALTER TABLE statement for existing tables.

The ON DUPLICATE KEY UPDATE clause tells an INSERT statement how to handle an insert when an index in the table already contains a specified value in a column. With this clause, the statement updates the data in the existing row to reflect the new values in the given columns. Without this clause, the statement generates an error. An example appears in the next section.

Single-row insertion with the SET clause

INSERT [LOW_PRIORITY|DELAYED|HIGH_PRIORITY] [IGNORE]

[INTO] table

SET column={expression|DEFAULT}, ...

[ON DUPLICATE KEY UPDATE column=expression, ...]

This variant of the INSERT statement allows only one row of data to be inserted into a table per SQL statement. The SET clause lists one or more column names, each followed by the value to which it is to be set. The value given can be a static value or an expression. Here is an example:

INSERT INTO clients

SET client_name = 'Geoffrey & Company',

city = 'Boston', state = 'MA';

This example lists three columns along with the values to be set in a row entry in the clients table. Other columns in the newly inserted row will be handled in a default manner. For instance, an AUTO_INCREMENT column will be set to the next number in sequence.

As mentioned earlier, the ON DUPLICATE KEY UPDATE clause allows an INSERT statement to handle rows that already contain specified values. Here is an example:

CREATE UNIQUE INDEX client_phone

ON clients(client_name,telephone);

ALTER TABLE clients

ADD COLUMN new_telephone TINYINT(1)

AFTER telephone;

INSERT INTO clients

SET client_name = 'Marie & Associates',

new_telephone = 0

telephone = '504-486-1234'

ON DUPLICATE KEY UPDATE

new_client = 1;

This example starts by creating an index on the client_phone column in the clients table. The index type is UNIQUE, which means that duplicate values for the combination of client_name and telephone columns are not allowed. With the second SQL statement, we add a column to flag new telephone numbers for existing clients. The INSERT statement tries to insert the specified client name and telephone number. But it indicates that if there is already a row in the table for the client, a new row is not to be added. Instead, the existing row is to be updated per the UPDATEclause, setting the original entry’s telephone column to the value given in the SET clause. The assumption is that the new data being inserted either is for a new client or is an update to the existing client’s telephone number. Instead of using a column value after the equals sign, a literal value or an expression may be given.

Multiple-row insertions

INSERT [LOW_PRIORITY|DELAYED|HIGH_PRIORITY] [IGNORE]

[INTO] table [(column,...)]

VALUES ({expression|DEFAULT},...), (...)

[ON DUPLICATE KEY UPDATE column=expression,...]

This format of the INSERT statement allows one SQL statement to insert multiple rows. The columns in which data is to be inserted may be given in parentheses in a comma-separated list. If no columns are specified, the statement must include a value for each column in each row, in the order that they appear in the table. In the place reserved for an AUTO_INCREMENT column, specify NULL and the server will insert the correct next value in the column. To specify default values for other columns, use the DEFAULT keyword. NULL may also be given for any other column that permits NULL and that you wish to leave NULL. The VALUES clause lists the values of each row to be inserted into the table. The values for each row are enclosed in parentheses; each row is separated by a comma. Here is an example:

INSERT INTO clients (client_name, telephone)

VALUES('Marie & Associates', '504-486-1234'),

('Geoffrey & Company', '617-522-1234'),

('Kenneth & Partners', '617-523-1234');

In this example, three rows are inserted into the clients table with one SQL statement. Although the table has several columns, only two columns are inserted for each row here. The other columns are set to their default value or to NULL. The order of the values for each row corresponds to the order that the columns are listed.

Normally, if a multiple INSERT statement is entered and one of the rows to be inserted is a duplicate, an error is triggered and an error message is displayed. The statement is terminated and no rows are inserted. The IGNORE keyword, however, instructs the server to ignore any errors encountered, suppress the error messages, and insert only the non-duplicate rows. The results of this statement display like so:

Query OK, 120 rows affected (4.20 sec)

Records: 125 Duplicates: 5 Warnings: 0

These results indicate that 125 records were to be inserted, but only 120 rows were affected or successfully inserted. There were five duplicates in the SQL statement, but there were no warnings because of the IGNORE keyword. Entering the SHOW WARNINGS statement will display the suppressed warning messages.

Inserting rows based on a SELECT

INSERT [LOW_PRIORITY|HIGH_PRIORITY] [IGNORE]

[INTO] table [(column,...)]

SELECT...

[ON DUPLICATE KEY UPDATE column=expression,...]

This method of the INSERT statement allows for multiple rows to be inserted in one SQL statement, based on data retrieved from another table by way of a SELECT statement. If no columns are listed (i.e., an asterisk is given instead), the SELECT will return the values of all columns in the order in which they are in the selected table and will be inserted (if possible without error) in the same order in the table designated for inserting data into. If you don’t want to retrieve all of the columns of the selected table, or if the columns in both tables are not the same, then you must list the columns to retrieve in the SELECT statement and provide a matching ordered list of the columns of the table that data is to be inserted into.

For the following example, suppose that the employees table contains a column called softball to indicate whether an employee is a member of the company’s softball team. Suppose further that it is decided that a new table should be created to store information about members of the softball team and that the team’s captain will have privileges to this new table (softball_team), but no other tables. The employee names and telephone numbers need to be copied into the new table because the team’s captain will not be allowed to do a query on the employees table to extract that information. Here are the SQL statements to set up the new table with its initial data:

CREATE TABLE softball_team

(player_id INT KEY, player_name VARCHAR(50),

position VARCHAR(20), telephone CHAR(8));

INSERT INTO softball_team

(player_id, player_name, telephone)

SELECT emp_id, CONCAT(name_first, ' ', name_last),

RIGHT(telephone_home, 8)

FROM employees

WHERE softball = 'Y';

The first SQL statement creates the new table. The columns are very simple: one column as a row identifier, one column for both the first and last names of the player, another for the player’s home telephone number, and yet another for the player’s position, to be filled in later by the team’s captain. Normally, we wouldn’t include a column like the one for the player’s name because that would be duplicating data in two tables. However, the team captain intends to change many of the player’s names to softball nicknames (e.g., Slugger Johnson).

In the second SQL statement, the INSERT statement uses an embedded SELECT statement to retrieve data from the employees table where the softball column for the row is set to 'Y'. The CONCAT() function is used to put together the first and last names, separated by a space. This will go into the name column in the new table. The RIGHT() function is used to extract only the last eight characters of the telephone_home column because all of the employees on the softball team are from the same telephone dialing area. See Chapter 11 for more information on these functions. Notice that we’ve listed the three columns that data is to go into, although there are four in the table. Also notice that the SELECT statement has three columns of the same data types but with different names.

Name

JOIN

Synopsis

SELECT...|UPDATE...|DELETE...

table [INNER|CROSS] JOIN table [ON condition|USING (column[,...])] |

table STRAIGHT_JOIN table ON condition |

table LEFT [OUTER] JOIN table {ON condition|USING (column[,...])} |

table NATURAL [LEFT [OUTER]] JOIN table |

[OJ table LEFT OUTER JOIN table ON condition] |

table RIGHT [OUTER] JOIN table {ON condition|USING (column[,...])} |

table NATURAL [RIGHT [OUTER]] JOIN table

The JOIN clause is common to several SQL statements (SELECT, UPDATE, DELETE) and is complex; therefore, it is listed here as its own entry in the chapter. Use JOIN to link tables together based on columns with common data for purposes of selecting, updating, or deleting data. The JOINclause is entered at the place in the relevant statement that specifies the tables to be referenced. This precludes the need to join the tables based on key columns in the WHERE clause.

The ON keyword is used to indicate the pair of columns by which the tables are to be joined (indicated with the equals sign operator). As an alternative method, the USING keyword may be given along with a comma-separated list of columns both tables have in common, contained within parentheses. The columns must exist in each table that is joined. To improve performance, you can also provide index hints to MySQL (see the last subsection of this clause definition, Index hints”).

Here is an example of a JOIN:

SELECT CONCAT(name_first, SPACE(1), name_last) AS Name

FROM employees

JOIN branches ON employees.branch_id = branches.branch_id

WHERE location = 'New Orleans';

This statement displays a list of employees from the employees table who are located in the New Orleans branch office. The problem solved by the JOIN is that the employees table doesn’t indicate New Orleans by name as the branch; that table just has a numeric identifier. The branchestable is used to retrieve the branch name for the WHERE clause. The location column is a column in the branches table. Nothing is actually displayed from the branches table here. Since the record identification column for branches is branch_id in both tables, the USING keyword can be used instead of ON to create the same join:

SELECT CONCAT(name_first, SPACE(1), name_last) AS Name

FROM employees

JOIN branches USING (branch_id)

WHERE location = 'New Orleans';

This will join the two tables on the branch_id column in each table. Since these tables have only one column in common, it’s not necessary to specify that row; instead, you can use the NATURAL keyword. Here is the same statement with this change:

SELECT CONCAT(name_first, SPACE(1), name_last) AS Name

FROM employees

NATURAL JOIN branches

WHERE location = 'New Orleans';

Notice that the USING keyword and the column for linking are omitted. MySQL will assume that branch_id in both columns are the same and will naturally join the tables on them. The results of this SQL statement will be the same as those of the previous two.

When joining two tables in a simple join, as shown in the previous examples, if no rows in the second table match rows from the first table, no row will be displayed for the unmatched data. For example, if the branches table lists a branch office for which there are no employees listed in theemployees table belonging to that branch, the results set would not show a row for that supposedly empty branch office. Sometimes, though, it can be useful to display a record regardless. In our example, this would tell us that something’s wrong with the data: either one or more employees are marked with the wrong branch_id, or some employee records are missing from the employees table. Conversely, if an employee has a branch_id value that does not exist in the branches table, we would want to see it in the results so that we can correct the data.

To list a row for each employee including stray ones, the LEFT keyword may be given in front of the JOIN keyword to indicate that records from the first table listed on the left are to be displayed regardless of whether there is a matching row in the table on the right:

SELECT CONCAT(name_first, SPACE(1), name_last) AS Name,

location AS Branch

FROM employees

LEFT JOIN branches USING (branch_id)

ORDER BY location;

This SQL statement lists a row for each employee along with the employee’s location. If a row for an employee has either a NULL value for the branch_id, or a branch number that is not in the branches table, the employee name will still be displayed but with the branch name reading as NULL. Again, this can be useful for spotting errors or inconsistencies in the data between related tables.

In contrast to LEFT JOIN, the RIGHT JOIN clause includes all matching entries from the table on the right even if there are no matches from the table on the left. Here is an example using a RIGHT JOIN:

SELECT CONCAT(name_first, SPACE(1), name_last) AS Name,

location AS Branch

FROM employees

RIGHT JOIN branches USING (branch_id)

ORDER BY location;

This example displays branches for which there are no matching employee records. For both the LEFT and RIGHT JOIN methods, the OUTER keyword is optional and has no effect on the results. It’s just a matter of preference and compatibility with other database engines.

The JOIN clause has a few other options. The STRAIGHT_JOIN keyword explicitly instructs MySQL to read the tables as listed, from left to right. The keywords INNER and CROSS have no effect on the results, as of recent versions of MySQL. They cannot be used in conjunction with the keywords LEFT, RIGHT, or NATURAL. The syntax starting with the OJ keyword is provided for compatibility with Open Database Connectivity (ODBC).

You can use the AS keyword to introduce aliases for tables. Several examples of aliasing are provided earlier in the explanation of this clause.

Index hints

SELECT...|UPDATE...|DELETE...

table...JOIN table

USE {INDEX|KEY} [{FOR {JOIN|ORDER BY|GROUP BY}] ([index[,...]]) |

FORCE {INDEX|KEY} [{FOR {JOIN|ORDER BY|GROUP BY}] (index[,...]) |

IGNORE {INDEX|KEY} [{FOR {JOIN|ORDER BY|GROUP BY}] (index[,...])

When MySQL joins and searches tables, indexes can be used to increase the speed of the SQL statements. Use the EXPLAIN statement to analyze a joined SQL statement to see which indexes are being used and in which order, as well as whether there are other indexes available that aren’t being used in the join. MySQL may not always choose the best index available. To hint to MySQL which index it should check first, and perhaps which index to ignore, or even to force it to use a particular index, you can employ index hints.

To tell MySQL to use a particular index, add the USE INDEX clause to the JOIN along with the names of the indexes in a comma-separated list, within parentheses. To present an example of this method, let’s start with a JOIN statement that may execute in a suboptimal manner:

SELECT client_name, COUNT(*) AS tickets

FROM work_req

JOIN clients USING(client_id)

WHERE client_type = 1

AND DATEDIFF(NOW(), request_date) < 91

GROUP BY client_id

This statement retrieves a list of support clients and a count of the number of support tickets that they have created in the last 90 days. It gets the count of tickets from work_req and the client name from the clients table. To tweak the performance of the statement, let’s examine the indexes for the work_req table:

SHOW INDEXES FROM work_req \G

*************************** 1. row ***************************

Table: work_req

Non_unique: 0

Key_name: PRIMARY

Seq_in_index: 1

Column_name: wr_id

Collation: A

Cardinality: 115

Sub_part: NULL

Packed: NULL

Null:

Index_type: BTREE

Comment:

*************************** 2. row ***************************

Table: work_req

Non_unique: 1

Key_name: workreq_date_key

Seq_in_index: 1

Column_name: wr_id

Collation: A

Cardinality: 217337

Sub_part: NULL

Packed: NULL

Null: YES

Index_type: BTREE

Comment:

*************************** 3. row ***************************

Table: work_req

Non_unique: 1

Key_name: workreq_date_key

Seq_in_index: 2

Column_name: request_date

Collation: A

Cardinality: 217337

Sub_part: NULL

Packed: NULL

Null:

Index_type: BTREE

Comment:

The results show us that the table work_req has two indexes: a primary one based on the wr_id (see row 1) and a second one called workreq_date_key (see the Key_name field in rows 2 and 3) based on wr_id and request_date together. To suggest to MySQL in our JOIN statement that this second index should be used, enter the statement like so:

SELECT client_name, COUNT(*) AS tickets

FROM work_req

JOIN clients

USE INDEX FOR JOIN (workreq_date_key)

USING(client_id)

WHERE client_type = 1

AND DATEDIFF(NOW(), request_date) < 91

GROUP BY client_id;

The FORCE INDEX option instructs MySQL to attempt to limit its search to the specified index; others, however, will be used if the requested columns make it necessary:

SELECT client_name, COUNT(*) AS tickets

FROM work_req

JOIN clients

FORCE INDEX FOR JOIN (workreq_date_key)

USING(client_id)

WHERE client_type = 1

AND DATEDIFF(NOW(), request_date) < 91

GROUP BY client_id;

To instruct MySQL not to use certain indexes, list them with the IGNORE INDEX option in the same manner:

SELECT client_name, COUNT(*) AS tickets

FROM work_req

JOIN clients

IGNORE INDEX FOR JOIN (workreq_date_key)

USING(client_id)

WHERE client_type = 1

AND DATEDIFF(NOW(), request_date) < 91

GROUP BY client_id;

It’s also permitted to use combinations of these three index hint clauses, separated only by a space.

Name

LIMIT

Synopsis

...

LIMIT count |

LIMIT [offset,] count |

LIMIT count OFFSET offset

Use the LIMIT clause to limit the number of rows the server will process to satisfy the given SQL statement. For the SELECT statement, it limits the number of rows returned in the results set. In an UPDATE statement, it limits the number of rows changed. With the DELETE statement, it limits the number of rows deleted. The DELETE statement permits only the first syntax shown, whereas the other statements allow all three.

The LIMIT clause accepts only literal values, not expressions or variables. Nor will it accept a negative value. The most straightforward method of limiting the number of rows is to specify the maximum row count to be displayed, like this:

SELECT * FROM employees

LIMIT 5;

To begin listing rows after a specific number of records, an offset may be given, where the offset for the first row is 0. Two syntaxes accomplish this. One gives the amount of the offset, followed by a comma and then the maximum count of rows to display. The other specifies the count followed by the OFFSET keyword, followed by the amount of the offset. Here is an example of the first structure, which is preferred:

SELECT * FROM employees

LIMIT 10, 5;

In this example, after the 10th record is reached, the next 5 records will be returned—in other words, results 11 through 15 are returned. The offset and count for the LIMIT clause are based on the rows in the results set, not necessarily on the rows in the tables. So the amount of the offset is related to the order of the rows retrieved from the tables based on clauses, such as the WHERE clause and the ORDER BY clause.

Name

LOAD DATA INFILE

Synopsis

LOAD DATA [LOW_PRIORITY|CONCURRENT] [LOCAL] INFILE '/path/file'

[REPLACE|IGNORE] INTO TABLE table

[CHARACTER SET character_set]

[FIELDS [TERMINATED BY 'character']

[[OPTIONALLY] ENCLOSED BY 'character'] [ESCAPED BY 'character']]

[LINES [STARTING BY 'string'] [TERMINATED BY 'string']]

[IGNORE count LINES]

[(column,...)]

[SET column = expression,...]

You can use this statement to import organized data from a text file into a table in MySQL. The file can be either on the server or on the client.

For a file on the server, if you use a bare filename (such as input.txt) or a relative path (such as ../), the file is found relative to the directory of the database into which the data is to be imported. If the file is not located in the directory’s database, the file permissions must be set so it can be read for all filesystem users.

For a file on the client, the LOCAL keyword must be given. This feature must be enabled on both the client and the server by using the startup option of --local-infile=1. See Chapter 15 for more information on server and client settings.

If a data text file contains rows of data duplicating some of the rows in the table into which it’s being imported, an error will occur and the import may end without importing the remaining data. Duplicate rows are those that have the same values for key columns or other unique columns. To instruct the server to ignore any errors encountered and to continue loading other rows, use the IGNORE keyword. Use the SHOW WARNINGS statement to retrieve the error messages that would have been displayed. To instruct the server to replace any duplicate rows with the ones being imported, use the REPLACE keyword. This will completely replace the values of all columns in the row, even when the new record contains no data for a column and the existing one does.

Here is a basic example of LOAD DATA INFILE:

LOAD DATA INFILE '/tmp/catalog.txt'

INTO TABLE catalog

FIELDS TERMINATED BY '|'

LINES TERMINATED BY '\n';

In this example, the file to be loaded is in the /tmp directory and is called catalog.txt. The data contained in the file is to be inserted into the catalog table in the current database in use. Each field in the text file is terminated with a vertical bar character. The rows of data in the text file are on separate lines. They are separated by a newline character (\n). This is the default for a Unix text file. For DOS or Windows systems, lines are usually terminated with \n\r, signifying a newline and a Return character. If the rows start with a special character, you can identify that character with the LINES STARTED BY clause.

This statement also offers the ENCLOSED BY clause to specify a character that can start and terminate a field, such as a quotation mark. You can use the OPTIONALLY keyword to indicate that the character is used for enclosing columns containing string data, but optional for numeric data. Numeric fields may then include or omit the given character. For example, if the optional character is an apostrophe (single quote), a numeric value for a field may be given as '1234' or 1234, so MySQL should expect and accept both.

The ESCAPED BY clause indicates the character used in the input file to escape special characters. The backslash (\) is the default value.

Some data text files contain one or more lines of column headings that should not be imported. To omit these initial lines from the import, use the IGNORE count LINES clause, where count is the number of lines to ignore.

For some data text files, the fields of data are not in the same order as the columns of the receiving table. Sometimes there are fewer fields in the text file than in the table. For both of these situations, to change the order and number of columns, add a list of columns and their order in the text file to the end of the statement within parentheses. Here is an example of such a scenario:

LOAD DATA LOW_PRIORITY INFILE '/tmp/catalog.txt' IGNORE

INTO TABLE catalog

FIELDS TERMINATED BY '|'

LINES TERMINATED BY '\n'

IGNORE 1 LINES

(cat_id, description, price);

The first line of the text file contains column headings describing the data, but that line will not be imported because of the IGNORE 1 LINES clause here. The catalog table has several more columns than the three that are being imported, and they are in a different order. Finally, because this import is not critical, the LOW_PRIORITY keyword near the beginning of the statement instructs the server to handle other queries on the catalog table before running this statement. If this was replaced with CONCURRENT, the import would be performed even if other clients were querying the same table.

As of version 5.0.3 of MySQL, the list of fields can contain column names and user variables. Also, SET may be added to set or change the value to be imported. Here is an example:

LOAD DATA LOW_PRIORITY INFILE '/tmp/catalog.txt' IGNORE

INTO TABLE catalog

FIELDS TERMINATED BY '|'

LINES TERMINATED BY '\n'

IGNORE 1 LINES

(cat_id, @discarded, description, @mfg_price)

SET price = @mfg_price * .9;

In this example, the table receiving the data has five columns. The second one is to be ignored and stored in a discarded user variable. The third column is the price. Since the company sells the manufacturer’s products at ten percent less than the manufacturer’s suggested retail price, the statement receives the raw value in the user variable @mfg_price and then we use SET to adjust that value for the column when loaded.

Name

RELEASE SAVEPOINT

Synopsis

RELEASE SAVEPOINT identifier

This statement instructs the server to release a savepoint named earlier with the SAVEPOINT statement for the current transaction. The statement does not commit the transaction, nor does it roll back the transaction to the savepoint. Instead, it merely eliminates the savepoint as a possible rollback point. See the SAVEPOINT statement for more information. Here is an example of RELEASE SAVEPOINT:

START TRANSACTION;

LOCK TABLES orders WRITE;

INSERT DATA INFILE '/tmp/customer_info.sql'

INTO TABLE orders;

SAVEPOINT orders_import;

INSERT DATA INFILE '/tmp/customer_orders.sql'

INTO TABLE orders;

SAVEPOINT orders_import1;

INSERT DATA INFILE '/tmp/customer_orders1.sql'

INTO TABLE orders;

SELECT...

RELEASE SAVEPOINT orders_import1;

In this example, the database administrator imports a customer information file and two files containing customer orders and sets up two savepoints. After running a few SELECT statements (not fully shown here), he decides that the results of the second batch of orders look all right and he releases the savepoint for that batch. He hasn’t yet decided if the first batch was imported properly. If he decides that there was a problem, he can still roll back all of the orders imported, but he can no longer roll back just the second batch.

Name

REPLACE

Synopsis

REPLACE [LOW_PRIORITY|DELAYED] [INTO] table [(column,...)]

VALUES ({expression|DEFAULT},...)[, (...)]

REPLACE [LOW_PRIORITY|DELAYED] [INTO] table

SET column={expression|DEFAULT}[, ...]

REPLACE [LOW_PRIORITY|DELAYED] [INTO] table [(column,...)]

SELECT...

Use this statement to insert new rows of data and to replace existing rows where the PRIMARY KEY or UNIQUE index key is the same as the new record being inserted. This statement requires INSERT and DELETE privileges because it is potentially a combination of both.

The LOW_PRIORITY keyword instructs the server to wait until there are no queries on the table named, including reads, and then to lock the table for exclusive use by the thread so that data may be inserted and replaced. When the statement is finished, the lock is released, automatically. For busy servers, a client may be waiting for quite a while. The DELAYED keyword will free the client by storing the statement in a buffer for processing when the table is not busy. The client won’t be given notice of the success of the statement, just that it’s buffered. If the server crashes before the changes to the data are processed, the client will not be informed and the buffer contents will be lost. The INTO keyword is optional and is a matter of style preference and compatibility with other database engines.

The REPLACE statement has three basic formats. The first contains the values for each row in parentheses after the VALUES keyword. If the order and number of values do not match the columns of the table named, the columns have to be listed in parentheses after the table name in the order in which the values are arranged. Here is an example of the REPLACE statement using this syntax:

REPLACE INTO workreq (wr_id, client_id, description)

VALUES(5768,1000,'Network Access Problem'),

(5770,1000,'Network Access Problem');

Notice that this statement is able to insert two rows without the column names being listed twice. In this example, the first row already exists before this statement is to be executed. Once it’s run, the row represented by the work request identifier 5768 is completely replaced with this data. Columns that are not included in the list of columns here are reset to their default values or to NULL, depending on the column.

The second syntax does not allow multiple rows. Instead of grouping the column names in one part of the statement and the values in another part, column names and values are given in a column=value pair. To enter the REPLACE statement from the preceding example in this format, you would have to enter the following two statements:

REPLACE INTO workreq

SET wr_id = 5768, client_id = 1000,

description = 'Network Access Problem';

REPLACE INTO workreq

SET wr_id = 5770, client_id = 1000,

description = 'Network Access Problem';

The third syntax involves a subquery, which is available as of version 4.1 of MySQL. With a subquery, data can be retrieved from another table and inserted into the table referenced in the main query for the statement. Here is an example:

REPLACE INTO workreq (wr_id, client_id, status)

SELECT wr_id, client_id, 'HOLD'

FROM wk_schedule

WHERE programmer_id = 1000;

Work requests assigned to a particular programmer are being changed to a temporarily on-hold status. The values for two of the columns are taken from the work schedule table, and the fixed string of HOLD is inserted as the value of the third column. Currently, the table for which replacement data is being inserted cannot be used in the subquery.

Name

ROLLBACK

Synopsis

ROLLBACK [WORK] [AND [NO] CHAIN] [[NO] RELEASE]

Use this statement with transactional tables to reverse transactions that have not yet been committed. Transaction statements are currently supported by the InnoDB, NDB Cluster, and BDB storage engines and are ignored if used with MyISAM tables.

If AUTOCOMMIT is enabled, it must be disabled for this statement to be meaningful, which can be done as follows:

SET AUTOCOMMIT = 0;

AUTOCOMMIT is also disabled when a transaction is started with the START TRANSACTION statement. It is reinstated with the execution of the COMMIT statement, the ending of the current session, and several other statements that imply that a commit is desired. See the explanation of COMMITearlier in this chapter for a list of statements that imply a commit.

The WORK keyword is optional and has no effect on the results. It’s available for compatibility with its counterparts, BEGIN WORK and COMMIT WORK. Use the AND CHAIN clause to indicate that the transaction is to be rolled back and another is starting, thus making it unnecessary to execute the START TRANSACTION statement again. Use the AND RELEASE clause to end the current client session after rolling back the transaction. Add the keyword NO to indicate explicitly that a new transaction is not to begin (when used with CHAIN) or the client session is not to end (when used with RELEASE)—these are the default settings, though. It’s necessary to specify NO only when the system variable completion_type is set to something other than the default setting.

Here is an example of this statement’s use in context:

START TRANSACTION;

LOCK TABLES orders WRITE;

INSERT DATA INFILE '/tmp/customer_orders.sql'

INTO TABLE orders;

SELECT ...;

ROLLBACK;

UNLOCK TABLES;

In this example, after the batch of orders is inserted into the orders table, the administrator manually enters a series of SELECT statements (not shown) to check the integrity of the data. If everything seems all right, the COMMIT statement would be issued to commit the transactions, instead of the ROLLBACK statement shown here. In this case, a problem leads the administrator to issue ROLLBACK to remove the data imported by the INSERT DATA INFILE statement.

A rollback will not undo the creation or deletion of databases. It also cannot be performed on changes to table schema (e.g., ALTER TABLE, CREATE TABLE, or DROP TABLE statements). Transactions cannot be reversed with the ROLLBACK statement if they have been committed. Commits are caused by the COMMIT statement as well as several other implicit commit statements. See the explanation of COMMIT for a list of statements that imply a commit.

Name

ROLLBACK TO SAVEPOINT

Synopsis

ROLLBACK TO SAVEPOINT identifier

This statement instructs the server to reverse SQL statements for the current transaction back to a point marked in the transaction by the SAVEPOINT statement. Any transactions for the session made after the savepoint are undone. This is in contrast to ROLLBACK by itself, which undoes all changes since the start of the transaction. Transaction statements are currently supported by the InnoDB, NDB Cluster, and BDB storage engines and are ignored if used with MyISAM tables. Multiple savepoints may be set up during a transaction. Here is an example:

START TRANSACTION;

LOCK TABLES orders WRITE;

INSERT DATA INFILE '/tmp/customer_info.sql'

INTO TABLE orders;

SAVEPOINT orders_import;

INSERT DATA INFILE '/tmp/customer_orders.sql'

INTO TABLE orders;

SELECT...

SAVEPOINT orders_import1;

INSERT DATA INFILE '/tmp/customer_orders1.sql'

INTO TABLE orders;

SELECT...

ROLLBACK TO SAVEPOINT orders_import1;

In this example, the database administrator has imported a customer information file and two files containing customer orders and has set up two savepoints. After running a few SELECT statements (not fully shown here), he decides that there was a problem loading the second batch of orders, so he rolls back the transaction to the savepoint, eliminating the data that was imported from the customer_orders1.sql file. If he wants, he can still roll back all of the orders imported, as well as the whole transaction. When he’s finished, he can commit the transactions by executing theCOMMIT statement. See that statement earlier in this chapter for more information on committing transactions explicitly and implicitly.

Name

SAVEPOINT

Synopsis

SAVEPOINT identifier

Use this statement to identify a point in a transaction to which SQL statements may potentially be undone later. It’s used in conjunction with the ROLLBACK TO SAVEPOINT statement. It may be released with the RELEASE SAVEPOINT statement. You can use any unreserved word to identify a savepoint and can create several savepoints during a transaction. If an additional SAVEPOINT statement is issued with the same name, the previous point will be replaced with the new point for the name given. Here is an example:

START TRANSACTION;

LOCK TABLES orders WRITE;

INSERT DATA INFILE '/tmp/customer_info.sql'

INTO TABLE orders;

SAVEPOINT orders_import;

INSERT DATA INFILE '/tmp/customer_orders.sql'

INTO TABLE orders;

At this point in this example, the administrator can check the results of the orders imported before committing the transactions. If the administrator decides that the orders imported have problems (the /tmp/customer_orders.sql file), but not the client information that was first imported (the/tmp/customer_info.sql file), the following statement could be entered:

ROLLBACK TO SAVEPOINT orders_import;

If the administrator decides that the customer information that was imported also has problems, the ROLLBACK statement can be issued to undo the entire transaction.

As of version 5.0.17 of MySQL, if a stored function or trigger is used, a new savepoint level is set up and the previous savepoints are suspended. When the stored function or trigger is finished, any savepoints it created are released and the original savepoint level resumes.

Name

SELECT

Synopsis

SELECT [flags] {*|column|expression}[, ...]

FROM table[, ...]

[WHERE condition]

[GROUP BY {column|expression|position}[ASC|DESC], ...

[WITH ROLLUP]]

[HAVING condition]

[ORDER BY {column|expression|position}[ASC|DESC] , ...]

[LIMIT {[offset,] count|count OFFSET offset}]

[PROCEDURE procedure(arguments)]

options

Use this statement to retrieve and display data from tables within a database. It has many clauses and options, but for simple data retrieval many of them can be omitted. The basic syntax for the statement is shown. After the SELECT keyword, some keywords to control the whole operation may be given. Next comes an asterisk to retrieve all columns, a list of columns to retrieve, or expressions returning values to display, separated by commas.

Data can be retrieved from one or more tables, given in a comma-separated list. If multiple tables are specified, other clauses must define how the tables are joined. The remaining clauses may be called on to refine the data to be retrieved, to order it, and so forth. These various keywords, options, and clauses are detailed in subsections of this statement explanation. To start, here is a simple example of how you can use the SELECT statement:

SELECT name_first, name_last, telephone_home,

DATEDIFF(now( ), last_review)

AS 'Days Since Last Review'

FROM employees;

In this example, three columns and the results of an expression based on a fourth column are to be displayed. The first and last name of each employee, each employee’s home telephone number, and the difference between the date of the employee’s last employment review and the date now are listed. This last field has the addition of the AS keyword to set the column heading of the results set, and to name an alias for the field. An alias may be referenced in subsequent clauses of the same statement (e.g., the ORDER BY clause). To select all columns in the table, the wildcard * can be given instead of the column names.

SELECT statement keywords

SELECT

[ALL|DISTINCT|DISTINCTROW]

[HIGH_PRIORITY] [STRAIGHT_JOIN]

[SQL_SMALL_RESULT] [SQL_BIG_RESULT] [SQL_BUFFER_RESULT]

[SQL_CACHE|SQL_NO_CACHE] [SQL_CALC_FOUND_ROWS]

{*|column|expression}[, ...]

FROM table[, ...]

[WHERE condition] [other clauses] [options]

Between the initial SELECT keyword and list of columns and expressions, several keywords may be given. They are shown in the preceding syntax, with the other components of the statement abbreviated.

When a WHERE clause is used with the SELECT statement, rows in the results may contain duplicate data. If you want all rows that meet the selection conditions to be displayed, you may include the ALL keyword. This is the default, so it’s not necessary to give this keyword. If you want to display only the first occurrence of a row, include the DISTINCT keyword or its synonym DISTINCTROW. Here is an example:

SELECT DISTINCT dept

FROM employees;

This statement will list the names of all departments for which we have employees listed in the employees table. Even though there are several employees in the same department, it will list only one row for each department.

By default, any UPDATE statements that are issued have priority over SELECT statements submitted by other client sessions at the same time; the updates are run first. To give a particular SELECT statement higher priority than any UPDATE statements, use the HIGH_PRIORITY keyword.

Multiple tables may be selected with the SELECT statement. The column on which they should be joined is given with the WHERE clause or the JOIN clause. The JOIN clause is described earlier in this chapter. For the purposes of this section, you just need to know that in order to optimize retrieval, MySQL might not join tables in the order that they are listed in the SQL statement. To insist on joining in the order given, you must use the STRAIGHT_JOIN keyword.

When you know that the results of a SELECT statement using the DISTINCT keyword or the GROUP BY clause (discussed later) will be small, you can use the SQL_SMALL_RESULT keyword. This will cause MySQL to use temporary tables, with a key based on the GROUP BY clause elements, to sort the results and possibly make for faster data retrieval. If you expect the results to be large, you can use the SQL_BIG_RESULT keyword. This will cause MySQL to use temporary tables on the filesystem. Regardless of whether you use DISTINCT or GROUP BY, theSQL_BUFFER_RESULT keyword may be given for any SELECT statement to have MySQL use a temporary table to buffer the results. You can use only one of the SQL_*_RESULT keywords in each statement.

If the MySQL server is not using the query cache by default, you can force its use by including the SQL_CACHE keyword. If the server does use the query cache by default, you can use the SQL_NO_CACHE to instruct MySQL not to use the cache for this particular SELECT statement. To determine whether the server uses query cache by default, enter SHOW VARIABLES LIKE 'query_cache_type';. A value of ON indicates that it is in use.

The last keyword available is SQL_CALC_FOUND_ROWS, which counts the number of rows that meet the conditions of the statement. This is not affected by a LIMIT clause. The results of this count must be retrieved in a separate SELECT statement with the FOUND_ROWS() function. See the end of this chapter for information on this function:

SELECT SQL_CALC_FOUND_ROWS

name_first, name_last, telephone_home,

DATEDIFF(now( ), last_review)

AS 'Days Since Last Review'

FROM employees

WHERE dept = 'sales'

ORDER BY last_review DESC

LIMIT 10;

SELECT FOUND_ROWS();

The first statement retrieves a list of sales people to review, limited to the 10 who have gone the longest without a performance review. The second gets a count of how many employees there are to review in the sales department.

Exporting SELECT results

SELECT [flags] {*|columns|expression}[, ...]

[INTO OUTFILE '/path/filename'

[FIELDS TERMINATED BY 'character']

[FIELDS ENCLOSED BY 'character']

[ESCAPED BY 'character' ]

[LINES [STARTING BY 'character'] [TERMINATED BY 'character']]

|INTO DUMPFILE '/path/filename'

|INTO 'variable'[, ...]

[FOR UPDATE|LOCK IN SHARE MODE]]

FROM table[, ...]

[WHERE condition]

[other clauses] [options]

The INTO clause is used to export data from a SELECT statement to an external text file or a variable. Only the results will be exported, not the column names or other information.

Various clauses set delimiter and control characters in the output:

ESCAPED BY

Character used to escape special characters in the output. The default is a backslash.

FIELDS ENCLOSED BY

Character to use before and after each field. By default, no character is used.

FIELDS TERMINATED BY

Character with which to separate fields. The default is a tab.

LINES STARTING BY

Character used to start each line. By default, no character is used.

LINES TERMINATED BY

Character used to end each line. The default is a newline character.

FILE privilege is necessary to use the INTO clause of the SELECT statement. This statement and clause combination is essentially the counterpart of the LOAD DATA INFILE statement. See the explanation of that statement earlier in this chapter for more details on the options for this clause. Here is an example of this clause and these options:

SELECT * FROM employees

INTO OUTFILE '/tmp/employees.txt'

FIELDS TERMINATED BY '|'

LINES TERMINATED BY '\n'

ESCAPED BY '\\';

The text file created by this SQL statement will contain a separate line for each row selected. Each field will end with a vertical bar. Any special characters (e.g., an apostrophe) will be preceded by a backslash. Because a backslash is an escape character within an SQL statement, two backslashes are needed in the ESCAPE BY clause because the first escapes the second. To import the resulting data text file, use the FOUND_ROWS() statement.

The second syntax uses the clause INTO DUMPFILE and exports only one row into an external text file. It does not allow any field or line terminators like the INTO OUTFILE clause. Here is an example of its use:

SELECT photograph

INTO DUMPFILE '/tmp/bobs_picture.jpeg'

FROM employees

WHERE emp_id = '1827';

This statement exports the contents of the photograph column for an employee’s record. It’s a BLOB type column and contains an image file. The result of the exported file is a complete and usable image file.

You can also use the INTO clause to store a value in a user variable or a system variable for reuse. Here’s an example:

SET @sales = 0;

SELECT SUM(total_order) AS Sales

INTO @sales

FROM orders

WHERE YEAR(order_date) = YEAR(CURDATE());

This example creates the user variable @sales. Then we calculate the total sales for the current year and store it into that variable for reuse in subsequent statements in the session.

Grouping SELECT results

SELECT [flags] {*|column|expression}[, ...]

FROM table[, ...]

[WHERE condition]

[GROUP BY {column|expression|position} [ASC|DESC], ...

[WITH ROLLUP]]

[other clauses] [options]

A SELECT statement sometimes produces more meaningful results if you group together rows containing the same value for a particular column. The GROUP BY clause specifies one or more columns by which MySQL is to group the data retrieved. This is used with aggregate functions so that the values of numeric columns for the rows grouped will be aggregated.

For instance, suppose that a SELECT statement is to list the sales representatives for a business and their orders for the month. Without a GROUP BY clause, one line would be displayed for each sales representative for each order. Here’s an example of how this might be resolved:

SELECT CONCAT(name_first, ' ', name_last) AS 'Sales Rep.',

SUM(total_order) AS 'Sales for Month'

FROM orders, employees

WHERE employees.emp_id = sales_rep

AND MONTH(order_date) = MONTH(CURDATE( ))

GROUP BY sales_rep;

This statement concatenates the first and last name of each sales representative who placed an order for a customer during the current month. The GROUP BY clause groups together the rows found for each sales representative. The SUM() function adds the values of the total_order column for each row within each group. See Chapter 10 for more information on the SUM() function and other aggregate functions.

You can specify multiple columns in the GROUP BY clause. Instead of stating a column’s name, you can state its position in the table, where a value of 1 represents the first column in the table. Expressions may be given as well.

The GROUP BY clause does its own sorting and cannot be used with the ORDER BY clause. To set the sorting to ascending order explicitly for a column, enter the ASC keyword after the column in the clause that is to be set. This is not necessary, though, since it is the default setting. To sort in descending order, add DESC after each column that is to be sorted in reverse.

When grouping rows by one column, it may be desirable not only to have a total of the values for certain columns, but also to display a total for all of the grouped rows at the end of the results set. To do this, use the WITH ROLLUP keyword. Here is an example:

SELECT location AS Branch,

CONCAT(name_first, ' ', name_last) AS 'Sales Rep.',

SUM(total_order) AS 'Sales for Month'

FROM orders, employees, branches

WHERE sales_rep = employees.emp_id

AND MONTH(order_date) = MONTH(CURDATE( ))

AND employees.branch_id = branches.branch_id

GROUP BY Branch, sales_rep WITH ROLLUP;

+---------------+-----------------+-----------------+

| Branch | Sales Rep. | Sales for Month |

+---------------+-----------------+-----------------+

| Boston | Ricky Adams | 2472 |

| Boston | Morgan Miller | 1600 |

| Boston | Morgan Miller | 4072 |

| New Orleans | Marie Dyer | 1750 |

| New Orleans | Tom Smith | 6407 |

| New Orleans | Simone Caporale | 5722 |

| New Orleans | Simone Caporale | 13879 |

| San Francisco | Geoffrey Dyer | 500 |

| San Francisco | Kenneth Dyer | 500 |

| San Francisco | Kenneth Dyer | 1000 |

| NULL | Kenneth Dyer | 18951 |

+---------------+-----------------+-----------------+

This statement groups and adds up the total for each sales representative. When there aren’t any more sales representatives for a branch, a row in the display for the subtotal is generated. It displays the branch name and the name of the last representative. When there are no more branches, a row for the grand total of sales is generated. The branch shows NULL. For clarity, I’ve boldfaced the subtotals and the grand total in the results set.

Having SELECT results

SELECT [flags] {*|column|expression}[, ...]

FROM table[, ...]

[WHERE condition]

[GROUP BY condition]

[HAVING condition]

[other clauses] [options]

The HAVING clause is similar to the WHERE clause, but it is used for conditions returned by aggregate functions (e.g., AVG(), MIN(), and MAX()). For older versions of MySQL, you must use aliases for aggregate functions in the main clause of the SELECT statement. Here is an example of how you can use this clause:

SELECT CONCAT(name_first, ' ', name_last) AS 'Name', total_order

FROM orders

JOIN employees ON sales_rep = emp_id

JOIN branches USING (branch_id)

WHERE location = 'New Orleans'

GROUP BY sales_rep

HAVING MAX(total_order);

This SQL statement retrieves from the employees table a list of employee names for all employees located in the New Orleans branch office. From this list, the statement refines the results by grouping the data for each representative together and determines the sum of each one’stotal_order column. Because of the MAX() function, it displays data only for the row with the maximum number. The JOIN clause is described in its own section earlier in this chapter.

Ordering SELECT results

SELECT [flags] {*|column|expression}[, ...]

FROM table[, ...]

[WHERE condition]

[ORDER BY {column|expression|position} [ASC|DESC], ...]

[other clauses] [options]

The results of a SELECT statement, by default, are displayed in the order in which the rows of data are found in the table, which may be the order in which they were entered into the table. To change the order of a results set, use the ORDER BY clause. As a basis for ordering the results, list one or more columns separated by commas. The order in which columns are listed is the order in which sorts will be conducted. You can also use aliases for columns, column combinations, or expressions that were established earlier in the same SELECT statement. Instead of stating a column’s name, you can also state its position, where a value of 1 represents the first column in the table. Here is an example of a SELECT statement using the ORDER BY clause:

SELECT CONCAT(name_first, ' ', name_last) AS Name,

MONTH(birth_date) AS 'Birth Month', email_address

FROM employees

ORDER BY 'Birth Month' ASC, Name ASC;

Here a list of employees, the months in which they were born, and their email addresses are extracted. For the name, the CONCAT() function is used to put the first and last name together, separated by a space. The AS clause establishes an alias of Name. The MONTH() function is used to extract the month from the birth_date column, and the AS clause sets up the alias Birth Month. In the ORDER BY clause, the alias for the birth date is used for the initial sort and the name is used for the secondary sort. The result will be that all of the employees who have a birth date in the same month will be listed together and in alphabetical order by name. Both aliases are followed by the ASC keyword to indicate that the results should be sorted in ascending order. This is unnecessary, as ascending order is the default. However, to change an ordering method to descending, use the DESC keyword.

You can also order the results using expressions, which may be based on columns or aliases. Here is an example of a SELECT statement using an expression for ordering:

SELECT CONCAT(name_first, ' ', name_last) AS name,

pay_rate, hours

FROM employees

ORDER BY pay_rate * hours DESC;

In this example, the first and last names are selected and concatenated together under the name column heading in the results set. The pay_rate column lists the hourly dollar rate an employee is paid, and the hours column contains the typical number of hours a week that an employee works. In the ORDER BY clause, the product of the hourly pay rate and the number of hours is determined for the ordering of the results set. The rows are to be listed in descending order per the DESC keyword based on the expression.

Limiting SELECT results

SELECT [flags] {*|column|expression}[, ...]

FROM table[, ...]

[WHERE condition]

[other clauses]

[LIMIT {[offset,] count|count OFFSET offset}]

[PROCEDURE procedure(arguments)]

[FOR UPDATE|LOCK IN SHARE MODE]]

[other clauses] [options]

The LIMIT clause is used to limit the number of rows displayed by the SELECT statement. The most straightforward method of limiting the number of rows is to specify the maximum row count to be displayed, like this:

SELECT * FROM employees

LIMIT 5;

To begin listing rows after a specific number of records, an offset may be given. The offset for the first row is 0. Two formats accomplish this. One gives the amount of the offset, followed by a comma and then the maximum count of rows to display. The other syntax structure specifies the count, followed by the OFFSET keyword, followed by the amount of the offset. Here is an example of the first structure, which is preferred:

SELECT * FROM employees

LIMIT 10, 5;

In this example, after the 10th record is reached, the next 5 records will be displayed—in other words, results 11 through 15 are returned. The offset and count for the LIMIT clause are based on the rows in the results set, not necessarily on the rows in the tables. So the amount of the offset is related to the order of the rows retrieved from the tables based on clauses, such as the WHERE clause and the ORDER BY clause. See the description of the LIMIT clause earlier in this chapter for more details.

Other SELECT clauses and options

SELECT [flags] {*|column|expression}[, ...]

FROM table[, ...]

[WHERE condition]

[other clauses]

[PROCEDURE procedure(arguments)]

[LOCK IN SHARE MODE|FOR UPDATE]

To send the results of a SELECT statement as standard input to a procedure, use the PROCEDURE clause. The PROCEDURE keyword is followed by the name of the procedure, which can be followed by parentheses containing parameters to be passed to the procedure. Here is an example:

SELECT * FROM employees

PROCEDURE ANALYSE(10, 225);

In this statement, the results of the SELECT statement are sent to the built-in function ANALYSE() along with two numeric parameters. See ANALYSE() near the end of this chapter for more information on this function.

To lock the rows that are being selected from a table, LOCK IN SHARE MODE may be given at the end of the SELECT statement. This prevents other clients from changing the data while the SELECT statement is running. The FOR UPDATE option instructs MySQL to invoke a temporary write lock on the rows being selected. Both of these locks will be terminated when the statement is finished running.

Name

SET

Synopsis

SET [GLOBAL|@@global.|SESSION|@@session.] variable = expression

Use this statement to set a system or user variable for global or session use. System variables can be either global variables, which makes them visible to all users, or session variables (also called local variables), which are available only to the connection thread that creates the variable. To make a system variable global, use the GLOBAL keyword or precede the variable name by @@global. System variables are limited to the current session by default, but you can document that behavior by using the SESSION keyword or preceding the variable name with @@session or just @@(or use the synonyms LOCAL and @@local). To mark a user variable, place a single @ in front of the variable name. Here is an example of creating a user variable:

SET @current_quarter = QUARTER(CURDATE( ));

This statement uses the CURDATE() function to determine the current date. It’s wrapped in the QUARTER() function, which determines the quarter for the date given. The result is a number from one to four depending on the date. The number is stored in the user variable,@current_quarter. For examples involving system variables, see the explanation of the SET statement in Chapter 7.

Here’s a more complete example of how this statement and a user variable may be used:

SET @row = 0;

SELECT @row := @row + 1 AS Row,

client_name AS Client

FROM clients

ORDER BY client_id LIMIT 3;

+------+--------------------+

| Row | Client |

+------+--------------------+

| 1 | Geoffrey & Company |

| 2 | Kenneth & Partners |

| 3 | Marie & Associates |

+------+--------------------+

In this example, the user variable @row is set to 0 and then used in a SELECT statement with the := operator to increment the value by 1 with each row retrieved. This gives us a nice row numbering in the results.

Name

SET TRANSACTION

Synopsis

SET [GLOBAL|SESSION] TRANSACTION ISOLATION LEVEL

{READ UNCOMMITTED|READ COMMITTED|REPEATABLE READ|SERIALIZABLE}

Use this statement to set an isolation level for the current transaction, for a transaction that’s about to be started, or globally. Use the keyword SESSION to set the level for the current session. Use GLOBAL to set it for all subsequent transactions (this does not affect existing ones). If neither of these two keywords is included, the level is set for the next transaction of the current session. This statement applies only to InnoDB tables at this time.

The level READ UNCOMMITTED is known as a dirty read because SELECT statements are executed in a nonlocking manner. Thus, queries by one transaction can be affected by ongoing, uncommitted updates in another transaction, or old data may be used, thus making the results inconsistent.READ COMMITTED is a more consistent read, similar to Oracle’s isolation level. However, changes that are committed in one transaction will be visible to another. The result is that the same query in the same transaction could return different results.

REPEATABLE READ is the default. It makes all reads consistent for a transaction.

In the safest level, SERIALIZABLE, changes are not allowed in other transactions if a transaction has executed a simple SELECT statement. Basically, queries are performed with LOCK IN SHARE MODE.

Here is an example of how you can use this statement:

SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;

START TRANSACTION;

...

Name

SHOW ERRORS

Synopsis

SHOW ERRORS [LIMIT [offset,] count]

SHOW COUNT(*) ERRORS

Use this statement to display error messages. The results are only for the previous statement that has been executed. To see the number of error messages generated by an SQL statement, use COUNT(*). To limit the number of error messages displayed, use the LIMIT clause. An offset can be given along with the count to specify a starting point for displaying error messages.

This statement is available as of version 4.1 of MySQL. It will not display warnings or notes—just error messages. Use SHOW WARNINGS to get all three types of messages.

Here are a couple of examples of this statement, which were entered after an INSERT statement was entered and encountered a problem:

SHOW COUNT(*) ERRORS;

+-----------------------+

| @@session.error_count |

+-----------------------+

| 1 |

+-----------------------+

SHOW ERRORS;

+-------+------+-------------------------------------------------+

| Level | Code | Message |

+-------+------+-------------------------------------------------+

| Error | 1136 | Column count doesn't match value count at row 2 |

+-------+------+-------------------------------------------------+

The first statement returns the number of error messages generated by the INSERT statement. Notice that the results are stored in the session variable error_count, which is updated by each statement issued in the session. The second statement displays the error messages. This statement is perhaps more meaningful when used with an API program in which you would like to capture the error messages for a specific purpose or analysis.

Name

SHOW WARNINGS

Synopsis

SHOW WARNINGS [LIMIT [offset,] count]

SHOW COUNT(*) WARNINGS

Use this statement to display warning messages, error messages, and notes for previous SQL statements for the current session. This statement is available as of version 4.1 of MySQL. To find out the number of such messages generated by the previous statement in the session, useCOUNT(*). Use the LIMIT clause to limit the number of messages displayed. An offset can be given along with the limit to specify a starting point for displaying messages. Here are a couple of examples of how you can use this statement:

INSERT INTO clients (client_name, telephone)

VALUES('Marie & Associates', '504-486-1234');

Query OK, 1 row affected, 1 warning (0.00 sec)

SHOW COUNT(*) WARNINGS;

+-------------------------+

| @@session.warning_count |

+-------------------------+

| 1 |

+-------------------------+

SHOW WARNINGS;

+---------+------+--------------------------------------------------+

| Level | Code | Message |

+---------+------+--------------------------------------------------+

| Warning | 1265 | Data truncated for column 'client_name' at row 1 |

+---------+------+--------------------------------------------------+

In this example, we enter the name of a client and her telephone number in the table clients, but in the results we see that one warning is issued. The second statement returns the number of messages; of course, the last line of the results from the INSERT already told us this. Notice that the results are stored in the session variable warning_count. The third SQL statement displays the warning message. These statements are perhaps more meaningful when used with an API program in which you would like to capture the number of errors generated or the error messages for a specific purpose or analysis.

Name

START TRANSACTION

Synopsis

START TRANSACTION [WITH CONSISTENT SNAPSHOT]

Use this statement to start a transaction. Transaction statements are currently supported by the InnoDB, NDB Cluster, and BDB storage engines and are ignored if used with MyISAM tables. The purpose of a transaction is to be able to undo SQL statements if need be. You can reverse a transaction if you have not yet committed it with a COMMIT statement, implicitly by starting another transaction, or by terminating the connection. In earlier versions of MySQL, BEGIN or BEGIN WORK were used instead of START TRANSACTION. See the explanations of the COMMIT andROLLBACK statements earlier in this chapter for more information on transactions. The SAVEPOINT statement and the ROLLBACK TO SAVEPOINT statement may also be useful.

Here is an example of this statement’s use in context:

START TRANSACTION;

INSERT DATA INFILE '/tmp/customer_orders.sql'

INTO TABLE orders;

COMMIT;

In this example, after the batch of orders is inserted into the orders table, the user decides everything went properly and issues the COMMIT statement to actually enter the data in the database and to end the transaction started with the START TRANSACTION statement. If there had been a problem, the ROLLBACK statement could be issued instead of COMMIT. ROLLBACK would remove the data imported by the INSERT DATA INFILE statement.

The WITH CONSISTENT SNAPSHOT clause initiates a consistent read. It does not change the current transaction isolation level. Therefore, it provides consistent data only if the current isolation level allows consistent reading (i.e., REPEATABLE READ or SERIALIZABLE). At this time, it only works with InnoDB tables. See the SET TRANSACTION statement earlier in this chapter for more information on isolation levels.

Name

TRUNCATE

Synopsis

TRUNCATE [TABLE] table

Use this statement to delete the contents of a table rapidly. It’s similar to the DELETE statement in that it will delete all of the data contained in a given table. The TRUNCATE statement does its job by dropping the table and then recreating it without data. As a result, it does not report the number of rows deleted. Another drawback is that the value for an AUTO_INCREMENT column will be lost along with the data. The statement does preserve file partitions and partition parameters if the table was originally partitioned.

This statement is not transaction-safe. As of version 5.1.16 of MySQL, DROP privileges are required for this statement. Previously, DELETE privileges were required.

Name

UNION

Synopsis

SELECT... UNION [ALL|DISTINCT] SELECT...[, UNION...]

The UNION keyword unites the results of multiple SELECT statements into one results set. The SELECT statements can retrieve data from the same table or from different tables. If different tables are used, the results set generated by each SQL statement should match in column count and the order of column types. The column names do not need to be the same, but the data sent to the respective fields in the results set needs to match.

Don’t confuse this statement with the JOIN clause or a subquery, which are used to merge columns of data from multiple tables into rows in the results of a SELECT statement. In contrast, the UNION clause is used to merge together the results tables of separate and distinct SELECT statements into one results table.

Here is an example of a UNION used to merge the results of two SELECT statements:

SELECT CONCAT(name_first, SPACE(1), name_last) AS Name,

telephone_work AS Telephone

FROM employees

UNION

SELECT location, telephone FROM branches

ORDER BY Name;

This statement presents a list of employees and branch office locations in one column, with the telephone number for each in the second. The column headings used for the results set will be the ones used for the first SELECT statement. Because of the ORDER BY clause, the results will be sorted by the values for the alias Name. Otherwise, the names of employees would be listed before the names of offices. The example shown merges the results of only two SELECT statements. You can merge several SELECT statements, entering the UNION keyword before each additionalSELECT statement.

If the results set is to be sorted based on a column, the table name must not be specified in the ORDER BY clause (i.e., table.column is not accepted). To resolve ambiguity, use an alias for the columns to order by. If an alias has been given for a column that is to be part of the ORDER BYclause, that alias must be used instead of the column name. The use of column position has been deprecated.

The keyword DISTINCT indicates that any duplicated rows (rows where all of the data of all columns is the same as a previous row) are not included in the results. This is the default, so it’s not necessary to include the keyword DISTINCT. Including the keyword ALL, though, will instruct MySQL to include all rows, including duplicates.

To limit the results of a union, add the LIMIT clause to the end of the SQL statement:

SELECT CONCAT(name_first, SPACE(1), name_last) AS Name,

telephone_work AS Telephone

FROM employees

UNION

SELECT location, telephone FROM branches

ORDER BY Name

LIMIT 10;

To limit the results of one table in a union and not the final results set, put parentheses around the individual SELECT statements and add the LIMIT clause to the end of the SELECT statement or statements that you want to limit:

( SELECT CONCAT(name_first, SPACE(1), name_last) AS Name,

telephone_work AS Telephone FROM employees LIMIT 10 )

UNION

( SELECT location, telephone FROM branches )

ORDER BY Name;

This statement limits the results to only 10 employees, but allows all of the branches to be displayed. You can put limits on each SELECT statement if you want, and limit the final results by adding the LIMIT clause to the end of the full SQL statement.

Name

UPDATE

Synopsis

UPDATE [LOW_PRIORITY] [IGNORE] table

SET column=expression[, ...]

[WHERE condition]

[ORDER BY {column|expression|position} [ASC|DESC], ...]

[LIMIT {[offset,] count|count OFFSET offset}]

UPDATE [LOW_PRIORITY] [IGNORE] table_reference

SET column=expression[, ...]

[WHERE condition]

This statement changes existing rows of data in a table. The first syntax shown updates only one table per statement. The second syntax can be used to update or reference data in multiple tables from one statement. Explanations of both types of statements and examples of their use follow.

Single table UPDATE

UPDATE [LOW_PRIORITY] [IGNORE] table

SET column=expression[, ...]

[WHERE condition]

[ORDER BY {column|expression|position} [ASC|DESC], ...]

[LIMIT {[offset,] count|count OFFSET offset}]

This syntax changes a single table. The SET clause specifies each column that should change and the value to which it is to be set, separated by an equals sign. The value can be a static value or an expression. If a column in a table is defined as NOT NULL, and if an UPDATE statement then sets its value to NULL, the default value for the column will be used if it is available; otherwise, an error is generated.

The LOW_PRIORITY keyword may be used to instruct the server to wait until all other queries related to the table in which data is to be added are completed before running the UPDATE statement. When the table is free, it will be locked for the UPDATE statement and thereby prevent concurrent data updates or inserts.

Normally, if one of the updates would create a duplicate row (a row that shares the same value as an existing row in a column declared to be unique), the statement reports an error. The statement is then terminated and no more rows are updated. If the table is InnoDB, BDB, or NDB, the entire transaction is reversed or rolled back; if not, the rows that were updated before the error will remain updated. However, if the IGNORE keyword is used, the server ignores any errors encountered, suppresses error messages, and continues updating nonduplicate rows.

The results of such a statement will display like this:

Query OK, 120 rows affected (4.20 sec)

Records: 125 Duplicates: 5 Warnings: 0

Notice that only 120 rows were updated, although 125 would have been updated if there had been no duplication problem.

Here is an example of the UPDATE statement using this syntax:

UPDATE clients

SET client_name = 'Geoffrey & Company',

city = 'Boston', state = 'MA'

WHERE client_name LIKE 'Geoffrey%';

This example sets the values of two columns for any rows (probably only one in this case) that meet the condition of the WHERE clause using the LIKE operator. Only these two columns will be updated in the matching rows. If there are several rows with the column client_name containing a starting value of Geoffrey, all of them will be changed.

The number of rows that are updated can be limited by using the LIMIT clause. As of version 4.0.13 of MySQL, the LIMIT clause is based on the number of rows matched, not necessarily the number changed. Starting with version 4.0.0 of MySQL, you can also choose to UPDATE only the first few rows found in a certain order by using the ORDER BY clause. See the SELECT statement earlier in this chapter for details about the ORDER BY and the LIMIT clauses. Here is an example of an UPDATE statement using both of these clauses:

UPDATE clients

SET client_terms = client_terms + 15

WHERE client_city = 'Boston'

AND YEAR(date_opened) < 2005

ORDER BY date_opened

LIMIT 50;

This example indicates that we’ve decided to somewhat arbitrarily upgrade the client terms (i.e., allow 15 additional days to pay their invoices) for any clients located in Boston who opened an account before the year 2005, but only for the first 50 clients based on the date order in which their account was opened. Notice that the value of the column client_terms is set with an expression that refers to the value of the column before the UPDATE statement is executed. Expressions are calculated from left to right, so the results of one expression could affect the results of those that follow within the same statement.

Multiple table UPDATE

UPDATE [LOW_PRIORITY] [IGNORE] table_reference

SET column=expression[, ...]

[WHERE condition]

This syntax of the UPDATE statement, available as of version 4.0.4 of MySQL, allows for multiple tables to be updated or referenced in one SQL statement. A SET clause specifies each column that should change and the value to which it is to be set, separated by an equals sign. The value can be a static value or an expression. The keywords LOW_PRIORITY and IGNORE are handled the same way as in the first syntax for the UPDATE statement. The ORDER BY and the LIMIT clauses are not available with the multiple-table syntax.

The columns by which tables are joined may be given in the WHERE clause (e.g., WHERE clients.branch_id=branches.client_id), or with the JOIN clause.

Here is an example using the JOIN clause:

UPDATE clients JOIN branches USING (branch_id)

SET client_terms = client_terms + 60

WHERE location = 'New Orleans';

In this example, only one table is being changed, but two are joined to determine which clients belong to the New Orleans branch in order to be able to give them 60 additional days to pay their bills due to a recent hurricane. See the JOIN clause earlier in this chapter for details on joining tables.

Name

USE

Synopsis

USE database

This statement sets the default database that MySQL is to use for the current session. This allows the name of the default database to be omitted from statements. For instance, db1.table1 can be written as just table1, and db1 is assumed.

USE company_database;

The semicolon may be omitted from the statement since it’s mysql client-related. You can specify a default database at startup with the --database or --D option.

Name

XA

Synopsis

XA {START|BEGIN} 'identifier' [JOIN|RESUME]

XA PREPARE 'identifier'

XA COMMIT 'identifier' [ONE PHASE]

XA ROLLBACK 'identifier'

XA RECOVER

XA END 'identifier' [SUSPEND [FOR MIGRATE]]

This statement is used for XA distributed transactions. These are transactions in which multiple, separate transactional resources may be involved in a global transaction. In MySQL, this is currently available only with InnoDB tables.

The XA START statement starts an XA transaction, assigning an identifier to be used in subsequent statements, and puts the transaction into an ACTIVE state. Implicit commits cannot be made while the transaction is in ACTIVE state. This statement is synonymous with XA BEGIN. The JOINand RESUME keywords are not supported.

Once you’ve entered all of the SQL statements for a particular session, mark the transaction as PREPARED by executing an XA PREPARE. XA RECOVER lists all transactions in a prepared state. Use the XA COMMIT ONE PHASE statement to mark the XA transaction just given as prepared and committed. XA COMMIT without the ONE PHASE keyword will commit and end the entire transaction. Use XA ROLLBACK to undo the specified XA transaction and terminate it. XA END ends the specified transaction and puts it into an IDLE state.

Functions in Alphabetical Order

This section describes special functions that are closely related to the data manipulation SQL statements in this chapter. Functions for the formatting and retrieval of column data are covered in other chapters.

Name

ANALYSE()

Synopsis

ANALYSE([maximum_elements[, maximum_memory]])

This function returns an analysis of a results table from a SELECT statement. Use this function only as part of a PROCEDURE clause. The first parameter is the maximum number of unique values that may be analyzed for each column; the default is 256. The second parameter is the maximum memory that should be allocated for each column during analysis; the default is 8,192 bytes (8 MB). Here is an example:

SELECT col1

FROM table1

PROCEDURE ANALYSE( ) \G

*************************** 1. row ***************************

Field_name: table1.col1

Min_value: 1

Max_value: 82

Min_length: 1

Max_length: 2

Empties_or_zeros: 0

Nulls: 0

Avg_value_or_avg_length: 42.8841

Std: 24.7600

Optimal_fieldtype: TINYINT(2) UNSIGNED NOT NULL

Name

BENCHMARK()

Synopsis

BENCHMARK(number, expression)

Use this function to evaluate the performance of a MySQL server. The expression given as the second argument of the function is repeated the number of times given in the first argument. The results are always 0. It’s the processing time reported that is meaningful. This function is meant to be used from within the mysql client. Here is an example:

SELECT BENCHMARK(1000000,PI( ));

+--------------------------+

| BENCHMARK(1000000,PI( )) |

+--------------------------+

| 0 |

+--------------------------+

1 row in set (0.04 sec)

Name

DATABASE()

Synopsis

DATABASE()

This function returns the name of the database currently in use for the session. There are no arguments. If no database has been set to default yet, it returns NULL; prior to version 4.1.1 of MySQL, it returns an empty string. Here is an example:

SELECT DATABASE( );

+--------------------+

| DATABASE( ) |

+--------------------+

| company_database |

+--------------------+

As of version 5.0.2 of MySQL, SCHEMA() has been introduced as a synonym for DATABASE().

Name

FOUND_ROWS()

Synopsis

FOUND_ROWS()

Use this function in conjunction with the SQL_CALC_FOUND_ROWS option of a SELECT statement to determine the number of rows an SQL statement using a LIMIT clause would have generated without the limitation. There are no arguments for the function. It’s available as of version 4 of MySQL. Here is an example:

SELECT SQL_CALC_FOUND_ROWS

name_first, name_last, telephone_home,

DATEDIFF(now( ), last_review)

AS 'Days Since Last Review'

FROM employees

WHERE dept = 'sales'

ORDER BY last_review DESC

LIMIT 10;

SELECT FOUND_ROWS();

In the first statement, we retrieve a list of sales people to review, limited to the 10 who have gone the longest without a performance review. In the second SQL statement, we’re getting a total count of how many employees there are to review in the sales department.

Name

LAST_INSERT_ID()

Synopsis

LAST_INSERT_ID([expression])

This function returns the identification number of the last row inserted using the MySQL connection. The identification number for rows inserted by other clients will not be returned. Identification numbers that are set manually when rows are inserted, without the aid of AUTO_INCREMENT, won’t register and therefore won’t be returned by LAST_INSERT_ID(). If multiple rows are inserted by one SQL statement, LAST_INSERT_ID() returns the identification number for the first row inserted.

Here is an example:

SELECT LAST_INSERT_ID( );

+-------------------+

| LAST_INSERT_ID( ) |

+-------------------+

| 1039 |

+-------------------+

As of version 5.1.12 of MySQL, an expression may be given to adjust the results. For instance, if you insert multiple rows of data, the result would be the value of the first row inserted, not the last. By giving an expression to include adding the number of rows, the results will be for the last row.

Name

ROW_COUNT()

Synopsis

ROW_COUNT()

This function returns the number of rows changed by the previous SQL statement executed. If the previous statement was not one that could potentially change data rows—in other words, it wasn’t an INSERT, UPDATE, DELETE, or other such statement—this function will return –1. Here is an example:

SELECT ROW_COUNT();

+-------------+

| ROW_COUNT() |

+-------------+

| 4 |

+-------------+

The results here show that four rows were changed.

Name

SCHEMA()

Synopsis

SCHEMA()

This function returns the name of the database currently in use for the session. There are no arguments. If no database has been set as the default, it returns NULL. Here is an example:

SELECT SCHEMA( );

+--------------------+

| DATABASE( ) |

+--------------------+

| company_database |

+--------------------+

Introduced in version 5.0.2 of MySQL, SCHEMA() is a synonym for DATABASE().