Selecting Data from Tables - MySQL Cookbook (2007)

MySQL Cookbook (2007)

Chapter 3. Selecting Data from Tables

Introduction

This chapter focuses on the SELECT statement, which retrieves database information. It shows how to use SELECT to tell MySQL what you want to see. You should find the chapter helpful if your SQL background is limited or if you want to find out about the MySQL-specific extensions toSELECT syntax.

There are so many ways to write SELECT statements that we’ll look at only a few of them. Consult the MySQL Reference Manual or a general MySQL text for more information about SELECT syntax and the functions and operators that you can use to extract and manipulate data.

SELECT gives you control over several aspects of row retrieval:

§ Which table to use

§ Which columns and rows to retrieve from the table

§ How to name the output columns

§ How to sort the rows

Many useful queries are quite simple and don’t specify all those things. For example, some forms of SELECT don’t even name a table—a fact used earlier in Using mysql as a Calculator, which discusses how to use mysql as a calculator. Other nontable-based queries are useful for purposes such as determining what version of the server you’re running or the name of the default database:

mysql>SELECT VERSION(), DATABASE();

+------------+------------+

| VERSION() | DATABASE() |

+------------+------------+

| 5.0.27-log | cookbook |

+------------+------------+

To answer more involved questions, normally you’ll need to pull information from one or more tables. Many of the examples in this chapter use a table named mail, which contains rows that track mail message traffic between users on a set of hosts. The mail table definition looks like this:

CREATE TABLE mail

(

t DATETIME, # when message was sent

srcuser CHAR(8), # sender (source user and host)

srchost CHAR(20),

dstuser CHAR(8), # recipient (destination user and host)

dsthost CHAR(20),

size BIGINT, # message size in bytes

INDEX (t)

);

And its contents look like this:

+---------------------+---------+---------+---------+---------+---------+

| t | srcuser | srchost | dstuser | dsthost | size |

+---------------------+---------+---------+---------+---------+---------+

| 2006-05-11 10:15:08 | barb | saturn | tricia | mars | 58274 |

| 2006-05-12 12:48:13 | tricia | mars | gene | venus | 194925 |

| 2006-05-12 15:02:49 | phil | mars | phil | saturn | 1048 |

| 2006-05-13 13:59:18 | barb | saturn | tricia | venus | 271 |

| 2006-05-14 09:31:37 | gene | venus | barb | mars | 2291 |

| 2006-05-14 11:52:17 | phil | mars | tricia | saturn | 5781 |

| 2006-05-14 14:42:21 | barb | venus | barb | venus | 98151 |

| 2006-05-14 17:03:01 | tricia | saturn | phil | venus | 2394482 |

| 2006-05-15 07:17:48 | gene | mars | gene | saturn | 3824 |

| 2006-05-15 08:50:57 | phil | venus | phil | venus | 978 |

| 2006-05-15 10:25:52 | gene | mars | tricia | saturn | 998532 |

| 2006-05-15 17:35:31 | gene | saturn | gene | mars | 3856 |

| 2006-05-16 09:00:28 | gene | venus | barb | mars | 613 |

| 2006-05-16 23:04:19 | phil | venus | barb | venus | 10294 |

| 2006-05-17 12:49:23 | phil | mars | tricia | saturn | 873 |

| 2006-05-19 22:21:51 | gene | saturn | gene | venus | 23992 |

+---------------------+---------+---------+---------+---------+---------+

To create and load the mail table, change location into the tables directory of the recipes distribution, and run this command:

%mysql cookbook < mail.sql

This chapter also uses other tables from time to time. Some of these were used in previous chapters, while others are new. For any table that you need to create, do so the same way as for the mail table, using the appropriate script in the tables directory. In addition, the text for many of the scripts and programs used in this chapter can be found in the select directory. The files in that directory enable you to try the examples more easily.

You can execute many of the statements shown here by running them from within the mysql program, which is discussed in Chapter 1. A few of the examples involve issuing statements from within the context of a programming language. See Chapter 2 for background on programming techniques.

Specifying Which Columns to Select

Problem

You want to display some or all of the columns from a table.

Solution

Use SELECT * as a shortcut that selects all columns. However, with SELECT *, you always get all columns, and you can’t assume anything about the order in which they’ll appear. To retrieve only some of the columns, or require that they appear in a certain order, either name the columns explicitly in the desired display order or retrieve them into a data structure that makes their order irrelevant.

Discussion

To indicate what kind of information you want to select from a table, name a column or a list of columns and the table to use. The easiest way to display columns from a table is to use SELECT * FROM tbl_name. The * specifier is a shortcut for naming all the columns in a table:

mysql>SELECT * FROM mail;

+---------------------+---------+---------+---------+---------+---------+

| t | srcuser | srchost | dstuser | dsthost | size |

+---------------------+---------+---------+---------+---------+---------+

| 2006-05-11 10:15:08 | barb | saturn | tricia | mars | 58274 |

| 2006-05-12 12:48:13 | tricia | mars | gene | venus | 194925 |

| 2006-05-12 15:02:49 | phil | mars | phil | saturn | 1048 |

| 2006-05-13 13:59:18 | barb | saturn | tricia | venus | 271 |

...

Using * is easy, but you cannot specify the column display order. An advantage of naming columns explicitly is that you can display them in whatever order you want. Suppose that you want hostnames to appear before usernames, rather than after. To accomplish this, name the columns as follows:

mysql>SELECT t, srchost, srcuser, dsthost, dstuser, size FROM mail;

+---------------------+---------+---------+---------+---------+---------+

| t | srchost | srcuser | dsthost | dstuser | size |

+---------------------+---------+---------+---------+---------+---------+

| 2006-05-11 10:15:08 | saturn | barb | mars | tricia | 58274 |

| 2006-05-12 12:48:13 | mars | tricia | venus | gene | 194925 |

| 2006-05-12 15:02:49 | mars | phil | saturn | phil | 1048 |

| 2006-05-13 13:59:18 | saturn | barb | venus | tricia | 271 |

...

Another advantage of naming the columns compared to using * is that you can name just those columns you want to see and omit those in which you have no interest:

mysql>SELECT t, srcuser, srchost, size FROM mail;

+---------------------+---------+---------+---------+

| t | srcuser | srchost | size |

+---------------------+---------+---------+---------+

| 2006-05-11 10:15:08 | barb | saturn | 58274 |

| 2006-05-12 12:48:13 | tricia | mars | 194925 |

| 2006-05-12 15:02:49 | phil | mars | 1048 |

| 2006-05-13 13:59:18 | barb | saturn | 271 |

...

The preceding examples use the mysql program to illustrate the differences between using * versus a list of names to specify output columns when issuing SELECT statements. These differences also can be significant when issuing statements from within programs that you write yourself, depending on how you fetch result set rows. If you select output columns using *, the server returns them using the order in which they are listed in the table definition—an order that may change if you change the definition with ALTER TABLE. If you fetch rows into an array that is indexed by column number, this indeterminate output column order makes it impossible to know which column each array element corresponds to. By naming output columns explicitly, you can fetch rows into an array with confidence that the columns will appear in the array in the same order that you named them in the statement.

Alternatively, your API may allow you to fetch rows into a structure containing elements that are accessed by name. For example, in Perl or Ruby, you can use a hash; in PHP, you can use an associative array or an object. If you use this approach, you can issue a SELECT * query and then use column names to access structure members. In this case, there is effectively no difference between selecting columns with * or by naming them explicitly: being able to access values by name within your program makes their order within result set rows irrelevant. This fact makes it tempting to take the easy way out by using SELECT * for all your queries. Nevertheless, even if you’re not actually going to use every column, it’s more efficient to name specifically only the columns you want so that the server doesn’t send you information that you’re just going to ignore. (An example that explains in more detail why you might want to avoid retrieving certain columns is given in Using Table Structure Information in Applications, in the section “Selecting All Except Certain Columns.”)

Specifying Which Rows to Select

Problem

You want to see only those rows that match certain criteria.

Solution

To specify which rows to return, add a WHERE clause to identify the rows that you want to see, such as customers that live in a particular city or tasks that have a status of “finished.”

Discussion

Unless you qualify or restrict a SELECT query in some way, it retrieves every row in your table, which in many cases is a lot more information than you really want to see. To be more precise about which rows to select, provide a WHERE clause that specifies one or more conditions that rows must match.

Conditions can perform tests for equality, inequality, or relative ordering. For some types of data, such as strings, you can use pattern matches. The following statements select columns from rows from the mail table containing srchost values that are exactly equal to the string 'venus' or that begin with the letter 's':

mysql>SELECT t, srcuser, srchost FROM mail WHERE srchost = 'venus';

+---------------------+---------+---------+

| t | srcuser | srchost |

+---------------------+---------+---------+

| 2006-05-14 09:31:37 | gene | venus |

| 2006-05-14 14:42:21 | barb | venus |

| 2006-05-15 08:50:57 | phil | venus |

| 2006-05-16 09:00:28 | gene | venus |

| 2006-05-16 23:04:19 | phil | venus |

+---------------------+---------+---------+

mysql> SELECT t, srcuser, srchost FROM mail WHERE srchost LIKE 's%';

+---------------------+---------+---------+

| t | srcuser | srchost |

+---------------------+---------+---------+

| 2006-05-11 10:15:08 | barb | saturn |

| 2006-05-13 13:59:18 | barb | saturn |

| 2006-05-14 17:03:01 | tricia | saturn |

| 2006-05-15 17:35:31 | gene | saturn |

| 2006-05-19 22:21:51 | gene | saturn |

+---------------------+---------+---------+

The LIKE operator in the previous query performs a pattern match, where % acts as a wildcard that matches any string. Pattern Matching with SQL Patterns discusses pattern matching further.

A WHERE clause can test multiple conditions and different conditions can test different columns. The following statement finds messages sent by barb to tricia:

mysql>SELECT * FROM mail WHERE srcuser = 'barb' AND dstuser = 'tricia';

+---------------------+---------+---------+---------+---------+-------+

| t | srcuser | srchost | dstuser | dsthost | size |

+---------------------+---------+---------+---------+---------+-------+

| 2006-05-11 10:15:08 | barb | saturn | tricia | mars | 58274 |

| 2006-05-13 13:59:18 | barb | saturn | tricia | venus | 271 |

+---------------------+---------+---------+---------+---------+-------+

Giving Better Names to Query Result Columns

Problem

You don’t like the names of the columns in a query result.

Solution

Use column aliases to supply names of your own choosing.

Discussion

When you retrieve a result set, MySQL gives every output column a name. (That’s how the mysql program gets the names that you see displayed as the initial row of column headers in result set output.) By default, MySQL assigns the column names specified in the CREATE TABLE or ALTERTABLE statement to output columns, but if these defaults are not suitable, you can use column aliases to specify your own names.

This section explains aliases and shows how to use them to assign column names in statements. If you’re writing a program that needs to retrieve information about column names (that is, column metadata), see Obtaining Result Set Metadata.

If an output column in a result set comes directly from a table, MySQL uses the table column name for the output column name. For example, the following statement selects three table columns, the names of which become the corresponding output column names:

mysql>SELECT t, srcuser, size FROM mail;

+---------------------+---------+---------+

| t | srcuser | size |

+---------------------+---------+---------+

| 2006-05-11 10:15:08 | barb | 58274 |

| 2006-05-12 12:48:13 | tricia | 194925 |

| 2006-05-12 15:02:49 | phil | 1048 |

| 2006-05-13 13:59:18 | barb | 271 |

...

If you generate a column by evaluating an expression, the expression itself is the column name. This can produce rather long and unwieldy names in result sets, as illustrated by the following statement that uses an expression to reformat the t column of the mail table:

mysql>SELECT

-> CONCAT(MONTHNAME(t),' ',DAYOFMONTH(t),', ',YEAR(t)),

-> srcuser, size FROM mail;

+-----------------------------------------------------+---------+---------+

| CONCAT(MONTHNAME(t),' ',DAYOFMONTH(t),', ',YEAR(t)) | srcuser | size |

+-----------------------------------------------------+---------+---------+

| May 11, 2006 | barb | 58274 |

| May 12, 2006 | tricia | 194925 |

| May 12, 2006 | phil | 1048 |

| May 13, 2006 | barb | 271 |

...

The query in the preceding example is specifically contrived to illustrate how awful-looking column names can be. The reason it’s contrived is that you probably wouldn’t really write the query that way; the same result can be produced more easily using the DATE_FORMAT() function. But even if you use DATE_FORMAT(), the column header is still ugly:

mysql>SELECT

-> DATE_FORMAT(t,'%M %e, %Y'),

-> srcuser, size FROM mail;

+----------------------------+---------+---------+

| DATE_FORMAT(t,'%M %e, %Y') | srcuser | size |

+----------------------------+---------+---------+

| May 11, 2006 | barb | 58274 |

| May 12, 2006 | tricia | 194925 |

| May 12, 2006 | phil | 1048 |

| May 13, 2006 | barb | 271 |

...

To give an output column a name of your own choosing, use an AS name clause to specify a column alias (the keyword AS is optional). The following statement retrieves the same result as the previous one, but renames the first column to date_sent:

mysql>SELECT

-> DATE_FORMAT(t,'%M %e, %Y') AS date_sent,

-> srcuser, size FROM mail;

+--------------+---------+---------+

| date_sent | srcuser | size |

+--------------+---------+---------+

| May 11, 2006 | barb | 58274 |

| May 12, 2006 | tricia | 194925 |

| May 12, 2006 | phil | 1048 |

| May 13, 2006 | barb | 271 |

...

The alias makes the column name more concise, easier to read, and more meaningful. If you want to use a descriptive phrase, an alias can consist of several words. Aliases can be fairly arbitrary, although they are subject to a few restrictions; for example, they must be quoted if they are SQL keywords, contain spaces or other special characters, or are entirely numeric. The following statement retrieves the same data values as the preceding one but uses phrases to name the output columns:

mysql>SELECT

-> DATE_FORMAT(t,'%M %e, %Y') AS 'Date of message',

-> srcuser AS 'Message sender', size AS 'Number of bytes' FROM mail;

+-----------------+----------------+-----------------+

| Date of message | Message sender | Number of bytes |

+-----------------+----------------+-----------------+

| May 11, 2006 | barb | 58274 |

| May 12, 2006 | tricia | 194925 |

| May 12, 2006 | phil | 1048 |

| May 13, 2006 | barb | 271 |

...

You can apply an alias to any output column, not just those that come from tables:

mysql>SELECT '1+1+1' AS 'The expression', 1+1+1 AS 'The result';

+----------------+------------+

| The expression | The result |

+----------------+------------+

| 1+1+1 | 3 |

+----------------+------------+

Here, the value of the first column is '1+1+1' (quoted so that it is treated as a string), and the value of the second column is 1+1+1 (without quotes so that MySQL treats it as an expression and evaluates it). The aliases are descriptive phrases that help to clarify the relationship between the two column values.

If you use a single-word alias, and MySQL complains about it, the word probably is reserved. Quoting the alias should make it legal:

mysql>SELECT 1 AS INTEGER;

You have an error in your SQL syntax near 'INTEGER' at line 1

mysql> SELECT 1 AS 'INTEGER';

+---------+

| INTEGER |

+---------+

| 1 |

+---------+

Using Column Aliases to Make Programs Easier to Write

Problem

You’re trying to refer to a column by name from within a program, but the column is calculated from an expression. Consequently, its name is difficult to use.

Solution

Use an alias to give the column a simpler name.

Discussion

Giving Better Names to Query Result Columns shows how column aliases make query results more meaningful when you’re issuing queries interactively. Aliases also are useful for programming purposes. If you’re writing a program that fetches rows into an array and accesses them by numeric column indexes, the presence or absence of column aliases makes no difference because aliases don’t change the positions of columns within the result set. However, aliases make a big difference if you’re accessing output columns by name because aliases change those names. You can exploit this fact to give your program easier names to work with. For example, if your query displays reformatted message time values from the mail table using the expression DATE_FORMAT(t,'%M %e, %Y'), that expression is also the name you’d have to use when referring to the output column. That’s not very convenient. If you use ASdate_sent to give the column an alias, you can refer to it a lot more easily using the name date_sent. Here’s an example that shows how a Perl DBI script might process such values. It retrieves rows into a hash and refers to column values by name:

$sth = $dbh->prepare ("SELECT srcuser,

DATE_FORMAT(t,'%M %e, %Y') AS date_sent

FROM mail");

$sth->execute ();

while (my $ref = $sth->fetchrow_hashref ())

{

printf "user: %s, date sent: %s\n", $ref->{srcuser}, $ref->{date_sent};

}

In Java, you’d do something like this, where the argument to getString() names the column containing the value that you want to access:

Statement s = conn.createStatement ();

s.executeQuery ("SELECT srcuser,"

+ " DATE_FORMAT(t,'%M %e, %Y') AS date_sent"

+ " FROM mail");

ResultSet rs = s.getResultSet ();

while (rs.next ()) // loop through rows of result set

{

String name = rs.getString ("srcuser");

String dateSent = rs.getString ("date_sent");

System.out.println ("user: " + name + ", date sent: " + dateSent);

}

rs.close ();

s.close ();

In Ruby, rows can be fetched as objects in which columns are accessible by either position or name. In PHP, the PEAR DB module enables you to retrieve rows as associative arrays or objects, both of which are data structures that contain named elements. With Python, use a cursor class that causes rows to be returned as dictionaries containing key/value pairs where the keys are the column names.

See Also

Issuing Statements and Retrieving Results shows for each of our programming languages how to fetch rows into data structures that enable you to access columns values by column name. Also, the select directory of the recipes directory has examples that show how to do this for the mailtable.

Combining Columns to Construct Composite Values

Problem

You want to display values that are constructed from multiple table columns.

Solution

One way to do this is to use CONCAT(). You can also give the column a nicer name by using an alias.

Discussion

Column values can be combined to produce composite output values. For example, this expression concatenates srcuser and srchost values into email address format:

CONCAT(srcuser,'@',srchost)

Such expressions tend to produce ugly column names, but you can use column aliases to provide better ones. The following statement uses the aliases sender and recipient to name output columns that are constructed by combining usernames and hostnames into email addresses:

mysql>SELECT

-> DATE_FORMAT(t,'%M %e, %Y') AS date_sent,

-> CONCAT(srcuser,'@',srchost) AS sender,

-> CONCAT(dstuser,'@',dsthost) AS recipient,

-> size FROM mail;

+--------------+---------------+---------------+---------+

| date_sent | sender | recipient | size |

+--------------+---------------+---------------+---------+

| May 11, 2006 | barb@saturn | tricia@mars | 58274 |

| May 12, 2006 | tricia@mars | gene@venus | 194925 |

| May 12, 2006 | phil@mars | phil@saturn | 1048 |

| May 13, 2006 | barb@saturn | tricia@venus | 271 |

...

WHERE Clauses and Column Aliases

Problem

You want to refer to a column alias in a WHERE clause.

Solution

Sorry, you cannot. But there is a workaround.

Discussion

You cannot refer to column aliases in a WHERE clause. Thus, the following statement is illegal:

mysql>SELECT t, srcuser, dstuser, size/1024 AS kilobytes

-> FROM mail WHERE kilobytes > 500;

ERROR 1054 (42S22): Unknown column 'kilobytes' in 'where clause'

The error occurs because an alias names an output column, whereas a WHERE clause operates on input columns to determine which rows to select for output. To make the statement legal, replace the alias in the WHERE clause with the column or expression that the alias represents:

mysql>SELECT t, srcuser, dstuser, size/1024 AS kilobytes

-> FROM mail WHERE size/1024 > 500;

+---------------------+---------+---------+-----------+

| t | srcuser | dstuser | kilobytes |

+---------------------+---------+---------+-----------+

| 2006-05-14 17:03:01 | tricia | phil | 2338.3613 |

| 2006-05-15 10:25:52 | gene | tricia | 975.1289 |

+---------------------+---------+---------+-----------+

Debugging Comparison Expressions

Problem

You’re curious about how a comparison in a WHERE clause works. Or perhaps about why it doesn’t seem to be working.

Solution

Display the result of the comparison to get more information about it. This is a useful diagnostic or debugging technique.

Discussion

Normally, you put comparison operations in the WHERE clause of a query and use them to determine which rows to display:

mysql>SELECT * FROM mail WHERE srcuser < 'c' AND size > 5000;

+---------------------+---------+---------+---------+---------+-------+

| t | srcuser | srchost | dstuser | dsthost | size |

+---------------------+---------+---------+---------+---------+-------+

| 2006-05-11 10:15:08 | barb | saturn | tricia | mars | 58274 |

| 2006-05-14 14:42:21 | barb | venus | barb | venus | 98151 |

+---------------------+---------+---------+---------+---------+-------+

But sometimes it’s desirable to see the result of the comparison itself (for example, if you’re not sure that the comparison is working the way you expect it to). To do this, just remove the WHERE clause, and put the comparison expression in the output column list, perhaps also including the values that you’re comparing:

mysql>SELECT srcuser, srcuser < 'c', size, size > 5000 FROM mail;

+---------+---------------+---------+-------------+

| srcuser | srcuser < 'c' | size | size > 5000 |

+---------+---------------+---------+-------------+

| barb | 1 | 58274 | 1 |

| tricia | 0 | 194925 | 1 |

| phil | 0 | 1048 | 0 |

| barb | 1 | 271 | 0 |

...

In these results, 1 means true and 0 means false.

Removing Duplicate Rows

Problem

Output from a query contains duplicate rows. You want to eliminate them.

Solution

Use DISTINCT.

Discussion

Some queries produce results containing duplicate rows. For example, to see who sent mail, you could query the mail table like this:

mysql>SELECT srcuser FROM mail;

+---------+

| srcuser |

+---------+

| barb |

| tricia |

| phil |

| barb |

| gene |

| phil |

| barb |

| tricia |

| gene |

| phil |

| gene |

| gene |

| gene |

| phil |

| phil |

| gene |

+---------+

That result is heavily redundant. Adding DISTINCT to the query removes the duplicate rows, producing a set of unique values:

mysql>SELECT DISTINCT srcuser FROM mail;

+---------+

| srcuser |

+---------+

| barb |

| tricia |

| phil |

| gene |

+---------+

DISTINCT works with multiple-column output, too. The following query shows which dates are represented in the mail table:

mysql>SELECT DISTINCT YEAR(t), MONTH(t), DAYOFMONTH(t) FROM mail;

+---------+----------+---------------+

| YEAR(t) | MONTH(t) | DAYOFMONTH(t) |

+---------+----------+---------------+

| 2006 | 5 | 11 |

| 2006 | 5 | 12 |

| 2006 | 5 | 13 |

| 2006 | 5 | 14 |

| 2006 | 5 | 15 |

| 2006 | 5 | 16 |

| 2006 | 5 | 17 |

| 2006 | 5 | 19 |

+---------+----------+---------------+

To count the number of unique values in a column, use COUNT(DISTINCT):

mysql>SELECT COUNT(DISTINCT srcuser) FROM mail;

+-------------------------+

| COUNT(DISTINCT srcuser) |

+-------------------------+

| 4 |

+-------------------------+

See Also

Chapter 8 revisits DISTINCT and COUNT(DISTINCT). Chapter 14 discusses duplicate removal in more detail.

Working with NULL Values

Problem

You’re trying to compare column values to NULL, but it isn’t working.

Solution

You have to use the proper comparison operators: IS NULL, IS NOT NULL, or <=>.

Discussion

Conditions that involve NULL are special. You cannot use comparisons of the form value = NULL or value != NULL to check whether value is NULL. Such comparisons always produce a result of NULL because it’s impossible to tell whether they are true or false. Even NULL = NULLproduces NULL because you can’t determine whether one unknown value is the same as another unknown value.

To look for columns that are or are not NULL, use the IS NULL or IS NOT NULL operator. Suppose that a table taxpayer contains taxpayer names and ID numbers, where a NULL value in the id column indicates that the value is unknown:

mysql>SELECT * FROM taxpayer;

+---------+--------+

| name | id |

+---------+--------+

| bernina | 198-48 |

| bertha | NULL |

| ben | NULL |

| bill | 475-83 |

+---------+--------+

You can see that = and != do not identify NULL values as follows:

mysql>SELECT * FROM taxpayer WHERE id = NULL;

Empty set (0.00 sec)

mysql> SELECT * FROM taxpayer WHERE id != NULL;

Empty set (0.01 sec)

To find rows where the id column is or is not NULL, write the statements like this instead:

mysql>SELECT * FROM taxpayer WHERE id IS NULL;

+--------+------+

| name | id |

+--------+------+

| bertha | NULL |

| ben | NULL |

+--------+------+

mysql> SELECT * FROM taxpayer WHERE id IS NOT NULL;

+---------+--------+

| name | id |

+---------+--------+

| bernina | 198-48 |

| bill | 475-83 |

+---------+--------+

You can also use <=> to compare values, which (unlike the = operator) is true even for two NULL values:

mysql>SELECT NULL = NULL, NULL <=> NULL;

+-------------+---------------+

| NULL = NULL | NULL <=> NULL |

+-------------+---------------+

| NULL | 1 |

+-------------+---------------+

Sometimes it’s useful to map NULL values onto some other distinctive value that has more meaning in the context of your application. If NULLid values in the taxpayer table mean “unknown,” you can display that fact by usingIF() to map NULL onto the string Unknown:

mysql>SELECT name, IF(id IS NULL,'Unknown', id) AS 'id' FROM taxpayer;

+---------+---------+

| name | id |

+---------+---------+

| bernina | 198-48 |

| bertha | Unknown |

| ben | Unknown |

| bill | 475-83 |

+---------+---------+

This technique actually works for any kind of value, but it’s especially useful with NULL values because NULL tends to be given a variety of meanings: unknown, missing, not yet determined, out of range, and so forth. You can choose the label that makes most sense in a given context.

The preceding query can be written more concisely using IFNULL(), which tests its first argument and returns it if it’s not NULL, or returns its second argument otherwise:

mysql>SELECT name, IFNULL(id,'Unknown') AS 'id' FROM taxpayer;

+---------+---------+

| name | id |

+---------+---------+

| bernina | 198-48 |

| bertha | Unknown |

| ben | Unknown |

| bill | 475-83 |

+---------+---------+

In other words, these two tests are equivalent:

IF(expr1 IS NOT NULL,expr1,expr2)

IFNULL(expr1,expr2)

From a readability standpoint, IF() often is easier to understand than IFNULL(). From a computational perspective, IFNULL() is more efficient because expr1 need not be evaluated twice, as happens with IF().

See Also

NULL values also behave specially with respect to sorting and summary operations. See Recipes and .

Writing Comparisons Involving NULL in Programs

Problem

You’re writing a program that looks for rows containing a specific value, but it fails when the value is NULL.

Solution

Choose the proper comparison operator according to whether the comparison value is or is not NULL.

Discussion

The need to use different comparison operators for NULL values than for non-NULL values leads to a subtle danger when constructing statement strings within programs. If you have a value stored in a variable that might represent a NULL value, you must account for that if you use the value in comparisons. For example, in Perl, undef represents a NULL value, so to construct a statement that finds rows in the taxpayer table matching some arbitrary value in an $id variable, you cannot do this:

$sth = $dbh->prepare ("SELECT * FROM taxpayer WHERE id = ?");

$sth->execute ($id);

The statement fails when $id is undef because the resulting statement becomes:

SELECT * FROM taxpayer WHERE id = NULL

A comparison of id = NULL is never true, so that statement returns no rows. To take into account the possibility that $id may be undef, construct the statement using the appropriate comparison operator like this:

$operator = (defined ($id) ? "=" : "IS");

$sth = $dbh->prepare ("SELECT * FROM taxpayer WHERE id $operator ?");

$sth->execute ($id);

This results in statements as follows for $id values of undef (NULL) or 43 (not NULL):

SELECT * FROM taxpayer WHERE id IS NULL

SELECT * FROM taxpayer WHERE id = 43

For inequality tests, set $operator like this instead:

$operator = (defined ($id) ? "!=" : "IS NOT");

Another way to avoid all this trouble, if it’s not necessary to allow a column to contain NULL values, is to declare it NOT NULL when you create the table. For example, the taxpayer table could have been defined like this to disallow NULL values in either of its columns:

# taxpayer2.sql

# taxpayer table, defined with NOT NULL columns

DROP TABLE IF EXISTS taxpayer;

CREATE TABLE taxpayer

(

name CHAR(20) NOT NULL,

id CHAR(20) NOT NULL

);

Sorting a Result Set

Problem

Your query results aren’t sorted the way you want.

Solution

MySQL can’t read your mind. Add an ORDER BY clause to tell it exactly how you want result rows sorted.

Discussion

When you select rows, the MySQL server is free to return them in any order, unless you instruct it otherwise by saying how to sort the result. There are lots of ways to use sorting techniques. Chapter 7 explores this topic in detail. Briefly, you sort a result set by adding an ORDER BY clause that names the column or columns that you want to use for sorting. The following statement sorts rows by size:

mysql>SELECT * FROM mail WHERE size > 100000 ORDER BY size;

+---------------------+---------+---------+---------+---------+---------+

| t | srcuser | srchost | dstuser | dsthost | size |

+---------------------+---------+---------+---------+---------+---------+

| 2006-05-12 12:48:13 | tricia | mars | gene | venus | 194925 |

| 2006-05-15 10:25:52 | gene | mars | tricia | saturn | 998532 |

| 2006-05-14 17:03:01 | tricia | saturn | phil | venus | 2394482 |

+---------------------+---------+---------+---------+---------+---------+

This statement names multiple columns in the ORDER BY clause to sort rows by host, and then by user within each host:

mysql>SELECT * FROM mail WHERE dstuser = 'tricia'

-> ORDER BY srchost, srcuser;

+---------------------+---------+---------+---------+---------+--------+

| t | srcuser | srchost | dstuser | dsthost | size |

+---------------------+---------+---------+---------+---------+--------+

| 2006-05-15 10:25:52 | gene | mars | tricia | saturn | 998532 |

| 2006-05-14 11:52:17 | phil | mars | tricia | saturn | 5781 |

| 2006-05-17 12:49:23 | phil | mars | tricia | saturn | 873 |

| 2006-05-11 10:15:08 | barb | saturn | tricia | mars | 58274 |

| 2006-05-13 13:59:18 | barb | saturn | tricia | venus | 271 |

+---------------------+---------+---------+---------+---------+--------+

To sort a column in reverse (descending) order, add the keyword DESC after its name in the ORDER BY clause:

mysql>SELECT * FROM mail WHERE size > 50000 ORDER BY size DESC;

+---------------------+---------+---------+---------+---------+---------+

| t | srcuser | srchost | dstuser | dsthost | size |

+---------------------+---------+---------+---------+---------+---------+

| 2006-05-14 17:03:01 | tricia | saturn | phil | venus | 2394482 |

| 2006-05-15 10:25:52 | gene | mars | tricia | saturn | 998532 |

| 2006-05-12 12:48:13 | tricia | mars | gene | venus | 194925 |

| 2006-05-14 14:42:21 | barb | venus | barb | venus | 98151 |

| 2006-05-11 10:15:08 | barb | saturn | tricia | mars | 58274 |

+---------------------+---------+---------+---------+---------+---------+

Using Views to Simplify Table Access

Problem

You often retrieve values that are calculated from expressions and you want a simpler way to refer to those values than writing the expressions each time you need them.

Solution

Use a view defined such that its columns perform the desired calculations.

Discussion

In Combining Columns to Construct Composite Values, we retrieved several values from the mail table, using expressions to calculate most of them:

mysql>SELECT

-> DATE_FORMAT(t,'%M %e, %Y') AS date_sent,

-> CONCAT(srcuser,'@',srchost) AS sender,

-> CONCAT(dstuser,'@',dsthost) AS recipient,

-> size FROM mail;

+--------------+---------------+---------------+---------+

| date_sent | sender | recipient | size |

+--------------+---------------+---------------+---------+

| May 11, 2006 | barb@saturn | tricia@mars | 58274 |

| May 12, 2006 | tricia@mars | gene@venus | 194925 |

| May 12, 2006 | phil@mars | phil@saturn | 1048 |

| May 13, 2006 | barb@saturn | tricia@venus | 271 |

...

One problem with such a statement is that if you have to issue it often, it’s inconvenient to write the expressions repeatedly. You can make the statement results easier to access by using a view. A view is a virtual table that does not contain any data itself. Instead, it’s defined as the SELECTstatement that retrieves the data of interest. The following view, mail_view, is equivalent to the SELECT statement just shown:

mysql>CREATE VIEW mail_view AS

-> SELECT

-> DATE_FORMAT(t,'%M %e, %Y') AS date_sent,

-> CONCAT(srcuser,'@',srchost) AS sender,

-> CONCAT(dstuser,'@',dsthost) AS recipient,

-> size FROM mail;

To access the view contents, refer to it like any other table. You can select some or all of its columns, add a WHERE clause to restrict which rows to retrieve, use ORDER BY to sort the rows, and so forth. For example:

mysql>SELECT date_sent, sender, size FROM mail_view

-> WHERE size > 100000 ORDER BY size;

+--------------+---------------+---------+

| date_sent | sender | size |

+--------------+---------------+---------+

| May 12, 2006 | tricia@mars | 194925 |

| May 15, 2006 | gene@mars | 998532 |

| May 14, 2006 | tricia@saturn | 2394482 |

+--------------+---------------+---------+

Selecting Data from More Than One Table

Problem

You need to retrieve data from more than one table.

Solution

Use a join or a subquery.

Discussion

The queries shown so far select data from a single table, but sometimes you need to retrieve information from multiple tables. Two types of queries that accomplish this are joins and subqueries. A join matches rows in one table with rows in another and enables you to retrieve output rows that contain columns from either or both tables. A subquery is one query nested within the other. The result is a query that performs a comparison between values selected by the “inner” query against values selected by the “outer” query.

In this section, I will show a couple of brief examples to illustrate the basic ideas. Other examples appear elsewhere: subqueries are used in various examples throughout the book (for example, Recipes and ). Chapter 12 discusses joins in detail, including some that select from more than two tables.

The following examples use the profile table that was introduced in Chapter 2; recall that it lists the people on your buddy list. Let’s extend the scenario that uses that table a little bit to include another table named profile_contact. This second table contains information about how to contact people listed in the profile table via various instant messaging systems and is defined like this:

CREATE TABLE profile_contact

(

profile_id INT UNSIGNED NOT NULL, # ID from profile table

service CHAR(20) NOT NULL, # messaging service name

contact_name CHAR(25) NOT NULL, # name to use for contacting person

INDEX (profile_id)

);

The table associates each row with the proper profile row via the profile_id column. The service and contact_name columns name the messaging service and the name to use for contacting the given person via that service. For the examples, assume that the table contains these rows:

mysql>SELECT * FROM profile_contact ORDER BY profile_id, service;

+------------+---------+---------------+

| profile_id | service | contact_name |

+------------+---------+---------------+

| 1 | AIM | user1-aimid |

| 1 | MSN | user1-msnid |

| 2 | AIM | user2-aimid |

| 2 | MSN | user2-msnid |

| 2 | Yahoo | user2-yahooid |

| 4 | Yahoo | user4-yahooid |

+------------+---------+---------------+

A question that requires combining information from both tables is, “For each person in the profile table, show me the messaging services I can use to get in touch, and the contact name to use for each service.” To answer this question, use a join. Select from both tables and match rows by comparing the id column from the profile table with the profile_id column from the profile_contact table:

mysql>SELECT id, name, service, contact_name

-> FROM profile INNER JOIN profile_contact ON id = profile_id;

+----+------+---------+---------------+

| id | name | service | contact_name |

+----+------+---------+---------------+

| 1 | Fred | AIM | user1-aimid |

| 1 | Fred | MSN | user1-msnid |

| 2 | Mort | AIM | user2-aimid |

| 2 | Mort | MSN | user2-msnid |

| 2 | Mort | Yahoo | user2-yahooid |

| 4 | Carl | Yahoo | user4-yahooid |

+----+------+---------+---------------+

In the FROM clause, the query indicates the tables from which data should be selected, and the ON clause tells MySQL which columns to use when searching for matches between the two tables. In the resulting output, rows include the id and name columns from the profile table, and theservice and contact_name columns from the profile_contact table.

Here’s another question for which both tables are used to derive the answer: “List all the profile_contact records for Mort.” To pull the proper rows from the profile_contact table, you need to know Mort’s ID, which is stored in the profile table. To write the query without looking up Mort’s ID yourself, use a subquery that, given his name, looks it up for you:

mysql>SELECT * FROM profile_contact

-> WHERE profile_id = (SELECT id FROM profile WHERE name = 'Mort');

+------------+---------+---------------+

| profile_id | service | contact_name |

+------------+---------+---------------+

| 2 | AIM | user2-aimid |

| 2 | MSN | user2-msnid |

| 2 | Yahoo | user2-yahooid |

+------------+---------+---------------+

Here the subquery appears as a nested SELECT statement enclosed within parentheses.

Selecting Rows from the Beginning or End of a Result Set

Problem

You want to see only certain rows from a result set, such as the first one or the last five.

Solution

Use a LIMIT clause, perhaps in conjunction with an ORDER BY clause.

Discussion

MySQL supports a LIMIT clause that tells the server to return only part of a result set. LIMIT is a MySQL-specific extension to SQL that is extremely valuable when your result set contains more rows than you want to see at a time. It enables you to retrieve just the first part of a result set or an arbitrary section of the set. Typically, LIMIT is used for the following kinds of problems:

§ Answering questions about first or last, largest or smallest, newest or oldest, least or more expensive, and so forth.

§ Splitting a result set into sections so that you can process it one piece at a time. This technique is common in web applications for displaying a large search result across several pages. Showing the result in sections enables display of smaller pages that are easier to understand. SeeSelecting Rows from the Middle of a Result Set for details on this.

The following examples use the profile table that was introduced in Chapter 2. Its contents look like this:

mysql>SELECT * FROM profile;

+----+---------+------------+-------+-----------------------+------+

| id | name | birth | color | foods | cats |

+----+---------+------------+-------+-----------------------+------+

| 1 | Fred | 1970-04-13 | black | lutefisk,fadge,pizza | 0 |

| 2 | Mort | 1969-09-30 | white | burrito,curry,eggroll | 3 |

| 3 | Brit | 1957-12-01 | red | burrito,curry,pizza | 1 |

| 4 | Carl | 1973-11-02 | red | eggroll,pizza | 4 |

| 5 | Sean | 1963-07-04 | blue | burrito,curry | 5 |

| 6 | Alan | 1965-02-14 | red | curry,fadge | 1 |

| 7 | Mara | 1968-09-17 | green | lutefisk,fadge | 1 |

| 8 | Shepard | 1975-09-02 | black | curry,pizza | 2 |

| 9 | Dick | 1952-08-20 | green | lutefisk,fadge | 0 |

| 10 | Tony | 1960-05-01 | white | burrito,pizza | 0 |

+----+---------+------------+-------+-----------------------+------+

To select the first n rows of a query result, add LIMIT n to the end of your SELECT statement:

mysql>SELECT * FROM profile LIMIT 1;

+----+------+------------+-------+----------------------+------+

| id | name | birth | color | foods | cats |

+----+------+------------+-------+----------------------+------+

| 1 | Fred | 1970-04-13 | black | lutefisk,fadge,pizza | 0 |

+----+------+------------+-------+----------------------+------+

mysql> SELECT * FROM profile LIMIT 5;

+----+------+------------+-------+-----------------------+------+

| id | name | birth | color | foods | cats |

+----+------+------------+-------+-----------------------+------+

| 1 | Fred | 1970-04-13 | black | lutefisk,fadge,pizza | 0 |

| 2 | Mort | 1969-09-30 | white | burrito,curry,eggroll | 3 |

| 3 | Brit | 1957-12-01 | red | burrito,curry,pizza | 1 |

| 4 | Carl | 1973-11-02 | red | eggroll,pizza | 4 |

| 5 | Sean | 1963-07-04 | blue | burrito,curry | 5 |

+----+------+------------+-------+-----------------------+------+

Note that LIMIT n really means “return at most n rows.” If you specify LIMIT 10, and the result set has only 3 rows, the server returns 3 rows.

The rows in the preceding query results aren’t sorted into any particular order, so they may not be very meaningful. A more common technique is to use ORDER BY to sort the result set. Then you can use LIMIT to find smallest and largest values. For example, to find the row with the minimum (earliest) birth date, sort by the birth column, and then add LIMIT 1 to retrieve the first row:

mysql>SELECT * FROM profile ORDER BY birth LIMIT 1;

+----+------+------------+-------+----------------+------+

| id | name | birth | color | foods | cats |

+----+------+------------+-------+----------------+------+

| 9 | Dick | 1952-08-20 | green | lutefisk,fadge | 0 |

+----+------+------------+-------+----------------+------+

This works because MySQL processes the ORDER BY clause to sort the rows first, and then applies LIMIT.

To obtain rows from the end of a result set, sort them in the opposite order. The statement that finds the row with the most recent birth date is similar to the previous one, except that you sort in descending order:

mysql>SELECT * FROM profile ORDER BY birth DESC LIMIT 1;

+----+---------+------------+-------+-------------+------+

| id | name | birth | color | foods | cats |

+----+---------+------------+-------+-------------+------+

| 8 | Shepard | 1975-09-02 | black | curry,pizza | 2 |

+----+---------+------------+-------+-------------+------+

To find the earliest or latest birthday within the calendar year, sort by the month and day of the birth values:

mysql>SELECT name, DATE_FORMAT(birth,'%m-%d') AS birthday

-> FROM profile ORDER BY birthday LIMIT 1;

+------+----------+

| name | birthday |

+------+----------+

| Alan | 02-14 |

+------+----------+

You can obtain the same information by running these statements without LIMIT and ignoring everything but the first row. The advantage of using LIMIT is that the server returns just the first row, and the extra rows don’t travel over the network at all. This is much more efficient than retrieving an entire result set, only to discard all but one row.

See Also

Be aware that using LIMIT n to select the “n smallest” or “n largest” values may not yield quite the results you expect. See Choosing Appropriate LIMIT Values for some discussion on framing LIMIT clauses appropriately for the questions that you are asking.

LIMIT is useful in combination with RAND() to make random selections from a set of items. See Chapter 13.

You can use LIMIT to restrict the effect of a DELETE or UPDATE statement to a subset of the rows that would otherwise be deleted or updated, respectively. This can be useful in conjunction with a WHERE clause. For example, if a table contains five instances of a row, you can select them in aDELETE statement with an appropriate WHERE clause, and then remove the duplicates by adding LIMIT 4 to the end of the statement. This leaves only one copy of the row. For more information about uses of LIMIT in duplicate row removal, see Eliminating Duplicates from a Table.

Selecting Rows from the Middle of a Result Set

Problem

You don’t want the first or last rows of a result set. Instead, you want to pull a section of rows out of the middle of the set, such as rows 21 through 40.

Solution

That’s still a job for LIMIT. But you need to tell it the starting position within the result set in addition to the number of rows you want.

Discussion

LIMITn tells the server to return the first n rows of a result set. LIMIT also has a two-argument form that enables you to pick out any arbitrary section of rows from a result. The arguments indicate how many rows to skip and how many to return. This means that you can use LIMIT to do such things as skip two rows and return the next, thus answering questions such as “what is the third-smallest or third-largest value?” These are questions thatMIN() or MAX() are not suited for, but are easy with LIMIT:

mysql>SELECT * FROM profile ORDER BY birth LIMIT 2,1;

+----+------+------------+-------+---------------+------+

| id | name | birth | color | foods | cats |

+----+------+------------+-------+---------------+------+

| 10 | Tony | 1960-05-01 | white | burrito,pizza | 0 |

+----+------+------------+-------+---------------+------+

mysql> SELECT * FROM profile ORDER BY birth DESC LIMIT 2,1;

+----+------+------------+-------+----------------------+------+

| id | name | birth | color | foods | cats |

+----+------+------------+-------+----------------------+------+

| 1 | Fred | 1970-04-13 | black | lutefisk,fadge,pizza | 0 |

+----+------+------------+-------+----------------------+------+

The two-argument form of LIMIT also makes it possible to partition a result set into smaller sections. For example, to retrieve 20 rows at a time from a result, issue the same SELECT statement repeatedly, but vary the LIMIT clauses like so:

retrieve first 20 rows

SELECT ... FROM ... ORDER BY ... LIMIT 0, 20;

skip 20 rows, retrieve next 20

SELECT ... FROM ... ORDER BY ... LIMIT 20, 20;

skip 40 rows, retrieve next 20

SELECT ... FROM ... ORDER BY ... LIMIT 40, 20;

etc.

Web developers often use LIMIT this way to split a large search result into smaller, more manageable pieces so that it can be presented over several pages. We’ll discuss this technique further in Generating Previous-Page and Next-Page Links.

To determine the number of rows in a result set so that you can determine how many sections there are, you can issue a COUNT() statement first. For example, to display profile table rows in name order, four at a time, you can find out how many there are with the following statement:

mysql>SELECT COUNT(*) FROM profile;

+----------+

| COUNT(*) |

+----------+

| 10 |

+----------+

That tells you that you’ll have three sets of rows (although the last one will have fewer than four rows), which you can retrieve as follows:

SELECT * FROM profile ORDER BY name LIMIT 0, 4;

SELECT * FROM profile ORDER BY name LIMIT 4, 4;

SELECT * FROM profile ORDER BY name LIMIT 8, 4;

You can also fetch a part of a result set and find out in the same statement how big the result would have been without the LIMIT clause. For example, to fetch the first four rows from the profile table and then obtain the size of the full result, run these statements:

SELECT SQL_CALC_FOUND_ROWS * FROM profile ORDER BY name LIMIT 4;

SELECT FOUND_ROWS();

The keyword SQL_CALC_FOUND_ROWS in the first statement tells MySQL to calculate the size of the entire result set even though the statement requests that only part of it be returned. The row count is available by calling FOUND_ROWS(). If that function returns a value greater than four, there are other rows yet to be retrieved.

Choosing Appropriate LIMIT Values

Problem

LIMITdoesn’t seem to do what you want it to.

Solution

Be sure that you understand what question you’re asking. It may be that LIMIT is exposing some interesting subtleties in your data that you have not considered.

Discussion

LIMITn is useful in conjunction with ORDERBY for selecting smallest or largest values from a result set. But does that actually give you the rows with the n smallest or largest values? Not necessarily! It does if your rows contain unique values, but not if there are duplicates. You may find it necessary to run a preliminary query first to help you choose the proper LIMIT value.

To see why this is, consider the following dataset, which shows the American League pitchers who won 15 or more games during the 2001 baseball season (you can find this data in the al_winner.sql file in the tables directory of the recipes distribution):

mysql>SELECT name, wins FROM al_winner

-> ORDER BY wins DESC, name;

+----------------+------+

| name | wins |

+----------------+------+

| Mulder, Mark | 21 |

| Clemens, Roger | 20 |

| Moyer, Jamie | 20 |

| Garcia, Freddy | 18 |

| Hudson, Tim | 18 |

| Abbott, Paul | 17 |

| Mays, Joe | 17 |

| Mussina, Mike | 17 |

| Sabathia, C.C. | 17 |

| Zito, Barry | 17 |

| Buehrle, Mark | 16 |

| Milton, Eric | 15 |

| Pettitte, Andy | 15 |

| Radke, Brad | 15 |

| Sele, Aaron | 15 |

+----------------+------+

If you want to know who won the most games, adding LIMIT 1 to the preceding statement gives you the correct answer because the maximum value is 21, and there is only one pitcher with that value (Mark Mulder). But what if you want the four highest game winners? The proper statements depend on what you mean by that, which can have various interpretations:

§ If you just want the first four rows, sort the rows, and add LIMIT 4:

§ mysql>SELECT name, wins FROM al_winner

§ -> ORDER BY wins DESC, name

§ -> LIMIT 4;

§ +----------------+------+

§ | name | wins |

§ +----------------+------+

§ | Mulder, Mark | 21 |

§ | Clemens, Roger | 20 |

§ | Moyer, Jamie | 20 |

§ | Garcia, Freddy | 18 |

+----------------+------+

That may not suit your purposes because LIMIT imposes a cutoff that occurs in the middle of a set of pitchers with the same number of wins (Tim Hudson also won 18 games).

§ To avoid making a cutoff in the middle of a set of rows with the same value, select rows with values greater than or equal to the value in the fourth row. Find out what that value is with LIMIT, and then use it in the WHERE clause of a second query to select rows:

§ mysql>SELECT wins FROM al_winner

§ -> ORDER BY wins DESC, name

§ -> LIMIT 3, 1;

§ +------+

§ | wins |

§ +------+

§ | 18 |

§ +------+

§ mysql> SELECT name, wins FROM al_winner

§ -> WHERE wins >= 18

§ -> ORDER BY wins DESC, name;

§ +----------------+------+

§ | name | wins |

§ +----------------+------+

§ | Mulder, Mark | 21 |

§ | Clemens, Roger | 20 |

§ | Moyer, Jamie | 20 |

§ | Garcia, Freddy | 18 |

§ | Hudson, Tim | 18 |

+----------------+------+

To select these results in a single statement, without having to substitute the cutoff value from one statement manually into the other, use the first statement as a subquery of the second:

mysql>SELECT name, wins FROM al_winner

-> WHERE wins >=

-> (SELECT wins FROM al_winner

-> ORDER BY wins DESC, name

-> LIMIT 3, 1)

-> ORDER BY wins DESC, name;

+----------------+------+

| name | wins |

+----------------+------+

| Mulder, Mark | 21 |

| Clemens, Roger | 20 |

| Moyer, Jamie | 20 |

| Garcia, Freddy | 18 |

| Hudson, Tim | 18 |

+----------------+------+

§ If you want to know all the pitchers with the four largest wins values, another approach is needed. Determine the fourth-largest value with DISTINCT and LIMIT, and then use it to select rows:

§ mysql>SELECT DISTINCT wins FROM al_winner

§ -> ORDER BY wins DESC, name

§ -> LIMIT 3, 1;

§ +------+

§ | wins |

§ +------+

§ | 17 |

§ +------+

§ mysql> SELECT name, wins FROM al_winner

§ -> WHERE wins >= 17

§ -> ORDER BY wins DESC, name;

§ +----------------+------+

§ | name | wins |

§ +----------------+------+

§ | Mulder, Mark | 21 |

§ | Clemens, Roger | 20 |

§ | Moyer, Jamie | 20 |

§ | Garcia, Freddy | 18 |

§ | Hudson, Tim | 18 |

§ | Abbott, Paul | 17 |

§ | Mays, Joe | 17 |

§ | Mussina, Mike | 17 |

§ | Sabathia, C.C. | 17 |

§ | Zito, Barry | 17 |

+----------------+------+

As in the previous example, these statements can be combined into one by using a subquery:

mysql>SELECT name, wins FROM al_winner

-> WHERE wins >=

-> (SELECT DISTINCT wins FROM al_winner

-> ORDER BY wins DESC, name

-> LIMIT 3, 1)

-> ORDER BY wins DESC, name;

+----------------+------+

| name | wins |

+----------------+------+

| Mulder, Mark | 21 |

| Clemens, Roger | 20 |

| Moyer, Jamie | 20 |

| Garcia, Freddy | 18 |

| Hudson, Tim | 18 |

| Abbott, Paul | 17 |

| Mays, Joe | 17 |

| Mussina, Mike | 17 |

| Sabathia, C.C. | 17 |

| Zito, Barry | 17 |

+----------------+------+

For this dataset, each method yields a different result for “four highest.” The moral is that the way you use LIMIT may require some thought about what you really want to know.

What to Do When LIMIT Requires the “Wrong” Sort Order

Problem

LIMITusually works best in conjunction with an ORDERBY clause that sorts rows. But sometimes the sort order is the opposite of what you want for the final result.

Solution

Use LIMIT in a subquery to retrieve the rows you want, and then use the outer query to sort them into the proper order.

Discussion

If you want the last four rows of a result set, you can obtain them easily by sorting the set in reverse order and using LIMIT 4. For example, the following statement returns the names and birth dates for the four people in the profile table who were born most recently:

mysql>SELECT name, birth FROM profile ORDER BY birth DESC LIMIT 4;

+---------+------------+

| name | birth |

+---------+------------+

| Shepard | 1975-09-02 |

| Carl | 1973-11-02 |

| Fred | 1970-04-13 |

| Mort | 1969-09-30 |

+---------+------------+

But that requires sorting the birth values in descending order to place them at the head of the result set. What if you want the output rows to appear in ascending order instead? One way to solve this problem is to use two statements. First, use COUNT() to find out how many rows are in the table:

mysql>SELECT COUNT(*) FROM profile;

+----------+

| COUNT(*) |

+----------+

| 10 |

+----------+

Then, sort the values in ascending order and use the two-argument form of LIMIT to skip all but the last four rows:

mysql>SELECT name, birth FROM profile ORDER BY birth LIMIT 6, 4;

+---------+------------+

| name | birth |

+---------+------------+

| Mort | 1969-09-30 |

| Fred | 1970-04-13 |

| Carl | 1973-11-02 |

| Shepard | 1975-09-02 |

+---------+------------+

That’s somewhat unsatisfactory because it requires that you determine how many rows to skip. A more general approach is use LIMIT within a subquery to select the rows that you want, and then sort them in opposite order in the outer query:

mysql>SELECT * FROM

-> (SELECT name, birth FROM profile ORDER BY birth DESC LIMIT 4) AS t

-> ORDER BY birth;

+---------+------------+

| name | birth |

+---------+------------+

| Mort | 1969-09-30 |

| Fred | 1970-04-13 |

| Carl | 1973-11-02 |

| Shepard | 1975-09-02 |

+---------+------------+

ASt is used here because any table referred to in theFROM clause must have a name.

Calculating LIMIT Values from Expressions

Problem

You want to use expressions to specify the arguments for LIMIT.

Solution

Sadly, you cannot. You can use only literal integers—unless you issue the statement from within a program, in which case you can evaluate the expressions yourself and insert the resulting values into the statement string.

Discussion

Arguments to LIMIT must be literal integers, not expressions. Statements such as the following are illegal:

SELECT * FROM profile LIMIT 5+5;

SELECT * FROM profile LIMIT @skip_count, @show_count;

The same “no expressions allowed” principle applies if you’re using an expression to calculate a LIMIT value in a program that constructs a statement string. You must evaluate the expression first, and then place the resulting value in the statement. For example, if you produce a statement string in Perl or PHP as follows, an error will result when you attempt to execute the statement:

$str = "SELECT * FROM profile LIMIT $x + $y";

To avoid the problem, evaluate the expression first:

$z = $x + $y;

$str = "SELECT * FROM profile LIMIT $z";

Or do this (but don’t omit the parentheses or the expression won’t evaluate properly):

$str = "SELECT * FROM profile LIMIT " . ($x + $y);

If you’re constructing a two-argument LIMIT clause, evaluate both expressions before placing them into the statement string.