Doing More with MySQL - Using MySQL - Learning MySQL (2007)

Learning MySQL (2007)

Part II. Using MySQL

Chapter 8. Doing More with MySQL

MySQL is feature-rich. Over the past three chapters, you’ve seen the wide variety of techniques that can be used to query, modify, and manage data. However, there’s still much more that MySQL can do, and some of those additional features are the subject of this chapter.

In this chapter, you’ll learn how to:

§ Insert data into a database from other sources, including with queries and from text files

§ Perform updates and deletes using multiple tables in a single statement

§ Replace data

§ Use MySQL functions in queries to meet more complex information needs

§ Analyze queries using the EXPLAIN statement and then improve their performance with simple optimization techniques

Inserting Data Using Queries

Much of the time, you’ll create tables using data from another source. The examples you’ve seen so far in Chapter 5 therefore illustrate only part of the problem: they show you how to insert data that’s already in the form you want—that is, formatted as an SQL INSERT statement. The other ways to insert data include using SQL SELECT statements on other tables or databases, and reading in files from other sources. This section shows you how to tackle the former method of inserting data; you’ll learn how to insert data from a file of comma-separated values in the next section,Loading Data from Comma-Delimited Files.”

Suppose you’ve decided to create a new table in the music database. It’s going to store a shuffle list, tracks that are randomly selected from your music collection, put into a list, and played to you in that order. It’s a way of tasting part of the collection, rediscovering some old favorites and learning about hidden treasures in those albums you haven’t explored. We’ve decided to structure the table as follows:

mysql> CREATE TABLE shuffle (

-> artist_id SMALLINT(5) NOT NULL DEFAULT 0,

-> album_id SMALLINT(4) NOT NULL DEFAULT 0,

-> track_id SMALLINT(3) NOT NULL DEFAULT 0,

-> sequence_id SMALLINT(3) AUTO_INCREMENT NOT NULL,

-> PRIMARY KEY (sequence_id));

Query OK, 0 rows affected (0.01 sec)

You can download these instructions from the the file shuffle.sql on the book’s web site. This table stores the details of the track, allowing you to find the artist, album, and track names using simple queries on the other tables. It also stores a sequence_id, which is a unique number that enumerates where the track is in your playlist. When you first start using the shuffle feature, you’ll listen to the track with a sequence_id of 1, then track 2, and so on. When we get to track 999, we can have our application reset the counter and table so it starts again at 1. Our reasoning is that after you’ve heard 999 tracks, it doesn’t matter if you start hearing the same ones again. You can see that we’re using the MySQL auto_increment feature to allocate the sequence_id values.

Now we need to fill up our new shuffle table with a random selection of tracks. Importantly, we’re going to do the SELECT and INSERT together in one statement. Here we go:

mysql> INSERT INTO shuffle (artist_id, album_id, track_id)

-> SELECT artist_id, album_id, track_id FROM

-> track ORDER BY RAND() LIMIT 10;

Query OK, 10 rows affected (0.07 sec)

Records: 10 Duplicates: 0 Warnings: 0

Now, let’s investigate what happened before we explain how this command works:

mysql> SELECT * FROM shuffle;

+-----------+----------+----------+-------------+

| artist_id | album_id | track_id | sequence_id |

+-----------+----------+----------+-------------+

| 1 | 7 | 0 | 1 |

| 3 | 1 | 3 | 2 |

| 1 | 3 | 10 | 3 |

| 6 | 1 | 1 | 4 |

| 4 | 1 | 8 | 5 |

| 1 | 7 | 1 | 6 |

| 1 | 1 | 4 | 7 |

| 2 | 1 | 6 | 8 |

| 1 | 6 | 0 | 9 |

| 4 | 1 | 1 | 10 |

+-----------+----------+----------+-------------+

10 rows in set (0.00 sec)

You can see that we got 10 tracks into our shuffle playlist, numbered with sequence_id values from 1 to 10. We’re ready to start playing the shuffled tracks!

Let’s discuss how the command works. There are two parts to the SQL statement: an INSERT INTO and a SELECT. The INSERT INTO statement lists the destination table into which the data will be stored, followed by an optional list of column names in parentheses; if you omit the column names, all columns in the destination table are assumed in the order they appear in a DESCRIBE TABLE or SHOW CREATE TABLE statement. The SELECT statement outputs a list of columns that must match the type and order of the list provided for the INSERT INTO statement (or the implicit, complete list if one isn’t provided). The overall effect is that the rows output from the SELECT statement are inserted into the destination table by the INSERT INTO statement. In our example, artist_id, album_id, and track_id values from the track table are inserted into the three columns with the same names and types in the shuffle table; the sequence_id is automatically created using MySQL’s AUTO_INCREMENT feature, and so isn’t specified in the statements.

Our example includes the clause ORDER BY RAND(); this orders the results according to the MySQL function RAND(). The RAND() function returns a pseudorandom number in the range 0 to 1:

mysql> SELECT RAND();

+------------------+

| RAND() |

+------------------+

| 0.34423927529178 |

+------------------+

1 row in set (0.00 sec)

A pseudorandom number generator doesn’t generate truly random numbers, but rather generates numbers based on some property of the system, such as the time of day; this is sufficiently random for most applications. A notable exception is cryptography applications that depend on the true randomness of numbers for security.

If you ask for the RAND() value in a SELECT operation, you’ll get a random value for each returned row:

mysql> SELECT *, RAND() FROM artist;

+-----------+---------------------------+------------------+

| artist_id | artist_name | RAND() |

+-----------+---------------------------+------------------+

| 1 | New Order | 0.866806439 |

| 2 | Nick Cave & The Bad Seeds | 0.66403617492322 |

| 3 | Miles Davis | 0.71976158834972 |

| 4 | The Rolling Stones | 0.60669944771258 |

| 5 | The Stone Roses | 0.8742125042474 |

| 6 | Kylie Minogue | 0.55096420883291 |

+-----------+---------------------------+------------------+

6 rows in set (0.00 sec)

Since the values are effectively random, you’ll almost certainly see different results than we’ve shown here. Let’s return to the INSERT operation. When we ask that the results be ordered by RAND(), the results of the SELECT statement are sorted in a pseudorandom order.

The LIMIT 10 is there to limit the number of rows returned by the SELECT; we’ve limited in this example simply for readability, but in practice you’d limit it to 999 because that’s the maximum sequence_id you want to use.

The SELECT statement in an INSERT INTO statement can use all of the features of SELECT statements. You can use joins, aggregation, functions, and any other features you choose. You can also query data from one database into another, by prefacing the table names with the database name followed by a period (.) character. For example, if you wanted to insert the artist table from the music database into a new art database, you could do the following:

mysql> CREATE DATABASE art;

Query OK, 1 row affected (0.01 sec)

mysql> USE art;

Database changed

mysql> CREATE TABLE people (

-> people_id SMALLINT(4) NOT NULL,

-> name CHAR(128) NOT NULL,

-> PRIMARY KEY (people_id));

Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO art.people (people_id, name)

-> SELECT artist_id, artist_name FROM music.artist;

Query OK, 8 rows affected (0.00 sec)

Records: 8 Duplicates: 0 Warnings: 0

You can see that the new people table is referred to as art.people (though it doesn’t need to be, since art is the database that’s currently in use), and the artist table is referred to as music.artist (which it needs to be, since it isn’t the database being used). Note also that the column names don’t need to be the same for the SELECT and the INSERT.

Sometimes, you’ll encounter duplication issues when inserting with a SELECT statement. This occurs if you try to insert the same primary key value twice; it won’t happen in the shuffle table, as long as you automatically allocate a new sequence_id using the auto_increment feature. However, when you try to insert duplicate key values, MySQL will abort. Let’s force a duplicate into the shuffle table to show the 'margin-top:0cm;margin-right:0cm;margin-bottom:0cm; margin-left:20.0pt;margin-bottom:.0001pt;line-height:normal;vertical-align: baseline'>mysql> USE music;

Database changed

mysql> INSERT INTO shuffle (artist_id, album_id, track_id, sequence_id)

-> SELECT artist_id, album_id, track_id, 1 FROM track LIMIT 1;

ERROR 1062 (23000): Duplicate entry '1' for key 1

If you want MySQL to ignore this and keep going, add an IGNORE keyword after the INSERT:

mysql> INSERT IGNORE INTO shuffle (artist_id, album_id, track_id, sequence_id)

-> SELECT artist_id, album_id, track_id, 1 FROM track LIMIT 1;

Query OK, 0 rows affected (0.00 sec)

Records: 1 Duplicates: 1 Warnings: 0

MySQL doesn’t complain, but it does report that it encountered a duplicate. Prior to MySQL 4.0.1, the IGNORE mode was the default behavior, but for later versions, you have to add the keyword if you want duplicates to be ignored.

Finally, note that for versions of MySQL older than 4.0.14, you couldn’t insert into a table that’s listed in the SELECT statement, since the SELECT would find the newly inserted rows and try to insert them again. On newer systems, you still need to avoid duplicate primary keys:

mysql> INSERT INTO artist SELECT artist_id,artist_name FROM artist;

ERROR 1062 (23000): Duplicate entry '1' for key 1

but you can modify values in the SELECT statement to get a different primary key value and insert it back into the same table:

mysql> INSERT INTO artist SELECT 10*artist_id,artist_name FROM artist;

Query OK, 6 rows affected (0.00 sec)

Records: 6 Duplicates: 0 Warnings: 0

Here, we’re copying the rows but multiplying their artist_ids by 10 before we insert them. This is the result:

mysql> SELECT * FROM artist;

+-----------+---------------------------+

| artist_id | artist_name |

+-----------+---------------------------+

| 1 | New Order |

| 2 | Nick Cave & The Bad Seeds |

| 3 | Miles Davis |

| 4 | The Rolling Stones |

| 5 | The Stone Roses |

| 6 | Kylie Minogue |

| 60 | Kylie Minogue |

| 50 | The Stone Roses |

| 40 | The Rolling Stones |

| 30 | Miles Davis |

| 20 | Nick Cave & The Bad Seeds |

| 10 | New Order |

+-----------+---------------------------+

12 rows in set (0.01 sec)

Loading Data from Comma-Delimited Files

Databases are sometimes an afterthought. In fact, a staggeringly large amount of time spent by IT professionals is devoted to reformatting data from one application to suit another. It’s very common, for example, to store data using a spreadsheet program such as Microsoft Excel orOpenOffice Calc, only to realize later—when you’re swamped with data—that a relational database would have been a better choice. Most spreadsheet programs allow you to export data as rows of comma-separated values (CSV), often also referred to as comma-delimited format (CDF). You can then import the data with a little effort into MySQL.

If you need to import large numbers of spreadsheet files, you could use the xls2csv script (http://search.cpan.org/~ken/xls2csv) to automate the conversion from the Excel spreadsheet files to text files of comma-separated values.

If you’re not using a spreadsheet program, you can still often use tools such as sed and awk to convert text data into a CSV format suitable for import by MySQL. This section shows you the basics of how to import CSV data into MySQL.

List of Australian academics stored in a spreadsheet file

Figure 8-1. List of Australian academics stored in a spreadsheet file

Let’s work through an example. We have a list of Australian academics with their university affiliation that we want to store in a database. At present, it’s stored in a spreadsheet workbook file named academics.xls and has the format shown in Figure 8-1. You can see that the surname is stored in the first column, one or more given names and initials in the second column, and their affiliation in the third column. This example is formulated from a file that is publicly available at http://www.cs.jcu.edu.au/acsadb/nameonly_db.html, and the workbook format example is available from the book’s web site.

Saving the academics.xls file as values with a comma or other character as a delimiter is easy in most spreadsheet programs. In most versions of Microsoft Excel, you click on the File menu, then select Save As, and then choose “CSV (Comma delimited)” for the “Save as type” field. If you’re using OpenOffice or StarOffice, follow the same steps, but choose “Text CSV (.csv)” for the “File type” field. When you save the file, you’ll find it has the same name as the workbook (in this case, academics) but with the extension .csv.

If you open the file using a text editor (we discussed how to use a text editor in Using a Text Editor” in Chapter 2), you’ll see the result: the file has one line per spreadsheet row, with the value for each column separated by a comma. If you’re on a non-Windows platform, you may find each line terminated with a ^M, but don’t worry about this; it’s an artifact of the origins of Windows. Data in this format is often referred to as DOS format, and most software applications can handle it without problem. Here are a few lines selected from academics.csv:

Abramson,David,Griffith University

Addie,Ron,University of Southern Queensland

Al-Qaimari,Ghassan,Royal Melbourne Institute of Technology

Allen,Greg,James Cook University

Allen,Robert,Swinburne University of Technology

Anderson,Gerry,University of Ballarat

Armarego,Jocelyn,Curtin University of Technology

Ashenden,Peter,University of Adelaide

Atiquzzaman,M,La Trobe University

Backhouse,Jenny,"University College, ADFA, UNSW"

If there are commas within values, the whole value is enclosed in quotes, as in the last line shown here.

Let’s import this data into MySQL. First, create the new academics database:

mysql> CREATE DATABASE academics;

and choose this as the active database:

mysql> USE academics;

Now, create the details table to store the data. This needs to handle three fields: the surname, the given names, and the institution:

mysql> CREATE TABLE details (surname CHAR(40), given_names CHAR(40),

institution CHAR(40));

We’ve allocated 40 characters for each field.

Now that we’ve set up the database table, we can import the data from the file using the LOAD DATA INFILE command:

mysql> LOAD DATA INFILE 'academics.csv' INTO TABLE details FIELDS TERMINATED BY ',';

If the academics.csv file isn’t in the current directory, you’ll need to specify the full path—for example, /home/adam/academics.csv or C:\academics.csv. The MySQL server must have permission to read this file; for example, if the server is running as the user mysql on a Linux or Mac OS X system, the datafile must have its permissions set such that this user can read it.

The clause FIELDS TERMINATED BY ',' specifies the character that delimits the field values in the text file. For example, if you have a file called academics.colon_sv with values separated by colons, you can import it by specifying the colon as the field terminator:

mysql> LOAD DATA INFILE 'academics.colon_sv' INTO

-> TABLE details FIELDS TERMINATED BY ':';

Writing Data into Comma-Delimited Files

You can use the SELECT INTO OUTFILE statement to write out the result of a query into a comma-separated values (CSV) file that can be opened by a spreadsheet or other program.

Let’s export the list of artists from our music database into a CSV file. The query used to list all the artists is shown below:

mysql> USE music;

Database changed

mysql> SELECT artist_name, album_name FROM

-> artist, album WHERE artist.artist_id=album.artist_id;

+---------------------------+------------------------------------------+

| artist_name | album_name |

+---------------------------+------------------------------------------+

| New Order | Retro - John McCready FAN |

| New Order | Substance (Disc 2) |

| New Order | Retro - Miranda Sawyer POP |

| New Order | Retro - New Order / Bobby Gillespie LIVE |

| New Order | Power, Corruption & Lies |

| New Order | Substance 1987 (Disc 1) |

| New Order | Brotherhood |

| Nick Cave & The Bad Seeds | Let Love In |

| Miles Davis | Live Around The World |

| Miles Davis | In A Silent Way |

| The Rolling Stones | Exile On Main Street |

| The Stone Roses | Second Coming |

| Kylie Minogue | Light Years |

+---------------------------+------------------------------------------+

13 rows in set (0.10 sec)

We can change this SELECT query slightly to write this data into an output file as comma-separated values:

mysql> SELECT artist_name, album_name FROM

-> artist, album WHERE artist.artist_id=album.artist_id

-> INTO OUTFILE '/tmp/artists_and_albums.csv' FIELDS TERMINATED BY ',';

Query OK, 13 rows affected (0.02 sec)

Here, we’ve saved the results into the file artists_and_albums.csv in the /tmp directory; the MySQL server must be able to write to the directory that you specify. On a Windows system, specify a path such as C:\artists_and_albums.csv instead. If you omit the FIELDS TERMINATED BYclause, the server will use tabs as the default separator between the data values.

You can view the contents of the file artists_and_albums.csv in a text editor, or import it into a spreadsheet program:

New Order,Retro - John McCready FAN

New Order,Substance (Disc 2)

New Order,Retro - Miranda Sawyer POP

New Order,Retro - New Order / Bobby Gillespie LIVE

New Order,Power\, Corruption & Lies

New Order,Substance 1987 (Disc 1)

New Order,Brotherhood

Nick Cave & The Bad Seeds,Let Love In

Miles Davis,Live Around The World

Miles Davis,In A Silent Way

The Rolling Stones,Exile On Main Street

The Stone Roses,Second Coming

Kylie Minogue,Light Years

Notice how the comma in Power, Corruption & Lies has been automatically escaped with a backslash to distinguish it from the separator. Spreadsheet programs understand this and remove the backslash when importing the file.

Creating Tables with Queries

You can create a table or easily create a copy of a table using a query. This is useful when you want to build a new database using existing data—for example, you might want to copy across a list of countries—or when you want to reorganize data for some reason. Data reorganization is common for producing reports, merging data from two or more tables, and redesigning on the fly. This short section shows you how it’s done.

From MySQL 4.1 onward, you can easily duplicate the structure of a table using a variant of the CREATE TABLE syntax:

mysql> CREATE TABLE artist_2 LIKE artist;

Query OK, 0 rows affected (0.24 sec)

mysql> DESCRIBE artist_2;

+-------------+-------------+------+-----+---------+-------+

| Field | Type | Null | Key | Default | Extra |

+-------------+-------------+------+-----+---------+-------+

| artist_id | smallint(5) | | PRI | 0 | |

| artist_name | char(128) | YES | | NULL | |

+-------------+-------------+------+-----+---------+-------+

2 rows in set (0.09 sec)

mysql> SELECT * FROM artist_2;

Empty set (0.30 sec)

The LIKE syntax allows you to create a new table with exactly the same structure as another, including keys. You can see that it doesn’t copy the data across. You can also use the IF NOT EXISTS and TEMPORARY features with this syntax.

If you want to create a table and copy some data, you can do that with a combination of the CREATE TABLE and SELECT statements. Let’s remove the artist_2 table and re-create it using this new approach:

mysql> DROP TABLE artist_2;

Query OK, 0 rows affected (0.08 sec)

mysql> CREATE TABLE artist_2 SELECT * from artist;

Query OK, 7 rows affected (0.02 sec)

Records: 7 Duplicates: 0 Warnings: 0

mysql> SELECT * FROM artist_2;

+-----------+-----------------------------+

| artist_id | artist_name |

+-----------+-----------------------------+

| 1 | New Order |

| 2 | Nick Cave and The Bad Seeds |

| 3 | Miles Dewey Davis |

| 4 | The Rolling Stones |

| 5 | The Stone Roses |

| 6 | Kylie Minogue |

| 10 | Jane's Addiction |

+-----------+-----------------------------+

7 rows in set (0.01 sec)

An identical table artist_2 is created, and all of the data is copied across by the SELECT statement.

This technique is powerful. You can create new tables with new structures and use powerful queries to populate them with data. For example, here’s a report table that’s created to contain the names of artists and albums in our database:

mysql> CREATE TABLE report (artist_name CHAR(128), album_name CHAR(128))

-> SELECT artist_name, album_name FROM artist INNER JOIN album

-> USING (artist_id);

Query OK, 13 rows affected (0.45 sec)

Records: 13 Duplicates: 0 Warnings: 0

You can see that the syntax is a little different from the previous example. In this example, the new table name, report, is followed by a list of column names and types in parentheses; this is necessary because we’re not duplicating the structure of an existing table. Then, the SELECTstatement follows, with its output matching the new columns in the new table. You can check the contents of the new table:

mysql> SELECT * FROM report;

+-----------------------------+------------------------------------------+

| artist_name | album_name |

+-----------------------------+------------------------------------------+

| New Order | Retro - John McCready FAN |

| New Order | Substance (Disc 2) |

| New Order | RETRO - MIRANDA SAWYER POP |

| New Order | Retro - New Order / Bobby Gillespie LIVE |

| New Order | Power, Corruption & Lies |

| New Order | Substance 1987 (Disc 1) |

| New Order | Brotherhood |

| Nick Cave and The Bad Seeds | Let Love In |

| Miles Dewey Davis | LIVE AROUND THE WORLD |

| Miles Dewey Davis | In A Silent Way |

| The Rolling Stones | Exile On Main Street |

| The Stone Roses | Second Coming |

| Kylie Minogue | Light Years |

+-----------------------------+------------------------------------------+

13 rows in set (0.00 sec)

So, in this example, the artist_name and album_name values from the SELECT statement are used to populate the new artist_name and album_name columns in the report table.

Creating tables with a query has a major caveat that you need to be careful about. It doesn’t copy the indexes (or foreign keys, if you use them); this is a feature, since it gives you a lot of flexibility, but it can be a catch if you forget. Have a look at our artist_2 example:

mysql> DESCRIBE artist_2;

+-------------+-------------+------+-----+---------+-------+

| Field | Type | Null | Key | Default | Extra |

+-------------+-------------+------+-----+---------+-------+

| artist_id | smallint(5) | | | 0 | |

| artist_name | char(128) | YES | | NULL | |

+-------------+-------------+------+-----+---------+-------+

2 rows in set (0.31 sec)

mysql> SHOW CREATE TABLE artist_2;

+----------+---------------------------------------------------+

| Table | Create Table |

+----------+---------------------------------------------------+

| artist_2 | CREATE TABLE `artist_2` (

`artist_id` smallint(5) NOT NULL default '0',

`artist_name` char(128) default NULL)

ENGINE=MyISAM DEFAULT CHARSET=latin1 |

+----------+---------------------------------------------------+

1 row in set (0.33 sec)

You can see that there’s no primary key; if there had been other keys, they’d be missing too.

To copy indexes across to the new table, there are at least three things you can do. The first is to use the LIKE statement to create the empty table with the indexes, as described earlier and then copy the data across using an INSERT with a SELECT statement as described earlier in this chapter in Inserting Data Using Queries.”

The second thing you can do is to use CREATE TABLE with a SELECT statement, and then add indexes using ALTER TABLE as described in Chapter 6.

The third way is to use the UNIQUE (or PRIMARY KEY or KEY) keyword in combination with the CREATE TABLE and SELECT to add a primary-key index. Here’s an example of this approach:

mysql> CREATE TABLE artist_2 (UNIQUE(artist_id))

-> SELECT * FROM artist;

Query OK, 7 rows affected (0.27 sec)

Records: 7 Duplicates: 0 Warnings: 0

mysql> DESCRIBE artist_2;

+-------------+-------------+------+-----+---------+-------+

| Field | Type | Null | Key | Default | Extra |

+-------------+-------------+------+-----+---------+-------+

| artist_id | smallint(5) | | PRI | 0 | |

| artist_name | char(128) | YES | | NULL | |

+-------------+-------------+------+-----+---------+-------+

2 rows in set (0.26 sec)

The UNIQUE keyword is applied to the artist_id column, making it the primary key in the newly created table. The keywords UNIQUE and PRIMARY KEY can be interchanged.

You can use different modifiers when you’re creating tables using these techniques. For example, here’s a table created with defaults and other settings:

mysql> CREATE TABLE artist_3

-> (artist_id SMALLINT(5) NOT NULL AUTO_INCREMENT,

-> artist_name CHAR(128) NOT NULL DEFAULT "New Order",

-> PRIMARY KEY (artist_id), KEY (artist_name))

-> SELECT * FROM artist;

Query OK, 7 rows affected (0.31 sec)

Records: 7 Duplicates: 0 Warnings: 0

Here, we’ve set NOT NULL for the new columns, used the AUTO_INCREMENT feature on artist_id, and created two keys. Anything you can do in a regular CREATE TABLE statement can be done in this variant; just remember to add those indexes explicitly!

Updates and Deletes with Multiple Tables

In Chapter 5, we showed you how to update and delete data. In the examples there, each update and delete affected one table and used properties of that table to decide what to modify. This section shows you more complex updates and deletes, with which you can delete or update rows from more than one table in one statement and can use those or other tables to decide what rows to change.

Deletion

Imagine you’ve just run out of disk space or you’re sick of browsing unwanted data in your music collection. One way to solve this problem is to remove some data, and it’d make sense to remove tracks you’ve never listened to. Unfortunately, this means you need to remove data from thetrack table using information from the played table.

With the techniques we’ve described so far in the book, there’s no way of doing this without creating a table that combines the two tables (perhaps using INSERT with SELECT), removing unwanted rows, and copying the data back to its source. In fact, this is exactly what you had to do prior to MySQL 4.0. This section shows you how you can perform this procedure and other more advanced types of deletion in recent versions of MySQL.

Consider the query you need to write to find tracks you’ve never played. One way to do it is to use a nested query—following the techniques we showed you in Chapter 7—with the NOT EXISTS clause. Here’s the query:

mysql> SELECT track_name FROM track WHERE NOT EXISTS

-> (SELECT * FROM played WHERE

-> track.artist_id = played.artist_id AND

-> track.album_id = played.album_id AND

-> track.track_id = played.track_id);

+----------------------------+

| track_name |

+----------------------------+

| Do You Love Me? |

| Nobody's Baby Now |

| Loverman |

| Jangling Jack |

| Red Right Hand |

| I Let Love In |

| Thirsty Dog |

| Ain't Gonna Rain Anymore |

| Lay Me Low |

| Do You Love Me? (Part Two) |

...

+----------------------------+

We’ve shown only 10 tracks from the output, but there are actually 142 tracks we’ve never listened to. You can probably see how the query works, but let’s briefly discuss it anyway before we move on. You can see it uses a correlated subquery, where the current row being processed in the outer query is referenced by the subquery; you can tell this because three columns from track are referenced, but the track table isn’t listed in the FROM clause of the subquery. The subquery produces output when there’s a row in the played table that matches the current row in the outer query (and so there’s a track that’s been played). However, since the query uses NOT EXISTS, the outer query doesn’t produce output when this is the case, and so the overall result is that rows are output for tracks that haven’t been played.

Now let’s take our query and turn it into a DELETE statement. Here it is:

mysql> DELETE track FROM track WHERE NOT EXISTS

-> (SELECT * FROM played WHERE track.artist_id = played.artist_id AND

-> track.album_id = played.album_id AND

-> track.track_id = played.track_id);

Query OK, 142 rows affected (0.01 sec)

You can see that the subquery remains the same, but the outer SELECT query is replaced by a DELETE statement. The DELETE statement syntax is as follows: first, the keyword DELETE is followed by the table or tables from which rows should be removed; second, the keyword FROM is followed by the table or tables that should be queried to determine which rows to delete; and, last, a WHERE clause (and any other query clauses, such as GROUP BY or HAVING) follow. In this query, rows are deleted from the track table using the track table in the query along with theplayed table in the nested subquery.

As another example, let’s clean up our database to remove albums and tracks by the band New Order:

mysql> DELETE FROM track, album USING artist, album, track WHERE

-> artist_name = "New Order" AND

-> artist.artist_id = album.artist_id AND

-> artist.artist_id = track.artist_id AND

-> album.album_id = track.album_id;

Query OK, 93 rows affected (0.00 sec)

This query deletes rows from track and album, based on a query that involves artist, album, and track. You can see the result is that 93 rows are removed: 7 albums and 86 tracks.

In this syntax, the keywords DELETE FROM are followed by the table or tables from which you want to delete rows. The keyword USING then follows with a list of tables that are used in the query part of the statement (and then the WHERE clause or other associated query mechanisms).

With MySQL versions between 4.0 and 4.02, you had to use the following syntax:

mysql> DELETE track, album FROM artist, album, track WHERE

-> artist_name = "New Order" AND

-> artist.artist_id = album.artist_id AND

-> artist.artist_id = track.artist_id AND

-> album.album_id = track.album_id;

Query OK, 93 rows affected (0.10 sec)

The query identifies the artist_id of "New Order" and performs a join between the tables.

We prefer the newer syntax because it is clearer: DELETE FROM some tables USING other tables to drive the querying process.

Note that you can use clauses such as LEFT JOIN and INNER JOIN in DELETE statements. However, you can’t delete from a table that’s read from in a nested subquery, such as in the following line:

mysql> DELETE FROM artist WHERE artist_id IN (SELECT artist_id FROM artist);

ERROR 1093 (HY000): You can't specify target table 'artist' for update in

FROM clause

In multiple table deletes, you can’t use ORDER BY or LIMIT clauses.

Updates

Now we’ll contrive an example using the music database to illustrate multiple-table updates. We’ve decided to highlight albums we’ve played. Our method of highlighting is to change the album’s name to all capital letters. To begin, let’s display albums we’ve played:

mysql> SELECT DISTINCT album_name FROM

-> album INNER JOIN track USING (artist_id, album_id)

-> INNER JOIN played USING (artist_id, album_id, track_id);

+----------------------------+

| album_name |

+----------------------------+

| Retro - Miranda Sawyer POP |

| Live Around The World |

+----------------------------+

2 rows in set (0.00 sec)

Now, let’s put that query into an UPDATE statement:

mysql> UPDATE album INNER JOIN track USING (artist_id, album_id)

-> INNER JOIN played USING (artist_id, album_id, track_id)

-> SET album_name = UPPER(album_name);

Query OK, 2 rows affected (0.01 sec)

Rows matched: 11 Changed: 2 Warnings: 0

Let’s look at the syntax: a multiple-table update looks similar to a SELECT query. The UPDATE statement is followed by a list of tables that incorporates whatever join clauses you need or prefer; in this example, we’ve used INNER JOIN to bring together the artist, album, and tracktables. This is followed by the keyword SET, with assignments to individual columns; in this example, you can see that only one column is modified (to put the album name in uppercase), so columns in all other tables besides album aren’t modified. An optional WHERE may in turn follow (but doesn’t in this example, since the USING clause does it for us).

To illustrate using a WHERE clause, here’s the previous query rewritten with the join expressed using WHERE:

mysql> UPDATE artist, album, track, played

-> SET album_name = UPPER(album_name)

-> WHERE artist.artist_id = album.artist_id AND

-> album.artist_id = track.artist_id AND

-> album.album_id = track.album_id AND

-> track.artist_id = played.artist_id AND

-> track.album_id = played.album_id AND

-> track.track_id = played.track_id;

Query OK, 2 rows affected (0.00 sec)

Rows matched: 11 Changed: 2 Warnings: 0

The method that you choose to use is just personal preference, and that might be driven by the amount of typing you’re prepared to do!

As with multiple-table deletes, there are some limitations on updates:

§ You can’t use ORDER BY.

§ You can’t use LIMIT.

§ You can’t update a table that’s read from in a nested subquery.

Other than that, multiple-table updates are much the same as single-table ones.

Replacing Data

You’ll sometimes want to overwrite data. You can do this in two ways using the techniques we’ve shown previously:

§ Delete an existing row using its primary key and then insert a new replacement with the same primary key.

§ Update a row using its primary key, replacing some or all of the values (except the primary key).

The REPLACE statement gives you a third, convenient way to change data. This section explains how it works.

The REPLACE statement is just like INSERT, but with one difference. You can’t INSERT a new row if there is an existing row in the table with the same primary key, You can get around this problem with a REPLACE query, which first removes any existing row with the same primary key and then inserts the new one.

Let’s try an example, where we’ll replace the row for "Nick Cave & The Bad Seeds":

mysql> REPLACE artist VALUES (2, "Nick Cave and The Bad Seeds");

Query OK, 2 rows affected (0.02 sec)

You can see that MySQL reports that two rows were affected: first, the old row was deleted, and, second, the new row was inserted. You can see that the change we made was minor—we just changed the & to an “and”—and therefore, it could easily have been accomplished with an UPDATE. Because the tables in the music database contain few columns, it’s difficult to illustrate an example in which REPLACE looks simpler than UPDATE.

You can use the different INSERT syntaxes with REPLACE, including using SELECT queries. Here are some examples:

mysql> REPLACE INTO artist VALUES (2, "Nick Cave and The Bad Seeds");

Query OK, 2 rows affected (0.00 sec)

mysql> REPLACE INTO artist (artist_id, artist_name)

-> VALUES (2, "Nick Cave and The Bad Seeds");

Query OK, 2 rows affected (0.00 sec)

mysql> REPLACE artist (artist_id, artist_name)

-> VALUES (2, "Nick Cave and The Bad Seeds");

Query OK, 2 rows affected (0.01 sec)

mysql> REPLACE artist SET artist_id = 2,

-> artist_name = "Nick Cave and The Bad Seeds";

Query OK, 2 rows affected (0.00 sec)

The first variant is almost identical to our previous example, except it includes the optional INTO keyword (which, arguably, improves the readability of the statement). The second variant explicitly lists the column names that the matching values should be inserted into. The third variant is the same as the second, without the optional INTO keyword. The final variant uses the SET syntax; you can add the optional keyword INTO to this variant if you want. Note that if you don’t specify a value for a column, it’s set to its default value, just like for INSERT.

You can also bulk-replace into a table, removing and inserting more than one row. Here’s an example:

mysql> REPLACE artist (artist_id, artist_name)

-> VALUES (2, "Nick Cave and The Bad Seeds"),

-> (3, "Miles Dewey Davis");

Query OK, 4 rows affected (0.00 sec)

Records: 2 Duplicates: 2 Warnings: 0

Note that four rows are affected: two deletions and two insertions. You can also see that two duplicates were found, meaning the replacement of existing rows succeeded. In contrast, if there isn’t a matching row in a REPLACE statement, it acts just like an INSERT:

mysql> REPLACE INTO artist VALUES (10, "Jane's Addiction");

Query OK, 1 row affected (0.22 sec)

You can tell that only the insert occurred, since only one row was affected.

Replacing also works with a SELECT statement. Recall the shuffle table from Inserting Data Using Queries,” at the beginning of this chapter. Suppose you’ve added 10 tracks to it, but you don’t like the choice of the seventh track in the playlist. Here’s how you can replace it with a random choice of another track:

mysql> REPLACE INTO shuffle (artist_id, album_id, track_id, sequence_id)

-> SELECT artist_id, album_id, track_id, 7 FROM

-> track ORDER BY RAND() LIMIT 1;

Query OK, 2 rows affected (0.01 sec)

Records: 1 Duplicates: 1 Warnings: 0

Again, the syntax is the same as with INSERT, but a deletion is attempted (and succeeds!) before the insertion. Note that we keep the value of the sequence_id as 7.

If a table doesn’t have a primary key, replacing doesn’t make sense. This is because there’s no way of uniquely identifying a matching row in order to delete it. When you use REPLACE on such a table, its behavior is identical to INSERT. Also, as with INSERT, you can’t replace rows in a table that’s used in a subquery. Finally, note the difference between INSERT IGNORE and REPLACE: the first keeps the existing data with the duplicate key and does not insert the new row, while the second deletes the existing row and replaces it with the new one.

The EXPLAIN Statement

You’ll sometimes find that MySQL doesn’t run queries as quickly as you expect. For example, you’ll often find that a nested query runs slowly. You might also find—or, at least, suspect—that MySQL isn’t doing what you hoped, because you know an index exists but the query still seems slow. You can diagnose and solve query optimization problems using the EXPLAIN statement.

The EXPLAIN statement helps you learn about a SELECT query. Specifically, it tells you how MySQL is going to do the job in terms of the indexes, keys, and steps it’ll take if you ask it to resolve a query. Let’s try a simple example that illustrates the idea:

mysql> EXPLAIN SELECT * FROM artist;

+---+------------+-------+-----+--------------+-----+--------+-----+-----+------+

|id |select_type |table |type |possible_keys |key |key_len |ref |rows |Extra |

+---+------------+-------+-----+--------------+-----+--------+-----+-----+------+

| 1 |SIMPLE |artist |ALL |NULL |NULL | NULL |NULL | 6 | |

+---+------------+-------+-----+--------------+-----+--------+-----+-----+------+

1 row in set (0.10 sec)

The statement gives you lots of information. It tells you in this example that:

§ The id is 1, meaning the row in the output refers to the first (and only!) SELECT statement in this query. In the query:

SELECT * FROM artist WHERE artist_id in (SELECT artist_id FROM played);

each SELECT statement will have a different id in the EXPLAIN output.

§ The select_type is SIMPLE, meaning it doesn’t use a UNION or subqueries.

§ The table that this row is referring to is artist.

§ The type of join is ALL, meaning all rows in the table are processed by this SELECT statement. This is often bad—but not in this case—and we’ll explain why later.

§ The possible_keys that could be used are listed. In this case, no index will help find all rows in a table, so NULL is reported.

§ The key that is actually used is listed, taken from the list of possible_keys. In this case, since no key is available, none is used.

§ The key_len (key length) of the key MySQL plans to use is listed. Again, no key means a NULL key_len is reported.

§ The ref (reference) columns or constants that are used with the key is listed. Again, none in this example.

§ The rows that MySQL thinks it needs to process to get an answer are listed.

§ Any Extra information about the query resolution is listed. Here, there’s none.

In summary, the output tells you that all rows from the artist table will be processed (there are six of them), and no indexes will be used to resolve the query. This makes sense and is probably exactly what you expected would happen.

We’ll now give the EXPLAIN statement some work to do. Let’s ask it to explain an INNER JOIN between artist and album:

mysql> EXPLAIN SELECT * FROM artist INNER JOIN album USING (artist_id);

+----+-------------+--------+------+---------------+...

| id | select_type | table | type | possible_keys |...

+----+-------------+--------+------+---------------+...

| 1 | SIMPLE | artist | ALL | PRIMARY |...

| 1 | SIMPLE | album | ref | PRIMARY |...

+----+-------------+--------+------+---------------+...

...+---------+---------+------------------------+------+-------+

...| key | key_len | ref | rows | Extra |

...+---------+---------+------------------------+------+-------+

...| | | | 6 | |

...| PRIMARY | 2 | music.artist.artist_id | 1 | |

...+---------+---------+------------------------+------+-------+

2 rows in set (0.01 sec)

Before we discuss the output, think about how the query could be evaluated. MySQL could go through each row in the artist table and look up the album table to see what rows match. Or it could go through each row in the album table and look up the artist table to see what rows match. Let’s see what MySQL has decided to do. This time, there are two rows because there are two tables in the join. Let’s run through this, focusing on those things that are different from the previous example:

§ The first row is basically identical to the previous example. All rows in the artist table are processed, so MySQL has decided that the same method of solving the query is its preferred way here, too.

§ The join type for the album table is ref, meaning that all rows in the album table that match rows in the artist table will be read. In practice, this means one or more rows from the album table will be read for each artist_id.

§ The possible_keys for artist and album are both only the PRIMARY key. A key isn’t used in artist (because we’re scanning the whole table), but the key used for album is that table’s PRIMARY key

§ The primary key used to search album has a key_len of 2 and is searched using the music.artist.artist_id value from the artist table

Again, this seems like a sensible strategy, and it fits with what we thought about in our design of the database.

Exercises

1. Write the monitor command to import the file academics.tsv, which has its values separated by tabs, into the details table. Hint: the tab character is shown with the \t escape sequence.

2. When would you need to insert data using a query?

3. What’s the difference between REPLACE and INSERT IGNORE?

4. What can you tell from this output produced by the EXPLAIN command?

5. +----+-------------+------------+-------+---------------+---------+...

6. | id | select_type | table | type | possible_keys | key |...

7. +----+-------------+------------+-------+---------------+---------+...

8. | 1 | SIMPLE | supervisor | const | PRIMARY | PRIMARY |...

9. | 1 | SIMPLE | student | ALL | NULL | NULL |...

10.| 1 | SIMPLE | supervises | index | NULL | PRIMARY |...

11.+----+-------------+------------+-------+---------------+---------+...

12.... +---------+-------+------+------------------------------------+

13.... | key_len | ref | rows | Extra |

14.... +---------+-------+------+------------------------------------+

15.... | 4 | const | 1 | Using index; Using temporary |

16.... | NULL | NULL | 95 | |

17.... | 12 | NULL | 570 | Using where; Using index; Distinct |

18.... +---------+-------+------+------------------------------------+

3 rows in set (0.00 sec)

19.What can you tell from this output produced by the EXPLAIN command?

20.+-----+--------------+------------+--------+---------------+---------+---------+...

21.| id | select_type | table | type | possible_keys | key | key_len |...

22.+-----+--------------+------------+--------+---------------+---------+---------+...

23.| 1 | PRIMARY | played | index | PRIMARY | PRIMARY | 10 |...

24.| 1 | PRIMARY | track | eq_ref | PRIMARY | PRIMARY | 6 |...

25.| 2 | UNION | played | index | PRIMARY | PRIMARY | 10 |...

26.| 2 | UNION | track | eq_ref | PRIMARY | PRIMARY | 6 |...

27.| NULL| UNION RESULT | <union1,2> | ALL | | | |...

28.+-----+--------------+------------+--------+---------------+---------+---------+...

29....+--------------------------------------------------------------------+------+...

30....| ref | rows |...

31....+--------------------------------------------------------------------+------+...

32....| | 12 |...

33....| music.played.artist_id,music.played.album_id,music.played.track_id | 1 |...

34....| | 12 |...

35....| music.played.artist_id,music.played.album_id,music.played.track_id | 1 |...

36....| | NULL |...

37....+--------------------------------------------------------------------+------+...

38....+-----------------------------+

39....| Extra |

40....+-----------------------------+

41....| Using index; Using filesort |

42....| |

43....| Using index; Using filesort |

44....| |

45....| |

46....+-----------------------------+

5 rows in set (0.01 sec)