You Are Not Alone: Concurrency Issues - MySQL Troubleshooting (2012)

MySQL Troubleshooting (2012)

Chapter 2. You Are Not Alone: Concurrency Issues

MySQL is rarely deployed in a single-user environment. Usually it handles many connection threads doing different jobs for different people at the same time. These parallel connections can access the same databases and tables, so it is hard to know what the state of the database was when a particular connection had a problem.

This chapter describes issues caused by parallel execution. Unlike the troubleshooting scenarios shown in Chapter 1, this chapter covers more complex situations where you may experience problems without knowing which particular query caused them.

One of the symptoms of a concurrency issue is a sudden slowdown of a well-optimized query. The slowdown may not happen consistently, but even a few random occurrences should signal you to check for concurrency problems.

Concurrency can even affect the slave SQL thread. I explicitly mention this to correct a possible misconception. One might think that because the slave SQL is single-threaded,[7] it doesn’t require any troubleshooting techniques related to concurrency. This is actually not so: replication can be affected by concurrency on both the master and the slave. This chapter therefore contains a section devoted to replication.

NOTE

Let’s agree on some terms to start with. For each connected client, the MySQL server creates a separate thread. I will use the words thread, connection, or connection thread to refer a thread that serves a client connection. I will explicitly mention when the context requires another meaning of the word “thread.” I use “thread” instead of “query,” “statement,” or “transaction” because when you are troubleshooting, you often have to isolate a problematic thread before you can deal with the particular statement that might cause the problem.

Locks and Transactions

The MySQL server has internal mechanisms that prevent one user from damaging information inserted by another. Although these usually do their job silently and competently, being unaware of these safeguards can lead to problems for your own application as well as others. Therefore, I’m starting with a short overview of concurrency control mechanisms that the MySQL server uses.

The MySQL server uses locks and transactions to handle concurrent access to its tables. I will give a short overview of lock types and transaction processing first, then go on to troubleshooting techniques.

A lock is set when a thread requests a data set. In MySQL, this can be a table, a row, a page, or metadata. After a thread finishes working with a particular data set, its locks are released. I will describe the locks set by MySQL in detail in Locks and Metadata Locking Versus the Old Model. I cover metadata locking in a separate section because this is a new feature that invented differences in how the MySQL server handles concurrency. If you are familiar with the old table-locking mechanism and not with metadata locking, the latter section can help you determine whether you hit a metadata lock in a particular situation.

Database transactions are units of work treated in a coherent and reliable way that allow a user to work without the risk of intervening with other transactions. The transaction’s isolation level controls whether the transaction can see the changes made by concurrent operations, and if yes, which changes. We will discuss MySQL transactions in detail in Transactions.


[7] This will change in future: version 5.6.2 contains a preview of a multithreaded slave.

Locks

Locks can be set both by the server and by an individual storage engine. Locks are usually set differently for read and write operations. Read, or shared, locks allow concurrent threads to read from the locked data, but prevent writes. In contrast, write, or exclusive, locks prevent other threads from either reading or writing. In a storage engine, the implementation of such locks can differ, but the rationale for these policies is solid and will be the same nearly everywhere.

Read locks are set when you SELECT from a table or explicitly lock it with LOCK TABLE … READ. Write locks are set when you either modify a table or lock it explicitly with LOCK TABLE … WRITE.

NOTE

InnoDB uses the shortcut S for read/shared locks and X for write/exclusive locks. You will see this notation in its debugging output.

As I mentioned before, MySQL has four kind of locks: table, row, page, and metadata. A table lock, as the name suggests, locks the whole table so that no one can access any row in the table until the locking thread unlocks the table. Row locking is much more fine-grained, locking just one row or whatever number of rows are being accessed by the thread, so other rows in the same table are available for access by concurrent threads. Page locks lock a page, but they are found only in the rarely used BDB storage engine, so I will say no more about it. However, general lock troubleshooting recommendations apply to this kind of lock as well.

Metadata locks are a new feature introduced in MySQL version 5.5. These locks apply only to the metadata for a table, and lock all the metadata for that table when a thread starts to use the table. Metadata is the information altered by DDL (Data Definition Language or Data Description Language) statements, the CREATE, DROP, and ALTER statements that modify schemes. The introduction of metadata locks solved a problem in earlier versions of MySQL, when a thread could modify a table definition or even drop it while another thread used the same table in a multistatement transaction.

In the following sections I describe table, row, and metadata locks and the issues they can cause in your application.

Table Locks

When a table lock is set, the whole table is locked. This means concurrent threads cannot use the table as defined by the lock, e.g., write access is not allowed if the READ lock is set, and both read and write access are forbidden if the WRITE lock is set. Table locks are set when you access a table using a storage engine that supports table locking, notably MyISAM. You also invoke a table lock for any engine when you run LOCK TABLES explicitly or issue DDL operations on a version of MySQL earlier than 5.5.

As I like to demonstrate concepts by example, here is one demonstrating the effects of table locking:

mysql> SELECT * FROM t;

+-----+

| a |

+-----+

| 0 |

| 256 |

+-----+

2 rows in set (3 min 18.71 sec)

Three minutes to retrieve two rows? When I showed this example at a conference, I paused and asked whether anybody could figure out why. At that time, the netbook boom had just started, and the whole audience shouted, “It is running on an Atom CPU!” Actually, such a delay is way too much for any kind of modern processor. Let’s look at the table definition and try the same query again:

mysql> SHOW CREATE TABLE t\G

************** 1. row **************

Table: t

Create Table: CREATE TABLE `t` (

`a` int(10) unsigned NOT NULL AUTO_INCREMENT,

PRIMARY KEY (`a`)

) ENGINE=MyISAM AUTO_INCREMENT=257 DEFAULT CHARSET=utf8

1 row in set (0.00 sec)

mysql> SELECT * FROM t;

+-----+

| a |

+-----+

| 0 |

| 256 |

+-----+

2 rows in set (0.00 sec)

Now it runs in almost zero time!

To find out what happened, we need to run SHOW PROCESSLIST during the time the query is running slowly.

NOTE

In an actual application environment, you would either need to manually run a diagnostic query during a busy time or schedule a job that will do it for you from time to time and save the results.

mysql> SHOW PROCESSLIST\G

******************* 1. row *******************

Id: 1311

User: root

Host: localhost

db: test

Command: Query

Time: 35

State: Locked

Info: SELECT * FROM t

******************* 2. row *******************

Id: 1312

User: root

Host: localhost

db: test

Command: Query

Time: 36

State: User sleep

Info: UPDATE t SET a=sleep(200) WHERE a=0

******************* 3. row ******************

Id: 1314

User: root

Host: localhost

db: NULL

Command: Query

Time: 0

State: NULL

Info: SHOW PROCESSLIST

3 rows in set (0.00 sec)

The fields of the output are as follows:

Id

The ID of the connection thread running in the MySQL server.

User, Host, and db

The connection options specified by the client when connecting to the server.

Command

The command currently executing in the thread.

Time

The elapsed wall clock time since the thread started to execute the command.

State

The internal state of the thread.

Info

Shows what the thread is currently doing. It contains a query if one is executing and NULL if the thread is sleeping while waiting for the next user command.

To find out what happened with our query, we need to find a row where Info contains the text of the query, and then examine the query’s state. At the top of the output, we can see that the state of our query is Locked, which means the query cannot proceed, because another thread holds the lock our thread is waiting for. The following row shows a query:

UPDATE t SET a=sleep(200) WHERE a=0

which accesses the same table t and has already run for 36 seconds. Because the table is the same and no other thread is using our table, we can conclude that this update is preventing our query from starting. In fact, our query will wait all 200 seconds until the other one is finished.

§ You just learned an important new debugging technique: run SHOW PROCESSLIST when you suspect concurrent threads are affecting the query.

Row Locks

Row locks block a set of rows, not the whole table. Therefore, you can modify rows in the table that are not blocked by the lock.

Row locks are set at the storage engine level. InnoDB is the main storage engine that currently uses row locks.

To show the difference between table and row locks, we will use a slightly modified example from the previous section:

mysql> CREATE TABLE `t` (

-> `a` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,

-> PRIMARY KEY (`a`)

-> ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Query OK, 0 rows affected (1.29 sec)

mysql> INSERT INTO t VALUES();

Query OK, 1 row affected (0.24 sec)

mysql> INSERT INTO t SELECT NULL FROM t;

Query OK, 1 row affected (0.19 sec)

Records: 1 Duplicates: 0 Warnings: 0

mysql> INSERT INTO t SELECT NULL FROM t;

Query OK, 2 rows affected (0.00 sec)

Records: 2 Duplicates: 0 Warnings: 0

mysql> INSERT INTO t SELECT NULL FROM t;

Query OK, 4 rows affected (0.01 sec)

Records: 4 Duplicates: 0 Warnings: 0

mysql> INSERT INTO t SELECT NULL FROM t;

Query OK, 8 rows affected (0.00 sec)

Records: 8 Duplicates: 0 Warnings: 0

mysql> SELECT * FROM t;

+----+

| a |

+----+

| 1 |

| 2 |

| 3 |

| 4 |

| 6 |

| 7 |

| 8 |

| 9 |

| 13 |

| 14 |

| 15 |

| 16 |

| 17 |

| 18 |

| 19 |

| 20 |

+----+

16 rows in set (0.00 sec)

Let’s run the same sleeping UPDATE query as before to see the different effect of row locking compared to table locks:

mysql> UPDATE t SET a=sleep(200) WHERE a=6;

While the sleeping query is running, we have enough time to try to select rows using another client:

mysql> SELECT * FROM t;

+----+

| a |

+----+

| 1 |

| 2 |

| 3 |

| 4 |

| 6 |

| 7 |

| 8 |

| 9 |

| 13 |

| 14 |

| 15 |

| 16 |

| 17 |

| 18 |

| 19 |

| 20 |

+----+

16 rows in set (0.00 sec)

We got results immediately. Now let’s try to update a row:

mysql> UPDATE t SET a=23 WHERE a=13;

Query OK, 1 row affected (0.00 sec)

Rows matched: 1 Changed: 1 Warnings: 0

mysql> UPDATE t SET a=27 WHERE a=7;

Query OK, 1 row affected (0.09 sec)

Rows matched: 1 Changed: 1 Warnings: 0

Updates of rows that are not locked work fine too. Now let’s see what happens if we try to update a row using the same WHERE condition as the blocking UPDATE:

mysql> UPDATE t SET a=26 WHERE a=6;

ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

Our new query waits for innodb_lock_wait_timeout seconds (the default value is 50), then dies with an error. We have the same result as table locks in terms of data consistency, but the InnoDB lock does not affect parallel threads until they try to access the exact row that is locked.

NOTE

Actually, row-level locking is more complicated than I have described. For example, if we tried to access a table using a WHERE condition that could not be resolved using a UNIQUE key, we could not update any row in parallel, because the storage engine could not determine whether the other thread was trying to update the same row. This is not the only detail I skipped in my discussion of row-level locks to save space for the troubleshooting techniques themselves. Consult the Appendix A to find sources of information you can use to learn more about MySQL locks.

Now let’s see how the process list output changed. I will use the table INFORMATION_SCHEMA.PROCESSLIST in this example. It actually has same information as SHOW PROCESSLIST, but because it is in the table you can sort the query results as you wish. This is especially convenient if you have dozens of parallel threads:

mysql> SELECT * FROM PROCESSLIST\G

*************************** 1. row ***************************

ID: 4483

USER: root

HOST: localhost

DB: NULL

COMMAND: Sleep

TIME: 283

STATE:

INFO: NULL

*************************** 2. row ***************************

ID: 4482

USER: root

HOST: localhost

DB: information_schema

COMMAND: Query

TIME: 0

STATE: executing

INFO: SELECT * FROM PROCESSLIST

*************************** 3. row ***************************

ID: 4481

USER: root

HOST: localhost

DB: test

COMMAND: Query

TIME: 7

STATE: Updating

INFO: UPDATE t SET a=26 WHERE a=6

*************************** 4. row ***************************

ID: 4480

USER: root

HOST: localhost

DB: test

COMMAND: Query

TIME: 123

STATE: User sleep

INFO: UPDATE t SET a=sleep(200) WHERE a=6

4 rows in set (0.09 sec)

Here you can see another difference from the previous example: the state of our query is Updating, not Locked.

To find out whether a query is blocked by a lock in InnoDB, you can run the query SHOW ENGINE INNODB STATUS, which is part of a mechanism called the InnoDB Monitor. This is especially helpful if you analyze the effect of parallel multistatement transactions, which we will discuss a bit later in this chapter. I will not print here the whole output of this great tool, just the part relevant to our current example. We will discuss this tool further in SHOW ENGINE INNODB STATUS and InnoDB Monitors and in detail in Chapter 6:

mysql> SHOW ENGINE INNODB STATUS \G

*************************** 1. row ***************************

Type: InnoDB

Name:

Status:

=====================================

110802 2:03:45 INNODB MONITOR OUTPUT

=====================================

Per second averages calculated from the last 41 seconds

...

------------

TRANSACTIONS

------------

Trx id counter 0 26243828

Purge done for trx's n:o < 0 26243827 undo n:o < 0 0

History list length 25

LIST OF TRANSACTIONS FOR EACH SESSION:

---TRANSACTION 0 0, not started, OS thread id 101514240

MySQL thread id 4483, query id 25022097 localhost root

show engine innodb status

---TRANSACTION 0 26243827, ACTIVE 9 sec, OS thread id 101403136 starting index read

mysql tables in use 1, locked 1

LOCK WAIT 2 lock struct(s), heap size 320, 1 row lock(s)

MySQL thread id 4481, query id 25022095 localhost root Updating

update t set a=26 where a=6

------- TRX HAS BEEN WAITING 9 SEC FOR THIS LOCK TO BE GRANTED:

RECORD LOCKS space id 349 page no 3 n bits 88 index `PRIMARY` of table `test`.`t`

trx id 0 26243827 lock_mode X locks rec but not gap waiting

Record lock, heap no 6 PHYSICAL RECORD: n_fields 3; compact format; info bits 0

0: len 4; hex 00000006; asc ;; 1: len 6; hex 0000019072e3; asc r ;; 2:

len 7; hex 800000002d0110; asc - ;;

------------------

---TRANSACTION 0 26243821, ACTIVE 125 sec, OS thread id 101238272,

thread declared inside InnoDB 500

mysql tables in use 1, locked 1

2 lock struct(s), heap size 320, 1 row lock(s)

MySQL thread id 4480, query id 25022091 localhost root User sleep

update t set a=sleep(200) where a=6

The important part we need to pay attention to is:

---TRANSACTION 0 26243827, ACTIVE 9 sec, OS thread id 101403136 starting index read

mysql tables in use 1, locked 1

LOCK WAIT 2 lock struct(s), heap size 320, 1 row lock(s)

MySQL thread id 4481, query id 25022095 localhost root Updating

update t set a=26 where a=6

------- TRX HAS BEEN WAITING 9 SEC FOR THIS LOCK TO BE GRANTED:

This shows that a query is waiting for a lock.

Here are a few details about the preceding output, before we return to the lock:

TRANSACTION 0 26243827

This is the ID of the transaction.

ACTIVE 9 sec

Number of seconds the transaction was active for.

OS thread id 101403136

ID of the MySQL thread that is running the transaction.

starting index read

What the transaction is doing.

mysql tables in use 1, locked 1

How many tables are used and locked.

LOCK WAIT 2 lock struct(s), heap size 320, 1 row lock(s)

Information about the locks.

MySQL thread id 4481, query id 25022095 localhost root Updating

Information about the MySQL thread: ID, ID of the query, user credentials, and MySQL state.

update t set a=26 where a=6

Currently running query.

And here are details about the lock:

RECORD LOCKS space id 349 page no 3 n bits 88 index `PRIMARY` of table `test`.`t`

trx id 0 26243827 lock_mode X locks rec but not gap waiting

Record lock, heap no 6 PHYSICAL RECORD: n_fields 3; compact format; info bits 0

0: len 4; hex 00000006; asc ;; 1: len 6; hex 0000019072e3; asc r ;; 2:

len 7; hex 800000002d0110; asc - ;;

This shows the exact coordinates of the blocked transaction in the InnoDB tablespace, the type of the lock (exclusive, because we are going to do an update), and the binary content of the physical record.

Finally, let’s look at the information about the transaction running the query that locks the rows:

---TRANSACTION 0 26243821, ACTIVE 125 sec, OS thread id 101238272,

thread declared inside InnoDB 500

mysql tables in use 1, locked 1

2 lock struct(s), heap size 320, 1 row lock(s)

MySQL thread id 4480, query id 25022091 localhost root User sleep

update t set a=sleep(200) where a=6

Now that the situation is clear, we can consider how to fix it.

§ You just learned about another important troubleshooting instrument: InnoDB Monitor, which can be called with the help of SHOW ENGINE INNODB STATUS. I will add more details about InnoDB Monitors at the end of this chapter in SHOW ENGINE INNODB STATUS and InnoDB Monitors.

I need to point out one thing about the performance troubleshooting section in the previous chapter, Slow Queries. In that section, I wrote that indexes can decrease the performance of inserts because they need to update index files as well as insert data. But when row locks are in use, indexes can increase overall application performance, especially when an index is unique, because while updating such an indexed field, insert would not block access to the whole table.

Next, I take a short break from locks to describe transactions. Later, we will return to metadata locking. I decided to order the information in this way because we need to be acquainted with transactions before discussing metadata locks.

Transactions

MySQL supports transactions at the storage engine level. The most popular among the officially supported storage engines, InnoDB, provides transaction support. In this section, we discuss how to troubleshoot InnoDB transactions.

In MySQL, you can start a transaction using a START TRANSACTION or BEGIN statement. To commit a transaction, call COMMIT, and to roll it back (cancel it), call ROLLBACK.

An alternate way to start a multistatement transaction is to set the variable autocommit to zero. This will override the default MySQL behavior, which sends an implicit commit after each statement. With autocommit set to 0, you need to call COMMIT or ROLLBACK explicitly. After that, the next statement begins a new transaction automatically.

MySQL also provides SAVEPOINT and XA transaction interfaces. Although InnoDB supports both, I will not describe them in this book, because it would not offer any extra insight related to our troubleshooting techniques. In other words, the same techniques I describe can be applied to such transactions.

Hidden Queries

InnoDB treats every request for data as a transaction. It does not matter whether the transaction is a single statement or if it is multiple statements. With regard to troubleshooting, you can handle single-query transactions as described in Row Locks. You need to find out which queries run concurrently and which locks get in the way of each other.

Things change when you have transactions that consist of multiple statements. In this case, the transaction can lock rows even when you see no query in the SHOW PROCESSLIST output.

To illustrate this problem, let’s modify our example one more time. Now we don’t even need to call sleep to create a delay. Before running this example, I reverted the changed rows, so the table has the same values as in the initial test setup.

mysql1> BEGIN;

Query OK, 0 rows affected (0.00 sec)

mysql1> UPDATE t SET a=26 WHERE a=6;

Query OK, 1 row affected (0.00 sec)

Rows matched: 1 Changed: 1 Warnings: 0

Please note that the transaction is not closed. Let’s start another from another connection:

mysql2> BEGIN;

Query OK, 0 rows affected (0.00 sec)

mysql2> UPDATE t SET a=36 WHERE a=6;

Now let’s run SHOW PROCESSLIST. Our query is in the same Updating state as it was in our row locks example, but now it is not clear what is preventing the actual update:

mysql3> SHOW PROCESSLIST\G

*************************** 1. row ***************************

Id: 4484

User: root

Host: localhost

db: test

Command: Sleep

Time: 104

State:

Info: NULL

*************************** 2. row ***************************

Id: 4485

User: root

Host: localhost

db: test

Command: Query

Time: 2

State: Updating

Info: UPDATE t SET a=36 WHERE a=6

*************************** 3. row ***************************

Id: 4486

User: root

Host: localhost

db: test

Command: Query

Time: 0

State: NULL

Info: SHOW PROCESSLIST

*************************** 4. row ***************************

Id: 4487

User: root

Host: localhost

db: NULL

Command: Sleep

Time: 33

State:

Info: NULL

4 rows in set (0.09 sec)

Here is our sorted output from SHOW ENGINE INNODB STATUS:

mysql> SHOW ENGINE INNODB STATUS\G

*************************** 1. row ***************************

Type: InnoDB

Name:

Status:

=====================================

110802 14:35:28 INNODB MONITOR OUTPUT

=====================================

...

------------

TRANSACTIONS

------------

Trx id counter 0 26243837

Purge done for trx's n:o < 0 26243834 undo n:o < 0 0

History list length 2

LIST OF TRANSACTIONS FOR EACH SESSION:

---TRANSACTION 0 0, not started, OS thread id 101515264

MySQL thread id 4487, query id 25022139 localhost root

show engine innodb status

---TRANSACTION 0 26243836, ACTIVE 4 sec, OS thread id 101514240

starting index read

mysql tables in use 1, locked 1

LOCK WAIT 2 lock struct(s), heap size 320, 1 row lock(s)

MySQL thread id 4485, query id 25022137 localhost root Updating

update t set a=36 where a=6

------- TRX HAS BEEN WAITING 4 SEC FOR THIS LOCK TO BE GRANTED:

RECORD LOCKS space id 349 page no 3 n bits 88 index `PRIMARY` of table `test`.`t`

trx id 0 26243836 lock_mode X locks rec but not gap waiting

Record lock, heap no 6 PHYSICAL RECORD: n_fields 3; compact format; info bits 32

0: len 4; hex 00000006; asc ;; 1: len 6; hex 0000019072fb; asc r ;; 2:

len 7; hex 000000003202ca; asc 2 ;;

------------------

---TRANSACTION 0 26243835, ACTIVE 106 sec, OS thread id 100936704

2 lock struct(s), heap size 320, 1 row lock(s), undo log entries 2

MySQL thread id 4484, query id 25022125 localhost root

NOTE

The same recipe will work for MyISAM tables locked with LOCK TABLE query, and thus not necessarily visible in the SHOW PROCESSLIST output. InnoDB prints information about such tables in its status output:

------------

TRANSACTIONS

------------

Trx id counter B55

Purge done for trx's n:o < B27 undo n:o < 0

History list length 7

LIST OF TRANSACTIONS FOR EACH SESSION:

---TRANSACTION 0, not started

MySQL thread id 3, query id 124 localhost ::1 root

show engine innodb status

---TRANSACTION B53, not started

mysql tables in use 1, locked 1

MySQL thread id 1, query id 115 localhost ::1 root

--------

The lock information is similar to what we saw in Row Locks:

------- TRX HAS BEEN WAITING 4 SEC FOR THIS LOCK TO BE GRANTED:

RECORD LOCKS space id 349 page no 3 n bits 88 index `PRIMARY` of table `test`.`t`

trx id 0 26243836 lock_mode X locks rec but not gap waiting

Record lock, heap no 6 PHYSICAL RECORD: n_fields 3; compact format; info bits 32

0: len 4; hex 00000006; asc ;; 1: len 6; hex 0000019072fb; asc r ;; 2:

len 7; hex 000000003202ca; asc 2 ;;

This clearly shows that our transaction is waiting for the lock. From this listing, though, it is not clear what holds the lock. If you are using version 5.0 or use bundled InnoDB in 5.1, you have two choices: figure it out yourself, or use the InnoDB Lock monitor. In our example, we have only two transactions, so it’s easy to see the answer, and “figuring it out” would work. But if you have dozens of connections using different rows of the same table, it would not be so easy. I will describe the InnoDB Lock monitor later in this chapter in SHOW ENGINE INNODB STATUS and InnoDB Monitors. Here we will use a third choice, available for the InnoDB Plugin only.

The InnoDB Plugin, in addition to monitors, has INFORMATION_SCHEMA tables named INNODB_LOCKS, INNODB_LOCK_WAITS (which holds information about acquired locks and waits for locks), and INNODB_TRX, which holds information about running transactions.

For our example, we can query these tables:

mysql> SELECT * FROM innodb_locks\G

*************************** 1. row ***************************

lock_id: 3B86:1120:3:6

lock_trx_id: 3B86

lock_mode: X

lock_type: RECORD

lock_table: `test`.`t`

lock_index: `PRIMARY`

lock_space: 1120

lock_page: 3

lock_rec: 6

lock_data: 6

*************************** 2. row ***************************

lock_id: 3B85:1120:3:6

lock_trx_id: 3B85

lock_mode: X

lock_type: RECORD

lock_table: `test`.`t`

lock_index: `PRIMARY`

lock_space: 1120

lock_page: 3

lock_rec: 6

lock_data: 6

2 rows in set (0.01 sec)

This shows information about locks. Both transactions have locks set on the same record, but the result does not give any idea as to which transaction holds the lock and which waits on it. More information about what is going on can be obtained from the table INNODB_LOCK_WAITS:

mysql> SELECT * FROM innodb_lock_waits\G

*************************** 1. row ***************************

requesting_trx_id: 3B86

requested_lock_id: 3B86:1120:3:6

blocking_trx_id: 3B85

blocking_lock_id: 3B85:1120:3:6

1 row in set (0.09 sec)

The column requesting_trx_id is the ID of our “hanging” transaction, and blocking_trx_id is the ID of the transaction that holds the lock. requested_lock_id and blocking_lock_id show information about the requested and blocked locks, respectively.

All we need to know now is the MySQL process ID of the blocking transaction, so we can do something with it. The content of the INNODB_TRX table will help us find it:

mysql> SELECT * FROM innodb_trx\G

*************************** 1. row ***************************

trx_id: 3B86

trx_state: LOCK WAIT

trx_started: 2011-08-02 14:48:51

trx_requested_lock_id: 3B86:1120:3:6

trx_wait_started: 2011-08-02 14:49:59

trx_weight: 2

trx_mysql_thread_id: 28546

trx_query: UPDATE t SET a=36 WHERE a=6

trx_operation_state: starting index read

trx_tables_in_use: 1

trx_tables_locked: 1

trx_lock_structs: 2

trx_lock_memory_bytes: 320

trx_rows_locked: 1

trx_rows_modified: 0

trx_concurrency_tickets: 0

trx_isolation_level: REPEATABLE READ

trx_unique_checks: 1

trx_foreign_key_checks: 1

trx_last_foreign_key_error: NULL

trx_adaptive_hash_latched: 0

trx_adaptive_hash_timeout: 10000

*************************** 2. row ***************************

trx_id: 3B85

trx_state: RUNNING

trx_started: 2011-08-02 14:48:41

trx_requested_lock_id: NULL

trx_wait_started: NULL

trx_weight: 4

trx_mysql_thread_id: 28544

trx_query: NULL

trx_operation_state: NULL

trx_tables_in_use: 0

trx_tables_locked: 0

trx_lock_structs: 2

trx_lock_memory_bytes: 320

trx_rows_locked: 1

trx_rows_modified: 2

trx_concurrency_tickets: 0

trx_isolation_level: REPEATABLE READ

trx_unique_checks: 1

trx_foreign_key_checks: 1

trx_last_foreign_key_error: NULL

trx_adaptive_hash_latched: 0

trx_adaptive_hash_timeout: 10000

2 rows in set (0.11 sec)

The Id of our blocking transaction is 3B85. So this is the second row in the output, with a trx_mysql_thread_id of 28544. We can confirm this using SHOW PROCESSLIST:

mysql> SHOW PROCESSLIST\G

*************************** 1. row ***************************

Id: 28542

User: root

Host: localhost

db: information_schema

Command: Sleep

Time: 46

State:

Info: NULL

*************************** 2. row ***************************

Id: 28544

User: root

Host: localhost

db: test

Command: Sleep

Time: 79

State:

Info: NULL

*************************** 3. row ***************************

Id: 28546

User: root

Host: localhost

db: test

Command: Query

Time: 1

State: Updating

Info: UPDATE t SET a=36 WHERE a=6

*************************** 4. row ***************************

Id: 28547

User: root

Host: localhost

db: test

Command: Query

Time: 0

State: NULL

Info: SHOW PROCESSLIST

4 rows in set (0.01 sec)

Now that we know the MySQL thread ID, we can do whatever we want with the blocking transaction: either wait until it finishes or kill it. If we ran the offending command from an application, we can also analyze what led to such a locking issue and can fix it to prevent future problems.

Actually, the INNODB_TRX table contains a lot of useful information about transactions. If we go back to our example, we can see trx_state: LOCK WAIT for our waiting transaction and trx_state: RUNNING for the one that is running. I won’t describe this additional information here, but I will touch on it in Chapter 6.

§ We have just learned that an uncommitted transaction can hold locks, even if the query using the affected rows finished hours ago.

You should remember this while coding. I saw environments where users set autocommit=0 and left transactions running for hours. This leads to issues that are hard to uncover and understand, especially when the user was not prepared for it. Such environments are often used in popular Java frameworks that add autocommit=0 to a URL by default.

§ To summarize, when you work with multistatement transactions, commit them as soon as you can. Don’t leave an uncommitted transaction around after its last update has finished, even if it does not modify any further rows.

Deadlocks

Deadlock is a situation when two or more competing transactions are waiting for each other to free locks, and thus neither ever finishes. With row-level locking, deadlocks are not 100% avoidable.

InnoDB has an internal deadlock detector. When it finds one, it just rolls back one of the transactions, reporting an error that we’ll see momentarily. When designing an application, you need to be prepared for such a situation and handle the rollback appropriately.

Information about deadlocks can be found in SHOW ENGINE INNODB STATUS. To demonstrate this, we will examine a trivial example of a deadlock.

The initial data is:

mysql> CREATE TABLE `t` (

`a` int(10) unsigned NOT NULL AUTO_INCREMENT,

PRIMARY KEY (`a`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Query OK, 0 rows affected (0.27 sec)

mysql> INSERT INTO t VALUES();

Query OK, 1 row affected (0.16 sec)

mysql> INSERT INTO t SELECT NULL FROM t;

Query OK, 1 row affected (0.11 sec)

Records: 1 Duplicates: 0 Warnings: 0

mysql> INSERT INTO t SELECT NULL FROM t;

Query OK, 2 rows affected (0.09 sec)

Records: 2 Duplicates: 0 Warnings: 0

mysql> SELECT * FROM t;

+---+

| a |

+---+

| 1 |

| 2 |

| 3 |

| 4 |

+---+

4 rows in set (0.00 sec)

Now let’s start two transactions and insert one row in each of them:

mysql1> BEGIN;

Query OK, 0 rows affected (0.00 sec)

mysql1> INSERT INTO t VALUES();

Query OK, 1 row affected (0.00 sec)

mysql1> SELECT * FROM t;

+---+

| a |

+---+

| 1 |

| 2 |

| 3 |

| 4 |

| 8 |

+---+

5 rows in set (0.00 sec)

mysql2> BEGIN;

Query OK, 0 rows affected (0.00 sec)

mysql2> INSERT INTO t VALUES();

Query OK, 1 row affected (0.00 sec)

mysql2> SELECT * FROM t;

+---+

| a |

+---+

| 1 |

| 2 |

| 3 |

| 4 |

| 9 |

+---+

5 rows in set (0.00 sec)

Everything’s OK so far. Both tables inserted one value into an auto-incremented field. Now let’s try to modify a row in the first transaction:

mysql1> UPDATE t SET a=9 WHERE a=8;

While it waits, let’s go to the second one and modify its row:

mysql2> UPDATE t SET a=8 WHERE a=9;

ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction

The query fails immediately, returning information about the deadlock that has occurred. Meanwhile, the second query completed with no problems:

Query OK, 1 row affected (9.56 sec)

Rows matched: 1 Changed: 1 Warnings: 0

You just saw how InnoDB’s deadlock detector worked. To find out what happened, we can again examine SHOW ENGINE INNODB STATUS:

------------------------

LATEST DETECTED DEADLOCK

------------------------

110803 3:04:34

*** (1) TRANSACTION:

TRANSACTION 3B96, ACTIVE 29 sec, OS thread id 35542016 updating or deleting

mysql tables in use 1, locked 1

LOCK WAIT 3 lock struct(s), heap size 320, 2 row lock(s), undo log entries 2

MySQL thread id 30446, query id 772 localhost root Updating

update t set a=9 where a=8

*** (1) WAITING FOR THIS LOCK TO BE GRANTED:

RECORD LOCKS space id 1121 page no 3 n bits 80 index `PRIMARY` of table `test`.`t`

trx id 3B96 lock mode S locks rec but not gap waiting

Record lock, heap no 8 PHYSICAL RECORD: n_fields 3; compact format; info bits 32

0: len 4; hex 00000009; asc ;;

1: len 6; hex 000000003b97; asc ; ;;

2: len 7; hex 510000022328d5; asc Q #( ;;

*** (2) TRANSACTION:

TRANSACTION 3B97, ACTIVE 21 sec, OS thread id 35552256 updating or deleting

mysql tables in use 1, locked 1

3 lock struct(s), heap size 320, 2 row lock(s), undo log entries 2

MySQL thread id 30447, query id 773 localhost root Updating

update t set a=8 where a=9

*** (2) HOLDS THE LOCK(S):

RECORD LOCKS space id 1121 page no 3 n bits 80 index `PRIMARY` of table `test`.`t`

trx id 3B97 lock_mode X locks rec but not gap

Record lock, heap no 8 PHYSICAL RECORD: n_fields 3; compact format; info bits 32

0: len 4; hex 00000009; asc ;;

1: len 6; hex 000000003b97; asc ; ;;

2: len 7; hex 510000022328d5; asc Q #( ;;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:

RECORD LOCKS space id 1121 page no 3 n bits 80 index `PRIMARY` of table `test`.`t`

trx id 3B97 lock mode S locks rec but not gap waiting

Record lock, heap no 6 PHYSICAL RECORD: n_fields 3; compact format; info bits 32

0: len 4; hex 00000008; asc ;;

1: len 6; hex 000000003b96; asc ; ;;

2: len 7; hex 50000002221b83; asc P " ;;

*** WE ROLL BACK TRANSACTION (2)

The output contains a lot of information about the latest deadlock and why it happened. You need to pay attention to the parts named WAITING FOR THIS LOCK TO BE GRANTED (which shows which lock the transaction is waiting for) and HOLDS THE LOCK(S) (which shows the locks that are holding up this transaction). This knowledge is especially important in applications where you cannot predict which queries are executed at particular times, such as queries invoked by interactions with users of a web application.

To cope with potential deadlocks, you need to add error-handling functionality into your application, as was described in Chapter 1. If you get an error indicating a deadlock and rollback has occurred, restart the transaction.

Implicit Commits

Some statements commit transactions even when you don’t call COMMIT explicitly. This situation is called an implicit commit, and if you aren’t aware you’re doing a commit, you can end up with an inconsistent state.

A lot of statements cause implicit commits. I won’t list them here, because they can vary from version to version. The general rule is that DDL, transaction-related, and administrative statements cause implicit commits, whereas those that work with data do not.

One symptom of an unanticipated implicit commit is when you see unwanted data in tables even though the statements inserting that data were supposed to be rolled back. Here’s an example:

mysql> CREATE TABLE t1(f1 INT) ENGINE=InnoDB;

Query OK, 0 rows affected (0.14 sec)

mysql> SELECT * FROM t1;

Empty set (0.00 sec)

mysql> BEGIN;

Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO t1 VALUES(100);

Query OK, 1 row affected (0.03 sec)

mysql> CREATE TABLE t2 LIKE t1;

Query OK, 0 rows affected (0.19 sec)

mysql> INSERT INTO t1 VALUES(200);

Query OK, 1 row affected (0.02 sec)

mysql> ROLLBACK;

Query OK, 0 rows affected (0.00 sec)

CREATE TABLE causes an implicit commit. So even though you may think you rolled back both inserts, t1 will contain one row with the value 100. The second insert with value 200 will be rolled back as desired.

This example assumes that you have set autocommit=0 so that multistatement transactions are used by default. While we’re on the subject of commits, it’s worth noting again that the default value of autocommit is 1, preventing the use of multistatement transactions when a BEGIN orSTART TRANSACTION is not called explicitly. When the value is 1, each statement will be committed right away, and in the previous situation you would actually end up with both rows in the table:

mysql> SELECT * FROM t1;

+------+

| f1 |

+------+

| 100 |

| 200 |

+------+

2 rows in set (0.00 sec)

mysql> SELECT @@autocommit;

+--------------+

| @@autocommit |

+--------------+

| 1 |

+--------------+

1 row in set (0.00 sec)

§ Generally, to prevent such issues, keep transactions small, so that even if you interrupt a transaction by mistake with a statement causing an implicit commit, the side effect will be minimal.

Metadata Locking

To ensure data consistency, DDL operations on a table should be blocked if another transaction is using the table. Starting with version 5.5.3, this is achieved by using metadata locks.

When a transaction starts, it acquires metadata locks on all the tables it uses and releases the locks when it finishes. All other threads that try to modify the tables’ definitions wait until the transaction ends.

DDL operations in MySQL servers prior to 5.5.3 knew nothing about parallel transactions. This would lead to collisions similar to the following:

mysql1> BEGIN;

Query OK, 0 rows affected (0.08 sec)

mysql1> SELECT * FROM t1;

+------+

| f1 |

+------+

| 100 |

| 200 |

+------+

2 rows in set (0.10 sec)

In one transaction, we are selecting data from a table, planning to use this result set during the current transaction. At the very same time, another thread drops the table:

mysql2> DROP TABLE t1;

Query OK, 0 rows affected (0.17 sec)

DROP is not an operation that can be rolled back, so the first thread is the one affected by the conflict:

mysql> SELECT * FROM t1;

ERROR 1146 (42S02): Table 'test.t1' doesn't exist

Our transaction obviously cannot complete. A metadata lock would allow our transaction to complete before the other connection’s DROP statement could execute. To illustrate this, we will execute the same example on version 5.5.3 or later:

mysql1> BEGIN;

Query OK, 0 rows affected (0.00 sec)

mysql1> SELECT * FROM t1;

+------+

| f1 |

+------+

| 100 |

| 200 |

+------+

2 rows in set (0.00 sec)

Now the DROP attempt just blocks:

mysql2> DROP TABLE t1;

After issuing that command, I waited a few seconds, so you will see how long the drop was executing, and then rolled back the first transaction:

mysql1> ROLLBACK;

Query OK, 0 rows affected (0.00 sec)

Now we can look at the query execution time to be sure the DROP waited until the first transaction finished:

mysql2> DROP TABLE t1;

Query OK, 0 rows affected (1 min 0.39 sec)

The new model is much safer, and as such does not require any new troubleshooting techniques. But MySQL has been around for a long time before metadata locks were introduced, and users became used to the old behavior, even creating workarounds for it. So I want to add a few notes about the differences in server behavior created by metadata locking.

Metadata Locking Versus the Old Model

Metadata locks are acquired independently of the storage engine you use. So if you use a MyISAM table with autocommit=0 or if you start an explicit transaction with BEGIN or START TRANSACTION, your connection will acquire the metadata lock. You can clearly see this in the output ofSHOW PROCESSLIST, which will show the statement with the state “Waiting for table metadata lock.”

A small example demonstrates the use of the lock. The first thread opens a transaction that accesses a MyISAM table. I used BEGIN here, but the same behavior can be seen if you use autocommit=0:

mysql1> SHOW CREATE TABLE tm\G

*************************** 1. row ***************************

Table: tm

Create Table: CREATE TABLE `tm` (

`a` int(10) unsigned NOT NULL AUTO_INCREMENT,

PRIMARY KEY (`a`)

) ENGINE=MyISAM AUTO_INCREMENT=5 DEFAULT CHARSET=utf8

1 row in set (0.00 sec)

mysql1> BEGIN;

Query OK, 0 rows affected (0.00 sec)

mysql1> SELECT * FROM tm;

+---+

| a |

+---+

| 1 |

| 2 |

| 3 |

| 4 |

+---+

4 rows in set (0.00 sec)

At the same time, another thread calls TRUNCATE, which affects table metadata:

mysql2> TRUNCATE TABLE tm;

You can see the states of both threads if you run SHOW PROCESSLIST using a third connection:

mysql> SHOW PROCESSLIST\G

*************************** 1. row ***************************

Id: 30970

User: root

Host: localhost

db: test

Command: Sleep

Time: 26

State:

Info: NULL

*************************** 2. row ***************************

Id: 30972

User: root

Host: localhost

db: test

Command: Query

Time: 9

State: Waiting for table metadata lock

Info: TRUNCATE TABLE tm

*************************** 3. row ***************************

Id: 31005

User: root

Host: localhost

db: NULL

Command: Query

Time: 0

State: NULL

Info: SHOW PROCESSLIST

3 rows in set (0.00 sec)

When a query blocks while waiting for the metadata lock, SHOW PROCESSLIST is your assistant again. After you move to a version of MySQL with support for the metadata lock, you may find that DDL queries start to run slowly. This is because they have to wait when another transaction has the lock.

In theory, the metadata lock can time out. The timeout is specified by a variable named lock_wait_timeout. By default, it is set to 31,536,000 seconds (one year), so effectively a locked query can never die:

mysql> truncate table tm;

Query OK, 0 rows affected (5 hours 12 min 52.51 sec)

To provide an effective timeout, you can set lock_wait_timeout to a smaller value, such as one second:

mysql> set lock_wait_timeout=1;

Query OK, 0 rows affected (0.00 sec)

mysql> truncate table tm;

ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

How Concurrency Affects Performance

We just discussed cases when conflicts between parallel threads or transactions created performance issues or even aborted queries. You saw how locks set by SQL statements or the storage engines affect parallel threads. Such locks are visible to users and easy to track, although they are not always easy to debug. You need to account for the possibility of parallel, competing threads when an application is capable of creating multiple MySQL connections. This can be as common as a web server that opens parallel connections to a MySQL server because multiple users have opened a web page provided by that web server.

We also discussed wrong results or dramatic slowdowns that are hard to miss if your application handles errors. When I collected examples for this book, I put all such issues in Wrong Results from a SELECT. I distinguish those from performance problems because you can see their results immediately, whereas performance problems are generally hidden at first, and you notice them only after examining the slow query log or getting complaints from users about slow applications.

So let’s tackle the subtler performance problems. If a query suddenly starts to run slowly, your first step is to make sure it is properly optimized. The easiest way to do this is to run the query in an isolated, single-threaded environment. If the query still runs slowly, either it requires optimization or the recent execution of a large number of updates caused the index statistic to become out of date. (Chapter 1 contains basic optimization techniques.)

If a query completes quickly in single-threaded environment but slowly in a multithreaded one, this almost certainly means you are experiencing a concurrency issue. All the techniques that I illustrated for dealing with wrong results are suitable for this case as well. Slow queries are just a slightly more complex problem because in order to debug them, you have to reproduce the conditions under which they occur, and it can be hard to make the problem strike when you want it to.

NOTE

I always insist on reproducing problems, not just removing a problematic query. For concurrency problems, this is important because the problem query may be just a symptom of a deeper problem. If you stop executing it without solving the real problem, you may suffer from the same issue in another part of the application.

Monitoring InnoDB Transactions for Concurrency Problems

If you are debugging locks caused by an InnoDB transaction, InnoDB Monitors will make your life easier. Just turn a monitor on, and it will periodically dump messages into the error logfile, similar to the output you have already seen with the use of SHOW ENGINE INNODB STATUS.

To turn on InnoDB Monitors, create a table named innodb_monitor in any database:

$mysql test -A

Welcome to the MySQL monitor. Commands end with ; or \g.

Your MySQL connection id is 2624

Server version: 5.1.59-debug Source distribution

Copyright (c) 2000, 2011, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its

affiliates. Other names may be trademarks of their respective

owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> CREATE TABLE innodb_monitor(f1 INT) ENGINE=InnoDB;

Query OK, 0 rows affected (0.48 sec)

The -A option on the MySQL client command is useful when you’re trying to debug problems related to concurrency. Normally, the client asks for a list of available tables. The client could then be held up by locks held by other connections, which defeats the purpose of running the client to debug them. The -A option suppresses the request for the table list.

Once you do this, InnoDB recognizes the table and starts printing information to the error logfile. So if you check for the time when the slow query was running against the transactions information recorded at that time, you can find out what was holding a lock that held up the query. More information about InnoDB Monitors appears in SHOW ENGINE INNODB STATUS and InnoDB Monitors and also in Chapter 6.

Monitoring Other Resources for Concurrency Problems

If you debug a query that doesn’t use the InnoDB storage engine or that you suspect is affected by a different kind of lock, there are still a couple of options. You can issue SHOW PROCESSLIST, but an even better choice is to schedule SELECT … FROMINFORMATION_SCHEMA.PROCESSLIST to run repeatedly and save its output into a file or a table together with information about when it was taken. This output is easier to control and read than SHOW PROCESSLIST. Check the process list information from the same time as when the slow query was running to get some clue about what was holding up the query.

The effects of concurrency on performance do not always lie at the level of SQL or the storage engine. Multiple threads of the MySQL server also share hardware resources, such as RAM and CPU. Some of these resources are dedicated to each thread, whereas others, such as temporary tables, are dedicated for particular kinds of operations and allocated only when necessary. Some resources are shared by all threads. You should also consider operating system limits when planning an application. In this section, I will describe the concepts that affect the performance of these resources, and Chapter 3 will discuss in detail the options that control resource usage.

Let’s start with the memory allocated to each thread. The MySQL server has a group of options that allows you to set the size of thread-specific buffers. Roughly, the more memory you allocate, the faster the thread runs. But these buffers are allocated for every single thread you run, so the more resources you allocate to each thread, the fewer threads can run simultaneously. Always seek a balance between values that improve performance and the real amount of physical memory.

Resources allocated for certain kinds of operations are limited too. Don’t set them very high until it is necessary, but don’t set them too low either. A good compromise is to set high values for only a few sessions (connections) that need large buffers and leave the default value for others. This level of control can be set on a session-by-session basis dynamically:

SET SESSION join_buffer_size=1024*1024*1024;

The third kind of resources are those shared by all threads, usually internal caches. With such options, you generally don’t need to worry that adding more connections will increase memory usage. One potential issue with them, though, is that changing the data can invalidate the cache and cause subsequent statements to take longer. Usually this is a very quick operation, but if a cache is very large, invalidation can take a long time, so this can affect the user while a thread waits for the cache to become accessible again.

Finally, performance can run up against limits on operating resources such as file descriptors and CPU. The number of file descriptors available on the system limits the number of connections the server can make, the number of tables that can be opened simultaneously, and even the number of partitions in a single table. You may find it impossible even to open a table if it has more partitions than available file descriptors. We will discuss how the operating system can affect MySQL in Chapter 4.

Other Locking Issues

Other resources that can affect your application are internal locks and mutexes acquired when the server executes particular operations. Most of them protect data integrity. With a few exceptions, such as InnoDB mutexes and spin locks, you cannot and should not try to control them, butbecause a few of them can become visible to user applications, I’ll describe them here.

Transactions can create race conditions and therefore deadlocks, but so can other activities. When the MySQL server starts using a resource such as a file or modifies a variable that is shared between threads, it locks access to the resource to prevent concurrent access to the same resource by other threads. This is done for data consistency. But at the same time, such protection can lead to deadlocks.

These deadlocks are hard to diagnose and theoretically should never happen, but because they have turned up in the past, I’ll describe what to do when you suspect it. As an example, I will create a deadlock of this kind using a test case from a bug report. This bug is not related to troubleshooting a metadata lock (MDL), so I will concentrate on just the debugging aspect, not on the actions that led to the deadlock.

NOTE

It might seem artificial to illustrate a problem caused by a MySQL bug instead of a user error, but the message is useful. Nobody is insured against hitting a bug, and it is good to be prepared. Forewarned is forearmed.

The symptoms of a “resource” deadlock are the same as for deadlocks caused by row locking: queries just hang. No internal mechanism can find such a deadlock and kill it, so don’t expect that the thread will time out (as with an InnoDB lock) or immediately be rolled back (as with an InnoDB deadlock). SHOW PROCESSLIST will show something like this:

mysql> SHOW PROCESSLIST\G

*************************** 1. row ***************************

Id: 2

User: root

Host: localhost

db: performance_schema

Command: Query

Time: 0

State: NULL

Info: SHOW PROCESSLIST

*************************** 2. row ***************************

Id: 6

User: root

Host: localhost

db: test

Command: Query

Time: 9764

State: Waiting for table metadata lock

Info: SELECT * FROM t1, t2

*************************** 3. row ***************************

Id: 7

User: root

Host: localhost

db: test

Command: Query

Time: 9765

State: Waiting for table metadata lock

Info: RENAME TABLE t2 TO t0, t4 TO t2, t0 TO t4

*************************** 4. row ***************************

Id: 8

User: root

Host: localhost

db: test

Command: Query

Time: 9766

State: Waiting for table level lock

Info: INSERT INTO t3 VALUES ((SELECT count(*) FROM t4))

*************************** 5. row ***************************

Id: 10

User: root

Host: localhost

db: test

Command: Sleep

Time: 9768

State:

Info: NULL

*************************** 6. row ***************************

Id: 502

User: root

Host: localhost

db: test

Command: Sleep

Time: 2

State:

Info: NULL

6 rows in set (0.00 sec)

This output shows several queries waiting for different kinds of locks for more than 9,000 seconds.

The TRANSACTIONS part of the SHOW ENGINE INNODB STATUS output does not show any new information:

------------

TRANSACTIONS

------------

Trx id counter 4211

Purge done for trx's n:o < 4211 undo n:o < 0

History list length 127

LIST OF TRANSACTIONS FOR EACH SESSION:

---TRANSACTION 0, not started, OS thread id 35934208

MySQL thread id 502, query id 124 localhost root

show engine innodb status

---TRANSACTION 0, not started, OS thread id 33726976

MySQL thread id 6, query id 71 localhost root Waiting for table metadata lock

select * from t1, t2

---TRANSACTION 0, not started, OS thread id 35786240

mysql tables in use 2, locked 2

MySQL thread id 8, query id 69 localhost root Waiting for table level lock

insert into t3 values ((select count(*) from t4))

---TRANSACTION 4201, not started, OS thread id 35354624

mysql tables in use 2, locked 4

MySQL thread id 10, query id 68 localhost root

---TRANSACTION 0, not started, OS thread id 35633152

MySQL thread id 7, query id 70 localhost root Waiting for table metadata lock

rename table t2 to t0, t4 to t2, t0 to t4

Starting with version 5.5, additional information can be received from the performance_schema, which I’ll describe in PERFORMANCE_SCHEMA Tables. Here I want to show what to do when the problem I just highlighted arises.

The first table to check is MUTEX_INSTANCES, which lists all mutexes created since the server started. Some of them are currently unused, so you should skip them in your SELECT and retrieve only the ones where LOCKED_BY_THREAD_ID is not null:

mysql> SELECT * FROM MUTEX_INSTANCES WHERE LOCKED_BY_THREAD_ID is

not null\G

*************************** 1. row ***************************

NAME: wait/synch/mutex/sql/MDL_wait::LOCK_wait_status

OBJECT_INSTANCE_BEGIN: 35623528

LOCKED_BY_THREAD_ID: 23

*************************** 2. row ***************************

NAME: wait/synch/mutex/sql/MDL_wait::LOCK_wait_status

OBJECT_INSTANCE_BEGIN: 35036264

LOCKED_BY_THREAD_ID: 22

*************************** 3. row ***************************

NAME: wait/synch/mutex/mysys/THR_LOCK::mutex

OBJECT_INSTANCE_BEGIN: 508708108

LOCKED_BY_THREAD_ID: 24

3 rows in set (0.26 sec)

To find out who waits on these mutexes, query the EVENTS_WAITS_CURRENT table:

mysql> SELECT THREAD_ID, EVENT_ID, EVENT_NAME, SOURCE,

TIMER_START, OBJECT_INSTANCE_BEGIN, OPERATION FROM EVENTS_WAITS_CURRENT WHERE

THREAD_ID IN(SELECT LOCKED_BY_THREAD_ID FROM MUTEX_INSTANCES WHERE

LOCKED_BY_THREAD_ID IS NOT NULL)\G

*************************** 1. row ***************************

THREAD_ID: 24

EVENT_ID: 268

EVENT_NAME: wait/synch/cond/mysys/my_thread_var::suspend

SOURCE: thr_lock.c:461

TIMER_START: 128382107931720

OBJECT_INSTANCE_BEGIN: 508721156

OPERATION: timed_wait

*************************** 2. row ***************************

THREAD_ID: 22

EVENT_ID: 44

EVENT_NAME: wait/synch/cond/sql/MDL_context::COND_wait_status

SOURCE: mdl.cc:995

TIMER_START: 130306657228800

OBJECT_INSTANCE_BEGIN: 35036372

OPERATION: timed_wait

*************************** 3. row ***************************

THREAD_ID: 23

EVENT_ID: 42430

EVENT_NAME: wait/synch/cond/sql/MDL_context::COND_wait_status

SOURCE: mdl.cc:995

TIMER_START: 7865906646714888

OBJECT_INSTANCE_BEGIN: 35623636

OPERATION: timed_wait

3 rows in set (2.23 sec)

The THREAD_ID in the output is the actual number assigned internally to a thread by mysqld, not the number of the connection thread that can help us find the statement causing the deadlock. To find the number of the connection thread, query the THREADS table. I have not filtered the output here, because I want to show you all the threads that run inside the mysqld process while it serves the six connections from our example.

mysql> SELECT * FROM THREADS\G

*************************** 1. row ***************************

THREAD_ID: 0

ID: 0

NAME: thread/sql/main

*************************** 2. row ***************************

THREAD_ID: 24

ID: 8

NAME: thread/sql/one_connection

*************************** 3. row ***************************

THREAD_ID: 2

ID: 0

NAME: thread/innodb/io_handler_thread

*************************** 4. row ***************************

THREAD_ID: 14

ID: 0

NAME: thread/innodb/srv_monitor_thread

*************************** 5. row ***************************

THREAD_ID: 6

ID: 0

NAME: thread/innodb/io_handler_thread

*************************** 6. row ***************************

THREAD_ID: 518

ID: 502

NAME: thread/sql/one_connection

*************************** 7. row ***************************

THREAD_ID: 12

ID: 0

NAME: thread/innodb/srv_lock_timeout_thread

*************************** 8. row ***************************

THREAD_ID: 22

ID: 6

NAME: thread/sql/one_connection

*************************** 9. row ***************************

THREAD_ID: 7

ID: 0

NAME: thread/innodb/io_handler_thread

*************************** 10. row ***************************

THREAD_ID: 3

ID: 0

NAME: thread/innodb/io_handler_thread

*************************** 11. row ***************************

THREAD_ID: 26

ID: 10

NAME: thread/sql/one_connection

*************************** 12. row ***************************

THREAD_ID: 9

ID: 0

NAME: thread/innodb/io_handler_thread

*************************** 13. row ***************************

THREAD_ID: 16

ID: 0

NAME: thread/sql/signal_handler

*************************** 14. row ***************************

THREAD_ID: 23

ID: 7

NAME: thread/sql/one_connection

*************************** 15. row ***************************

THREAD_ID: 1

ID: 0

NAME: thread/innodb/io_handler_thread

*************************** 16. row ***************************

THREAD_ID: 4

ID: 0

NAME: thread/innodb/io_handler_thread

*************************** 17. row ***************************

THREAD_ID: 5

ID: 0

NAME: thread/innodb/io_handler_thread

*************************** 18. row ***************************

THREAD_ID: 8

ID: 0

NAME: thread/innodb/io_handler_thread

*************************** 19. row ***************************

THREAD_ID: 15

ID: 0

NAME: thread/innodb/srv_master_thread

*************************** 20. row ***************************

THREAD_ID: 18

ID: 2

NAME: thread/sql/one_connection

*************************** 21. row ***************************

THREAD_ID: 13

ID: 0

NAME: thread/innodb/srv_error_monitor_thread

*************************** 22. row ***************************

THREAD_ID: 10

ID: 0

NAME: thread/innodb/io_handler_thread

22 rows in set (0.03 sec)

Now we have all the information we can get about what is going on inside the server without engaging in special manipulations, such as attaching a debugger to the running mysqld process. In this case, we actually had all the information we needed after the first SHOW PROCESSLIST call. But I wanted to show what’s available from performance_schema to help debug internal problems, such as deadlocks, when the state is not as clear as it was in this example.

What should be done to solve a deadlock? You need to pick a connection to serve as the victim and kill it. In this case, fortunately we can find a relatively unimportant connection in the SHOW ENGINE INNODB STATUS we saw earlier. Here is what it says about MySQL server’s thread 10:

---TRANSACTION 4201, not started, OS thread id 35354624

mysql tables in use 2, locked 4

MySQL thread id 10, query id 68 localhost root

This is not waiting for any action, but it is locking two tables. Let’s kill it:

mysql> KILL 10;

Query OK, 0 rows affected (0.09 sec)

We were lucky, and killing this single connection solved the problem:

mysql> SHOW PROCESSLIST\G

*************************** 1. row ***************************

Id: 2

User: root

Host: localhost

db: performance_schema

Command: Query

Time: 0

State: NULL

Info: SHOW PROCESSLIST

*************************** 2. row ***************************

Id: 6

User: root

Host: localhost

db: test

Command: Sleep

Time: 10361

State:

Info: NULL

*************************** 3. row ***************************

Id: 7

User: root

Host: localhost

db: test

Command: Sleep

Time: 10362

State:

Info: NULL

*************************** 4. row ***************************

Id: 8

User: root

Host: localhost

db: test

Command: Sleep

Time: 10363

State:

Info: NULL

*************************** 5. row ***************************

Id: 502

User: root

Host: localhost

db: test

Command: Sleep

Time: 152

State:

Info: NULL

5 rows in set (0.00 sec)

mysql> SELECT * FROM MUTEX_INSTANCES WHERE LOCKED_BY_THREAD_ID IS

NOT NULL\G

Empty set (0.11 sec)

mysql> SELECT THREAD_ID, EVENT_ID, EVENT_NAME, SOURCE,

TIMER_START, OBJECT_INSTANCE_BEGIN, OPERATION FROM EVENTS_WAITS_CURRENT WHERE

THREAD_ID IN(SELECT LOCKED_BY_THREAD_ID FROM MUTEX_INSTANCES WHERE

LOCKED_BY_THREAD_ID IS NOT NULL)\G

Empty set (1.23 sec)

This was a short overview about system deadlocks and what to do if you meet them. The basic procedure is to make sure there are threads waiting essentially forever, find a thread you can live without, and kill it. You should not be afraid of killing stalled threads, because it is better to kill them and then fix any resulting error manually than to wait indefinitely until mysqld stops for another reason.

You may encounter other examples of the influence internal server locks have on applications. Some even cause server crashes. For instance, InnoDB uses semaphores for various locking purposes, such as to protect CHECK TABLE and OPTIMIZE TABLE, but if an InnoDB semaphore waits for more than 600 seconds, InnoDB intentionally crashes the server.

WARNING

Don’t mix semaphores’ long waiting times with user sessions that last more than 600 seconds. InnoDB semaphores do protect certain operations. A single session can issue none or dozens of them.

You can monitor the state of an application and prevent such situations. Thus, pertaining to InnoDB semaphores, the storage engine prints information about wait times in its monitor:

----------

SEMAPHORES

----------

OS WAIT ARRAY INFO: reservation count 179, signal count 177

--Thread 35471872 has waited at trx/trx0rec.c line 1253 for 0.00 seconds the semaphore:

X-lock (wait_ex) on RW-latch at 0x149b124c created in file buf/buf0buf.c line 898

a writer (thread id 35471872) has reserved it in mode wait exclusive

number of readers 1, waiters flag 0, lock_word: ffffffff

Last time read locked in file buf/buf0flu.c line 1186

Last time write locked in file trx/trx0rec.c line 1253

Mutex spin waits 209, rounds 3599, OS waits 38

RW-shared spins 70, rounds 2431, OS waits 67

RW-excl spins 0, rounds 2190, OS waits 71

Spin rounds per wait: 17.22 mutex, 34.73 RW-shared, 2190.00 RW-excl

NOTE

Don’t worry if InnoDB prints information about semaphore waits for operations that need to take a long time, such as CHECK TABLE on a huge table. What you need to take action on is a large waiting time during a normal operation.

Replication and Concurrency

Another important aspect of concurrency troubleshooting concerns replicated environments.

When troubleshooting replication, you need to remember that a master server is always multithreaded, whereas the slave runs all updates in a single thread.[8] This affects performance and consistency during replication, regardless of the binary log formats or options you use.

The majority of replication issues involve data inconsistency, meaning that the data on the slave is different from that on the master. In most cases, MySQL replication takes care of data consistency, but sometimes you can still experience issues, especially if you use statement-based replication. This section focuses on consistency, with a few words at the end about how performance can be affected by replication.

Statement-Based Replication Issues

Starting with version 5.1, MySQL supports three binary logging formats: statement, row, and mixed. The majority of issues with data inconsistency happen with statement-basement replication (SBR), which uses the statement binary log format. This format has several advantages over row-based logging, and historically it was the only way to log changes, so its user base is still huge and it is still the default. But it is more risky than row-based logging.

NOTE

Row-based logging records raw data about rows in the log. The slave does not execute the same queries as the master, but instead updates table rows directly. This is the safest binary logging format because it can’t insert into the slave table any values that do not exist on the master. This format can be used safely even when you call nondeterministic functions such as NOW().

Statement-based logging just puts queries in the original SQL format into the log, so the slave executes the same commands as the master. If you use this format, network traffic usually—although not always—is much lower than if a row-based format is used because a query can take up less data than the actual changes it makes to rows. This is especially noticeable when you update several rows in a table with BLOB columns. The disadvantage of this format is that it forces you to ensure data consistency. For example, if you insert the output of the function NOW() into a column, you will most likely have different data on the master and slave.

The mixed binary logging format contains the advantages of both row and statement logging: it saves the majority of queries in statement format but switches to row format when a query is not safe, i.e., when it uses a nondeterministic function such as NOW().

When you plan an application under conditions where MySQL is using replication, be aware of how different statements affect consistency. Even when the master provides a lot of additional information to the slave, the latter cannot handle all issues.

The MySQL Reference Manual contains a list of statements that are not safe when statement-based replication is used, meaning that using such a statement can leave different results on the slave and master. MySQL High Availability by Charles Bell et al. (O’Reilly) covers consistency issues during replication in detail. Here I’ll focus on concurrency-related problems, which can lead to different results if statements are run in many threads on the master and were put in a particular order on the slave.

Because a master server is multithreaded, it can run multiple connections in a nondeterministic order. But the slave, which is currently single-threaded, reissues the statements in the order in which they were logged, which may be different from the order in which they were issued on the server. This can lead to differences between the master and slave.

MULTITHREADED SLAVE

The Replication Team is working on a multithreaded slave. The current status of this enhancement is feature preview. A multithreaded slave can distribute transactions into different threads. You can tune the number of threads using the slave_parallel_workers variable.

This feature affects replication scalability, i.e., the speed of data updates, so it can prevent slaves from falling so far behind that consistency is threatened. But don’t expect it to prevent data consistency issues.

You can learn more about multithreaded slaves in a blog post by Luís Soares.

To illustrate ordering problems, I will use a simple table with a single field and no unique index. In real life, a similar pattern applies whenever queries search for rows without using a unique index.

Example 2-1. Example of replication issue happened due to wrong order of transactions

CREATE TABLE t1(f1 CHAR(2)) ENGINE=InnoDB;

Now I’ll imitate the effect of concurrent transactions inserting rows in batch mode:

master1> BEGIN;

master1> INSERT INTO t1 VALUES(1);

master1> INSERT INTO t1 VALUES(2);

master1> INSERT INTO t1 VALUES(3);

master1> INSERT INTO t1 VALUES(4);

master1> INSERT INTO t1 VALUES(5);

Note that this transaction is not yet committed. Next, I open a new transaction in another connection and insert yet another bunch of rows:

master2> BEGIN;

master2> INSERT INTO t1 VALUES('a');

master2> INSERT INTO t1 VALUES('b');

master2> INSERT INTO t1 VALUES('c');

master2> INSERT INTO t1 VALUES('d');

master2> INSERT INTO t1 VALUES('e');

master2> COMMIT;

This second transaction is committed. Now I’ll commit the first one:

master1> COMMIT;

You probably expect that the table will have 1, 2, 3, 4, 5 as the first rows and a, b, c, d, e as the following rows. This is absolutely correct, because the first transaction started earlier than the second one:

master1> SELECT * FROM t1;

+------+

| f1 |

+------+

| 1 |

| 2 |

| 3 |

| 4 |

| 5 |

| a |

| b |

| c |

| d |

| e |

+------+

10 rows in set (0.04 sec)

But the master doesn’t write anything to the binary log until the transaction is committed. Therefore, the slave will have a, b, c, d, e as the first rows and 1, 2, 3, 4, 5 as the second set:

slave> SELECT * FROM t1;

+------+

| f1 |

+------+

| a |

| b |

| c |

| d |

| e |

| 1 |

| 2 |

| 3 |

| 4 |

| 5 |

+------+

10 rows in set (0.04 sec)

This is fine so far: the master and slave contain same the data, although in a different order. Things break when one runs an unsafe UPDATE:

master1> UPDATE t1 SET f1='A' LIMIT 5;

Query OK, 5 rows affected, 1 warning (0.14 sec)

Rows matched: 5 Changed: 5 Warnings: 1

mysql> SHOW WARNINGS\G

*************************** 1. row ***************************

Level: Note

Code: 1592

Message: Unsafe statement written to the binary log using statement format since

BINLOG_FORMAT = STATEMENT. The statement is unsafe because it uses a LIMIT

clause. This is unsafe because the set of rows included cannot be predicted.

1 row in set (0.00 sec)

As you see, the server warns us that this query is not safe. Let’s see why. On the master we end up with:

master1> SELECT * FROM t1;

+------+

| f1 |

+------+

| A |

| A |

| A |

| A |

| A |

| a |

| b |

| c |

| d |

| e |

+------+

whereas the slave has a completely different data set:

slave> SELECT * FROM t1;

+------+

| f1 |

+------+

| A |

| A |

| A |

| A |

| A |

| 1 |

| 2 |

| 3 |

| 4 |

| 5 |

+------+

This example is quite simple. In real life, mix-ups are usually much worse. In this example, I used a transactional storage engine and multistatement transactions just to reproduce an erroneous result easily. With a nontransactional storage engine, you can see similar issues when using MySQL extensions that delay actual data changes, such as INSERT DELAYED.

The best way to prevent such a situation is to use row-based or mixed replication. If you stick to statement-based replication, it is absolutely necessary to design the application to prevent such situations. Keep in mind that every transaction is written to the binary log when and only when it is committed. You also need to use the techniques to check warnings described in Chapter 1. If this sounds too hard to do for every single query in production (e.g., due to performance considerations), do it at least during the development stage.

Mixing Transactional and Nontransactional Tables

It is important not to mix transactional and nontransactional tables in the same transaction.[9] Once you make changes to a nontransactional table, you cannot roll them back, so if the transaction is aborted or rolled back, your data can become inconsistent.

WARNING

The same issue occurs when mixing transactional tables using different engines in the same transaction. If engines have different transaction isolation levels or different rules for statements that cause implicit commits, you can end up with problems similar to those described for mixing transactional and nontransactional tables.

This can happen even when using a single MySQL server, but in replicated environments things sometimes get even worse. So I’m including this section in the part about replication, even though it contains lessons for nonreplicated environments as well.

For our example, we will show a stored procedure using the same concept as the example from Tracing Back Errors in Data. I’m changing the temporary table here to a persistent one and adding another procedure that will fill rows in table t2. Table t2 will also use the MyISAM storage engine. MyISAM was the default engine before 5.5, so such a temporary table can be created easily if the user forgets to add the ENGINE option to the CREATE TABLE statement. The user may also use MyISAM in the mistaken hope of improving performance:

CREATE TABLE t1(f1 INT) ENGINE=InnoDB;

CREATE TABLE t2(f1 INT) ENGINE=MyISAM;

CREATE TABLE t3(f1 INT) ENGINE=InnoDB;

INSERT INTO t3 VALUES(1),(2),(3);

CREATE PROCEDURE p1()

BEGIN

DECLARE m INT UNSIGNED DEFAULT NULL;

SELECT max(f1) INTO m FROM t2;

IF m IS NOT NULL

THEN

INSERT INTO t1(f1) SELECT f1 FROM t2;

END IF;

END

|

Now I will add values into t2 in a transaction:

master> BEGIN;

Query OK, 0 rows affected (0.00 sec)

master> INSERT INTO t1 VALUES(5);

Query OK, 1 row affected (0.00 sec)

master> INSERT INTO t2 SELECT f1 FROM t3;

Query OK, 3 rows affected, 1 warning (0.00 sec)

Records: 3 Duplicates: 0 Warnings: 1

And I’ll call p1() without committing the transaction in the parallel master connection:

master> BEGIN;

Query OK, 0 rows affected (0.00 sec)

master> CALL p1();

Query OK, 3 rows affected (0.00 sec)

master> COMMIT;

Query OK, 0 rows affected (0.00 sec)

master> SELECT * FROM t1;

+------+

| f1 |

+------+

| 1 |

| 2 |

| 3 |

+------+

3 rows in set (0.00 sec)

As you see, there are three rows on the master. Let’s see how many we have on the slave:

slave> SELECT * FROM t1;

Empty set (0.00 sec)

The slave actually has no rows. This happened because the master wrote updates to nontransactional tables into the transaction cache. But as we saw earlier, cache content is written to the binary log only after a transaction actually ends. In this case, perhaps the slave actually corresponds more closely to the user’s intent than the master (because the master has “ghost” entries in t1 corresponding to a transaction that never completed), but the point is that we end up with data inconsistency thanks to the nontransactional table.

NOTE

The section Replication options discusses the binlog_direct_non_transactional_updates option, which controls when updates on nontransactional tables are written to the binary log.

§ So, don’t mix transactional and nontransactional tables in a transaction. If it is absolutely necessary, use other locking methods, such as LOCK TABLE, to ensure consistency in case of a rollback or crash.

You can solve most replication-related issues using the row-based or mixed binary log format, but that won’t help if the data on the master is not what you expect.

Issues on the Slave

We just discussed concurrency problems that arise because the master is multithreaded while the slave is single-threaded. If the slave SQL thread was the only one running, we wouldn’t have any additional concurrency issues on the slave. But in real life, a slave also does other jobs besides replicating from the master, and therefore the slave’s SQL thread can be affected by concurrency issues just like any other connection thread.

When the slave SQL thread writes to tables that are currently used by user threads, it acquires all locks for such tables, just like any other multithreaded case. You can see this by running SHOW PROCESSLIST:

slave> SHOW PROCESSLIST\G

*************************** 1. row ***************************

Id: 1

User: system user

Host:

db: NULL

Command: Connect

Time: 115

State: Waiting for master to send event

Info: NULL

*************************** 2. row ***************************

Id: 2

User: system user

Host:

db: test

Command: Connect

Time: 16

State: update

Info: INSERT INTO t1 VALUES(3)

*************************** 3. row ***************************

Id: 3

User: msandbox

Host: localhost

db: test

Command: Sleep

Time: 28

State:

Info: NULL

*************************** 4. row ***************************

Id: 4

User: msandbox

Host: localhost

db: test

Command: Query

Time: 0

State: NULL

Info: SHOW PROCESSLIST

4 rows in set (0.00 sec)

The slave SQL thread is listed in row 2 and is run by system user, a special user that executes slave queries. I now run the following query in a parallel connection, before starting the slave, to demonstrate how the SQL thread waits for other threads to finish before it can execute the update:

SELECT * FROM t1 FOR UPDATE;

After taking this SHOW PROCESSLIST output, I rolled back the parallel query, so the SQL thread could successfully finish the query:

slave> SHOW PROCESSLIST\G

*************************** 1. row ***************************

Id: 1

User: system user

Host:

db: NULL

Command: Connect

Time: 267

State: Waiting for master to send event

Info: NULL

*************************** 2. row ***************************

Id: 2

User: system user

Host:

db: NULL

Command: Connect

Time: 168

State: Slave has read all relay log; waiting for the slave I/O thread to update it

Info: NULL

...

You can also encounter a situation where the slave SQL thread is holding locks on rows that your application is trying to access. To see when this happens, examine the output of SHOW PROCESSLIST and SHOW ENGINE INNODB STATUS.

The solution to both of these situations is either to wait until the active thread finishes or to roll back a user transaction if it waits for a lock for a long time.


[8] This will change, as discussed in Multithreaded Slave.

[9] There are exceptions to this rule that I would call “semi-safe,” like when a nontransactional table is read-only or write-only (a log table, for instance, is write-only in this context). When you use such tables together with transactional tables, plan carefully.

Effectively Using MySQL Troubleshooting Tools

To end this chapter, I want to repeat the descriptions of tools we used and describe some of their useful features that I bypassed before.

SHOW PROCESSLIST and the INFORMATION_SCHEMA.PROCESSLIST Table

SHOW PROCESSLIST is the first tool to use when you suspect a concurrency issue. It will not show the relationships among statements in multistatement transactions, but will expose the symptoms of the problem to confirm that more investigation of concurrency is needed. The main symptom is a thread that’s in the “Sleep” state for a long time.

The examples in this chapter used the short version of SHOW PROCESSLIST, which crops long queries. SHOW FULL PROCESSLIST shows the full query, which can be convenient if you have long queries and it’s not easy to guess the full version from just the beginning of the query.

Starting with version 5.1, MySQL also offers the INFORMATION_SCHEMA.PROCESSLIST table, with the same data as SHOW FULL PROCESSLIST. On busy servers, the table greatly facilitates troubleshooting because you can use SQL to narrow down what you see:

slave2> SELECT * FROM INFORMATION_SCHEMA.PROCESSLIST\G

*************************** 1. row ***************************

ID: 5

USER: msandbox

HOST: localhost

DB: information_schema

COMMAND: Query

TIME: 0

STATE: executing

INFO: SELECT * FROM INFORMATION_SCHEMA.PROCESSLIST

*************************** 2. row ***************************

ID: 4

USER: msandbox

HOST: localhost

DB: test

COMMAND: Sleep

TIME: 583

STATE:

INFO: NULL

*************************** 3. row ***************************

ID: 2

USER: system user

HOST:

DB: NULL

COMMAND: Connect

TIME: 940

STATE: Slave has read all relay log; waiting for the slave I/O thread t

INFO: NULL

*************************** 4. row ***************************

ID: 1

USER: system user

HOST:

DB: NULL

COMMAND: Connect

TIME: 1936

STATE: Waiting for master to send event

INFO: NULL

4 rows in set (0.00 sec)

So why did I use SHOW PROCESSLIST instead of the PROCESSLIST table in the majority of my examples? First, SHOW PROCESSLIST is supported in all MySQL versions. When doing my support job, I can request this information from customers without checking in advance what version of MySQL they use. Another explanation comes from my support job as well: when working with customers, we don’t know all the details about their environment, so looking at the nonfiltered process list can provide some insights about it.

Considerations are different when you debug your own application. Because you already have information about which processes are important, you can limit output using queries with a WHERE condition, such as:

mysql> SELECT * FROM INFORMATION_SCHEMA.PROCESSLIST WHERE TIME > 50

mysql> SELECT * FROM INFORMATION_SCHEMA.PROCESSLIST WHERE INFO LIKE 'my query%'

This can save time when analyzing the results.

SHOW ENGINE INNODB STATUS and InnoDB Monitors

These tools display the most important information you need when using InnoDB tables. You can get the same information through a SHOW ENGINE INNODB STATUS command or by creating InnoDB monitor tables. These are not tables intended for users, but a way to tell InnoDB to write information about InnoDB’s status in the error log every few seconds.

We will discuss standard and lock monitors here. InnoDB also offers Tablespace and Table monitors, which print information from the shared tablespace and the InnoDB internal dictionary, respectively. Tablespace and Table monitors are not directly related to the topic of concurrency, so I’m skipping them here.

The standard InnoDB monitor is what you get when you call SHOW ENGINE INNODB STATUS. In regards to concurrency, we are interesting in SEMAPHORES, LATEST DETECTED DEADLOCK, and TRANSACTIONS.

The SEMAPHORES section contains information about threads waiting for a mutex or a rw-lock semaphore. Pay attention to the number of waiting threads or to threads that wait for a long time. Long waits are not necessarily symptoms of a problem, however. For example, CHECK TABLErunning on a huge table will hold a semaphore for a long time. But if you see a long wait during normal operations, you should check whether your installation can handle the number of InnoDB threads you have. Lowering the value of innodb_thread_concurrency can help.

The LATEST DETECTED DEADLOCK section contains information about the most recently detected deadlock. It is empty if no deadlock has been detected since server startup. You can monitor this section to see whether there are deadlocks in your application. Knowing there are deadlocks, you can either move queries apart in your application so there is no conflict leading to deadlock or add code that will handle it gracefully.

The TRANSACTIONS section contains information about all currently running transactions. For the discussion in this chapter, it is especially important to note that this section lists information about all locks for which all active transactions are waiting. If InnoDB Lock Monitor is turned on, this section will also contain information about which locks it holds. This is very useful information for debugging lock waits.

To demonstrate how the InnoDB Lock Monitor can help you debug locks, let’s go back to the example in Hidden Queries. If we turn on the InnoDB Lock Monitor and run same queries, we will see a bit more in the SHOW ENGINE INNODB STATUS output:

mysql> SHOW ENGINE INNODB STATUS \G

*************************** 1. row ***************************

Type: InnoDB

Name:

Status:

=====================================

110809 14:03:45 INNODB MONITOR OUTPUT

=====================================

Per second averages calculated from the last 6 seconds

----------

SEMAPHORES

----------

OS WAIT ARRAY INFO: reservation count 12, signal count 12

Mutex spin waits 0, rounds 209, OS waits 7

RW-shared spins 10, OS waits 5; RW-excl spins 0, OS waits 0

------------

TRANSACTIONS

------------

Trx id counter 0 26244358

Purge done for trx's n:o < 0 26244356 undo n:o < 0 0

History list length 4

LIST OF TRANSACTIONS FOR EACH SESSION:

---TRANSACTION 0 0, not started, OS thread id 101493760

MySQL thread id 219, query id 96 localhost root

show engine innodb status

---TRANSACTION 0 26244357, ACTIVE 1 sec, OS thread id 101357568 starting index read

mysql tables in use 1, locked 1

LOCK WAIT 2 lock struct(s), heap size 320, 1 row lock(s)

MySQL thread id 217, query id 95 localhost root Updating

update t set a=36 where a=6

------- TRX HAS BEEN WAITING 1 SEC FOR THIS LOCK TO BE GRANTED:

RECORD LOCKS space id 349 page no 3 n bits 88 index `PRIMARY` of table `test`.`t`

trx id 0 26244357 lock_mode X locks rec but not gap waiting

Record lock, heap no 6 PHYSICAL RECORD: n_fields 3; compact format; info bits 32

0: len 4; hex 00000006; asc ;; 1: len 6; hex 000001907504; asc u ;; 2:

len 7; hex 0000000032081c; asc 2 ;;

------------------

TABLE LOCK table `test`.`t` trx id 0 26244357 lock mode IX

RECORD LOCKS space id 349 page no 3 n bits 88 index `PRIMARY` of table `test`.`t`

trx id 0 26244357 lock_mode X locks rec but not gap waiting

Record lock, heap no 6 PHYSICAL RECORD: n_fields 3; compact format; info bits 32

0: len 4; hex 00000006; asc ;; 1: len 6; hex 000001907504; asc u ;; 2:

len 7; hex 0000000032081c; asc 2 ;;

---TRANSACTION 0 26244356, ACTIVE 6 sec, OS thread id 101099008

2 lock struct(s), heap size 320, 1 row lock(s), undo log entries 2

MySQL thread id 184, query id 93 localhost root

TABLE LOCK table `test`.`t` trx id 0 26244356 lock mode IX

RECORD LOCKS space id 349 page no 3 n bits 88 index `PRIMARY` of table `test`.`t`

trx id 0 26244356 lock_mode X locks rec but not gap

Record lock, heap no 6 PHYSICAL RECORD: n_fields 3; compact format; info bits 32

0: len 4; hex 00000006; asc ;; 1: len 6; hex 000001907504; asc u ;; 2:

len 7; hex 0000000032081c; asc 2 ;;

Compare this output with the output without locks you saw earlier. The important difference is that now you have information about the transaction that holds the lock:

---TRANSACTION 0 26244356, ACTIVE 6 sec, OS thread id 101099008

2 lock struct(s), heap size 320, 1 row lock(s), undo log entries 2

MySQL thread id 184, query id 93 localhost root

TABLE LOCK table `test`.`t` trx id 0 26244356 lock mode IX

RECORD LOCKS space id 349 page no 3 n bits 88 index `PRIMARY` of table `test`.`t`

trx id 0 26244356 lock_mode X locks rec but not gap

Record lock, heap no 6 PHYSICAL RECORD: n_fields 3; compact format; info bits 32

0: len 4; hex 00000006; asc ;; 1: len 6; hex 000001907504; asc u ;; 2:

len 7; hex 0000000032081c; asc 2 ;;

I put in the whole transaction information to show the identification information for the lock “Record lock, heap no 6.” We are interested in the following:

Record lock, heap no 6 PHYSICAL RECORD: n_fields 3; compact format; info bits 32

0: len 4; hex 00000006; asc ;; 1: len 6; hex 000001907504; asc u ;; 2:

len 7; hex 0000000032081c; asc 2 ;;

This is the physical content of the locked record. And when you check the waiting transaction, you can see that it waits for exactly the same lock (pay attention to PHYSICAL RECORD):

update t set a=36 where a=6

------- TRX HAS BEEN WAITING 1 SEC FOR THIS LOCK TO BE GRANTED:

RECORD LOCKS space id 349 page no 3 n bits 88 index `PRIMARY` of table `test`.`t`

trx id 0 26244357 lock_mode X locks rec but not gap waiting

Record lock, heap no 6 PHYSICAL RECORD: n_fields 3; compact format; info bits 32

0: len 4; hex 00000006; asc ;; 1: len 6; hex 000001907504; asc u ;; 2:

len 7; hex 0000000032081c; asc 2 ;;

This is very useful information because you can clearly correlate the transactions waiting for locks with those holding locks.

Earlier we discussed InnoDB INFORMATION_SCHEMA tables. Why would one use InnoDB Monitors when you could just look at these tables? The reason is that INFORMATION_SCHEMA contains only current information, whereas InnoDB monitors can dump information into the error logfile so that you can analyze it later. This information is very useful when you want to find out what is going on while your application is running.

There are other useful parts to the InnoDB Monitor output. It makes sense to look at FILE I/O, INSERT BUFFER AND ADAPTIVE HASH INDEX, BUFFER POOL AND MEMORY, and ROW OPERATIONS when you suspect an undiagnosed deadlock or just a very long lock. Whenever read/write operations are stopped and threads are waiting for each other, this is probably a locking issue, even if the locked thread is in the “Updating” State.

INFORMATION_SCHEMA Tables

Starting with the InnoDB Plugin for version 5.1, MySQL supports new InnoDB INFORMATION_SCHEMA tables. Those related to concurrency issues are:

INNODB_TRX

Contains a list of all transactions that are currently running

INNODB_LOCKS

Contains information about the current locks held by transactions and which locks each transaction is waiting for

INNODB_LOCK_WAITS

Contains information about the locks transactions are waiting for

These tables are easy to use and can quickly provide information about transaction states and locks. You saw examples earlier in this chapter that illustrate everything I need to say on the subject.

TYPICAL INFORMATION_SCHEMA QUERIES THAT ARE USEFUL WHEN DEBUGGING CONCURRENCY ISSUES

Information about all locks transactions are waiting for:

SELECT * FROM INNODB_LOCK_WAITS

A list of blocking transactions:

SELECT * FROM INNODB_LOCKS WHERE LOCK_TRX_ID IN

(SELECT BLOCKING_TRX_ID FROM INNODB_LOCK_WAITS)

or:

SELECT INNODB_LOCKS.* FROM INNODB_LOCKS JOIN INNODB_LOCK_WAITS

ON (INNODB_LOCKS.LOCK_TRX_ID = INNODB_LOCK_WAITS.BLOCKING_TRX_ID)

A list of locks on particular table:

SELECT * FROM INNODB_LOCKS WHERE LOCK_TABLE = 'db_name.table_name'

A list of transactions waiting for locks:

SELECT TRX_ID, TRX_REQUESTED_LOCK_ID, TRX_MYSQL_THREAD_ID, TRX_QUERY

FROM INNODB_TRX WHERE TRX_STATE = 'LOCK WAIT'

PERFORMANCE_SCHEMA Tables

The performance schema lets you monitor MySQL server execution at a low level. It is implemented as a database containing tables based on the PERFORMANCE_SCHEMA storage engine. This storage engine collects event data using “instrumentation points” defined in the server source code. Tables in PERFORMANCE_SCHEMA do not use persistent disk storage.

To debug concurrency issues, you can use the COND_INSTANCES, FILE_INSTANCES, MUTEX_INSTANCES, and RWLOCK_INSTANCES tables along with various EVENTS_WAITS_* tables. The THREADS table can help you correlate internal threads with MySQL’s user threads.

All *INSTANCES tables contain NAME and OBJECT_INSTANCE_BEGIN fields, which are the name of the instance and the memory address of the object being instrumented, respectively.

The COND_INSTANCES table contains a list of wait conditions that were created after the server started. Conditions (a term that will be familiar to programmers who have studied concurrency) are a way to make one thread wait for another.

The FILE_INSTANCES table contains a list of files seen by the performance schema. A filename is inserted into this table the first time a server opens it and stays there until it is deleted from the disk. Currently open files have a positive OPEN_COUNT. The Number field contains the number of file handles that currently use the file.

The MUTEX_INSTANCES table contains a list of mutexes seen by the performance schema. Mutexes where LOCKED_BY_THREAD_ID is NOT NULL are those that are currently locked.

The RWLOCK_INSTANCES table is a list of all read/write lock instances. WRITE_LOCKED_BY_THREAD_ID shows the ID of the thread that holds the lock. READ_LOCKED_BY_COUNT shows how many read locks are currently acquired on the instance.

EVENTS_WAITS_* tables contain information about wait events for each thread.

For example, to find out which kind of lock a transaction is waiting for, you can use following query:

mysql> SELECT THREAD_ID, EVENT_NAME, SOURCE, OPERATION, PROCESSLIST_ID \

FROM events_waits_current JOIN threads USING (THREAD_ID) WHERE PROCESSLIST_ID > 0\G

*************************** 1. row ***************************

THREAD_ID: 36

EVENT_NAME: wait/synch/mutex/mysys/THR_LOCK::mutex

SOURCE: thr_lock.c:550

OPERATION: lock

PROCESSLIST_ID: 20

*************************** 2. row ***************************

THREAD_ID: 41

EVENT_NAME: wait/synch/mutex/sql/THD::LOCK_thd_data

SOURCE: sql_class.cc:3754

OPERATION: lock

PROCESSLIST_ID: 25

*************************** 3. row ***************************

THREAD_ID: 40

EVENT_NAME: wait/synch/mutex/innodb/kernel_mutex

SOURCE: srv0srv.c:1573

OPERATION: lock

PROCESSLIST_ID: 24

3 rows in set (0.00 sec)

This shows that thread 24 is waiting on InnoDB kernel_mutex, while in SHOW PROCESSLIST, the same query is in the Updating state:

mysql> SHOW PROCESSLIST \G

*************************** 1. row ***************************

Id: 20

User: root

Host: localhost

db: performance_schema

Command: Query

Time: 0

State: NULL

Info: show processlist

*************************** 2. row ***************************

Id: 24

User: root

Host: localhost

db: sbtest

Command: Query

Time: 3

State: Updating

Info: update example set f2=f2*2

*************************** 3. row ***************************

Id: 25

User: root

Host: localhost

db: sbtest

Command: Sleep

Time: 228

State:

Info: NULL

3 rows in set (0.00 sec)

The THREADS table contains a list of all currently running threads. The IDs are internally assigned and are totally different from the IDs of the connection threads. Furthermore, the server runs a lot of internal threads that are not related to connection threads. (For instance, slaves run an SQL thread and I/O thread.) This table contains a PROCESSLIST_ID field that shows which connection thread ID, if any, is associated with each particular thread.

*_SUMMARY_* tables contain aggregate information for terminated events.

As an example, to find out which tables are used most, you can try the following query. It will work for storage engines that store table data in separate files, such as MyISAM or InnoDB, when the option innodb_file_per_table is in use.

mysql> SELECT * FROM file_summary_by_instance WHERE file_name \

LIKE CONCAT(@@datadir,'sbtest/%') ORDER BY SUM_NUMBER_OF_BYTES_WRITE DESC, \

SUM_NUMBER_OF_BYTES_READ DESC \G

*************************** 1. row ***************************

FILE_NAME: /home/ssmirnov/mysql-5.5/data/sbtest/example.ibd

EVENT_NAME: wait/io/file/innodb/innodb_data_file

COUNT_READ: 0

COUNT_WRITE: 8

SUM_NUMBER_OF_BYTES_READ: 0

SUM_NUMBER_OF_BYTES_WRITE: 196608

*************************** 2. row ***************************

FILE_NAME: /home/ssmirnov/mysql-5.5/data/sbtest/example.frm

EVENT_NAME: wait/io/file/sql/FRM

COUNT_READ: 14

COUNT_WRITE: 17

SUM_NUMBER_OF_BYTES_READ: 948

SUM_NUMBER_OF_BYTES_WRITE: 4570

*************************** 3. row ***************************

FILE_NAME: /home/ssmirnov/mysql-5.5/data/sbtest/sbtest.ibd

EVENT_NAME: wait/io/file/innodb/innodb_data_file

COUNT_READ: 5236

COUNT_WRITE: 0

SUM_NUMBER_OF_BYTES_READ: 85786624

SUM_NUMBER_OF_BYTES_WRITE: 0

*************************** 4. row ***************************

FILE_NAME: /home/ssmirnov/mysql-5.5/data/sbtest/sbtest.frm

EVENT_NAME: wait/io/file/sql/FRM

COUNT_READ: 7

COUNT_WRITE: 0

SUM_NUMBER_OF_BYTES_READ: 1141

SUM_NUMBER_OF_BYTES_WRITE: 0

*************************** 5. row ***************************

FILE_NAME: /home/ssmirnov/mysql-5.5/data/sbtest/db.opt

EVENT_NAME: wait/io/file/sql/dbopt

COUNT_READ: 0

COUNT_WRITE: 0

SUM_NUMBER_OF_BYTES_READ: 0

SUM_NUMBER_OF_BYTES_WRITE: 0

5 rows in set (0.00 sec)

Log Files

There are two MySQL server logfiles that can help you with concurrency problems: the error logfile and the general query logfile.

The error logfile contains information about what is going wrong. It will contain information about unsafe replication statements, intentional crashes due to long semaphore waits, and so on. I have already advised in Chapter 1 that the error logfile is the first place to look when you encounter a problem. The same tip applies to concurrency as well. When you don’t know the source of a problem, look at the error log first.

The general query log can help you to find a query that cannot be found by other means. One example is a multistatement transaction that blocks others. If this is called from an application, it is sometimes hard to determine which exact query caused the problem. In such cases, turn on the general query log, wait until the problem reoccurs, and then search the general log for queries by a particular thread. A typical troubleshooting pattern is to look in the InnoDB Monitor output, find the ID of the MySQL thread of the transaction that probably blocking another transaction, and then run the query:

SELECT argument, event_time FROM mysql.general_log WHERE thread_id =

THIS_THREAD_ID ORDER BY event_time

This will return a list of queries that were run by the locking thread. You should be able to find a BEGIN or START TRANSACTION statement that kicks off the whole multistatement transaction. Once you discover the offending transaction, you can research what to change in the application to prevent similar locking in the future.

To illustrate this, let’s return to the example from Hidden Queries. In the output from SHOW PROCESSLIST we saw:

mysql> SHOW PROCESSLIST\G

*************************** 1. row ***************************

Id: 184

User: root

Host: localhost

db: test

Command: Sleep

Time: 25

State:

Info: NULL

*************************** 2. row ***************************

Id: 217

User: root

Host: localhost

db: test

Command: Query

Time: 5

State: Updating

Info: UPDATE t SET a=36 WHERE a=6

*************************** 3. row ***************************

Id: 219

User: root

Host: localhost

db: mysql

Command: Query

Time: 0

State: NULL

Info: SHOW PROCESSLIST

3 rows in set (0.00 sec)

SHOW ENGINE INNODB STATUS shows:

mysql> SHOW ENGINE INNODB STATUS \G

*************************** 1. row ***************************

Type: InnoDB

Name:

Status:

=====================================

110809 13:57:21 INNODB MONITOR OUTPUT

=====================================

Per second averages calculated from the last 33 seconds

----------

SEMAPHORES

----------

OS WAIT ARRAY INFO: reservation count 5, signal count 5

Mutex spin waits 0, rounds 80, OS waits 2

RW-shared spins 6, OS waits 3; RW-excl spins 0, OS waits 0

------------

TRANSACTIONS

------------

Trx id counter 0 26244354

Purge done for trx's n:o < 0 26243867 undo n:o < 0 0

History list length 3

LIST OF TRANSACTIONS FOR EACH SESSION:

---TRANSACTION 0 0, not started, OS thread id 101493760

MySQL thread id 219, query id 86 localhost root

SHOW ENGINE INNODB STATUS

---TRANSACTION 0 26244353, ACTIVE 119 sec, OS thread id 101357568 starting index read

mysql tables in use 1, locked 1

LOCK WAIT 2 lock struct(s), heap size 320, 1 row lock(s)

MySQL thread id 217, query id 85 localhost root Updating

UPDATE t SET a=36 WHERE a=6

------- TRX HAS BEEN WAITING 1 SEC FOR THIS LOCK TO BE GRANTED:

RECORD LOCKS space id 349 page no 3 n bits 88 index `PRIMARY` of table `test`.`t`

trx id 0 26244353 lock_mode X locks rec but not gap waiting

Record lock, heap no 6 PHYSICAL RECORD: n_fields 3; compact format; info bits 32

0: len 4; hex 00000006; asc ;; 1: len 6; hex 000001907500; asc u ;; 2:

len 7; hex 00000000320762; asc 2 b;;

------------------

---TRANSACTION 0 26244352, ACTIVE 139 sec, OS thread id 101099008

2 lock struct(s), heap size 320, 1 row lock(s), undo log entries 2

MySQL thread id 184, query id 79 localhost root

...

Our blocked transaction is 26244353 of MySQL thread 217. The only transaction currently holding locks is 26244352 of MySQL thread 184. But it is not at all clear what thread 184 is doing until we look at the general query logfile:

mysql> SELECT argument, event_time FROM mysql.general_log WHERE

thread_id=184 ORDER BY event_time;

+-----------------------------+---------------------+

| argument | event_time |

+-----------------------------+---------------------+

| begin | 2011-08-09 13:55:58 |

| update t set a=26 where a=6 | 2011-08-09 13:56:09 |

+-----------------------------+---------------------+

2 rows in set (0.15 sec)

From this output, we can easily see that our transaction is updating the same row in the same table as the blocked transaction. With this knowledge, we can recode our application.