MySQL Troubleshooting (2012)

Chapter 6. Troubleshooting Techniques and Tools

I have discussed many troubleshooting techniques and tools earlier in this book. Some of them were explained thoroughly, and in other cases I only touched on aspects of their use. This chapter adds details left out of previous chapters. I have tried to avoid repetition. Many techniques and tools depend on each other, so they are combined in this chapter.

There are too many tools to describe them all, so here I stick to those that I find absolutely necessary. They are mostly command-line tools that either come with the MySQL distribution or are provided as separate packages. I include a few descriptions of third-party tools, again command-line utilities. I did this not because I don’t like third-party tools, but to let you know about the great tools that come with MySQL and that you therefore always have. One of the advantages of MySQL tools is that they are always available, which is very important for our customers. Some companies have rules that prevent their employees from downloading a third-party tool. Therefore, when doing customer support, we always prefer tools that come with the MySQL distribution.

For similar reasons, I don’t describe graphical tools here. The command-line tools pose no special requirements, such as the X Window System or a particular operating system, whereas graphical tools are more demanding.

Finally, I know MySQL’s tools much better than their analogs in other packages. If you find more powerful third-party tools, use them. But knowing what can be done with simple tools is always useful.

The Query

In Chapter 1 we learned how a single query can affect performance across the whole server and how to find which one is misbehaving. Here I’ll add a few words about this type of problem.

A problematic query fired by an application can be found by checking the error logs, through the use of output functions in the application that automatically logs queries sent to the MySQL server, using a library written for such a purpose in the application, and in the general query logfile. Here I want to discuss how such logging can be tuned.

The general query log is missing some useful information that can help you debug your query, such as query execution time, information about errors and warnings, and the result set. Information about query execution time can be used to find slow queries. Of course, you can log all this special data using a custom library or by simply adding output functions everywhere in your application. But before you start tuning the application, you can still use a built-in resource: the slow query log.

Slow Query Log

The slow query log contains queries that run longer than long_query_time seconds. The default value of this variable is 10, but you can decrease it. In fact, by setting the value to zero, you can log all queries. Since version 5.1, you can turn logging to the slow log on and off dynamically as needed, just as for the general query log. You can also redirect output to a table, so it can be queried like any other.

For performance tuning, find the slowest queries, check them one by one in isolation, and then change the query statement or make other necessary changes, such as to indexes. You can start from the default long_query_time, then decrease it bit by bit to zero to find more and more queries. This method reveals the slowest queries first.

By default, this option does not log administrative statements and fast queries that don’t use indexes, but you can log such queries by setting the log-slow-admin-statements and log_queries_not_using_indexes options, respectively.

One disadvantage of the slow query log is that you can’t omit queries that you think don’t need to be optimized. Writing the log to a table can help you screen out what you don’t need to see because you can use WHERE clauses, grouping, and sorting to focus down on the queries that you think are important.

MYSQLDUMPSLOW

The mysqldumpslow utility prints the contents of the slow query log in a kind of summary format. It groups queries, so if two queries are literally the same but use different parameters, they are printed once, together with the number of execution times. This means the utility treats queries such as SELECT * FROM t2 WHERE f1=1 and SELECT * FROM t2 WHERE f1=2 as the same because the actual value of the parameter f1 usually does not affect query execution time. This is especially convenient if you want to find slow queries that use a similar pattern in an application that runs thousands of them.

$mysqldumpslow /Users/apple/Applications/mysql-5.1/data512/mysqld512-apple-slow.log

Reading mysql slow query log from

/Users/apple/Applications/mysql-5.1/data512/mysqld512-apple-slow.log

Count: 3 Time=0.03s (0s) Lock=0.03s (0s) Rows=0.7 (2), root[root]@localhost

SELECT * FROM t2 WHERE f1=N

Count: 1 Time=0.03s (0s) Lock=0.00s (0s) Rows=1.0 (1), root[root]@localhost

select @@version_comment limit N

Count: 1 Time=0.02s (0s) Lock=0.03s (0s) Rows=3.0 (3), root[root]@localhost

SELECT * FROM t2

Count: 3 Time=0.00s (0s) Lock=0.00s (0s) Rows=0.3 (1), root[root]@localhost

select TEXT from test where ID=N

Count: 1 Time=0.00s (0s) Lock=0.00s (0s) Rows=3.0 (3), root[root]@localhost

select * from t2

Note that although the utility is smart enough to group similar queries using different parameters, it treats queries that differ syntactically only in insignificant ways—such as using a different letter case or having different whitespace—as different queries.

Tools That Can Be Customized

Usually, just finding slow queries is not enough. You’ll want to know such things as which error or warning was returned and how many rows were updated or selected. There are three ways to get this information: through your application, by writing a plug-in, or by using a proxy.

An application can use the methods described in Getting Information About a Query to receive and log information; I won’t offer details or examples, because they depend so much on your programming language and other context. If you are interested in query execution time, just measure it in the application before and after a mysql_query or mysql_real_query call. The advantage of this method is that it is very tunable. The disadvantage is that you need to modify the application, which isn’t possible for people using third-party software.

If you want to write a MySQL server plug-in for auditing purposes, refer to the section “Writing Audit Plugins” in the MySQL Reference Manual. Once installed, a MySQL server plug-in becomes part of the MySQL server and can be accessed through SQL queries. Besides this advantage, the solution is completely independent from the application, does not need any changes to existing code, and can be used by multiple applications. The disadvantage is that it must be compiled for a specific version of the MySQL server and installed, which is not good if you distribute your plug-in to a wide audience.

The third solution is to use a scriptable proxy. A proxy is a daemon that sits between the server and client and can be configured independently from both the server and the client. Because it gets all traffic, you can do whatever you want with it. The advantages of this method are that you are completely independent from both the server and client, so you don’t need to change anything that you have inherited from other people. The disadvantage is that the proxy adds an additional layer of processing, so it will slow down the application and create a new single point of failure between the client and server.

MYSQL PROXY

MySQL Proxy is a scriptable daemon that supports the MySQL protocol and sits between the MySQL server and the application. The application should be configured in such a way that all queries go through the proxy. This generally just means setting the proper hostname and port.

MySQL Proxy supports the Lua scripting language. It allows query and result set rewriting, logging, load balancing, and much more. Here I’ll discuss only logging because that’s what will help you debug slow queries.

For auditing purposes, you need to write a Lua script that saves the necessary information. A sample script that can imitate general query log behavior and, in addition, save query execution time can look like:

function read_query( packet )

if packet:byte() == proxy.COM_QUERY then

print(os.date("%d%m%y %H:%M:%S") .. "\t"

.. proxy.connection.server.thread_id

.."\tQuery\t" .. packet:sub(2))

proxy.queries:append(1, packet )

return proxy.PROXY_SEND_QUERY

end

function read_query_result(inj)

print("Query execution time: " .. (inj.query_time / 1000) .. "ms,\t"

.. "Response time: " .. (inj.response_time / 1000) .. "ms,\t"

.. "Total time: " .. ((inj.query_time + inj.response_time) / 1000) .. "ms")

end

Call the script as follows:

$mysql-proxy --admin-username=admin --admin-password=foo \

--admin-lua-script=./lib/mysql-proxy/lua/admin.lua \

--proxy-address=127.0.0.1:4040 --proxy-backend-addresses=127.0.0.1:3355 \

--proxy-lua-script=`pwd`/general_log.lua

The results will look like:

$mysql-proxy --admin-username=admin --admin-password=foo \

--admin-lua-script=./lib/mysql-proxy/lua/admin.lua \

--proxy-address=127.0.0.1:4040 --proxy-backend-addresses=127.0.0.1:3355 \

--proxy-lua-script=`pwd`/general_log.lua

031111 01:51:11 20 Query show tables

Query execution time: 376.57ms, Response time: 376.612ms, Total time: 753.182ms

031111 01:51:19 20 Query select * from t1

Query execution time: 246.849ms, Response time: 246.875ms, Total time: 493.724ms

031111 01:51:27 20 Query select * from t3

Query execution time: 689.772ms, Response time: 689.801ms, Total time: 1379.573ms

031111 01:51:39 20 Query select count(*) from t4

Query execution time: 280.751ms, Response time: 280.777ms, Total time: 561.528ms

You can adapt this script to your needs. MySQL Proxy has access to the query and result set both before and after its execution, which allows you to save much more information than a simple general logfile: errors, warnings, number of affected rows, query execution time, and even the full result set.

A lot of useful scripts can also be found at MySQL Forge.

The MySQL Command-Line Interface

MySQL command-line client, also known as MySQL CLI, is the first tool you should use for testing most situations.

When queries do not work properly, the first suspect is a bug in the application. But every query can be affected by numerous issues, especially client and server options. So if you think you’re sending the correct query but getting wrong results, test it in the MySQL CLI. This is the easiest and fastest way to confirm your guess.

When members of MySQL Support Bugs Verification Group suspect client bugs or (most often) misconfigurations, we always ask for queries that we can test in the MySQL CLI. Here I’ll explain briefly why this is important and why other tools do not suit the first round of troubleshooting so well.

NOTE

Besides the regular client options, which affect every MySQL client application, Connector/J and Connector/ODBC have their own APIs and configurations. Queries can also be affected by conversion rules carried out by these interfaces. Therefore, if you use one of them, it becomes critical to test a problematic query in the MySQL CLI.

To test an application that gets results that are different from what you expect, start the MySQL CLI with the --column-type-info option, which prints information about data types:

$mysql --column-type-info test

Reading table information for completion of table and column names

You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor. Commands end with ; or \g.

Your MySQL connection id is 415

Server version: 5.1.60-debug Source distribution

Oracle is a registered trademark of Oracle Corporation and/or its

affiliates. Other names may be trademarks of their respective

owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> SELECT * FROM t1;

Field 1: `f1`

Catalog: `def`

Database: `test`

Table: `t1`

Org_table: `t1`

Type: LONG

Collation: binary (63)

Length: 11

Max_length: 2

Decimals: 0

Flags: NOT_NULL PRI_KEY AUTO_INCREMENT NUM PART_KEY

Field 2: `f2`

Catalog: `def`

Database: `test`

Table: `t1`

Org_table: `t1`

Type: BLOB

Collation: latin1_swedish_ci (8)

Length: 65535

Max_length: 32

Decimals: 0

Flags: BLOB

+----+----------------------------------+

| f1 | f2 |

+----+----------------------------------+

| 1 | f9f760a2dc91dfaf1cbc95046b249a3b |

| 2 | e81077a403cc27525fdbb587451e7935 |

| 3 | a003a1256c0178e0c4d37a063ad1786b |

| 4 | 2447565c49917f2daeaac192614eabe8 |

| 6 | 6bfb21c57cc3a8de22dc4dbf635fdc77 |

| 7 | 2d9f5350ba5b914a8f4abf31b5ae975c |

| 8 | 57e87a3c55d053b5ab428f5da7f6ba28 |

| 9 | ad2ede5e02ce1da95dcd8f71426d5e7b |

| 13 | 65400ab09cdc725ec5bafe2ac4f5045d |

| 14 | 48f1b1e99041365a74444f75a6689d64 |

| 15 | 1f6c558fe2c492f1da2ebfb42d9f53dc |

| 16 | ff931f7ac8c8035a929dc35fee444332 |

| 17 | f26f6b6e8d16ae5603cf8c02409f4bb5 |

| 18 | 239ca93bf7b5fd82a53e731004cba761 |

| 19 | 1f985d1fe9efa14453a964e2c4657ab5 |

| 20 | 1d599460b91f2d892c024fe5a64f7d6d |

+----+----------------------------------+

16 rows in set (0.09 sec)

This output tells you basic meta-information about the field, such as its data type and collation—information that may be changed by configuration options or application settings. By running it in the MySQL CLI, you can see what the query would normally do on the server, and you can guess there’s a problem if the information received by the server from the application is different. Let’s examine this output line by line.

mysql> SELECT `name` FROM `t1`;

Field 1: `name`

The preceding output shows the name of a field.

Catalog: `def`

The catalog name, which is always def.

Database: `test`

The database currently being used.

Table: `t1`

The table name, but the output shows the alias of the table when you use a syntax such as select field_name from table_name as alias_name.

Org_table: `t1`

The original table name, which is useful to know if the previous line showed an alias.

Type: VAR_STRING

The preceding line shows the field type.

Collation: latin1_swedish_ci (8)

The collation.

Length: 255

The field length as defined in the table’s definition.

Max_length: 5

The length of the largest value in the field, in the result set that was returned.

Decimals: 0

The number of decimals in the field, if it is an integer type.

Flags:

Field flags, if any. For instance, a primary key field will have the flags PRI_KEY and AUTO_INCREMENT.

+-------+

| name |

+-------+

| sveta |

+-------+

1 row in set (0.00 sec)

The result set of the query.

What if your favorite MySQL client is something other than the MySQL CLI? You can test there, but remember that it may introduce side effects. This is particularly true of GUI clients. For example, if the client uses JDBC, it is affected by its configuration, which wouldn’t affect the MySQL CLI. Other clients have preconfigured character sets, such as MySQL Workbench, which supports only UTF-8. Such a setup prevents you from testing another character set. Some clients (Workbench again) disconnect and reconnect after each query. Others can be affected by small thread buffers, which is common for web-based clients. Sometimes you can reconfigure the client, but when in doubt, it’s much easier to switch to the command line and try the query in the MySQL CLI.

One of the strengths of the MySQL CLI is that it’s very transparent in regard to options: you can always see and tune its configuration. Like every piece of software, admittedly, the MySQL CLI could have a bug. But the tool is in regular, heavy use by millions of users and actively used internally at Oracle, so the chances that you’ll be affected by a bug in it is very low.

NOTE

All of the connectors in their unconfigured state, with the exception of Connector/PHP, start with character_set_client set to UTF8 and character_set_results set to NULL. This is actually a “debugging” mode regarding charsets and is not recommended in a command-line client.

The reason behind this behavior is to let the driver logic determine the best way of displaying and storing results to or from the client and to avoid “double-conversion” bugs, which are rather common, by preventing the server from converting textual results into charset results. However, this trick does not work on ad-hoc queries, such as SHOW CREATE TABLE, where it should treat BINARYas UTF8, or SELECT varbinary_col FROM some_table, where it really should be binary, or with SELECT CONCAT(char_field1, 1) AS a where a will have BINARY flag set.

Thus, all of the connectors have some sort of workaround in their connection options that tells the driver to treat function results as UTF8 instead of BINARY strings. Also, even though each connector has its own default encoding, they issue SET NAMES UTF8. This is mostly to avoid the default behavior of the libmysqlclient library, which sets all character-set-related variables to latin1.

§ If you think a query should run fine but it is giving you unexpected results, before you consider the possibility of a bug in the MySQL server code, try the query in the MySQL CLI.

NOTE

I love automation. When I create tests for bug reports, I use a script that runs MySQL Test Framework tests (see MySQL Test Framework) in a bunch of MySQL server distributions. This helps me to test a problem in many versions with single a command.

But once this habit played a bad joke on me. I tested one of the bug reports and could not repeat it. I spent a lot of time communicating with the reporter and tried many options without any luck. I relied entirely on our test suite and didn’t suspect that the client could be introducing side effects. Then my colleague tried the test case in the MySQL CLI and got the absolute same results as the original reporter. The bug was confirmed and fixed.

This experience shows how dangerous it is to ignore possible client differences and how important it is to try the MySQL CLI before anything else.

Effects of the Environment

I already discussed some effects of environments such as concurrent threads, the operating system, hardware, concurrently running software, and the MySQL server and client options in this book. But a query, even if it is running in a single client connected to a dedicated MySQL server, can also be affected by the context in which it is run.

When you call a query from a stored procedure, function, trigger, or event, these contexts can override current session options with their own defaults. Therefore, if you encounter a problem you can’t explain, try the same query outside of the routine. If the results are different, check the routine’s character set and SQL mode. Examine the body of the routine to check whether all necessary objects exist and whether a variable that can affect the query was set. Another environment variable that matters is time_zone, which affects the results of time functions such as NOW()and CURDATE().

§ If a query does not work properly, check the environment in which it was called.

Sandboxes

A sandbox is an isolated environment for running an application, where it cannot affect anything outside of that environment. Throughout this book, I’ve been encouraging you to “try” various configuration options and changes to databases. But some such “tries” can slow down the application or even crash the application or the database. This is not what most users want. Instead, you can use a sandbox to isolate the system you’re testing in its own environment, where anything you do wrong doesn’t matter.

In the MySQL world, Giuseppe Maxia introduced this term by creating a tool named the MySQL Sandbox. I will describe the MySQL Sandbox and how it can be helpful a bit later, but here I want to briefly show some variants of sandboxes.

The simplest way to safely test queries on a table is to make a copy, so that the original table is secured and can be used by the application as usual while you are be experimenting with the copy. You also won’t have to worry about reverting changes that you inadvertently make:

CREATE TABLE test_problem LIKE problem;

INSERT INTO test_problem SELECT * FROM problem;

One good thing with this solution is that you can copy just part of the data, using WHERE to limit the number of rows. For example, suppose you are testing a complex query and are sure that it is correctly executing one of its WHERE clauses. You can limit your test table to items meeting that condition when you create it, and then have a smaller table on which to test the query:

INSERT INTO test_problem SELECT FROM problem WHERE condition]

You can then simplify the query as well by removing that condition. This can save a lot of time when the original table is huge. This technique is also useful when the WHERE clause worked properly but a GROUP BY grouping or ORDER BY sort is wrong.

If a query accesses more than one table or you just want to test queries on different tables, it makes sense to create a whole separate database:

CREATE DATABASE sandbox;

USE sandbox;

CREATE TABLE problem LIKE production.problem;

INSERT INTO problem SELECT * FROM production.problem [WHERE ...]

In this case, you will have an environment absolutely reproducing your production database, but you won’t harm anything, even if you damage rows in the copy.

These two methods are good for query rewriting and similar problems. But if the server crashes or uses a lot of resources, it’s best not to test anything on it. Instead, set up a development server just for testing purposes and copy the data from the production server. This also can help if you are planning an upgrade or want to check whether a particular bug is fixed in a newer version of MySQL.

When you create an application in the first place, you can just upgrade the MySQL server on your development machine. But if the application has been running for a long time and you need to test how a particular MySQL version affects actual data, such an upgrade in a sandbox can be hard to create manually. In this case, the MySQL Sandbox is the best choice.

To create the installation in the first place, you need to have a MySQL package without an installer (such as those that end with tar.gz for Linux), of the desired version and a copy of MySQL Sandbox, available for download from https://launchpad.net/mysql-sandbox. Create the sandbox from the MySQL package with a command such as the following:

$make_sandbox mysql-5.4.2-beta-linux-x86_64-glibc23.tar.gz

unpacking /mysql-5.4.2-beta-linux-x86_64-glibc23.tar.gz

...

The MySQL Sandbox, version 3.0.05

installing with the following parameters:

upper_directory = /users/ssmirnova/sandboxes

...

........ sandbox server started

Your sandbox server was installed in

$HOME/sandboxes/msb_5_4_2

Once installed, you should stop the server and change the configuration file so it corresponds to your production configuration, then restart it and load a backup of your production databases. Now you are ready to test safely. This method is very useful when you need to quickly check an application on several versions of MySQL, for example, to determine whether a bug is fixed.

You can have as many sandboxes as you want and test different aspects of MySQL and your databases without additional effort. You can even create a replication sandbox—a sandbox that contains a master server along with as many slave servers as you choose:

$make_replication_sandbox mysql-5.1.51-osx10.4-i686.tar.gz

installing and starting master

installing slave 1

installing slave 2

starting slave 1

... sandbox server started

starting slave 2

....... sandbox server started

initializing slave 1

initializing slave 2

replication directory installed in $HOME/sandboxes/rsandbox_5_1_51

$cd $HOME/sandboxes/rsandbox_5_1_51

$./m

Welcome to the MySQL monitor. Commands end with ; or \g.

Your MySQL connection id is 4

Server version: 5.1.51-log MySQL Community Server (GPL)

This software comes with ABSOLUTELY NO WARRANTY. This is free software,

and you are welcome to modify and redistribute it under the GPL v2 license

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

master [localhost] {msandbox} ((none)) > \q

Bye

$./s1

Welcome to the MySQL monitor. Commands end with ; or \g.

Your MySQL connection id is 5

Server version: 5.1.51-log MySQL Community Server (GPL)

This software comes with ABSOLUTELY NO WARRANTY. This is free software,

and you are welcome to modify and redistribute it under the GPL v2 license

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

slave1 [localhost] {msandbox} ((none)) > SHOW SLAVE STATUS\G

*************************** 1. row ***************************

Slave_IO_State: Waiting for master to send event

Master_Host: 127.0.0.1

Master_User: rsandbox

Master_Port: 26366

Connect_Retry: 60

Master_Log_File: mysql-bin.000001

Read_Master_Log_Pos: 1690

Relay_Log_File: mysql_sandbox26367-relay-bin.000002

Relay_Log_Pos: 1835

Relay_Master_Log_File: mysql-bin.000001

Slave_IO_Running: Yes

Slave_SQL_Running: Yes

Replicate_Do_DB:

Replicate_Ignore_DB:

Replicate_Do_Table:

Replicate_Ignore_Table:

Replicate_Wild_Do_Table:

Replicate_Wild_Ignore_Table:

Last_Errno: 0

Last_Error:

Skip_Counter: 0

Exec_Master_Log_Pos: 1690

Relay_Log_Space: 2003

Until_Condition: None

Until_Log_File:

Until_Log_Pos: 0

Master_SSL_Allowed: No

Master_SSL_CA_File:

Master_SSL_CA_Path:

Master_SSL_Cert:

Master_SSL_Cipher:

Master_SSL_Key:

Seconds_Behind_Master: 0

Master_SSL_Verify_Server_Cert: No

Last_IO_Errno: 0

Last_IO_Error:

Last_SQL_Errno: 0

Last_SQL_Error:

1 row in set (0.00 sec)

slave1 [localhost] {msandbox} ((none)) > \q

Bye

$./stop_all

executing "stop" on slave 1

executing "stop" on slave 2

executing "stop" on master

Once the sandbox is running, experiment with its options.

One gorgeous advantage of using the MySQL Sandbox for a single server is when you need to compare many environments. If you are working with only one version of software on one type of system, you can just load production data from a backup of MySQL onto your development machine. With replication, this would not work, because you will need at least two MySQL instances. And a replication sandbox can dramatically save time, even if you don’t care about versions or custom environments, because it takes only a couple of minutes to install and set up as many MySQL instances as you need.

NOTE

Tools from Workbench Utilities set can help to create a sandbox copy of your production database.

mysqldbcopy

Copies a database, either creating a new database on the same server under a different name or placing the database on a different server with the same name or a different one

mysqlreplicate

Configures and starts replication among two servers

mysqlserverclone

Starts a new instance of a running server

Errors and Logs

Another important troubleshooting technique sounds simple: read and analyze information from the server. This is a very important step. In Chapter 1, I discussed tools that can help you get and analyze information, along with examples. Here I want to add details I skipped before.

Error Information, Again

Error messages are key and should never be ignored. You can find information about errors in the MySQL Reference Manual at http://dev.mysql.com/doc/refman/5.5/en/error-handling.html. This page lists client and server error messages, but omits messages specific to a storage engine. Nor does it explain errors that come from the operating system. Strings of information describing operating system errors can be derived through the perror utility (see Retrieving Error Strings Through perror).

Another very important tool is the mysqld error logfile, which contains information about table corruption, server crashes, replication errors, and much more. Always have it turned on, and analyze it when you encounter a problem. A log from an application cannot always replace the MySQL server error log, because the latter can contain problems and details not visible to the application.

Crashes

I discussed crashes and a general troubleshooting sequence applicable to them in When the Server Does Not Answer. Start by using the techniques described in the previous section: look in the error logfile, and analyze its content. This works in most cases, but this section discusses what to do if the error log does not contain enough information to help you troubleshoot the crash.

The latest versions of the MySQL server can print a backtrace, even in release builds. Therefore, if the error logfile does not contain a backtrace, check whether the mysqld binary is stripped.^[18^] (On Unix-style systems, the file command will report whether an executable file is stripped.) If it is, replace the mysqld binary with the one that came with your MySQL distribution. If you built it yourself, compile a version with symbols for use in testing.

§ Check whether the mysqld binary contains symbols, e.g., is not stripped.

In some cases you may need to run a debug binary. This is a file named mysqld-debug, located in the bin directory under the MySQL installation root.

The debug binary contains assertions that can help to catch the problem at an earlier stage. In this case, you probably will get a better error message because the error will be caught when the server does something wrong, rather than when a memory leak occurs. Using the release binary, you don’t get an error message until the memory leak actually leads to a crash.

The price for using debug binary is a performance decrease.

If the error logfile does not have enough information about the crash to help you find the source of the problem, try the two methods that follow. In any case, always work from evidence, as I did when discussing the backtrace from the error logfile in When the Server Does Not Answer. Don’t just make intuitive guesses, because if you try to solve a problem using a wrong guess, you can introduce even more problems.

§ Always test. Any guess can be wrong.

Core file

Core files contain the memory image of a process and are created (if the operating system is configured to do so) when a process terminates abnormally. You can obtain a core file by starting the MySQL server with the core option, but first you should make sure the operating system allows the file to be created.

To debug using a core file, you need to be acquainted with the MySQL source code. The “MySQL Internals” page on MySQL Forge is good start. I also recommend the book Expert MySQL by Dr. Charles A. Bell (Apress). You can also find useful information in the books Understanding MySQL Internals by Sasha Pachev (O’Reilly) and MySQL 5.1 Plugin Development by Andrew Hutchings and Sergei Golubchik (Packt). At some point, of course, you have to dive into the MySQL source code itself.

I won’t describe how to deal with core files here in detail, because that would require a whole book about the MySQL source code, but I will show a small example.

To enable the creation of core files, start mysqld with the core option and adjust the operating system to allow core files of unlimited size to be created. Different operating systems use different tools to control the creation of core files. For example, Solaris uses coreadm, whereas on my Mac OS X Tiger box I have to edit /etc/hostconfig. On Windows, you should have debugging symbols for both mysqld and the operating system. On Unix-style systems, the simplest method is the ulimit -c command, which should be set to unlimited, but consult your OS manual to find out if you need a configuration change somewhere else too.

After the core file is created, you can access its content using a debugger. I use gdb here, but this is not required; use your favorite debugger.

$gdb ../libexec/mysqld var/log/main.bugXXXXX/mysqld.1/data/core.21965

The command line contains the name of the gdb command followed by the path to the mysqld executable file and the path to core file itself.

GNU gdb (GDB) 7.3.1

License GPLv3+: GNU GPL version 3 or later

<http://gnu.org/licenses/gpl.html7gt;

This is free software: you are free to change and redistribute it.

There is NO WARRANTY, to the extent permitted by law. Type "show copying"

and "show warranty" for details.

This GDB was configured as "i686-pc-linux-gnu".

For bug reporting instructions, please see:

<http://www.gnu.org/software/gdb/bugs/7gt;...

Reading symbols from /users/ssmirnova/build/mysql-5.1/libexec/mysqld...done.

[New LWP 21984]

[New LWP 21970]

[New LWP 21972]

[New LWP 21974]

[New LWP 21965]

[New LWP 21973]

[New LWP 21967]

[New LWP 21971]

[New LWP 21968]

[New LWP 21969]

warning: Can't read pathname for load map: Input/output error.

[Thread debugging using libthread_db enabled]

Core was generated by `/users/ssmirnova/build/mysql-5.1/libexec/mysqld

--defaults-group-suffix=.1 --de'.

Program terminated with signal 11, Segmentation fault.

#0 0x00832416 in __kernel_vsyscall ()

(gdb)

The backtrace is the first thing we need:

(gdb) bt

#0 0x00832416 in __kernel_vsyscall ()

#1 0x008ce023 in pthread_kill () from /lib/libpthread.so.0

#2 0x085aa6ad in my_write_core (sig=11) at stacktrace.c:310

#3 0x0824f412 in handle_segfault (sig=11) at mysqld.cc:2537

#4 7lt;signal handler called>

#5 0x084bce68 in mach_read_from_2 (b=0xfffffffe 7lt;Address 0xfffffffe out of

bounds>) at ../../storage/innobase/include/mach0data.ic:68

#6 0x084cfdd6 in rec_get_next_offs (rec=0x0, comp=1) at

../../storage/innobase/include/rem0rec.ic:278

#7 0x084e32c9 in row_search_for_mysql (buf=0xb281d7b0 "\371\001", mode=2,

prebuilt=0xb732de68, match_mode=1, direction=0) at row/row0sel.c:3727

#8 0x08476177 in ha_innobase::index_read (this=0xb281d660, buf=0xb281d7b0

"\371\001", key_ptr=0xb2822198 "", key_len=0, find_flag=HA_READ_KEY_EXACT) at

handler/ha_innodb.cc:4443

#9 0x0838f13c in handler::index_read_map (this=0xb281d660, buf=0xb281d7b0

"\371\001", key=0xb2822198 "", keypart_map=0, find_flag=HA_READ_KEY_EXACT) at

handler.h:1390

#10 0x082dd38f in join_read_always_key (tab=0xb28219e8) at sql_select.cc:11691

#11 0x082da39f in sub_select (join=0xb2822468, join_tab=0xb28219e8,

end_of_records=false) at sql_select.cc:11141

#12 0x082da79f in do_select (join=0xb2822468, fields=0xb2834954, table=0x0,

procedure=0x0) at sql_select.cc:10898

#13 0x082f1bef in JOIN::exec (this=0xb2822468) at sql_select.cc:2199

#14 0x082090db in subselect_single_select_engine::exec (this=0xb28358a0) at

item_subselect.cc:1958

From this output, you already have a bit of information. If you want to learn more about using core files, turn to man core, debugger documentation, the MySQL internals manual, the books I mentioned, and the source code.

General log file

Another way to catch what is going on is to use the two solutions I mentioned in Tools That Can Be Customized: the general logfile and the use of a proxy solution. As the concept is similar here, I’ll show how to catch errors with the general query log and let you deduce proxy solutions on your own if you decide to use one. I’ll use the example from When the Server Does Not Answer again, but in this case I’ll run it on my MacBook. The error log contains:

091002 16:49:48 - mysqld got signal 10 ;

This could be because you hit a bug. It is also possible that this binary

or one of the libraries it was linked against is corrupt, improperly built,

or misconfigured. This error can also be caused by malfunctioning hardware.

We will try our best to scrape up some info that will hopefully help diagnose

the problem, but since we have already crashed, something is definitely wrong

and this may fail.

key_buffer_size=8384512

read_buffer_size=131072

max_used_connections=1

max_connections=100

threads_connected=1

It is possible that mysqld could use up to

key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 225784

This build does not print backtrace information. If I’m in a situation where I can’t use the debug version of MySQL server, how can I know what is going on?

Here is the place where the general query log can help again. MySQL writes each query to this log before executing it. Therefore, we can find information about a crash in this log. First, set up logging:

mysql> SET GLOBAL general_log=1;

Query OK, 0 rows affected (0.00 sec)

mysql> SET GLOBAL log_output='table';

Query OK, 0 rows affected (0.00 sec)

Wait until the crash happens again, and then check the contents of the general log:

mysql> SELECT argument FROM mysql.general_log ORDER BY event_time

desc \G

*************************** 1. row ***************************

argument: Access denied for user 'MySQL_Instance_Manager'@'localhost'

(using password: YES)

*************************** 2. row ***************************

argument: select 1 from `t1` where `c0` <> (SELECT geometrycollectionfromwkb(`c3`)

FROM `t1`)

The second row in this output is the query that crashed the server.

§ Use the general query log if the error log does not contain enough information about the server crash.

The only situation in which this technique would not help is when the crash happens while the MySQL server is writing into the general query log, or even before it. You can try logging to a file instead of a table if this happens. Proxy and application-side solutions are not affected by thisissue.

^[18^]I have seen setups where the customers manually stripped the mysqld binary to achieve better performance, so I considered it important to include this in the book.

Information-Gathering Tools

Information directs all troubleshooting. It is very important to know what is happening in the server process. I have discussed ways to get this information throughout this book, but here I will add some missing details about the tools discussed.

Information Schema

INFORMATION_SCHEMA is a schema that provides information about database metadata. All SHOW queries are now mapped to SELECT statements from INFORMATION_SCHEMA tables. You can query INFORMATION_SCHEMA tables like any other table; this is their great advantage over other tools. The only problem is that INFORMATION_SCHEMA tables are not optimized to work fast, so queries on them can be slow, especially on tables that contain information about many objects.

I won’t describe each and every table here, because the MySQL Reference Manual contains a great deal of detail about their structure (see http://dev.mysql.com/doc/refman/5.6/en/information-schema.html). Instead, I’ll show a few queries to demonstrate the sort of useful information you can get from the INFORMATION_SCHEMA. You’ll still need the user manual for details. I put the link to the 5.6 MySQL Reference Manual here because I mention a few tables introduced in this version.

To get an idea of what can be done with INFORMATION_SCHEMA, let’s start by extracting an overview of how many tables in each storage engine are in current use. I’m excluding the mysql database from the list because all its tables always use the MyISAM storage engine.

mysql> SELECT count(*), engine FROM tables WHERE table_schema !=

'mysql' GROUP BY engine;

+----------+--------------------+

| count(*) | engine |

+----------+--------------------+

| 255 | InnoDB |

| 36 | MEMORY |

| 14 | MyISAM |

| 17 | PERFORMANCE_SCHEMA |

+----------+--------------------+

4 rows in set (4.64 sec)

This information can be useful if, for example, you want to choose a strategy for a daily backup.^[19^]

Another example is to get a list of the foreign keys that reference a particular table. This can be useful if you get error 150, Foreign key constraint is incorrectly formed, when accessing the parent table and have absolutely no idea which children it is linked to:

mysql> SELECT KU.CONSTRAINT_SCHEMA, KU.CONSTRAINT_NAME,

KU.TABLE_SCHEMA, KU.TABLE_NAME FROM TABLE_CONSTRAINTS AS TC JOIN

KEY_COLUMN_USAGE AS KU ON(TC.CONSTRAINT_NAME=KU.CONSTRAINT_NAME AND

TC.CONSTRAINT_SCHEMA=KU.CONSTRAINT_SCHEMA) WHERE CONSTRAINT_TYPE='FOREIGN KEY'

AND REFERENCED_TABLE_SCHEMA='collaborate2011' AND REFERENCED_TABLE_NAME='items'

and REFERENCED_COLUMN_NAME='id'\G

*************************** 1. row ***************************

CONSTRAINT_SCHEMA: collaborate2011

CONSTRAINT_NAME: community_bugs_ibfk_1

TABLE_SCHEMA: collaborate2011

TABLE_NAME: community_bugs

*************************** 2. row ***************************

CONSTRAINT_SCHEMA: collaborate2011

CONSTRAINT_NAME: customers_bugs_ibfk_1

TABLE_SCHEMA: collaborate2011

TABLE_NAME: customers_bugs

*************************** 3. row ***************************

CONSTRAINT_SCHEMA: collaborate2011

CONSTRAINT_NAME: items_links_ibfk_1

TABLE_SCHEMA: collaborate2011

TABLE_NAME: items_links

*************************** 4. row ***************************

CONSTRAINT_SCHEMA: collaborate2011

CONSTRAINT_NAME: mysql_issues_ibfk_1

TABLE_SCHEMA: collaborate2011

TABLE_NAME: mysql_issues

*************************** 5. row ***************************

CONSTRAINT_SCHEMA: collaborate2011

CONSTRAINT_NAME: oracle_srs_ibfk_1

TABLE_SCHEMA: collaborate2011

TABLE_NAME: oracle_srs

5 rows in set (9.58 sec)

In this output, you can see that five tables reference the table items as a parent. So if a query that runs on the items table fails with error 150, you can quickly find all its children and fix the data, causing the query to execute without that problem.

Now that you have an idea of what INFORMATION_SCHEMA tables are, we can switch to specifics.

InnoDB Information Schema Tables

We already discussed the INNODB_TRX, INNODB_LOCKS, and INNODB_LOCK_WAITS tables in INFORMATION_SCHEMA Tables in the context of concurrency troubleshooting. Here I’ll also give a quick overview of other tables.

INNODB_TRX provides a lot of detail about currently running transactions. You can use it even when locking and concurrency are not an issue, but in the context of locking problems, you can do such things as find transactions that run for long time (replace '00:30:00' with a relevant time for your situation):

SELECT TRX_ID, TRX_MYSQL_THREAD_ID FROM INNODB_TRX

WHERE TIMEDIFF(NOW(),TRX_STARTED) > '00:30:00';

You can find out which threads are waiting on locks:

SELECT TRX_ID, TRX_MYSQL_THREAD_ID, TRX_REQUESTED_LOCK_ID, TRX_WAIT_STARTED

FROM INNODB_TRX

WHERE TRX_STATE = 'LOCK WAIT';

or are waiting on a lock longer than a specific time:

SELECT TRX_ID, TRX_MYSQL_THREAD_ID, TRX_REQUESTED_LOCK_ID, TRX_WAIT_STARTED

FROM INNODB_TRX

WHERE TIMEDIFF(NOW(),TRX_WAIT_STARTED) > '00:30:00';

To get an overview of how large your transactions are, retrieve the number of rows locked (TRX_ROWS_LOCKED), the size of lock structures in memory (TRX_LOCK_MEMORY_BYTES), or the rows modified (TRX_ROWS_MODIFIED):

SELECT TRX_ID, TRX_MYSQL_THREAD_ID, TRX_ROWS_MODIFIED

FROM INNODB_TRX ORDER BY TRX_ROWS_MODIFIED DESC;

You can also check the transaction isolation level, whether foreign key checks are turned on, and other information.

WARNING

Note that transactions appear in INNODB_TRX only after they open an InnoDB table. Exceptions are transactions started with START TRANSACTION WITH CONSISTENT SNAPSHOT, which has the same effect as a START TRANSACTION query followed by a SELECT from every InnoDB table.

Tables whose names begin with INNODB_CMP show how well InnoDB uses compression. Thus, INNODB_CMP and INNODB_CMP_RESET contain status information about compressed tables, whereas INNODB_CMPMEM and INNODB_CMPMEM_RESET contain status information about compressed pages in the InnoDB buffer pool.

The only extra feature added by the _RESET versions of these calls is that they reset statistics in all INNODB_CMP tables to zero after being queried. Therefore, if you want repeatable statistics, query the _RESET tables, and if you want statistics since startup, query only INNODB_CMP andINNODB_CMPMEM.

Since version 5.6.2, tables beginning with INNODB_SYS and an INNODB_METRICS table also exist. The INNODB_SYS tables contain information about how InnoDB tables are stored in the internal dictionary and replace the InnoDB Table Monitor. A great explanation and some examples of their use can be found at the InnoDB Team blog. The INNODB_METRICS table contains all the data related to performance and resource usage counters in a single place. To get these statistics, you need to enable a module. It’s worth studying these counters because they can help you analyze what happens inside the InnoDB storage engine. Again, an explanation and examples are at the InnoDB Team blog.

InnoDB Monitors

We already discussed InnoDB Monitors in SHOW ENGINE INNODB STATUS and InnoDB Monitors. Here is a summary of that section and a few extra useful details.

To enable InnoDB monitors, create InnoDB tables named innodb_monitor, innodb_lock_monitor, innodb_table_monitor, and innodb_tablespace_monitor. These enable periodical writes to STDERR output from standard, lock, table, and tablespace monitors, respectively.

It doesn’t matter which structure you define for these tables or what database you add them to, so long as they use the InnoDB storage engine.^[20^]

The monitors are turned off on shutdown. To re-enable them on startup, you need to re-create the tables. Put DROP and CREATE statements into your init-file option if you want them created automatically.

The standard monitor contains something similar to the output that follows, which comes from version 5.5 of MySQL. I’ll break up the output with explanations.

mysql> SHOW ENGINE INNODB STATUS\G

*************************** 1. row ***************************

Type: InnoDB

Name:

Status:

=====================================

110910 14:56:10 INNODB MONITOR OUTPUT

=====================================

Per second averages calculated from the last 7 seconds

-----------------

BACKGROUND THREAD

-----------------

As the last text in the preceding output shows, this output concerns work done by the main background thread.

srv_master_thread loops: 95 1_second, 89 sleeps, 7 10_second, 36 background, 36 flush

The numbers count activity from InnoDB startup. The five numbers in the preceding output show, respectively, the number of iterations of the “once per second” loop, calls to sleep by the “once per second” loop, iterations by the “once per 10 seconds” loop, iterations of the loop named “background_loop” that runs background operations when there is currently no user activity, and iterations of the loop bounced by the “flush_loop” label. All these loops are run by the master thread, which does purge and other background operations.

srv_master_thread log flush and writes: 116

This shows how many times the log was written and flushed.

----------

SEMAPHORES

----------

Here begins information about internal semaphores. We touched on these a bit in Chapter 2. High numbers here can show slow disk I/O or high InnoDB contention. In the latter case, you could try decreasing innodb_thread_concurrency to see whether it causes an improvement. Note that these numbers are taken since the most recent InnoDB startup, so information here about waits does not mean that there are actual waits. You need to query the Performance Schema or check the mutex status to identify whether waits are occurring at the moment.

OS WAIT ARRAY INFO: reservation count 519, signal count 476

This begins a section showing global wait array information. The first number is a count of cell reservations since the array was created, and the second shows how many times an object has been signaled.

Mutex spin waits 212, rounds 6360, OS waits 169

The preceding line shows the number of spin waits on mutex calls, the number of iterations of a spin loop, and the number of waits for OS system calls.

RW-shared spins 171, rounds 5130, OS waits 171

This line shows the number of spin waits on rw-latches that resulted during shared (read) locks, the number of iterations of a spin loop, and the number of waits for OS system calls.

RW-excl spins 55, rounds 5370, OS waits 151

This line shows the number of spin waits on rw-latches that resulted during exclusive (write) locks, the number of iterations of a spin loop, and the number of waits for OS system calls.

Spin rounds per wait: 30.00 mutex, 30.00 RW-shared, 97.64 RW-excl

This shows, for each mutex, the number of iterations of a spin loop per wait for OS system calls.

The following is an example of how values in this section change during the execution of an UPDATE query:

SEMAPHORES

----------

OS WAIT ARRAY INFO: reservation count 1197, signal count 1145

--Thread 6932 has waited at trx0rec.c line 1253 for 0.00 seconds the semaphore:

X-lock (wait_ex) on RW-latch at 03CD2028 created in file buf0buf.c line 898

a writer (thread id 6932) has reserved it in mode wait exclusive

number of readers 1, waiters flag 0, lock_word: ffffffff

Last time read locked in file buf0flu.c line 1292

Last time write locked in file ..\..\..\mysqlcom-pro-5.5.13\storage\innobase\trx\

trx0rec.c line 1253

Mutex spin waits 1163, rounds 33607, OS waits 659

RW-shared spins 248, rounds 7440, OS waits 248

RW-excl spins 47, rounds 8640, OS waits 280

Spin rounds per wait: 28.90 mutex, 30.00 RW-shared, 183.83 RW-excl

The preceding output was taken when the query started executing and tried to reserve a mutex.

----------

SEMAPHORES

----------

OS WAIT ARRAY INFO: reservation count 1324, signal count 1246

--Thread 5680 has waited at buf0buf.c line 2766 for 0.00 seconds the semaphore:

Mutex at 038BE990 created file buf0buf.c line 1208, lock var 1

waiters flag 1

Mutex spin waits 1248, rounds 36397, OS waits 745

RW-shared spins 252, rounds 7560, OS waits 252

RW-excl spins 53, rounds 9750, OS waits 310

Spin rounds per wait: 29.16 mutex, 30.00 RW-shared, 183.96 RW-excl

This was taken a bit later when the mutex defined in the file buf0buf.c at line 2766 was created.

In the semaphores section, you should examine whether values become large and if many operations are waiting for mutexes for a long time.

------------

TRANSACTIONS

------------

We thoroughly discussed the transactions section in Transactions, so here I’ll only touch on a few things.

Trx id counter 4602

The preceding line is the number of the next transaction.

Purge done for trx's n:o < 4249 undo n:o < 0

This shows that all transactions with numbers less than 4249 were purged from the history list, which contains entries used to provide consistent reads for running transactions that accessed the same tables as the transactions in the list, but before their modification at commit time. The second number shows how many records with an undo number less than 4249 were purged from the history.

History list length 123

This is the length of the history list (undo log records for committed transactions that are not purged). If this value grows large, you can expect a performance decrease. There is no linear relation, because purge performance also depends on the total size of the transaction data this list keeps, so it’s difficult to give a precise example of a large value that will cause a performance decrease. A large value in this list can also mean you have long-running transactions that aren’t closed, because entries from here are removed only when no transaction refers to an entry.

LIST OF TRANSACTIONS FOR EACH SESSION:

---TRANSACTION 4601, not started, OS thread id 33716224

MySQL thread id 6906, query id 123 localhost root

show engine innodb status

The preceding lines start a list of all currently running transactions. I described this in detail in Transactions, so I won’t repeat the explanation here.

--------

FILE I/O

--------

This starts a section about internal InnoDB threads that perform various I/O operations. You can use this to find out how many I/O operations InnoDB performs. The rates show how effective they are.

I/O thread 0 state: waiting for i/o request (insert buffer thread)

I/O thread 1 state: waiting for i/o request (log thread)

I/O thread 2 state: waiting for i/o request (read thread)

I/O thread 3 state: waiting for i/o request (read thread)

I/O thread 4 state: waiting for i/o request (read thread)

I/O thread 5 state: waiting for i/o request (read thread)

I/O thread 6 state: waiting for i/o request (write thread)

I/O thread 7 state: waiting for i/o request (write thread)

I/O thread 8 state: waiting for i/o request (write thread)

I/O thread 9 state: waiting for i/o request (write thread)

These show the current status of internal InnoDB threads. The thread name is in parentheses on each line.

Pending normal aio reads: 1 [1, 0, 0, 0] , aio writes: 9 [6, 0, 3, 0] ,

ibuf aio reads: 0, log i/o's: 0, sync i/o's: 1

Pending flushes (fsync) log: 0; buffer pool: 0

This is information about pending operations. aio is an abbreviation for asynchronous input-output.

7204 OS file reads, 10112 OS file writes, 711 OS fsyncs

These show total statistics since InnoDB startup.

21.71 reads/s, 16384 avg bytes/read, 78.13 writes/s, 3.00 fsyncs/s

These show total statistics since the most recent display.

-------------------------------------

INSERT BUFFER AND ADAPTIVE HASH INDEX

-------------------------------------

As the name says, this starts a section about the insert buffer and adaptive hash statistics. Use this information to find out how effective they are.

Ibuf: size 1, free list len 0, seg size 2, 1724 merges

These are, respectively, the current size of the insert buffer index tree in pages, the length of the free list, the number of allocated pages in the file segment containing the insert buffer tree and header, and the number of pages that were merged.

merged operations:

insert 15, delete mark 1709, delete 0

This shows the number of operations merged for index pages, divided up by type.

discarded operations:

insert 0, delete mark 0, delete 0

This shows the number of operations discarded without merging because the tablespace or index was deleted.

Hash table size 195193, node heap has 1 buffer(s)

This shows the number of cells in the adaptive hash index table and the number of reserved buffer frames.

0.00 hash searches/s, 40.71 non-hash searches/s

This shows the number of successful adaptive hash index lookups and the number of searches down the B-tree when the adaptive hash index could not be used. These statistics are reset each time they are queried.

---

LOG

---

This starts a section of information about activity in the InnoDB log.

Log sequence number 2055193301

Log flushed up to 2055180837

Last checkpoint at 2054187263

These show the current log sequence number (LSN), the number up to which the LSN logfile was flushed, and the LSN of the most recent checkpoint. This information allows you to calculate the age of the checkpoint through the subtraction Log flushed up to - Last checkpoint at, or 993574 in this example. You need to make sure the checkpoint ages do not approach 77% of the value innodb_log_file_size * innodb_log_files_in_group, because at that ratio InnoDB considers the difference between the current log LSN and the LSN of the older page in the buffer pool to be too great and starts aggressive flushing. This can lead to a database freeze.

0 pending log writes, 0 pending chkp writes

357 log i/o's done, 1.29 log i/o's/second

These show the number of pending log writes, pending checkpoint writes, I/O operations since InnoDB started, and I/O operations per second since the most recent display.

----------------------

BUFFER POOL AND MEMORY

----------------------

This indicates the start of information about InnoDB buffer pool and memory usage. Use this to evaluate how effectively the InnoDB buffer pool is used.

Total memory allocated 49938432; in additional pool allocated 0

The preceding line shows the total amount of memory allocated and how much is allocated in the additional pool.

Dictionary memory allocated 23269

This shows the space in bytes occupied by the data dictionary table and index objects.

Buffer pool size 3008

Free buffers 0

This shows the size of the buffer pool in pages and the number of free buffers in it. Here you can see that the buffer is full, and it makes sense to increase it. In this case it was set to the default value on my machine, so I have room for an increase.

Database pages 3007

Old database pages 1090

Modified db pages 860

The InnoDB buffer pool stores objects in a list that uses the least recently used (LRU) algorithm with a midpoint insertion strategy. When a new block needs to be added, InnoDB puts it into the middle of the list. The least recently used block is removed from the list to free room for the new one. These statistics show the length of the current InnoDB buffer LRU queue, the length of the old LRU queue, and the number of pages that need to be flushed.

NOTE

The InnoDB midpoint insertion strategy actually manages two lists: a sublist of new (young) blocks that were accessed recently and a sublist of old blocks that were not accessed recently. Blocks from the old blocks sublist are candidates for eviction.

Pending reads 2

Pending writes: LRU 0, flush list 10, single page 0

The first line shows the number of pending read operations. The second shows the number of pages waiting to be flushed through the LRU algorithm, the number of pages waiting to be flushed in the BUF_FLUSH_LIST, and the number of pages waiting to be flushed in theBUF_FLUSH_SINGLE_PAGE list.^[21^]

Pages made young 3508, not young 0

16.71 youngs/s, 0.00 non-youngs/s

The first line shows the number of pages made young, followed by the number of those that were not made young, because they were first accessed recently. The second line shows rates per second since the most recent display of these values.

Pages read 7191, created 1871, written 9384

21.43 reads/s, 5.57 creates/s, 74.13 writes/s

The first line shows the number of read operations, the number of pages created in the pool but not yet read, and the number of write operations. The second line shows rates per second of these values.

No buffer pool page gets since the last printout

In one of my test outputs, I had not accessed the buffer pool since the most recent display. If I had, more information would be printed in the preceding output.

Buffer pool hit rate 937 / 1000, young-making rate 49 / 1000 not 0 / 1000

This line shows three ratios. The first is the ratio of the number of pages read to the number of buffer pool page gets. The second is the ratio of the number of pages made young to buffer pool page gets. The third is the ratio of the number of pages not made young to buffer pool page gets. All of these values are reset each time they are queried.

Pages read ahead 0.00/s, evicted without access 0.00/s

This is the read-ahead rate and the number of read-ahead pages evicted without access. The measurements are average per-second values since the most recent display.

LRU len: 3007, unzip_LRU len: 0

I/O sum[3937]:cur[1], unzip sum[0]:cur[0]

The first line shows the length of the LRU list and the unzip_LRU list. The latter is a subset of the common LRU list, holding a compressed file page and the corresponding uncompressed page frame. The second line shows the number of I/O operations and I/O for current intervals for both common LRU and unzip_LRU lists.

--------------

ROW OPERATIONS

--------------

The row operations section begins information about the main thread.

1 queries inside InnoDB, 0 queries in queue

1 read views open inside InnoDB

The first line shows how many queries are currently executing and how many are in the innodb_thread_concurrency queue. The second line shows the number of read views.

Main thread id 4192, state: flushing buffer pool pages

The preceding line shows the ID of the main thread and its state. I took this example on Windows. On Linux, it also prints the thread process number.

Number of rows inserted 0, updated 1759, deleted 0, read 1765

0.00 inserts/s, 5.86 updates/s, 0.00 deletes/s, 5.86 reads/s

The first line shows the number of rows inserted, updated, deleted, and read since InnoDB startup. The second line shows rates per second since the most recent display. Knowing which kind of queries you perform most often can help you set options for InnoDB effectively.

----------------------------

END OF INNODB MONITOR OUTPUT

============================

1 row in set (0.00 sec)

I discussed the InnoDB Lock Monitor in SHOW ENGINE INNODB STATUS and InnoDB Monitors in detail, so I won’t say any more about it here.

Two monitors are left to discuss: the InnoDB Tablespace Monitor and the InnoDB Table Monitor.

The InnoDB Table Monitor prints the contents of the internal InnoDB dictionary. You can use this monitor to see how InnoDB stores a table, for example, if you suspect it is corrupted. Sample output looks like:

===========================================

110911 15:27:40 INNODB TABLE MONITOR OUTPUT

===========================================

--------------------------------------

TABLE: name collaborate2011/customers_bugs, id 1110, flags 1, columns 5, indexes 3,

appr.rows 0

COLUMNS: iid: DATA_INT DATA_BINARY_TYPE len 4; bugid: DATA_INT

DATA_BINARY_TYPE len 4; DB_ROW_ID: DATA_SYS prtype 256 len 6; DB_TRX_ID:

DATA_SYS prtype 257 len 6; DB_ROLL_PTR: DATA_SYS prtype 258 len 7;

INDEX: name GEN_CLUST_INDEX, id 2960, fields 0/5, uniq 1, type 1

root page 3, appr.key vals 0, leaf pages 1, size pages 1

FIELDS: DB_ROW_ID DB_TRX_ID DB_ROLL_PTR iid bugid

INDEX: name iid, id 2961, fields 2/3, uniq 2, type 2

root page 4, appr.key vals 0, leaf pages 1, size pages 1

FIELDS: iid DB_ROW_ID

FOREIGN KEY CONSTRAINT collaborate2011/customers_bugs_ibfk_1:

collaborate2011/customers_bugs ( iid )

REFERENCES collaborate2011/items ( id )

--------------------------------------

TABLE: name collaborate2011/items, id 1106, flags 1, columns 9, indexes 1,

appr.rows 5137

COLUMNS: id: DATA_INT DATA_BINARY_TYPE DATA_NOT_NULL len 4; short_description:

DATA_VARMYSQL len 765; description: DATA_BLOB len 10; example: DATA_BLOB len

10; explanation: DATA_BLOB len 10; additional: DATA_BLOB len 10; DB_ROW_ID:

DATA_SYS prtype 256 len 6; DB_TRX_ID: DATA_SYS prtype 257 len 6; DB_ROLL_PTR:

DATA_SYS prtype 258 len 7;

INDEX: name PRIMARY, id 2951, fields 1/8, uniq 1, type 3

root page 3, appr.key vals 5137, leaf pages 513, size pages 545

FIELDS: id DB_TRX_ID DB_ROLL_PTR short_description description example

explanation additional

FOREIGN KEY CONSTRAINT collaborate2011/community_bugs_ibfk_1: collaborate2011/

community_bugs ( iid )

REFERENCES collaborate2011/items ( id )

FOREIGN KEY CONSTRAINT collaborate2011/customers_bugs_ibfk_1: collaborate2011/

customers_bugs ( iid )

REFERENCES collaborate2011/items ( id )

FOREIGN KEY CONSTRAINT collaborate2011/items_links_ibfk_1: collaborate2011/

items_links ( iid )

REFERENCES collaborate2011/items ( id )

FOREIGN KEY CONSTRAINT collaborate2011/mysql_issues_ibfk_1: collaborate2011/

mysql_issues ( iid )

REFERENCES collaborate2011/items ( id )

FOREIGN KEY CONSTRAINT collaborate2011/oracle_srs_ibfk_1: collaborate2011/

oracle_srs ( iid )

REFERENCES collaborate2011/items ( id )

This output shows information about the table from Example 1-1 and another from the same database. The output is reasonably self-explanatory and explained in detail in the MySQL Reference Manual, so I won’t describe the fields. I just wanted to put it here so you are acquainted with what it looks like.

The InnoDB Tablespace Monitor displays information about the file segments in the shared tablespace. This information helps you find problems with tablespaces, such as fragmentation or corruption. Note that if you use the innodb_file_per_table option, information about individual tablespaces is not displayed by this monitor. Sample output looks like the following:

================================================

110911 20:33:50 INNODB TABLESPACE MONITOR OUTPUT

================================================

FILE SPACE INFO: id 0

size 5760, free limit 5440, free extents 51

not full frag extents 5: used pages 290, full frag extents 3

first seg id not used 857

SEGMENT id 1 space 0; page 2; res 1568 used 1339; full ext 20

fragm pages 32; free extents 0; not full extents 4: pages 27

SEGMENT id 2 space 0; page 2; res 1 used 1; full ext 0

fragm pages 1; free extents 0; not full extents 0: pages 0

SEGMENT id 3 space 0; page 2; res 1 used 1; full ext 0

fragm pages 1; free extents 0; not full extents 0: pages 0

...

The meaning of this output is clearly explained in the “InnoDB Tablespace Monitor Output” section of the MySQL Reference Manual, so once again I won’t bother to repeat it.

Performance Schema

I already discussed how to use Performance Schema to investigate locking problems in PERFORMANCE_SCHEMA Tables, but it has many other performance-related uses. Here I’ll describe a set of tables whose names begin with SETUP_ and that let you control which events are monitored. Here are some examples of their contents:

mysql> SELECT * FROM setup_consumers LIMIT 2;

+----------------------+---------+

| NAME | ENABLED |

+----------------------+---------+

| events_waits_current | YES |

| events_waits_history | YES |

+----------------------+---------+

2 rows in set (0.00 sec)

mysql> SELECT * FROM setup_instruments LIMIT 2;

+---------------------------------------------+---------+-------+

| NAME | ENABLED | TIMED |

+---------------------------------------------+---------+-------+

| wait/synch/mutex/sql/PAGE::lock | YES | YES |

| wait/synch/mutex/sql/TC_LOG_MMAP::LOCK_sync | YES | YES |

+---------------------------------------------+---------+-------+

2 rows in set (0.43 sec)

mysql> SELECT * FROM setup_timers;

+------+------------+

| NAME | TIMER_NAME |

+------+------------+

| wait | CYCLE |

+------+------------+

1 row in set (0.00 sec)

System variables control how many events will be stored in history tables.

Tables whose names end with _INSTANCES document which objects are being instrumented. The type of the object is part of the name of each table.

mysql> SELECT * FROM FILE_INSTANCES WHERE FILE_NAME LIKE '%ITEMS%'

LIMIT 2\G

*************************** 1. row ***************************

FILE_NAME: /users/apple/Applications/mysql-trunk/data/collaborate2011/items_links.ibd

EVENT_NAME: wait/io/file/innodb/innodb_data_file

OPEN_COUNT: 1

*************************** 2. row ***************************

FILE_NAME: /users/apple/Applications/mysql-trunk/data/collaborate2011/items.ibd

EVENT_NAME: wait/io/file/innodb/innodb_data_file

OPEN_COUNT: 1

2 rows in set (0.08 sec)

mysql> SELECT * FROM RWLOCK_INSTANCES LIMIT 2\G

*************************** 1. row ***************************

NAME: wait/synch/rwlock/innodb/index_tree_rw_lock

OBJECT_INSTANCE_BEGIN: 503973272

WRITE_LOCKED_BY_THREAD_ID: NULL

READ_LOCKED_BY_COUNT: 0

*************************** 2. row ***************************

NAME: wait/synch/rwlock/innodb/index_tree_rw_lock

OBJECT_INSTANCE_BEGIN: 503813880

WRITE_LOCKED_BY_THREAD_ID: NULL

READ_LOCKED_BY_COUNT: 0

2 rows in set (0.08 sec)

mysql> SELECT * FROM MUTEX_INSTANCES LIMIT 2\G

*************************** 1. row ***************************

NAME: wait/synch/mutex/innodb/rw_lock_mutex

OBJECT_INSTANCE_BEGIN: 491583300

LOCKED_BY_THREAD_ID: NULL

*************************** 2. row ***************************

NAME: wait/synch/mutex/innodb/rw_lock_mutex

OBJECT_INSTANCE_BEGIN: 345609668

LOCKED_BY_THREAD_ID: NULL

2 rows in set (0.00 sec)

mysql> SELECT * FROM COND_INSTANCES LIMIT 2\G

*************************** 1. row ***************************

NAME: wait/synch/cond/innodb/commit_cond

OBJECT_INSTANCE_BEGIN: 10609120

*************************** 2. row ***************************

NAME: wait/synch/cond/sql/MYSQL_BIN_LOG::update_cond

OBJECT_INSTANCE_BEGIN: 35283728

2 rows in set (0.00 sec)

Tables whose names begin with EVENT_WAITS_ store information about events:

mysql> SELECT COUNT(*), EVENT_NAME FROM EVENTS_WAITS_CURRENT GROUP

BY EVENT_NAME;

+----------+-----------------------------------------+

| count(*) | EVENT_NAME |

+----------+-----------------------------------------+

| 1 | wait/io/table/sql/handler |

| 6 | wait/synch/mutex/innodb/ios_mutex |

| 1 | wait/synch/mutex/innodb/kernel_mutex |

| 1 | wait/synch/mutex/innodb/log_sys_mutex |

| 1 | wait/synch/mutex/innodb/rw_lock_mutex |

| 1 | wait/synch/mutex/innodb/thr_local_mutex |

| 2 | wait/synch/mutex/sql/LOCK_thread_count |

+----------+-----------------------------------------+

7 rows in set (0.26 sec)

I used COUNT here because knowing how many events were executed can help you find how they contribute to your MySQL load.

Tables whose names end in _HISTORY store information about which events happened, and tables whose names end in _SUMMARY contain summaries of these events based on various parameters.

Now I’ll give examples of ways to use these tables. For instance, you can find out which instance is used for the most time or locked for the longest time. This can shed some light on aspects of performance that can be improved.

mysql> SELECT COUNT(*), (TIMER_END-TIMER_START) AS TIME,

EVENT_NAME FROM EVENTS_WAITS_HISTORY_LONG GROUP BY EVENT_NAME ORDER BY TIME

DESC;

+----------+---------+-----------------------------------------+

| count(*) | time | EVENT_NAME |

+----------+---------+-----------------------------------------+

| 9967 | 3289104 | wait/io/table/sql/handler |

| 10 | 2530080 | wait/synch/mutex/innodb/log_sys_mutex |

| 5 | 2439720 | wait/synch/mutex/innodb/kernel_mutex |

| 2 | 1481904 | wait/synch/mutex/mysys/THR_LOCK::mutex |

| 2 | 1102392 | wait/synch/rwlock/sql/MDL_lock::rwlock |

| 1 | 1036128 | wait/synch/rwlock/sql/LOCK_grant |

| 2 | 789144 | wait/synch/mutex/mysys/THR_LOCK_lock |

| 2 | 457824 | wait/synch/mutex/sql/LOCK_plugin |

| 5 | 415656 | wait/synch/mutex/sql/THD::LOCK_thd_data |

| 2 | 343368 | wait/synch/mutex/sql/MDL_map::mutex |

| 2 | 325296 | wait/synch/mutex/sql/LOCK_open |

+----------+---------+-----------------------------------------+

11 rows in set (0.26 sec)

SHOW [GLOBAL] STATUS

I already discussed status variables related to configuration options in Chapter 3. Here I’ll add some information about other status variables. Like server variables, status variables can be global or per-session. When a client connects, the session variables are set to zero. Global variables show the status since server startup or since the most recent FLUSH STATUS query.

When troubleshooting with status variables, don’t just look at isolated values, but instead follow them over time. One huge number in itself may not mean anything; perhaps you’ve just been lucky enough to experience years of uptime. If it’s large and growing rapidly, though, you might be seeing symptoms of a problem.

We recommend to our customers to take SHOW GLOBAL STATUS output at intervals of 5 to 10 minutes during critical loads, and to compare the values of variables at different times. This is the easiest way to find meaningful information.

The following list focuses in on particular types of status variables.

Com_* status variables

These contain the number of statements issued of various types. For instance, Com_select shows how many SELECT queries were run, and Com_begin shows how many transactions were started.

Use these variables to get an overview of your load. For example, if you have a large Com_select value with zero values for Com_insert, Com_update, and Com_delete, you can adjust configuration options to favor SELECT queries.

Handler_*, Select_*, and Sort_* variables

Handler_* variables show what happens in the table internally when a query is run. For example, Handler_delete shows how many rows were actually deleted. You can use this variable to watch the progress of a DELETE that is currently running on a large table.

Select_* variables show the numbers of the various kinds of joins that are used. Sort_* variables show information about sorts. These can help you find out how effective your queries are in terms of performance.

I discussed these variables in Queries That Modify Data, so I won’t spend more time on them here, but they warrant detailed study.

Innodb_* variables

As can be guessed from the name, these variables show the internal status of the InnoDB storage engine. Study them and how they are affected by various InnoDB options if you use this storage engine.

Performance_schema_* variables

The performance schema provides information about those objects for which “instrumentation points” were created in the MySQL server or the storage engine source code. These variables show how many instrumentations could not be loaded or created.

Ssl_* variables

These show statistics about SSL connections.

*open* and *create* variables

Variables that contain these keywords show how many objects of various kinds were opened or created.

The purposes of other status variables either can be deduced from their names or can be found in Chapter 3.

^[19^]MySQL supports different kinds of backups and ways to do them. When planning backups, you need to take into account their effect on tables, such as locking, which depends on the storage engine you use. I will touch on backups in Backups.

^[20^]innodb_monitor, innodb_lock_monitor, innodb_table_monitor, and innodb_tablespace_monitor are not supposed to be used as real tables, but instead provide a method to tell InnoDB to write debugging output into STDERR. Although you can use them as any other table, be prepared for their content to disappear after server restart.

^[21^]There are two flush types for this buffer. BUF_FLUSH_LIST flushes via the flush list of dirty blocks, whereas BUF_FLUSH_SINGLE_PAGE flushes single pages.

Localizing the Problem (Minimizing the Test Case)

I already showed the value of minimizing a test case in Single Server Example, where I reduced an incorrect SELECT to a CREATE TABLE that reproduced the wrong parameters. Here I will discuss the principle of a minimal test case in more general terms.

As an example, take the following huge query^[22^]:

SELECT

IF(TABLE1.FIELD1 = 'R' AND TABLE1.FIELD2 IS NOT NULL AND TABLE1.FIELD3 = '1' AND

TABLE2.FIELD4 = TABLE2.FIELD5 AND TABLE3.FIELD6 = TABLE4.FIELD6, TABLE3.FIELD7,

TABLE4.FIELD7) AS ALIAS1,

IF(TABLE1.FIELD1 = 'R' AND TABLE1.FIELD2 IS NOT NULL AND TABLE1.FIELD3 = '1' AND

TABLE2.FIELD4 = TABLE2.FIELD5 AND TABLE3.FIELD6 = TABLE4.FIELD6, TABLE3.FIELD8,

TABLE4.FIELD8) AS ALIAS2,

SUM(

IF (

(SELECT TABLE5.FIELD7 FROM TABLE4 ALIAS3, TABLE2 ALIAS4, TABLE4 ALIAS5 WHERE

TABLE5.FIELD5 = ALIAS4.FIELD4 AND ALIAS4.FIELD5 = ALIAS5.FIELD5 AND

ALIAS5.FIELD7 = FIELD9 AND TABLE5.FIELD6 = TABLE6.FIELD7 LIMIT 1 ) IS NULL, 0,

TABLE7.FIELD10/TABLE7.FIELD11)

) AS ALIAS11

FROM TABLE4 ,TABLE4 ALIAS6, TABLE8 , TABLE4 ALIAS7, TABLE9 , TABLE7 , TABLE2 ,

TABLE1 FORCE INDEX(FIELD12)

LEFT JOIN TABLE1 ALIAS8 ON TABLE1.FIELD13 = TABLE10.FIELD13

LEFT JOIN TABLE1 ALIAS9 ON ALIAS9.FIELD13 = TABLE10.FIELD2

LEFT JOIN TABLE4 ALIAS10 ON ALIAS10.DFIELD5 = TABLE3.FIELD5

WHERE TABLE1.FIELD14 > DATE_sub(now(), INTERVAL 16 DAY)

and TABLE1.FIELD1 IN ('P', 'R','D')

AND TABLE1.DFIELD5 = TABLE4.FIELD5

AND TABLE1.FIELD15 = TABLE8.FIELD16

AND TABLE6.FIELD7 = TABLE8.FIELD17

AND TABLE4.FIELD18 = TABLE9.FIELD18

AND TABLE1.FIELD19 = TABLE7.FIELD19

AND TABLE1.FIELD20 = TABLE2.FIELD21

AND TABLE4.FIELD6 = TABLE11.FIELD7

GROUP BY TABLE4.FIELD6, FIELD9;

We can isolate the problem into a smaller equivalent:

SELECT

IF(T1.F1 = 'R', A1.F2, T2.F2) AS A4,

IF(T1.F1 = 'R' , A1.F3, T2.F3) AS F3,

SUM( IF ( (SELECT A7.F2

FROM T2 A7, T4 A2, T2 A3

WHERE

A7.F4 = A2.F10

AND A3.F2 = A4

LIMIT 1 ) IS NULL, 0, T3.F5)) AS A6

FROM T2, T3, T1

JOIN T2 A1 ON T1.F9 = A1.F4

GROUP BY A4;

With this equivalent, it is very easy to test a problem. In the following section I describe how I create such small test cases, then return to use cases.

^[22^]This example is based on Community bug #33794.

General Steps to Take in Troubleshooting

Here are the steps I take to localize a problem. They aren’t perfectly linear, because there are different paths to take for different problems. But the sequence can serve as a checklist.

Try to identify the actual query that causes the problem.

I’ve discussed many ways to determine the query or, in the case of a concurrency issue, the set of queries that cause the problem collectively. Once you obtain the query and its environment, you’re halfway done. Either the problem is repeatable, or it has something to do with concurrency and you can investigate those issues.

Check to make sure the query’s syntax is correct.

The easiest way to do this is to run the query in the MySQL CLI. If the query is syntactically incorrect, it will return an error. In either case, once you figure out which part of the syntax is wrong, you have the source of the problem. Otherwise, continue to the next step.

Confirm that the problem is in the query.

If the problem is a wrong result or a performance issue, check to make sure the problem is repeatable in the MySQL CLI.

You may notice that I am branching out in various directions while listing steps to follow. These symptoms cannot always be resolved through a single path, because problems differ and the most effective methods to find the cause can vary.

If the query returns wrong data, try to rewrite it to get correct results, i.e., those that you expect to obtain.

If the problem is repeatable with a single query, it always can be rewritten to reflect your needs. This is the time when you can make your query shorter, reduce the data set, and test your changes. When modifying the query, do not make the typical mistake of forcing a single query to produce the final result. A one-query solution to get the necessary result set is not always faster than a sequence of multiple queries.

If a rewrite does not help, check the server options and try to determine whether they’re affecting results.

This method can be used together with the previous one. Experienced users know when to suspect the effect of a configuration option. When in doubt, try options after simplifying the query.

If you decide that the query is correct, go backward until you find a statement or action that corrupted the data on which the query is acting.

After you are sure the query is absolutely correct and that no configuration option is affecting its results, there’s no longer any sense in blaming the query. Check the data for corruption and find out what was originally responsible. This could be an application or external factors such as physical disk corruption or a modification of database files at the operating system level.

If the problem is not repeatable in the MySQL CLI, examine whether it could be a concurrency issue.

Use all the means from Chapter 2 to find out what is causing the problem. I start from SHOW PROCESSLIST and finish with application-level techniques.

If the problem causes a crash or a hang, check the error log first.

Although information about the latest crash is at the end of the log, use the full file and pay attention to older messages, too. They can contain information about table corruption or similar issues that occurred in the past, thus giving you a better idea about what is going on.

If the error logfile doesn’t give you a clue, try to find the last query before the crash.

As described earlier, you can do this using the general query log, application-level logging, or a proxy.

Use mysqld-debug to generate a core file from a failed server, then analyze it.

Connect a debugger to the running process if this is a repeatable hang not a crash.

Analyze and adjust configuration options.

Options can cause a hang or crash too. Analyze every possible option that can have an effect, and adjust them correspondingly.

Use operating system tools to find out which external process could affect mysqld.

After you try everything related to the MySQL server itself and are sure it works properly, check its environment.

This general action plan splits a large, unknown issue into smaller parts that can be analyzed and solved. The execution path need not always be followed in order, because sometimes you find the reason for a problem right away and can go right to the fix. But the order shown works for cases when the cause is hard to determine and you need guidance to find it. I skipped the actual fixes because earlier parts of the book discussed them in detail.

MINIMIZING THE TEST CASE IN ENVIRONMENTS WITH INCOMPLETE INFORMATION

Finding the smallest test case for a problem is a general strategy that you can use in many situations, not just SQL applications.

When I started working on this book, I found I had not installed the XML plug-in for my favorite editor. I opened the plug-in manager and found that many plug-ins were out of date.

And here I made a mistake: I downloaded and installed them all.

The exact reason this was a bad idea is that my favorite editor runs in the Java virtual machine, and I need an outdated version of it to run a program required for my daily tasks.

But since last year, the editor and its plug-ins switched to the latest version of Java. So the next time I started the editor, I got a lot of errors about the wrong versions of the plug-in, and finally the program hung.

I did not want to reinstall everything, because I did not want to lose my current settings. And I was sure the source of the problem was one of the plug-ins, not the editor itself. But having installed dozens of plug-ins, I’d have a hard time determining which one prevented the editor from starting.

So I opened the editor’s options directory and copied the content of the plug-ins subdirectory to another safe place. I was surprised to see that the editor still did not start.

Next, I determined which files were touched when I had opened the editor the last time, and moved them to another safe place too.

This time, the editor started and re-created its environment. This was good, but I wanted my preferences back.

So I started to add the option files back one by one, restarting the editor after each one, until I found the file that was corrupted. Fortunately, it did not contain the options I wanted to restore, so I just got rid of it and let editor the re-create it.

Now the plug-ins’ turn came. I again added them one by one into the directory until I started to get errors. After examining them all, I got my installation working again.

§ The principle of a minimal test case can be used even in environments where you have limited information.

Testing Methods

Creating a minimal test case can confirm the problem, but you can do even more with it. The minimal test case makes it easy to pinpoint the reason for the problem. When you look at a JOIN that connects several tables with complicated WHERE conditions, it’s hard to say what exactly is wrong. But reducing the query to a couple of tables narrows the possibilities dramatically. Sometimes you fix the problem along the way when minimizing the query.

§ Minimizing the test case usually reveals and fixes the problem.

In this section we’ll consider what to do if you create the test but can’t fix the issue on your own. There can be two reasons for this: a bug in the MySQL code or a misunderstanding of how some feature works. I’ll assume you already read the parts of the MySQL Reference Manual pertaining to the problem and found no answer. There are still some resources you can try.

Try the Query in a Newer Version

The reason for trying a newer version of MySQL is that, in case your problem was actually caused by a bug in the server, you may find that it’s already fixed. Installing and using a new version of MySQL might sound complicated. But if you followed my advice in Sandboxes and have a sandbox, it’s easy, especially if you use a minimal data set and combine the sandbox with your minimal test case.

Check for Known Bugs

If trying a newer version does not work, you can check the bug databases for known bugs. You could even consider searching for a bug report before you try the new version of MySQL. As you encounter more problems and become more experienced, you’ll get a sense of which to do first. One simple rule you can follow is this: if the problem is repeatable with a dataset you can easily create, try a new version of MySQL first; otherwise, search the bug databases first.

You can start the search from the community bug database. If you are an Oracle customer, you can use the internal bug database inside the support portal. If you find a case that seems to match your situation, this will tell you whether it was considered a bug and whether it was already fixed.

If it was a bug, either download and test the version where it was fixed or find a workaround until it is fixed. If there is a known workaround, you’ll find it as a public comment. Search engines are also useful for finding a workaround.

If the problem was not considered a bug, the bug report will point to the part of the MySQL Reference Manual describing proper usage. We don’t explain in detail why one or another feature is not a bug, because that is not the purpose of the MySQL bug database, but learning about the proper behavior of the MySQL software can help you find a solution.

If you don’t find a problem in the bug databases, use your favorite search engine to find mentions of similar problems. And if you still cannot find anything wrong in your logic and the problem is repeatable with the latest version of MySQL, it is time to report a bug.

Workarounds

If the previous sections did not help to solve your problem, you can try creating a workaround yourself. Rewrite your query to exclude the parts that cause the problem, and break the query down into smaller queries that execute correctly.

The following example, which is based on bug #47650, now fixed, illustrates the concept. First, we’ll look at a simple version of the incorrect behavior that triggered the bug report:

mysql> CREATE TABLE `t1` (

-> `id` BIGINT(20) NOT NULL AUTO_INCREMENT,

-> PRIMARY KEY (`id`)

-> ) ENGINE=MyISAM;

Query OK, 0 rows affected (0.04 sec)

mysql> CREATE TABLE `t2` (

-> `id` BIGINT(20) NOT NULL AUTO_INCREMENT,

-> `t1_id` BIGINT(20) DEFAULT NULL,

-> PRIMARY KEY (`id`)

-> ) ENGINE=MyISAM;

Query OK, 0 rows affected (0.04 sec)

mysql> INSERT INTO `t1` VALUES

(1),(2),(3),(4),(5),(6),(7),(8);

Query OK, 8 rows affected (0.00 sec)

Records: 8 Duplicates: 0 Warnings: 0

mysql> INSERT INTO `t2` VALUES

(1,1),(2,1),(3,1),(4,2),(5,2),(6,2),(7,3),(8,3);

Query OK, 8 rows affected (0.01 sec)

Records: 8 Duplicates: 0 Warnings: 0

mysql> SELECT t1.id AS t1_id, COUNT(DISTINCT t2.id) AS cnt FROM t1

LEFT JOIN t2 ON t1.id = t2.t1_id

-> WHERE t1.id = 1 GROUP BY t1.id WITH ROLLUP LIMIT 100;

+-------+-----+

| t1_id | cnt |

+-------+-----+

| 1 | 8 |

| NULL | 8 |

+-------+-----+

2 rows in set (0.01 sec)

Why do we have eight rows where t1_id=1? Only three rows with t1_id = 1 were inserted:

mysql> INSERT INTO `t2` VALUES (1,1),(2,1),(3,1),

(4,2),(5,2),(6,2),(7,3),(8,3);

Query OK, 8 rows affected (0.01 sec)

Records: 8 Duplicates: 0 Warnings: 0

The problem can be clearly seen if we remove the GROUP BY clause:

mysql> SELECT t1.id AS t1_id, t2.id FROM t1 LEFT JOIN t2 ON t1.id =

t2.t1_id WHERE t1.id = 1;

+-------+------+

| t1_id | id |

+-------+------+

| 1 | 1 |

| 1 | 2 |

| 1 | 3 |

+-------+------+

3 rows in set (0.00 sec)

This listing shows that the data is correct and that GROUP BY causes the problem.

At first glance, the only way to solve the problem is to break the query up into several queries. But there may be another workaround that we can try, knowing something about the optimizer. The query execution plan can change if GROUP BY uses an index, so let’s try adding one:

mysql> ALTER TABLE t2 ADD INDEX(t1_id);

Query OK, 8 rows affected (0.05 sec)

Records: 8 Duplicates: 0 Warnings: 0

Now the problem is solved:

mysql> SELECT t1.id AS t1_id, COUNT(DISTINCT t2.id) AS cnt FROM t1 LEFT JOIN t2

ON t1.id = t2.t1_id WHERE t1.id = 1 GROUP BY t1.id WITH ROLLUP LIMIT 100;

+-------+-----+

| t1_id | cnt |

+-------+-----+

| 1 | 3 |

| NULL | 3 |

+-------+-----+

2 rows in set (0.02 sec)

So play around with the SQL to see whether you can avoid the bug. The example I showed doesn’t have any general application, but it shows that help can come from subtle changes. It also shows again the advantages of using a sandbox.

Special Testing Tools

When you test a problem that has more than one solution, such as a performance issue or the design of your SQL application, you need to test how each suits your needs. This section offers a quick overview of tools that can help in such testing.

Benchmarking Tools

Benchmarking tools test an application’s speed. MySQL benchmarking tools usually test MySQL installations, which is not the same as testing an application, but they can still be useful for testing a particular set of options. If a benchmarking tool allows you to use custom queries written specially for your application, you also can run tests on your own dataset.

The most popular benchmarks for MySQL are sysbench and mysqlslap. The following subsections describe them.

mysqlslap

mysqlslap is a load emulation client that comes with the MySQL distribution. It makes it easy to test concurrent loads on similar queries. Run it with an SQL script, either from a file or specified as an argument:

$ mysqlslap --socket=/tmp/mysql51.sock --user=root --delimiter=";" \

--create-schema=mstest --create="CREATE TABLE mstest(id INT NOT NULL \

AUTO_INCREMENT PRIMARY KEY, f1 VARCHAR(255)) ENGINE=InnoDB" --query="INSERT INTO \

mstest(f1) VALUES(MD5(RAND())); SELECT f1 FROM mstest;" --concurrency=10 \

--iterations=1000

Benchmark

Average number of seconds to run all queries: 0.039 seconds

Minimum number of seconds to run all queries: 0.025 seconds

Maximum number of seconds to run all queries: 0.173 seconds

Number of clients running queries: 10

Average number of queries per client: 2

WARNING

Note that if you specify the create option for mysqlslap, the schema specified in the create-schema option will be dropped. This will also happen if you use a mysqlslap older than 5.1.57 or 5.5.12, even if you don’t use the create option.

§ Use this tool only in a new, empty schema.

SysBench

This benchmark utility, available from Launchpad, measures the performance of a whole system, testing the CPU, file I/O, OS scheduler, POSIX threads performance, memory allocation, transfer speed, and database server performance. We are interested in the last feature in the list.

In early versions, you can test database server options using only the predefined online transaction processing (OLTP) test, which creates a table and runs a concurrency test on it:

$sysbench --test=./sysbench/tests/db/oltp.lua

--mysql-table-engine=innodb --oltp-table-size=1000000

--mysql-socket=/tmp/mysql60.sock --mysql-user=root prepare

sysbench 0.5: multi-threaded system evaluation benchmark

Creating table 'sbtest1'...

Inserting 1000000 records into 'sbtest1'

$sysbench --test=./sysbench/tests/db/oltp.lua

--mysql-table-engine=innodb --oltp-table-size=1000000

--mysql-socket=/tmp/mysql60.sock --mysql-user=root --num-threads=16

--max-requests=100000 run

sysbench 0.5: multi-threaded system evaluation benchmark

Running the test with following options:

Number of threads: 16

Random number generator seed is 0 and will be ignored

Threads started!

OLTP test statistics:

queries performed:

write: 400044

other: 200022

total: 2000220

transactions: 100011 (50.84 per sec.)

deadlocks: 0 (0.00 per sec.)

read/write requests: 1800198 (915.16 per sec.)

other operations: 200022 (101.68 per sec.)

General statistics:

total time: 1967.0882s

total number of events: 100011

total time taken by event execution: 31304.4927s

response time:

min: 18.10ms

avg: 313.01ms

max: 13852.37ms

approx. 95 percentile: 595.43ms

Threads fairness:

events (avg/stddev): 6250.6875/22.15

execution time (avg/stddev): 1956.5308/0.65

The preceding output was taken using the latest version of MySQL, which allows you to point sysbench to a custom test, but this predefined test behaves similarly to the predefined test used in earlier versions.

Since version 0.5, you can write your own tests using the Lua programming language. The easiest way to start is to take oltp.lua as a template. The following simple script shows which functions you must define:

$cat sysbench.lua

function prepare()

local i

db_connect()

print("Creating table 'test'")

db_query("CREATE TABLE test(id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,

f1 VARCHAR(255))")

print("Inserting 1000 rows")

for i = 1, 1000 do

db_query("INSERT INTO test(f1) VALUES(1000*rand(1000))")

end

function cleanup()

print("Dropping table 'test'")

db_query("DROP TABLE test")

end

function event(thread_id)

db_query("SELECT * FROM test WHERE f1 = " .. sb_rand(1, 1000))

end

In real benchmarks, you can code more complicated scenarios.

The result of the preceding test looks similar to the default OLTP test:

$sysbench

--test=/Users/apple/Documents/web_project/MySQL/examples/sysbench.lua

--mysql-table-engine=innodb

--mysql-socket=/tmp/mysql60.sock --mysql-user=root --num-threads=16

--max-requests=100000 prepare

sysbench 0.5: multi-threaded system evaluation benchmark

Creating table 'test'

Inserting 1000 rows

$sysbench

--test=/Users/apple/Documents/web_project/MySQL/examples/sysbench.lua

--mysql-table-engine=innodb

--mysql-socket=/tmp/mysql60.sock --mysql-user=root --num-threads=16

--max-requests=100000 run

sysbench 0.5: multi-threaded system evaluation benchmark

Running the test with following options:

Number of threads: 16

Random number generator seed is 0 and will be ignored

Threads started!

OLTP test statistics:

queries performed:

write: 0

other: 0

total: 100001

transactions: 0 (0.00 per sec.)

deadlocks: 0 (0.00 per sec.)

read/write requests: 100001 (37.08 per sec.)

other operations: 0 (0.00 per sec.)

General statistics:

total time: 2697.2491s

total number of events: 100001

total time taken by event execution: 43139.9169s

response time:

min: 7.54ms

avg: 431.39ms

max: 2304.82ms

approx. 95 percentile: 913.27ms

Threads fairness:

events (avg/stddev): 6250.0625/27.35

execution time (avg/stddev): 2696.2448/0.57

Gypsy

Gypsy, also available from Launchpad, is a tool written for load testing by MySQL Support Engineer Shane Bester. We actively use this tool when testing concurrent loads. It’s not a benchmarking tool, but an aid to finding locking issues or other problems with concurrency.

Gypsy is scriptable. Its syntax for query files is easy:

i|1|DROP TABLE IF EXISTS t1|

i|1|CREATE TABLE t1( id INT, f1 INT, PRIMARY KEY(id)) ENGINE=InnoDB|

i|1|SET GLOBAL SQL_MODE='strict_trans_tables'|

n|100|INSERT INTO t1 SET id = ?, f1 = 1 ON DUPLICATE KEY UPDATE f1 = f1 + 1|tinyint

Rows marked with i are part of the initial setup and are run only once. Rows marked with n denote queries that emulate the load. You can run Gypsy as follows:

$gypsy --host=127.0.0.1:3351 --user=root --password= --database=test

--queryfile=bug42644.query --threads=2 --duration=100

[INFO] 04:08:15 [0290] 2684407808 - 32-bit version of Gypsy

[INFO] 04:08:15 [0291] 2684407808 - sizeof(long long int) = 8

[INFO] 04:08:15 [0300] 2684407808 - using 1 hosts

[WARN] 04:08:15 [2950] 2684407808 - setting statement on line 1 to non-prepared by

default

[WARN] 04:08:15 [2950] 2684407808 - setting statement on line 2 to non-prepared by

default

[WARN] 04:08:15 [2950] 2684407808 - setting statement on line 3 to non-prepared by

default

[INFO] 04:08:15 [0362] 2684407808 - client library version: 5.0.92

[ALWAYS] 04:08:15 [0376] 2684407808 - server 00: '5.1.60-debug', host:

'127.0.0.1 via TCP/IP', SSL: 'NULL', protocol: 10, charset: latin1

[ALWAYS] 04:08:15 [0414] 2684407808 - thrd = 2

[INFO] 04:08:15 [0459] 2684407808 - read 4 valid queries from query file

[INFO] 04:08:15 [0556] 2684407808 - spawning data generation thread

[INFO] 04:08:15 [0693] 25182208 - creating new char data for the first time

[INFO] 04:08:15 [0711] 25182208 - refreshing char data

[INFO] 04:08:15 [0718] 25182208 - char data has been generated, char_increment=2

[INFO] 04:08:15 [0603] 2684407808 - now running for 100 seconds.

[INFO] 04:08:15 [0609] 2684407808 - running initialization queries

[INFO] 04:08:15 [1443] 25183232 - thread 0 connecting to host 0

[INFO] 04:08:15 [1456] 25183232 - thread 0 has 1 alive hosts connected

[WARN] 04:08:16 [2182] 25183232 - thread[00] didn't complete entire query

file. Might need longer --duration=

[INFO] 04:08:16 [0636] 2684407808 - about to create all 'populate' scripts from I_S

[INFO] 04:08:16 [0691] 2684407808 - spawning database stats thread

[ALWAYS] 04:08:16 [0708] 2684407808 - spawning 2 new thread(s)

[INFO] 04:08:16 [1443] 25184256 - thread 0 connecting to host 0

[INFO] 04:08:16 [0957] 25183232 - writing server status variables to

'report_18098_host_00.txt'

[INFO] 04:08:16 [1456] 25184256 - thread 0 has 1 alive hosts connected

[INFO] 04:08:16 [1443] 25188352 - thread 1 connecting to host 0

[INFO] 04:08:16 [1456] 25188352 - thread 1 has 1 alive hosts connected

[INFO] 04:08:17 [0736] 2684407808 - completed spawning new database worker threads

[INFO] 04:08:28 [0777] 2684407808 - 02 threads running, 0030487 successful

queries. 0000000 failed queries (2540.583333 QPS).

[INFO] 04:08:39 [0777] 2684407808 - 02 threads running, 0059212 successful

queries. 0000000 failed queries (2393.750000 QPS).

[INFO] 04:08:50 [0777] 2684407808 - 02 threads running, 0084904 successful

queries. 0000000 failed queries (2141.000000 QPS).

[INFO] 04:09:01 [0777] 2684407808 - 02 threads running, 0110477 successful

queries. 0000000 failed queries (2131.083333 QPS).

[INFO] 04:09:12 [0777] 2684407808 - 02 threads running, 0133212 successful

queries. 0000000 failed queries (1894.583333 QPS).

[INFO] 04:09:23 [0777] 2684407808 - 02 threads running, 0148816 successful

queries. 0000000 failed queries (1300.333333 QPS).

[INFO] 04:09:34 [0777] 2684407808 - 02 threads running, 0165359 successful

queries. 0000000 failed queries (1378.583333 QPS).

[INFO] 04:09:45 [0777] 2684407808 - 02 threads running, 0178743 successful

queries. 0000000 failed queries (1115.333333 QPS).

[ALWAYS] 04:09:56 [0792] 2684407808 - waiting for threads to finish

[INFO] 04:09:56 [0808] 2684407808 - running cleanup queries

[INFO] 04:09:56 [1443] 25188352 - thread 0 connecting to host 0

[INFO] 04:09:56 [1456] 25188352 - thread 0 has 1 alive hosts connected

[WARN] 04:09:56 [2182] 25188352 - thread[00] didn't complete entire query file.

Might need longer --duration=

[INFO] 04:09:56 [0835] 2684407808 - now about to tell stats thread to exit

[INFO] 04:09:56 [0842] 2684407808 - now about to tell data generation thread to exit

[ALWAYS] 04:09:56 [0884] 2684407808 - done!!!

[ALWAYS] 04:09:56 [0885] 2684407808 - press a key to continue!!

MySQL Test Framework

The MySQL Test Framework is also called MTR, which is an abbreviation of its main command, mysql-test-run. It is a test automation package used by MySQL developers, and is available in all full MySQL packages. In concept, it’s similar to unit testing. It involves test cases, each of which is a pair consisting of a test and a result file. Test files contain sets of MySQL queries along with special MTR commands, and result files contain the expected results.

Here’s an example of a test file that creates a table, inserts one row, and then selects it:

--source include/have_innodb.inc

CREATE TABLE t1(f1 int NOT NULL AUTO_INCREMENT PRIMARY KEY, f2 VARCHAR(255))

ENGINE=InnoDB;

INSERT INTO t1 (f2) VALUES('test');

SELECT f1, f2 FROM t1;

DROP TABLE t1;

Let’s suppose you’ve stored this in a file named book.test in MTR’s t subdirectory. If you are sure your version of MySQL works fine, you can automatically record a result file by issuing:

$./mtr --record book

Logging: ./mtr --record book

110915 3:58:38 [Warning] Setting lower_case_table_names=2 because file system

for /tmp/PrqusdwLQa/ is case insensitive

110915 3:58:39 [Note] Plugin 'FEDERATED' is disabled.

110915 3:58:39 [Note] Plugin 'ndbcluster' is disabled.

MySQL Version 5.1.60

Checking supported features...

- skipping ndbcluster

- SSL connections supported

- binaries are debug compiled

Collecting tests...

vardir: /users/apple/bzr/mysql-5.1/mysql-test/var

Checking leftover processes...

Removing old var directory...

Creating var directory '/users/apple/bzr/mysql-5.1/mysql-test/var'...

Installing system database...

Using server port 56970

==============================================================================

TEST RESULT TIME (ms)

------------------------------------------------------------

worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 13000..13009

main.book [ pass ] 124

------------------------------------------------------------

The servers were restarted 0 times

Spent 0.124 of 20 seconds executing testcases

Completed: All 1 tests were successful.

The contents of the result file are now:

$cat r/book.result

CREATE TABLE t1(f1 INT NOT NULL AUTO_INCREMENT PRIMARY KEY, f2 VARCHAR(255))

ENGINE=InnoDB;

INSERT INTO t1 (f2) VALUES('test');

SELECT f1, f2 FROM t1;

f1 f2

1 test

DROP TABLE t1;

When you run the same test without the --record option, MTR compares the actual result with the contents of the result file and fails if they are different. You can move the test and result files to another server to test whether execution has changed.

MTR allows you to create suites for separate products starting with version 2.0, which comes with MySQL server 5.1 and higher. So you can create your own testing suites like the following:

$./mtr --suite=./suite/book book

Logging: ./mtr --suite=book book

110915 4:05:29 [Warning] Setting lower_case_table_names=2 because file system

for /tmp/7npx97ZLbz/ is case insensitive

110915 4:05:29 [Note] Plugin 'FEDERATED' is disabled.

110915 4:05:29 [Note] Plugin 'ndbcluster' is disabled.

MySQL Version 5.1.60

Checking supported features...

- skipping ndbcluster

- SSL connections supported

- binaries are debug compiled

Collecting tests...

vardir: /users/apple/bzr/mysql-5.1/mysql-test/var

Checking leftover processes...

Removing old var directory...

Creating var directory '/users/apple/bzr/mysql-5.1/mysql-test/var'...

Installing system database...

Using server port 56998

==============================================================================

TEST RESULT TIME (ms)

------------------------------------------------------------

worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 13000..13009

book.book [ pass ] 72

------------------------------------------------------------

The servers were restarted 0 times

Spent 0.072 of 12 seconds executing testcases

Completed: All 1 tests were successful.

Outside of MySQL development, this tool can be useful to automatically check how installations with different options or MySQL versions run a custom query.

You can find more information in the the MTR user guide.

Maintenance Tools

The tools in this section are useful in a day-to-day DBA job, not just when trouble occurs. If you use them regularly, you can avoid most troublesome situations. That’s why I’m describing them here: one of my reasons for writing this book was to help you have trouble-free MySQL installations.

These tools have great user guides as well as reference manuals, so I’ll just introduce them without details.

Tools from the MySQL distribution

MySQL comes with set of tools that you can find in the bin directory of the MySQL installation. Become acquainted with them. They are described in the section of the MySQL Reference Manual about MySQL programs.

Percona Toolkit

This toolkit, available from Percona, was put together from the Aspersa and Maatkit distributions. It contains a lot of powerful instruments to control the server and its tables.

MySQL WB Utilities

Although they are part of the MySQL Workbench installation, these run from the command line. They are independent from MySQL Workbench and can be run on platforms that don’t run MySQL Workbench. The tools are written in the Python language. They work mostly with database structure, not with data, and can help automate processes such as replication setup and grant migration.

Monitoring tools

It is very important to monitor your system for statistics related to its day-to-day operation. Here I describe two tools that can help you monitor a MySQL installation:

MySQL Enterprise Manager

The MySQL team provides MySQL Enterprise Manager (MEM) for monitoring purposes.

MEM is commercial software available for Enterprise customers. It runs as a server on a dedicated machine, along with lightweight agents that should run on the same machine as the one where the monitored MySQL server is running. These agents collect information about both MySQL and the operating system and send it to MEM server. Through a web-based GUI, a DBA can get a graphical overview of what is going on across multiple servers. MEM also suggests improvements in server setup.

You can read about MEM in detail in the book MySQL High Availability by Charles Bell et al. (O’Reilly), or in the official documentation.

dim_STAT

dim_STAT is a tool for monitoring performance of Unix, Linux, and Solaris systems. It collects output received from operating system monitoring tools such as vmstat, and iostat, has its own plug-ins that collect different statistics, draws graphs, and much more. dim_STAT supports multiple hosts.

dim_STAT also has MySQL add-ons that monitor MySQL and InnoDB usage.

You can examine statistics and see graphs using a web-based GUI. dim_STAT also has a CLI solution that can generate a single graph in PNG format for a given database, collect ID, and time interval.

You can download dim_STAT from http://dimitrik.free.fr/.