Effects of Server Options - MySQL Troubleshooting (2012)

MySQL Troubleshooting (2012)

Chapter 3. Effects of Server Options

The MySQL server provides an enormous number of options that you can set in various ways: in the my.cnf configuration file, on the command line that starts the server, or by setting variables at runtime. Most MySQL variables can be set dynamically, and there is generally one variable corresponding to each configuration option.

Some of the options are global, some apply to specific storage engines, some, called session, apply to connection, and some apply to particular activities, such as replication. This chapter is not a general guide to MySQL server options, but covers the problems that some can create or the ways in which changing an option can help you troubleshoot MySQL.

NOTE

Before embarking on this chapter, we need to agree on certain terms.

I will use both option and variable to denote a server option. MySQL uses a separate syntax for options and variables: options are usually spelled with hyphens (variable-name), whereas the corresponding variables are spelled with underscores (variable_name). Usually the MySQL server supports both types of syntax for the command line and configuration file, but supports only thevariable_name syntax for variables. Therefore, we will use the latter syntax in this book whenever it is supported.

Variables can be split into different groups depending on their purpose: to point the server to directories, to limit hardware resources, to change how mysqld treats one or another situation, and so on. They can also be split into different groups depending on when they are allocated: at server start, when a connection thread is created, or when the server starts a particular operation.

Service Options

I use the term service options because this single word explains all the varieties of their functionality: pointing the server to directories and files, telling it whether a particular log should be turned on, and so on. These options usually don’t create problems. I found only two typical troubleshooting cases with them: when an option points to a wrong path and when a particular feature, when turned on, changes the behavior of the mysqld command. The latter case can be hard to diagnose because you simply cannot expect these changes.

When an option uses the wrong path, you’ll usually notice the problem at server startup. For example, if you point to the wrong datadir, mysqld will refuse to start and will print a message such as:

$./bin/mysqld --datadir=/wrong/path &

[1] 966

$110815 14:08:50 [ERROR] Can't find messagefile

'/users/ssmirnova/blade12/build/mysql-trunk-bugfixing/share/errmsg.sys'

110815 14:08:50 [Warning] Can't create test file /wrong/path/blade12.lower-test

110815 14:08:50 [Warning] Can't create test file /wrong/path/blade12.lower-test

./bin/mysqld: Can't change dir to '/wrong/path/' (Errcode: 2)

110815 14:08:50 [ERROR] Aborting

110815 14:08:50 [Note] Binlog end

110815 14:08:50 [Note]

[1]+ Exit 1 ./bin/mysqld --datadir=/wrong/path

But of course you don’t see this message on the command line if mysqld is started in a system startup file as a daemon. In that case, users usually notice the problem when their first connection attempt fails with an error like the following:

$./bin/mysql -uroot -S /tmp/mysql_ssmirnova.sock

ERROR 2002 (HY000): Can't connect to local MySQL server through socket

'/tmp/mysql_ssmirnova.sock' (2)

The error simply indicates that no server is running. In such cases, you need to examine the error logfile or, if there is no error logfile, check the operating system log for messages concerning mysqld. The MySQL error logfile would contain the same message as we saw in the earlier listing. An operating system message can vary but usually says that some automatic script, such as mysql.server from the MySQL installation, failed to start mysqld. You can also check that no MySQL server is running by looking at a process listing. Here is an example for Linux, showing thatmysqld doesn’t appear anywhere in the system’s processes:

$ps -ef | grep mysqld

10149 7076 6722 0 23:35 pts/0 00:00:00 grep mysqld

The mysqladmin utility has ping command that reports whether the MySQL server is alive or stopped:

$mysqladmin -h127.0.0.1 -P3306 ping

mysqladmin: connect to server at '127.0.0.1' failed

error: 'Can't connect to MySQL server on '127.0.0.1' (10061)'

Check that mysqld is running on 127.0.0.1 and that the port is 3306.

You can check this by doing 'telnet 127.0.0.1 3306'

A few options pointing to specific paths do not prevent the MySQL server from starting, but simply turn off a particular option. For example, let’s see what a corrupt InnoDB startup sequence might look like:

110815 14:14:45 [Note] Plugin 'FEDERATED' is disabled.

110815 14:14:45 [Note] Plugin 'ndbcluster' is disabled.

110815 14:14:45 [ERROR] InnoDB: syntax error in innodb_data_file_path

110815 14:14:45 [ERROR] Plugin 'InnoDB' init function returned error.

110815 14:14:45 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.

110815 14:14:45 [Note] Event Scheduler: Loaded 0 events

110815 14:14:45 [Note] ./libexec/mysqld: ready for connections.

Version: '5.1.60-debug' socket: '/tmp/mysql_ssmirnova.sock' port: 33051

Source distribution

The server was successfully started, but the InnoDB engine was not loaded:

mysql> SHOW ENGINES\G

*************************** 1. row ***************************

Engine: ndbcluster

Support: NO

Comment: Clustered, fault-tolerant tables

Transactions: NULL

XA: NULL

Savepoints: NULL

*************************** 2. row ***************************

Engine: MRG_MYISAM

Support: YES

Comment: Collection of identical MyISAM tables

Transactions: NO

XA: NO

Savepoints: NO

*************************** 3. row ***************************

Engine: BLACKHOLE

Support: YES

Comment: /dev/null storage engine (anything you write to it disappears)

Transactions: NO

XA: NO

Savepoints: NO

*************************** 4. row ***************************

Engine: CSV

Support: YES

Comment: CSV storage engine

Transactions: NO

XA: NO

Savepoints: NO

*************************** 5. row ***************************

Engine: MEMORY

Support: YES

Comment: Hash based, stored in memory, useful for temporary tables

Transactions: NO

XA: NO

Savepoints: NO

*************************** 6. row ***************************

Engine: FEDERATED

Support: NO

Comment: Federated MySQL storage engine

Transactions: NULL

XA: NULL

Savepoints: NULL

*************************** 7. row ***************************

Engine: ARCHIVE

Support: YES

Comment: Archive storage engine

Transactions: NO

XA: NO

Savepoints: NO

*************************** 8. row ***************************

Engine: MyISAM

Support: DEFAULT

Comment: Default engine as of MySQL 3.23 with great performance

Transactions: NO

XA: NO

Savepoints: NO

8 rows in set (0.00 sec)

I turned the error logfile off so that we could see the error on the console, but in production, the error log would be the place to look for the error message. Therefore, if you find out that one of the features you need does not exist, check the error logfile first.

It is very important to understand how a desired feature affects the work a server does. For example, when InnoDB is not available, we still can create tables successfully if the SQL mode does not contain NO_ENGINE_SUBSTITUTION:

mysql> CREATE TABLE t1(f1 INT) ENGINE=InnoDB;

Query OK, 0 rows affected, 2 warnings (0.01 sec)

This example shows that it’s always important to check warnings. In this case, the table was created using the wrong storage engine because we had an error trying to start InnoDB:

mysql> SHOW WARNINGS;

+---------+------+--------------------------------------------+

| Level | Code | Message |

+---------+------+--------------------------------------------+

| Warning | 1286 | Unknown table engine 'InnoDB' |

| Warning | 1266 | Using storage engine MyISAM for table 't1' |

+---------+------+--------------------------------------------+

2 rows in set (0.00 sec)

If you don’t check the warnings, a user might start using this table and discover the problem only when your whole application is affected. As you know from the previous chapter, MyISAM and InnoDB use different locking methods, so an application written to use the advantages of InnoDB can run into huge issues if the tables are MyISAM instead. And I haven’t even talked about the absence of transactions!

§ Check whether a feature you rely on exists in the server instance if you experience problems with it.

The other major set of problems with service options concerns options that change the behavior of MySQL, although their main purpose is different. When setting a service option, you can expect it will provide one feature or another, but you may not expect it to affect your queries.

A trivial example is the effect of binary logging on creating stored functions. When enabled, you can expect it will store all events that modify data, but you might not be aware of its other side effects.

First, I will show how one can create a dummy stored function if the server does not use the binary log:

root> GRANT ALL ON test.* TO sveta@'%';

Query OK, 0 rows affected (0.01 sec)

Then, I connect as user sveta and run:

sveta> CREATE FUNCTION f1() RETURNS INT RETURN 1;

Query OK, 0 rows affected (0.02 sec)

Everything is fine. But things change when I start mysqld with the log_bin option:

$./libexec/mysqld --defaults-file=support-files/my-small.cnf \

--basedir=. --datadir=./data --socket=/tmp/mysql_ssmirnova.sock --port=33051 \

--log_error --log_bin &

[1] 3658

and try to recreate the same function:

sveta> DROP FUNCTION f1;

Query OK, 0 rows affected (0.00 sec)

sveta> CREATE FUNCTION f1() RETURNS INT RETURN 1;

ERROR 1418 (HY000): This function has none of DETERMINISTIC, NO SQL, or READS

SQL DATA in its declaration and binary logging is enabled (you *might* want to

use the less safe log_bin_trust_function_creators variable)

The error message clearly explains the issue. I wanted to show this example to demonstrate how an option can change server behavior, even if its main purpose is not to affect user queries. Usually when a user meets such a problem, the cause is not so clear and can be confusing.

Variables That Are Supposed to Change the Server Behavior

Another set of variables affects how the MySQL server handles user input.

I will show a trivial example that clearly shows the effect of setting such a variable. In this case, we will set SQL Mode to STRICT_TRANS_TABLES so that attempts to insert invalid data into transactional tables will be rejected instead of being smoothed over. However, we expect the server to fix the statements, if possible, for nontransactional tables instead of rejecting the statements:

mysql> SET @@sql_mode = 'strict_trans_tables';

Query OK, 0 rows affected (0.03 sec)

mysql> CREATE TABLE `myisam` (

-> `id` bigint(20) NOT NULL AUTO_INCREMENT,

-> `a` varchar(50) NOT NULL,

-> `b` varchar(50) NOT NULL,

-> PRIMARY KEY (`id`)

-> ) ENGINE=MyISAM DEFAULT CHARSET=latin1 ;

Query OK, 0 rows affected (0.05 sec)

mysql> INSERT INTO `myisam` (id,a) VALUES (1,'a');

ERROR 1364 (HY000): Field 'b' doesn't have a default value

I deliberately issued an erroneous INSERT, omitting a value for the b column. I expect the server to insert an empty string for b. But even though this table uses the MyISAM storage engine, the insert fails with an error message.

The MySQL Reference Manual explains the behavior (see http://dev.mysql.com/doc/refman/5.1/en/server-sql-mode.html):

STRICT_TRANS_TABLES: If a value could not be inserted as given into a transactional table, abort the statement. For a nontransactional table, abort the statement if the value occurs in a single-row statement or the first row of a multirow statement.

My INSERT statement was a single-row statement, so the server refused to correct it. But this doesn’t seem intuitive, does it?

§ Carefully check what an option does if you see behavior you don’t expect.

Options That Limit Hardware Resources

The options in this group set limits on various hardware resources. They can be used for two purposes: to tune performance and to limit use for certain operations. The latter options are useful when you want to limit traffic between clients and the server or prevent Denial of Service attacks. It’s better for particular users to get graceful errors because of lack of resources than for mysqld to die because it can’t handle all incoming requests.

Later in this chapter, I will describe the tactics one should follow when adjusting these options. Here I want to point to cases when such variables lead to different and perhaps unexpected behavior compared to setups where the variables were not set. As always, I will show by example.

In my day-to-day job, I see many users who are affected by ignoring the value of max_allowed_packet. This variable limits the number of bytes that can be set in a single packet between the server and the client. In this example, I lower the default 1MB value of max_allowed_packetjust to demonstrate its effect:

mysql> SELECT repeat('a',1025);

+------------------+

| repeat('a',1025) |

+------------------+

| NULL |

+------------------+

1 row in set, 1 warning (0.00 sec)

mysql> SHOW WARNINGS\G

*************************** 1. row ***************************

Level: Warning

Code: 1301

Message: Result of repeat() was larger than max_allowed_packet (1024) - truncated

1 row in set (0.00 sec)

This time, it is clear from the warning message why the error happened. But sometimes it is not so clear:

$./my sql test <phpconf2009_1.sql

ERROR 1064 (42000) at line 33: You have an error in your SQL syntax; check the

manual that corresponds to your MySQL server version for the right syntax to use

near

'00000000000000000000000000000000000000000000000000000000000000000000000000000000'

at line 2

The exact message you get depends on the particular statement you sent to the server, usually when selecting data from very large table or accessing a BLOB field.

§ If you start getting syntax errors for valid queries, check whether their size exceeds max_allowed_packet.

Using the --no-defaults Option

It is hard to remember by heart what every variable does. Even I cannot do it, and I work with them every day. One solution is to go to the list of options and exclude each of them one by one. But that’s not easy. Version 5.1 of MySQL has 291 variables, my outdated 5.5 installation has 321, and no one can guarantee that this value will not grow in the future. If you use a custom storage engine, it can have its own options as well.

It is much easier to check how the MySQL server should work with no options specified, i.e., if it uses the defaults for all values. If you have a rough guess about how mysqld should work with no custom options set, you can start the server with the --no-defaults option and compare the results to those you get when custom options are set.

WARNING

Both the --no-defaults and --defaults-file options must be specified as the first options passed to mysqld. Otherwise, the server will not recognize them.

If the results are different, start adding options that you used before one by one, and test to see if the wrong behavior comes back. Once you find which variable causes the changes, you can refer to its documentation and adjust it accordingly.

Performance Options

These options generally don’t cause errors, but can have a dramatic effect on performance. Usually you run the server under various real-life loads while tuning them until you have a combination that works well for your particular environment.

However, there is one situation when such an option can cause an error, and removing it from the configuration file or reducing its value can make sense. This is when your server hits an out-of-resources error. The most common cases involve a lack of memory or file descriptors. If you find yourself in such a situation, use the --no-defaults method from the previous section to find the option that’s too big.

Haste Makes Waste

This popular English proverb has an equivalent in many other languages. Russians have one that can be translated literally as “sliding slowly, arriving further.” I think this wisdom is good when you tune the MySQL server too, at least when you are not 100% sure what you are doing.

§ Unless you are 100% sure what is wrong, add options one by one, and then test the configuration each time.

This means that if you think some set of options can make the MySQL server’s behavior better for your application, change one option first, then test, and then, if you are happy with the result, add another option, and so on, until you have checked every relevant option. This can appear slow, but if something goes wrong, you can safely roll back the most recent change and quickly return your server to a working stage.

This method is especially important when you adjust buffers or other options that limit hardware resources, but can be used for options that change server behavior as well. Even with a good knowledge of what variables are doing, it is much easier to find and fix a single mistake than find out what is wrong in dozens of options.

WARNING

When using this method, save the results of every test. For example, if you are working on improving performance, run benchmarks or measure query execution time before changing anything, and then repeat the same test after modifying each option.

The SET Statement

MySQL supports two kinds of variables: SESSION and GLOBAL. SESSION variables are set for the current connection only and do not affect others. GLOBAL variables apply to all connections created after you set such a variable. But setting a GLOBAL variable does not affect the current connection,[10] so you should either set both SESSION and GLOBAL variables or reconnect if you need to use a new value in the current connection.

You can set a session variable using the statement:

SET [SESSION] var_name=value

I put SESSION in square brackets because you can omit this keyword; the set command uses SESSION as the default.

To set a GLOBAL variable, use the statement:

SET GLOBAL var_name=value

When you test options, I suggest you try to use a SESSION variable whenever possible. After you are happy with the results, you can use a GLOBAL variable to change the running server’s configuration, and then change the configuration file so that this value will be applied after a restart.

NOTE

Using SET SESSION is also very helpful when you want to check the effect of an option on a particular query. In this case, you can set the variable before the query, test, and then return back to the default value using the statement SET [SESSION] variable_name=DEFAULT.

If an option is shared among threads, you can start by setting a GLOBAL variable, and then examine how the server behaves. After you are happy with the result, change the configuration file to include the new variable and its setting.

This method allows you to test changes without interrupting the application, because it puts your desired change into effect while delaying the server restart until a scheduled time.

There are a few options that cannot be set dynamically. In these cases, you have to restart the MySQL server, even if you just want to test their effects.


[10] There are few exceptions, such as SET GLOBAL general_log=1. Usually such exceptions do not have a SESSION equivalent.

How to Check Whether Changes Had an Effect

There are status variables that show the current server status. As opposed to configuration variables, these do not affect server behavior, but instead contain information about what is happening in the mysqld process. Status variables are read-only, i.e., it is the MySQL server that changes them, not the user. They show such things as how many queries were executed and of what kind, network traffic, how indexes are used (you can find an example in Queries That Modify Data), buffer usage, how many tables are open, how many temporary tables were created, and a lot of other useful information. I won’t describe each of them here, but will note which status variable to watch while covering the variables themselves later in this chapter, as I did in Queries That Modify Data.

In the context of altering variables, the status variables are useful mostly in Performance-Related Options. We will also discuss how to get information about what is going on in your MySQL configuration in Chapter 6.

Like other variables, status variables can apply to both individual sessions and all sessions (global). Session variables show the status for the current session, whereas global variables show the status since the server was started or since the last FLUSH STATUS command was executed.

NOTE

Some variables are not independent. For example, variables that control the query cache change nothing when query_cache_size is set to zero. When tuning such options, always think about the effect of the whole group, not just a single variable.

When you change a server option, it can affect changes to status variables. For example, if you change the table cache, you should look at the dynamics of the Open_tables and Opened_tables status variables. Opened_tables should not grow, whereas all Open_tables should be in the cache if the table cache is set properly.

WARNING

Sometimes a variable’s value that was just set can be discarded because the specified value was either too big or too small. If you suspect that your change had no effect, check whether it was really made by using the query SHOW [SESSION|GLOBAL] VARIABLES LIKE 'variable_name' or the query SELECT VARIABLE_VALUE FROM INFORMATION_SCHEMA.[SESSION|GLOBAL]_VARIABLES WHERE VARIABLE_NAME='variable_name'.

Descriptions of Variables

Now that you are familiar with risk-free tuning methods for server variables, we are ready to discuss a few important ones. This is not a complete guide, but a starting point for further research.

You can read the rest of this section in any order: go through it from beginning to end, check the particular topics of most interest to you at the moment, or even skip it and use it as a reference when you encounter a problem. I don’t describe every option, but concentrate on those that I’ve found are often used improperly or require an expanded understanding.

NOTE

At first, I doubted whether I should devote a section of this book to individual variables because each one is fully described in other sources. But the MySQL Reference Manual does not describe them specifically from a troubleshooting approach, so I decided to offer this short overview.

Options That Affect Server and Client Behavior

This section discusses general server options as well as options related to replication, connections, and storage engines. In the latter set, I’ll cover only the MyISAM and InnoDB storage engines.

Server-related options

These affect all connections and statements.

Limits and max_* variables

You already saw how max_allowed_packet affects an application earlier in Options That Limit Hardware Resources. Other options restrict the size of result sets. Examples include group_concat_max_len, which limits the number of bytes that the GROUP_CONCAT function can return. Just compare:

mysql> SELECT @@group_concat_max_len;

+------------------------+

| @@group_concat_max_len |

+------------------------+

| 1024 |

+------------------------+

1 row in set (0.00 sec)

mysql> SELECT group_concat(table_name) FROM tables WHERE

table_schema='mysql'\G

*************************** 1. row ***************************

group_concat(table_name):

columns_priv,db,event,func,general_log,help_category,help_keyword,help_relation,

help_topic,host,ndb_binlog_index,plugin,proc,procs_priv,servers,slow_log,

tables_priv,time_zone,time_zone_leap_second,time_zone_name,time_zone_transition,

time_zone_transition_type,user

1 row in set (0.15 sec)

and:

mysql> SET group_concat_max_len=100; 1

Query OK, 0 rows affected (0.00 sec)

mysql> SELECT group_concat(table_name) FROM tables WHERE

table_schema='mysql'\G

*************************** 1. row ***************************

group_concat(table_name): columns_priv,db,event,func,general_log,help_category,

help_keyword,help_relation,help_topic,host,ndb_

1 row in set, 1 warning (0.06 sec)

mysql> SHOW WARNINGS\G

*************************** 1. row ***************************

Level: Warning

Code: 1260

Message: Row 11 was cut by GROUP_CONCAT()

1 row in set (0.00 sec)

1

I decreased the value of group_concat_max_len to show an example that can fit in this book. In real life, this problem usually happens when the user runs this function on a large data set for which the default value is too small.

I won’t describe each variable of the max_* group. Just check their values if you find that mysqld is limiting either the size of the statement you’re sending or the results returned.

Permissions

Another possibility you should consider, if a statement fails, is whether your user has the permissions to issue it, or rights to a particular database or table. For example, the local_infile option can allow or disallow running the LOAD DATA LOCAL INFILE query. The MySQL server usually gives a clear error message to explain why one or another operation is not allowed.

SQL modes

The MySQL server defines SQL modes that can change how the server treats client input. You already saw how NO_ENGINE_SUBSTITUTION and STRICT_TRANS_TABLES can affect an application. Other modes can modify other behavior.

Here is another example that uses the ANSI_QUOTES mode. This mode tells the MySQL server to use quotes defined in the ANSI SQL standard instead of MySQL’s default. The problem I describe here happens not when this mode is in use, but when the user relies on the default empty mode in the hope that the server will reject ANSI quotes set by mistake.

mysql> SELECT @@sql_mode;

+------------+

| @@sql_mode |

+------------+

| |

+------------+

1 row in set (0.00 sec)

mysql> SELECT * FROM t1 WHERE "f1"=1;

Empty set, 1 warning (0.00 sec)

mysql> SET SQL_MODE='ansi_quotes';

Query OK, 0 rows affected (0.00 sec)

mysql> SELECT * FROM t1 WHERE "f1"=1;

+----+----------------------------------+

| f1 | f2 |

+----+----------------------------------+

| 1 | f9f760a2dc91dfaf1cbc95046b249a3b |

+----+----------------------------------+

1 row in set (0.21 sec)

In the first case where the default is in effect, the MySQL treats "f1" as a string and converts it to DOUBLE. This is a valid value in the default SQL mode, but is treated differently from what the user expects.

mysql> SHOW WARNINGS;

+---------+------+----------------------------------------+

| Level | Code | Message |

+---------+------+----------------------------------------+

| Warning | 1292 | Truncated incorrect DOUBLE value: 'f1' |

+---------+------+----------------------------------------+

1 row in set (0.09 sec)

When converted to a DOUBLE, the value "f1" is translated to 0, and “0” is not equal to “1”. In the ANSI_QUOTES mode, "f1" is treated as a field name, so the query works. This is very common issue that can be easily missed in complicated queries with multiple conditions.

§ When you meet “strange” query results, check the SQL mode and analyze whether it can be affecting your query.

I included several examples of SQL mode in this book to show how they can make different aspects of the server behave differently. I recommend that you study the list of SQL modes and what they do in the MySQL Reference Manual.

One detail worth mentioning is that, starting with version 5.1.38, the InnoDB Plugin has an innodb_strict_mode option, which, if set, turns on strict checks of data inserted into InnoDB tables. This behaves similarly to, but slightly differently from, the strict SQL modes. So if you are using the InnoDB Plugin, you need to check the description of this variable. The option is turned off by default.

Character sets and collations

Understanding these variables is critical for those who use MySQL to store data in non-English languages, which applies when the latin1 character set doesn’t suit your needs.

NOTE

The character set is a map that matches a character or symbol to a byte sequence that represents it. Collation is a sorting rule. A character set can have multiple collations.

This topic is huge, so I will not try to cover it here exhaustively, but will give you a few starting points.

Character set and collation support in MySQL is really good, but a lot of its aspects can be tuned, and therefore people often get confused by them. When you suspect a character-set-related error, I recommend you study the chapter about character sets and collations in the MySQL Reference Manual carefully. Usually you will find answers to your questions there.

The following example shows how changing just the collation for a table can affect your data:

mysql> SET NAMES latin1;

Query OK, 0 rows affected (0.13 sec)

mysql> CREATE TEMPORARY TABLE t1(f1 VARCHAR(255)) DEFAULT

CHARSET=latin1 COLLATE=latin1_german2_ci;

Query OK, 0 rows affected (0.23 sec)

mysql> INSERT INTO t1 VALUES('Sveta'), ('Andy');

Query OK, 2 rows affected (0.29 sec)

Records: 2 Duplicates: 0 Warnings: 0

mysql> CREATE TEMPORARY TABLE t2 AS SELECT 'Sveta' AS

f1;

Query OK, 1 row affected (0.21 sec)

Records: 1 Duplicates: 0 Warnings: 0

mysql> SELECT * FROM t1 JOIN t2 USING(f1);

ERROR 1267 (HY000): Illegal mix of collations (latin1_german2_ci,IMPLICIT) and

(latin1_swedish_ci,IMPLICIT) for operation '='

Why did the query with a JOIN fail? We specified the latin1_german2_ci collation for the first table, whereas the second one used the default collation for the connection.

I’ll return to this example in a moment, but first I’ll show two queries that are extremely helpful for diagnosing such issues:

mysql> SHOW VARIABLES LIKE '%char%';

+--------------------------+----------------------------------------------+

| Variable_name | Value |

+--------------------------+----------------------------------------------+

| character_set_client | latin1 |

| character_set_connection | latin1 |

| character_set_database | utf8 |

| character_set_filesystem | binary |

| character_set_results | latin1 |

| character_set_server | utf8 |

| character_set_system | utf8 |

| character_sets_dir | /Users/apple/mysql-5.1/share/mysql/charsets/ |

+--------------------------+----------------------------------------------+

8 rows in set (0.09 sec)

mysql> SHOW VARIABLES LIKE '%coll%';

+----------------------+-------------------+

| Variable_name | Value |

+----------------------+-------------------+

| collation_connection | latin1_swedish_ci |

| collation_database | utf8_general_ci |

| collation_server | utf8_general_ci |

+----------------------+-------------------+

3 rows in set (0.01 sec)

Run these every time you suspect something is wrong with the character set or collation, then analyze the result and context of the query. The general safe rule is to have all character_set_* variables, collation_* variables, and create options the same for any tables and connections that work together. The easiest way to set client options is to use the SET NAMES statement. Of course, there can be cases when you need different character sets or collations, but you should understand their effects.

If we go back to our collation_connection example, the different collations lie behind why the JOIN query cannot be executed. We can confirm this if we change the value of the variable:

mysql> SET COLLATION_CONNECTION='latin1_german2_ci';

Query OK, 0 rows affected (0.00 sec)

mysql> DROP TABLE t2;

Query OK, 0 rows affected (0.04 sec)

mysql> CREATE TEMPORARY TABLE t2 AS SELECT 'Sveta' AS

f1;

Query OK, 1 row affected (0.01 sec)

Records: 1 Duplicates: 0 Warnings: 0

mysql> SELECT * FROM t1 JOIN t2 USING(f1);

+-------+

| f1 |

+-------+

| Sveta |

+-------+

1 row in set (0.00 sec)

§ Check character set options and table definitions when you encounter issues while doing sorts or comparisons.

Operating system handling for lower_case* options

The lower_case_filesystem and lower_case_table_names options are similar to those that affect character sets. These variables determine how the operating system handles the case of database objects.

It is better not to touch these values, especially if your operating system is case-insensitive. Changing them can lead to unexpected behavior, as in the following example:

mysql> SELECT @@lower_case_table_names;

+--------------------------+

| @@lower_case_table_names |

+--------------------------+

| 0 |

+--------------------------+

1 row in set (0.10 sec)

mysql> CREATE TABLE Table1(F1 INT NOT NULL AUTO_INCREMENT PRIMARY

KEY) ENGINE=InnoDB;

Query OK, 0 rows affected (0.27 sec)

mysql> CREATE TABLE Table2(F1 INT, CONSTRAINT F1 FOREIGN KEY(F1)

REFERENCES Table1(F1)) ENGINE=InnoDB;

Query OK, 0 rows affected (0.04 sec)

mysql> \q

Bye

$mysqldump --socket=/tmp/mysql50.sock -uroot test Table1

Table2

-- MySQL dump 10.11

<skipped>

--

-- Table structure for table `Table1`

--

DROP TABLE IF EXISTS `Table1`;

SET @saved_cs_client = @@character_set_client;

SET character_set_client = utf8;

CREATE TABLE `Table1` (

`F1` int(11) NOT NULL auto_increment,

PRIMARY KEY (`F1`)

) ENGINE=InnoDB DEFAULT CHARSET=utf8;

SET character_set_client = @saved_cs_client;

--

-- Dumping data for table `Table1`

--

LOCK TABLES `Table1` WRITE;

/*!40000 ALTER TABLE `Table1` DISABLE KEYS */;

/*!40000 ALTER TABLE `Table1` ENABLE KEYS */;

UNLOCK TABLES;

--

-- Table structure for table `Table2`

--

DROP TABLE IF EXISTS `Table2`;

SET @saved_cs_client = @@character_set_client;

SET character_set_client = utf8;

CREATE TABLE `Table2` (

`F1` int(11) default NULL,

KEY `F1` (`F1`),

CONSTRAINT `F1` FOREIGN KEY (`F1`) REFERENCES `table1` (`F1`)

) ENGINE=InnoDB DEFAULT CHARSET=utf8;

SET character_set_client = @saved_cs_client;

%lt;skipped>

-- Dump completed on 2008-02-02 21:12:22

I have put the important part of the output in bold. A foreign key definition refers to a table named table1, but no table of that name exists. You could not restore this dump taken on a case-insensitive operating system (e.g., Windows) on a case-sensitive operating system (e.g., Linux).

It also is best to use consistent naming for tables in all statements and to not mix lowercase and uppercase in names on case-insensitive operating systems.

Init SQL

These options determine whether the server should execute some SQL statements automatically at various times:

init_file

Points to a file containing SQL statements that should be executed at server startup

init_connect

Contains an SQL string that should run when each client connects

init_slave

Contains an SQL string that should run when the server starts its SQL thread as a slave

There are two typical usage errors with these options.

The first problem is that it’s easy to forget about what you put in the options. Usually the options are used to set some defaults for connections. So if you get results different from those that are returned when the default options are in use, check to see whether you have set one or more of these variables.

You can also issue SHOW [GLOBAL] VARIABLES to find out which defaults your connection uses. If you connect using a programming API, check the variable values using the same API because other environments, such as the MySQL command-line client, might use a different configuration file and thus have different default values.

The content of the init_connect option is executed only if the connecting user does not have the SUPER privilege. This was done to let a user with the SUPER privilege connect even if init_connect contains errors. This is another common usage mistake, when a user connects as aSUPER user and expects the content of init_connect to be executed.

open_files_limit

This important option limits the number of file handles the MySQL server can open simultaneously. The higher this limit, the more table files and temporary tables you can have open, and therefore the greater the number of simultaneous connections you can handle. If this limit is too low for your environment, you will get errors when trying to connect, open a table, or execute a query that requires creating temporary tables.

Because the setting for this option reflects hardware limitations, we will discuss it further in the next chapter.

log_warnings

When turned on (nonzero), this option writes warnings into the server’s error logfile. These are not warnings that happen during SQL execution, but rather debugging messages that show what is going on inside the server.

If set to 2, this option tells the server to log connection errors. This is very important when you are troubleshooting situations where clients cannot connect or are losing their connections. The log cannot always identify the problem, but its warning message can often shed some light on what is going on. It is very important to have this option enabled on the master server when using replication because you can identify when a slave I/O thread loses its connection. This in turn can be a symptom of a network failure, which can lead to worse problems in the future.

When set to 1 (the default) on a slave, it prints its own diagnostic messages, such as its positions in the binary and relay logs and its replication status. Starting with version 5.1.38, you need to enable this option enabled in order for the slave to print information about statements that are not safe in statement-based replication. (Before 5.1.38, the slave printed such messages in any case.) Starting with 5.1.38, you can turn off this option (set it to zero) to get rid of the log if you are sure you don’t need the messages.

Replication options

These options determine the relationship between the master and slaves.

binlog-* and replicate-* filters

MySQL has the ability to filter objects subject to replication using the binlog-do-*, replicate-do-*, binlog-ignore-*, and replicate-ignore-* options. binlog-* options reduce the events that go into the binary logfile on the master, whereas replicate-* specify those that go into the relay logfile on the slave. Slaves also have replicate-wild-do-* and replicate-wild-ignore-*, which allow them to specify, through patterns, objects that should or should not be replicated.

The most common problems with these options are:

§ People forget they specified the options.

§ Filters specified in binlog-do-*, replicate-do-*, binlog-ignore-*, and replicate-ignore-* work if and only if USE dbname was called explicitly.

Symptoms of the problems are:

§ A particular query was not replicated.

§ An “unknown table xxx on query” error on the slave.

§ When you use Statement-Based Replication (SBR) and find that some queries were either not replicated or replicated by mistake, check whether you have set these options and issued USE dbname.

It’s always better to use the replicate-wild-* variants of these options, because these do not depend on calling USE.

Binary log formats

The binlog_format variable allows you to choose the format for replication: STATEMENT, ROW, or MIXED.

This is a dynamic variable that can be changed at the session level. If you don’t want to use the mode currently set as the default for a particular query, you can temporarily switch formats using SET binlog_format='row' or SET binlog_format='statement'.

binlog_direct_non_transactional_updates

This option specifies when updates to nontransactional tables should be written into the binary log.

By default, when transactions are used, MySQL writes updates to nontransactional tables into the transaction cache and flushes this cache to the binary log only after a commit. This is done so that the slave is more likely to end up with the same data as the master, even if updates on nontransactional tables depend on data in transactional tables and the master is updating the same tables in many concurrent threads simultaneously.

But this solution causes problems when another transaction causes changes based on data that was modified by an uncommitted parallel transaction in a nontransactional table. We discussed a similar issue in the example in Mixing Transactional and Nontransactional Tables. If you experience such a problem, you can turn this option on. Before doing this, be sure that data in nontransactional tables cannot be modified by any statement that uses a transactional table.

This is a dynamic variable that can be changed at the session level, so you can use it for particular statements. It works, and makes sense, only if statement-based replication is used.

log_bin_trust_function_creators

This option tells mysqld not to fire a warning when a user without SUPER privileges tries to create a nondeterministic function on the master. See the example of default behavior in Service Options.

binlog_cache_size and friends

This item covers the following options:

§ binlog_cache_size

§ binlog_stmt_cache_size

§ max_binlog_cache_size

§ max_binlog_stmt_cache_size

These are caches that hold transactional and nontransactional statements issued during a transaction before writing to the binary log. If max_binlog_cache_size is reached, the statement aborts with the error "Multi-statement transaction required more than 'max_binlog_cache_size' bytes of storage".

Check the Binlog_cache_use, Binlog_stmt_cache_use, Binlog_cache_disk_use, and Binlog_stmt_cache_disk_use status variables to find out how often the binlog cache was used and how often the size of the transaction exceeded Binlog_cache_use andBinlog_stmt_cache_use. When the size of the transaction exceeds the cache size, a temporary table is created to store the transaction cache.

slave_skip_errors

This option allows the slave SQL thread to run even when it encounters certain types of errors. For example, if the master is run with a forgiving SQL mode while the slave has a strict one, you can set slave_skip_errors to, say, 1366 (ERROR 1366 (HY000): Incorrect integer value) so the slave does not fail due to a data format inconsistency when one inserts a string into an integer field.

Using this option can lead to data inconsistencies between master and slave that are hard to diagnose, so if you meet such an issue, check whether the option was not set.

read_only

This option makes the slave server read-only. This means that only the slave SQL thread can update its data, and other connections can only read data. This is valuable to preserve data consistency on the slave. However, the option does not restrict a user with the SUPER privilege from changing tables. Additionally, all users are still allowed to create temporary tables.

Engine options

This section describes options specific to particular storage engines. I will discuss a few InnoDB and MyISAM options here. Options related to performance are covered in a subsequent section. In a troubleshooting situation, you should look through and acquaint yourself with all options of the storage engine you use.

InnoDB options

We will start with options of InnoDB storage engine.

innodb_autoinc_lock_mode

This option defines which locking mode InnoDB will use when inserting into auto-increment fields. There are three modes: traditional (which was used before version 5.1), consecutive (the default starting with 5.1), and interleaved. The safest is consecutive. The two others can be used for better performance, and traditional can also be used for backward compatibility.

I won’t describe the differences between these lock types here, because the MySQL Reference Manual contains detailed information about how each of them works. One thing you should keep in mind, though: if your application sets autoincrement values in a fashion that surprises you, check this mode and experiment with how different values affect autoincrementing. I actually don’t recommend switching from the safe consecutive mode to any other mode, but in a few environments this can be acceptable.

innodb_file_per_table

By default, InnoDB saves table and index data in a shared tablespace. Using this option, you can tell it to save the index and data for each table in separate files. The shared tablespace is still used for table definitions. This option takes effect only on tables created after it is set; tables created earlier continue to use the shared tablespace.

Using this variable is actually a good practice because it helps InnoDB tables operate more effectively. Besides letting you watch the actual space occupied by a table, it also lets you create partial backups using MySQL Enterprise Backup or even restore a table on a different MySQL installation using the method described in Chris Calender’s blog at http://www.chriscalender.com/?p=28.

innodb_table_locks

This variable defines how InnoDB handles table lock requests made by LOCK TABLES statements. By default (when it is set) it returns immediately and locks the table internally. When turned off (set to 0), it honors LOCK TABLE statements, so the thread does not return from LOCK TABLES … WRITE until all locks are released.

innodb_lock_wait_timeout

This is the number of seconds that InnoDB waits for a row lock before giving up. After innodb_lock_wait_timeout seconds, it returns the error "ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction" to the client. I frequently see situations where people set this variable to very large values to prevent their queries from failing as often, only to experience worse problems because many stalled transactions lock each other. Try to handle lock wait errors at the application level, and don’t set this value too high. The best value for this option depends on your application and should be about the amount of time your normal transaction takes. The default value of this variable, 50 seconds, can be too big for applications that need to return results almost immediately. This is true for the majority of web shops.

innodb_rollback_on_timeout

When a query is aborted due to a lock wait error, only the last statements rolled back, and the whole transaction is not aborted yet. You can change this behavior if you set this option to 1. In this case the transaction will be rolled back immediately after a lock wait timeout.

innodb_use_native_aio

This option, introduced in InnoDB Plugin 1.1, specifies whether InnoDB should use the native AIO interface on Linux or use its own implementation, called “simulated AIO.” If innodb_use_native_aio is set, InnoDB dispatches I/O requests to the kernel. This improves scalability because modern kernels can handle more parallel I/O requests than simulated AIO.

This option is on by default and should not be changed during normal operation. It can be turned off if you experience issues on operating systems with asynchronous I/O subsystems that prevent InnoDB from starting. A typical error message informing you to turn this option off is error while loading shared libraries: libaio.so.1: cannot open shared object file: No such file or directory.

innodb_locks_unsafe_for_binlog

This variable defines how InnoDB uses gap locking for searches and index scans. With the default value (set to 0), gap locking is enabled. If set to 1, the option disables gap locking for most operations. It works similar to the READ COMMITTED transaction isolation level, but is less tunable and should be avoided. Even if it allows you to handle problems with locking, it creates new problems when parallel transactions insert new rows into the gaps. So READ COMMITTED is recommended instead if you want to get that behavior. This variable cannot be set at session level, so it affects all transactions.

MyISAM options

We will discuss only two options here and return to the rest in the next section.

myisam_data_pointer_size

Sets the default pointer size used when creating MyISAM tables without specifying the MAX_ROWS parameter. The default value is 6, and the allowed range is 2 to 7. The larger the pointer, the more rows a table can have. The default value, 6, allows you to create tables that take up 256TB. If you get a "Table is full" error when using MyISAM tables, this means the pointer size is too small for your table data (see the sidebar ).

HOW BIG CAN THE TABLE BE?

You can use myisamchk -dvi to check exactly how big the table can become with a particular pointer size and how many rows it can store if the FIXED row format is used:

mysql> CREATE TABLE t1(f1 INT, f2 VARCHAR(255)) ENGINE=MyISAM;

Query OK, 0 rows affected (0.16 sec)

mysql> SET GLOBAL myisam_data_pointer_size=2;

Query OK, 0 rows affected (0.00 sec)

mysql> \q

Bye

C:\Program Files\MySQL\MySQL Server 5.5>.\bin\mysql -uroot test

Welcome to the MySQL monitor. Commands end with ; or \g.

Your MySQL connection id is 3

Server version: 5.5.13-enterprise-commercial-advanced

MySQL Enterprise Server - Advanced Edition (Commercial)

Copyright (c) 2000, 2010, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its

affiliates. Other names may be trademarks of their respective

owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> CREATE TABLE t2(f1 INT, f2 VARCHAR(255)) ENGINE=MyISAM;

Query OK, 0 rows affected (0.13 sec)

mysql> \q

Bye

C:\Program Files\MySQL\MySQL Server 5.5>.\bin\myisamchk.exe -dvi

"C:\ProgramData\MySQL\MySQL Server 5.5\Data\test\t1"

MyISAM file: C:\ProgramData\MySQL\MySQL Server 5.5\Data\test\t1

Record format: Packed

Character set: utf8_general_ci (33)

File-version: 1

Creation time: 2011-11-02 14:43:40

Status: checked,analyzed,optimized keys,sorted index pages

Data records: 0 Deleted blocks: 0

Datafile parts: 0 Deleted data: 0

Datafile pointer (bytes): 6 Keyfile pointer (bytes): 3

Datafile length: 0 Keyfile length: 1024

Max datafile length: 281474976710654 Max keyfile length: 17179868159

Recordlength: 774

table description:

Key Start Len Index Type Rec/key Root Blocksize

C:\Program Files\MySQL\MySQL Server 5.5>.\bin\myisamchk.exe -dvi

"C:\ProgramData\MySQL\MySQL Server 5.5\Data\test\t2"

MyISAM file: C:\ProgramData\MySQL\MySQL Server 5.5\Data\test\t2

Record format: Packed

Character set: utf8_general_ci (33)

File-version: 1

Creation time: 2011-11-02 14:44:35

Status: checked,analyzed,optimized keys,sorted index pages

Data records: 0 Deleted blocks: 0

Datafile parts: 0 Deleted data: 0

Datafile pointer (bytes): 2 Keyfile pointer (bytes): 3

Datafile length: 0 Keyfile length: 1024

Max datafile length: 65534 Max keyfile length: 17179868159

Recordlength: 774

table description:

Key Start Len Index Type Rec/key Root Blocksize

myisam_recover_options

This option tells the MySQL server to check, each time it opens a MyISAM table, whether the table is corrupted or was not closed properly. If the check fails, MySQL runs CHECK TABLE on it and, if needed, repairs it. Possible values are OFF, DEFAULT (not the default for this option, but denoting a recovery method without backup, forcing a quick check), BACKUP (creates a backup of the table data’s .MYD file), FORCE (instructs the server to run a recovery action, even if there is a risk of losing one or more rows from the .MYD file), and QUICK (tells the server not to run a recovery action if there are no deleted blocks in the table). You can use two or more options at the same time. The most popular value for this variable, if set, is BACKUP,FORCE because it fixes all errors and is safe because it creates a backup file. By default, this option is turned off.

Connection-related options

The most important of these options from a troubleshooting standpoint concern timeouts. I’ll also discuss some options related to security that commonly cause (or solve) problems.

Timeouts

You are already acquainted with innodb_lock_wait_timeout, which interrupts a query that is waiting for a row lock.

A similar option is lock_wait_timeout, which applies to metadata locks. This lock is set for all operations that acquire the metadata lock: DML (Data Manipulation Language statements, such as INSERT, UPDATE, and DELETE), DDL, LOCK TABLES, and so on. Its default value is 3153600 seconds, which is 1 year. So by default, MDL locks effectively never die. However, you can change this value to anything, starting from 1 second. It is a dynamic variable that can be changed at the session level.

There is also a set of timeout variables that are independent from queries you run and that limit waiting times for result sets, client data, or authorization packets. These are:

connect_timeout

The timeout used when the MySQL server and client exchange authorization packets. Starting with version 5.1, the default value is 10 seconds.

interactive_timeout

How much time to wait for activity from an interactive client before killing the connection, i.e., how long the server will wait to read the next command. The term “interactive client” is used for clients that directly run queries sent by a human. For instance, the MySQL command-line client, mysql, is interactive, whereas a web application is not by default. When writing an application, you should explicitly specify if you want it to be considered interactive.

wait_timeout

How much time to wait for activity from any client before killing the connection. If a client is interactive and the value of interactive_timeout is different from wait_timeout, interactive_timeout applies.

net_read_timeout

How much time to wait for an answer from a client writing to the MySQL server. For example, this timeout applies if the client is executing a large insert.

net_write_timeout

How much time to wait for an answer from a client that is reading from the server. For example, when a client sends a SELECT query and is reading its result, this timeout kills the connection if the client fails to read data that is waiting for this amount of time. If a client needs to do some job before processing results, check whether that job lasts longer than this timeout.

The symptom of hitting most of these limits is a "MySQL server has gone away" error or a "Lost connection to MySQL server during query" error. The exception is connect_timeout. If you hit this limit, you will get the error "Lost connection to MySQL server at 'reading authorization packet'". You could get a similar error when the slave I/O thread cannot connect to the master.

If you hit one of the limits described earlier, don’t just increase them blindly; search for the actual reason for the problem. If a timeout is caused by a flaky network, you should fix the network rather than increasing the timeout. Here is a course of action that can be used when you suspect a timeout problem: Temporarily increase the *timeout variable, and rerun the application. If the timeout happens less frequently under these conditions, you can confirm that a timeout is the problem, but search for the real cause of the error. This can be a long-running application, slow access to a huge table, or a flaky network.

Security-related options

These options control permissions and other aspects of MySQL server security.

skip-grant-tables

Another issue with client authorization kicks in when the client lacks the proper user permissions when connecting to the server. We discussed this problem a bit in Permission Issues. Here I just want to mention how to save yourself if you forget a password. You need to start the server with the skip-grant-tables option, edit the privilege tables in the mysql database manually, and then run the query FLUSH PRIVILEGES. After that, new privileges are in effect. Don’t forget to restart the server without the skip-grant-tables option. Otherwise, anybody can connect to your server after the restart. To do the operation safely, include the option skip_networking along with skip-grant-tables, so that only local connections are allowed during the time when there are no restrictions on access to the MySQL server.

safe-user-create

Does not allow the creation of users using the GRANT statement, unless the user adding the grant has the INSERT privilege into the mysql.user table.

secure_auth

Does not allow clients earlier than version 4.1 to connect to modern servers. Version 4.1 was chosen because a new security model was added to the connection protocol at that time.

secure_file_priv

Restricts the LOAD_FILE function and the LOAD DATA and SELECT … INTO OUTFILE statements to using only the specified directory.

Performance-Related Options

Here I’ll offer a short overview of options that affect performance. Again, I am not describing all of them, but just those that are used most frequently. In contrast with options from the previous section, these options don’t lead to different results.[11]

First I will discuss options that affect server behavior as a whole, then some engine-specific options.

Buffers and maximums

The first group of options controls the amount of memory used internally by the server and certain upper limits on memory usage.

join_buffer_size

This is the minimum size of the buffer allocated for joins that use plain index scans, and range scans, and joins that do not use indexes. The buffer is allocated for each full join between two tables. Thus, a query joining two tables allocates one buffer, a query joining three tables allocates two buffers, and so on. This can be used as a session variable and can be set for a particular join.

To find out whether you need to increase join_buffer_size, you can check the Select_scan status variable, which contains the number of joins that do a full scan of the first table, as well as Select_full_range_join, which contains the number of joins that use a range search. The values of these status variables will not change if you change the value of join_buffer_size, so you can use them only to find out if you need a large join_buffer_size, not to measure the effectiveness of a change to that value.

net_buffer_length

The size of a buffer that the server creates right after a client connects to hold the request and the result. This size is increased to max_allowed_packet when needed. You usually don’t need to change the default value (16384 bytes), but you should keep the value in mind when setting the max_connections option.

query_prealloc_size

This buffer is allocated for statement parsing and execution. The buffer is not freed between statements. It makes sense to increase this buffer if you run complex queries, so that mysqld does not have to spend time allocating memory during query execution. Increase it to the size in bytes of your largest query.

read_buffer_size

Each thread that does a sequential scan allocates this buffer for each table scan.

read_rnd_buffer_size

This controls the size of the buffer that holds read results between sorting them and sending them to the client. A large value can improve the performance of queries that include ORDER BY.

sort_buffer_size

Each thread that needs to do a sort allocates this buffer for it. To find out whether you need to increase the size of this buffer, check the Sort_merge_passes status variable. You can also check the values of Sort_range, Sort_rows, and Sort_scan to find out how many sorting operations you do.

These status variables show only the number of sort operations. To find out which size to use for the buffer, you need to check how many rows one or another query can sort and multiply it by the row size. Or simply try the different values until Sort_merge_passes stops growing.

WARNING

The sort_buffer_size buffer is allocated too often, so having a large global value can decrease performance rather than increase it. Therefore, it is better not to set this option as a global variable, but increase it when needed using SET SESSION.

sql_buffer_result

When this variable is set, the server buffers the results of each SELECT in temporary tables. This can help to release table locks earlier when a client requires a lot of time to retrieve results. After results are stored in the temporary table, the server can release the locks on the original table, making it accessible to other threads while the first client is still retrieving results.

To find out whether a query is spending a lot of time sending its result set, run SHOW PROCESSLIST and check the amount of time the query is in the state “Sending data.”

WARNING

A status of “Sending data” in SHOW PROCESSLIST output means a thread is reading and processing rows, then sending data to the client. As you see, this is more complicated than the words imply, and does not necessarily mean a query is stuck sending data.

thread_cache_size

The number of threads that should be cached for future use. When a client disconnects, usually its threads are destroyed. If this option is set to a positive value, that number of threads will be put in a cache upon disconnect. This option does not dramatically improve performance on systems with a good thread implementation, but still can be useful if an application uses hundreds of connections.

thread_stack

The stack size for each thread. If set too small, this value limits the complexity of SQL statements, the recursion depth of stored procedures, and other memory-consuming actions on the server. The default value (192KB on 32-bit systems and 256KB on 64-bit systems) works fine for most installations. Increase this variable if you start getting error messages like "Thread stack overrun".

tmp_table_size

The maximum size of the internal temporary table in memory. By default, the server sets this to the minimum of max_heap_table_size and tmp_table_size. Increase this variable if you have enough memory and the status variable Created_tmp_disk_tables is increasing. Having all results that require a temporary table in memory can improve performance a lot.

query_cache_size

The size of the buffer where the MySQL server stores queries and their results. Increasing the value can increase performance because after the query is inserted into the cache, subsequent executions of the same query will take results from the cache, eliminating the work of query parsing, optimization, and execution. But don’t set this variable too large, because when the query needs to be removed from the cache—i.e., when you have modified data in the table—the mutex contention can block parallel queries. This is especially true on multicore machines and highly concurrent environments when more than eight user sessions access the query cache concurrently. Reasonable values for this variable are less than 100 MB, although you can set it much larger if you aren’t frightened by a possible sudden slow down.

NOTE

Best practice can be to set query_cache_size to a reasonably small value and periodically defragment it using the query FLUSH QUERY CACHE, rather than increasing this value.

To determine whether the query cache is being used effectively, check the status variables Qcache_free_blocks, Qcache_free_memory , Qcache_hits, Qcache_inserts, Qcache_lowmem_prunes, Qcache_not_cached, Qcache_queries_in_cache, andQcache_total_blocks.

table_definition_cache

The number of table definitions that are stored in the cache. When you have a lot of tables, you can increase this value. Tune it if necessary to keep Opened_table_definitions smaller than or equal to Open_table_definitions since the most recent table flush (FLUSH TABLESquery).

table_open_cache

The number of table descriptors that are stored in the cache. Try to tune this option so that Opened_tables remains smaller than or equal to Open_tables.

Options that control the optimizer

These variables can be set at the session level, so you can experiment with how they affect particular queries.

optimizer_prune_level

If this variable is on, the optimizer prunes less effective plans discovered by intermediate searches. If the variable is off, the optimizer uses an exhaustive search. The default value is 1 (on). Change the option if you suspect the optimizer is not choosing the best plan for your query.

optimizer_search_depth

The maximum depth of the search performed by the optimizer. The larger this value, the more likely it is that the optimizer can find the best plan for a complex query. The price for raising the value is the time spent by the optimizer while searching for a plan. If set to 0, the server automatically picks a reasonable value. The default is 62 (the maximum value).

optimizer_switch

This variable controls various optimizer features. I will touch on them only slightly here. Intelligent use of this variable requires knowledge of how the optimizer works and a lot of experimentation.

index_merge

Enables or disables index merge optimization. This optimization retrieves rows from several merge scans and merges their results into one. This is shown as “Merge” in the Type column of EXPLAIN output.

index_merge_intersection

Enables or disables the index merge intersection access algorithm. This algorithm is used when a WHERE clause contains several range conditions that use a key and are combined with the AND keyword. An example is:

key_col1 < 10 AND key_col2 = 'foo'

Even though the key_col2 = 'foo' comparison involves a single value, the optimizer treats it as a range condition, as explained in the MySQL Reference Manual in the section “The Range Access Method for Single-Part Indexes” (http://dev.mysql.com/doc/refman/5.5/en/range-access-single-part.html).

index_merge_union

Enables or disables the index merge union access algorithm. This algorithm is used when a WHERE clause contains several range conditions that use a key and are combined with the OR keyword. An example is:

key_col1 = 'foo' OR (key_col2 = 'bar' AND key_col3 = 'baz')

index_merge_sort_union

Enables or disables the index merge sort union access algorithm. This algorithm is used when a WHERE clause contains several range conditions that use a key and are combined with the OR keyword, but where the index merge union access algorithm is not applicable. An example is:

(key_col1 > 10 OR key_col2 = 'bar') AND key_col3 = 'baz'

max_join_size

Prevents the optimizer from executing SELECT statements that it estimates will exceed certain limits (for instance, examining more than max_join_size rows). This option is useful while debugging when you want to find which queries do not use indexes.

max_length_for_sort_data

When doing ORDER BY optimization on conditions where indexes cannot be used, MySQL uses a filesort algorithm. There are two variations of this algorithm. The original algorithm reads all matched rows and stores pairs of keys and row pointers in a buffer whose size is limited tosort_buffer_size. After the values in the buffer are sorted, the algorithm reads rows from the table a second time, but in sorted order. The disadvantage of this algorithm is that rows must be read twice.

The modified approach reads the whole row into the buffer, then sorts the keys and retrieves rows from the buffer. The problem with this approach is that the result set is usually larger than sort_buffer_size, so disk I/O operations make the algorithm slower for large data sets. Themax_length_for_sort_data variable limits the size of the pairs for this algorithm, so the original algorithm is used if the total size of the extra columns in the pairs exceeds this limit.

High disk activity together with low CPU activity is a signal that you need to lower the value of this variable.

Check the “ORDER BY Optimization” part of the MySQL Reference Manual for further details.

max_seeks_for_key

Sets the threshold, in terms of the number of rows that a table scan must check, for when to use a key instead of a table scan. Setting this option to a small value, such as 100, can force the optimizer to prefer index lookups over table scans.

max_sort_length

Sets the number of initial bytes taken from a BLOB or TEXT value when doing a sort. Latter bytes are ignored.

Engine-related options

The variables in this section affect the performance of a specific storage engine. As elsewhere in this book, we consider only InnoDB and MyISAM options.

InnoDB options

We will start with options of InnoDB storage engine as before.

innodb_adaptive_hash_index

Disables or enables (default) InnoDB adaptive hash indexes. In most cases it is good to have this option on, but there are a few known exceptions when an adaptive hash index can decrease performance, for example, when the number of similar query results is huge and this index takes 30% or more of the buffer pool. This information is shown in the InnoDB monitor output. I will not describe all of them here, because considerations may change as InnoDB evolves, but I recommend you search the Web for actual test cases if you suffer from bad performance.

innodb_additional_mem_pool_size

This pool holds information about the data dictionary and internal data structures. In general, the more tables you have, the larger this option should be. But because InnoDB writes messages into the error log when this pool is too small, you should wait to see these messages before tweaking the value.

innodb_buffer_pool_size

The size of the memory that InnoDB allocates to store data, indexes, table structures, adaptive hash indexes, and so on. This is the most important option for InnoDB performance. You can set it to up to 80% of your physical RAM. Ideally, the buffer would be large enough to contain all your actively used InnoDB tables, along with extra space. Take other buffers into account, too, and find a good balance.

Status variables matching the pattern Innodb_buffer_pool_% show the current state of the InnoDB buffer pool.

innodb_buffer_pool_instances

This option sets the number of instances that the buffer pool should be split into. Each instance has its own free lists, flush lists, lists of objects stored using least recently used algorithms, and other data structures, and is protected by its own mutex. Setting this variable greater than 1 can improve concurrency on large systems. The size of each instance is innodb_buffer_pool_size divided by innodb_buffer_pool_instances, and should be at least 1GB. This option does not take effect if innodb_buffer_pool_size is less than 1GB.

innodb_buffer_pool_instances splits the buffer pool mutex, so if you have eight or more concurrent sessions that access the InnoDB buffer pool concurrently, set it at least to 4, then up to 16. The number depends on the value of innodb_buffer_pool_size and the RAM available on your box.

innodb_checksums

By default, InnoDB uses checksum validation on all pages read from disk. This lets it immediately identify whether a datafile was corrupted due to a broken disk or some other intervention. Usually you should keep this feature on, but in rare cases when you don’t care about data (for instance, a read-only slave that is not used as a backup), you can get a performance improvement by turning it off.

innodb_commit_concurrency

The number of threads that can commit at the same time. The default value is 0 (unlimited).

innodb_thread_concurrency

The number of threads that can run concurrently inside of InnoDB. Don’t mix this value up with the number of connection threads the MySQL server creates. The default value is 0: infinite concurrency or no concurrency checking.

Although more threads running in parallel generally means higher performance, you can experience mutex contentions if you run many concurrent user sessions in parallel. Usually you should not worry about this variable if you don’t have more than 16 concurrent user sessions. If you have more, you need to monitor for mutex locks by querying the Performance Schema or running a SHOW ENGINE INNODB MUTEX query.

If mutex contentions appear, try to limit this variable to 16 or 32. Alternatively, place the mysqld process into a task set on Linux or a processor set on Solaris, and limit it to fewer cores than the whole box has. This is the best course of action on a system with more than eight cores. Alternatively, you can use the Thread Pool Plugin (see sidebar).

THE THREAD POOL PLUGIN

Since version 5.5.16, commercial distributions of MySQL include the Thread Pool Plugin.

By default, the MySQL server creates a new thread for each user connection. If a lot of user connections are created, many threads are running in parallel and context switching overhead becomes high. This can lead to resource contention. For example, for InnoDB this increases the time needed for holding mutexes.

The Thread Pool Plugin provides an alternative way to handle threads. It places all connection threads in groups, the number of which is limited by the variable thread_pool_size, and makes sure only one thread per group is executed at any time.[12] This model reduces overhead and greatly improves performance.

You will find more details about the Thread Pool Plugin in the MySQL Reference Manual.

innodb_concurrency_tickets

When a thread is permitted to enter InnoDB, it receives this number of concurrency tickets, which permit it to leave and re-enter InnoDB until it uses up these tickets. The default is 500. After using up its tickets, a thread is placed into a queue of threads waiting to receive a new group of tickets.

innodb_doublewrite

By default, InnoDB stores data twice: first to the doublewrite buffer, and then to datafiles. Like innodb_checksums, this safety option can be turned off to get increased performance on installations where data safety is not the first priority.

WARNING

When set, the variable innodb_doublewrite prevents InnoDB data corruption. Therefore, do not switch it off until absolutely necessary.

The Innodb_dblwr_writes and Innodb_dblwr_pages_written status variables show the number of doublewrite operations and the number of pages written, respectively.

innodb_flush_log_at_trx_commit

Defines when changes are written to the logfile and flushed to disk. If set to 1 (the default), changes are written and flushed at each transaction commit. For better performance, you can change this value to 0 (write to log and flush once per second, and do nothing on transaction commit) or 2 (write to file at each commit, but flush once per second). Note that only option 1 is ACID-compliant.

The Innodb_os_log_fsyncs status variable stores the number of fsync() operations done to the logfile. Innodb_os_log_pending_fsyncs contains the number of pending fsync() writes. Innodb_log_writes and Innodb_os_log_pending_writes contain the number of writes and pending writes, respectively.

innodb_flush_method

By default, fdatasync() is used to flush datafiles and fsync() is used to flush logfiles to disk. This value can be changed to one of the following:

O_DSYNC

The operating system uses O_SYNC to open and flush the logfiles, and fsync() to flush the datafiles.

O_DIRECT

The operating system uses O_DIRECT to open the datafiles and fsync() to flush them.

Changing the value of innodb_flush_method can either improve or slow down performance, so test it carefully in your environment.

innodb_io_capacity

An upper limit to the I/O activity performed by the background InnoDB task. The default value of 200 is a good choice for most modern systems, but it can be tuned based on the number of I/O operations the system can perform simultaneously. Increasing this value makes sense on fast storage.

innodb_log_buffer_size

The size of the buffer that InnoDB uses to write to the logfiles on disk. When the buffer is full, operations should wait for it to be flushed before continuing. Increasing this variable can save disk I/O operations, but this makes sense only if you have big transactions.

The Innodb_log_waits status variable contains the number of times this buffer was too small for the number of necessary I/O operations.

innodb_log_file_size

The size of each logfile. Large logfiles reduce checkpoint activity and save disk I/O. However, large logfiles can drastically slow recovery after a crash.[13] Sensible values range from 1 MB up to, but less than, innodb_buffer_pool_size/log_files_in_group. The combined size of all logfiles must be less than 4GB.

Best practice is to store InnoDB logfiles, datafiles, and, if used, binary logfiles on different disks, so if one of these devices die, you will not lose all of them at once.

innodb_open_files

This variable is meaningful only when you use innodb_file_per_table. innodb_open_files is the number of .ibd files that InnoDB can open at the same time. The default value is 300. It makes sense to increase it to the total number of InnoDB tables.

innodb_read_io_threads

The number of I/O threads available for InnoDB read operations. These operations handle read-ahead: I/O requests that asynchronously prefetch a group of pages into the InnoDB buffer pool, then purge and insert buffer operations. The default value is 4.

innodb_write_io_threads

The number of I/O threads available for InnoDB to write dirty pages from the buffer. The default is 4.

innodb_stats_method

How the server treats NULLs when collecting statistics on index values. This affects the cardinality of the index, and therefore the query plans created by the optimizer.

innodb_stats_on_metadata

When this variable is enabled (default), InnoDB updates its statistics at each metadata statement, such as SHOW TABLE STATUS or SHOW INDEX, or when any connection issues a query on the INFORMATION_SCHEMA tables TABLES or STATISTICS, which select information about an InnoDB table. If this variable is enabled, these queries have the same effect on table statistics as if you had run ANALYZE TABLE after each query. You can disable this variable if the server calls such statements frequently or selects databases with a large number of tables. But when the variable is disabled, table statistics can become out of date.

innodb_stats_sample_pages

The number of sampled index pages used by the MySQL Optimizer to calculate index distribution statistics, such as when ANALYZE TABLE is called. Increase this variable (the default is 8) if you suspect that the cardinality is being calculated improperly. But note that increasing this variable can increase the time needed to open a table if innodb_stats_on_metadata is enabled.

MyISAM options

In this section, we will discuss options which can affect performance of MyISAM storage engine.

myisam_max_sort_file_size

The maximum temporary file size that MyISAM can use when it re-creates a MyISAM index. The default value is 2GB. If this value is exceeded, MySQL will use a key cache, which can slow down index creation. The temporary file is a disk file, so it’s limited only by disk space.

myisam_use_mmap

When this variable is set, the server uses memory mapping when reading and writing MyISAM tables. The default behavior is using system calls for these operations. Although myisam_use_mmap usually improves performance a lot, there are couple of known bugs, so test your application after setting this variable.

myisam_mmap_size

The maximum amount of memory that can be used for memory mapping of compressed MyISAM files. The defaults are large: 4294967295 on 32-bit systems and 18446744073709547520 on 64-bit systems. You can decrease this value to avoid swapping if you use many compressed MyISAM tables.

myisam_sort_buffer_size

The size of the buffer allocated when sorting or creating MyISAM indexes during REPAIR TABLE, CREATE INDEX, or ALTER TABLE operations.

myisam_stats_method

How the server treats NULLs when collecting statistics on index values. This affects the cardinality of the index, and therefore the query plans created by the optimizer.

bulk_insert_buffer_size

The size of a special tree-like cache that MyISAM uses for bulk inserts: INSERT … SELECT, INSERT … VALUES (…), (…), … and LOAD DATA INFILE statements.

key_buffer_size

Index blocks for MyISAM tables are buffered and shared between threads. This variable controls the size of that buffer. You can create multiple key buffers. Search for this variable description and read about the key cache in the MySQL Reference Manual.

preload_buffer_size

The size of the buffer that is allocated to preload indexes.

Calculating Safe Values for Options

When you try to optimize server performance by increasing buffers or maximums, it is crucial to think globally about memory use. Large buffers can crash the MySQL server with an “Out of memory” error. In this section I offer formulas that will help you calculate whether you are exceeding available memory. I will not describe the options themselves in this part. You can refer to previous sections or to the MySQL Reference Manual for more detailed descriptions. Calculations depend on when an option is allocated and whether it is shared, so I divide them into relevant categories in this section.

Options set for the whole server

These options are global, affecting all connections and queries. Some are allocated at server startup, whereas others take effect later, such as the query cache, which is initially zero and grows until it reaches its maximum value. It can take a bit of time until the MySQL server reaches all the limits and fully allocates all the memory you allow. Therefore, you should calculate the amount of RAM mysqld can acquire and add up all the buffer sizes to make sure you don’t exceed it.

The following is the list of memory buffers allocated for the whole server:

§ query_cache_size

§ innodb_additional_mem_pool_size

§ innodb_buffer_pool_size

§ innodb_log_buffer_size

§ key_buffer_size

Use the following formula to calculate how much RAM in megabytes you need to allocate these buffers:

SELECT (@@query_cache_size + @@innodb_additional_mem_pool_size +

@@innodb_buffer_pool_size + @@innodb_log_buffer_size + @@key_buffer_size)/(1024*1024);

The server also has options that limit the number of file descriptors and how many threads can be cached. You can skip them for this calculation because the amount of memory they allocate is just the size of a pointer on the system multiplied by the quantity of items allocated, a total that is small enough to be ignored on modern systems. I just list them here for reference:

§ thread_cache_size

§ table_definition_cache

§ table_open_cache

§ innodb_open_files

Thread options

These options are allocated on a per-thread basis. So, the server can allocate max_connections*sum(thread options). Set max_connections and these options to make sure that the total amount of physical RAM - max_connections*sum(thread options) - options for whole server is greater than zero. Leave some RAM for options in the third group and a bit more for background operations, which cannot be controlled by variables.

Here is the list of thread options:

§ net_buffer_length

§ thread_stack

§ query_prealloc_size

§ binlog_cache_size

§ binlog_stmt_cache_size

Use the following formula to calculate how much RAM in megabytes you need in order to allocate them:

SELECT @@max_connections * (@@global.net_buffer_length + @@thread_stack +

@@global.query_prealloc_size + @@binlog_cache_size + @@binlog_stmt_cache_size) /

(1024 * 1024)

Or, if you are on a version older than 5.5.9 (the version in which the variable binlog_stmt_cache_size was introduced):

SELECT @@max_connections * (@@global.net_buffer_length + @@thread_stack +

@@global.query_prealloc_size + @@binlog_cache_size) / (1024 * 1024)

Buffers allocated for a specific operation

These buffers are allocated as needed when the server has to carry out a particular operation. It is hard to calculate the exact amount of RAM they can allocate. Analyze your queries to find out which require a lot of resources, and calculate something like the following:

(buffer size) * (number of buffers allocated for particular kind of query)

* (number of such queries that can be executed in parallel)

Do this for all variables, and calculate the sum of the results.

It is good practice to keep these options small, so long as they are adequate for most queries. Then, if a particular query needs more memory, just increase the variable’s value for that session. For example, if you need to set max_join_size really high for a statistics query that you run once per week, there is no sense to set it globally; set it just before running the query. Even with this precaution, don’t forget about memory usage as a whole.

Some of these options are allocated once per thread. These are:

§ read_rnd_buffer_size

§ sort_buffer_size

§ myisam_mmap_size

§ myisam_sort_buffer_size

§ bulk_insert_buffer_size

§ preload_buffer_size

Others can be allocated more than once per thread. These are:

§ join_buffer_size

§ read_buffer_size

§ tmp_table_size

You can use the following formula to calculate the maximum amount of memory in megabytes that the MySQL server can allocate for such options:

set @join_tables = YOUR_ESTIMATE_PER_THREAD;

set @scan_tables = YOUR_ESTIMATE_PER_THREAD;

set @tmp_tables = YOUR_ESTIMATE_PER_THREAD;

SELECT @@max_connections * (@@global.read_rnd_buffer_size +

@@global.sort_buffer_size + @@myisam_mmap_size +

@@global.myisam_sort_buffer_size + @@global.bulk_insert_buffer_size +

@@global.preload_buffer_size + @@global.join_buffer_size * IFNULL(@join_tables,

1) + @@global.read_buffer_size * IFNULL(@scan_tables, 1) +

@@global.tmp_table_size * IFNULL(@tmp_tables, 1)) / (1024 * 1024)

Remove from this formula those options that are not suitable for your environment.

To wrap up this chapter, here is a comprehensive formula that calculates the maximum amount of RAM in megabytes that your MySQL installation can use:

set @join_tables = YOUR_ESTIMATE_PER_THREAD;

set @scan_tables = YOUR_ESTIMATE_PER_THREAD;

set @tmp_tables = YOUR_ESTIMATE_PER_THREAD;

SELECT (@@query_cache_size + @@innodb_additional_mem_pool_size +

@@innodb_buffer_pool_size + @@innodb_log_buffer_size + @@key_buffer_size +

@@max_connections * (@@global.net_buffer_length + @@thread_stack +

@@global.query_prealloc_size + @@global.read_rnd_buffer_size +

@@global.sort_buffer_size + @@myisam_mmap_size +

@@global.myisam_sort_buffer_size + @@global.bulk_insert_buffer_size +

@@global.preload_buffer_size + @@binlog_cache_size +

@@binlog_stmt_cache_size + @@global.join_buffer_size * IFNULL(@join_tables,

1) + @@global.read_buffer_size * IFNULL(@scan_tables, 1) +

@@global.tmp_table_size * IFNULL(@tmp_tables, 1))) / (1024 * 1024)

Or, for versions older than 5.5.9:

set @join_tables = YOUR_ESTIMATE_PER_THREAD;

set @scan_tables = YOUR_ESTIMATE_PER_THREAD;

set @tmp_tables = YOUR_ESTIMATE_PER_THREAD;

SELECT (@@query_cache_size + @@innodb_additional_mem_pool_size +

@@innodb_buffer_pool_size + @@innodb_log_buffer_size + @@key_buffer_size +

@@max_connections * (@@global.net_buffer_length + @@thread_stack +

@@global.query_prealloc_size + @@global.read_rnd_buffer_size +

@@global.sort_buffer_size + @@myisam_mmap_size +

@@global.myisam_sort_buffer_size + @@global.bulk_insert_buffer_size +

@@global.preload_buffer_size + @@binlog_cache_size +

@@global.join_buffer_size * IFNULL(@join_tables,1) + @@global.read_buffer_size *

IFNULL(@scan_tables, 1) + @@global.tmp_table_size * IFNULL(@tmp_tables, 1))) /

(1024 * 1024)

Please note that the formulas work only if the values are small enough. If they are large, you either need to convert each variable to megabytes or cast them as UNSIGNED INTEGER. Even casting to UNSIGNED INTEGER won’t help if any variable can exceed the maximum unsigned integer value, which is 18446744073709547520. I didn’t take these possible overflows into account, because I wanted the formulas to be readable and clear. It also makes sense to remove some of the variables from the calculation if you don’t use those buffers or features. For example, instead of using the default value of myisam_mmap_size, use the maximum size of the MyISAM tables that one thread can use instead.


[11] One exception is the optimizer parameters in EXPLAIN statements.

[12] This is not a hard limit, and sometimes more than one thread per group is executing.

[13] This is not 100% true anymore, because the InnoDB Plugin 1.0.7 introduced improvements that speed up crash recovery.