MySQL in a Nutshell (2008)

Part II. SQL Statements and Functions

Chapter 8. Replication Statements and Functions

This chapter includes a tutorial on setting up and using replication, a list of SQL statements and functions used specifically with replication, and an explanation of replication states that will be useful for checking whether replication is operating as needed. The replication SQL statements and functions covered in this chapter are:

CHANGE MASTER TO, LOAD DATA FROM MASTER, LOAD TABLE...FROM MASTER, MASTER_POS_WAIT(), PURGE MASTER LOGS, RESET MASTER, RESET SLAVE, SET GLOBAL SQL_SLAVE_SKIP_COUNTER, SET SQL_LOG_BIN, SHOW BINARY LOGS,SHOW MASTER LOGS, SHOW BINLOG EVENTS, SHOW MASTER STATUS, SHOW SLAVE HOSTS, SHOW SLAVE STATUS, START SLAVE, STOP SLAVE.

Merits of Replication

One of the difficulties of maintaining a large and active MySQL database is making clean backups without having to bring down the server. Performing a backup while a server is running can slow down a system considerably. Additionally, backups made on active servers can result in inconsistent data because a related table may be changed while another is being copied. Taking down the server ensures consistency of data, but it interrupts MySQL service to users. Sometimes this is necessary and unavoidable, but daily server outages for backing up data may be an unacceptable choice. A simple alternative is to set up replication of MySQL, so that one or more redundant servers maintain a consistent and continuous copy of the main MySQL server’s databases, and can be taken down for backups while the main server continues serving the users.

Typically, replication is primarily a matter of configuring multiple servers to the one where users submit their updates, known in this context as a master server, which houses the data and handles client requests. The server logs all data changes to a binary log, locally. The master in turn informs another MySQL server (a slave server), which contains a copy of the master’s databases, and of any additions to its binary log. The slave in turn makes these same changes to its databases. The slave can either reexecute the master’s SQL statements locally, or just copy over changes to the master’s databases. There are other uses for replication (such as load balancing), but the concern of this tutorial is using replication for data backups and resiliency. Also, it’s easy to set up multiple slaves for each server, but one is probably enough if you’re using replication only for backups.

As a backup method, you can set up a separate server to be a slave, and then once a day (or however often you prefer) turn off replication to make a clean backup of the slave server’s databases. When you’re finished making the backup, replication can then be restarted and the slave will automatically query the master for changes to the master’s data that the slave missed while it was offline.

Replication is an excellent feature built into the MySQL core. It doesn’t require you to buy or install any additional software. You just physically set up a slave server and configure MySQL on both servers appropriately to begin replication. Then it’s a matter of developing a script to routinely stop the replication process, make a backup of the slave’s data, and restart replication.

To understand how to make replication efficient and robust in a particular environment, let’s look in detail at the steps that MySQL goes through to maintain a replicated server. The process is different depending on the version of MySQL your servers are using. This chapter applies primarily to version 4.0 or higher of MySQL. There were some significant improvements made in version 4.0 related to how replication activities are processed, making it much more dependable. Therefore, it is recommended that you upgrade your servers if they are using an older version. You should upgrade one release at a time, and use the same version of MySQL on both the master and all the slave servers. Otherwise, you may experience problems with authenticating the servers, incompatible table schemas, and other such problems.

Replication Process

When replication is running, SQL statements that change data are recorded in a binary log (bin.log) on the master server as it executes them. Only SQL statements that change the data or the schema are logged. This includes data-changing statements such as INSERT, UPDATE, and DELETE, and schema-manipulation statements such as CREATE TABLE, ALTER TABLE, and DROP TABLE. This also includes actions that affect data and schema, but that are executed from the command line by utilities such as mysqladmin. This does not include SELECT statements or any statements that only query the server for information (e.g., SHOW VARIABLES).

Along with the SQL statements, the master records a log position identification number. This is used to determine which log entries the master should relay to the slave. This is necessary because the slave may not always be able to consistently receive information from the master. We’ve already discussed one situation where an administrator deliberately introduces a delay: the planned downtime for making a backup of the slave. In addition, there may be times when the slave has difficulty staying connected to the master due to networking problems, or it may simply fall behind because the master has a heavy load of updates in a short period of time. However, if the slave reconnects hours or even days later, with the position identification number of the last log entry received, it can tell the master where it left off in the binary log and the master can send the slave all of the subsequent entries it missed while it was disconnected. It can do this even if the entries are contained in multiple log files due to the master’s logs having been flushed in the interim.

To help you better understand the replication process, I’ve included—in this section especially, and throughout this chapter—sample excerpts from each replication log and index file. Knowing how to sift through logs can be useful in resolving server problems, not only with replication but also with corrupt or erroneously written data.

Here is a sample excerpt from a master binary log file:

/usr/local/mysql/bin/mysqlbinlog /var/log/mysql/bin.000007 >

/tmp/binary_log.txt

tail --lines=14 /tmp/binary_log.txt

# at 1999

#081120 9:53:27 server id 1 end_log_pos 2158 Query thread_id=1391

exec_time=0 error_code=0

USE personal;

SET TIMESTAMP=1132502007;

CREATE TABLE contacts2 (contact_id INT AUTO_INCREMENT KEY, name VARCHAR(50),

telephone CHAR(15));

# at 2158

#081120 9:54:53 server id 1 end_log_pos 2186 Intvar

SET INSERT_ID=1;

# at 2186

#081120 9:54:53 server id 1 end_log_pos 2333 Query thread_id=1391

exec_time=0 error_code=0

SET TIMESTAMP=1132502093;

INSERT INTO contacts2 (name, telephone) VALUES ('Rusty Osborne',

'001-504-838-1234');

As the first line shows, I used the command-line utility mysqlbinlog to read the contents of a particular binary log file. (MySQL provides mysqlbinlog to make it possible for administrators to read binary log files.) Because the log is extensive, I have redirected the results to a text file in the /tmp directory using the shell’s redirect operator (>). On the second line, I used the tail command to display the last 14 lines of the text file generated, which translates to the last 3 entries in this case. You could instead pipe (|) the contents to more or less on a Linux or Unix system if you intend only to scan the results briefly.

After you redirect the results of a binary log to a text file, it may be used to restore data on the master server to a specific point in time. Point-in-time recovery methods are an excellent recourse when you have inadvertently deleted a large amount of data that has been added since your last backup.

The slave server, through an input/output (I/O) thread, listens for communications from the master that inform the slave of new entries in the master’s binary log and of any changes to its data. The master does not transmit data unless requested by the slave, nor does the slave continuously harass the master with inquiries as to whether there are new binary log entries. Instead, after the master has made an entry to its binary log, it looks to see whether any slaves are connected and waiting for updates. The master then pokes the slave to let it know that an entry has been made to the binary log in case it’s interested. It’s then up to the slave to request the entries. The slave will ask the master to send entries starting from the position identification number of the last log file entry the slave processed.

Looking at each entry in the sample binary log, you will notice that each starts with the position identification number (e.g., 1999). The second line of each entry provides the date (e.g., 081120 for November 20, 2008), the time, and the replication server’s identification number. This is followed by the position number expected for the next entry. This number is calculated from the number of bytes of text that the current entry required. The rest of the entry provides stats on the thread that executed the SQL statement. In some of the entries, a SET statement is provided with the TIMESTAMP variable so that when the binary log entry is used, the date and time will be adjusted on the slave server to match the date and time of the entry on the master. The final line of each entry lists the SQL statement that was executed.

The excerpt begins with a USE statement, which is included to be sure that the slave makes the subsequent changes to the correct database. Similarly, notice that the second entry sets the value of INSERT_ID in preparation for the INSERT statement of the following entry. This ensures that the value to be used for the column contact_id on the slave is the same. Nothing is left to chance or assumed, if possible.

The master server keeps track of the names of the binary log files in a simple text file (bin.index). Here is an excerpt from the binary index file:

/var/log/mysql/bin.000001

/var/log/mysql/bin.000002

/var/log/mysql/bin.000003

/var/log/mysql/bin.000004

/var/log/mysql/bin.000005

/var/log/mysql/bin.000006

/var/log/mysql/bin.000007

This list of binary log files can also be obtained by entering the SHOW MASTER LOGS statement. Notice that the list includes the full pathname of each binary log file in order, reflecting the order in which the files were created. The master appends each name to the end of the index file as the log file is opened. If a slave has been offline for a couple of days, the master will work backward through the files to find the file containing the position identification number given to it by the slave. It will then read that file from the entry following the specified position identification number to the end, followed by the subsequent files in order, sending SQL statements from each to the slave until the slave is current or disconnected. If the slave is disconnected before it can become current, the slave will make another request when it later reconnects with the last master log position identification number it received.

After the slave is current again, the slave will go back to waiting for another announcement from the master regarding changes to its binary log. The slave will make inquiries only when it receives another nudge from the master or if it is disconnected temporarily. When a slave reconnects to the master after a disconnection, it makes inquiries to ensure it didn’t miss anything while it was disconnected. If it sits idle for a long period, the slave’s connection will time out, also causing it to reconnect and make inquires.

When the slave receives new changes from the master, the slave doesn’t update its databases directly. Direct application of changes was tried in versions of replication prior to MySQL 4.0 and found to be too inflexible to deal with heavy loads, particularly if the slave’s databases are also used to support user read requests (i.e., the slave helps with load balancing). For example, tables in its replicated databases may be busy when the slave is attempting to update the data. A SELECT statement could be executed with the HIGH_PRIORITY flag, giving it priority over UPDATE and other SQL statements that change data and are not also specifically entered with the HIGH_PRIORITY flag. In this case, the replication process would be delayed by user activities. On a busy server, the replication process could be delayed for several minutes. If the master server crashes during such a lengthy delay, this could mean the loss of many data changes of which the slave is not informed because it's waiting to access a table on its own system.

By separating the recording of entries received and their reexecution, the slave is assured of getting all or almost all transactions up until the time that the master server crashes. This is a much more dependable method than the direct application method used in earlier versions of MySQL.

Currently, the slave appends the changes to a file on its filesystem named relay.log. Here is an excerpt from a relay log:

/*!40019 SET @@session.max_insert_delayed_threads=0*/;

/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;

# at 4

#081118 3:18:40 server id 2 end_log_pos 98

Start: binlog v 4, server v 5.0.12-beta-standard-log created 051118

3:18:40

# at 98

#700101 1:00:00 server id 1 end_log_pos 0 Rotate to bin.000025 pos: 4

# at 135

#080819 11:40:57 server id 1 end_log_pos 98

Start: binlog v 4, server v 5.0.10-beta-standard-log created 050819

11:40:57 at startup

ROLLBACK;

# at 949

#080819 11:54:49 server id 1 end_log_pos 952

Query thread_id=10 exec_time=0 error_code=0

SET TIMESTAMP=1124445289;

CREATE TABLE prepare_test (id INTEGER NOT NULL, name CHAR(64) NOT NULL);

# at 952

#080819 11:54:49 server id 1 end_log_pos 1072

Query thread_id=10 exec_time=0 error_code=0

SET TIMESTAMP=1124445289;

INSERT INTO prepare_test VALUES ('0','zhzwDeLxLy8XYjqVM');

This log is like the master’s binary log. Notice that the first entry mentions the server’s ID number, 2, which is the slave’s identification number. There are also some entries for server 1, the master. The first entries have to do with log rotations on both servers. The last two entries are SQL statements relayed to the slave from the master.

A new relay log file is created when replication starts on the slave and when the logs are flushed (i.e., the FLUSH LOGS statement is issued). A new relay log file is also created when the current file reaches the maximum size as set with the max_relay_log_size variable. The maximum size can also be limited by the max_binlog_size variable. If these variables are set to 0, there is no size limit placed on the relay log files.

Once the slave has made note of the SQL statements relayed to it by the master, it records the new position identification number in its master information file (master.info) on its filesystem. Here is an example of the content of a master information file on a slave server:

bin.000038

6393

master_host

replicant

my_pwd

3306

This file is present primarily so the slave can remember its position in the master’s binary log file even if the slave is rebooted, as well as the information necessary to reconnect to the master. Each line has a purpose as follows:

1. The first line contains the number of lines of data in the file (14). Although fewer than 14 lines are shown here, the actual file contains blank lines that make up the rest.

2. The second line shows the name of the last binary log file on the master from which the slave received entries. This helps the master respond more quickly to requests.

3. The third line shows the position identification number (6393) in the master’s binary log.

4. The next few lines contain the master’s host address, the replication username, the password, and the port number (3306). Notice that the password is not encrypted and is stored in clear text. Therefore, be sure to place this file in a secure directory. You can determine the path for this file in the configuration file, as discussed later in this chapter.

5. The next to last line (60) lists the number of attempts the slave should make when reconnecting to the master before stopping.

6. The last line here is 0 because the server from which this master information file came does not have the SSL feature enabled. If SSL was enabled on the slave and allowed on the master, there would be a value of 1 on this line. It would also be followed by 5 more lines containing values related to SSL authentication, completing the 14 lines anticipated on the first line.

Take note of how the values in the master information file match the following excerpt from a SHOW SLAVE STATUS statement executed on the slave:

SHOW SLAVE STATUS \G

*************************** 1. row ***************************

Slave_IO_State: Waiting for master to send event

Master_Host: master_host

Master_User: replicant

Master_Port: 3306

Connect_Retry: 60

Master_Log_File: bin.000038

Read_Master_Log_Pos: 6393

Relay_Log_File: relay.000002

Relay_Log_Pos: 555

Relay_Master_Log_File: bin.000011

Slave_IO_Running: Yes

Slave_SQL_Running: No

Replicate_Do_DB: test

Replicate_Ignore_DB:

Replicate_Do_Table:

Replicate_Ignore_Table:

Replicate_Wild_Do_Table:

Replicate_Wild_Ignore_Table:

Last_Errno: 1062

Last_Error: Error 'Duplicate entry '1000' for key 1' on query.'

Skip_Counter: 0

Exec_Master_Log_Pos: 497

Relay_Log_Space: 22277198

Until_Condition: None

Until_Log_File:

Until_Log_Pos: 0

Master_SSL_Allowed: No

Master_SSL_CA_File:

Master_SSL_CA_Path:

Master_SSL_Cert:

Master_SSL_Cipher:

Master_SSL_Key:

Seconds_Behind_Master: NULL

Notice the labels for the additional SSL variables at the end of this excerpt. The master information file contains lines for them, whether they are empty or populated. Also note that, for tighter security, the command does not return the password.

After noting the new position number and other information that may have changed, the slave uses the same I/O thread to resume waiting for more entries from the master.

When the slave server detects any change to its relay log, through a different thread, the slave uses an SQL thread to execute the new SQL statement recorded in the relay log to the slave’s databases. After the new entry is recorded in the slave’s relay log, the new relay log position identification number is recorded in its relay log information file (relay-log.info) through the slave’s SQL thread. Here is an excerpt from a relay log information file:

/var/log/mysql/relay.000002

555

bin.000011

497

The first line lists the file path and name of the current relay log file (Relay_Log_File in the SHOW SLAVE STATUS command ). The second value is the SQL thread’s position in the relay log file (Relay_Log_Pos). The third contains the name of the current binary log file on the master (Relay_Master_Log_File). The last value is the position in the master log file (Exec_Master_Log_Pos). These values can also be found in the results of the SHOW SLAVE STATUS statement shown earlier in this section.

When the slave is restarted or its logs are flushed, it appends the name of the current relay log file to the end of the relay log index file (relay-log.index). Here is an example of a relay log index file:

/var/log/mysql/relay.000002

/var/log/mysql/relay.000003

/var/log/mysql/relay.000004

This process of separating threads keeps the I/O thread free and dedicated to receiving changes from the master. It ensures that any delays in writing to the slave’s databases on the SQL thread will not prevent or slow the receiving of data from the master. With this separate thread method, the slave server naturally has exclusive access to its relay log file at the filesystem level.

As an additional safeguard to ensure accuracy of data, the slave compares the entries in the relay log to the data in its databases. If the comparison reveals any inconsistency, the replication process is stopped and an error message is recorded in the slave’s error log (error.log). The slave will not restart until it is told to do so. After you have resolved the discrepancy that the slave detected in the data, you can then instruct the slave to resume replication, as explained later in this chapter.

Here is an example of what is recorded on a slave server in its error log when the results don’t match:

020714 01:32:03 mysqld started

020714 1:32:05 InnoDB: Started

/usr/sbin/mysqld-max: ready for connections

020714 8:00:28 Slave SQL thread initialized, starting replication in log

'server2-bin.035' at position 579285542, relay log './db1-relay-bin.001'

position: 4

020714 8:00:29 Slave I/O thread: connected to master

'...@66.216.68.90:3306', replication started in log 'server2-bin.035' at

position 579285542 ERROR: 1146 Table 'test.response' doesn't exist

020714 8:00:30 Slave: error 'Table 'test.response' doesn't exist' on query

'INSERT INTO response SET connect_time=0.073868989944458,

page_time=1.53695404529572, site_id='Apt'', error_code=1146

020714 8:00:30 Error running query, slave SQL thread aborted. Fix the

problem, and restart the slave SQL thread with "SLAVE START". We stopped at

log 'server2-bin.035' position 579285542

020714 8:00:30 Slave SQL thread exiting, replication stopped in log

'server2-bin.035' at position 579285542

020714 8:00:54 Error reading packet from server: (server_errno=1159)

020714 8:00:54 Slave I/O thread killed while reading event

020714 8:00:54 Slave I/O thread exiting, read up to log 'server2-bin.035',

position 579993154

020714 8:01:58 /usr/sbin/mysqld-max: Normal shutdown

020714 8:01:58 InnoDB: Starting shutdown...

020714 8:02:05 InnoDB: Shutdown completed

020714 8:02:06 /usr/sbin/mysqld-max: Shutdown Complete

020714 08:02:06 mysqld ended

In the first message, I have boldfaced an error message showing that the slave has realized the relay log contains entries involving a table that does not exist on the slave. The second boldfaced comment gives a message informing the administrator of the decision and some instructions on how to proceed.

The replication process may seem very involved and complicated at first, but it all occurs quickly; it’s typically not a significant drain on the master server. Also, it’s surprisingly easy to set up: it requires only a few lines of options in the configuration files on the master and slave servers. You will need to copy the databases on the master server to the slave to get the slave close to being current. Then it’s merely a matter of starting the slave for it to begin replicating. It will quickly update its data to record any changes made since the initial backup copied from the master was installed on the slave. From then on, replication will keep it current—theoretically. As an administrator, you will have to monitor the replication process and resolve problems that arise occasionally.

Before concluding this section, let me adjust my previous statement about the ease of replication: replication is deceptively simple. When it works, it’s simple. Before it starts working, or if it stops working, the minimal requirements of replication make it difficult to determine why it doesn’t work. Now let’s look at the steps for setting up replication.

The Replication User Account

There are only a few steps to setting up replication. The first step is to set up user accounts dedicated to replication on both the master and the slave. It’s best not to use an existing account for security reasons. To set up the accounts, enter an SQL statement like the following on the master server, logged in as root or a user that has the GRANT OPTION privilege:

GRANT REPLICATION SLAVE, REPLICATION CLIENT ON *.*

TO 'replicant'@'slave_host' IDENTIFIED BY 'my_pwd';

These two privileges are all that are necessary for a user to replicate a server. The REPLICATE SLAVE privilege permits the user to connect to the master and to receive updates to the master’s binary log. The REPLICATE CLIENT privilege allows the user to execute the SHOW MASTER STATUS and the SHOW SLAVE STATUS statements. In this SQL statement, the user account replicant is granted only what is needed for replication. The username can be almost anything. Both the username and the hostname are given within quotes. The hostname can be one that is resolved through /etc/hosts (or the equivalent on your system), or it can be a domain name that is resolved through DNS. Instead of a hostname, you can give an IP address:

GRANT REPLICATION SLAVE, REPLICATION CLIENT ON *.*

TO 'replicant'@'12.127.17.72' IDENTIFIED BY 'my_pwd';

If you upgraded MySQL on your server to version 4.x recently, but you didn’t upgrade your mysql database, the GRANT statement shown won’t work because these privileges didn’t exist in the earlier versions. For information on fixing this problem, see the section onmysql_fix_privilege_tables in Chapter 16.

Now enter the same GRANT statement on the slave server with the same username and password, but with the master’s hostname or IP address:

GRANT REPLICATION SLAVE, REPLICATION CLIENT ON *.*

TO 'replicant'@'master_host' IDENTIFIED BY 'my_pwd';

There is a potential advantage of having the same user on both the master and the slave: if the master fails and will be down for a while, you can redirect users to the slave with DNS or by some other method. When the master is back up, you can then use replication to get the master up-to-date by temporarily making it a slave to the former slave server. This is cumbersome, though, and is outside the scope of this book. For details, see High Performance MySQL (O’Reilly). You should experiment with and practice such a method with a couple of test servers before relying on it with production servers.

To see the results of the first GRANT statement for the master, enter the following:

SHOW GRANTS FOR 'replicant'@'slave_host' \G

*************************** 1. row ***************************

Grants for replicant@slave_host:

GRANT REPLICATION SLAVE, REPLICATION CLIENT ON *.*

TO 'replicant'@'slave_host'

IDENTIFIED BY PASSWORD '*60115BF697978733E110BA18B3BC31D181FFCG082'

Note, incidentally, that the password has been encrypted in the output. If you don’t get results similar to those shown here, the GRANT statement entry failed. Check what you typed when you granted the privileges and when you executed this statement. If everything was typed correctly and included in both statements, verify that you have version 4.0 of MySQL or higher, a version that supports these two new privileges. Enter SELECT VERSION(); on each server to determine the versions they are using.

Configuring the Servers

Once the replication user is set up on both servers, you will need to add some lines to the MySQL configuration file on the master and on the slave server. Depending on the type of operating system, the configuration file will probably be called either my.cnf or my.ini. On Unix types of systems, the configuration file is usually located in the /etc directory. On Windows systems, it’s usually located in c:\ or in c:\Windows. If the file doesn’t exist on your system, you can create it. Using a plain text editor (e.g., vi or Notepad.exe)—one that won’t add binary formatting—add the following lines to the configuration file of the master under the [mysqld] group heading:

[mysqld]

server-id = 1

log-bin = /var/log/mysql/bin.log

...

The server identification number is an arbitrary number used to identify the master server in the binary log and in communications with slave servers. Almost any whole number from 1 to 4294967295 is fine. Don’t use 0, as that causes problems. If you don’t assign a server number, the default server identification number of 1 will be used. The default is all right for the master, but a different one should be assigned to each slave. To keep log entries straight and avoid confusion in communications between servers, it is very important that each slave have a unique number.

In the configuration file excerpt shown here, the line containing the log-bin option instructs MySQL to perform binary logging to the path and file given. The actual path and filename is mostly up to you. Just be sure that the directory exists and that the user mysql is the owner, or at least has permission to write to the directory. By default, if a path is not given, the server’s data directory is assumed as the path for log files. To leave the defaults in place, give log-bin without the equals sign and without the file pathname. This example shows the default pathname. If you set the log file name to something else, keep the suffix .log as shown here. It will be replaced automatically with an index number (e.g., .000001) as new log files are created when the server is restarted or the logs are flushed.

These two options are all that is required on the master. They can be put in the configuration file or given from the command line when starting the mysqld daemon each time. On the command line, add the required double dashes before each option and omit the spaces around the equals signs.

For InnoDB tables, you may want to add the following lines to the master’s configuration file:

innodb_flush_log_at_trx_commit = 1

sync-binlog = 1

These lines resolve problems that can occur with transactions and binary logging.

For the slave server, we will need to add several options to the slave’s configuration file, reflecting the greater complexity and number of threads on the slave. You will have to provide a server identification number, information on connecting to the master server, and more log options. Add lines similar to the following to the slave’s configuration file:

[mysqld]

server-id = 2

log-bin = /var/log/mysql/bin.log

log-bin-index = /var/log/mysql/log-bin.index

log-error = /var/log/mysql/error.log

relay-log = /var/log/mysql/relay.log

relay-log-info-file = /var/log/mysql/relay-log.info

relay-log-index = /var/log/mysql/relay-log.index

slave-load-tmpdir = /var/log/mysql/

skip-slave-start

...

At the top, you can see the server identification number is set to 2. The next stanzas set the logs and related index files. If these files don’t exist when the slave is started, it will automatically create them.

The second stanza starts binary logging like on the master server, but this time on the slave. This is the log that can be used to allow the master and the slave to reverse roles as mentioned earlier. The binary log index file (log-bin.index) records the name of the current binary log file to use. The log-error option establishes an error log. Any problems with replication will be recorded in this log.

The third stanza defines the relay log that records each entry in the master server’s binary log, along with related files mentioned earlier. The relay-log-info-file option names the file that records the most recent position in the master’s binary log that the slave recorded for later execution (not the most recent statement actually executed by the slave), while the relay log index file in turn records the name of the current relay log file to use for replication.

The slave-load-tmpdir option is necessary only if you expect the LOAD DATA INFILE statement to be executed on the server. This SQL statement is used to import data in bulk into the databases. The slave-load-tmpdir option specifies the temporary directory for those files. If you don’t specify the option, the value of the tmpdir variable will be used. This relates to replication because the slave will log LOAD DATA INFILE activities to the log files with the prefix SQL_LOAD- in this directory. For security, you may not want those logs to be placed in a directory such as /tmp.

The last option, skip-slave-start, prevents the slave from replicating until you are ready. The order and spacing of options, incidentally, are a matter of personal style.

To set variables on the slave related to its connection with the master (e.g., the master’s host address), it is recommended that you use the CHANGE MASTER TO statement to set the values on the slave. You could provide the values in the configuration file. However, the slave will read the file only the first time you start up the slave for replication. Because the values are stored in the master.info file, MySQL just relies on that file during subsequent startups and ignores these options in the main MySQL configuration file. The only time it adjusts the master.info file contents is when you tell it to explicitly through a CHANGE MASTER TO statement. You could edit the master.info file and other replication information files directly, but you might cause more problems in doing so. It’s best to use the CHANGE MASTER TO statement to make changes. Here is an example:

CHANGE MASTER TO MASTER_HOST = 'master_host';

CHANGE MASTER TO MASTER_PORT = 3306;

CHANGE MASTER TO MASTER_USER = 'replicant';

CHANGE MASTER TO MASTER_PASSWORD = 'my_pwd';

This set of SQL statements provides information about the master server. The first statement gives the hostname (or the IP address) of the master. The next one provides the port for the connection. Port 3306 is the default port for MySQL, but another could be used for performance or security considerations. The next two lines set the username and password for logging into the master server. After you run these SQL statements, their values are stored in the master.info file and you shouldn’t need to rerun the statements upon subsequent startups.

At this point, the servers should be configured properly. Next, you will need to get the slave’s data current by making a backup on the master server and copying it manually to the slave. This is described in the following section. If the master and slave are new servers and the master has no data yet, you can skip the next section and proceed to Starting Replication.”

Copying Databases and Starting Replication

If you’re setting up replication with an existing server that already contains data, you will need to make an initial backup of the databases and copy the backup to the slave server. I’ll list the recommended method first, followed by some alternatives and their limitations.

To get a snapshot of the database in a consistent state, you need to shut down the server while you make a copy of the data, or at least prevent users from changing data. Considering that once you set up replication you may never have to shut down your master server for backups again, explain to management that it’s worth inconveniencing the users this one time to get a clean, consistent backup. The following sections will explain how to lock the tables. Note that you can allow users to make changes as soon as your copy is made. If they make changes before replication starts, MySQL can easily recognize and incorporate those changes into the slave.

Using mysqldump

This utility, described in Chapter 16, creates a file of SQL statements that can later be executed to recreate databases and their contents. For the purposes of setting up replication, use the following options while running the utility from the command line on the master server:

mysqldump --user=root --password=my_pwd \

--extended-insert --all-databases \

--ignore-table=mysql.users --master-data > /tmp/backup.sql

The result is a text file (backup.sql) containing SQL statements to create all of the master’s databases and tables and insert their data. Here is an explanation of some of the special options shown:

--extended-insert

This option creates multiple-row INSERT statements and thereby makes the resulting dump file smaller. It also allows the backup to run faster.

--ignore-table

This option is used here so that the usernames and passwords won’t be copied. This is a good security precaution if the slave will have different users, and especially if it will be used only for backups of the master. Unfortunately, there is no easy way to exclude the entire mysql database containing user information. You could list all the tables in that database to be excluded, but they have to be listed separately, and that becomes cumbersome. The only table that contains passwords is the users table, so it may be the only one that matters. However, it depends on whether you set security on a database, table, or other basis, and therefore want to protect that user information.

--master-data

This option locks all of the tables during the dump to prevent data from being changed, but allows users to continue reading the tables. This option also adds a few lines like the following to the end of the dump file:

-- --

Position to start replication from --

CHANGE MASTER TO MASTER_LOG_FILE='bin.000846';

CHANGE MASTER TO MASTER_LOG_POS=427;

When the dump file is executed on the slave server, these lines will record the name of the master’s binary log file and the position in the log at the time of the backup, while the tables were locked. When replication is started, these lines will provide this information to the master so it will know the point in the master’s binary log to begin sending entries to the slave. This is meant to ensure that any data that changes while you set up the slave server isn’t missed.

To execute the dump file and thereby set up the databases and data on the slave server, copy the dump file generated by mysqldump to the slave server. The MySQL server needs to be running on the slave, but not replication. Run the mysql client through a command such as the following on the slave:

mysql --user=root --password=my_pwd < /tmp/backup.sql

This will execute all of the SQL statements in the dump file, creating a copy of the master’s databases and data on the slave.

Alternative Methods for Making Copies

If you peruse MySQL documentation, you might get the idea that the LOAD DATA FROM MASTER statement is ideal for making a copy, but it is actually not very feasible. First, it works only on MyISAM tables. Second, because it performs a global read lock on the master while it is making a backup, it prevents the master from serving users for some time. Finally, it can be very slow and depends on good network connectivity (so it can time out while copying data). Basically, the statement is a nice idea, but it’s not very practical or dependable in most situations. It has been deprecated by MySQL AB and will be removed from future releases.

A better alternative is to drop down to the operating system level and copy the raw files containing your schemas and data. To leave the server up but prevent changes to data before you make a copy of the MySQL data directory, you could put a read-only lock on the tables by entering the following command:

FLUSH TABLES WITH READ LOCK;

This statement will commit any transactions that may be occurring on the server, so be careful and make sure the lock is actually in place before you continue. Then, without disconnecting the client that issued the statement, copy the data directory to an alternative directory. Once this is completed, issue an UNLOCK TABLES statement in the client that flushed and locked the tables. After that, the master responds to updates as usual, while you need only transfer the copy of the data directory to the slave server, putting it into the slave server’s data directory. Be sure to change the ownership of all of the files and directories to mysql. In Linux, this is done by entering the following statement as root:

chown -R mysql:mysql /path_to_data

You will run into a complication with this method of copying the data directory if you have InnoDB tables in your databases, because they are not stored in the data directory. Also, if you don’t have administrative access to the filesystem to be able to manually copy the data directory, you won’t be able to use this method. This is why mysqldump remains the recommended method for copying the master’s data.

Starting Replication

After you create the replication user accounts, configure the servers properly, and load the backed-up databases onto the slave server, you’re ready to begin replication. Execute the following SQL statement while logged in as root or a user with SUPER privileges on the slave:

START SLAVE;

After this statement is run, the slave should connect to the master and get the changes it missed since the backup. From there, it should stay current by continuously interacting with the master, as outlined in the Replication Process” section earlier in this chapter.

If everything is configured correctly on the slave, it will most likely start without a problem and return no message when START SLAVE is executed. However, when the slave tries to connect to the master, the connection may fail. Or when the SQL thread begins processing entries received from the master, it may fail. For whatever reason, if a slave fails after it is started, the client that started the slave will not be informed of the failure, nor will it be informed of the subsequent termination of the slave thread. For that information, you will have to read the slave’s error logs. To confirm a slave is running, you can execute the SHOW SLAVE STATUS statement and check the results to see what state the slave is in, if any. We will describe the various slave states later in this chapter.

By default, the START SLAVE statement starts both the I/O thread and the execution thread as described earlier in the Replication Process” section. You can specify which slave thread to start if you don’t want to start both. You can also specify a particular master binary log file and the position in the log in which to stop replicating. You shouldn’t need to make these distinctions when first starting a slave. These extra options for START SLAVE are useful when debugging a problem with a slave log, and especially when attempting to restore data to a particular position in the log because a user entered an erroneous statement and you want to revert to an earlier point in the database.

Here is an example of these possibilities:

START SLAVE SQL_THREAD

UNTIL MASTER_LOG_FILE = 'relay.0000052',

MASTER_LOG_POS = 254;

You can also control the processing of the relay log file with this syntax, but using the RELAY_LOG_FILE and the RELAY_LOG_POS parameters. You cannot specify a master log position and a relay log position in the same statement, though.

The UNTIL clause will be ignored if the SQL thread is already running. It will also be ignored if a slave already doing replication is shut down and restarted, or if the STOP SLAVE statement is executed followed by a START SLAVE statement without the UNTIL clause. Therefore, to use these options for fine-grained control, restart the slave server with the --skip-slave-start option in the configuration file.

Backups with Replication

With replication running, it’s an easy task to make a backup of the data. You just need to temporarily stop the slave server from replicating by entering the following SQL statement while logged onto the slave server as root or as a user with SUPER privileges:

STOP SLAVE;

The slave server knows the position where it left off in the binary log of the master server and will record that information in the master.info file. So, you can take your time making a backup of the replicated databases on the slave server. You can use any backup utility or method you prefer. The only complication is if the slave also assists in handling user requests for load balancing, in which case STOP SLAVE throws the burden back on the master or on other slaves.

If the slave is used only for backups and has no users accessing the data, you could simply copy the data directory. I prefer to use mysqldump because it’s fairly straightforward and works with all table types. To make a backup with mysqldump, enter something like the following:

mysqldump --user=root --password=my_pwd --lock-all-tables \

--all-databases > /backups/mysql/backup.sql

When the backup is finished, enter the following SQL statement as root on the slave server to restart replication:

START SLAVE;

After entering this statement, there should be a flurry of activity on the slave as it executes the SQL statements that occurred while it was down. After a very short period of time, though, it should be current.

SQL Statements and Functions in Alphabetical Order

Several SQL statements apply directly to replication. One function, MASTER_POS_WAIT(), also applies to replication, and it is listed here with the statements.

Name

CHANGE MASTER TO

Synopsis

CHANGE MASTER TO

[MASTER_HOST = 'host' |

MASTER_USER = 'user' |

MASTER_PASSWORD = 'password' |

MASTER_PORT = port |

MASTER_CONNECT_RETRY = count |

MASTER_LOG_FILE = 'filename' |

MASTER_LOG_POS = position |

RELAY_LOG_FILE = 'filename' |

RELAY_LOG_POS = position |

MASTER_SSL = {0|1} |

MASTER_SSL_CA = 'filename' |

MASTER_SSL_CAPATH = 'path' |

MASTER_SSL_CERT = 'filename' |

MASTER_SSL_KEY = 'filename' |

MASTER_SSL_CIPHER = 'list' |

MASTER_SSL_VERIFY_SERVER_CERT = {0|1}], [,...]

This statement changes the settings on a slave server related to the master server and replication. Some of the variables relate to connecting to the master server, and some relate to master log files and the current position in the log files. This statement is run from the slave.

If the slave is engaging in replication, it may be necessary to use the STOP SLAVE statement before using this statement and the START SLAVE statement afterward. These options can be set from the server’s options file, but it’s much better to use this SQL statement to set replication options. MASTER_SSL_VERIFY_SERVER_CERT is available as of version 5.1.18 of MySQL and is comparable to the --ssl-verify-server-cert option. See Chapter 15 for more information on this client option.

Multiple option and value pairs may be given in one CHANGE MASTER TO statement, as long as the pairs are separated by commas. For example, the following SQL statement sets several properties for this slave:

CHANGE MASTER TO

MASTER_HOST='mysql.company.com',

MASTER_PORT=3306,

MASTER_USER='slave_server',

MASTER_PASSWORD='password',

MASTER_CONNECT_RETRY=5;

The clauses related to log files name the master log files and provide the slave with the current position of the master log files. This may be necessary when first setting up a new slave or when a slave has been disabled for a while. Use the SHOW MASTER STATUS statement to determine the current position of the master log files, and the SHOW SLAVE STATUS statement to confirm a slave’s position for the related files. Here is an example using the clauses related to log files:

CHANGE MASTER TO

MASTER_LOG_FILE= 'log-bin.000153',

MASTER_LOG_POS = 79,

RELAY_LOG_FILE = 'log-relay.000153',

RELAY_LOG_POS = 112;

The remaining clauses set various SSL variables. These values are saved to the master.info file. To see the current values for these options, use the SHOW SLAVE STATUS statement.

Relay log options are available as of version 4.1.1 of MySQL. The MASTER_SSL variable is set to 0 if the master does not allow SSL connections, and 1 if it does. The MASTER_SSL_CA variable holds the name of the file that contains a list of trusted certificate authorities (CAs).MASTER_SSL_CAPATH contains the absolute path to that file. The MASTER_SSL_CERT variable specifies the name of the SSL certificate file for secure connections, and MASTER_SSL_KEY specifies the SSL key file used to negotiate secure connections. Finally, MASTER_SSL_CIPHERprovides a list of acceptable cipher methods for encryption.

Name

LOAD DATA FROM MASTER

Synopsis

LOAD DATA FROM MASTER

This statement has been deprecated and will be removed from future releases of MySQL. It never worked very well. It was meant to make a copy of all the databases on the master server (except the mysql database) and copy them to the slave servers. It gets a global read lock on all tables while it takes a snapshot of the databases, and releases the lock before copying them to the slaves. The MASTER_LOG_FILE and the MASTER_LOG_POS variables will be updated so that the slave knows where to begin logging.

This statement works only with MyISAM tables. The user for the connection must have RELOAD, SELECT, and SUPER privileges on the master server. The user must also have CREATE and DROP privileges on the slave server. For large databases, increase the values of thenet_read_timeout and net_write_timeout variables with the SET statement. To load a specific table from the master server, use the LOAD TABLE...FROM MASTER statement.

Again, this statement does not work very well: it’s not dependable and usually has problems with properly copying data from the master to the slave. Instead, use a utility such as mysqldump to copy the data on the master and then transfer the resulting file to the slave, as described in detail in the tutorial section at the start of this chapter.

Name

LOAD TABLE...FROM MASTER

Synopsis

LOAD TABLE table FROM MASTER

This statement has been deprecated and will be removed from future releases of MySQL because it has many problems. It was meant to copy a MyISAM table from the master server to a slave server. The user for the connection must have RELOAD and SUPER privileges as well as SELECTprivileges for the table on the master server. The user must also have CREATE and DROP privileges on the slave server.

Instead of using this statement, use a utility such as mysqldump to copy the data from the master. This method is described in detail in the tutorial section at the start of this chapter.

Name

MASTER_POS_WAIT()

Synopsis

MASTER_POS_WAIT(log_filename, log_position[, timeout])

This function is useful to synchronize MySQL master and slave server logging. The function causes the master to wait until the slave server has read and applied all updates to the position (given in the second argument) in the master log (named in the first argument). You can specify a third argument to set the number of seconds the master will wait. A value of 0 or a negative amount is given to instruct the function not to time out and to keep trying.

The function returns the number of log entries that were made by the slave while the master was waiting. If all is set properly, you should receive these results rapidly. However, if there is an error, NULL is returned. NULL is also returned if the slave’s SQL thread is not started, if the slave’s master options are not set, or if the parameters given with this function are not correct. If you give the timeout parameter and the amount of time is exceeded, –1 is returned.

Name

PURGE MASTER LOGS

Synopsis

PURGE {MASTER|BINARY} LOGS {TO 'log_filename'|BEFORE 'date'}

This statement deletes the binary logs from a master server. The keywords MASTER and BINARY are synonymous and one is required for the statement. Log files are deleted sequentially from the starting log file to the one named with the TO clause, or up until (but not including) the date named with the BEFORE clause. Here is an example of each method:

PURGE MASTER LOGS TO 'log-bin.00110';

PURGE MASTER LOGS BEFORE '2004-11-03 07:00:00';

Before running this statement, it would be prudent to make a backup of the logs. Then use SHOW SLAVE STATUS on each slave to determine which logs the slaves are reading, and run SHOW BINARY LOGS on the master server to get a list of log files. The oldest log file in the list is the one that will be purged. If the slaves are current, they shouldn’t be reading this log file. If they still are, you might not want to purge it. If you find that your log files aren’t being rotated very often, you can set the system variable expire_logs_days to shorten the amount of time before new log files are created and old ones archived.

Name

RESET MASTER

Synopsis

RESET MASTER

This statement deletes all the binary log files on the master server. Binary log files are located in the directory indicated by the value of the --log-bin option of mysqld (see Chapter 15). The log files are typically named log-bin.n, where n is a six-digit numbering index. Use the SHOW MASTER LOGS statement to get a list of log files to be sure.

This statement will delete all of the master log files and begin numbering the new file at 000001. To get the slave servers in line with the reset master, run the RESET SLAVE statement. You can run the MASTER and SLAVE options together in a comma-separated list like so:

RESET MASTER, SLAVE;

This is a recommended method for ensuring consistency.

Name

RESET SLAVE

Synopsis

RESET SLAVE

Use this statement within or after the RESET MASTER statement that sets the binary logging index back to 1. This statement will delete the master.info file, the relay-log.info file, and all of the relay log files on the slave server. It will delete the relay log files regardless of whether the SQL thread has finished executing its contents. A new .info file will be created with the default startup values.

Name

SET GLOBAL SQL_SLAVE_SKIP_COUNTER

Synopsis

SET GLOBAL SQL_SLAVE_SKIP_COUNTER = number

This statement skips the given number of events from the master. It is used for fine-tuning a recovery. It returns an error if the slave thread is running. Here is an example:

SET GLOBAL SQL_SLAVE_SKIP_COUNTER=10;

Name

SET SQL_LOG_BIN

Synopsis

SET SQL_LOG_BIN = {0|1}

This statement enables or disables binary logging of SQL statements for the current connection. It does not affect logging for the activities of other threads and is reset to the default value when the connection is closed. The statement requires SUPER privileges. A value of 0 disables binary logging; 1 enables it. Here is an example:

SET SQL_LOG_BIN = 0;

Name

SHOW BINLOG EVENTS

Synopsis

SHOW BINLOG EVENTS [IN 'log_filename']

[FROM position] [LIMIT [offset,] count]

This statement displays the events in a binary log file. Use the IN clause to specify a particular log file. If the IN clause is omitted, the current file is used. To obtain a list of binary log files, use the SHOW MASTER LOGS statement. Here is an example of how you can use this statement and typical results:

SHOW BINLOG EVENTS IN 'log-bin.000161'\G

*************************** 1. row ***************************

Log_name: log-bin.000161

Pos: 4

Event_type: Start

Server_id: 1

Orig_log_pos: 4

Info: Server ver: 4.1.7-standard-log, Binlog ver: 3

1 row in set (0.00 sec)

This log file has only one row of data because the SQL statement was run shortly after the server was started. For a larger log file recording many rows of events, the results take a long time and drain system resources significantly. To minimize this, you can focus and limit the results with theFROM and LIMIT clauses. In the results, notice the Pos label with a value of 4. In a large log file, that number might be in the thousands. The results displayed could be focused only on rows starting from a particular position in the log with the FROM clause. You can limit the number of rows of events displayed with the LIMIT clause. In the LIMIT clause, you can also set the starting point of the output based on the number of rows in the results set and limit them to a certain number of rows. Here is an example of both of these clauses:

SHOW BINLOG EVENTS IN 'log-bin.000160'

FROM 3869 LIMIT 2,1\G

*************************** 1. row ***************************

Log_name: log-bin.000160

Pos: 4002

Event_type: Intvar

Server_id: 1

Orig_log_pos: 4002

Info: INSERT_ID=5

In this example, the retrieval of log events is to begin from position 3869 as set by the FROM clause. The results set contains several rows, although only one is shown here. The display is limited to one row, starting from the third one in the results set per the LIMIT clause. The number of skipped records is the sum of the FROM argument and the first LIMIT argument.

As an alternative to using this statement when working with large binary log files, you might try using the mysqlbinlog utility and redirecting the results to a text file that you can read in a text editor when it’s finished. Besides, this utility will provide you more information than SHOW BINLOG EVENTS.

Name

SHOW BINARY LOGS

Synopsis

SHOW BINARY LOGS

This statement displays a list of binary logs created by the master MySQL server in the filesystem directory. It’s synonymous with SHOW MASTER LOGS. To delete logs, see the description of the PURGE MASTER LOGS statement earlier in this chapter. For information on enabling logs, see Chapter 16. Here is an example:

SHOW BINARY LOGS;

Name

SHOW MASTER LOGS

Synopsis

SHOW MASTER LOGS

This statement displays a list of binary logs created by the master MySQL server in the filesystem directory. It’s synonymous with SHOW BINARY LOGS. To delete logs, see the description of the PURGE MASTER LOGS statement earlier in this chapter. For information on enabling logs, see Chapter 16.

Name

SHOW MASTER STATUS

Synopsis

SHOW MASTER STATUS

This statement displays information on the status of the binary log file that is being used currently on the master MySQL server:

SHOW MASTER STATUS;

+----------------+----------+--------------+------------------+

+----------------+----------+--------------+------------------+

| log-bin.000141 | 1123 | | |

+----------------+----------+--------------+------------------+

Name

SHOW SLAVE HOSTS

Synopsis

SHOW SLAVE HOSTS

This statement displays a list of slave servers for the master server. Slaves must be started with the --report-host=slave option in order to be shown. Here is an example:

SHOW SLAVE HOSTS;

+------------+-----------+------+-----------+

+------------+-----------+------+-----------+

| 2 | slave2 | 3306 | 1 |

| 3 | slave3 | 3306 | 1 |

+------------+-----------+------+-----------+

Four fields are in the results:

Server_id

The server identification number for the slave server, which is set by the --server-id option (preferably in the slave’s options file).

Host

The hostname of the slave server, which is set by the --report-host option on the slave.

Port

The port on which the slave is listening for replication. This defaults to 3306, but can be set with the CHANGE MASTER TO statement, described earlier in this chapter.

Master_id

The server identification number of the master. It’s set on the master with --server-id and conversely on the slave with the CHANGE MASTER TO statement.

Name

SHOW SLAVE STATUS

Synopsis

SHOW SLAVE STATUS

This statement displays information on the slave thread. Here is an example of this statement and its results:

SHOW SLAVE STATUS\G

*************************** 1. row ***************************

Slave_IO_State: Waiting for master to send event

Master_Host: localhost

Master_User: root

Master_Port: 3306

Connect_Retry: 5

Master_Log_File: log-bin.000154

Read_Master_Log_Pos: 159

Relay_Log_File: log-relay-bin.154

Relay_Log_Pos: 694

Relay_Master_Log_File: log-bin.154

Slave_IO_Running: Yes

Slave_SQL_Running: Yes

Replicate_Do_DB:

Replicate_Ignore_DB:

Last_Errno: 0

Last_Error:

Skip_Counter: 0

Exec_Master_Log_Pos: 159

Relay_Log_Space: 694

Until_Condition: None

Until_Log_File:

Until_Log_Pos: 0

Master_SSL_Allowed: Yes

Master_SSL_CA_File: ssl_ca.dat

Master_SSL_CA_Path: /data/mysql/ssl_ca

Master_SSL_Cert: ssl_cert.dat

Master_SSL_Cipher:

Master_SSL_Key:

Seconds_Behind_Master: 3

You can set some of these values at startup with the MySQL server daemon (mysqld). See Chapter 15 for more information on setting server variables at startup. You can also set some of these variables with the SET statement, and you can adjust others for particular tables with the ALTER TABLE statement. You can reset some of the log file variables with the RESET MASTER and RESET SLAVE statements.

Name

START SLAVE

Synopsis

START SLAVE [IO_THREAD|SQL_THREAD]

START SLAVE [SQL_THREAD]

UNTIL MASTER_LOG_FILE = 'log_filename', MASTER_LOG_POS = position

START SLAVE [SQL_THREAD]

UNTIL RELAY_LOG_FILE = 'log_filename', RELAY_LOG_POS = position

Use this statement to start a slave server. In the first syntax, you can start just the I/O thread or just the SQL thread by using the respective keyword. You can start both by listing both keywords, separated by a comma. The default is to start both. The I/O thread reads SQL queries from the master server and records them in the relay log file. The SQL thread reads the relay log file and then executes the SQL statements. See the Replication Process” section earlier in this chapter for details.

The second syntax limits the reading of the threads to a specific point, given with MASTER_LOG_POS, in the master log file named with the MASTER_LOG_FILE parameter. The UNTIL clause stops processing of the given log files when the given position is reached. The third syntax specifies the relay log file and limits its reading and execution. If the SQL_THREAD keyword is given in either the second or third syntaxes, the reading will be limited to the SQL thread.

The starting of a slave thread isn’t always dependable. Run the SHOW SLAVE STATUS statement to confirm that the thread began and remained running.

Name

STOP SLAVE

Synopsis

STOP SLAVE [IO_THREAD|SQL_THREAD]

This statement stops the slave server threads. To stop a specific type of slave thread, specify one or both of the thread types. Both may be given in a comma-separated list. The default is to stop both. The statement requires SUPER privileges. You can start slave threads with the START SLAVEstatement.

Replication States

In order to be able to monitor replication effectively, you need to know and understand the various states that the master and slave can occupy. Server states can be displayed by using the SHOW PROCESSLIST statement on the master and the slave. At least one line of the results will be related to the replication activities for the user account associated with replication. Following the examples of this chapter, the account is replicant on the master and system user on the slave. In the Command column, on the master the value will be Binlog Dump, meaning a binary log thread; on the slave the value will be Connect. The results will also contain a field called State, in which the state of the thread will be given. Here is an example from a slave:

SHOW PROCESSLIST \G

*************************** 1. row ***************************

Id: 16

User: system user

Host:

db: NULL

Command: Connect

Time: 119255

State: Waiting for master to send event

Info: NULL

These results show only one thread, the I/O thread waiting on the master. If the server were processing entries from the master’s binary log, there would probably be another row shown in the results for the SQL thread. What follows is a list of all of the possible server states that you may see on master and slave servers, along with descriptions of each. In addition to understanding these traits, you may want to develop a script to check that replication is running on the slave and not stalled and to notify you if it’s not running. Replication on MySQL is very stable, but if it does stop, it’s very quiet about it. Fortunately, it’s very good about rapidly catching up once you restart it.

Master BinLog Dump Thread States

Here is an overview of master server replication states that can be reported for binary log threads (Binlog Dump):

Has sent all binlog to slave; waiting for binlog to be updated

This is the most common status message you should see for a slave connection on the master. In this state, the master is basically doing nothing regarding replication. It has sent the slave all entries requested and is now waiting for another event to occur that will cause its binary log to be updated. Notice that it says it is waiting for the binary log to be updated. It doesn’t say it’s waiting for the databases to be updated. That’s handled by a different component of MySQL. The thread lives only to provide information about the binary log to the slave.

Sending binlog event to slave

After the binary log has been updated, the master informs the slave that one or more new entries have been made. If the slave requests the entries, the master enters this state, indicating that it is in the process of sending a slave entries or information on pertinent database events. There are obviously other states in between, but they are so fast and short-lived that they are not registered and therefore will not show up in the results of SHOW PROCESSLIST.

Finished reading one binlog; switching to next binlog

If a slave has been offline for a while, the master may have flushed its logs in the interim. Whenever the master does this, it will start a new log file, saving the previous ones. When a slave requests log entries that span more than one log file as the master switches from one file to the next, it enters this state.

Waiting to finalize termination

Once the master has completed the process of updating a slave, the master shows this status as it’s closing the binary log file and winding down the communication with the slave. When it is finished, the master will return to the first thread state (Has sent all binlog to slave; waiting for binlog to be updated) in which it is waiting for more changes to the binary log.

Slave I/O Thread States

Here is a list of replication states that can be found on the slave server for I/O threads:

Connecting to master

This state indicates that the slave I/O thread is attempting to connect to the master. If it can’t connect, it may stay in this state for a while as it retries.

Checking master version

After the slave connects to the master, it compares versions of MySQL with the master to ensure compatibility. This is very quick.

Registering slave on master

After the slave connects to the master, it registers itself with the master as a replication slave server. During this process, it will be in this state. On the master side of the connection, the Binlog Dump state will be Has sent all binlog to slave; waiting for binlog to be updated, as described previously.

Requesting binlog dump

When the slave has been informed of changes to the master binary log, it enters this state to request the new entries. Also, when it first connects to a server—either for the first time or after having been disconnected for a while—it enters this state briefly to request all entries since the last master binary log position that it gives the master. If no changes have occurred, none are returned. If there are new entries, the entries starting from the position given until the end of the master’s binary log will be transmitted to the slave. On the master side, you will see the state Sending binlog event to slave as a result of the request.

Waiting to reconnect after a failed binlog dump request

If the request for new entries mentioned in the previous state fails to be received from the master, the slave enters this state as it waits to be able to connect to the master periodically. This timeout period is configured using the --master-connect-retry and defaults to 60 seconds. The number of retries it will make can be found in the master.info file shown earlier in this chapter. Each time the slave attempts to reconnect, it will enter the next state.

Reconnecting after a failed binlog dump request

If the slave failed to stay connected to the master while trying to retrieve entries to the master’s binary log (as mentioned in the previous state description), this state indicates that the slave is trying to reconnect. If it fails again, it will go back to the previous state and wait to retry. By default, it will try 60 times before stopping. You can change the number of retries with the --master-connect-retry option.

Waiting for master to send event

This state is the most common that you will see, unless your server is very busy. The SQL thread is currently connected to the master and is waiting for the master to send it binary log updates. If there is no activity after a while, the connection will time out. The number of seconds that will elapse before timeout is reached can be found in the variable slave_net_timeout (previously slave_read_timeout). A timeout is the same as a lost connection for the slave. Therefore, it will become active and attempt to reconnect to the master, then inquire about any changes to the master’s binary log, before entering this state again.

Queueing master event to the relay log

This state occurs when the slave I/O thread has received changes to the master’s binary log from the master and is writing the SQL statements and the related information to the slave’s relay log. Once it’s done, the slave’s SQL thread will read the relay log and execute the new SQL statements written to the log. On the SQL thread, this is the Reading event from the relay log state described in the next section.

Waiting to reconnect after a failed master event read

If the connection to the slave failed while reading an event (represented by an entry in the master’s binary log), the slave will wait in this state for a certain amount of time before attempting to reconnect to the master. The number of seconds that the slave will wait before retrying is found in the master-connect-retry variable on the slave. When the slave attempts to reconnect, it enters the next state.

Reconnecting after a failed master event read

This state occurs after the slave I/O thread loses its connection to the master while receiving an entry from the master binary log.

Waiting for the slave SQL thread to free enough relay log space

If the SQL thread isn’t processing the entries in the relay log fast enough, and the backlog has caused the relay log files to become too large, the I/O thread will enter this state. In this state, it’s waiting for the SQL thread to process enough of the entries in the relay log so that the I/O thread can delete some of the older content of the log. The maximum amount of space allocated for the relay log files is found in the relay_log_space_limit variable. The slave SQL thread automatically deletes relay log files. The FLUSH LOGS statement, though, causes the slave to rotate log files and to consider deleting old files.

Waiting for slave mutex on exit

When the I/O thread has been terminated, it enters this state as it closes. The term mutex stands for mutual exclusion. The SQL thread gets the mutex to prevent any other slave replication activities so that replication can be shut down without loss of data or file corruption.

Slave SQL Thread States

Here is a list of replication states that can be found on the slave server for SQL threads:

Has read all relay log; waiting for the slave I/O thread to update it

Because replication is so fast, you will usually see the slave’s SQL thread in this state unless you have a very busy database system with data constantly being updated. This state indicates that the slave’s SQL thread has read all of the entries in its relay log and has executed all of the SQL statements that it contains. It has no further updates to make to its databases and is waiting for the slave’s I/O thread to add more entries to the relay log file. As mentioned in the similar state for the master, each thread acts somewhat independently and focuses only on the activities of its purview. Messages related to each thread’s state reflect this.

Reading event from the relay log

When an entry has been made to the relay log by the slave’s I/O thread, the slave’s SQL thread enters this state. In this state it is reading the current relay log file and is executing the new SQL statements that it contains. Basically, the SQL thread is busy updating the slave’s databases.

Waiting for slave mutex on exit

When the SQL thread has finished updating the slave’s databases,it enters this state while it’s closing the relay log file and terminating communications with the slave server. The SQL thread gets the mutex to prevent any other slave replication activities so that replication can be shut down without loss of data or file corruption. This is a very minimal state. However, if there is a problem with closing the relay log file or ending the activities of the slave server, this state is displayed so that you know the thread is locked. This could be caused by a table or log file being corrupted. If you see this state, you may want to run myisamchk or a similar utility, or the REPAIR TABLE statement on the tables that accessed at the time of the lockup. You’ll have to look in the relay log file and the error log file on the slave to determine which tables might need checking.