Learning MySQL and MariaDB (2015)

Part V. Administration and Beyond

Chapter 14. Backing Up and Restoring Databases

A database is often the culmination of the work of many people, sometimes thousands of people. The organization creating the database employs developers and administrators. Then there are the people who contribute content, who may be employees or members of the organization. But much of the content of a database can come from others, such as clients and unknown people providing content through a website. The amount of data can be enormous. It's not unusual for even a small site to accumulate thousands of rows of data, and a large site could easily have millions. All of this content — all of this work from hundreds or thousands of people — can be lost easily, through something as simple as the failure of a hard drive on the server. Because of this, it's essential to make backups regularly and correctly: too many people and too much work depend on it.

If you’re going to be a database administrator, you will need to understand how to make backups and restore them. You will need to develop a plan of what will be backed up, as well as when and where. In addition, you will need to check occasionally that backups are not failing. You shouldn’t wait until you need to restore data to find that the backups haven’t been working. And you will need practice restoring backups so that you will be ready when you need to quickly restore them. We will cover all of this in this chapter.

Making Backups

One of the best utilities you can use to make backup copies of data in MySQL or MariaDB is mysqldump. It's included with both servers and costs you nothing. You probably already have it installed on your server. Best of all, it doesn't require you to shut down MySQL services to make a backup, although you might restrict access to the databases while the backup runs for better consistency of the data. There are other backup utilities (e.g., MySQL Enterprise Backup and Percona XtraBackup), some with a GUI and some that are more comprehensive. You can learn about other types of backups and tools in the book MySQL Troubleshooting (O'Reilly) by Sveta Smirnova. However, mysqldump is the most popular one, and as a new administrator, you should know how to use it, even if you later use one of the commercial releases. We will use this utility for the examples in this chapter.

The mysqldump utility works very simply: it queries the server for the schema and data of each database and table and exports all of this to a text file. The default text file it creates, which is known as a dump file, includes the SQL statements necessary to reconstruct the databases and data. If you were to open a dump file generated by mysqldump, you would see CREATE TABLE statements and a multitude of INSERT statements. That may seem cumbersome, but it’s simple and manageable.

The mysqldump utility offers many options. You can make a backup of all of the databases, or only specific ones. You can also back up just specific tables. In this section, we’ll look at many of the available options and go through some examples of combinations for common uses.

Backing Up All Databases

The simplest way to make a backup is to dump all of the databases with all of the tables and their data. You can do this easily with mysqldump. Try executing something like the following at the command line on your server, using the administrative user you created in Chapter 13. You'll have to change the path given, /data/backups/, to a path on your server, or you can omit it and the dump file will be created in the current directory:

mysqldump --user=admin_backup \
--password --lock-all-tables \
--all-databases > /data/backups/all-dbs.sql

The options used here include the following:

--user=admin_backup

Tells the utility to act as the user named admin_backup when interacting with the MySQL server. I showed how to create this user in Restricting the Access of User Accounts, so create a special user with the right privileges now if you have not already done so (see the sketch after this list). Although you might be tempted to use the root user for backups, you should always use a special administrative user, as we're doing here. The user needs only the privileges to lock tables and read data from all of the databases and tables.

--password

Tells the utility that the user needs to be prompted for a password, which will have to be typed in on the next line when asked. This acts the same way as the mysql client. If the backup is to be executed by cron through a shell script, this option can be changed to --password=my_pwd, where my_pwd is the password. That means, though, that the password will be in crontab in plain text. This is a good example of why you shouldn't use the root user.

--lock-all-tables

Makes MySQL lock all of the tables before performing the backup. The lock won’t be released until the process is finished. For a busy database with many users, locking all of the tables for a lengthy period of time can create problems. We’ll look at alternatives in a bit.

--all-databases

Specifies that all of the databases are to be exported. In the next subsection, in which we will back up only some databases, we'll replace this option with another so that we may specify which databases to back up.

The greater-than sign in the command line shown here is a shell redirect of the standard output (STDOUT) to the path and filename given after it. Set the path and filenames to suit your system and preferences.
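The examples in this chapter assume the admin_backup user already exists. If it doesn't yet, here's a minimal sketch of how such an account might be created; the privilege list is an assumption on my part (SELECT and LOCK TABLES cover reading and locking the tables, and RELOAD is needed because --lock-all-tables takes a global read lock), and the password is a placeholder:

GRANT SELECT, LOCK TABLES, RELOAD, SHOW VIEW
ON *.* TO 'admin_backup'@'localhost'
IDENTIFIED BY 'its_password';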

The resulting dump file will generally contain separate INSERT statements for each row of each table. To bundle the INSERT statements into one statement per table in the dump file, include the --extended-insert option. This will make a smaller dump file. Additionally, the combined INSERT statements will execute faster when you have to restore a database. If your server generates extended inserts in a dump file by default, but you prefer them as separate statements, use the --skip-extended-insert option.

The INSERT statements don’t include the column names — it just lists the values in the same order as the columns. If you want the column names included, though, you would add the --complete-insert option.

NOTE

You can put the options in any order after the mysqldump command. You just have to put any values you want to pass to an option immediately after it. The only other order requirement is the final piece, the shell redirect — but that's actually a shell operator and isn't part of the mysqldump command. Basically, the ordering of options is very much like that of any other command.

MySQL utilities used to offer shorter, single-hyphen options, such as -u for --user. But the short names are being deprecated and may not be available in the future.

TIP

When making backups of InnoDB or other transactional tables with mysqldump, it's best to include the --single-transaction option. This will keep the data more consistent: the data won't change between the tables until the dump is finished. However, that option cancels the --lock-tables option. This means that a backup of MyISAM tables in the same database could be inconsistent. You can avoid this potential problem by either using the same storage engine for all of the tables in a database, or making separate backups of InnoDB tables and MyISAM tables.
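For instance, assuming the tables involved are all InnoDB, a consistent backup might be taken like this — a sketch; the filename is arbitrary:

mysqldump --user=admin_backup --password --single-transaction \
--databases rookery > rookery-innodb.sql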

Backing up all of the databases at once with mysqldump may result in one large dump file. For smaller databases and as part of a regular routine, this is fine and manageable. However, for larger databases, this method could take much longer to complete the backup, disrupting traffic while tables are locked, and it may make restoration unwieldy later. Instead, you can construct a more adept backup method. For instance, it might be useful to perform a separate backup for each large database, leaving several smaller dump files. You could also back up larger and more active databases during slower traffic times so that you don't diminish database and web services. We'll discuss later how to specify which databases to back up and some backup strategies. For now, let's take some time to become familiar with dump files.

WARNING

There’s a security concern about making backups of all of the databases, as it could include the user table in the mysql database. This table contains usernames and passwords. You can eliminate it from a backup by adding --ignore-table=mysql.user to the mysqldump at the command line when creating the dump file. To make a backup occasionally of just the mysql.user, though, you might use a different user account for the backup and write the dump files to a protected directory or somewhere safe.

Understanding Dump Files

After the mysqldump command in the previous section has finished running, use a simple text editor to open the dump file that it generated. Scroll through the file to examine the contents. You’ll notice quite a few things: the utility annotates the dump file, sets certain variables, then lists CREATE DATABASE, CREATE TABLE, and many INSERT statements. Let’s review a few of those entries so you’ll have a better understanding of dump files. This will be useful later when you need to restore a database.

First, let’s look at the header. Here’s an example of the first few lines in a dump file generated by mysqldump using the settings from the previous example:

-- MySQL dump 10.14 Distrib 5.5.39-MariaDB, for Linux (i686)

--

-- Host: localhost Database: rookery

-- ------------------------------------------------------

-- Server version 5.5.39-MariaDB

The first line of the dump file lists the version of mysqldump that was used and the distribution of MySQL, or in this case, MariaDB, and on which operating system the command was executed. Next, we see that the dump was executed while logged into the server, from the local host. On the same line, we find the name of the first database to be backed up. The next line, after some dashes for nicer formatting, is the version number of MariaDB — that was given in the line showing the distribution, but here it’s more clearly listed.

Next in the dump file comes a batch of SET statements that look something like Example 14-1.

Example 14-1. Conditional SET commands in dump file

/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;

/*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;

/*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;

/*!40101 SET NAMES utf8 */;

/*!40103 SET @OLD_TIME_ZONE=@@TIME_ZONE */;

/*!40103 SET TIME_ZONE='+00:00' */;

/*!40014 SET @OLD_UNIQUE_CHECKS=@@UNIQUE_CHECKS, UNIQUE_CHECKS=0 */;

/*!40014 SET @OLD_FOREIGN_KEY_CHECKS=@@FOREIGN_KEY_CHECKS, FOREIGN_KEY_CHECKS=0 */;

/*!40101 SET @OLD_SQL_MODE=@@SQL_MODE, SQL_MODE='NO_AUTO_VALUE_ON_ZERO' */;

/*!40111 SET @OLD_SQL_NOTES=@@SQL_NOTES, SQL_NOTES=0 */;

Enclosed as they are between /* and */, these SET statements may seem to be comments that won't be processed. However, they're SQL statements that will be executed conditionally, based on the version of MySQL or MariaDB installed on the server. That's why the lines start with /*! and not just /*. Within the dump file, true comment lines are prefaced instead with --.

You can reduce the size of the dump file by including one or more of the following options when running mysqldump:

--skip-add-drop-table

Leave out DROP TABLE statements that clean up old tables.

--skip-add-locks

Dump without first locking each table.

--skip-comments

Suppress comments in the file.

--skip-disable-keys

Suppress commands that manipulate the indexes in the tables.

--skip-set-charset

Suppress SET NAMES statements that control the character set in use.

--compact

Use all of the previous options in this list.

Some of the options in the preceding list have potentially risky consequences. For instance, if you don't set the character set, you may end up with the wrong one; and if you don't lock the tables, other clients could make changes while you're dumping, leaving an inconsistent table in the backup.
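As an illustration, here's a sketch of a trimmed dump made with --compact — given the risks just mentioned, something best reserved for a quiet test server; the filename is arbitrary:

mysqldump --user=admin_backup --password --compact \
--databases rookery > rookery-compact.sql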

Because a dump file may be used to copy databases from one server to another, and not just for backup and recovery on the same server, the conditional statements are used to check the version of the server on which the SQL statements in the dump file will be executed. This way, there won't be any problems when executing the SQL statements that create tables and insert data. When the dump file is executed, it will restore or re-create the databases and tables exactly as they were at the time of the dump.

Let’s look back at the first SET command:

/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;

This line starts by specifying that the command will be executed only if the version of MySQL or MariaDB is at least 4.01.01. mysqldump makes sure in this way that it won't try to invoke a feature on old versions of the server that don't support it. It's assumed that once a feature is supported, all future versions of the server will continue to support it. The SQL statement that follows saves the current value of the CHARACTER_SET_CLIENT global variable. If you look back at Example 14-1, you'll see that the subsequent lines save CHARACTER_SET_RESULTS and COLLATION_CONNECTION as well. The fourth line then sets all three variables to utf8 with NAMES — that's an abbreviation for these three variables.

If you skip to the very end of the dump file, you’ll see a similar batch of SQL statements that look like this:

/*!40103 SET TIME_ZONE=@OLD_TIME_ZONE */;

/*!40101 SET SQL_MODE=@OLD_SQL_MODE */;

/*!40014 SET FOREIGN_KEY_CHECKS=@OLD_FOREIGN_KEY_CHECKS */;

/*!40014 SET UNIQUE_CHECKS=@OLD_UNIQUE_CHECKS */;

/*!40101 SET CHARACTER_SET_CLIENT=@OLD_CHARACTER_SET_CLIENT */;

/*!40101 SET CHARACTER_SET_RESULTS=@OLD_CHARACTER_SET_RESULTS */;

/*!40101 SET COLLATION_CONNECTION=@OLD_COLLATION_CONNECTION */;

/*!40111 SET SQL_NOTES=@OLD_SQL_NOTES */;

-- Dump completed on 2014-09-14 6:13:40

These conditional SQL statements reverse the first batch of conditional SQL statements. They use the variables that were created at the start to set the global variables back to their old settings. You’ll see many conditional statements like these throughout the dump file. This resetting of key characteristics makes it important to lock tables when restoring a dump file, so that the results of such SET statements won’t affect any data changes that users might make during the restoration of a database.

Let’s go back to the start of the dump file and look at the lines that follow the initial conditional SQL statements. You should see something like this:

--

-- Current Database: `rookery`

--

CREATE DATABASE /*!32312 IF NOT EXISTS*/ `rookery`

/*!40100 DEFAULT CHARACTER SET latin1 COLLATE latin1_bin */;

USE `rookery`;

The first three lines present a header comment so that when you review the dump file, you will know that this is the start of the section related to the rookery database. The first SQL statement, reasonably enough, is a CREATE DATABASE statement. It can look a bit confusing because it contains a couple of conditional components, which are related to the version of MySQL or MariaDB on which the statement will later be executed. Let’s look at one of those components.

In this SQL statement, IF NOT EXISTS will be executed if the server is running at least version 3.23.12 of MySQL. That's quite an old version of MySQL, but this option was introduced in that release and hasn't changed since. It's unlikely that a server anywhere in the world is still using such an early version, but this is the nature of mysqldump: to be ready for any conflict. More important is the option itself. If the rookery database already exists, it won't be created by this CREATE DATABASE statement and it won't be overwritten. Incidentally, if you want to create a dump file without the CREATE TABLE statements, you can add the --no-create-info option when running mysqldump; the related --no-create-db option suppresses the CREATE DATABASE statements.

The last SQL statement in the previous snippet switches the default database to rookery. You may wonder why the utility uses the USE statement instead of just including the database name in the subsequent SQL statements (e.g., it doesn't have statements like INSERT INTO `rookery`.`bird_families`...). That might seem a more dependable method, but the method used has an advantage. When executing a dump file, if you want to create a new database on the same server with all of the tables and data the same, you can simply edit the database name where it occurs near the top of the dump file — in the CREATE DATABASE and USE statements (e.g., change rookery to rookery_backup). Then the original will be preserved and you'll have an identical copy. We'll talk more about this later. Let's look at what's next in the dump file.

The next section of the dump file deals with the first table of the rookery database. As the following excerpt shows, it’s the table structure of the bird_families table:

--

-- Table structure for table `bird_families`

--

DROP TABLE IF EXISTS `bird_families`;

/*!40101 SET @saved_cs_client = @@character_set_client */;

/*!40101 SET character_set_client = utf8 */;

CREATE TABLE `bird_families` (

`family_id` int(11) NOT NULL AUTO_INCREMENT,

`scientific_name` varchar(100) COLLATE latin1_bin DEFAULT NULL,

`brief_description` varchar(255) COLLATE latin1_bin DEFAULT NULL,

`order_id` int(11) DEFAULT NULL,

PRIMARY KEY (`family_id`),

UNIQUE KEY `scientific_name` (`scientific_name`)

) ENGINE=MyISAM AUTO_INCREMENT=334 DEFAULT CHARSET=latin1 COLLATE=latin1_bin;

/*!40101 SET character_set_client = @saved_cs_client */;

The first SQL statement here may concern you. It should. It's a DROP TABLE statement that will delete the bird_families table. No data ought to be lost, because the SQL lines that follow will re-create the table and insert the data it held at the time the dump file was created. However, if there have been changes to the data in the bird_families table since the dump file was created, those changes will be lost when the table is restored to its previous state. For such a situation, there are other methods you can resort to besides the bulk clobbering of tables. One method uses the suggestion made previously to alter the USE statement to point all schema and data statements to a different, temporary database. Then you can attempt to merge the old and new data together. Depending on the situation, you might be able to do this by changing the INSERT statements to REPLACE statements. Another method would be to remove the DROP TABLE statement and change the table name in the CREATE TABLE statement that follows so that it creates a table with a new name. We'll cover such techniques later in this chapter in Restoring Backups.

The IF EXISTS option ensures that a restore will drop the table only if it exists. If this option were omitted and the table didn't exist, the DROP TABLE statement would generate an error that could abort the restore.

After the DROP TABLE statement, there are more conditional SQL statements for variables related to the table and the client. These are followed by the CREATE TABLE statement, which matches the results of a SHOW CREATE TABLE statement for the table. This section ends by returning the changed variable to its previous setting.

Now the bird_families table is ready for the data. The next set of entries in the dump file is:

--

-- Dumping data for table `bird_families`

--

LOCK TABLES `bird_families` WRITE;

/*!40000 ALTER TABLE `bird_families` DISABLE KEYS */;

INSERT INTO `bird_families` VALUES

...

/*!40000 ALTER TABLE `bird_families` ENABLE KEYS */;

UNLOCK TABLES;

After the comment appears a LOCK TABLES statement to lock the bird_families table. It includes the WRITE option so that the data in the table cannot be changed during the restoration of the table. Users can’t read the table either. Another thought may have occurred to you now: mysqldump is write-locking tables one at a time, as needed. That may be what you want, making other tables available for reading and writing when they’re not being dumped. However, this may cause a problem with the consistency of the data.

For example, suppose the backup is at the point where it has preserved the contents of the humans table but not the bird_sightings table in the birdwatchers database. At this point, you decide to delete someone from the humans table, along with entries in the bird_sightings table for that person. After that, mysqldump backs up the bird_sightings table. If you were later to restore the entire birdwatchers database, you would have entries in the bird_sightings table for a person who isn't listed in the humans table.

If a database isn’t very active, the previous scenario is unlikely. However, if you want to be assured of the consistency of your data, when executing the mysqldump utility, you could add the --lock-tables option. This locks all tables in a database before backing it up, and leaves them locked until the backup of the database is completed. When making a backup of multiple databases, this option still locks only the tables in one database at a time, releasing them before starting the next database. If you’re concerned about consistency between databases — that is to say, if data in one database depends on data in another database — use the --lock-all-tables option to lock all of the tables in all of the databases until the dump is completed.

In the previous excerpt, the LOCK TABLES statement is followed by a conditional statement (i.e., ALTER TABLE...DISABLE KEYS) to alter the bird_families table so as to disable the keys. This can save time when the table is restored. When the INSERT statement that follows — truncated in the example to save space — is executed, data will be inserted much faster if MySQL doesn’t have to index all of the data as it’s inserted. Instead, another ALTER TABLE statement will be executed conditionally to enable the keys again. When that occurs, the table will be indexed. This method uses a special algorithm that is generally much faster when performed for the entire table at once, rather than when each row is inserted.

TIP

Conditional components like DISABLE KEYS are included if the --disable-keys option is set by default on the server. If you don’t see them in the dump files created by mysqldump, it isn’t set by default on your system. It can be added when mysqldump is executed at the command line, or it can be added to the MySQL configuration file under the [mysqldump] heading.

The last line of the previous excerpt issues an UNLOCK TABLES statement to unlock the tables that were locked at the start of this section of the dump file.

In summary, the basic pattern for each table is to establish the table structure and then address the data. To establish the table structure, the dump file generally contains SQL statements to drop the table, set related temporary variables, re-create the table, and then restore the variables. To deal with the data when it re-creates the table, it locks the table, disables the keys, inserts all of the data, and then re-enables the keys and unlocks the table. This pattern is repeated for each table in the database. When the command has finished dumping all of the tables in the database, it will proceed to the next database, and continue until it has finished all of the databases, because in this example it was instructed to make a backup of all of the databases.

The contents of a dump file created by mysqldump can vary depending on the version of the utility and the default settings. It also can vary depending on the databases it’s dumping and what instructions are given with the options at the command line. However, this review of an example of a dump file should give you a good sense of how to read one. Let’s return now to making backups with mysqldump.

Backing Up Specific Databases

Before we concerned ourselves so much with the contents of the dump file, we were experimenting with making backups, learning how to back up all databases on the server. However, you may want to export only one database, or only specific ones. Let’s see how to do that.

To export only one database and not all of them, instead of using the --all-databases option, use the --databases option followed by the name of the database. Try making a backup of just the rookery database by entering the following on your server from the command line:

mysqldump --user=admin_backup --password --lock-tables \
--verbose --databases rookery > rookery.sql

This is basically the same as the example that dumped all of the databases, except that we’ve specified the database to be exported, rookery. As mentioned before, you may want to make separate backups of databases to reduce the load on a busy server and to make restoration more manageable. Incidentally, if for some reason you want to make a backup of a database’s schema without the data, you can use the --no-data option. The command would then dump only the database and table schemas and not the rows of data.

You may have noticed in the previous example that we added the --verbose option. This option instructs the utility to display messages regarding each major step in the process of querying the database and creating the dump file. For our database, running this command produces messages like this:

-- Connecting to localhost...

-- Retrieving table structure for table bird_families...

-- Sending SELECT query...

-- Retrieving rows...

-- Retrieving table structure for table bird_images...

...

-- Disconnecting from localhost...

These messages can be useful, especially when there are problems, for knowing which tables were dumped successfully and where a problem occurred.

To export multiple databases, just enter them after the --databases option, separated by spaces — not commas as you might think. Try executing the following on your server to back up the rookery and the birdwatchers databases:

mysqldump --user=admin_backup --password --lock-tables \
--databases rookery birdwatchers > rookery-birdwatchers.sql

This will dump the rookery and the birdwatchers databases into one file named rookery-birdwatchers.sql. Because those two databases are related and there aren’t any other databases associated with them, this can be useful. We can copy this line into crontab or some other scheduling utility on the server to run automatically each day. However, each command that runs will overwrite the dump file from the previous day. If something happens and data is deleted accidentally, but we don’t discover it for a few days, we won’t be able to restore that data from the backup. To allow for this possibility, we need to create a new dump file each day with a unique name so we don’t overwrite the previous dump files. Unless we intend to initiate the backups manually, we need to be creative and automate the process. We can accomplish this twist with a shell script.

Creating Backup Scripts

To automate many aspects of making backups of databases, it’s useful to create a set of scripts that will execute the mysqldump for the databases you want with the settings that you prefer. It’s not too difficult to do this. You don’t need to be very advanced in programming if you want to do only a few simple things, such as varying the output slightly each time.

Let's use the problem presented at the end of the previous section as an example. The solution is to change the name of the dump file each day to include the current date so that there will be a unique dump file for each day. Here's an example of a very simple shell script that may be run on a Linux or Mac system to do this:

#!/bin/sh
# Back up the rookery and birdwatchers databases to a dump file
# whose name includes today's date (e.g., rookery-birdwatchers-2014-10-25.sql).
my_user='admin_backup'
my_pwd='my_silly_password'
db1='rookery'
db2='birdwatchers'
date_today=$(date +%Y-%m-%d)
backup_dir='/data/backup/'
dump_file=$db1-$db2-$date_today'.sql'
/usr/bin/mysqldump --user=$my_user --password=$my_pwd --lock-tables \
--databases $db1 $db2 > $backup_dir$dump_file
exit

This script will execute the mysqldump with the same options as in our previous example. It starts by setting variables with the username, password, and the names of the databases. It then uses the date command to get the numerical values for the year, month, and day and saves them with dashes in another variable (date_today). It uses the variables for the database names (i.e., $db1 and $db2), combined with $date_today to assemble the name of the dump file (e.g., rookery-birdwatchers-2014-10-25.sql). All of these variables are then used in the mysqldump command.

Because the username and password are included in the script, it can be run automatically and daily by cron without user intervention, creating a dump file with a new name every day. This script is by no means flawless and definitely not in good form: it doesn't handle errors, and if the backup fails, it doesn't notify the administrator that there was a problem. It also doesn't address older backup files. A good script could remove the older dump files after a certain amount of time. Of course, having an automated script delete files can be a little disturbing. This script is provided only to give you an idea and a starting point for constructing your own backup scripts. The ones you create and use should be much more complex, allow for many possibilities, handle errors, and provide some sort of reporting.
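As a sketch of the scheduling side, a crontab entry like the following would run such a script every day at 2:00 a.m.; the script path and logfile here are assumptions for illustration:

0 2 * * * /usr/local/bin/backup-dbs.sh >> /var/log/db-backup.log 2>&1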

Backing Up Specific Tables

For very large and active databases, you may want to back up the data for individual tables rather than the whole database. You could back up the entire database weekly, perhaps, and then do daily backups of the tables whose data changes often. For most databases, developing a strategy like this can be prudent.

Take our two databases. The data in the rookery tables will rarely change: new species of birds aren’t discovered daily, and bird families and orders are rarely changed. Once we have all of the details for each bird in each table entered, there will hardly be any changes. Conversely, if our site is very active, almost all of the tables in the birdwatchers database will have new rows and changes frequently, so we would want to back up all of its tables every day. A reasonable strategy, then, is to back up the whole rookery database once a week and all of the birdwatchers database each day.

Still, suppose our boss is overly concerned about losing any data entered by our members. Suppose he insists that we make a backup of the humans table twice a day, once at noon and again at midnight. We could write a shell script like the one in the previous section, varying the filenames to include the date plus a bit more to indicate the time of day when the dump was made (e.g., birdwatchers-humans-2014-09-14-midday.sql and birdwatchers-humans-2014-09-14-midnight.sql). The only other change is to create a mysqldump command to back up just one table, humans. Try executing the following on your server from the command line:

mysqldump --user=admin_backup --password --lock-tables \
--databases birdwatchers --tables humans > birdwatchers-humans.sql

This is similar to the previous examples, but with the addition of the --tables option followed by the table name. If you want to make a backup for more than one table in the same database, you would just list them after the --tables option, each table name separated by a space. But this example is wordier than necessary. Because we’re backing up tables in only one database, we don’t need the --databases option. We also don’t need the --tables because mysqldump assumes that any nonreserved words after the database name are the names of tables. So the previous example can be entered like this:

mysqldump --user=admin_backup --password --lock-tables \
birdwatchers humans > birdwatchers-humans.sql

Although this command is simpler, the previous one makes it easier to discern what is a database name and what is a table name.

Let's add another table to the example here, but from another database. Suppose that our boss wants us also to back up the birds table in the rookery database. This isn't allowed with mysqldump: you can't list two databases with the --tables option. You would have to run mysqldump twice. This would create two dump files. If you want one dump file containing both tables, you could do something like this:

mysqldump --user=admin_backup --password --lock-tables \
--databases rookery --tables birds > birds-humans.sql

mysqldump --user=admin_backup --password --lock-tables \
--databases birdwatchers --tables humans >> birds-humans.sql

Here we're executing mysqldump twice, but the second time we're using the append redirect (i.e., >>) to add to the dump file instead of creating a fresh one. The dump file will have a comment in the middle of it saying that the dump is completed and then another starting header for the second dump. Because those are just comments, they will have no effect if you use the combined dump file to restore the two tables. Nor will the SET statements that appear twice, since the second batch merely sets the variables again. So it's fine to append to a dump file like this.

The mysqldump utility is easy to use and very powerful. We've touched on many of the options that may be used with it, but there are many more. You can find these online on the MySQL and MariaDB websites or in my book, MySQL in a Nutshell (O'Reilly).

One of the problems with dump files, though, is that if you're not careful, you can clobber your databases when you use them to restore data. Therefore, you should practice restoring dump files on a test database or a test server. Do this often so that you will be comfortable with making and restoring backups. Don't wait until you've lost data and are panicking to restore it, because you might make unrecoverable errors or even find out that you haven't been backing up your data properly. Develop these skills in advance, in a safe and controlled way. To learn how to restore dump files, see the next section.

Restoring Backups

If data is lost in MySQL, but you've been using mysqldump to make regular backups of the data, you can use the dump files to restore the data. This is the point of the backups, after all. Restoring a dump file made with mysqldump is just a matter of using the mysql client to execute all of the SQL statements contained in the file. You can restore all of the databases, a single database, individual tables, or even specific rows of data. We'll cover all of these in this section.

Restoring a Database

Let’s look at how to restore an entire database. To be safe, as part of experimenting, we’ll make a fresh backup of the rookery database and then restore it. Execute the following from the command line on your server:

mysqldump --user=admin_backup --password --lock-tables \
--databases rookery > rookery.sql

Before proceeding, check the contents of the dump file. Make sure it contains the SQL statements for restoring the rookery database. If everything looks OK, delete the rookery database from the server. This may seem scary, but you just made a good backup. There will come a time when a database is deleted or corrupted unintentionally, so it's better to develop confidence in your ability to restore a database with a test database like rookery. To get rid of the database, you can execute the following from the command line:

mysql --user=admin_maintenance --password --execute "DROP DATABASE rookery;"

Here we're using the mysql client at the command line to execute the DROP DATABASE statement. You could have done this from within the mysql client, though; it's done here on the command line with the --execute option. You'll have to specify an administrative user that has privileges to drop a database. Here we're using the admin_maintenance user we created in the previous chapter. After you've dropped the rookery database, execute the SHOW DATABASES statement with the mysql client to confirm that rookery has been deleted.

We’re now ready to restore the rookery database. To do this, execute the following from the command line:

mysql --user=admin_restore --password < rookery.sql

This uses the mysql client from the command line to execute the SQL statements contained in the rookery.sql dump file. Notice that we're using a less-than sign, the shell redirect for standard input (STDIN), to tell mysql to read the contents of the dump file as its input source. The command will create the rookery database and all of its tables and insert all of the data into those tables. Log into MySQL, switch to the rookery database, and execute the SHOW TABLES statement to see that all of the tables are there. Execute a few SELECT statements to see that the data is there. It's important to do this so that you'll feel more confident about your ability to restore a database.
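For a quick check without an interactive session, you could run something like this from the command line — a sketch, assuming the admin_restore user may read the restored tables:

mysql --user=admin_restore --password \
--execute "SHOW TABLES FROM rookery; SELECT COUNT(*) FROM rookery.birds;"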

Restoring a Table

The problem with restoring from a dump file of a whole database is that you may overwrite tables that you wish you hadn’t. For instance, suppose a table was dropped by accident and you want to restore it. The other tables in the database may be fine. If the latest dump file is several hours old and the other tables have been changed since the last update, you wouldn’t want to overwrite those tables. That would delete any new rows or updates since the dump file was created. If you have a backup strategy of making backups of tables separately, restoring one table would be simple. But that might be cumbersome to maintain. There are, however, a few ways of limiting a restoration to one table using a dump file that contains an entire database. Let’s look at those methods.

Modifying a dump file

As we saw in Understanding Dump Files, a database dump file is a simple text file containing SQL statements to create a database and then separate sections that restore each table, including its data. One way to restore a table from a database dump file is to modify the dump file. You could eliminate all of the SQL statements except the ones needed to restore the table you want.

Suppose you have a dump file that contains only the rookery database and you need to restore the conservation_status table because some of the data has been deleted or changed by mistake. You can make a copy of the rookery.sql dump file, open the copy with a plain-text editor, and delete the sections that create the other tables. Leave in the opening and closing lines that set the variables, as well as the section for the conservation_status table. A similar method would be to open the dump file in a text editor and then copy and paste the parts you need into a new text document: the opening and closing lines and the section for the conservation_status table. Either of these methods would result in the same dump file that you could use to restore the table.

Here is an example of how such a trimmed dump file might look:

-- MySQL dump 10.14 Distrib 5.5.39-MariaDB, for Linux (i686)

--

-- Host: localhost Database: rookery

-- ------------------------------------------------------

-- Server version 5.5.39-MariaDB

/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;

/*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;

/*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;

/*!40101 SET NAMES utf8 */;

/*!40103 SET @OLD_TIME_ZONE=@@TIME_ZONE */;

/*!40103 SET TIME_ZONE='+00:00' */;

/*!40014 SET @OLD_UNIQUE_CHECKS=@@UNIQUE_CHECKS, UNIQUE_CHECKS=0 */;

/*!40014 SET @OLD_FOREIGN_KEY_CHECKS=@@FOREIGN_KEY_CHECKS,FOREIGN_KEY...=0*/;

/*!40101 SET @OLD_SQL_MODE=@@SQL_MODE, SQL_MODE='NO_AUTO_VALUE_ON_ZERO' */;

/*!40111 SET @OLD_SQL_NOTES=@@SQL_NOTES, SQL_NOTES=0 */;

--

-- Current Database: `rookery`

--

CREATE DATABASE /*!32312 IF NOT EXISTS*/ `rookery`

/*!40100 DEFAULT CHARACTER SET latin1 COLLATE latin1_bin */;

USE `rookery`;

-- [ snip ]

--

-- Table structure for table `conservation_status`

--

DROP TABLE IF EXISTS `conservation_status`;

/*!40101 SET @saved_cs_client = @@character_set_client */;

/*!40101 SET character_set_client = utf8 */;

CREATE TABLE `conservation_status` (

`conservation_status_id` int(11) NOT NULL AUTO_INCREMENT,

`conservation_category` char(10) COLLATE latin1_bin DEFAULT NULL,

`conservation_state` char(25) COLLATE latin1_bin DEFAULT NULL,

PRIMARY KEY (`conservation_status_id`)

) ENGINE=MyISAM AUTO_INCREMENT=10

DEFAULT CHARSET=latin1 COLLATE=latin1_bin;

/*!40101 SET character_set_client = @saved_cs_client */;

--

-- Dumping data for table `conservation_status`

--

LOCK TABLES `conservation_status` WRITE;

/*!40000 ALTER TABLE `conservation_status` DISABLE KEYS */;

INSERT INTO `conservation_status` VALUES

(1,'Extinct','Extinct'),

(2,'Extinct','Extinct in Wild'),

(3,'Threatened','Critically Endangered'),

(4,'Threatened','Endangered'),

(5,'Threatened','Vulnerable'),

(6,'Lower Risk','Conservation Dependent'),

(7,'Lower Risk','Near Threatened'),

(8,'Lower Risk','Least Concern'),

(9,NULL,'Unknown');

/*!40000 ALTER TABLE `conservation_status` ENABLE KEYS */;

UNLOCK TABLES;

-- [ snip ]

/*!40103 SET TIME_ZONE=@OLD_TIME_ZONE */;

/*!40101 SET SQL_MODE=@OLD_SQL_MODE */;

/*!40014 SET FOREIGN_KEY_CHECKS=@OLD_FOREIGN_KEY_CHECKS */;

/*!40014 SET UNIQUE_CHECKS=@OLD_UNIQUE_CHECKS */;

/*!40101 SET CHARACTER_SET_CLIENT=@OLD_CHARACTER_SET_CLIENT */;

/*!40101 SET CHARACTER_SET_RESULTS=@OLD_CHARACTER_SET_RESULTS */;

/*!40101 SET COLLATION_CONNECTION=@OLD_COLLATION_CONNECTION */;

/*!40111 SET SQL_NOTES=@OLD_SQL_NOTES */;

-- Dump completed on 2014-09-15 6:48:27

This dump file will restore the conservation_status table. I added a couple of comment lines with [ snip ] to indicate that this is where I cut lines of text from the original dump file. I also added some hard returns so that the lines would fit on the printed page. Otherwise, this is exactly the way a dump file would look if we had backed up only the conservation_status table.

This method works, but it can be tedious, and you might accidentally delete a line you need or include a line you shouldn't.
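If you're comfortable with the shell, you can also script the extraction instead of editing by hand. The following is a rough sketch, not a robust tool: it assumes the section markers shown above, and the length of the footer varies between versions of mysqldump, so inspect the result before using it:

# keep everything before the first table section (the header SET statements)
sed -n '1,/^-- Table structure/p' rookery.sql | sed '$d' > conservation_status.sql
# append only the conservation_status section
sed -n '/^-- Table structure for table `conservation_status`/,/^UNLOCK TABLES;/p' \
rookery.sql >> conservation_status.sql
# append the closing SET statements (roughly the last dozen lines; verify this)
tail -n 12 rookery.sql >> conservation_status.sql

Other methods to restore just one table, which don't require slicing up the dump file, are covered in the next sections.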

Restoring with a temporary database

Another way to restore a single table from a dump file that contains a database with many tables is simply to change the name of the database in the dump file. The dump file generally contains a CREATE DATABASE statement. If you change the name of the database to a unique name that’s not already used on the server, a new database will be created on the server when the dump file is run. Then you can copy the table you want from this temporary database within MySQL to the original database. When you’re finished, you can delete the temporary database. Let’s look at an example.

Returning to the previous scenario, suppose that you have a dump file containing the rookery database, from which you need to restore only the conservation_status table. So that you can participate, if you don’t have a current dump file of rookery, use mysqldump to make one.

First, run SHOW DATABASES on the server to see the names of the existing databases so that you don't by chance give the temporary database a name that's already in use. Next, open the dump file in a text editor and look for the lines near the top that create the database. Edit that section to change the name of the database. Here's how that section of the dump file might look after you edit it:

--

...

-- Current Database: `rookery`

--

CREATE DATABASE /*!32312 IF NOT EXISTS*/ `rookery_backup`

/*!40100 DEFAULT CHARACTER SET latin1 COLLATE latin1_bin */;

USE `rookery_backup`;

...

In this excerpt, you can see that I changed the name of rookery to rookery_backup in two places: the CREATE DATABASE statement and the USE statement. That's all that you need to change. You can save the dump file now and execute it. Using an administrative user that has the CREATE privilege, enter something like this from the command line:

mysql --user=admin_restore --password < rookery.sql

Once you’ve executed this, there should be a new database called rookery_backup. Log into MySQL through the mysql client and set the default database to rookery_backup. Run the SHOW TABLES statement and a couple of SELECT statements. You’ll see that the tables and data are all there. Now you’re ready to restore the table you need.

There are a couple of ways you can restore a table at this point. Let’s try both. First, let’s delete the conservation_status table in the rookery database. To do this, execute the following within the mysql client:

DROP TABLE rookery.conservation_status;

Now create a new conservation_status table in rookery. You can do this based on the backup copy by using a CREATE TABLE...LIKE statement, covered in Essential Changes. Enter the following on your server:

CREATE TABLE rookery.conservation_status
LIKE rookery_backup.conservation_status;

Next, you need to copy the data from the backup table to the newly created table. You can do that by entering this SQL statement on your server:

INSERT INTO rookery.conservation_status
SELECT * FROM rookery_backup.conservation_status;

The INSERT...SELECT syntax is covered in Other Possibilities. It will insert into the original database’s table all of the rows selected from the backup table. When that’s finished, execute a SELECT statement to see that all of the data is in the conservation_status table. If everything is fine, delete the temporary database by entering the following on your server:

DROP DATABASE rookery_backup;

This method of restoring a single table works nicely. For a large database, though, it could take a long time to import the entire database temporarily into MySQL. However, if you have a database that large, you should make backups based on tables or batches of tables to make restoration more manageable. This method requires the CREATE and DROP privileges, which allow the user account to create new databases and drop them.

There is another method for restoring a single table that doesn’t require editing the dump file. That method is explained in the next section.

Using a limited user account

A simple way to restore only one table is to create a temporary user account that has only privileges for the table you want to restore. When you run the dump file, the SQL statements for other tables will fail and not be executed — only the table for which the user account has privileges will be restored. To create such a user account, you need the GRANT OPTION privilege. As root, you will have that privilege. Let’s go through the steps involved in this method, using the previous example in which we want to restore the conservation_status table.

WARNING

There is a risk in this method. If you’re not precise about what privileges you grant the user account, or if you restore data from the dump file inadvertently using the root user account instead of the limited user account, you will overwrite all of the databases that were backed up to the dump file. So be careful.

Before you start to restore your data, delete the conservation_status table and change some data in one of the other tables so that you can see how well this method works. You can run something like the following from the command line, using the admin_boss user account you should have created in the Chapter 13 exercises:

mysql --user=admin_boss --password \
--execute "DROP TABLE rookery.conservation_status;
INSERT INTO rookery.birds (common_name,description)
VALUES('Big Bird','Large yellow bird found in New York');
SELECT LAST_INSERT_ID();"

That should delete the conservation_status table. To test our restore, we've also added a row to the birds table, which we want to make sure is not lost when we do our restore. The last statement returns the bird_id for the row inserted. Log into MySQL and verify that the conservation_status table has been deleted, and use a SELECT statement to view the row inserted into birds, where bird_id equals the number you were given when you executed the command. If everything looks as it should, you're ready to proceed.

Now you need to create the limited administrative user. Enter GRANT statements on your server like this:

GRANT SELECT
ON rookery.* TO 'admin_restore_temp'@'localhost'
IDENTIFIED BY 'its_pwd';

GRANT ALL ON rookery.conservation_status
TO 'admin_restore_temp'@'localhost';

These two SQL statements grant the temporary user account the necessary SELECT privilege on all of the tables in the rookery database, and ALL privileges on the conservation_status table. When you restore the database dump file containing all of the tables in the rookery database using the admin_restore_temp user account, only conservation_status will be replaced.

When you execute the dump file with this user account, MySQL will generate errors when it tries to replace the other tables. Normally, that might stop execution of the dump file. To overlook the errors and to proceed with the restoration of data for tables for which no errors are generated, use the --force option with the mysql client.

Let’s restore the table now. Enter the following at the command line:

mysql --user admin_restore_temp --password --force < rookery.sql

This should work without a problem. To verify that the conservation_status table has been restored, log into MySQL and check. Then execute the SELECT statement again to see whether the row for Big Bird that you entered from the command line is still in the birds table. If it is, that means the birds table wasn't overwritten when you restored the dump file. Everything else should be fine.

Restoring Only Rows or Columns

You’ll rarely need to restore an entire database or even an entire table. It’s not often that a database or a table is dropped, or that the data in all of the rows in a table are changed accidentally. It’s more common that someone deletes a single row in a table or data in a single column and can’t undo what they did. In such a situation, if the table has many other rows that were changed correctly since the last backup was made, you wouldn’t want to restore the whole table to fix one small mistake. Instead, you will want to restore only one row or column.

This can be done easily using the method covered in Restoring with a temporary database. That section described how to modify the dump file for the rookery database so that MySQL imports the database into a new, temporary database (rookery_backup). If you use that method, you can then use the INSERT...SELECT statement with a WHERE clause to select only the row or rows you want to restore. Let’s walk through this process.

Suppose that someone accidentally deleted one of the members (e.g., Lexi Hollar) and the email address of another member (e.g., Nina Smirnova) from the humans table in the birdwatchers database. To be able to follow along and to set the stage, make a backup of just the birdwatchers database, then delete the entry for Lexi Hollar and Nina Smirnova's email address by executing the following from the command line:

mysqldump --user=admin_backup --password --lock-tables \
--databases birdwatchers > birdwatchers.sql

mysql --user=admin_maintenance --password \
--execute "DELETE FROM birdwatchers.humans
WHERE name_first = 'Lexi'
AND name_last = 'Hollar';
UPDATE birdwatchers.humans
SET email_address=''
WHERE name_first = 'Nina'
AND name_last = 'Smirnova'"

After executing this, log into MySQL to confirm there is no member with the name Lexi Hollar and no email address for Nina Smirnova in the humans table. You should do this even though you may be logically satisfied that these changes were made. It’s good to go through the motions to build more confidence in the restoration process.
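The next step will be to rename the database inside the dump file. You can do that by hand in a text editor, as described next, or with a single sed command — a sketch, assuming the backtick-quoted name appears only where we want it changed (the -i.bak option keeps a copy of the original file):

sed -i.bak 's/`birdwatchers`/`birdwatchers_backup`/g' birdwatchers.sql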

Now let's import the birdwatchers database into a temporary database. Edit the birdwatchers.sql dump file you just created — by hand, or with the sed command just shown — and look for the SQL statements that reference the database; there should be only the CREATE DATABASE statement and the USE statement. Change the database name wherever it occurs to birdwatchers_backup, assuming that this name doesn't already exist on your server. When you've done that, save the dump file and exit the editor. From the command line, execute the following to import it:

mysql --user=admin_maintenance --password < birdwatchers.sql

When you’ve finished importing the database, log into MySQL and run SHOW DATABASES to see that it has been created. Now you’re ready to restore the data in the humans table. Execute the following from within the mysql client:

REPLACE INTO birdwatchers.humans
SELECT * FROM birdwatchers_backup.humans
WHERE name_first = 'Lexi' AND name_last = 'Hollar';

UPDATE birdwatchers.humans
SET email_address = 'bella.nina@mail.ru'
WHERE name_first = 'Nina' AND name_last = 'Smirnova';

That will restore the row for the member that was deleted, restore the email address of the other member, and have no effect on the other rows or tables in the database. You'll notice I used the REPLACE statement instead of an INSERT statement. The SELECT picks out the rows matching the WHERE clause; if MySQL finds an existing row with the same human_id, it replaces it with the matching row from the backup table. Otherwise, it inserts a new row. Either way, the row is restored with the same value in the human_id column, which means that any other tables that reference that row will still have the correct human_id. Incidentally, if you want to generate a dump file that uses REPLACE instead of INSERT statements, you can do so using the --replace option with mysqldump.
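For example, to generate such a dump of the birdwatchers database, you might run this — a sketch; the filename is arbitrary:

mysqldump --user=admin_backup --password --replace \
--databases birdwatchers > birdwatchers-replace.sql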

When you’re finished, you can use the DROP DATABASE statement to remove the birdwatchers_backup database.

This method is very useful in restoring rows and columns, especially when you want to restore data to accommodate someone without disturbing other users. It doesn’t usually take long to do and it’s simple. You can restore rows based on whatever criteria you give in the WHERE clause. This is a skill you should learn well if you want to be a good database administrator: users will herald you as their hero when you recover data without much trouble or disruption.

Recovering from a Binary Log

In the previous few sections, we looked at how to restore databases and tables. Most of those are broad methods of restoring data. Sometimes you need more precision, as in the previous section, where we restored a single row and a single column. You would use that method when you have specific rows to restore and the lost data is contained in one of your dump files. However, suppose you want to restore data that was created some time after the last backup. This may sound impossible, but it just requires care and an understanding of MySQL’s binary log. You can use the binary logs to restore data that was created after the most recent dump file was created, up to a specific point in time. This is referred to as point-in-time recovery.

To do point-in-time recoveries, you will have to enable the binary logs. You can't wait until you need them; you have to enable the binary logs before a problem occurs. To check whether binary logging is enabled, execute the following from the mysql client:

SHOW BINARY LOGS;

ERROR 1381 (HY000): You are not using binary logging

If you get the error message shown here, you will need to enable binary logging.

WARNING

Enabling the binary log does add a security vulnerability. All of the SQL statements executed on the server that modify data will be recorded in the binary log. This may include sensitive information (e.g., credit card numbers, if your server records them) and passwords. So be sure to protect the log files and the directory where they are stored, and preferably don't log changes to the mysql database — that's where passwords for user accounts are stored, so it's good not to log it. Use the --binlog-ignore-db option to omit databases from the log.

To enable binary logs, edit the configuration file for MySQL (my.cnf or my.ini, depending on your system). In the [mysqld] section, add the following lines:

log-bin
binlog-ignore-db=mysql

The log-bin option requires no equals sign or value. The second line here tells MySQL to ignore any changes to the mysql database. When you’ve added these entries to the configuration file, restart MySQL for it to take effect. Once that’s done, log into MySQL and check again whether binary logs are enabled. This time, we’ll use the SHOW MASTER STATUS statement:

SHOW MASTER STATUS;

+---------------------------+----------+--------------+------------------+
| File                      | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+---------------------------+----------+--------------+------------------+
| mysqlresources-bin.000001 |      245 |              | mysql            |
+---------------------------+----------+--------------+------------------+

Here you can see the name of the current binary log file and verify that the server is ignoring changes to the mysql database.

Now that MySQL is recording all of the SQL statements in the binary log, point-in-time recovery is possible. To be able to experiment with this, log into MySQL and insert many rows of data into a table. To make this easier, you may download two dump files from the MySQL Resources site, called birds-simple.sql and birds-simple-transactions.sql. The birds-simple.sql dump file will add the birds_simple table, with data, to rookery. The birds-simple-transactions.sql file will insert many rows of data into birds_simple, change several rows with a single SQL statement (simulating an accident), and then insert more rows. For the example that follows, we will restore everything up until the offending SQL statement, as well as all transactions after it, skipping only the bad statements. To participate in the examples, download those two dump files and execute the following from the command line in the directory where you’ve placed them:

mysql --user=admin_maintenance --password --database=rookery < birds-simple.sql

mysql --user=root --password --silent \

--execute="SELECT COUNT(*) AS '' FROM rookery.birds_simple;"

If you didn’t get an error message, the second command should return the number of rows contained in the birds_simple table. It should be about 28,892. You may have noticed that I added the --database option, setting it to rookery. When I generated the dump file, I dumped only the birds_simple table. As a result, the dump file does not contain a USE statement and the table name isn’t prefaced with rookery, so the SQL statements are not specific to any database. By adding the option at the command line like I did here, you can make MySQL execute all of the SQL statements contained in the dump file in that database.

Let’s move on to messing with the birds_simple table. Process the birds-simple-transactions.sql file, which will add and delete many rows:

mysql --user=admin_maintenance --password \

--database=rookery < birds-simple-transactions.sql

mysql --user=root --password --silent \

--execute="SELECT COUNT(*) AS '' FROM rookery.birds_simple;"

The count of the number of rows should now be about 296 fewer. The birds-simple-transactions.sql dump file contains a couple of DELETE statements that delete a lot of rows based on the WHERE clause. There are also a couple of INSERT statements that add more rows to the same table.

Now we’re ready to go through the steps to restore based on a point in time. To restore everything to a specific point in time, we need to start from the last good backup. In this case, we’ll start by restoring the birds-simple.sql dump file:

mysql --user=admin_maintenance --password \

--database=rookery < birds-simple.sql

That should have restored the birds_simple table back to where it was at the time that dump file was generated. If you want, log into MySQL and get a count of the rows in the birds_simple table. It should be back to 28,892.

The next step is to get the SQL statements that were executed on the server for the rookery database since the time of the dump file. That can be a bit of a bother to determine on a very active database. Therefore, if you intend to use mysqldump in conjunction with mysqlbinlog, you should have mysqldump flush the logs when it performs the backup. I did this when I created the birds-simple.sql dump file by including the --flush-logs option. So now we need to restore data from the beginning of the current log file to the point at which the DELETE statements were run. We can determine that point in time from the binary logs.
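For reference, here is a sketch of how a dump like birds-simple.sql might be generated so that the logs are flushed at the moment of the backup; the account and output path are just examples:

mysqldump --user=admin_maintenance --password --flush-logs \
rookery birds_simple > birds-simple.sql

Because the logs are flushed, everything recorded in the new binary log file comes after the state captured in the dump file, which is exactly the property that point-in-time recovery depends on.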

We’ll use the mysqlbinlog utility to extract all of the transactions from the current binary log and save them to a text file. We’ll then examine that text file to find the exact point in which the erroneous SQL statements were run.

Finding information in the binary log

To get this information, we need to know the name of the binary log file that contains these SQL statements, as well as where to find that log file. We’ll run the SHOW MASTER STATUS statement to get the filename. Its location will be the data directory, which we can determine by executing the SHOW VARIABLES statement. Enter both of those as you see here:

SHOW MASTER STATUS;

+---------------------------+----------+--------------+------------------+

| File | Position | Binlog_Do_DB | Binlog_Ignore_DB |

+---------------------------+----------+--------------+------------------+

| mysqlresources-bin.000002 | 7388360 | | mysql |

+---------------------------+----------+--------------+------------------+

SHOW VARIABLES WHERE Variable_Name LIKE 'datadir';

+---------------+--------------+

| Variable_name | Value |

+---------------+--------------+

| datadir | /data/mysql/ |

+---------------+--------------+

The results from the first SQL statement show the name of the current binary log file (i.e., mysqlresources-bin.000002). The name changed since we last checked our server because mysqldump flushed the logs when the dump file was made. The results of the second SQL statement show that the data directory is /data/mysql/. Check the contents of that directory to make sure that mysqlresources-bin.000002 is there.
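A quick directory listing is one way to confirm that:

ls -l /data/mysql/mysqlresources-bin.*

Assuming it is there, we’re now ready to extract the transactions we need from the binary log. Enter the following from the command line: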

mysqlbinlog --database=rookery \

/data/mysql/mysqlresources-bin.000002 > recovery-research.txt

Here you can see that I’ve included the --database option to instruct mysqlbinlog to extract only transactions for the rookery database. If we didn’t do this, we would get transactions for other databases. On this particular server, there are over two dozen databases, some of them large and very active. To make restoration simpler and avoid overwriting data in other databases, it’s best to limit the results to only what is needed.

Next, we specify the path and name of the binary file. This is followed by a redirect to have the system write the results from mysqlbinlog to a text file (recovery-research.txt).

Extracting and executing information from the binary log

When mysqlbinlog has finished creating a text file for us, we’ll open the file with a simple text editor and search for the DELETE statements. Because we know that there were only two DELETE statements that occurred together, this will be easy to fix. Here’s an excerpt from the output of the binary log showing these two transactions:

# at 1258707
#140916 13:10:24 server id 1  end_log_pos 1258778   Query   thread_id=382   exec_time=0   error_code=0
SET TIMESTAMP=1410887424/*!*/;
SET @@session.sql_mode=0/*!*/;
BEGIN
/*!*/;
# at 1258778
#140916 13:10:24 server id 1  end_log_pos 1258900   Query   thread_id=382   exec_time=0   error_code=0
use `rookery`/*!*/;
SET TIMESTAMP=1410887424/*!*/;
DELETE FROM birds_simple WHERE common_name LIKE '%Blue%'
/*!*/;
# at 1258900
#140916 13:10:24 server id 1  end_log_pos 1258927   Xid = 45248
COMMIT/*!*/;

...

# at 1284668
#140916 13:10:28 server id 1  end_log_pos 1284739   Query   thread_id=382   exec_time=0   error_code=0
SET TIMESTAMP=1410887428/*!*/;
SET @@session.sql_mode=0/*!*/;
BEGIN
/*!*/;
# at 1284739
#140916 13:10:28 server id 1  end_log_pos 1284862   Query   thread_id=382   exec_time=0   error_code=0
SET TIMESTAMP=1410887428/*!*/;
DELETE FROM birds_simple WHERE common_name LIKE '%Green%'
/*!*/;
# at 1284862
#140916 13:10:28 server id 1  end_log_pos 1284889   Xid = 45553
COMMIT/*!*/;

This may seem very confusing, but it’s not too bad when you understand how binary log entries are organized and a few things about transactions.

Binary log entries always start with two comment lines for a header — comments start with a hash sign (i.e., #). The first comment line contains the position number of the entry after the word at. This is the number we need to restore to a specific point. The second comment line of the header provides the time of the entry and other information. A binary log entry ends with /*!*/;.

A transaction is a set of SQL statements that are executed together and are generally related. Transactions are used with transactional tables (e.g., InnoDB) and not non-transactional tables (e.g., MyISAM). Any SQL statements contained within a transaction can be undone or rolled back if they’re not yet committed. The binary log uses transactions so that when data is restored, it can be restored properly. This will make more sense as we look at the components of a transaction in the excerpt shown.

Transactions always start with a BEGIN statement and generally end with a COMMIT statement, which commits the SQL statements between the two; they can’t be rolled back or otherwise undone once they are committed. Near the start of the excerpt from the binary log, you can see a BEGIN statement, followed soon after by the first DELETE statement. Therefore, the DELETE is in the midst of a transaction.

The position number for the entry containing the first DELETE is 1258778. However, we need to go back to the entry containing the BEGIN before it so that we can get the whole transaction. Let’s look at the header for that entry:

# at 1258707

#140916 13:10:24 server id 1 end_log_pos 1258778 Query thread_id=382

The position number for that entry is 1258707. The date and time of the entry is 140916 13:10:24 (i.e., 2014 September 16 at 1:10 p.m. and 24 seconds). We now know the position number and time for the transaction that contains the first DELETE. You may notice that the same line has a number following end_log_pos. That’s the position number for the next log entry (1258778), which is the entry for the DELETE. Don’t let that confuse you. Position numbers are byte positions within the log file; they don’t come from an incremental counter.

We want to restore the binary log from the beginning until the start of the transaction containing the first DELETE, which means until position 1258707. We could edit the text file that we created with mysqlbinlog (i.e., recovery-research.txt) to delete the transactions that we don’t want, and then just execute the file with the mysql client. However, there’s an easier and better way: we can have mysqlbinlog export the transactions again, but have it stop just before position 1258707. To do this, enter the following at the command line:

mysqlbinlog --database=rookery --stop-position="1258707" \

/data/mysql/mysqlresources-bin.000002 |

mysql --user=admin_maintenance --password

This will extract the same log entries, starting from the beginning of the same binary log file, but stopping at the position we gave it.

At this point, we’ve restored all of the transactions up until the DELETE statements — but not including them. Now we need to restore all of the transactions starting from the transaction immediately after the transaction containing the second DELETE statement.

Looking at the binary log excerpt for the COMMIT of the transaction containing the second DELETE statement, we see that end_log_pos has a value of 1284889. That is the position of the start of the next transaction. We want to restore from that point forward. As for where we want to stop restoring, we don’t need to specify a position number. Instead, we’ll use the --to-last-log option to indicate that we want to restore to the end of the logs. This may be further than the end of this log file, if the logs have been flushed and more log files were added. Given these two factors, execute the following:

mysqlbinlog --database=rookery --start-position="1284889" --to-last-log \

/data/mysql/mysqlresources-bin.000002 |

mysql --user=admin_maintenance --password

This will restore all of the remaining log entries, omitting the DELETE statements. This method is very precise in that it uses exact positions in the binary log for specific transactions. You may also perform a point-in-time recovery using starting and ending times. To do that, use the --start-datetime and --stop-datetime options with mysqlbinlog. Looking back at the binary log excerpts, you could do the following to accomplish the same point-in-time recovery we just made:

mysqlbinlog --database=rookery --stop-datetime="140916 13:10:24" \

/data/mysql/mysqlresources-bin.000002 |

mysql --user=admin_maintenance --password

mysqlbinlog --database=rookery --start-datetime="140916 13:10:29" --to-last-log \

/data/mysql/mysqlresources-bin.000002 |

mysql --user=admin_maintenance --password

Our first invocation of mysqlbinlog gives it the date and time we noted earlier for the stop point just before the first DELETE statement. Our second invocation specifies one second past the time of the transaction for the second DELETE statement as the start point for restoring data. This will work just fine, but using position numbers is more precise, because plenty can happen in a second.

NOTE

A similar method of making backups with the binary logs is to use MySQL replication. With replication, you would have another server, a slave that has been continuously reading the binary log of the main or master server. The slave can use the binary log entries to maintain an exact duplicate of the databases on the master. When you want to make a backup, you need only stop the slave from replicating the master and make a backup of the databases on the slave. When you’re finished, begin replicating again, and within seconds the slave is current again. This topic is beyond the scope of this book. However, my book MySQL Replication: An Administrator’s Guide to Replication in MySQL (A Silent Killdeer Publishing, 2010) explains replication and how to resolve problems with MySQL.

Developing a Backup Policy

Knowing how to make backups of databases and how to restore them is fine. But these skills will be of no use unless you put a system in place to make backups regularly and effectively. The value of backups is greatly diminished if you can’t restore them quickly, or if restoring them clobbers databases or causes further loss of data. To be effective as a database administrator, you should develop a backup policy and adhere to it.

A backup policy should be in writing, even if it’s only for your own use, and it should cover a variety of aspects of making backups and being able to restore them. You’ll have to develop your own policy according to your situation, based on the value of the databases, the sensitivity of the information, and other factors. For instance, if you have a database for your personal website (one from which you earn nothing, that no one else depends on, and that you rarely change), your policy might be to make a complete backup once a week and keep backups for at least a month. However, if you’re the database administrator for a large site, with millions of rows of data in many tables, thousands of people using it every day, and credit card numbers that your employer stores from transactions amounting to significant revenues, your backup policy will be much more elaborate. You will address security, the effect that making a backup has on user traffic, and how quickly data can be restored when needed. For our purposes, we’ll develop a backup policy that falls somewhere between these two extremes, to give you a sense of what you should consider.

The first step is to take inventory of the databases and tables for which you’re responsible. Let’s use the two databases that we have been using for the examples throughout this book. However, so that the scenario is more meaningful, let’s suppose that a couple of years have passed and the bird-watchers website has attracted many more members. Based on that, I’ve arbitrarily increased the row counts for most of the tables, and eliminated temporary tables. Table 14-1 lists the tables, grouped by database and sorted alphabetically, along with an assessment of each table.

Table 14-1. Assessment of databases for backup policy

Table                           Row Count   Changing   Active   Sensitive

rookery
  bird_families                       229
  bird_images                           8
  bird_orders                          32
  birds                            28,892
  birds_bill_shapes                     9
  birds_body_shapes                    14
  birds_details                         0
  birds_habitats                       12
  birds_wing_shapes                     6
  habitat_codes                         9

birdwatchers
  bird_identification_tests         3,201
  bird_sightings                   12,435
  birder_families                      96
  birding_events                       42
  birding_events_children              34
  humans                            1,822
  prize_winners                        42
  survey_answers                      736
  survey_questions                     28
  surveys                              16

This list of tables for the two databases indicates a few factors that we’ve decided are important to the policy we’re developing: the number of rows in each table; whether a table changes often (i.e., its data changes or its schema is altered occasionally); whether a table is generally active or its data is accessed often; and whether it contains sensitive information. When you develop a backup policy, you may be concerned with other factors. For our example here, though, these concerns will dictate how and when we will back up these two databases.

We won’t bother making daily backups of the tables that rarely change. We will make backups of the active tables each day, running mysqldump when they are less in use. We will make backups of tables that contain sensitive information (e.g., personal information on members and their children) with a special user account and store them in a more secure directory. We will also make a full backup once a week and store those dump files in the same secure directory for the same reason.
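As a sketch of what one of these backups might look like, the following would make the daily backup of the tables containing personal information, writing the dump file to a protected directory. The admin_backup account, the path, and the date in the filename are examples only; the table list covers the sensitive tables noted above:

mysqldump --user=admin_backup --password birdwatchers \
humans birder_families birding_events_children \
> /data/backups/secure/birdwatchers-people-2015-jan-15.sql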

With all of these concerns in mind, we can begin to formulate a schedule for making backups and where they should be located. Table 14-2 groups backups based on each database and then groups tables based on security and usage concerns. For each backup, there is a list of tables, if not all tables. The columns to the right in the table show whether a backup should be made daily or weekly, as well as which days of the week and at what time of the day. The table also indicates whether the backup should be made to a secure directory and whether a copy should be kept off site, in addition to on site.

Table 14-2. Backup schedule

Backup                                     Frequency   Days    Time    Secure   Off-Site

rookery - full back-up                     Weekly      First    8:00   No       Yes
  all tables
  (rookery-yyyy-mmm-dd.sql)

rookery - bird classification              Daily       Every    9:00   No       No
  birds, bird_families, bird_orders
  (rookery-class-yyyy-mmm-dd.sql)

birdwatchers - full back-up                Weekly      First    8:30   Yes      Yes
  all tables
  (birdwatchers-yyyy-mmm-dd.sql)

birdwatchers - people                      Daily       Every    9:30   Yes      No
  humans, birder_families, birding_events_children
  (birdwatchers-people-yyyy-mmm-dd.sql)

birdwatchers - activities                  Daily       Every   10:00   No       No
  bird_sightings, birding_events, bird_identification_tests,
  prize_winners, surveys, survey_answers, survey_questions
  (birdwatchers-activities-yyyy-mmm-dd.sql)

The first day of the week will be Monday. All times are in G.M.T. Backups containing sensitive information will be made by a special administrator and stored in a secure directory. Some backup files are also stored off site.

Notice that the plan here is to do a full backup of each of the two databases once a week. You might want to put these backups into one dump file, but I prefer to keep them separate. That makes it easier to restore just one of them later.
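Here’s a sketch of those two weekly backups, kept as separate dump files; the admin_backup account, paths, and dates are examples, while the filenames follow the naming pattern in Table 14-2:

mysqldump --user=admin_backup --password --databases rookery \
> /data/backups/rookery-2015-jan-12.sql

mysqldump --user=admin_backup --password --databases birdwatchers \
> /data/backups/secure/birdwatchers-2015-jan-12.sql

The --databases option makes mysqldump include CREATE DATABASE and USE statements in each dump file, so each database can be restored on its own.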

The plan also calls for daily backups of the tables that change often, either in content or in structure. Because the other tables change rarely, there’s no need to make daily backups of them. However, because those tables are so small, it’s not much of a problem to back them up each day as well. For some people, a full backup every day is easiest and preferred. But if you have very large databases, along with security and performance concerns, full backups every day might not be the best choice. For this example, I want you to see alternative ways in which you might organize a backup schedule.

For the fictitious bird-watchers website, our database contains many members in Europe and the United States. Because bird-watching is a hobby for most people, most of our traffic will be in the evenings. The times here are all Greenwich Mean Time and in the morning. When it’s 8:00 a.m. in London, the time of our first backup, it will be midnight in San Francisco. Put another way, when it’s late at night for our members who are the furthest west, with the exception of a few we might have in the Pacific, we begin making our backups. This should be a slow traffic time for our databases.

We will keep all backups on site, on two separate servers. We’ll use cron to copy the dump files automatically to the second server across our internal network. Additionally, we will copy the weekly full backups to a cloud storage service such as Dropbox or Google Drive, in case a fire or some other catastrophe destroys both servers in the same building.
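Here is a sketch of what the crontab entries for this might look like; the script names, paths, and the host name backup2 are all hypothetical:

# weekly full backups on Monday mornings
0 8 * * 1 /usr/local/bin/backup-rookery-full.sh
30 8 * * 1 /usr/local/bin/backup-birdwatchers-full.sh

# copy the day's dump files to the second server
0 11 * * * scp /data/backups/*.sql backup2:/data/backups/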

Now that we have a plan for what we will back up and when, we need a plan for checking those backups to make sure they are being performed correctly (see Table 14-3). This will include not only looking to see whether the files are there, but also trying to restore them. This has the added advantage of giving us practice restoring databases. As mentioned several times already, when there is an urgent situation in which you need to restore data, you need to be ready and know what to do. It’s difficult to become proficient in restoring data during a crisis.

Table 14-3. Backup verification schedule

Back-up                          Verify   Restoration Tests                           Retention
                                          Database   Tables         Rows

rookery - full back-up           Weekly   Monthly    N/A            Semi-monthly      Two months
rookery - bird classification    Weekly   N/A        Semi-monthly   Semi-monthly      One month
birdwatchers - full back-up      Weekly   Monthly    N/A            Semi-monthly      Two months
birdwatchers - people            Weekly   N/A        Semi-monthly   Semi-monthly      One month
birdwatchers - activities        Weekly   N/A        Semi-monthly   Semi-monthly      One month

Backups will be verified on a regular basis. For testing and practicing purposes, databases, tables, and rows will be restored regularly in a test environment.

Let’s go through the verification plan in this schedule. Once a week, we will inspect all of the dump files made for that week to ensure that the backups are being made and contain the tables that we want. To carry out this task, you could look to see whether the files were created and check the file size of each. You could also open each with a text editor to see whether it looks correct. You might also use the grep command to extract the table names used in the CREATE TABLE statements within a dump file. If you want to use grep, you could execute something like the following to get a list of tables the rookery.sql dump file would create if executed:

grep 'CREATE TABLE' rookery.sql | grep -oP '(?<=CREATE\ TABLE\ \`).*(?=\`)'

bird_families

bird_images

bird_orders

birdlife_list

birds

birds_bill_shapes

birds_body_shapes

birds_details

birds_habitats

birds_wing_shapes

conservation_status

habitat_codes

The next three columns of Table 14-3 are related to testing and practicing restoring data. Once a month, we will try to restore the databases from the full backups. You could test this by restoring each database to a test server. Then you can execute queries on the live server and the test server and compare the results. Just keep in mind that the data will be a little different on the live server.
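A simple comparison is to run the same count on both servers and check that the totals are close; the host names below are hypothetical:

mysql --host=live.example.com --user=admin_maintenance --password --silent \
--execute="SELECT COUNT(*) FROM rookery.birds;"

mysql --host=test.example.com --user=admin_maintenance --password --silent \
--execute="SELECT COUNT(*) FROM rookery.birds;"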

The other backup dump files are based on tables. These tables change often or are large and critical to our bird-watchers site. So we’ll test restoring tables from these dump files twice a month. For all of the backups, we’ll try twice a month to restore individual rows. This is the type of restoration we will be most likely to do. It’s important that we know how to restore very specific data from all of our dump files. With this much practice, restoring a minor loss of data when needed won’t be much trouble for us.

The last column in the table has to do with retention: how long we will retain the dump files. Our plan is to keep the dump files for the full backups for two months, and the ones for specific tables for only one month. You might not want to keep them that long, or maybe you will want to keep them longer. Some people copy dump files to CDs for each month and then store them for years.
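Pruning old dump files is easy to automate. This sketch, using GNU find, deletes table-level dump files that are more than a month old; the path and filename pattern are examples:

find /data/backups -name 'birdwatchers-people-*.sql' -mtime +31 -delete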

Tables 14-2 and 14-3 basically represent our backup policy. One table lists what we will back up, when, and where. The other lists when we will verify that the backups are performed successfully, when we will perform restoration drills, and how long we will retain the backups. There are other factors, and much more detail, that you could put into a backup policy. However, this should give you a sense of how you might develop one.

Summary

By now you probably understand that making backups is important. It can save you from plenty of problems and frustrations. Being skilled in restoring backups can make life as a database administrator easier, and help you to turn major problems into minor ones that can be resolved easily. Developing and adhering to a good backup policy ensures that all of your efforts and skills are brought together in an effective manner.

As mentioned near the start of this chapter, there are quite a few utilities that you can use to make backups of your data, as well as other methods (e.g., replication). Using mysqldump is the easiest and in some ways the best. As an administrator, you should know how to use it well and how to restore dump files. To that end, complete the exercises in the next section to get some practice.

Exercises

A few exercises follow to get you more familiar with making backups with mysqldump, as well as restoring them. You should try to complete all of the exercises. However, there are a couple that might be too advanced. If you’re having difficulty working through them, try again later when you are more experienced.

1. So that you won’t cause yourself problems with the other exercises here, make a couple of backups for the first exercise. Using the mysqldump utility, make a backup of all of the databases. Then make a backup of both the rookery and the birdwatchers databases in one dump file. Don’t use these two dump files for the remaining exercises. Keep them in case something goes wrong and you need to restore something.

2. Refer to the backup schedule in Table 14-2. It contains a list of backups to be made regularly. There are two full backups and three backups based on tables. Make all five backups in this schedule using mysqldump and name the dump files in accordance with the naming pattern shown in the table for each.

3. Write five simple shell scripts, each to make a backup using mysqldump for each of the backups listed in Table 14-2. Make it so that the names of the dump files that it creates conform automatically to the naming pattern based on the current date, as shown in Table 14-2. There is a script that can do this in Creating Backup Scripts. You can copy this script and modify it, or you can write your own using a scripting or programming language with which you’re comfortable.
After you write the five scripts, execute them and see whether they create and name the dump files correctly. If it won’t cause problems to your server, add lines to crontab or another scheduling utility to have the five scripts execute automatically, but at a time not long afterwards. Wait and see if they execute as scheduled. You can remove the entries from crontab after you’ve tried this.

4. Modify the scripts that you created in the previous exercise to have them remove older dump files: ones that are older than the retention period set for each table in Table 14-3. Make copies of the first set of dump files you created with these scripts, but change the date part of the filenames: make copies dated one and two days within the retention period, and copies dated one and two days outside of it.
Run your scripts again to see whether they delete the dump files with the older names. You may have to try this a few times to get it right, so that the scripts delete the right dump files.

5. Log into MySQL and use the DROP TABLE statement to delete the birds_bill_shapes and birds_body_shapes tables.
Next, use the rookery.sql dump file you made in the second exercise to restore these tables. When you finish, log into MySQL to verify that they were restored and contain the data.

6. Log into MySQL and use the UPDATE statement to change the common_name in the birds table to NULL for any rows where the common_name contains the word Parrot. There should be about 185 rows.
Make a copy of the rookery.sql dump file. Name it rookery_temp.sql. Edit this new dump file to change the name of the database to rookery_temp. This method was described in Restoring Only Rows or Columns.
Next, use the rookery_temp.sql dump file to create the rookery_temp database on your server. When that’s done, restore the Parrot common names in rookery.birds from rookery_temp.birds using the UPDATE statement.

7. If you haven’t already, enable binary logging on your server as described in Recovering from a Binary Log. Remember to restart the server once you’ve set it to be enabled. Use mysqldump to make a backup of just the birds table in the rookery database. Be sure to include the --flush-logs option.
After you’ve enabled binary logging and made the backup of the table, log into MySQL and execute a DELETE statement to delete any birds whose common name contains the word Gray. Then insert a few rows of data into the birds table. You can just make up values for the common_name column and leave the other columns blank.
Now use the dump file to restore the birds table. Then, using the point-in-time recovery method described in Recovering from a Binary Log, use mysqlbinlog to restore all of the transactions in the binary logs up until the DELETE statement that deleted the gray birds. This will require you to find the position number in the binary log at which the DELETE statement was executed.
Next, using the position number for the transaction immediately after the DELETE statement in the binary logs, restore the transactions from that point until the end of the binary logs.
Log into MySQL to see whether you were successful in restoring the data. When you’re done, remember to disable binary logging if you don’t want to continue logging transactions.