Best Practices - MySQL Troubleshooting (2012)

MySQL Troubleshooting (2012)

Chapter 7. Best Practices

This chapter summarizes some best practices that can help you troubleshoot MySQL effectively and safely. They aren’t troubleshooting methods or tools in themselves, but they can dramatically affect troubleshooting. These practices were discussed throughout the book but are combined here to emphasize their value.

Backups

Many of the procedures in this book can make changes to databases, which is why I’ve encouraged you to run tests in a sandbox. You can create your own disaster, however, by making such changes in a production database. Backups can save you in such cases.

A backup is a state to which you can return at any step. If something is damaged during tests or an application failure, you can restore all but the most recent changes to data. Backups are also useful because you can load data from them into a sandbox for testing. If you make a backup at a convenient time, you don’t have to wait until load is low on the main server and you can take a new snapshot of a running instance. Without interrupting normal operation, you can just copy the backup to the sandbox and start testing.

Of course, a backup is useful only if it’s recent. A backup made a month ago of an application with extensive write operations won’t help much. So it’s a good idea to make periodical full backups and frequent incremental backups. It’s also good to have the binary log turned on so that you have all the changes made at the time a problem occurred.

The books High Performance MySQL by Baron Schwartz et al. and MySQL High Availability by Charles Bell et al. (both published by O’Reilly) describe how to make backups, including descriptions of when, why, and how, and the available tools to do the backups. The following subsections just give you some considerations to keep in mind when planning backups.

Planning Backups

When planning backups, think about how to do it so that you’ll be able to do a full restore at any time. For instance, you shouldn’t rely on just a replication slave as your backup. The slave can fall far behind its master, and thus have outdated data. The slave can also contain different data from the master, particularly with statement-based replication. We described reasons why this happens at Chapter 5. Therefore, don’t rely on a slave as your only backup solution.

I prefer to do weekly full backups along with daily incremental backups and to keep all binary logs. Of course, you can change the schedule based on the actual write load. I just don’t recommend leaving a really large interval between backups, because in such a case you would be at risk of losing a lot of data due to a hardware failure. Plan reasonably.

Whenever you restore data, restore from the latest full backup first, then apply the incremental backups made after the full backup in order, if they exist, and finally load any remaining changes from the binary logs.

Types of Backups

This section discusses full and incremental backups. The third element of backing up, saving binary logs, is simply this: don’t delete the logs that were created after the most recent backup, and copy them to a safe place periodically if the time lag between backups is large.

Backups can be grouped along several different dimensions:

By format

Logical

Saves a dump of structure and data. Although this kind of backup is slow, it can be very useful because its files can be read by humans and edited manually.

Physical

Saves the binary files, which is usually fast. This kind of backup is very important if you cannot repeat a problem because the table is somehow corrupted. In such cases it can make sense to make binary copy of a single table and move it to the test server.

By interaction with the MySQL server

Online

Taken when the MySQL server is running

Offline

Taken when the MySQL server is stopped

By interaction with the MySQL server objects

Cold

Taken when no operation is allowed for the MySQL server. The server should be either stopped or blocked from modifying its own files. The advantage of such backups is that they are extremely fast.

Warm

Taken when the MySQL server is running, and prohibiting only a few operations on the objects being backed up. It is not always possible to have a consistent backup if parallel threads use database objects, so all backup methods use some kind of intelligence to keep objects being backed up secure. This method involves write locking only the objects currently being backed up, while allowing other connections to modify other objects. Read access is usually also allowed during this kind of backup.

Hot

Taken when the MySQL server is running and all operations on backed-up objects are allowed. This is the fastest method among online backups.

By content

Full

Backs up all objects

Incremental

Backs up changes made after a particular time, usually the time of a previous backup

Partial

Copies only specific objects, e.g., a few tables from a database

Tools

I won’t describe all the backup tools available for MySQL, but instead describe just a few that illustrate the types of backups shown in the previous section. You are free to use your favorite, regardless of whether it’s on this list.

mysqldump

Comes with MySQL distributions and makes logical backups.

MySQL Enterprise Backup (MEB)

A separate product available for Enterprise customers. It can create hot backups of InnoDB tables, warm backups of other storage engines, and cold backups.

Percona XtraBackup

An open source product with functionality similar to MEB. It can create hot backups of XtraDB and InnoDB tables and warm backups of other storage engines.

cp

The basic Unix shell command that copies files. You can use it to create cold offline backups.

Filesystem snapshot tools, such as LVM

Creates a snapshot of file system. Should be used when the MySQL server is either stopped or prevented from writing to its own files.

Table 7-1 matches various types of backups with the tools that provide them.

Table 7-1. Backup types supported by the tools discussed in this section

Backup type/tool

mysqldump

MEB

XtraBackup

cp

LVM

LOGICAL

YES

NO

NO

NO

NO

PHYSICAL

NO

YES

YES

YES

YES

ONLINE

YES

YES

YES

NO

NO

OFFLINE

NO

YES

NO

YES

YES

COLD

NO

YES

NO

YES

YES

WARM

YES

YES

YES

NO

NO

HOT

NO

YES

YES

NO

NO

FULL

YES

YES

YES

YES

YES

INCREMENTAL

NO

YES

YES

NO

NO

PARTIAL

YES

YES

YES

YES

NO

Using this table, you can determine which tool is most suited to your needs. If you decide to run a solution that does not include incremental backups, do incremental backups through binary logs. For more detail and best practices related to backups, consult the books mentioned earlier inBackups.

Gathering the Information You Need

Information is the key to successful troubleshooting. In addition to using it for your own troubleshooting, it is critical when you open a support case. So don’t ignore the sources and instruments that collect reports about problems, such as the error log.

That said, you can’t log everything all the time. Logging puts a burden on your server. So you need to find a balance between the information that needs to be saved all the time and the information you can ignore until a real problem happens.

I recommend you always turn on instruments that do not need a lot of resources, such as the error logfile and operating system logs. Logging options that require light resources, such as MEM without the query analyzer, can be running constantly too. It is not thrifty to try to shave a little load off your server by turning off these instruments and giving up the help they will offer during a critical problem.

With regard to reporting tools that can decrease performance noticeably, be prepared to use them, but turn them off until a problem fires up. Use of these instruments can be implemented as an option in the application.

What Does It All Mean?

Even gigabytes of information are useless if you don’t understand it. So read the error messages, suggestions from MEM, and so on. If you don’t understand them, refer to the MySQL Reference Manual, books, blogs, forums, or other sources of information.

A search engine is a good companion. Just typing an error message into a search field usually turns up lots of links containing information about how other users solved a similar problem. In most cases, you are not alone.

Of course, you can always ask questions on public forums and IRC. And finally, you can buy support.

Testing

After you develop a hypothesis about what is going on and how to solve your problem, test the hypothesis and consider your results.

In my job, I meet users who are afraid of testing. There are two main reasons that people reject it: overconfidence (the user thinks that a correct guess does not require testing) and diffidence (the user is afraid of breaking something else).

But even very experienced users make mistakes, so it is risky to just rely on a guess. Always check the result of any change you make. What is quickly noticed can be quickly fixed.

Don’t be shy! Just consider this: if you need to troubleshoot something, you are already experiencing a problem. If one or another test makes the problem worse, at least the test spares you from making a major change that would take even longer to fix. A mistake made during a test can be rolled back in a few seconds.

Further, discovering that a guess is wrong is valuable because you narrow the search area. Fewer options are left, so there’s more chance that your next test will prove correct.

If you don’t want to take any risk of harming your production setup, test in a sandbox. I recommend this especially when a test can change data, but it’s also useful when you simply need to modify some configuration option. Just create a repeatable test case and run it in the sandbox.

And don’t be lazy! Laziness can be a good thing when you are finding the best and shortest way to solve a problem, but it can play a bad joke on you when testing.

Being too lazy to copy the original tables to a sandbox can lead to many ineffective attempts to reproduce your problem when it cannot be reproduced with the inadequate test data. Even worse, you may end up implementing the wrong solution. So if the problem is not reproducible with a small test data set, just copy the full tables to the sandbox and experiment on them there.

Prevention

The best approach to troubleshooting is to prevent the situation in the first place. The only way to eliminate all possible problems is to completely block access to the MySQL server, which means it wouldn’t do any useful work. So we can speak only about compromises.

Privileges

One important aspect of prevention is privileges. Although many test cases in this book have used the root user, this is acceptable only in test environments where you can’t harm anything. When running on a production server, each person should have as few privileges as possible. Ideally, a user would have only the privileges for those objects she is going to use. For example, if a web application only selects data from a few databases, don’t give write access and read access to the mysql database to the web application’s user account. If you need to do a maintenance task on that database, create a separate user for it.

Such precautions will protect you from bad interventions and can even mitigate against SQL injection attacks. A successful attack will be limited in the damage it can cause.

Environment

Another important aspect is the environment for running the server: the MySQL server options and the external environment in which it runs.

When adjusting options, analyze their effects. Start with only the options whose effects you are sure of, then add others one by one and analyze how they change things. In such a scenario, if something goes wrong, you will be able to restore an acceptable work environment in a few seconds.

When planning a MySQL installation, analyze how the environment will affect it. Don’t expect the MySQL server to run fast without issues when the hardware and operating system are overloaded by other processes or suffer from a corrupted disk. Plan correspondingly, and check the hardware at the first sign of problems.

When planning replication or some other complicated environment, try to diagnose in advance how the setup can affect database activities. A simple example is the differences in query behavior when row versus statement binary logging is used. Always analyze and plan correspondingly.

Think About It!

I want to finish this book by stressing the importance of reasoning. The ability to find the causes of problems comes not only from practice, but also from the skill of analyzing problems.

If you encounter a problem, I recommend you think it over thoroughly before choosing the troubleshooting techniques you believe are best suited to that particular issue.

Choosing a good action path at the first step will save you much time. And a thorough diagnosis can prevent not only the issue you noticed right away, but also other future problems stemming from the same cause.