MySQL in the Cloud - Effective MySQL: Backup and Recovery (2012)

Effective MySQL: Backup and Recovery (2012)

Chapter 9. MySQL in the Cloud

“Everything fails all the time.”

Dr Werner Vogels, CTO of Amazon (http:/allthingsdistributed.com/)

The emergence of the cloud in recent years has seen a number of MySQL specific and MySQL like database solutions. These offerings are in addition to running a stock MySQL implementation in a virtualized environment. Amazon, HP, and Google currently provide MySQL cloud deployments that are based on using the core MySQL server. There are solutions including ScaleDB that use new MySQL storage engines to provide many cloud scalable features. Xeround and Clustrix are custom solutions that use the MySQL protocol for communication and ease of application integration; however, they have an entirely different underlying product solution. In this chapter we will be discussing the options that most closely represent a standard MySQL environment:

• Amazon Relational Database Service (RDS)

• Google Cloud SQL

• HP Cloud Database as a Service (DBaaS)

image

Amazon Relational Database Service (RDS)

AWS RDS is the most mature cloud based MySQL product. RDS provides a managed MySQL service offering for versions 5.1 and 5.5, including several point releases. This includes a web based console and command line APIs for creating and maintaining RDS instances. RDS supports two HA options, a master and read replicas topology via MySQL asynchronous replication and a Multi Availability Zone (AZ) deployment that provides a proprietary synchronous replication solution. The Multi AZ instance supports failover capability via internal management of the RDS instances; however, the second Multi AZ instance is not accessible for read load like a normal replication topology.

Some of the benefits of the managed RDS service include:

• Ability to enable automatic minor updates of MySQL software (using --auto-minor-version-upgrade=true option)

• Ability to upgrade or downgrade the size of the RDS instance without additional work (using rds-modify-db-instance --db-instance-class=<newsize> command)

The RDS implementation of MySQL has some restrictions including:

• No direct access to the MySQL configuration file (i.e. my. cnf). Access to change parameters is via the rds-modify-db-parameter-group command.

• Lack of SUPER privilege for any user.

• No access to read binary logs.

• No access to the MySQL error log.

See the blog post http://effectivemysql.com/article/setting-up-aws-rds/ for the necessary steps to set up and create a new RDS instance.

Example Database Creation

Using the example database from Chapter 8, the following data was added to RDS by restoring a mysqldump backup.

image

MySQL Versions

These current versions are presently available with RDS.

image

Backup Options

RDS enables the use of standard client tools to connect to MySQL. The mysql and mysqldump commands can be used when connecting to a remote host. MySQL Enterprise Backup and XtraBackup cannot be used with RDS.

mysqldump

As described in Chapter 2 and Chapter 8, the standard options for using mysqldump are possible providing you connect to the appropriate remote host. For example:

image

image

As you can see, an error occurred. This is due to a change in mysqldump between MySQL 5.1 and MySQL 5.5. In this example the MySQL server running the client command mysqldump is an older version than the RDS database server version.

CAUTION When using mysql dump to connect to a remote database host for backups, the mysqldump version must be at least the same version as the remote MySQL database.

Using a correct version produces:

image

This execution time includes the network overhead between an EC2 instance and an RDS instance. This is comparable with a standard EC2 instance running a local MySQL instance.

A more advanced option for excluding the mysql meta-schema (which will become relevant in the restore section) can be achieved by:

image

The SQL statement in this solution has a limitation on the length of the string output from the GROUP_CONCAT function. For MySQL instances with a large number of database Schemas this query will fail. This length restriction can be increased with the group_concat_max_lenconfiguration variable. Giuseppe Maxia provides an improvement with two more possible options; the first, however, is limited to the maximum length of an OS command. See http://datacharmer.blogspot.com/2012/04/few-hacks-to-simulate-mysqldump-ignore.html for more details. Within a day of posting a related blog post about this syntax, Mark Leith provided a code patch to the mysqldump command that implements an actual --ignore-database option. The beauty of the MySQL community and open source software is seen in these immediate responses. See more information at http://www.markleith.co.uk/?p=768 and MySQL bug #3228 at http://bugs.mysql.com/bug.php?id=3228.

TIP There is currently no --ignore-database option with mysqldump. The previously mentioned command provides one of several suitable alternatives.

Database Snapshot

RDS provides a native snapshot option to produce a consistent backup of MySQL data. This occurs while the database is online. It is unclear how locking occurs with MyISAM tables to ensure database consistency. Using the RDS CLI tools, you can produce a snapshot with:

image

NOTE Unlike most other commands that can be timed, this is an asynchronous process. You must poll the results via the rds-describe-db-snapshots command to determine when the backup is completed. There is no estimated percentage completion like an AWS EBSfilesystem snapshot. There is no execution time or size information available following the backup.

You can monitor the state of the snapshot creation with the following command:

image

image

Or interactively with the command:

image

This snapshot took approximately 20 minutes to complete for a 20GB RDS instance.

An RDS snapshot can also be scheduled with the RDS management tools using the rds-modify-db-instance command with the --preferred-backup-window and --backup-retention-period configuration options.

MySQL Binary Logs

RDS does not provide access to the binary logs. This means it is not possible to perform data analysis or auditing via the binary log. RDS does provide a capability to restore to a given point in time via the rds-restore-db-instance-to-point-in-time command within the last five minutes.

image

This information is not available with the standard rds-describe-db-instances command. The --show-long option is required.

You can use the MySQL status variables to determine that RDS uses binary logs and flushes these to disk every five minutes, and this infers a redundancy of binary logs at a filesystem level.

Recovery Options

The following steps describe how to restore an RDS backup.

SQL Dump

You can use the mysql command line tool to restore a mysqldump backup.

image

A verification of the data shows:

image

CAUTION Do not be fooled by a significant and appropriate restore time and a database that appears to contain a large amount of data (e.g., 3.6GB for the musicbrainz schema).

However, you should always check for any errors and perform a more in-depth validation to ensure your database has been completely restored. The average time to perform a restore, the total database size, and additional checks and balances are necessary in a production system. A review of the output file shows:

image

This error condition is due to the RDS implementation restricting user permissions. This type of error is unavoidable for a full mysqldump of all Schemas. A normal restoration on a more traditional MySQL database system can overcome this with applicable user privileges. In order to address this limitation with RDS, you need to use the -f option during the restore; however, this introduces other problems. This type of situation is not limited to RDS instances with the lack of non-adjustable user privileges, which some may state as an appropriate security feature.

image

image

With no actual error state and numerous error messages it is difficult to verify a successful restore. In addition, several errors occur when importing the MySQL sakila database due to stored procedures. These errors can be reproduced with:

image

An import of the second mysqldump file excluding the mysql meta-schema produced:

image

As described in Chapter 5, the modification of various configuration variables can improve the performance of a restore, for example, innodb_flush_log_at_trx_commit. You can also disable MySQL binary logging by setting the backup-retention-period value to 0. This, however, requires a MySQL restart to disable, and then to re-enable when completed.

RDS Snapshot

To perform a restoration for a given snapshot that can be identified by the rds-describe-db-snapshots command, use the following syntax. This will create a new RDS instance.

image

This restoration took approximately five minutes.

A restoration from a given snapshot cannot use the binary logs to perform a point in time recovery. In order to use this, you must use the rds-restore-db-instance-to-point-in-time command.

Point in Time Recovery

For the purposes of verifying a point in time recovery, two new Schemas were created at the given time interval of the schema name in the following example:

image

A point in time recovery was specified before the creation of the second schema with the --restore-time option. This value does not have to correspond with a specific five minute interval value. This can be anytime after an available snapshot and before or equal to the last restorable time.

image

This restore took approximately 11 minutes.

A verification of the restored Schemas shows the second schema does not exist as expected. In a production system a more detailed verification is necessary. This example is only used to highlight the syntax for the commands.

image

NOTE Be sure to remove RDS instances no longer used with the rds-delete-db-instance.

More information about RDS can be found at http://aws.amazon.com/ rds/.

image

Google Cloud SQL

Google provides a MySQL version 5.5 cloud offering with a few additional features and a few unsupported features. At publication of this book, this is available in a limited beta program. It is recommended that InnoDB is used for tables; however, it is possible to use MyISAM.

This offering automatically replicates data synchronously to multiple geographic regions to provide high data availability. There is currently no asynchronous option via traditional replication. With this synchronous functionality, Cloud SQL provides automatic failover with no data loss. Software upgrades and database management are automatically managed by the service.

Google Cloud SQL is tightly integrated with Google App Engine (GAE) using Java and Python. A web based SQL interface and custom CLI interface are available for access to run SQL statements. Any product that can communicate with a JDBC connection can connect to Google Cloud SQL.

See the blog post http://effectiveMySQL.com/article/setting-up-google-cloud-sql/ for the necessary steps to set up and create a new Google Cloud SQL instance.

Example Database Creation

Using the example database in Chapter 8, an uncompressed or compressed (via gzip) mysqldump file can be used to populate a Cloud SQL environment. In order to import any data you must first store the file on Google Cloud Storage. For example:

image

The only verification option is file size.

You can then use the Google APIs web console to import the given file. There is no CLI interface to initiate this process or determine the total execution time.

The web interface provides a log of the process after completion.

image

The import process took two hours and 22 minutes.

If there is a problem, the log will report an error but will not provide any additional information. For example:

image

CAUTION You should not import the mysql meta-schema, as this can cause problems including permissions for the root MySQL user and meta-table structure issues. Refer to the RDS section for a mysqldump option to exclude this schema.

The first step of the verification process can be confirmed with the Google provided SQL CLI tool:

image

Backup Options

Only one option exists to perform an independent backup from Google Cloud SQL.

mysqldump

A mysqldump export is possible. This can only be initiated from the Google APIs web console. You are not able to specify per-schema or per-table information or any other additional options at this time. The output file, compressed or uncompressed, is available in Google Cloud Storage when completed. The web console provides a log of the process start and completion.

image

This backup took 42 minutes.

You can verify and obtain the backup file for off site management with:

image

A scheduled backup feature is also available for regular daily backups.

Restore Options

As described in the example database creation, the restore of a mysqldump file generated by the export or scheduled process is possible. Google Cloud SQL does not currently provide a point in time recovery capability. Due to the synchronous nature of this product, high availability and failover features are provided by default, reducing the need for a database restore. More selective disaster recoverability options or data analysis of SQL statements in the binary log is not currently possible.

For obtaining a copy of data at a given time, the recommendation is to use the export functionality. It is then possible to import this for verification and use if necessary.

At the time of this book’s publication, Google Cloud SQL was in limited beta. More information can be found at https://developers.google.com/cloud-sql/.

image

HP Cloud Database as a Service (DBaaS)

The HP Public Cloud (http://hpcloud.com) provides a number of services based on the popular Open Stack cloud software (http://openstack.org/). These services include compute, storage, content delivery network (CDN), identity management, and a managed MySQL relational database offering. This MySQL offering is a DBaaS based on the Red Dwarf project from Open Stack (http://wiki.openstack.org/DatabaseAsAService).

Access to creating, deleting, and restarting instances, and creating and using snapshots is via a RESTful HTTP API providing JSON formatted response. The standard MySQL client tools, including mysql and mysqldump, can be used for access to the MySQL database.

The HP DBaaS offering is fully integrated with the Openstack Keystone Identity Service. To access the DBaaS API, you first need to obtain a token to be used for the X-Auth-Token HTTP header. In addition the X-Auth-Project-Id (generally your e-mail, or tenant name) is required for all requests. The credentials required to obtain an auth-token and the project-id can be found from the HP Management Console.

As of this writing, the HP Cloud DBaaS (currently in private beta) offers the following features. These are subject to change in the future:

• Create and terminate database instances

• Reset password

• Create and delete snapshots

• Create instance from an existing snapshot

MySQL Versions

DBaaS currently uses Percona Server version 5.5, available from http://www.percona.com/software/percona-server/. This is a fork of MySQL version 5.5 providing XtraDB (a modified version of InnoDB), better instrumentation, performance improvements, and a shorter deployment life cycle for new features.

Example Database Creation

A request has to be made to the Identity Service to obtain the auth token to use for DBaaS access. For example:

image

image

Using the id value you can create a new instance with:

image

Backup Options

DBaaS provides for backup options via mysqldump or by a database snapshot.

mysqldump

Refer to the mysqldump syntax in Chapters 2 and 8 and the syntax in the RDS section for example usage.

Database Snapshot

DBaaS provides several API calls to manage database snapshots. A database snapshot can be performed on a running MySQL instance with the following command:

image

Snapshots are automatically replicated to multiple Availability Zones for higher availability.

Recovery Options

mysqldump

Refer to the syntax in Chapter 5 and the RDS section for example usage.

Database Snapshot

A DBaaS instance can be created from a pre-existing snapshot. The request is identical to creating a new instance, with the additional snapshot Id parameter. For example:

image

Point in Time Recovery

No information was available at this time regarding point in time capabilities.

At the time of this publication DBaaS was in closed beta and access for more detailed testing was not possible. More information can be found at http://www.hpcloud.com/products/RDB.

image

Cloud Impact on Backup and Recovery

The use of the cloud does not mean that disasters will no longer occur. This is a myth. The cloud has made the case to ensure appropriate DR operations are more prevalent. As the control and management of data systems become the responsibility of service providers, it is more important that your business understands the risks of entrusting this responsibility to a third party.

The cloud has enabled a new way of testing and verifying your processes. The availability of systems on demand provides the ability to easily test and verify your system at production scale, for a very small cost. The ability to also test with additional size and load is possible immediately without any hardware provisioning requirements (except a credit card). That same ability enables options to fully test backup and restore procedures with full production volume without any hindrances.

Organizations are also using the cloud for storing backup files externally, read scalability via replication, and ancillary services including analytics, reporting, and data warehousing. The use of the cloud is becoming an ideal hybrid implementation that scales on demand with no upfront hardware costs. The use of the cloud does introduce additional security concerns regarding the access to your business data and appropriate encryption techniques become more important. Backup options that have been discussed in this book generally store and transfer data in clear text.Chapter 8 discusses some more information on MySQL backup security. Amazon, the current leading cloud provider, recognizes security as the top priority and is constantly improving the level of fine grained access via techniques including security groups, virtual private network (VPN) access, and AWS direct connect.

In some organizations, cloud deployments have now streamlined the disaster preparedness to an art form, giving backup and recovery the top priority in business needs. For example, using a quote from Netflix: “The Chaos Monkey’s (and simian army’s) job is to randomly kill instances and services within our architecture. If we aren’t constantly testing our ability to succeed despite failure, then it isn’t likely to work when it matters most—in the event of an unexpected outage.” This is a brilliant concept for being prepared.

See http://techblog.netflix.com/2010/12/5-lessons-weve-learned-using-aws.html and http://techblog.netflix.com/2011/07/netflix-simian-army.html for more background information.

Ironically, the best advice for creating an effective MySQL backup and recovery strategy is a quote now possible because of these cloud based technology capabilities.

“The best way to avoid failure is to fail constantly.”

John Ciancutti, Netflix

image

Conclusion

Cloud based services are becoming more easily available to consumers. While the Amazon RDS service has been available for some time, the HP and Google offerings are recent editions not yet generally available when this book was published. More cloud based MySQL products will definitely become available in the future. Each option has its relative merits of providing a managed service; however, you must evaluate the strengths and interoperability needs with any existing production environment to determine the best approach to maintaining your business continuity effectively.

The commands, SQL statements, and web links listed in this chapter can be downloaded from http://effectivemysql.com/book/backup-recovery/.