The TokuDB Storage Engine - MariaDB Cookbook (2014)

MariaDB Cookbook (2014)

Chapter 4. The TokuDB Storage Engine

In this chapter, we will cover the following recipes:

· Installing TokuDB

· Configuring TokuDB

· Creating TokuDB tables

· Migrating to TokuDB

· Adding indexes to TokuDB tables

· Modifying the compression of a TokuDB table

Introduction

TokuDB is a high-performance storage engine for MariaDB, optimized for write-intensive workloads. It is highly scalable and uses a storage technology that the developer, Tokutek, calls Fractal Tree Indexes. It can be used with no application or code changes instead of (and alongside) MyISAM, Aria, and InnoDB/XtraDB tables. It is ACID and MVCC compliant.

ACID compliance means that TokuDB transactions have atomicity, consistency, isolation, and durability. More information on ACID is available at http://en.wikipedia.org/wiki/ACID.

MVCC compliance means that TokuDB has multiversion concurrency control for database transactions. More information on MVCC is available at http://en.wikipedia.org/wiki/Multiversion_concurrency_control.

Fractal trees are a modification of B-trees, which is what InnoDB uses to store data (to be more accurate, InnoDB uses a balanced B+ tree). Whereas InnoDB has a single small cache for an entire data tree, TokuDB implements several large caches at multiple levels in a tree. It then buffers inserts, updates, deletes, and other operations until it has a large batch of them that it can apply as a single operation, greatly reducing the number of input/output (I/O) operations and thus increasing performance.

The video presentation at http://youtu.be/c-n2LGPpQEw is a good introduction to Fractal Tree Indexes.

TokuDB is only supported on 64-bit Linux systems, so the recipes in this chapter will not work on Windows or Mac OS X.

Installing TokuDB

Before we can start using TokuDB, we must first install it. TokuDB is included in MariaDB, but it is not activated by default.

How to do it...

Follow the ensuing steps:

1. Launch the mysql command-line client application with a user that has the SUPER privilege (like the Root user).

2. Run the following command:

3. INSTALL SONAME 'ha_tokudb.so';

4. Run the SHOW PLUGINS; command and verify that the TokuDB plugins are ACTIVE. The output will be similar to the following screenshot (it has been edited to show just the TokuDB entries):

How to do it...

5. Run the SHOW ENGINES; command and verify that the TokuDB storage engine is listed and enabled (the Support column). The output of the TokuDB line will be similar to the following:

Engine

Support

Comment

TokuDB

YES

Tokutek TokuDB Storage Engine

How it works...

Like some of the other plugins and alternative storage engines that ship with MariaDB, TokuDB is disabled by default. To enable it, we use the INSTALL SONAME command. There are several parts to the TokuDB storage engine, which is why one command appears to enable several plugins.

There's more...

Some additional steps we may want to perform when installing TokuDB are to make it the default storage engine and to create a TokuDB-specific configuration file.

Making TokuDB the default storage engine

We can make TokuDB the default storage engine by running the following command:

SET GLOBAL default_storage_engine=TokuDB;

To make it permanent, we then add the following to the end of our system's my.cnf or my.ini file (or to an existing [mysqld] section) and restart MariaDB:

[mysqld]

default-storage-engine=TokuDB

Creating a TokuDB-specific configuration file

On Linux systems, such as Fedora, Debian, CentOS, Ubuntu, and others, MariaDB comes configured with support for modular configuration files. At the bottom of the default configuration file is a line beginning with an exclamation mark (!). This command includes all the files that end in .cnf in the directory named on the line.

Using multiple files for our configuration allows us to enable and disable features by just moving files around. We can also make changes without having to edit an increasingly long configuration file.

The directory will be located in one of two places. On Red Hat, CentOS, and Fedora, the location is /etc/my.cnf.d/. On Debian, Ubuntu, and Linux Mint, the location is /etc/mysql/conf.d/.

When creating our own custom configuration file, it's best to give it a descriptive name. Something like tokudb.cnf is perfect. The file needs a [mysqld] section at the very least, but we could put other sections in the file if we needed or wanted to.

See also

· The MariaDB knowledge base has a section devoted to TokuDB, which is available at https://mariadb.com/kb/en/tokudb/

· The TokuDB section of the Tokutek website also contains lots of good information and is available at http://www.tokutek.com/products/tokudb-for-mysql/

Configuring TokuDB

Like other storage engines, TokuDB has many custom settings and options. Thankfully, there are only a few that we really need to know about up front, and the default settings are fairly optimized.

Getting ready

This recipe is going to assume a few things; firstly, that our server has 16 GB of RAM, and secondly that we have two SSD drives, /dev/sdb1 and /dev/sdc1, mounted under the /mnt/ directory.

How to do it...

Follow the ensuing steps:

1. Open the my.cnf file and add the following lines to an existing [mysqld] section:

2. # TokuDB Cache should be set to at least half of available RAM

3. tokudb-cache-size = 9GB

4.

5. # TokuDB File Locations

6. tokudb-data-dir = /mnt/sdb1

7. tokudb-log-dir = /mnt/sdb1

8. tokudb-tmp-dir = /mnt/sdc1

9. Disable the write cache on our disks with the following:

10.hdparm -W0 /dev/sdb1

11.hdparm -W0 /dev/sdc2

12. Show the status of TokuDB with the following command:

13.SHOW ENGINE TokuDB STATUS;

How it works...

There are many settings for TokuDB that we can tweak, enable, and disable; but for many users, the defaults work well. One such default is that TokuDB will automatically set the tokudb-cache-size option to be equal to half of our system RAM. In our recipe, we set it manually to be a bit more than half, but we will need to test to see if it helps with our databases and workloads. The TokuDB developers recommend that this setting should never be set to lower than half.

Like with InnoDB, we can set the location of TokuDB's log, data, and temporary files to be different from the configured default for other tables. In our example, we configure them to live on our fast solid-state drives (SSDs). For simplicity, we configure the data and logs to go to the same place. However, we set the temporary files to go to a completely different drive, so that they don't impact the performance of our data drive. We could just set the MariaDB default locations to these and TokuDB would write there by default, but sometimes it's better for performance if we stick certain table types in one location and other table types in another.

Next, we turn off the write cache on our drives. The write cache is problematic because during a power failure or other catastrophic event, we don't want any unwritten data to be in a drive's onboard cache. When the operating system tells us it has written the data to disk, we want it to have actually been written, not temporarily sitting in a cache somewhere. Many servers have battery-backed RAID cards and other protections, but they are not 100 percent foolproof.

Incidentally, the SHOW ENGINE TokuDB STATUS; command is case insensitive. We can use TokuDB, TOKUDB, tokudb, or any other case variant we can think of.

There's more...

There are many additional options for TokuDB that are not covered here. The TokuDB section of the MariaDB knowledge base at https://mariadb.com/kb/en/tokudb/ contains links to many useful resources for getting the most out of this powerful storage engine.

See also

· In addition to the TokuDB section of the MariaDB knowledge base mentioned in the previous section, another good resource for learning about the various TokuDB configuration options is the Tokutek website at http://tokutek.com.

Creating TokuDB tables

Creating a TokuDB table is much like creating a MyISAM, Aria, or InnoDB/XtraDB table. There are some more options and abilities we should know about.

Getting ready

Import the ISFDB database as described in the Importing the data exported by mysqldump recipe from Chapter 2, Diving Deep into MariaDB.

How to do it...

Follow the ensuing steps:

1. Launch the mysql command-line client and connect to the isfdb database.

2. Use the following CREATE statement to create our TokuDB table:

3. CREATE TABLE authors_tokudb (

4. author_id int NOT NULL AUTO_INCREMENT,

5. author_canonical mediumtext,

6. author_lastname varchar(128),

7. author_birthplace mediumtext,

8. author_birthdate date DEFAULT NULL,

9. author_deathdate date DEFAULT NULL,

10. PRIMARY KEY (author_id),

11. KEY (author_lastname),

12. KEY (author_birthdate),

13. KEY (author_deathdate)

14.) ENGINE=TokuDB;

15. Import some data from the authors table into our new table:

16.INSERT authors_tokudb

17. SELECT

18. author_id, author_canonical,

19. author_lastname, author_birthplace,

20. author_birthdate, author_deathdate

21. FROM authors;

22. Verify that the indexes are working with the following command:

23.SHOW INDEXES FROM authors_tokudb;

24. The output of the preceding steps will have four rows with information on each of the indexes we created.

25. Optimize the table with the following command:

26.OPTIMIZE TABLE authors_tokudb;

27. The output of the OPTIMIZE command will be similar to the following screenshot:

How to do it...

How it works...

At first glance, the CREATE TABLE statement in this recipe looks similar to other CREATE TABLE statements we have seen earlier, but there are a few differences. The obvious one is that we specify ENGINE=TokuDB after the data definition. This is necessary if we have not set TokuDB to be the default storage engine.

Next, in the data definition, we have defined three KEY indexes in addition to our PRIMARY KEY index.

After creating our table, we populate it using the data selected from the isfdb.authors table. Then, we take a look at our indexes to make sure they are there and then we optimize our new table. The optimization step basically applies pending additions and deletions to our indexes. The optimization step is not needed for performance reasons in TokuDB like it is with other storage engines.

See also

· The Adding indexes to TokuDB tables recipe for more information on indexes in TokuDB

· The Modifying the compression of a TokuDB table recipe for more information on TokuDB's ROW_FORMAT compression settings

Migrating to TokuDB

TokuDB doesn't do us any good if we don't use it. Migrating existing tables to TokuDB, whether they are MyISAM, Aria, or InnoDB/XtraDB, is fairly painless.

Getting ready

Import the ISFDB database as described in the Importing the data exported by mysqldump recipe in Chapter 2, Diving Deep into MariaDB.

How to do it...

Follow the ensuing steps:

1. Launch the mysql command-line client and connect to the isfdb database.

2. Alter the pub% tables to be TokuDB tables:

3. ALTER TABLE pub_authors ENGINE=TokuDB;

4. ALTER TABLE pub_content ENGINE=TokuDB;

5. ALTER TABLE pub_series ENGINE=TokuDB;

6. ALTER TABLE publishers ENGINE=TokuDB;

7. Run SHOW CREATE TABLE on each of the tables to verify that they now have ENGINE=TokuDB after the data definition section. Using the publishers table as an example, we get the following output:

How to do it...

8. Convert additional tables in the isfdb database to TokuDB format if required.

How it works...

The ALTER TABLE statement works in three stages. It first creates a new table using the new table definition. It then copies data from the old table to the new table. The last step is to rename the new table to the same name as the old table.

There's more...

The ALTER TABLE method of migrating to TokuDB is probably the best way, but there are others. For example, if we have a backup made with mysqldump, we could do a search and replace on the schema definitions in our backup file and change the CREATE TABLEstatements to create TokuDB tables instead of what they were configured to create. Then, when we import the file, the restored tables will be TokuDB tables.

Another method is to create a table based on an existing table, alter it, and then backup the existing table and import the backup into the new table, as follows:

CREATE TABLE notes_tokudb LIKE notes;

ALTER TABLE notes_tokudb ENGINE=TokuDB;

SELECT * FROM notes INTO OUTFILE '/tmp/notes.tmp';

LOAD DATA INFILE '/tmp/notes.tmp' INTO TABLE notes_tokudb;

That said, for most cases, if not all, the ALTER TABLE method is preferred.

See also

· The How to quickly insert data into MariaDB page of the MariaDB knowledge base goes into more detail about large data import operations, and the instructions can be adapted for TokuDB easily. This is available at https://mariadb.com/kb/en/how-to-quickly-insert-data-into-mariadb/.

Adding indexes to TokuDB tables

TokuDB has advanced indexing capabilities compared to other storage engines, but we can't use them if we don't add them to our tables.

Getting ready

Import the ISFDB database as described in the Importing the data exported by mysqldump recipe in Chapter 2, Diving Deep into MariaDB.

How to do it...

Follow the ensuing steps:

1. Launch the mysql command-line client and connect to the isfdb database.

2. View the current indexes on the authors table with the following command:

3. SHOW INDEXES FROM authors;

4. Alter the authors table to use the TokuDB storage engine and change the index on the author_canonical column to CLUSTERING, as shown in the following commands:

5. ALTER TABLE authors

6. DROP KEY canonical,

7. ADD CLUSTERING KEY canonical (author_canonical(50)),

8. ENGINE=TokuDB;

9. Create another index, this time on the author_birthdate column, as shown in the following commands:

10.CREATE CLUSTERING INDEX birthdate

11. ON authors (author_birthdate);

12. View the indexes again as we did in step 2.

How it works...

Clustered indexes include all the columns of a table and can be used as covering indexes. They also have performance advantages compared to other indexes because of the way TokuDB works. We can define clustered indexes in some other storage engines, but only one of them. TokuDB lets us define multiple clustered indexes. Being able to define more than one gives TokuDB tables a great performance boost over a wide range of queries.

In our recipe, we defined clustered indexes in two ways, first using an ALTER TABLE statement to replace an existing index with a clustered index, and then to add a new index using the CREATE INDEX statement.

There's more...

Some of the other advantages of TokuDB's indexing capabilities include being able to use an auto_increment column in any index and within any position in that index. Also, TokuDB indexes can have up to 32 columns.

See also

· More information on TokuDB's clustering indexes is available at http://tokutek.com/2009/05/introducing_multiple_clustering_indexes/.

Modifying the compression of a TokuDB table

TokuDB has several compression options to help us strike the perfect balance between disk space and performance.

Getting ready

Import the ISFDB database as described in the Importing the data exported by mysqldump recipe in Chapter 2, Diving Deep into MariaDB.

How to do it...

Follow the ensuing steps:

1. Launch the mysql command-line client and connect to the isfdb database.

2. Alter the titles table to use default compression, as shown in the following command:

3. ALTER TABLE titles ENGINE=TokuDB

4. ROW_FORMAT=default ;

5. Alter the pub_content table to use high compression, as shown in the following command:

6. ALTER TABLE pub_content ENGINE=TokuDB

7. ROW_FORMAT=tokudb_small;

8. Alter the canonical_author table to have fast compression, as shown in the following command:

9. ALTER TABLE canonical_author ENGINE=TokuDB

10. ROW_FORMAT=tokudb_fast;

11. Alter the notes table to use the lzma compressor, as shown in the following command:

12.ALTER TABLE notes ENGINE=TokuDB

13. ROW_FORMAT=tokudb_lzma;

14. Alter the pubs table to not use any compression, as shown in the following command:

15.ALTER TABLE pubs ENGINE=TokuDB

16. ROW_FORMAT=tokudb_uncompressed;

17. Optimize all the tables we just altered, as shown in the following command:

18.OPTIMIZE TABLE

19. titles, pub_content, canonical_author, notes, pubs;

How it works...


Apart from being able to switch the compression of tables to whatever works best in our situation, a big advantage of TokuDB compared to other storage engines is its ability to highly compress data and still have great performance. The amount and type of compression we use is controlled by the ROW_FORMAT option. If we do not specify row format when creating a table, it will default to the default row format.

The official recommendation from Tokutek, the developers of TokuDB, is to use standard compression (default) on machines with six or fewer cores and high compression (tokudb_small) only on machines with more than six cores.

There's more...

The tokudb_fast and tokudb_small compression options are actually just aliases to tokudb_quicklz and tokudb_lzma, respectively. They may be changed in the future if other compression options are added to TokuDB. Likewise, the default compression is currently alsotokudb_quicklz.

There are two other compression options: the deprecated tokudb_zlib compression option, which is what TokuDB used as its default until tokudb_quicklz debuted in TokuDB Version 5.2, and tokudb_uncompressed, which disables compression (useful for uncompressable data).