The CONNECT Storage Engine - MariaDB Cookbook (2014)

MariaDB Cookbook (2014)

Chapter 5. The CONNECT Storage Engine

In this chapter, we will cover the following recipes:

· Installing the CONNECT storage engine

· Creating and dropping CONNECT tables

· Reading and writing CSV data using CONNECT

· Reading and writing XML data using CONNECT

· Accessing MariaDB tables using CONNECT

· Using the XCOL table type

· Using the PIVOT table type

· Using the OCCUR table type

· Using the WMI table type

· Using the MAC address table type

Introduction

In this chapter, we will explore some of the features of the CONNECT storage engine. This storage engine allows us to access data in various file formats such as XML, CSV, and other types of files stored on our host system. Its purpose is to connect MariaDB to these various data types. It's a very handy tool for bringing various pieces of an infrastructure together. So, the CONNECT storage engine tables are not exactly tables in the traditional sense (they may not even physically exist). With that in mind, there are some things we need to realize when working with this storage engine.

First, DROP TABLE does not delete content the way MyISAM, InnoDB, and other tables do. CONNECT tables are definitions of where the data we want to access is and what format it is in. For example, an XML file stored in a user's home directory. When we drop a CONNECT table using DROP, we are dropping the where-and-what definition stored in the CONNECT table and not the data itself.

Secondly, indexing behaves differently for CONNECT tables. Most (but not all) of the CONNECT data types that connect to files support indexing but only as long as there are no NULL values. Virtual CONNECT tables, which connect to a source of information such as another database, the filesystem, or the operating system, cannot be indexed because data from these sources is unknown until we access it.

More about indexing CONNECT tables can be found at https://mariadb.com/kb/en/using-connect-indexing/ and the full documentation of the CONNECT storage engine can be found at https://mariadb.com/kb/en/connect/.

Installing the CONNECT storage engine

The CONNECT storage engine is not installed by default. So, the first thing that we have to do is to install and enable it.

How to do it...

1. On Fedora, CentOS, or Red Hat systems, we can run the following command line:

2. sudo yum install MariaDB-connect-engine

On Debian, Ubuntu, or Linux Mint systems, we can run the following command line:

sudo apt-get install mariadb-connect-engine-10.0

3. On all systems, launch the mysql command-line client and connect it to our MariaDB server with a user that has the SUPER privilege.

4. Enable the CONNECT storage engine by running the following command line:

5. INSTALL SONAME 'ha_connect';

6. Verify the installation by running the following two commands and look for CONNECT in the output:

7. SHOW ENGINES;

8. SHOW PLUGINS;

How it works...

On Windows and MacOS, the CONNECT storage engine is included but not active. On Linux, we need to install the CONNECT package before we can enable it using the mysql command-line client.

See also

· The full documentation of the CONNECT storage engine can be found at https://mariadb.com/kb/en/connect/

Creating and dropping CONNECT tables

CONNECT tables are only superficially similar to other tables. In this recipe, we'll create a CONNECT DIR table.

Getting ready

Enable the CONNECT engine as specified in the Installing the CONNECT storage engine recipe at the beginning of this chapter.

How to do it...

1. Connect to MariaDB with the mysql command-line client and to the test database with a user that has the CREATE privilege. If the test database is absent, create one.

2. Run the following CREATE statement to create a table that lists the files in the data directory of the test database:

3. CREATE TABLE test_data (

4. path varchar(256) NOT NULL flag=1,

5. filename varchar(256) NOT NULL flag=2,

6. filesize double(12,0) NOT NULL flag=5

7. ) ENGINE=CONNECT DEFAULT CHARSET=latin1

8. TABLE_TYPE=DIR FILE_NAME='*.frm'

9. OPTION_LIST='subdir=1';

Select everything in the table. The output will vary depending on the tables in the test database and their location and size. Though the columns will be similar to the following screenshot, the contents will not match:

How to do it...

Use the SUM() and COUNT() functions to summarize the output, as shown in the following code. As with step 3, the columns in your output will match but the results will not:

SELECT path, COUNT(*), SUM(filesize)

FROM test_data GROUP BY path;

10. The output will be similar to the following screenshot:

How to do it...

11. Drop the test_data table using the following statement:

12.DROP TABLE test_data;

How it works...

When we define a table using ENGINE=CONNECT, we are assigning the CONNECT storage engine to look at the data that is stored somewhere outside our MariaDB database. This data could be in a file, in another database, or as in this recipe, in the filesystem itself. The whole purpose of this storage engine is to, for lack of a better term, connect us to different sources of data.

As it connects to so many different forms of data, the CONNECT engine adds over two dozen table options and five column options to the standard CREATE TABLE syntax. These options enable us to correctly describe the data we want MariaDB to connect to.

The most important of these is the TABLE_TYPE option. We use this to inform CONNECT about the several supported table types that we are creating. These include CSV, XML, INI, ODBC, MYSQL, DIR, and others. For this recipe, we used the DIR type, which is one of the simpler ones to define in a CREATE TABLE statement.

The FILE_NAME option is the other one we will use on all CONNECT tables. It defines the file that we will be reading and writing data to. In this recipe, it simply defines a *.frm wildcard pattern to list every file in the directory housing our test database that ends with that extension.

Another important table option is OPTION_LIST. CONNECT provides over two dozen options but they aren't enough for some of the supported data types that CONNECT can connect to. The OPTION_LIST option is a catchall for all supplemental options. The documentation for each table type lists these supplemental options and what they do. For our recipe, we set the supplemental option subdir=1. It means that we want our directory listing to recursively descend up to one directory below our default directory.

When CONNECT accesses a directory to gather data for us for the DIR table type, it collects the predefined pieces of information, such as path, file name, and file size. It flags each type of data with a number. Our column definition maps these flags to our columns. The following is a table of the flag numbers for the DIR table type and the information they contain:

Flag Number

Information

1

Path

2

File name

3

File type

4

File attributes

5

File size

6

Last write-access date

7

Last read-access date

8

File creation date

Keep in mind that flags for other table types are not the same because they are dealing with different types of data. When defining a new CONNECT table, we should always consult the CONNECT documentation for the list of flags.

There's more...

The CREATE TABLE statements for CONNECT tables can be tricky to define properly. A table definition may be accepted by the server, but it may not do anything or connect to our file the wrong way.

OPTION_LIST options

The OPTION_LIST option is specified in the following pattern:

OPTION_LIST='option1=optvalue1,option2=optvalue2,...'

No spaces or other blanks are allowed between the equal signs or commas; it must be one long space-free string.

Another thing to note is that option values cannot contain commas, but they can contain equal signs. For example, the following is valid for connecting to an HTML table in an XML document:

OPTION_LIST='name=table,coltype=HTML,attribute=border=3;cellspacing=2;cellpadding=5,headattr=title=mytable;bgcolor=gray'

It can be a little tricky to parse, but we just need to remember that semicolons (;) are fine to separate parts of complex options in an option list. However, commas (,) can only be used as option separators.

Dropping CONNECT tables

The standard DROP statement is used to remove a CONNECT table from our database, but unlike what happens with a normal table, the corresponding data and index files are not removed. To really remove a dropped CONNECT table, we need to first drop it and then navigate to wherever the data is (as defined by the FILE_NAME option) and remove the data file or files. In this recipe, this isn't necessary because we're actually connecting to the output of a directory listing from our filesystem and not to an actual file. However, it is something to remember when we connect to actual files in other recipes in this chapter.

Files and CONNECT

When we define a connection to an actual file, say with the XML table type, the file we specify with the FILE_NAME option does not need to actually exist. If it does, great; however, if it doesn't, CONNECT will not create it until we actually INSERT some data into the table.

When CONNECT does create a file, it will either use the value we defined in the FILE_NAME option or use the TABLENAME.TABLETYPE pattern for the name, for example, myfile.xml for a table of the XML type and a name of myfile.

See also

· The full documentation of how to CREATE and DROP CONNECT tables, including tables of all the various options can be found at https://mariadb.com/kb/en/creating-and-dropping-connect-tables/

· DIR and other Special Virtual Tables are documented at https://mariadb.com/kb/en/connect-table-types-special-virtual-tables/

Reading and writing CSV data using CONNECT

CSV (comma separated values) is a very common data-interchange format. MariaDB can easily import CSV formatted files using the LOAD DATA INFILE command, and there is a CSV storage engine that stores data in the CSV format. However, neither of these handles cases where we need to be capable of querying CSV files that are updated outside of MariaDB and CSV files that we don't have to import before we can query them. The CONNECT storage engine's CSV data type allows us to do this easily.

Getting ready

We need to have some CSV data to work with for this recipe. We'll use data from the ISFDB database for this. To start with, perform the following steps:

1. Import the ISFDB database as described in the Importing the data exported by mysqldump recipe from Chapter 2, Diving Deep into MariaDB.

2. Install and enable the CONNECT storage engine as described in the Installing the CONNECT storage engine recipe at the beginning of this chapter.

3. Launch the mysql command-line client application and connect to the isfdb database on our MariaDB server. Then create a /tmp/authors.csv file with the following statement:

4. SELECT author_id, author_canonical, author_legalname,

5. author_birthplace, author_birthdate, author_deathdate

6. INTO OUTFILE '/tmp/authors.csv'

7. FIELDS TERMINATED BY ',' ENCLOSED BY '"'

8. FROM authors ORDER BY author_id LIMIT 100;

How to do it...

1. Launch the mysql command-line client application and connect to the isfdb database on our MariaDB server.

2. Create a table named authors_csv, which uses the CONNECT storage engine's CSV data type and is connected to the authors_csv.CSV file we created in preparation for this recipe (change the FILE_NAME value to wherever the file is actually located):

3. CREATE TABLE authors_csv (

4. author_id int(11) NOT NULL,

5. author_canonical varchar(1024) NOT NULL,

6. author_legalname varchar(1024) NOT NULL,

7. author_birthplace varchar(1024) NOT NULL,

8. author_birthdate varchar(10),

9. author_deathdate varchar(10)

10.) ENGINE=CONNECT TABLE_TYPE='CSV'

11.FILE_NAME='/tmp/authors.csv'

12.SEP_CHAR=',' QCHAR='"' QUOTED=1;

13. Run the following SELECT statement to verify that we are reading from the CSV file:

14.SELECT * FROM authors_csv;

15. Run the following INSERT statement to add a couple of rows to the CSV file:

16.INSERT authors_csv VALUES (

17. 101,"Fake Author",

18. "Author, Fake",

19. "Charlotte, North Carolina, USA",

20. "1970-01-01",""), (

21. 102,"Really Fake Author",

22. "Author, Really Fake",

23. "St. Paul, Minnesota, USA",

24. "1969-12-31","");

25. Open the CSV file in a text editor and add the following row to the bottom of the file, then save and close the file:

26.103,"Fake","Fake","Fake, USA","1970-04-01",

27. Run the SELECT statement from step 3 and verify the three rows we added with author_id numbers greater than 100 in the output, which is shown in the following screenshot:

How to do it...

How it works...

When we create a table that uses the CONNECT storage engine's CSV data type, we're not actually creating a table in the traditional sense. Instead, we are telling CONNECT how to read the file. Most of the statement we use to create the table is the standardCREATE TABLE syntax, but there are several bits at the end that are specific to the CONNECT storage engine and the CSV data type; the primary ones being TABLE_TYPE and the FILE_NAME parts.

The other three are more specific to the CSV table type. SEP_CHAR defines the separator character, a comma (,), in our recipe, and QCHAR defines the character used to quote values, double quotes (") for this CSV file.

The QUOTED option is special. This option sets how CONNECT should handle quoting. There are four settings that it recognizes. They are as follows:

· A setting of 0 means fields will only be quoted if they contain the separator character or if they begin with the quoting character (in which case the quoting character will be doubled)

· A setting of 1 means all text fields will be quoted unless they are NULL (numeric fields will not be quoted)

· A setting of 2 means all fields will be quoted unless they are NULL

· A setting of 3 means all fields will be quoted, including NULL fields

Inserting using the mysql command-line client works like we would expect, and the new rows are added to the bottom of the file. We can also make insertions outside of MariaDB by editing the CSV file directly.

There's more...

There are a few more things to be aware of when working with CSV files using the CONNECT storage engine.

CSV header lines

Some CSV files have a header line that contains the names of the columns. We can instruct CONNECT to ignore this line with a HEADER=1 option when defining the table. A common place to define this is after the FILE_NAME option.

Changing the number and order of columns read using flags

For some CSV files, if we plan on just reading the data, we may only care about a subset of the columns in the file, or we may want them to be read in a different order. For both of these situations, we use the FLAG option as part of the column definition when creating the table. For example, the following code is a modified version that only contains a reordered subset of the information in our example CSV file:

CREATE TABLE authors_csv2 (

author_id int(11) NOT NULL,

author_birthdate varchar(10) NOT NULL FLAG=5,

author_birthplace varchar(1024) NOT NULL FLAG=4,

author_canonical varchar(1024) NOT NULL FLAG=2

) ENGINE=CONNECT DEFAULT CHARSET=utf8

TABLE_TYPE='CSV'

FILE_NAME='/tmp/authors_csv.CSV'

SEP_CHAR=',' QCHAR='"' QUOTED=1;

We will run into trouble if we write to this table, so if we do decide to do this, we should treat the table as read-only and possibly set READONLY=1 when defining the table so that CONNECT will not even attempt to perform an INSERT query.

See also

· The full documentation on connecting to CSV data files can be found at https://mariadb.com/kb/en/connect-table-types-data-files/

· More information on data types in the CONNECT storage engine can be found at https://mariadb.com/kb/en/connect-data-types/

Reading and writing XML data using CONNECT

There is a lot of data stored in XML format. MariaDB can easily export data as XML, but before the CONNECT engine, it did not have a way to easily read from and write to external XML documents.

Getting ready

Import the ISFDB database as described in the Importing the data exported by mysqldump recipe from Chapter 2, Diving Deep into MariaDB. Then, install and enable the CONNECT storage engine as described in the Installing the CONNECT storage engine recipe at the beginning of this chapter. Then, export the isfdb-001.xml file as described in the Producing XML output recipe from Chapter 2, Diving Deep into MariaDB. For this recipe, it is assumed that the XML file is located in /tmp/isfdb-001.xml, but it will be wherever we were when we exported it from MariaDB. We'll need to alter the FILE_NAME option in the recipe to point at it.

How to do it...

1. Launch the mysql command-line client application and connect to the isfdb database on our MariaDB server.

2. Run the following CREATE TABLE statement:

3. CREATE TABLE authors_xml (

4. author_id int,

5. author_canonical varchar(1024),

6. author_legalname varchar(1024),

7. author_birthplace varchar(1024),

8. author_birthdate char(10),

9. author_deathdate char(10),

10. note_id int,

11. author_wikipedia varchar(1024),

12. author_views int,

13. author_imdb varchar(1024),

14. author_marque int,

15. author_image varchar(1024),

16. author_annualviews int,

17. author_lastname varchar(1024),

18. author_language int

19.) ENGINE=CONNECT TABLE_TYPE=XML FILE_NAME='/tmp/isfdb-001.xml'

20. TABNAME='resultset'

21. OPTION_LIST='rownode=row,colnode=field,coltype=HTML'

22.;

23. Run the following SELECT statement to get a list of authors born in the United Kingdom (may be empty):

24.SELECT

25. author_id, author_canonical

26.FROM authors_xml

27.WHERE author_birthplace LIKE '%UK';

28. Run the following INSERT statement to add a row to our XML file:

29.INSERT authors_xml VALUES (

30. 101,"Terry Pratchett","Pratchett, Terry",

31. "Beaconsfield, Buckinghamshire, UK",

32. "0000-00-00","0000-00-00",101,

33. "",101,"",101,"",101,"Terry",101 );

34. Run the SELECT statement from step 3 to see that the row was added.

How it works...

For this recipe, we're using the XML output produced by MariaDB. The file has the following format:

<resultset>

<row>

<field name="first_column"></field>

<field name="second_column"></field>

...

<field name="last_column"></field>

</row>

</resultset>

This style of XML is actually quite similar to an HTML table, except with different tag names (resultset instead of table, row instead of tr, and field instead of td).

Defining the columns for this table so that we can read the XML data is similar to the process of creating columns in a regular MyISAM or InnoDB table. However, after the ENGINE=CONNECT part, we add options to tell the CONNECT storage engine how to read the file.

The data we are interested in is between the <resultset> tags, so we specify TABNAME='resultset'. We then inform CONNECT about the data using the OPTION_LIST option. In this option, we first specify coltype=HTML, which means that the column tags are all going to be named the same, so we should read them by their position and not by their names. Next, we give the names of the tags that specify the rows and the columns, rownode=row and colnode=field respectively.

Once the table is defined, we can query it much like a regular table. We can also insert data with some things to be aware of as described in the next section.

There's more...

XML data can be tricky to work with mainly because it is such a flexible data storage format. The CONNECT storage engine tries to accommodate as much variation as it can, but there will always be some XML files that it simply cannot read from or write to properly.

Inserting XML data

In this recipe, we inserted some data. From within MariaDB, it appeared that the data was inserted properly; however, if we view the XML file, it's plain to see that CONNECT did not put it in the way the other entries were inserted. For this reason, it's often best to stick to inserting into simple XML documents or just treating them as read-only data.

Tree versus HTML-like data structures

Some XML data is like the data in our recipe; it is similar in structure to an HTML table. Other XML data is more like a tree. For example, our data may have had the following format:

<resultset>

<row>

<first_column></first_column>

<second_column></second_column>

...

<last_column></last_column>

</row>

</resultset>

If our data had been defined as shown in the preceding code lines, we would have defined our table as follows:

CREATE TABLE table_name (

first_column data_definition,

second_column data_definition,

...,

last_column data_definition)

ENGINE=CONNECT TABLE_TYPE=XML TABNAME='resultset'

FILE_NAME='/tmp/isfdb-001.xml'

OPTION_LIST'rownode=row';

This simplified definition is possible because CONNECT is smart enough to figure out that, as we didn't say otherwise, the tags in between the <row> and </row> tags must be the names of the columns.

Tags and tag attributes

For some XML files, we may want to query and update both tag names and attributes within tags. For this, the CONNECT storage engine provides us with the FIELD_FORMAT option.

Suppose our XML data had the following format:

<resultset attribute1="value" attribute2="value">

<row>

<first_col>

<sub1></sub1>

<sub2></sub2>

</first_col>

<second_col attribute="value" />

<last_col>

<sub1></sub1>

<sub2></sub2>

</last_col>

</row>

</resultset>

We could create our table as follows:

CREATE TABLE table_name (

attribute1 data_def FIELD_FORMAT='@attribute1',

attribute2 data_def FIELD_FORMAT='@attribute2',

subitem1 data_def FIELD_FORMAT='first_col/sub1',

subitem2 data_def FIELD_FORMAT='first_col/sub2',

attribute data_def FIELD_FORMAT='second_col/@attribute',

last_col data_def FIELD_FORMAT='last_col'

) ENGINE=CONNECT TABLE_TYPE=XML TABNAME='resultset'

FILE_NAME='/tmp/isfdb-001.xml'

The FIELD_FORMAT option allows us to specify which tag we want to read from and which attribute inside a tag uses the @ sign. We don't need to specify the top-level tag; so, for those attributes, we just name them.

See also

· The full documentation for the CONNECT storage engine's XML data type is at: https://mariadb.com/kb/en/connect-table-types-data-files/

Accessing MariaDB tables using CONNECT

Using the CONNECT storage engine, we can set up connections to local or remote MariaDB database tables and have them appear as if they are part of our MariaDB database.

Getting ready

Import the ISFDB database as described in the Importing the data exported by mysqldump recipe from Chapter 2, Diving Deep into MariaDB. Then, install and enable the CONNECT storage engine as described in the Installing the CONNECT storage engine recipe at the beginning of this chapter.

How to do it...

1. Launch the mysql command-line client application and connect to the isfdb database on our MariaDB server.

2. Run the following CREATE TABLE statement by altering the user:pass part of the CONNECTION option with a username and password that has rights to the isfdb database:

3. CREATE TABLE websites_2 (

4. site_id int(11),

5. site_name varchar(255),

6. site_url varchar(1024),

7. PRIMARY KEY (site_id)

8. ) ENGINE=CONNECT TABLE_TYPE=MYSQL

9. CONNECTION='mysql://user:pass@localhost/isfdb/websites';

10. Run the following two SELECT statements to test that our connection is working and that the output is the same for both tables (we're using the LENGTH part to limit the output to just the shorter URLs; feel free to omit it):

11.SELECT * FROM websites WHERE LENGTH(site_url)<40;

12.SELECT * FROM websites_2 WHERE LENGTH(site_url)<40;

13. Add some data to the table using the following INSERT statement:

14.INSERT websites_2 VALUES

15. ("","MariaDB.com","https://mariadb.com"),

16. ("","MariaDB.org","https://mariadb.org");

17. Run the SELECT statements from step 3 again to verify whether the new entries have appeared in both the tables.

How it works...

It may seem silly to set up a connection from our local isfdb database to our local isfdb database, but it serves as a good demonstration of the ability of the CONNECT storage engine to connect us to other MariaDB databases. This functionality is similar to that of the FEDERATEDX storage engine which also ships with MariaDB.

The usefulness of this feature becomes obvious when we are able to connect to remote database tables on other servers. They can be anywhere in the world, but we can connect and interact with them as if they are local. We'll be limited by the speed of our network connection between our local and remote MariaDB servers, but the utility and flexibility are hard to beat.

The key to configuring this data type is the CONNECTION option. It has the following format:

mysql://username:password@host/database_name/table_name

This format is basically a MariaDB URL. The host parameter can be any valid IP address, domain name, or the localhost key word.

If we connect with a user that doesn't require a password, we can skip :password of the username:password part of the URL.

The definitions of the original websites table and our new websites_2 table are slightly different. This is because the CONNECT storage engine does not support certain data types such as TINYTEXT or certain options such as AUTO INCREMENT. In practice, this mostly works out fine; we just exclude the options that aren't supported and modify the data type to be something close to what we need. Then, when we insert data, the original table's configuration will take care of making sure only the correct data is inserted. For example, in our recipe, the original table took care of inserting the correct auto-incremented values into the site_id column when we inserted our two rows of data.

There's more...

It probably goes without saying that this feature needs to be used with great care. Any time we open our database server to the Internet we're just asking for trouble. The best way to do this, if we need to connect two servers together that aren't in the same building (or even the same country), is to connect them via a VPN or another private, encrypted network connection.

We also need to ensure that permissions on our server are locked down good and tight as any user with the permission to issue a SHOW CREATE TABLE query can see the CONNECTION parameter, password, and other details.

See also

· The full documentation of the CONNECT Table Type is at https://mariadb.com/kb/en/connect-table-types-mysql-table-type-accessing-mysqlmariadb-tables/

Using the XCOL table type

In a perfect world, all data in a MariaDB database would be properly defined and normalized. We don't live in such a world, and sometimes, we have to work with tables that contain one or more catchall columns stuffed full of related values. The XCOL table type enables us to work with this data as if it was stored in a separate rather than a single column.

How to do it...

1. Launch the mysql command-line client application and connect to the test database on our MariaDB server. If the test database does not exist, create it first.

2. Run the following CREATE TABLE statement to create our example table:

3. CREATE TABLE superheroes (

4. team varchar(50),

5. heroes varchar(1024)

6. );

7. Add some data to our new table:

8. INSERT superheroes VALUES

9. ("The Avengers","Thor, Iron Man, Black Widow, Hawkeye, Hulk, Captain America"),

10. ("The Justice League", "Superman, Batman, Aquaman, Flash, Wonder Woman"),

11. ("The X-Men", "Storm, Cyclops, Wolverine, Rogue, Iceman");

12. Create an XCOL table that references our superheroes table (change username to a user that has read access rights to the superheroes table without needing a password):

13.CREATE TABLE superheroes_xcol ENGINE=CONNECT

14. TABLE_TYPE=XCOL TABNAME='superheroes'

15. OPTION_LIST='user=username,colname=heroes';

16. Run the following SELECT statements to test our XCOL table:

17.SELECT * FROM superheroes_xcol;

18.SELECT * FROM superheroes_xcol WHERE heroes LIKE "S%";

19.SELECT team, count(heroes) FROM superheroes_xcol GROUP BY team;

How it works...

An XCOL table is useful when we have a column in a table that is a list of values. The XCOL table gives us a view into this data and lets us query it as if it was separate.

There's more...

There are a few particulars of the XCOL table type that can cause unexpected issues. One is that an XCOL table is actually reconnecting to the server when it is being queried. If we're connected with a user that requires a password, we need to either supply a password to the option list when we create the table or supply the option list with a user that can read the table without a password.

Another thing to be aware of is that proxy tables are strictly read only. This is not obvious because the error message we get if we try to INSERT is a cryptic COLBLK SetBuffer: undefined Access Method instead of something more understandable.

Finally, XCOL and proxy tables are inefficient and consume more resources when accessed than a regular table. If we try to set up an XCOL table that connects to a large table, then we will run into serious performance issues. If we have a need to deal with lots of unstructured data, a better option is dynamic columns which are discussed in Chapter 10, Exploring Dynamic and Virtual Columns in MariaDB.

See also

· The full documentation of the XCOL data type can be found at https://mariadb.com/kb/en/connect-table-types-xcol-table-type/

· The full documentation of the Proxy table type can be found at https://mariadb.com/kb/en/connect-table-types-proxy-table-type/

Using the PIVOT table type

The PIVOT table type is very useful to sort and sum the columns in a table. It's similar to GROUP BY but with a more understandable layout. This sort of task is often used to sort and sum columns of data in a desktop spreadsheet program.

How to do it...

1. Launch the mysql command-line client application and connect to the test database on our MariaDB server. If the test database does not exist, create it first.

2. Run the following CREATE TABLE statement to create an expenses table:

3. CREATE TABLE expenses (

4. who varchar(64),

5. day varchar(10),

6. what varchar(64),

7. amount varchar(10)

8. );

9. Add some data to the table by executing the following command lines:

10.INSERT expenses VALUES

11. ("Daniel","2013-09-01","Clothing",42.50),

12. ("Amy","2013-09-02","Food",5.22),

13. ("Daniel","2013-09-01","Clothing",27.75),

14. ("Daniel","2013-09-03","Food",10.27),

15. ("Amy","2013-09-03","Gas",42.84),

16. ("Amy","2013-09-01","Food",15.01),

17. ("Amy","2013-09-01","Clothing",11.00),

18. ("Daniel","2013-09-01","Gas",34.10),

19. ("Amy","2013-09-02","Food",15.00),

20. ("Daniel","2013-09-01","Food",12.50),

21. ("Daniel","2013-09-02","Gas",32.20),

22. ("Daniel","2013-09-03","Clothing",82.80),

23. ("Amy","2013-09-03","Food",8.72),

24. ("Daniel","2013-09-03","Gas",15.08),

25. ("Daniel","2013-09-02","Clothing",17.27),

26. ("Amy","2013-09-03","Clothing",32.00) ;

27. Create a PIVOT table (change username to a user that has read access rights to the expenses table without needing a password):

28.CREATE TABLE expenses_pivot

29. ENGINE=CONNECT TABLE_TYPE=PIVOT TABNAME=expenses

30. OPTION_LIST='user=username';

31. Run the following SELECT statement to show our pivoted data. The result is shown in the following screenshot:

32.SELECT * FROM expenses_pivot;

How to do it...

How it works...

When our expenses table is pivoted, the default pivot places the contents of the what column next to the who and day columns. We could get the same data with the following query, but the output is not that well-organized and easy to read:

SELECT who, day, what, SUM(amount)

FROM expenses

GROUP BY who, day, what;

What the CONNECT storage engine is doing when it pivots a table is to first find a facts column; in our sample table, this column is the prices column. It then determines the column to pivot on; in our sample table, it automatically chose the what column. It then aggregates the facts (the prices) by summing the prices together by day and what. It then gets the distinct values of the what column (the pivot column) and creates a column for each value. It constructs all this in memory, and after it has everything sorted, it outputs the table.

One potential issue with this table type is that the CONNECT engine reads in the values to pivot on only when a PIVOT table is created. If those values change, we may get odd results or an error, for example, if we delete all rows from the expenses table that match Gasor if we insert a row where the what column is Electricity. The only remedy in this case is to DROP and then recreate the PIVOT table.

There's more...

CONNECT tries to guess the column we want to pivot on, but we can also specify the column using pivotcol=column_name in the OPTION_LIST option. We can also change the default function from SUM to something else, such as AVG, to compute the average amountinstead of the total amount spent on a day. For example, the following is a query creating a table that pivots on the day column and computes the average:

CREATE TABLE expenses_pivot2

ENGINE=CONNECT TABLE_TYPE=PIVOT TABNAME=expenses

OPTION_LIST='user=daniel,pivotcol=day,function=AVG';

SELECT * FROM expenses_pivot2;

We can also drop columns from our pivot tables; for example, we are interested in the grand totals for each day without regard to the who column. The following is an example that does this:

CREATE TABLE expenses_pivot3

ENGINE=CONNECT TABLE_TYPE=PIVOT TABNAME=expenses

OPTION_LIST='user=daniel';

ALTER TABLE expenses_pivot3 DROP COLUMN who;

SELECT * FROM expenses_pivot3;

See also

· The full documentation of the PIVOT data type is at https://mariadb.com/kb/en/connect-table-types-pivot-table-type/

Using the OCCUR table type

If a table contains many columns, all of which contain similar types of data, it can be difficult to answer questions which deal with comparing those values. This is where the OCCUR data type can prove useful.

How to do it...

1. Launch the mysql command-line client application and connect to the test database on our MariaDB server. If the test database does not exist, create it first.

2. Run the following CREATE TABLE statement to create a gadgets table:

3. CREATE TABLE gadgets (

4. who varchar(64),

5. phone int,

6. tablet int,

7. mp3player int,

8. camera int

9.

10.);

11. Add some data to our gadgets table using the following statement:

12.INSERT gadgets VALUES

13. ("Jim",1,2,1,2),

14. ("Bob",0,0,3,0),

15. ("Tom",1,1,1,0),

16. ("Joe",1,1,1,1),

17. ("Rob",2,2,0,0),

18. ("Tim",0,3,1,1)

19.;

20. Run the following statement to create our OCCUR table (replace username with a user that has read access rights to the gadget table without needing a password):

21.CREATE TABLE gadgets_occur (

22. who varchar(64) NOT NULL,

23. gadget varchar(16) NOT NULL,

24. number int NOT NULL

25.) ENGINE=CONNECT TABLE_TYPE=OCCUR TABNAME=gadgets

26.OPTION_LIST='user=username,occurcol=number,rankcol=gadget'

27.COLIST='phone,tablet,mp3player,camera';

28. Run the following SELECT statements to view our OCCUR table in action:

29.SELECT * FROM gadgets_occur;

30.SELECT * FROM gadgets_occur

31. WHERE gadget="tablet" and number > 1;

How it works...

When we create an OCCUR table, we are basically viewing the data in the source table in a different way. In our recipe, instead of columns with the number of gadgets each person owns being listed with one row per user, we flip it such that the number of each gadget is listed by itself.

When defining an OCCUR table, we first specify the tables we want our values stored in. The first column we defined, name, matches the same column in our source table. The other two are the columns we're mapping data to, so their names do not match. For this recipe, we use gadget and number since those names match the data being stored.

With our columns defined, we now need to tell the CONNECT storage engine how to map the data. After defining which table our OCCUR column is connecting to with the TABNAME option, the rest of the configuration happens in the OPTION_LIST and COLIST variables.

In the OPTION_LIST variable, we first define which column in our OCCUR table we want to use to hold the numbers we're tracking using the occurcol variable; for our recipe, we're using the number column. We then name the column using the rankcol variable that will hold the column names we're interested in. Lastly, with the COLIST option, we name those columns.

With the setup complete, we can easily ask questions that are much more difficult to ask our original table. For example, we can get a list of every person who owns multiples of the given gadget types using the following command line:

SELECT * FROM gadgets_occur

WHERE number > 1;

We can approximate this query on the gadgets table only with difficulty. For example, the following four queries give us the same data as the previous single query, but it is separated:

SELECT who,phone FROM gadgets WHERE phone > 1;

SELECT who,tablet FROM gadgets WHERE tablet > 1;

SELECT who,mp3player FROM gadgets WHERE mp3player > 1;

SELECT who,camera FROM gadgets WHERE camera > 1;

Going further, we can add UNION ALL in between the queries to combine all of the numbers together, but while this gives us the numbers, it doesn't tell us what gadget each number represents:

SELECT who,phone AS gadget FROM gadgets WHERE phone > 1 UNION ALL

SELECT who,tablet FROM gadgets WHERE tablet > 1 UNION ALL

SELECT who,mp3player FROM gadgets WHERE mp3player > 1 UNION ALL

SELECT who,camera FROM gadgets WHERE camera > 1;

A better solution is to use an OCCUR table.

There's more...

In our example as some people did not own certain gadgets, the values in the original table were set to zero. We omitted those by specifying NOT NULL for the columns in our OCCUR table.

See also

· The full documentation of the OCCUR data type can be found at https://mariadb.com/kb/en/connect-table-types-occur-table-type/

Using the WMI table type

Windows includes an interface through which various components of the operating system can provide useful system information. This interface is called Windows Management Instrumentation (WMI). The WMI table type allows us to easily connect to and display information from this interface.

Getting ready

As WMI is specific to the Windows operating system, this recipe uses Windows.

How to do it...

1. Launch the mysql command-line client application and connect to the test database on our MariaDB server. If the test database does not exist, create it first.

2. Run the following CREATE TABLE statement to create a WMI table:

3. CREATE TABLE alias (

4. friendlyname char(32) NOT NULL,

5. target char(64) NOT NULL

6. ) ENGINE=CONNECT TABLE_TYPE='WMI'

7. OPTION_LIST='Namespace=root\\cli,Class=Msft_CliAlias';

8. Run the following SELECT statement to query the table:

9. SELECT * FROM alias;

How it works...

The WMI table type maps rows to each instance of the related information. To accomplish this mapping, our column names must match the properties that we're interested in. This matching is case insensitive.

Apart from naming, when configuring a WMI table, we need to tell the CONNECT storage engine the Namespace and Class of the data that we are looking up. We define these as variables in the OPTION_LIST. Our recipe is actually a handy way to get a list of common classes.

There's more...

We don't always need to define tables when using the WMI data type as some namespaces have default values in the class specification that CONNECT can look up when creating the table. For example, we can create a table that queries the CSPROD class simply by naming it csprod:

CREATE TABLE csprod

ENGINE=CONNECT TABLE_TYPE='WMI';

Performance

Some WMI providers are slow to respond. There's not a whole lot that can be done about it as it is due to the way WMI works. It is something to be aware of.

A related issue is that some WMI providers output a lot of information. So much so that for some, it can bog down our system or cause a query to take far too long to complete. To combat this, CONNECT has an Estimate option that has a default value of 100. This option limits the output to 100 rows. For instances where we need to increase this, we can do so when creating our WMI tables. For most providers, keeping it at 100 is preferred.

Other information

A couple of other bits of information that are useful to know about WMI tables are that they cannot be indexed and that they are read-only.

See also

· The full documentation of the WMI data type can be found at https://mariadb.com/kb/en/connect-table-types-special-virtual-tables/

Using the MAC address table type

The MAC table type allows us to look up and query various bits of information about the network connection and network setup of our local machine.

Getting ready

The MAC table type works only in Windows. So, this recipe requires the Windows OS.

How to do it...

1. Launch the mysql command-line client application and connect to the test database on our MariaDB server. If the test database does not exist, create it first.

2. Run the following CREATE TABLE statement to create a MAC table:

3. CREATE TABLE host (

4. hostname varchar(132) flag=1,

5. domain varchar(132) flag=2,

6. ipaddr char(16) flag=15,

7. gateway char(16) flag=17,

8. dhcp char(16) flag=18,

9. leaseexp datetime flag=23

10.) ENGINE=CONNECT TABLE_TYPE=MAC;

11. Run the following SELECT statement to query the information about our current network settings:

12.SELECT * FROM host;

How it works...

Information on our network cards and their current settings is actually pretty easy to get, so the MAC table type is more of a convenient feature compared to other CONNECT storage engine table types. That said, it can be a very useful way to inform our applications about their network connection as all of that data is now available right inside the database.

There are many network parameters that we can query, and the ones that we connect to in our table definition are set using the flag= option when defining a MAC table.

The following table lists all of the values, their flags, and their data definitions:

Flag

Value

Data type

1

Host Name

varchar(132)

2

Domain

varchar(132)

3

DNS address

varchar(24)

4

Node type

int(1)

5

Scope ID

varchar(256)

6

Routing

int(1)

7

Proxy

int(1)

8

DNS

int(1)

10

Name

varchar(260)

11

Description

varchar(132)

12

MAC address

char(24)

13

Type

int(3)

14

DHCP

int(1)

15

IP address

char(16)

16

SUBNET mask

char(16)

17

GATEWAY

char(16)

18

DHCP server

char(16)

19

Have WINS

int(1)

20

Primary WINS

char(16)

21

Secondary WINS

char(16)

22

Lease obtained

datetime

23

Lease expires

datetime

There's more...

Flag values of less than 10 are specific to the computer. Other flag values are specific to the network card or cards in the computer (which may or may not be removable).

See also

· The full documentation of the MAC table type is at: https://mariadb.com/kb/en/connect-table-types-special-virtual-tables/