Adding Functions and Commands to MySQL - Extending MySQL - Expert MySQL, Second Edition (2012)

Expert MySQL, Second Edition (2012)

PART 2. Extending MySQL

CHAPTER 7. Adding Functions and Commands to MySQL

One of the greatest challenges facing systems integrators is overcoming the limitations of the systems being integrated. This is usually a result of the system having limitations with, or not having certain functions or commands that are needed for, the integration. Often, this means getting around the problem by creating more “glue” programs to translate or augment existing functions and commands.

MySQL developers recognized this need and added flexible options in the MySQL server to add new functions and commands. For example, you may need to add functions to perform some calculations or data conversions, or you may need a new command to provide specific data for administration.

This chapter introduces you to the options available for adding functions and shows you how to add your own SQL commands to the server. We will explore user-defined functions, native functions, and new SQL commands. Much of the background material for this chapter has been covered in previous chapters. Feel free to refer back to those chapters as you follow along.

Adding User-Defined Functions

User-defined functions (UDF) have been supported by MySQL for some time. A UDF is a new function (calculation, conversion, etc.) that you can add to the server, thereby expanding the list of available functions that can be used in SQL commands. The best thing about UDFs is that they can be dynamically loaded at runtime. Furthermore, you can create your own libraries of UDFs and use them in your enterprise or even give them away for free (as open source). This is perhaps the first place systems integrators look for extending the MySQL server. MySQL engineers had another genius-level idea with the UDF mechanism.

User-defined functions can be used anywhere the SQL language permits an expression. For example, you can use UDFs in stored procedures and SELECT statements. They are an excellent way to expand your server without having to modify the server source code. In fact, you can define as many UDFs as you please and even group them together to form libraries of functions. Each library is a single file containing source code that is compiled into a library (.so in Linux or .dll in Windows).

The mechanism is similar to the plug-in interface and, in fact, predates it. The UDF interface utilizes external, dynamically loadable object files to load and unload UDFs. The mechanism uses a CREATE FUNCTION command to establish a connection to the loadable object file on a per-function basis and a DROP FUNCTION command to remove the connection for a function. Let’s take a look at the syntax for these commands.

CREATE FUNCTION Syntax

The CREATE FUNCTION command registers the function with the server, placing a row in the mysql.func table. The syntax is:

CREATE FUNCTION function_name RETURNS [STRING | INTEGER | REAL | DECIMAL] SONAME "mylib.so";

The function_name parameter represents the name of the function you are creating. The return type can be one of STRING, INTEGER, REAL, or DECIMAL and the SONAME refers to the name of the library. The CREATE FUNCTION command tells the MySQL server to create a mapping of the function name in the command (function_name) to the object file. When the function is invoked, the server calls the function in the library for execution.

DROP FUNCTION Syntax

The DROP FUNCTION command unregisters the function with the server by removing the associated row from the func table in the selected database. The syntax is shown here. The function_name parameter represents the name of the function you are creating.

DROP FUNCTION function_name;

Let’s take a look at how you can create a UDF library and use it in your own MySQL server installations. We will start off by modifying the existing sample UDF library. Once you are familiar with how the functions are coded, it is an elementary exercise to create a new source file and add it to the server build files (CMakeLists.txt).

Creating a User-Defined Library

There are two types of user-defined functions:

· You can create functions that operate as a single call that evaluates a set of parameters and returns a single result.

· You can create functions that operate as aggregates being called from within grouping functions. For instance, you can create a UDF that converts one data type into another, such as a function that changes a date field from one format to another, or you can create a function that performs advanced calculations for a group of records, such as a sum of squares function. UDFs can return only integers, strings, or real values.

· You can create functions that provide values used in SELECT statements.

The single-call UDF is the most common. They are used to perform an operation on one or more parameters. In some cases, no parameters are used. For example, you could create a UDF that returns a value for a global status or like SERVER_STATUS(). This form of UDF is typically used in field lists of SELECT statements or in stored procedures as helper functions.

Aggregate UDF functions are used in GROUP BY clauses. When they are used, they are called once for each row in the table and again at the end of the group.

The process for creating a UDF library is to create a new project that exposes the UDF load/unload methods (xxx_init and xxx_deinit, where xxx is the name of the function) and the function itself. The xxx_init and xxx_deinit functions are called once per statement. TheXXX function would be called once per row. If you are creating an aggregate function, you also need to implement the grouping functions xxx_clear and xxx_add. The xxx_clear function is called to reset the value (at the start of a group). The xxx_add function is called for each row in the grouping, and the function itself is called at the end of the group processing. Thus, the aggregate is cleared, then data are added for each call to add. Finally, the function itself is called to return the value.

Once the functions are implemented, you compile the file and copy it to the plugin directory of your server installation. You can load and use the functions using the CREATE FUNCTION command. Listing 7-1 shows a sample set of methods for a UDF.

Listing 7-1. Sample UDF Methods

/*
Simple example of how to get a sequences starting from the first
argument or 1 if no arguments have been given
*/

my_bool sequence_init(UDF_INIT *initid, UDF_ARGS *args, char *message)
{
if (args->arg_count > 1)
{
strmov(message,"This function takes none or 1 argument");
return 1;
}
if (args->arg_count)
args->arg_type[0]= INT_RESULT; /* Force argument to int */

if (!(initid->ptr=(char*) malloc(sizeof(longlong))))
{
strmov(message,"Couldn't allocate memory");
return 1;
}
memset(initid->ptr, 0, sizeof(longlong));
/*
sequence() is a non-deterministic function : it has different value
even if called with the same arguments.
*/
initid->const_item=0;
return 0;
}

void sequence_deinit(UDF_INIT *initid)
{
if (initid->ptr)
free(initid->ptr);
}

longlong sequence(UDF_INIT *initid __attribute__((unused)), UDF_ARGS *args,
char *is_null __attribute__((unused)),
char *error __attribute__((unused)))
{
ulonglong val=0;
if (args->arg_count)
val= *((longlong*) args->args[0]);
return ++*((longlong*) initid->ptr) + val;
}

Oracle has provided an example UDF project, named udf_example.cc and located in the /sql folder, that contains samples of all the types of functions you may want to create. This provides an excellent starting point for adding your own functions. The sample functions include:

· A metaphon function that produces a soundex-like operation on strings

· A sample function that returns a double value that is the sum of the character-code values of the arguments divided by the sum of the length of all the arguments

· A sample function that returns an integer that is the sum of the lengths of the arguments

· A sequence function that returns the next value in a sequence based on the value passed

· An example aggregate function that returns the average cost from the list of integer arguments (quantity) and double arguments (cost)

Depending on your needs, you may find some of these examples useful.

Let’s begin by modifying the example UDF project. Locate the udf_example.cc file located in the /sql directory off the root of your source-code tree and open it in your favorite editor. Since the udf_example library is included in the cmake files, compiling it is very easy. You simply execute make once you’ve finished your edits. On Windows, you can rebuild the mysql.sln file using Visual Studio.

image Caution Windows users will have to remove the networking UDFs from the library. These are not supported directly on Windows. Comment out the functions if you encounter errors about missing header files or external functions.

If you encounter errors during the compilation, go back and correct them, and try the compile again. The most likely cause is a mistyped name, incorrect code replacement, or incorrect path.

Now that the library is compiled, let’s test the load and unload operations. This will ensure that the library has been properly compiled and is located in the correct location. Open a MySQL client window and issue the CREATE FUNCTION and DROP FUNCTION commands to load all of the functions in the library. Listing 7-2 shows the commands for loading and unloading the first five functions. The listing shows the commands for Windows; replace udf_example.dll with udf_example.so on Linux. The output will be the same on any platform on which you execute the functions.

Listing 7-2. Sample CREATE and DROP FUNCTION Commands

CREATE FUNCTION metaphon RETURNS STRING SONAME "udf_example.dll";
CREATE FUNCTION myfunc_double RETURNS REAL SONAME "udf_example.dll";
CREATE FUNCTION myfunc_int RETURNS INTEGER SONAME "udf_example.dll";
CREATE FUNCTION sequence RETURNS INTEGER SONAME "udf_example.dll";
CREATE AGGREGATE FUNCTION avgcost RETURNS REAL SONAME "udf_example.dll";

DROP FUNCTION metaphon;
DROP FUNCTION myfunc_double;
DROP FUNCTION myfunc_int;
DROP FUNCTION sequence;
DROP FUNCTION avgcost;

Listings 7-3 and 7-4 show the correct results when you run the CREATE FUNCTION and DROP FUNCTION commands shown earlier.

Listing 7-3. Installing the Functions

mysql> CREATE FUNCTION metaphon RETURNS STRING SONAME "udf_example.dll";Query OK, 0 rows affected (0.00 sec)

mysql> CREATE FUNCTION myfunc_double RETURNS REAL SONAME "udf_example.dll";Query OK, 0 rows affected (0.00 sec)

mysql> CREATE FUNCTION myfunc_int RETURNS INTEGER SONAME "udf_example.dll";Query OK, 0 rows affected (0.00 sec)

mysql> CREATE FUNCTION sequence RETURNS INTEGER SONAME "udf_example.dll";Query OK, 0 rows affected (0.00 sec)

mysql> CREATE AGGREGATE FUNCTION avgcost RETURNS REAL SONAME "udf_example.dll";Query OK, 0 rows affected (0.00 sec)

Listing 7-4. Uninstalling the Functions

mysql> DROP FUNCTION metaphon;Query OK, 0 rows affected (0.00 sec)

mysql> DROP FUNCTION myfunc_double;Query OK, 0 rows affected (0.00 sec)

mysql> DROP FUNCTION myfunc_int;Query OK, 0 rows affected (0.00 sec)

mysql> DROP FUNCTION sequence;Query OK, 0 rows affected (0.00 sec)
mysql> DROP FUNCTION avgcost;Query OK, 0 rows affected (0.00 sec)

Now, let’s run the commands and see if they work. Go back to your MySQL client window and run the CREATE FUNCTION commands again to load the UDFs. Listing 7-5 shows sample execution of each of the first five UDFs in the library. Feel free to try out the commands as shown. Your results should be similar.

Listing 7-5. Example Execution of UDF Commands

mysql> SELECT metaphon("This is a test.");
+−−---------------------------+
| metaphon("This is a test.") |
+−−---------------------------+
| 0SSTS |
+−−---------------------------+
1 row in set (0.00 sec)

mysql> SELECT myfunc_double(5.5, 6.1);
+−−-----------------------+
| myfunc_double(5.5, 6.1) |
+−−-----------------------+
| 50.17 |
+−−-----------------------+
1 row in set (0.01 sec)

mysql> SELECT myfunc_int(5, 6, 8);
+−−-------------------+
| myfunc_int(5, 6, 8) |
+−−-------------------+
| 19 |
+−−-------------------+
1 row in set (0.00 sec)

mysql> SELECT sequence(8);
+−−-----------+
| sequence(8) |
+−−-----------+
| 9 |
+−−-----------+
1 row in set (0.00 sec)

mysql> CREATE TABLE testavg (order_num int key auto_increment, cost double, qty int);
Query OK, 0 rows affected (0.02 sec)
mysql> INSERT INTO testavg (cost, qty) VALUES (25.5, 17);Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO testavg (cost, qty) VALUES (0.23, 5);Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO testavg (cost, qty) VALUES (47.50, 81);Query OK, 1 row affected (0.00 sec)
mysql> SELECT avgcost(qty, cost) FROM testavg;
+−−------------------+
| avgcost(qty, cost) |
+−−------------------+
| 41.5743 |
+−−------------------+
1 row in set (0.03 sec)

The last few commands show a very basic use of the avgcost() aggregate function. You would typically use aggregate functions when using the GROUP BY clause.

Adding a New User-Defined Function

Let’s now add a new UDF to the library. What if you are working on an integration project and the requirements call for expressing dates in the Julian format? The Julian conversion simply takes the day of the year (number of days elapsed since December 31 of the previous year) and adds the year to form a numeric value, such as DDDYYYY. In this case, you need to add a function that takes a month, date, and year value and returns the date expressed as a Julian date. The function should be defined as:

longlong julian(int month, int day, int year);

I kept the function simple and used three integers. The function could be implemented in any number of ways (e.g., accepting a date or string value). Now let’s add the JULIAN function to the UDF library that you just built.

This is what makes creating your own UDF library so valuable. Any time you encounter a need for a new function, you can just add it to the existing library without having to create a new project from scratch.

The process for adding a new UDF begins with adding the function declarations to the extern section of the UDF library source code and then implementing the functions. You can then recompile the library and deploy it to the plugin directory of your MySQL server installation. Let’s walk through that process with the JULIAN function.

image Note Use SHOW VARIABLES LIKE 'plugin%'; to discover the plugin directory.

Open the udf_example.cc file and add the function declarations. Recall that you need definitions for the julian_init(), julian_deinit(), and julian() functions. The julian_init() function takes three arguments:

· UDF_INIT, a structure that the method can use to pass information among the methods

· UDF_ARGS, a structure that contains the number of arguments, the type of arguments, and the arguments themselves

· A string that the method should return if an error occurs

The julian() method takes four arguments:

· The UDF_INIT structure completed by the julian_init() function

· A UDF_ARGS structure that contains the number of arguments, the type of arguments, and the arguments

· A char pointer to a variable that is set to 1 if the result is null

· The message that is sent to the caller if an error occurs

The julian_deinit() function uses the UDF_INIT structure completed by the julian_init() function.

When a UDF is called from the server, a new UDF_INIT structure is created and passed to the function, the arguments are placed in the UDF_ARGS structure, and the julian_init() function is called. If that function returns without errors, the julian() function is called with theUDF_INIT structure from the julian_init() function. After the julian() function completes, the julian_deinit() function is called to clean up the values saved in the UDF_INIT structure. Listing 7-6 shows an excerpt of the declaration section of the file with theJULIAN functions added. This section is denoted with the C_MODE_START and C_MODE_END macros, and it is located at the top of the file. We include out modification markers to ensure that others (or ourselves in the future) know that we modified this file intentionally.

Listing 7-6. The Declarations for JULIAN (udf_example.cc)

C_MODE_START;
...

/* BEGIN CAB MODIFICATION */
/* Reason for Modification: */
/* This section declares the methods for the Julian function */
my_bool julian_init(UDF_INIT *initid, UDF_ARGS *args, char *message);
longlong julian(UDF_INIT *initid, UDF_ARGS *args,
char *is_null, char *error);
void julian_deinit(UDF_INIT *initid);
/* END CAB MODIFICATION */
...
C_MODE_END;

image Note We show the macros delineated with the ellipse to indicate where these statements should be placed.

You can now add the implementation for these functions. I find it helpful to copy the example functions that match my return types and then modify them to match my needs. The julian_init() function is responsible for initializing variables and checking correct usage. Since theJULIAN function requires three integer parameters, you need to add appropriate error handling to enforce this. Listing 7-7 shows the implementation of the julian_init() function. You can insert this method near the end of the udf_example.cc fle.

Listing 7-7. Implementation for the julian_init() Function (udf_example.cc)

/* BEGIN CAB MODIFICATION */
/* Reason for Modification: */
/* This section implements the Julian initialization function */
my_bool julian_init(UDF_INIT *initid, UDF_ARGS *args, char *message)
{
if (args->arg_count != 3) /* if there are not three arguments */
{
strcpy(message, "Wrong number of arguments: JULIAN() requires 3 arguments.");
return 1;
}
if ((args->arg_type[0] != INT_RESULT) ||
(args->arg_type[1] != INT_RESULT) ||
(args->arg_type[2] != INT_RESULT))
{
strcpy(message, "Wrong type of arguments: JULIAN() requires 3 integers.");
return 1;
}
return 0;
}
/* END CAB MODIFICATION */

Notice in Listing 7-7 that the argument count is checked first, followed by type checking of the three parameters. This ensures that they are all integers. Savvy programmers will note that the code should also check for ranges of the values. Since the code does not check ranges of the parameters, this could lead to unusual or invalid return values. I leave this to you to complete should you decide to implement the function in your library. It is always a good practice to check range values when the domain and range of the parameter values is known.

The julian_deinit() function isn’t really needed, because there are no memory or variables to clean up. You can implement an empty function just to complete the process. It is always a good idea to code this function even if you don’t need it. Listing 7-8 shows the implementation for this function. Since we didn’t use any new variables or structures, the implementation is simply an empty function. If there had been variables or structures created, you would deallocate them in this function.

Listing 7-8. Implementation for the julian_deinit() Function (udf_example.cc)

/* BEGIN CAB MODIFICATION */
/* Reason for Modification: */
/* This section implements the Julian deinitialization function */
void julian_deinit(UDF_INIT *initid)
{
}
/* END CAB MODIFICATION */

The real work of the JULIAN function occurs in the julian() implementation. Listing 7-9 shows the completed julian() function.

image Note Some sophisticated Julian-calendar methods calculate the value as elapsed days since a start date (usually in the 18th or 19th century). This method assumes that the need is for a Julian day/year value.

Listing 7-9. Implementation for the julian() Function (udf_example.cc)

/* BEGIN CAB MODIFICATION */
/* Reason for Modification: */
/* This section the Julian function */
longlong julian(UDF_INIT *initid, UDF_ARGS *args,
char *is_null, char *error)
{
longlong jdate = 0;
static int DAYS_IN_MONTH[] = {31, 28, 31, 30, 31, 30, 31,
31, 30, 31, 30, 31};
longlong month = 0;
longlong day = 0;
longlong year = 0;
int i;

/* copy memory from the arguments */
month = *(longlong *)args->args[0];
day = *(longlong *) args->args[1];
year = *(longlong *) args->args[2];

/* add the days in the month for each prior month */
for (i = 0; i < month - 1; i++)
jdate += DAYS_IN_MONTH[i];

/* add the day of this month */
jdate += day;

/* find the year */
if (((year % 100) != 0) && ((year % 4) == 0))
jdate++; /*leap year!*/

/* shift day of year to left */
jdate *= 10000;

/* add the year */
jdate += year;
return jdate;
}
/* END CAB MODIFICATION */

Notice the first few lines after the variable declarations. This is an example of how you can marshal the values from the args array to your own local variables. In this case, I copied the first three parameters to integer values. The rest of the source code is the calculation of the Julian-date value that is returned to the caller.

image Note The calculation for a leap year is intentionally naïve. I leave a more correct calculation for you as an exercise. Hint: When should the jdate variable be incremented?

If you are using Windows, you also need to modify the udf_example.def file and add the methods for the JULIAN function. Listing 7-10 shows the updated udf_example.def file.

Listing 7-10. The udf_example.def Source Code

LIBRARY MYUDF
DESCRIPTION 'MySQL Sample for UDF'
VERSION 1.0
EXPORTS
metaphon_init
metaphon_deinit
metaphon
myfunc_double_init
myfunc_double
myfunc_int
myfunc_int_init
sequence_init
sequence_deinit
sequence
avgcost_init
avgcost_deinit
avgcost_reset
avgcost_add
avgcost_clear
avgcost
julian_init
julian_deinit
julian

Now you can compile the library. Once the library is compiled, copy the library to the plugin directory of your MySQL server installation. If you are running Linux, you will be copying the file udf_example.so; if you are running Windows, you will be copying the fileudf_example.dll from the /udf_example/debug directory.

I recommend stopping the server before you copy the file and restarting it after the copy is complete. This is because it is possible (depending on where you placed your new function) that the object file could be different from the previous compilation. It is always a good practice to follow any time you make changes to the executable code.

Go ahead and copy the library and install the function, then enter the CREATE FUNCTION command and try out the new function. Listing 7-11 shows an example of installing and running the JULIAN function on Windows.

Listing 7-11. Sample Execution of the julian() Function

mysql> CREATE FUNCTION julian RETURNS INTEGER SONAME "udf_example.dll";
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT JULIAN(8, 13, 2012);

+−−-------------------+
| JULIAN(8, 13, 2012) |
+−−-------------------+
| 2262012 |
+−−-------------------+
1 row in set (0.00 sec)

WHAT IF I WANT TO CREATE MY OWN LIBRARY?

You can use the udf_example library as the start of your own library or copy it and create your UDF libraries. To compile your own UDF library in the /sql folder, edit the /sql/CMakeLists.txt file and copy the following block, replacing udf_example for the name of your library. You will need to re-run the cmake command before the make command.

IF(WIN32 OR HAVE_DLOPEN AND NOT DISABLE_SHARED)
ADD_LIBRARY(udf_example MODULE udf_example.cc)
SET_TARGET_PROPERTIES(udf_example PROPERTIES PREFIX "")
# udf_example depends on strings
IF(WIN32)
IF(MSVC)
SET_TARGET_PROPERTIES(udf_example PROPERTIES LINK_FLAGS "/DEF:${CMAKE_CURRENT_SOURCE_DIR}/udf_example.def")
ENDIF()
TARGET_LINK_LIBRARIES(udf_example strings)
ELSE()
# udf_example is using safemutex exported by mysqld
TARGET_LINK_LIBRARIES(udf_example mysqld)
ENDIF()
ENDIF()

UDF libraries can help you expand the capabilities of your server to meet almost any computational need. The libraries are easy to create and require only a small number of functions for implementation. Except for the need to have the dynamically loaded version for Linux, UDFs work very well with few special configuration requirements.

Adding Native Functions

Native functions are those that are compiled as part of the MySQL server. They can be used without having to load them from a library, and they are therefore always available. They also have direct access to the server internals, which UDFs do not, permitting the native functions to respond to or initiate system operations. There is a long list of available native functions ranging from ABS() to UCASE(), and many more. For more information about the currently supported set of native functions, consult the online MySQL reference manual.

If the function that you want to use isn’t available (it’s not one of the built-in native functions), you can add your own native function by modifying the server source code. Now that you have a JULIAN function, wouldn’t it be best if there were an equivalent function to convert a Julian date back to a Gregorian date? I’ll show you how to add a new native function in this section.

The process for adding a new native function involves changing the mysqld source-code files. We need to create two classes: Item_func_gregorian and Create_func_gregorian. The server instantiates Item_func_gregorian once for each SQL statement that invokes the function; then it calls a member function of this class to do the actual computation, once for each row of the result set. Create_func_gregorian is instantiated only once, when the server starts. This class contains only a factory member function that the server calls when it needs to create an object of Item_func_gregorian..The files that you need to change are summarized in Table 7-1.

Table 7-1. Changes to mysqld Source-code Files for Adding a New Native Function

File

Description of Changes

item_create.cc

Add the function-class definition for registering the function, helper methods, and symbol definition.

item_str_func.h

Add the function class definition.

item_str_func.cc

Add the implementation for the Gregorian function.

image Note Files are located in the /sql directory off the root of the source-code tree.

WHAT HAPPENED TO THE LEX1 FILES?

Readers familiar with earlier versions of MySQL prior to version 5.6.5 may recall the lexical analyzer files lex* and sql_parse.yy. These files are still in the source-code files, but the MySQL developers have made it much easier to add new functions and commands by almost completely eliminating the need to modify the lex and yacc code. As we will see in a future section, we still must do this for SQL commands, but for functions and similar extensions, the code was changed to make it easier to modify and remove the restriction of creating new reserved words. The new reserved words could impose restrictions on users who may want to use the reserved words in the SQL statements.

Let’s get started adding the Gregorian-function registration code. Open the item_create.cc file and add the instantiation as shown in Listing 7-12. You can add this near line number 2000, right after one of the other Create_func_* class definitions.

Listing 7-12. Add the Create_func_gregorian class

/* BEGIN CAB MODIFICATION */
/* Reason for Modification: */
/* Add gregorian class definition */
class Create_func_gregorian : public Create_func_arg1
{
public:
virtual Item *create(THD *thd, Item *arg1);

static Create_func_gregorian s_singleton;

protected:
Create_func_gregorian() {}
virtual ∼Create_func_gregorian() {}
};
/* END CAB MODIFICATION */

The code from Listing 7-12 creates a class that the parser can use to associate the Gregorian function (defined later) with the GREGORIAN symbol (see Listing 7-14). The Create function here creates a singleton (a single instantiation of the class that all threads use) of theCreate_func_gregorian class that the parser can use to execute the Gregorian function.

Next, we add the code for the Create_function_gregorian method itself. Listing 7-13 shows the modifications for this code. You can place this code in the file around line number 4700 after another Create_func_ method. This code is used to return the instance of the singleton and execute the Gregorian function and return its result. Here, then, is where the Gregorian function is called and the result returned to the user.

Listing 7-13. Add the Create_func_gregorian method

/* BEGIN CAB MODIFICATION */
/* Reason for Modification: */
/* Add gregorian singleton create method */
Create_func_gregorian Create_func_gregorian::s_singleton;

Item*
Create_func_gregorian::create(THD *thd, Item *arg1)
{
return new (thd->mem_root) Item_func_gregorian(arg1);
}
/* END CAB MODIFICATION */

Last, we must add the GREGORIAN symbol. Listing 7-14 shows the code needed to define the symbol. You must place this in the section where the following array is defined.

static Native_func_registry func_array[] = {

I placed the code after the GREATEST symbol definition because the array is intended to define symbols alphabetically.

Listing 7-14. Add the GREGORIAN symbol

/* BEGIN CAB MODIFICATION */
/* Reason for Modification: */
/* Add gregorian symbol */
{ { C_STRING_WITH_LEN("GREGORIAN") }, BUILDER(Create_func_gregorian)},
/* END CAB MODIFICATION */

Take a look at the symbol definition. Notice how Listing 7-14 calls the macro BUILDER with the Create_func_gregorian class. Calling the macro is how the parser associates our Gregorian code with the GREGORIAN symbol. You may be wondering how that association is used to tell the parser what to do when the symbol is detected. The mechanism used is called a lexical hash.

The lexical hash is an implementation of an advanced hashing lookup procedure from the works of Knuth.2 It is generated using a command-line utility that implements the algorithm. The utility, gen_lex_hash, has one source-code file named gen_lex_hash.cc. This program produces a file that you will use to replace the existing lexical-hash header file (lex_hash.h). I leave exploration of the BUILDER macro to you for further study.

Now that the create function is implemented, you need to create a new class to implement the code for the function. This is where most developers get very confused. Oracle has provided a number of the Item_xxx_func base (and derived) classes for you to use. For example, derive your class from Item_str_func for functions that return a string and Item_int_func for those that return an integer. Similarly, there are other classes for functions that return other types. This is a departure from the dynamically loadable UDF interface and is the main reason you would choose to create a native function versus a dynamically loadable one. For more information about what Item_xxx_func classes there are, see the item_func.h file in the /sql directory off the root of the source-code tree.

Since the Gregorian function will return a string, you need to derive from the Item_str_func class, define the class in item_strfunc.h, and implement the class in item_strfunc.cc. Open the item_strfunc.h file and add the class definition to the header file, as shown inListing 7-15.

Listing 7-15. Modifications to the item_strfunc.h File

/* BEGIN CAB MODIFICATION */
/* Reason for Modification: */
/* Add gregorian Item function code */
class Item_func_gregorian :public Item_str_func
{
String tmp_value;
public:
Item_func_gregorian(Item *a) :Item_str_func(a) {}
const char *func_name() const { return "gregorian"; }
String *val_str(String *);
void fix_length_and_dec()
{
max_length=30;
}
};
/* END CAB MODIFICATION */

Notice that the class in Listing 7-15 has only four functions that must be declared. The minimal functions needed are the constructor (Item_func_gregorian), a function that contains the code that performs the conversion (val_str), a function that returns the name (func_name), and a function to set the maximum length of the string argument (fix_length_and_dec). You can add any others that you might need, but these four are required for functions that return strings.

Other item base (and derived) classes may require additional functions, such as val_int(), val_double(), and so on. Check the definition of the class you need to derive from in order to identify the methods that must be overridden; these are known as virtual functions.

Notice also that we implement in Listing 7-15 a fix_length_and_dec() method, which is used by the server to set the maximum length. In this case, we choose 30, which is largely arbitrary but large enough to not be an issue with the values we return.

Let’s add the class implementation. Open the item_strfunc.cc file and add the implementation of the Gregorian-class functions as shown in Listing 7-16. You need to implement the main function, val_str(), which does the work of the Julian-to-Gregorian operation. You can place this in the file around line 4030 after another val_str() implementation.

Listing 7-16. Modifications to the item_strfunc.cc File

/* BEGIN CAB MODIFICATION */
/* Reason for Modification: */
/* Add gregorian function code */
String *Item_func_gregorian::val_str(String *str)
{
static int DAYS_IN_MONTH[] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
longlong jdate = args[0]->val_int();
int year = 0;
int month = 0;
int day = 0;
int i;
char cstr[30];
cstr[0] = 0;
str->length(0);

/* get date from value (right 4 digits */
year = jdate - ((jdate / 10000) * 10000);

/* get value for day of year and find current month*/
day = (jdate - year) / 10000;
for (i = 0; i < 12; i++)
if (DAYS_IN_MONTH[i] < day)
day = day - DAYS_IN_MONTH[i]; /* remainder is day of current month */
else
{
month = i + 1;
break;
}

/* format date string */
sprintf(cstr, "%d", month);
str->append(cstr);
str->append("/");
sprintf(cstr, "%d", day);
str->append(cstr);
str->append("/");
sprintf(cstr, "%d", year);
str->append(cstr);
if (null_value)
return 0;
return str;
}
/* END CAB MODIFICATION */

Compiling and Testing the New Native Function

Recompile your server and restart it. If you encounter errors during compile, go back and check the statements you entered for errors. Once the errors are corrected and you have a new executable, stop your server, copy the new executable to the location of your MySQL installation, and restart the server. You can now execute the native function Gregorian, as shown in Listings 7-17 and 7-18. To test the Gregorian function for correctness, run the julian() command first and use that value as input to the gregorian() function.

Listing 7-17. Running the julian() Function

mysql> select julian(8,15,2012);

+−−-----------------+
| julian(8,15,2012) |
+−−-----------------+
| 2272012 |
+−−-----------------+
1 row in set (0.00 sec)

Listing 7-18. Running the gregorian() Function

mysql> select gregorian(2272012);

+−−------------------+
| gregorian(2272012) |
+−−------------------+
| 8/15/2012 |
+−−------------------+
1 row in set (0.00 sec)

That’s about it for adding native functions. Now that you have had an introduction to creating native functions, you can further plan your integration with MySQL to include customizations to the server source code.

As an exercise, consider adding a new function that calculates the number years, months, weeks, days, and hours until a given date and time. This function could be used to tell you how long you need to wait for the event to occur. In many ways, this function will be a sort of countdown, such as a countdown until your next birthday, anniversary, or perhaps to your retirement.

Adding SQL Commands

If the native SQL commands do not meet your needs and you cannot solve your problems with user-defined functions, you may have to add a new SQL command to the server. This section shows you how to add your own SQL commands to the server.

Many consider adding new SQL commands to be the most difficult extension of all to the MySQL server source code. As you will see, the process isn’t as complicated as it is tedious. To add new SQL commands, you must modify the parser (in sql/ql_yacc.yy) and add the commands to the SQL command-processing code (in sql/sql_parse.cc), sometimes called the “big switch.”

When a client issues a query, a new thread is created, and the SQL statement is forwarded to the parser for syntactic validation (or rejection due to errors). The MySQL parser is implemented using a large Lex-YACC script that is compiled with Bison, and the symbols are converted to a hash for use in C code with a MySQL utility named gen_lex_hash. The parser constructs a query structure used to represent the query statement (SQL) in memory as a data structure that can be used to execute the query. Thus, to add a new command to the parser, you will need a copy of GNU Bison. You can download Bison from the GNU Web site3 and install it.

WHAT IS LEX AND YACC AND WHO’S BISON?

Lex stands for “lexical analyzer generator” and is used as a parser to identify tokens and literals as well as syntax of a language. YACC stands for “yet another compiler compiler” and is used to identify and act on the semantic definitions of the language. The use of these tools together with Bison (a YACC-compatible parser generator that generates C source code from the Lex/YACC code) provides a rich mechanism of creating subsystems that can parse and process language commands. Indeed, that is exactly how MySQL uses these technologies.

Let’s assume that you want to add a command to the server to show the current disk usage of all of the databases in the server. Although external tools can retrieve this information4, you desire a SQL equivalent function that you can easily use in your own database-driven applications. Let’s also assume you want to add this as a SHOW command. Specifically, you want to be able to execute the command SHOW DISK_USAGE and retrieve a result set that has each database listed as a row along with the total size of all of the files (tables) listed in kilobytes.

Adding a new SQL command involves adding symbols to the lexical analyzer and adding the SHOW DISK_USAGE command syntax to the YACC parser (sql_yacc.yy). The new parser must be compiled into a C program by Bison and then a new lexical hash created using thegen_lex_hash utility described earlier. The code for the parser to direct control to the new command is placed in the large case statement in sql_parse.cc with a case for the new command symbol.

Let’s begin with adding the symbols to the lexical analyzer. Open the lex.h file and locate the static SYMBOL symbols[] array. You can make the symbol anything you want, but it should be something meaningful (like all good variable names). Be sure to choose a symbol that isn’t already in use. In this case, use the symbol DISK_USAGE. This acts like a label to the parser, identifying it as a token. Place a statement in the array to direct the lexical analyzer to generate the symbol and call it DISK_USAGE_SYM. The list is in (roughly) alphabetic order, so place it in the proper location. Listing 7-19 shows an excerpt of the array with the symbols added.

Listing 7-19. Updates to the lex.h File for the SHOW DISK_USAGE Command

static SYMBOL symbols[] = {
{ "&&", SYM(AND_AND_SYM)},
...
{ "DISK", SYM(DISK_SYM)},
/* BEGIN CAB MODIFICATION */
/* Reason for Modification: */
/* This section identifies the tokens for the SHOW DISK_USAGE command*/
{ "DISK_USAGE", SYM(DISK_USAGE_SYM)},
/* END CAB MODIFICATION */
{ "DISTINCT", SYM(DISTINCT)},
...

The next thing you need to do is add a mnemonic to identify the command. This mnemonic will be used in the parser to assign to the internal-query structure and to control the flow of execution via a case in the large switch statement in the sql_parse.cc file. Open the sql_cmd.hfile and add the new command to the enum_sql_command enumeration. Listing 7-20 shows the modifications with the new command mnemonic.

Listing 7-20. Changes to the sql_cmd.h File for the SHOW DISK_USAGE Command

enum enum_sql_command {
...
SQLCOM_SHOW_SLAVE_HOSTS, SQLCOM_DELETE_MULTI, SQLCOM_UPDATE_MULTI,
SQLCOM_SHOW_BINLOG_EVENTS, SQLCOM_DO,
SQLCOM_SHOW_WARNS, SQLCOM_EMPTY_QUERY, SQLCOM_SHOW_ERRORS,
SQLCOM_SHOW_STORAGE_ENGINES, SQLCOM_SHOW_PRIVILEGES,
/* BEGIN CAB MODIFICATION */
/* Reason for Modification: */
/* Add SQLCOM_SHOW_DISK_USAGE reference */
SQLCOM_SHOW_STORAGE_ENGINES, SQLCOM_SHOW_PRIVILEGES, SQLCOM_SHOW_DISK_USAGE,
/* END CAB MODIFICATION */
SQLCOM_HELP, SQLCOM_CREATE_USER, SQLCOM_DROP_USER, SQLCOM_RENAME_USER,
SQLCOM_REVOKE_ALL, SQLCOM_CHECKSUM,
SQLCOM_CREATE_PROCEDURE, SQLCOM_CREATE_SPFUNCTION, SQLCOM_CALL,
...

Now that you have the new symbol and the command mnemonic, add code to the sql_yacc.yy file to define the new token that you used in the lex.h file, and add the source code for the new SHOW DISK_USAGE SQL command. Open the sql_yacc.yy file and add the new token to the list of tokens (near the top). These are defined (roughly) in alphabetical order, so place the new token in the proper order. Listing 7-21 shows the modifications to the sql_yacc.yy file.

Listing 7-21. Adding the Token to the sql_yacc.yy File

...
%token DISK_SYM
/* BEGIN CAB MODIFICATION */
/* Reason for Modification: */
/* Add DISK_USAGE symbol */
%token DISK_USAGE_SYM
/* END CAB MODIFICATION */
%token DISTINCT /* SQL-2003-R */
%token DIV_SYM
%token DOUBLE_SYM /* SQL-2003-R */
...

You also need to add the command syntax to the parser YACC code (also in sql_yacc.yy). Locate the show: label and add the command as shown in Listing 7-22.

Listing 7-22. Parser Syntax Source Code for the SHOW DISK_USAGE Command

/* Show things */

show:
/* BEGIN CAB MODIFICATION */
/* Reason for Modification: */
/* Add show disk usage symbol parsing */
SHOW DISK_USAGE_SYM
{
LEX *lex=Lex;
lex->sql_command= SQLCOM_SHOW_DISK_USAGE;
}
|
/* END CAB MODIFICATION */
SHOW
{

image Caution Don’t forget the | before the original SHOW statement.

You’re probably wondering what this code does. It looks rather benign, and yet it is important to get this part right. In fact, this is the stage at which most developers give up and fail to add new commands.

The set of code identified by the show: label is executed whenever the SHOW token is identified by the parser. YACC code is almost always written this way.5 The SHOW DISK_USAGE_SYM statement indicates the only valid syntax that has the SHOW and DISK_USAGE tokens appearing (in that order). If you look through the code, you’ll find other similar syntactical arrangements. The code block following the syntax statement gets a pointer to the lex structure and sets the command attribute to the new command token SQLCOM_SHOW_DISK_USAGE. This code matches the SHOW and DISK_USAGE_SYM symbols to the SQLCOM_SHOW_DISK_USAGE command so that the SQL command switch in the sql_parse.cc file can correctly route the execution to the implementation of the SHOW DISK_USAGE command.

Notice also that I placed this code at the start of the show: definition and used the vertical bar symbol (|) in front of the previous SHOW syntax statement. The vertical bar is used as an “or” for the syntax switch. Thus, the statement is valid if, and only if, it meets one of the syntax statement definitions. Feel free to look around in this file and get a feel for how the code works. Don’t sweat over learning every nuance. What I have shown you is the minimum of what you need to know to create a new command. If you decide to implement more complex commands, study the examples of similar commands to see how they handle tokens and variables.

Next, add the source code to the large command statement switch in sql_parse.cc. Open the file and add a new case to the switch statement, as shown in Listing 7-23.

Listing 7-23. Adding a Case for the New Command

...
case SQLCOM_SHOW_AUTHORS:
res= mysqld_show_authors(thd);
break;
/* BEGIN CAB MODIFICATION */
/* Reason for Modification: */
/* Add SQLCOM_SHOW_DISK_USAGE case statement */
case SQLCOM_SHOW_DISK_USAGE:
res = show_disk_usage_command(thd);
break;
/* END CAB MODIFICATION */
case SQLCOM_SHOW_CONTRIBUTORS:
res= mysqld_show_contributors(thd);
break;
...

Notice that I just added a call to a new function named show_disk_usage_command(). You will add this function to the sql_show.cc file. The name of this function matches the tokens in the lex.h file, the symbols identified in the sql_yacc.yy file, and the command switch in the sql_parse.cc file. Not only does this make it clear what is going on, it also helps to keep the already-large switch statement within limits. Feel free to look around in this file, because it is the heart of the command-statement flow of execution. You should be able to find all of the commands, such as SELECT, CREATE, and so on.

Now, let’s add the code to execute the command. Open the sql_show.h file and add the function declaration for the new command, as shown in Listing 7-24. I have placed the function declaration near the same functions as defined in the sql_parse.cc file. This isn’t required, but it helps organize the code a bit.

Listing 7-24. Function Declaration for the New Command

...
int mysqld_show_variables(THD *thd,const char *wild);
bool mysqld_show_storage_engines(THD *thd);
bool mysqld_show_authors(THD *thd);
/* BEGIN CAB MODIFICATION */
/* Reason for Modification: */
/* Add show disk usage method reference */
bool show_disk_usage_command(THD *thd);
/* END CAB MODIFICATION */
bool mysqld_show_contributors(THD *thd);
bool mysqld_show_privileges(THD *thd);
...

The last modification is to add the implementation for the show_disk_usage_command() function (Listing 7-25). Open the sql_show.cc file and add the function implementation for the new command. The code in Listing 7-25 is stubbed out. This is to ensure that the new command was working before I added any code. This practice is a great one to follow if you have to implement complex code. Implementing just the basics helps to establish that your code changes are working and that any errors encountered are not related to the stubbed code. This practice is especially important to follow whenever modifying or adding new SQL commands.

Listing 7-25. The show_disk_usage_command() Implementation

/* BEGIN CAB MODIFICATION */
/* Reason for Modification: */
/* Add show disk usage method */
bool show_disk_usage_command(THD *thd)
{
List<Item> field_list;
Protocol *protocol= thd->protocol;
DBUG_ENTER("show_disk_usage");

/* send fields */
field_list.push_back(new Item_empty_string("Database",50));
field_list.push_back(new Item_empty_string("Size_in_bytes",30));

if (protocol->send_result_set_metadata(&field_list,
Protocol::SEND_NUM_ROWS |
Protocol::SEND_EOF))
DBUG_RETURN(TRUE);

/* send test data */
protocol->prepare_for_resend();
protocol->store("test_row", system_charset_info);
protocol->store("1024", system_charset_info);
if (protocol->write())
DBUG_RETURN(TRUE);

my_eof(thd);
DBUG_RETURN(FALSE);
}

/* END CAB MODIFICATION */

I want to call your attention to the source code for a moment. If you recall, in a previous chapter, I mentioned that there are low-level network functions that allow you to build a result set and return it to the client. Look at the lines of code indicated by the /* send fields */comment. This code creates the fields for the result set. In this case, I’m creating two fields (or columns), named Database and Size_in_bytes. These will appear as the column headings in the MySQL client utility when the command is executed.

Notice the protocol->XXX statements. This is where I use the Protocol class to send rows to the client. I first call prepare_for_resend() to clear the buffer, then make as many calls to the overloaded store() method setting the value for each field (in order). Finally, I call the write() method to write the buffer to the network. If anything goes wrong, I exit the function with a value of true (which means errors were generated). The last statement that ends the result set and finalizes communication to the client is the my_eof() function, which sends an end-of-file signal to the client. You can use these same classes, methods, and functions to send results from your commands.

COMPILE ERRORS FOR DISK_USAGE_SYM

If you want to compile the server, you can, but you may encounter errors concerning the DISK_USAGE_SYM symbol. This can happen if you built the server without cmake or skipped the cmake step. The following will help you resolve these issues.

If you’ve been studying the MySQL source code, you’ve probably noticed that there are sql_yacc.cc and sql_yacc.h files. These files are generated from the sql_yacc.yy file by Bison. Let’s use Bison to generate these files. Open a command window and navigate to the/sql directory off the root of your source-code tree. Run the command:

bison –y –p MYSQL –d sql_yacc.yy

This generates two new files: y.tab.c and y.tab.h. These files will replace the sql_yacc.cc and sql_yacc.h files, respectively. Before you copy them, make a backup of the original files. After you have made a backup of the files, copy y.tab.c to sql_yacc.ccand y.tab.h to sql_yacc.h.

Once the sql_yacc.cc and sql_yacc.h files are correct, generate the lexical hash by running this command:

gen_lex_hash > lex_hash.h

Everything is now set for you to compile the server. Since you have modified a number of the key header files, you may encounter longer-than-normal compilation times. Should you encounter compilation errors, please correct them before you proceed.

If you compile the code using debug, however, you may encounter a compilation error in mysqld.cc. If this occurs, it is likely the call to a compile_time_assert() macro. If that is the case, change the code as follows to compensate for the difference in number ofcom_status_vars enumerations.

compile_time_assert(sizeof(com_status_vars)/
sizeof(com_status_vars[0]) - 1 == SQLCOM_END + 8–1);

Once the server is compiled and you have a new executable, stop your server, copy the new executable to the location of your MySQL installation, and restart the server. You can now execute the new command in a MySQL client utility. Listing 7-26 shows an example of the SHOW DISK_USAGE command.

Listing 7-26. Example Execution of the SHOW DISK_USAGE Command

mysql> SHOW DISK_USAGE;

+−−--------+−−-------------+
| Database | Size_in_bytes |
+−−--------+−−-------------+
| test_row | 1024 |
+−−--------+−−-------------+
1 row in set (0.00 sec)

Now that everything is working, open the sql_show.cc file and add the actual code for the SHOW DISK_USAGE command ,as shown in Listing 7-27.

Listing 7-27. The Final show_disk_usage_command Source Code

/* This section adds the code to call the new SHOW DISK_USAGE command. */
/* BEGIN CAB MODIFICATION */
/* Reason for Modification: */
/* Add show disk usage method */
bool show_disk_usage_command(THD *thd)
{
List<Item> field_list;
List<LEX_STRING> dbs;
LEX_STRING *db_name;
char *path;
MY_DIR *dirp;
FILEINFO *file;
longlong fsizes = 0;
longlong lsizes = 0;
Protocol *protocol= thd->protocol;
DBUG_ENTER("show_disk_usage");

/* send the fields "Database" and "Size" */
field_list.push_back(new Item_empty_string("Database",50));
field_list.push_back(new Item_return_int("Size_in_bytes", 7,
MYSQL_TYPE_LONGLONG));
if (protocol->send_result_set_metadata(&field_list,
Protocol::SEND_NUM_ROWS |
Protocol::SEND_EOF))
DBUG_RETURN(TRUE);

/* get database directories */
find_files_result res = find_files(thd, &dbs, 0, mysql_data_home,0,1);
if (res != FIND_FILES_OK)
DBUG_RETURN(1);
List_iterator_fast<LEX_STRING> it_dbs(dbs);
path = (char *)my_malloc(PATH_MAX, MYF(MY_ZEROFILL));
dirp = my_dir(mysql_data_home, MYF(MY_WANT_STAT));
fsizes = 0;
for (int i = 0; i < (int)dirp->number_off_files; i++)
{
file = dirp->dir_entry + i;
if (strncasecmp(file->name, "ibdata", 6) == 0)
fsizes = fsizes + file->mystat->st_size;
else if (strncasecmp(file->name, "ib", 2) == 0)
lsizes = lsizes + file->mystat->st_size;
}

/* send InnoDB data to client */
protocol->prepare_for_resend();
protocol->store("InnoDB TableSpace", system_charset_info);
protocol->store((longlong)fsizes);
if (protocol->write())
DBUG_RETURN(TRUE);
protocol->prepare_for_resend();
protocol->store("InnoDB Logs", system_charset_info);
protocol->store((longlong)lsizes);
if (protocol->write())
DBUG_RETURN(TRUE);

/* now send database name and sizes of the databases */
while ((db_name = it_dbs++))
{
fsizes = 0;
strcpy(path, mysql_data_home);
strcat(path, "/");
strcat(path, db_name->str);
dirp = my_dir(path, MYF(MY_WANT_STAT));
for (int i = 0; i < (int)dirp->number_off_files; i++)
{
file = dirp->dir_entry + i;
fsizes = fsizes + file->mystat->st_size;
}

protocol->prepare_for_resend();
protocol->store(db_name->str, system_charset_info);
protocol->store((longlong)fsizes);
if (protocol->write())
DBUG_RETURN(TRUE);
}
my_eof(thd);

/* free memory */
my_free(path);
my_dirend(dirp);
DBUG_RETURN(FALSE);
}

/* END CAB MODIFICATION */

image Note On Windows, you may need to substitute MAX_PATH for PATH_MAX in the my_malloc() calls and use strnicmp in place of the strncasecmp.

When you compile and load the server, and then run the command, you should see something similar to the example shown in Listing 7-28.

Listing 7-28. Example Execution of the new SHOW DISK_USAGE Command

mysql> show disk_usage;
+−−------------------+−−-------------+
| Database | Size_in_bytes |
+−−------------------+−−-------------+
| InnoDB TableSpace | 77594624 |
| InnoDB Logs | 10485760 |
| mtr | 33423 |
| mysql | 844896 |
| performance_schema | 493595 |
| test | 8192 |
+−−------------------+−−-------------+
6 rows in set (0.00 sec)

mysql>

The list shows you the cumulative size of each database on your server in the MySQL data directory. One thing you might want to do is add a row that returns the grand total of all disk space used (much like a WITH ROLLUP clause). I leave this modification for you to complete as you experiment with implementing the function.

I hope that this short section on creating new SQL commands has helped eliminate some of the confusion and difficulty surrounding the MySQL SQL command-processing source code. Now that you have this information, you can plan your own extensions to the MySQL commands to meet your own unique needs.

Adding to the Information Schema

The last area I want to cover in this chapter is adding information to the information schema. The information schema is an in-memory collection of logical tables that contain status and other pertinent data (also known as metadata) about the server and its environment. Introduced in version 5.0.2, the information schema has become an important tool for administration and debugging the MySQL server, its environment, and databases.6 For example, the information schema makes it easy to display all the columns for all the tables in a database by using this SQL command:

SELECT table_name, column_name, data_type FROM information_schema.columns
WHERE table_schema = 'test';

The metadata is grouped into logical tables that permit you to issue SELECT commands against them. One of the greatest advantages of creating an INFORMATION_SCHEMA view is the use of the SELECT command. Specifically, you can use a WHERE clause to restrict the output to matching rows. This provides a unique and useful way to get information about the server. Table 7-2 lists some of the logical tables and their uses.

Table 7-2. Information Schema Logical Tables

Name

Description

SCHEMATA

Provides information about databases.

TABLES

Provides information about the tables in all the databases.

COLUMNS

Provides information about the columns in all the tables.

STATISTICS

Provides information about indexes for the tables.

USER_PRIVILEGES

Provides information about the database privileges. It encapsulates the mysql.db grant table.

TABLE_PRIVILEGES

Provides information about the table privileges. It encapsulates the mysql.tables_priv grant table.

COLUMN_PRIVILEGES

Provides information about the column privileges. It encapsulates the mysql.columns_priv grant table.

COLLATIONS

Provides information about the collations for the character sets.

KEY_COLUMN_USAGE

Provides information about the key columns.

ROUTINES

Provides information about the procedures and functions (does not include user-defined functions).

VIEWS

Provides information about the views from all the databases.

TRIGGERS

Provides information about the triggers from all the databases.

Since the disk-usage command falls into the category of metadata, I’ll show you how to add it to the information-schema mechanism in the server. The process is actually pretty straightforward, with no changes to the sql_yacc.yy code or lexical hash. Instead, you add an enumeration and a case for the switch statement in the function that creates the data (rows) for the disk usage function, define a structure to hold the columns for the table, and then add the source code to execute it.

Let’s begin with modifying the header files for the new enumeration. Open the handler.h file and locate the enum_schema_tables enumeration. Add a new enumeration named SCH_DISKUSAGE to the list. Listing 7-29 shows an excerpt of the enumerations with the new enumeration added.

Listing 7-29. Changes to the enum_schema_tables Enumeration

enum enum_schema_tables
{
...
SCH_COLLATION_CHARACTER_SET_APPLICABILITY,
SCH_COLUMNS,
SCH_COLUMN_PRIVILEGES,
/* BEGIN CAB MODIFICATION */
/* Reason for Modification: */
/* Add case enum for the new SHOW DISK_USAGE view. */
SCH_DISKUSAGE,
/* END CAB MODIFICATION */
SCH_ENGINES,
SCH_EVENTS,
SCH_FILES,
...

Now you need to add the case for the switch command in the prepare_schema_tables() function that creates the new schema table. Open the sql_parse.cc file and add the case statement shown in Listing 7-30. Notice that I just added the case without a break statement. This allows the code to fall through to code that satisfies all of the case. This is an elegant alternative to lengthy if-then-else-if statements that you see in most source code.

Listing 7-30. Modifications to the prepare_schema_table Function

int prepare_schema_table(THD *thd, LEX *lex, Table_ident *table_ident,
enum enum_schema_tables schema_table_idx)
{
...
DBUG_ENTER("prepare_schema_table");

switch (schema_table_idx) {
/* BEGIN CAB MODIFICATION */
/* Reason for Modification: */
/* Add case statement for the new SHOW DISK_USAGE view. */
case SCH_DISKUSAGE:
/* END CAB MODIFICATION */
case SCH_SCHEMATA:
#if defined(DONT_ALLOW_SHOW_COMMANDS)
my_message(ER_NOT_ALLOWED_COMMAND,
...

You may have noticed I refer to the disk-usage schema table as DISKUSAGE. I do this because the DISK_USAGE token has already been defined in the parser and lexical hash. If I had used DISK_USAGE and issued the command SELECT * FROM DISK_USAGE, I’d have gotten an error. This is because the parser associates the DISK_USAGE token with the SHOW command and not with the SELECT command.

Now we’re at the last set of code changes. You need to add a structure that the information-schema functions can use to create the field list for the table. Open the sql_show.cc file and add a new array of type ST_FIELD_INFO as shown in Listing 7-31. Notice that the columns are named the same and have the same types as in the show_disk_usage_command().

Listing 7-31. New Field Information Structure for the DISKUSAGE Schema Table

/* BEGIN CAB MODIFICATION */
/* Reason for Modification: */
/* This section adds the code to call the new SHOW DISK_USAGE command. */
ST_FIELD_INFO disk_usage_fields_info[]=
{
{"DATABASE", 40, MYSQL_TYPE_STRING, 0, 0, NULL, SKIP_OPEN_TABLE},
{"Size_in_bytes", 21 , MYSQL_TYPE_LONG, 0, 0, NULL, SKIP_OPEN_TABLE },
{0, 0, MYSQL_TYPE_STRING, 0, 0, 0, SKIP_OPEN_TABLE}
};
/* END CAB MODIFICATION */

The next change you need to make is to add a row in the schema_tables array (also in sql_show.cc). Locate the array and add a statement like that shown in Listing 7-32. This declares that the new table is called DISKUSAGE, that the column definitions are specified by disk_usage_fields_info, that Create_schema_table will be used to create the table, and that fill_disk_usage will be used to populate the table. The make_old_format tells the code to make sure that the column names are shown. The last four parameters are a pointer to a function to do some additional processing on the table, two index fields, and a bool variable to indicate that it is a hidden table. In the example, I set the pointer to the function to NULL (0); –1 indicates the indexes aren’t used, and 0 indicates the table is not hidden.

Listing 7-32. Modifications to the schema_tables Array

ST_SCHEMA_TABLE schema_tables[]=
{
...
{"ENGINES", engines_fields_info, create_schema_table,
fill_schema_engines, make_old_format, 0, -1, -1, 0, 0},
/* BEGIN CAB MODIFICATION */
/* Reason for Modification: */
/* This section adds the code to call the new SHOW DISK_USAGE command. */
{"DISKUSAGE", disk_usage_fields_info, create_schema_table,
fill_disk_usage, make_old_format, 0, -1, -1, 0, 0},
/* END CAB MODIFICATION */
#ifdef HAVE_EVENT_SCHEDULER
{"EVENTS", events_fields_info, create_schema_table,
fill_schema_events, make_old_format, 0, -1, -1, 0, 0},
...

OK, we’re on the home stretch. All that is left is to implement the fill_disk_usage() function. Scroll up from the schema_tables array7 and insert the implementation for the fill_disk_usage() function, as shown in Listing 7-33.

Listing 7-33. The fill_disk_usage Function Implementation

/* BEGIN CAB MODIFICATION */
/* Reason for Modification: */
/* Add code to fill the output for the new SHOW DISK_USAGE view. */
int fill_disk_usage(THD *thd, TABLE_LIST *tables, Item *cond)
{
TABLE *table= tables->table;
CHARSET_INFO *scs= system_charset_info;
List<Item> field_list;
List<LEX_STRING> dbs;
LEX_STRING *db_name;
char *path;
MY_DIR *dirp;
FILEINFO *file;
longlong fsizes = 0;
longlong lsizes = 0;
DBUG_ENTER("fill_disk_usage");

find_files_result res = find_files(thd, &dbs, 0, mysql_data_home,0,1);
if (res != FIND_FILES_OK)
DBUG_RETURN(1);
List_iterator_fast<LEX_STRING> it_dbs(dbs);
path = (char *)my_malloc(PATH_MAX, MYF(MY_ZEROFILL));
dirp = my_dir(mysql_data_home, MYF(MY_WANT_STAT));
fsizes = 0;
for (int i = 0; i < (int)dirp->number_off_files; i++)
{
file = dirp->dir_entry + i;
if (strncasecmp(file->name, "ibdata", 6) == 0)
fsizes = fsizes + file->mystat->st_size;
else if (strncasecmp(file->name, "ib", 2) == 0)
lsizes = lsizes + file->mystat->st_size;
}

/* send InnoDB data to client */
table->field[0]->store("InnoDB TableSpace",
strlen("InnoDB TableSpace"), scs);
table->field[1]->store((longlong)fsizes, TRUE);
if (schema_table_store_record(thd, table))
DBUG_RETURN(1);
table->field[0]->store("InnoDB Logs", strlen("InnoDB Logs"), scs);
table->field[1]->store((longlong)lsizes, TRUE);
if (schema_table_store_record(thd, table))
DBUG_RETURN(1);

/* now send database name and sizes of the databases */
while ((db_name = it_dbs++))
{
fsizes = 0;
strcpy(path, mysql_data_home);
strcat(path, "/");
strcat(path, db_name->str);
dirp = my_dir(path, MYF(MY_WANT_STAT));
for (int i = 0; i < (int)dirp->number_off_files; i++)
{
file = dirp->dir_entry + i;
fsizes = fsizes + file->mystat->st_size;
}
restore_record(table, s->default_values);

table->field[0]->store(db_name->str, db_name->length, scs);
table->field[1]->store((longlong)fsizes, TRUE);
if (schema_table_store_record(thd, table))
DBUG_RETURN(1);
}

/* free memory */
my_free(path);
DBUG_RETURN(0);
}
/* END CAB MODIFICATION */

image Note On Windows, substitute MAX_PATH for PATH_MAX in the my_malloc() calls and use strnicmp in place of the strncasecmp.

I copied the code from the previous DISK_USAGE command, removing the calls for creating fields (that’s handled via the disk_usage_fields_info array) and the code for sending rows to the client. Instead, I use an instance of the TABLE class/structure to store values in thefields array, starting at zero for the first column. The call to the function schema_table_store_record() function dumps the values to the network protocols.

Everything is now set for you to compile the server. Since you have modified one of the key header files (handler.h), you may encounter longer-than-normal compilation times as some of the dependencies for the mysqld project may have to be compiled. Should you encounter compilation errors, please correct them before you proceed.

Once the server is compiled and you have a new executable, stop your server, copy the new executable to the location of your MySQL installation, and restart the server. You can now execute the new command in a MySQL client utility. Listing 7-34 shows an example of using the information schema, displaying all of the available schema tables, and dumping the contents of the new DISKUSAGE table.

Listing 7-34. Example Information Schema Use with the new DISKUSAGE Schema Table

mysql> use INFORMATION_SCHEMA;
Database changed

mysql> SHOW TABLES LIKE 'DISK%';
+−−------------------------------------+
| Tables_in_information_schema (DISK%) |
+−−------------------------------------+
| DISKUSAGE |
+−−------------------------------------+
1 row in set (0.00 sec)

mysql> SELECT * from DISKUSAGE;
+−−------------------+−−-------------+
| DATABASE | Size_in_bytes |
+−−------------------+−−-------------+
| InnoDB TableSpace | 77594624 |
| InnoDB Logs | 10485760 |
| mtr | 33423 |
| mysql | 844896 |
| performance_schema | 493595 |
| test | 8192 |
+−−------------------+−−-------------+
6 rows in set (0.00 sec)

mysql>

Now that you know how to add to the information schema, the sky is the limit for what you can add to enable your database professionals to more closely monitor and tune your MySQL servers.

Summary

In this chapter, I’ve shown you how to extend the capabilities of the MySQL server by adding your own new functions and commands.

You learned how to build a UDF library that can be loaded and unloaded at runtime, how to add a native function to the server source code, and how to add a new SHOW command to the parser and the query-execution code. You also learned how to add a view to the information schema.

The ability to extend the server in this manner makes MySQL very flexible. The UDF mechanism is one of the easiest to code, and it far surpasses the competition in sophistication and speed of development. The fact that the server is open source means that you can also get right into the source code and add your own SQL commands for your specific environment. Regardless of whether you use these facilities, you can appreciate knowing that you aren’t limited by the “out-of-the-box” functions and commands.

The next chapter will explore one of the most popular features of MySQL that enables high availability – MySQL replication. I will introduce the basics of replication and take you on a tour of the replication source code. This is followed by example extensions to replication that you can use to learn the internals of replication as well as get an idea for extensions that you can use to enhance your own high availability solutions.

1 The lexical analyzer and yacc files—not to be confused with the often-quirky science fiction program named Lexx. (NOTE: UC YACC?)

2 Knuth, D. E., The Art of Computer Programming. 2nd ed. (Addison-Wesley, 1997).

3 Linux/Unix users can either use their package manager and install it or download it from the GNU Web site (www.gnu.org/software/bison). Windows users can download a Win32 version from http://gnuwin32.sourceforge.net/packages/bison.htm.

4 For example, the MySQL Utilities utility mysqldiskusage. MySQL Utilities is a subproject of the MySQL Workbench. You can download the MySQL Workbench from dev.mysql.com.

5 To learn more about the YACC parser and how to write YACC code, see http://dinosaur.compilertools.net/.

6 For more information about the information schema, see the online MySQL reference manual.

7 Remember, if you do not use function declarations, you must locate the code for functions in front of the code that references it.