
Expert MySQL, Second Edition (2012)

PART 3. Advanced Database Internals

CHAPTER 14. Query Execution

The query-tree class shown in Chapter 12 and the heuristic optimizer shown in Chapter 13 form the first two of the three components that make up the DBXP query-execution engine. In this chapter, I show you how to expand the query-tree class to process project, restrict, and join operations. This will give you a glimpse into the world of database-query execution. I begin by briefly explaining the basic principles of the query execution algorithms and then jump into writing the code. Because the code for some of the methods is quite lengthy, not all of the code examples shown in this chapter include the complete source code. If you are following along by coding the examples, consider loading the source code for this chapter and using it rather than typing in the code from scratch.

Query Execution Revisited

The query-execution process is the implementation of the various relational theory operations. These operations include project, restrict, join, cross-product, union, and intersect. Few database systems implement union and intersect.

Note Union and intersect are not the same as the UNION operator in SQL. The union and intersect relational operations are set operations, whereas the UNION operator in SQL simply combines the results of two or more SELECT statements that have compatible result columns.

Writing algorithms to implement these operations is considered so straightforward that the algorithms are often omitted from relational-theory and database-systems texts. That omission is unfortunate, because the join operation is nontrivial. The following sections describe the basic algorithms for the relational operations.

Project

The project (or projection) operation is one in which the result set contains a subset of the attributes (columns) in the original relation (table).1 Thus, the result set can contain fewer attributes than the original relation. Users specify projections in a SQL SELECT command by listing the desired columns in the column list immediately following the SELECT keyword. For example, this command projects the first_name and last_name columns from the staff table:

SELECT first_name, last_name FROM staff

The project algorithm to implement this operation is shown in Listing 14-1.

Listing 14-1. Project Algorithm

begin
  do
    get next tuple from relation
    for each attribute in tuple
      if attribute.name found in column_list
        write attribute data to client
      fi
  while not end_of_relation
end

As you can see from the listing, the code to implement this algorithm limits sending data to the client to the data specified in the column list.
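
To make the pseudocode concrete, the following standalone C++ sketch shows the same projection idea. It is not DBXP code; the Row type, the staff data, and the printed output are assumptions made purely for illustration.

#include <iostream>
#include <map>
#include <string>
#include <vector>

// A tuple modeled as attribute name -> value (illustration only).
using Row = std::map<std::string, std::string>;

// Project: copy only the attributes named in column_list from each row.
std::vector<Row> project(const std::vector<Row> &relation,
                         const std::vector<std::string> &column_list)
{
  std::vector<Row> result;
  for (const Row &row : relation)          // "get next tuple from relation"
  {
    Row out;
    for (const std::string &col : column_list)
    {
      auto it = row.find(col);             // "if attribute.name found in column_list"
      if (it != row.end())
        out[col] = it->second;             // "write attribute data to client"
    }
    result.push_back(out);
  }
  return result;
}

int main()
{
  std::vector<Row> staff = {
    {{"first_name", "Bill"}, {"last_name", "Smith"}, {"salary", "30000"}},
    {{"first_name", "Alicia"}, {"last_name", "St.Cruz"}, {"salary", "25000"}}};
  for (const Row &r : project(staff, {"first_name", "last_name"}))
    std::cout << r.at("first_name") << " " << r.at("last_name") << "\n";
  return 0;
}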

Restrict

The restrict (or restriction) operation is one in which the result set contains a subset of the tuples (rows) in the original relation (table). Thus, the result set can contain fewer tuples than the original relation. Users specify restrictions in a SQL SELECT command by listing the desired conditions in the WHERE clause immediately following the FROM clause. For example, the following command restricts the result set from the staff table to those employees who make more than $65,000.00 annually.

SELECT first_name, last_name FROM staff WHERE salary > 65000.00

The restrict algorithm to implement this operation is shown in Listing 14-2.

Listing 14-2. Restrict Algorithm

begin
  do
    get next tuple from relation
    if tuple attribute values match conditions
      write attribute data to client
    fi
  while not end_of_relation
end

As you can see from the listing, the code to implement this algorithm sends data to the client only when the data values in the tuple match the conditions in the WHERE clause. This algorithm is often implemented with an additional optimization step that reduces the expressions to a minimal set (e.g., omitting always-true conditions).
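
As with project, a short hedged C++ sketch can make the restrict loop concrete. The Row type, the salary values, and the predicate below are illustrative assumptions rather than the DBXP implementation (which uses the Expression class described later in this chapter).

#include <functional>
#include <iostream>
#include <map>
#include <string>
#include <vector>

using Row = std::map<std::string, std::string>;

// Restrict: keep only the tuples for which the WHERE predicate is true.
std::vector<Row> restrict_rows(const std::vector<Row> &relation,
                               const std::function<bool(const Row &)> &where)
{
  std::vector<Row> result;
  for (const Row &row : relation)   // "get next tuple from relation"
    if (where(row))                 // "if tuple attribute values match conditions"
      result.push_back(row);        // "write attribute data to client"
  return result;
}

int main()
{
  std::vector<Row> staff = {
    {{"last_name", "Smith"}, {"salary", "30000"}},
    {{"last_name", "Borg"}, {"salary", "55000"}},
    {{"last_name", "Maxima"}, {"salary", "43000"}}};

  // WHERE salary > 65000.00 would return nothing for this sample data,
  // so the demo predicate uses > 40000 instead.
  auto rows = restrict_rows(staff, [](const Row &r) {
    return std::stod(r.at("salary")) > 40000.0;
  });
  for (const Row &r : rows)
    std::cout << r.at("last_name") << "\n";
  return 0;
}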

Join

The join operation is one in which the result set contains the tuples (rows) that match a specified combination of two relations (tables). Three or more tables are joined using n-1 joins, where n is the number of tables. In the case of three tables joined (A, B, C), the join is a combination of two of the tables and the result of that join joined with the remaining table. The ways the joins can be linked (as a left-deep tree, a right-deep tree, or even a bushy tree) differ only in the order of execution of the intermediate joins. Examples of these types of tree arrangements are shown in Figure 14-1.


Figure 14-1. Example tree arrangements

Joins are most often used in a master/detail relationship in which one table (the base or master table) references one or more subtables (detail tables) in which one record in the base table matches one or more records in the detail tables. For example, you could create a customer table that contains information about customers and an orders table that contains data about the customers’ orders. The customer table is the base table, and the orders table is the subtable.

Users specify joins in a SQL SELECT command by listing the desired tables and join conditions in the FROM clause. For example, the following command joins the records from the customer table with those records in the orders table. Note that in this case, the join condition is a simple equality on a common column that the tables share.

SELECT customer.name, orders.number
FROM customer JOIN orders on customer.id = orders.customerid

The algorithm for a join operation is not as straightforward as those described earlier. This is because the join operation can be represented in several forms. You can choose to join using a simple column from table A = column from table B expression, as in the previous example, or you can elect to control the output by including only matching rows (inner) or matching and nonmatching rows (outer) from the left, right, or both tables. The join operations therefore include inner join (sometimes called natural or equi-joins2), left outer join, right outer join, full outer join, cross-product, union, and intersect. The following sections describe each of these operations in detail.

Note Some database texts treat the cross-product, union, and intersect as discrete operations. I consider them specialized forms of the join operation.

Inner Join

The inner-join operation is one in which the result set contains a subset of the tuples (rows) in the original relations (tables) where there is a match on the join condition. It is called an inner join because only those rows in the first relation whose join condition value matches that of a row in the second relation are included in the result set.

Users specify inner joins in a SQL SELECT command by listing the desired conditions in the FROM clause. For example, the following command joins the result set from the staff table to the directorate table, returning a result set of all employees who are assigned a directorate (one employee does not have a directorate assigned).

SELECT staff.last_name, directorate.dir_name
FROM staff JOIN directorate on staff.mgr_id = directorate.dir_head_id

Note The keyword INNER is usually optional for most database systems, because the default join operation is an inner join.

The inner-join algorithm to implement this operation is shown in Listing 14-3. This algorithm is but one of several forms of join algorithm. The one shown is a variant of a merge join; it is equally possible to implement the operation using another join algorithm, such as a nested-loop join (a sketch of which follows the listing).

Listing 14-3. Join Algorithm

begin
  sort relation a as rel_a on join column(s)
  sort relation b as rel_b on join column(s)
  do
    get next tuple from rel_a
    get next tuple from rel_b
    if join column values match join conditions
      write attribute data to client
    fi
    check rewind conditions
  while not end_of_rel_a and not end_of_rel_b
end
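
As mentioned above, a nested-loop join is an equally capable way to implement the inner join. The following standalone C++ sketch shows that approach; the Row type, the column names, and the sample data are assumptions for illustration only and are not part of the DBXP source.

#include <iostream>
#include <map>
#include <string>
#include <vector>

using Row = std::map<std::string, std::string>;

// Nested-loop inner join: for every row of rel_a, scan all of rel_b and
// emit the concatenation whenever the join columns are equal. No sort is
// required, at the cost of reading rel_b once per row of rel_a.
std::vector<Row> nested_loop_join(const std::vector<Row> &rel_a,
                                  const std::vector<Row> &rel_b,
                                  const std::string &col_a,
                                  const std::string &col_b)
{
  std::vector<Row> result;
  for (const Row &a : rel_a)
    for (const Row &b : rel_b)
      if (a.at(col_a) == b.at(col_b))
      {
        Row joined = a;                       // start with the left tuple
        joined.insert(b.begin(), b.end());    // append the right tuple's attributes
        result.push_back(joined);
      }
  return result;
}

int main()
{
  std::vector<Row> staff = {
    {{"last_name", "Smith"}, {"mgr_id", "333444444"}},
    {{"last_name", "Maxima"}, {"mgr_id", "333445555"}}};
  std::vector<Row> directorate = {
    {{"dir_name", "Management"}, {"dir_head_id", "333444444"}},
    {{"dir_name", "Development"}, {"dir_head_id", "333445555"}}};
  for (const Row &r : nested_loop_join(staff, directorate, "mgr_id", "dir_head_id"))
    std::cout << r.at("last_name") << " | " << r.at("dir_name") << "\n";
  return 0;
}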

Users can also specify an inner join by including the join condition in the WHERE clause, as shown here. Some database professionals discourage this variant, because it can be mistaken for a normal SELECT command. Most agree that the variant is functionally equivalent, and most database optimizers are written to accommodate it.


SELECT staff.last_name, directorate.dir_name
FROM staff, directorate WHERE staff.mgr_id = directorate.dir_head_id

As you can see from the listing, the code to implement this algorithm requires the use of a sort. A sort is needed to order the rows in the tables on the join columns so that the algorithm can correctly identify all of the matches should there be any duplicate condition values among the rows. To illustrate this point, consider the tables shown in Listing 14-4.

Listing 14-4. Example Join Tables (Unordered)

staff table
+-----------+------------+----------+------------+------+--------+-----------+
| id | first_name | mid_name | last_name | sex | salary | mgr_id |
+-----------+------------+----------+------------+------+--------+-----------+
| 333445555 | John | Q | Smith | M | 30000 | 333444444 |
| 123763153 | William | E | Walters | M | 25000 | 123654321 |
| 333444444 | Alicia | F | St.Cruz | F | 25000 | None |
| 921312388 | Goy | X | Hong | F | 40000 | 123654321 |
| 800122337 | Rajesh | G | Kardakarna | M | 38000 | 333445555 |
| 820123637 | Monty | C | Smythe | M | 38000 | 333445555 |
| 830132335 | Richard | E | Jones | M | 38000 | 333445555 |
| 333445665 | Edward | E | Engles | M | 25000 | 333445555 |
| 123654321 | Beware | D | Borg | F | 55000 | 333444444 |
| 123456789 | Wilma | N | Maxima | F | 43000 | 333445555 |
+-----------+------------+----------+------------+------+--------+-----------+

directorate table
+----------+-----------------+-------------+
| dir_code | dir_name | dir_head_id |
+----------+-----------------+-------------+
| N41 | Development | 333445555 |
| N01 | Human Resources | 123654321 |
| M00 | Management | 333444444 |
+----------+-----------------+-------------+

Note Some database systems (such as MySQL) return the rows in the original, unordered sequence. The examples shown are included in order of the internal sort for emphasis.

Notice that the tables are not ordered. If you were to run the example join shown using the algorithm without ordering the rows, you’d have to read all of the rows from one of the tables for each row read from the other. For example, if the staff table were read in the order shown, you would read one row from the directorate table for the first join, two rows from the directorate table for the next row from staff, followed by two, one, one, one, four, and two rows, for a total of 14 reads from the directorate table to complete the operation. If the tables were ordered as shown in Listing 14-5, however, you could avoid rereading the rows from the directorate table.

Listing 14-5. Example Join Tables (Ordered by Join Column)

staff table
+-----------+------------+----------+------------+------+--------+-----------+
| id | first_name | mid_name | last_name | sex | salary | mgr_id |
+-----------+------------+----------+------------+------+--------+-----------+
| 123763153 | William | E | Walters | M | 25000 | 123654321 |
| 921312388 | Goy | X | Hong | F | 40000 | 123654321 |
| 333445555 | John | Q | Smith | M | 30000 | 333444444 |
| 123654321 | Beware | D | Borg | F | 55000 | 333444444 |
| 333445665 | Edward | E | Engles | M | 25000 | 333445555 |
| 830132335 | Richard | E | Jones | M | 38000 | 333445555 |
| 820123637 | Monty | C | Smythe | M | 38000 | 333445555 |
| 800122337 | Rajesh | G | Kardakarna | M | 38000 | 333445555 |
| 123456789 | Wilma | N | Maxima | F | 43000 | 333445555 |
| 333444444 | Alicia | F | St.Cruz | F | 25000 | None |
+-----------+------------+----------+------------+------+--------+-----------+

directorate table
+----------+-----------------+-------------+
| dir_code | dir_name | dir_head_id |
+----------+-----------------+-------------+
| N01 | Human Resources | 123654321 |
| M00 | Management | 333444444 |
| N41 | Development | 333445555 |
+----------+-----------------+-------------+

This creates another problem, however. How do you know not to read another row from either table? Notice the last step in the inner-join algorithm. This is where the implementation can get a bit tricky. What you need to do here is be able to reuse a row that has already been read so that you can compare one row from one table to many rows in another. This gets tricky when you consider that you may have to advance (go forward one row) or rewind (go back one row) from either table.

If you follow the algorithm by hand with the ordered example tables using staff.mgr_id = directorate.dir_head_id as the join criteria (reading from the staff table as rel_a first), you’ll see that the algorithm would require the “reuse” of the directorate row with a dir_head_id of 333444444 twice and the row with a dir_head_id of 333445555 three times. The caching of the rows is sometimes called “rewinding” the table read pointers. The result set of this example is shown in Listing 14-6.

Listing 14-6. Example Inner-join Result Set

+------------+-----------------+
| last_name | dir_name |
+------------+-----------------+
| Smith | Management |
| Walters | Human Resources |
| Hong | Human Resources |
| Kardakarna | Development |
| Smythe | Development |
| Jones | Development |
| Engles | Development |
| Borg | Management |
| Maxima | Development |
+------------+-----------------+

Listing 14-6 yields only 9 rows and not 10. This is because there is one row in the staff table that does not have a mgr_id value (she is the boss). Thus, there is no corresponding row in the directorate table to match on mgr_id = dir_head_id.

Outer Join

Outer joins are similar to inner joins, but in this case, we are interested in obtaining all of the rows from the left, right, or both tables. That is, we include the rows from the table indicated (left, right, or both—also called full) regardless of whether there is a matching row in the other table. Each of these operations can be represented by a slight variance of the general outer-join algorithm.

Users specify outer joins in a SQL SELECT command by listing the desired conditions in the FROM clause and invoking one of the options (left, right, full). Some database systems default to using left if the option is omitted. For example, the following command joins the result set from the staff table to the directorate table, returning a result set of all employees and the directorate:

SELECT staff.last_name, directorate.dir_name
FROM staff LEFT OUTER JOIN directorate on staff.mgr_id = directorate.dir_head_id;

Note that this differs from the inner join, because no rows from the left table are omitted. Listing 14-7 shows the basic outer-join algorithm. The following sections describe how the algorithm implements each of the three types of outer joins.

Listing 14-7. The Outer-join Algorithm

begin
  sort relation a as rel_a on join column(s)
  sort relation b as rel_b on join column(s)
  do
    get next tuple from rel_a
    get next tuple from rel_b
    if type is FULL
      if join column values match join conditions
        write attribute data from both tuples to client
      else
        if rel_a has data
          write NULLS for rel_b
        else if rel_b has data
          write NULLS for rel_a
        fi
    else if type is LEFT
      if join column values match join conditions
        write attribute data from rel_a to client
      else
        if rel_a has data
          write NULLS for rel_b
        fi
    else if type is RIGHT
      if join column values match join conditions
        write attribute data from rel_b to client
      else
        if rel_b has data
          write NULLS for rel_a
        fi
    fi
    check rewind conditions
  while not end_of_rel_a and not end_of_rel_b
end

Next, we discuss examples of each of the types of outer joins.

Left Outer Join

Left outer joins include all rows from the left table concatenated with rows from the right table. For those rows that do not match the join condition, null values are returned for the columns from the right table.

SELECT staff.last_name, directorate.dir_name
FROM staff LEFT OUTER JOIN directorate on staff.mgr_id = directorate.dir_head_id;

Listing 14-8 shows the result set for the left outer join of the sample tables.

Listing 14-8. Example Left Outer Join Result Set

+------------+-----------------+
| last_name | dir_name |
+------------+-----------------+
| Smith | Management |
| Walters | Human Resources |
| St.Cruz | NULL |
| Hong | Human Resources |
| Kardakarna | Development |
| Smythe | Development |
| Jones | Development |
| Engles | Development |
| Borg | Management |
| Maxima | Development |
+------------+-----------------+

Notice that now we see the row from the staff table that has no matching row in the directorate table. This is because we told the optimizer to use all of the rows from the staff table (the one on the left of the join specification) and match them to the right table where possible. Thus, we see one of the many uses of outer joins: they can be used to identify mismatches.3
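
For readers who want to see the NULL-padding step in code, here is a minimal standalone C++ sketch of a left outer join. The Row type, the NULL marker, and the sample data are illustrative assumptions, not the DBXP implementation.

#include <iostream>
#include <map>
#include <string>
#include <vector>

using Row = std::map<std::string, std::string>;

// Left outer join: every row of the left relation appears in the result;
// when no right row matches the join condition, the right-hand columns
// are filled with NULL markers.
std::vector<Row> left_outer_join(const std::vector<Row> &left,
                                 const std::vector<Row> &right,
                                 const std::string &lcol, const std::string &rcol,
                                 const std::vector<std::string> &right_cols)
{
  std::vector<Row> result;
  for (const Row &l : left)
  {
    bool matched = false;
    for (const Row &r : right)
      if (l.at(lcol) == r.at(rcol))
      {
        Row joined = l;
        joined.insert(r.begin(), r.end());
        result.push_back(joined);
        matched = true;
      }
    if (!matched)                        // "if rel_a has data, write NULLS for rel_b"
    {
      Row joined = l;
      for (const std::string &c : right_cols)
        joined[c] = "NULL";
      result.push_back(joined);
    }
  }
  return result;
}

int main()
{
  std::vector<Row> staff = {
    {{"last_name", "St.Cruz"}, {"mgr_id", "None"}},
    {{"last_name", "Borg"}, {"mgr_id", "333444444"}}};
  std::vector<Row> directorate = {
    {{"dir_name", "Management"}, {"dir_head_id", "333444444"}}};
  for (const Row &r : left_outer_join(staff, directorate, "mgr_id", "dir_head_id",
                                      {"dir_name", "dir_head_id"}))
    std::cout << r.at("last_name") << " | " << r.at("dir_name") << "\n";
  return 0;
}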

Right Outer Join

Right outer joins include all rows from the right table concatenated with rows from the left table. For those rows that do not match the join condition, null values are returned for the columns from the left table.

SELECT staff.last_name, directorate.dir_name
FROM staff RIGHT OUTER JOIN directorate on staff.mgr_id = directorate.dir_head_id;

Listing 14-9 shows the result set for the right outer join of the sample tables.

Listing 14-9. Example Right Outer Join Result Set

+------------+-----------------+
| last_name | dir_name |
+------------+-----------------+
| Kardakarna | Development |
| Smythe | Development |
| Jones | Development |
| Engles | Development |
| Maxima | Development |
| Walters | Human Resources |
| Hong | Human Resources |
| Smith | Management |
| Borg | Management |
+------------+-----------------+

Now we’re back to nine rows. That’s because we instructed the optimizer to include all of the rows in the right table regardless of whether they have a match in the left table. Since every row in the directorate table (the one on the right of the join specification) matches at least one row in the staff table, no NULL-padded rows appear, and the staff row that has no matching directorate row is omitted.

Full Outer Join

Full outer joins include all rows from both tables concatenated together. For those rows that do not match the join condition, null values are returned for the columns from the nonmatching table.

SELECT staff.last_name, directorate.dir_name
FROM staff FULL OUTER JOIN directorate on staff.mgr_id = directorate.dir_head_id;

Listing 14-10 shows the result set for the full outer join of the sample tables.

Listing 14-10. Example Full Outer-join Result Set

+------------+-----------------+
| last_name | dir_name |
+------------+-----------------+
| Smith | Management |
| Walters | Human Resources |
| St.Cruz | NULL |
| Hong | Human Resources |
| Kardakarna | Development |
| Smythe | Development |
| Jones | Development |
| Engles | Development |
| Borg | Management |
| Maxima | Development |
+------------+-----------------+

While MySQL does not support a full outer join, the above would be representative of the output. Now consider what the output would be if the directorate table contained a row without a dir_head_id. I leave this as an exercise for you to ponder.

Cross-Product

The cross-product operation is one in which the result set contains each row of the left table combined with every row from the right table. Thus, the result set contains n x m rows, where n represents the number of rows in the left table and m represents the number of rows in the right table. While simple in concept, not all database systems support the cross-product operation.

Note It is possible, in most database systems, to represent a cross-product query using a query such as SELECT * FROM table1, table2. In this case, there are no join conditions, so all rows from table1 are matched with all of the rows from table2. Try it out yourself on MySQL. You’ll see that MySQL supports cross-product operations using this method.

Users specify the cross-product operation by including the keyword CROSS in place of JOIN in the FROM clause. You may be thinking that this operation has limited applicability, but you’d be surprised at its usefulness. Suppose you were modeling possible outcomes for an artificial-intelligence algorithm. You may have tables that store possible next moves (outcomes) and other tables that store stimuli. If you wanted to find all possible combinations, given a list of stimuli selected from one table and the possible effects on the moves selected from another, you can produce a result set that shows all of the combinations. Listing 14-11 presents an example of such a scenario.

Listing 14-11. Sample Cross-product Scenario

CREATE TABLE next_stim
SELECT source, stimuli_id FROM stimuli WHERE likelihood >= 0.75
+------------+------------+
| source | stimuli_id |
+------------+------------+
| obstacle | 13 |
| other_bot | 14 |
| projectile | 15 |
| chasm | 23 |
+------------+------------+

CREATE TABLE next_moves
SELECT move_name, next_move_id, likelihood FROM moves WHERE likelihood >= 0.90
+------------+--------------+------------+
| move_name | next_move_id | likelihood |
+------------+--------------+------------+
| turn left | 21 | 0.25 |
| reverse | 18 | 0.40 |
| turn right | 22 | 0.45 |
+------------+--------------+------------+

SELECT * FROM next_stim CROSS next_moves
+------------+------------+------------+--------------+------------+
| source | stimuli_id | move_name | next_move_id | likelihood |
+------------+------------+------------+--------------+------------+
| obstacle | 13 | turn left | 21 | 0.25 |
| obstacle | 13 | reverse | 18 | 0.40 |
| obstacle | 13 | turn right | 22 | 0.45 |
| other_bot | 14 | turn left | 21 | 0.25 |
| other_bot | 14 | reverse | 18 | 0.40 |
| other_bot | 14 | turn right | 22 | 0.45 |
| projectile | 15 | turn left | 21 | 0.25 |
| projectile | 15 | reverse | 18 | 0.40 |
| projectile | 15 | turn right | 22 | 0.45 |
| chasm | 23 | turn left | 21 | 0.25 |
| chasm | 23 | reverse | 18 | 0.40 |
| chasm | 23 | turn right | 22 | 0.45 |
+------------+------------+------------+--------------+------------+

Listing 14-12 shows the cross-product algorithm. Notice that this sample is written using two steps: one to combine the rows and one to remove the duplicates.

Listing 14-12. The Cross-product Algorithm

begin
  do
    get next tuple from rel_a
    do
      get next tuple from rel_b
      write tuple from rel_a concat tuple from rel_b to temp_table
    while not end_of_rel_b
  while not end_of_rel_a
  remove duplicates from temp_table
  return data from temp_table to client
end

As you can see from the listing, the code to implement this algorithm is really a pair of nested loops in which the rows from the left table are concatenated with the rows from the right table (i.e., a nested-loop algorithm).
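
A standalone C++ sketch of those nested loops follows. The Row type and the sample data are assumptions for illustration, and the de-duplication step from Listing 14-12 is omitted because the sample inputs contain no duplicate rows.

#include <iostream>
#include <map>
#include <string>
#include <vector>

using Row = std::map<std::string, std::string>;

// Cross-product: concatenate every row of the left relation with every row
// of the right relation, yielding n x m result rows.
std::vector<Row> cross_product(const std::vector<Row> &rel_a,
                               const std::vector<Row> &rel_b)
{
  std::vector<Row> result;
  for (const Row &a : rel_a)            // outer loop over the left relation
    for (const Row &b : rel_b)          // inner loop over the right relation
    {
      Row combined = a;
      combined.insert(b.begin(), b.end());
      result.push_back(combined);
    }
  return result;
}

int main()
{
  std::vector<Row> next_stim = {{{"source", "obstacle"}}, {{"source", "chasm"}}};
  std::vector<Row> next_moves = {{{"move_name", "turn left"}},
                                 {{"move_name", "reverse"}},
                                 {{"move_name", "turn right"}}};
  // 2 x 3 = 6 combinations, matching the n x m rule described above.
  for (const Row &r : cross_product(next_stim, next_moves))
    std::cout << r.at("source") << " | " << r.at("move_name") << "\n";
  return 0;
}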

Union

The union operation is the same as the set operation of the same name. In this case, the join is the union of all of the rows in both tables with duplicate rows removed. Naturally, the tables must be of the same design for this operation to work. This differs from SQL’s UNION ALL, which includes rows from all SELECT commands (with compatible column lists) regardless of duplicates. Unlike the other joins, the union operation is sometimes implemented in two steps: one to combine the tables and another to remove the duplicates.

Users specify the union command by including the keyword UNION in place of JOIN in the FROM clause. Let’s say you wanted to combine two employee tables (one that includes all employees who work in the United States and another that includes employees who work in Canada) and ensure you get a result set that has all of the employees listed once. You could unite the two using a command like the following:

SELECT * from us_employees UNION ca_employees

Let’s look at this one a little more closely. Listing 14-13 shows examples of the tables mentioned. A quick glance will show that there are two employees who work both in the United States and Canada. If you used the SQL UNION ALL command, you’d get the contents of both tables, with those two employees counted twice. Listing 14-14 shows the results of the union operation using the sample tables.

Listing 14-13. Sample Employee Tables

US employees table
+------------+-----------+-----------+---------+
| first_name | last_name | id | dept_id |
+------------+-----------+-----------+---------+
| Chad | Borg | 990441234 | 1 |
| Alicia | Wallace | 330506781 | 4 |
| Howard | Bell | 333445555 | 5 |
| Tamra | English | 453453453 | 5 |
| Bill | Smith | 123456789 | 5 |
+------------+-----------+-----------+---------+

Canada employees table
+------------+-----------+-----------+---------+
| first_name | last_name | id | dept_id |
+------------+-----------+-----------+---------+
| William | Wallace | 220059009 | <null> |
| Aaron | Hill | 987987987 | 4 |
| Lillian | Wallace | 987654321 | 4 |
| Howard | Bell | 333445555 | 5 |
| Bill | Smith | 123456789 | 5 |
+------------+-----------+-----------+---------+

Listing 14-14. Example Union Result Set

+------------+-----------+-----------+---------+
| first_name | last_name | id | dept_id |
+------------+-----------+-----------+---------+
| Chad | Borg | 990441234 | 1 |
| Alicia | Wallace | 330506781 | 4 |
| Howard | Bell | 333445555 | 5 |
| Tamra | English | 453453453 | 5 |
| Bill | Smith | 123456789 | 5 |
| William | Wallace | 220059009 | <null> |
| Aaron | Hill | 987987987 | 4 |
| Lillian | Wallace | 987654321 | 4 |
+------------+-----------+-----------+---------+

Listing 14-15 shows the union algorithm. Notice that this sample is written using two steps to combine and then remove duplicates.

Note In MySQL, you can use the ALL option for the UNION clause to skip removal of duplicates.

Listing 14-15. The Union Algorithm

begin
  do
    get next tuple from rel_a
    write tuple from rel_a to temp_table
    get next tuple from rel_b
    write tuple from rel_b to temp_table
  while not (end_of_rel_a or end_of_rel_b)
  remove duplicates from temp_table
  return data from temp_table to client
end
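
The following standalone C++ sketch shows one way to realize the combine-then-de-duplicate idea, using an ordered set in place of the temporary table. The Row layout and the sample names are illustrative assumptions, not DBXP code.

#include <iostream>
#include <set>
#include <string>
#include <vector>

// A row is modeled as its column values in table order (illustration only).
using Row = std::vector<std::string>;

// Union: all rows from both relations, with duplicate rows removed.
// The std::set plays the role of the "remove duplicates from temp_table" step.
std::vector<Row> union_op(const std::vector<Row> &rel_a,
                          const std::vector<Row> &rel_b)
{
  std::set<Row> temp_table;                     // combines and de-duplicates
  for (const Row &r : rel_a) temp_table.insert(r);
  for (const Row &r : rel_b) temp_table.insert(r);
  return std::vector<Row>(temp_table.begin(), temp_table.end());
}

int main()
{
  std::vector<Row> us_employees = {{"Chad", "Borg"}, {"Howard", "Bell"}, {"Bill", "Smith"}};
  std::vector<Row> ca_employees = {{"Aaron", "Hill"}, {"Howard", "Bell"}, {"Bill", "Smith"}};
  for (const Row &r : union_op(us_employees, ca_employees))
    std::cout << r[0] << " " << r[1] << "\n";   // each shared employee appears once
  return 0;
}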

Intersect

The intersect operation is the same as the set operation of the same name. In this case, the join is the intersection of the rows that are in both tables, with duplicate rows removed. Naturally, the tables must be of the same design for this operation to work.

Users specify the intersect operation by including the keyword INTERSECT in place of JOIN in the FROM clause. Let’s say you wanted to combine two employee tables (one that includes all employees who work in the United States and another that includes employees who work in Canada) and ensure you get a result set that has all of the employees who work in both the United States and Canada. You could intersect the two using a command like:

SELECT * from us_employees INTERSECT ca_employees

Let’s look at this one a little more closely. Using the example tables from Listing 14-13, you will see that there are two employees who work both in the United States and Canada. Listing 14-16 shows the results of the intersect operation using the sample tables. Listing 14-17 shows the intersect algorithm.

Listing 14-16. Example Intersect Result Set

+------------+-----------+-----------+---------+
| first_name | last_name | id | dept_id |
+------------+-----------+-----------+---------+
| Howard | Bell | 333445555 | 5 |
| Bill | Smith | 123456789 | 5 |
+------------+-----------+-----------+---------+

Listing 14-17. The Intersect Algorithm

begin
  do
    get next tuple from rel_a
    get next tuple from rel_b
    if join column values match intersection conditions
      write tuple from rel_a to client
    fi
  while not (end_of_rel_a or end_of_rel_b)
end
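
For comparison with the union sketch, here is a standalone C++ sketch of the intersect operation that sorts both inputs and merges them, in keeping with the chapter’s emphasis on ordered inputs. The Row layout and the data are assumptions for illustration only.

#include <algorithm>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>

using Row = std::vector<std::string>;

// Intersect: rows that appear in both relations. Sorting both inputs first
// lets a single merge pass find the common rows.
std::vector<Row> intersect_op(std::vector<Row> rel_a, std::vector<Row> rel_b)
{
  std::sort(rel_a.begin(), rel_a.end());
  std::sort(rel_b.begin(), rel_b.end());
  std::vector<Row> result;
  std::set_intersection(rel_a.begin(), rel_a.end(),
                        rel_b.begin(), rel_b.end(),
                        std::back_inserter(result));
  return result;
}

int main()
{
  std::vector<Row> us_employees = {{"Howard", "Bell"}, {"Bill", "Smith"}, {"Chad", "Borg"}};
  std::vector<Row> ca_employees = {{"Bill", "Smith"}, {"Aaron", "Hill"}, {"Howard", "Bell"}};
  for (const Row &r : intersect_op(us_employees, ca_employees))
    std::cout << r[0] << " " << r[1] << "\n";   // only the employees in both tables
  return 0;
}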

DBXP Query Execution

Query execution in DBXP is accomplished using the optimized query tree. The tree structure itself is used as a pipeline for processing the query. When a query is executed, a get_next() method is issued on each of the children of the root node. Another get_next() method call is made to each of their children. This process continues as the tree is traversed to the lowest level of the tree containing a reference to a single table. Consider the query:

SELECT col1, col2 FROM table_a JOIN
(SELECT col2, col8 FROM table_b WHERE col6 = 7)
ON col8 WHERE table_a.col7 > 14

The query is retrieving data from table_a that matches a subset of the data in table_b. Notice that I wrote the subset as a subquery. The query-tree execution easily accommodates a subquery mechanism in which a subquery would be represented as a subtree. Figure 14-2 shows a high-level view of the concept.


Figure 14-2. Query-tree execution

The operation for each node is executed for one row in the relation. Upon completion, the result of that operation passes the result up to the next operation in the tree. If no result is produced, control remains in the current node until a result is produced or there are no more results to process. As the tree is being climbed back to the root, the results are passed to each parent in turn until the root node is reached. Once the operation in the root node is complete, the resulting tuple is passed to the client. In this way, execution of the query appears to produce results faster because data (results) are shown to the client much earlier than if the query were to be executed for the entire set of operations before any results are given to the client.

The Query_tree class is designed to include the operations necessary to execute the query. Operations are included for project, restrict, and join. A prepare() method is called at the start of query execution. The prepare() method walks the query tree, initializing all the nodes for execution. Execution is accomplished by using a while loop that iterates through the result set, issuing a pulse to the tree starting at the root node. A pulse is a call to the get_next() method that is propagated down the tree. Each node that is pulsed issues a pulse to each of its children, starting with the left child. A separate parameterized method is provided for each of the following operations: do_restrict(), do_project(), and do_join() .4 These methods operate using one or two tuples as input and return either a null or a tuple. A null return indicates that the tuple or tuples do not satisfy the current operation. For example, a do_restrict() operation accepting a tuple operates using the expression class to evaluate the values in the tuple. If the expression evaluates to false, a null result is returned. If the expression evaluates to true, the same tuple is returned.5

This process is repeated throughout the tree, passing a single tuple up the tree to the root. The resulting tuple from the root is then processed by the external while loop and presented to the client via the existing MySQL client-communication protocols. This form of execution is called a pipeline because of the way nodes are traversed, passing a single tuple through the tree and thus through all of the operations in the query.
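
The pipeline idea is easier to see in a toy model than in the full Query_tree class. The following standalone C++ sketch is such a model, not DBXP code: the Node, Scan, and Restrict types, the string rows, and the “keep” predicate are all assumptions invented for illustration, but the pulse-driven get_next() flow mirrors the description above.

#include <iostream>
#include <memory>
#include <optional>
#include <string>
#include <vector>

// Toy model of pipelined (pulse-driven) execution: each node returns at most
// one tuple per get_next() call, pulling from its child as needed.
struct Node
{
  virtual ~Node() = default;
  virtual std::optional<std::string> get_next() = 0;   // one "pulse"
};

// Leaf: reads one row at a time from a stored relation.
struct Scan : Node
{
  std::vector<std::string> rows;
  size_t pos = 0;
  explicit Scan(std::vector<std::string> r) : rows(std::move(r)) {}
  std::optional<std::string> get_next() override
  {
    if (pos >= rows.size()) return std::nullopt;       // end of relation
    return rows[pos++];
  }
};

// Restrict: keeps pulsing its child until a row satisfies the condition.
struct Restrict : Node
{
  std::unique_ptr<Node> child;
  explicit Restrict(std::unique_ptr<Node> c) : child(std::move(c)) {}
  std::optional<std::string> get_next() override
  {
    while (auto row = child->get_next())
      if (row->find("keep") != std::string::npos)      // stand-in for a WHERE clause
        return row;
    return std::nullopt;
  }
};

int main()
{
  Restrict root(std::make_unique<Scan>(
      std::vector<std::string>{"drop 1", "keep 2", "drop 3", "keep 4"}));

  // The external loop pulses the root; each matching row reaches the client
  // as soon as it is produced, before the whole scan has finished.
  while (auto row = root.get_next())
    std::cout << *row << "\n";
  return 0;
}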

Designing the Tests

Creating comprehensive tests for query execution would require writing SQL statements that cover all possible paths through the optimizer. In essence, you would need a test suite that covers all possible queries, both valid and invalid. The DBXP execution is incomplete, however. Although the project and restrict operations are fully implemented, only the inner join is implemented in the do_join() method. This permits you to create your own implementations for the remaining join operations in the stable DBXP environment.

With this in mind, let’s design a few basic queries that exercise the execution engine to see how the DBXP engine processes queries. Listing 14-18 shows a sample test to exercise the query optimizer. Feel free to add your own queries to test the limits of the DBXP engine.

Listing 14-18. Sample DBXP Query-execution Test (Ch14.test)

#
# Sample test to test the DBXP_SELECT execution
#

# Test 1:
DBXP_SELECT first_name, last_name, sex, id FROM staff;

# Test 2:
DBXP_SELECT id FROM staff;

# Test 3:
DBXP_SELECT dir_name FROM directorate;

# Test 4a:
DBXP_SELECT id, dir_name FROM staff
JOIN directorate ON staff.mgr_id = directorate.dir_head_id;

# Test 4b:
DBXP_SELECT id, dir_name FROM staff, directorate
WHERE staff.mgr_id = directorate.dir_head_id;

# Test 5:
DBXP_SELECT * FROM staff WHERE staff.id = '123456789';

# Test 6:
DBXP_SELECT first_name, last_name FROM staff join directorate ON staff.mgr_id = directorate.dir_head_id
WHERE directorate.dir_code = 'N41';

# Test 7:
DBXP_SELECT * FROM directorate JOIN building ON directorate.dir_code = building.dir_code;

# Test 8:
DBXP_SELECT directorate.dir_code, dir_name, building, dir_head_id
FROM directorate JOIN building ON directorate.dir_code = building.dir_code;

Notice that there are two queries for test case 4. What do you expect to be displayed by these two different queries? Why would I include them? I include them because they are really the same query (they generate the same result set). Due to the flexibility of the SQL command structure6, it is possible to write a single query using several forms of the syntax. In this case, they should result in the same join and thus produce the same output. It also makes for a good test for an optimizer and an execution engine.

Tip The database used in these examples is included in the Appendix.

Please refer to Chapter 4 for more details on how to create and run the test in Listing 14-18 using the MySQL Test Suite.

Updating the DBXP SELECT Command

Since we now have a means to execute queries, we can replace the code in the DBXP_select_command() method with code that runs the SELECT commands. This method will check table access, open and lock the tables, execute the query, send results to the client, and unlock the tables. Listing 14-19 shows the completed DBXP_select_command().

Listing 14-19. Completed DBXP SELECT Command

/*
Perform Select Command

SYNOPSIS
DBXP_select_command()
THD *thd IN the current thread

DESCRIPTION
This method executes the SELECT command using the query tree and optimizer.

RETURN VALUE
Success = 0
Failed = 1
*/
int DBXP_select_command(THD *thd)
{
bool res;
READ_RECORD *record;
select_result *result = thd->lex->result;

DBUG_ENTER("DBXP_select_command");

/* Prepare the tables (check access, locks) */
res = check_table_access(thd, SELECT_ACL, thd->lex->query_tables, 0, 1, 1);
if (res)
DBUG_RETURN(1);
res = open_and_lock_tables(thd, thd->lex->query_tables, 0,
MYSQL_LOCK_IGNORE_TIMEOUT);
if (res)
DBUG_RETURN(1);

/* Create the query tree and optimize it */
Query_tree *qt = build_query_tree(thd, thd->lex,
(TABLE_LIST*) thd->lex->select_lex.table_list.first);
qt->heuristic_optimization();
qt->cost_optimization();
qt->prepare(qt->root);
if (!(result= new select_send()))
DBUG_RETURN(1);

/* use the protocol class to communicate to client */
Protocol *protocol= thd->protocol;

/* write the field list for returning the query results */
if (protocol->send_result_set_metadata(&qt->result_fields,
Protocol::SEND_NUM_ROWS | Protocol::SEND_EOF))
DBUG_RETURN(1);

/* pulse the execution engine to get a row from the result set */
while (!qt->Eof(qt->root))
{
record = qt->get_next(qt->root);
if (record != NULL)
{
/* send the data to the client */
send_data(protocol, qt->result_fields, thd);
}
}
my_eof(thd);

/* unlock tables and cleanup memory */
qt->cleanup(qt->root);
mysql_unlock_read_tables(thd, thd->lock);
delete qt;
DBUG_RETURN(0);
}

This implementation now has all the elements necessary to execute queries. It begins with checking table access and opening the tables. Assuming these steps complete successfully, the DBXP query-engine calls are next, beginning with building the query tree and then optimizing, and, finally, executing the query in a loop.

Notice that the loop is a simple while not end of file loop that calls the get_next() method on the root node. If a tuple (record) is returned, the code writes the row to the client; otherwise, it calls the get_next() method until the end of the file is detected. When all tuples have been processed, the code frees all used memory and unlocks the tables.

Since I placed the code that sends data to the client in one place outside the query-tree methods, the implementation for all of the relational operations is simplified a bit. As you will see in the following section, the query-tree methods resemble those of the theoretical algorithms.

The DBXP Algorithms

Now that the code to operate the DBXP query engine is complete, let’s turn our attention to how the DBXP Query_tree class implements the relational operations.

Project

The DBXP project operation is implemented in a method called do_project() of the Query_tree class. This method is easy to implement, because the MySQL base classes provide a fast way to do the projection. Instead of looping through the attributes in the row, we can use the MySQL base classes to send the data to the client.

The do_project() method can be simplified to just store the current row in the buffer and return the row to the next node in the tree. When control returns to the DBXP_select_command() method, a helper method named send_data() is used to send the data to the client. Listing 14-20 shows the code for the do_project() method.

Listing 14-20. DBXP Project Method

/*
Perform project operation.

SYNOPSIS
do_project()
query_node *qn IN the operational node in the query tree.
READ_RECORD *t -- the tuple to apply the operation to.

DESCRIPTION
This method performs the relational model operation entitled "project".
This operation is a narrowing of the result set vertically by
restricting the set of attributes in the output tuple.

NOTES
Returns 0 (null) if no tuple satisfies child operation (does NOT indicate
the end of the file or end of query operation). Use Eof() to verify.

RETURN VALUE
Success = new tuple with correct attributes
Failed = NULL
*/
READ_RECORD *Query_tree::do_project(query_node *qn, READ_RECORD *t)
{
DBUG_ENTER("do_project");
if (t != NULL)
{
if (qn == root)

/*
If the left table isn't NULL, copy the record buffer from
the table into the record buffer of the relations class.
This completes the read from the storage engine and now
provides the data for the projection (which is accomplished
in send_data()).
*/
if (qn->relations[0] != NULL)
memcpy((uchar *)qn->relations[0]->table->record[0],
(uchar *)t->rec_buf,
qn->relations[0]->table->s->rec_buff_length);
}
DBUG_RETURN(t);
}

Notice that in this code, all that must be done is copying the data read from the storage engine into the record buffer of the table object. I accomplish this by copying the memory from the READ_RECORD read from the storage engine into the table’s first record buffer (record[0]), copying in the number of bytes specified in the rec_buff_length attribute of the table.

Restrict

The DBXP restrict operation is implemented in a method called do_restrict() of the Query_tree class. The code uses the where_expr member variable of the Query_tree class that contains an instantiation of the Expression helper class. The implementation of the restrict operation is therefore simplified to calling the evaluate() method of the Expression class. Listing 14-21 shows the code for the do_restrict() method.

Listing 14-21. DBXP Restrict Method

/*
Perform restrict operation.

SYNOPSIS
do_restrict()
query_node *qn IN the operational node in the query tree.
READ_RECORD *t -- the tuple to apply the operation to.

DESCRIPTION
This method performs the relational model operation entitled "restrict".
This operation is a narrowing of the result set horizontally by
satisfying the expressions listed in the where clause of the SQL
statement being executed.

RETURN VALUE
Success = true
Failed = false
*/
bool Query_tree::do_restrict(query_node *qn, READ_RECORD *t)
{
bool found = false;

DBUG_ENTER("do_restrict");
if (qn != NULL)
{
/*
If the left table isn't NULL, copy the record buffer from
the table into the record buffer of the relations class.
This completes the read from the storage engine and now
provides the data for the projection (which is accomplished
in send_data()).

Lastly, evaluate the where clause. If the where clause
evaluates to true, we keep the record else we discard it.
*/
if (qn->relations[0] != NULL)
memcpy((uchar *)qn->relations[0]->table->record[0], (uchar *)t->rec_buf,
qn->relations[0]->table->s->rec_buff_length);
if (qn->where_expr != NULL)
found = qn->where_expr->evaluate(qn->relations[0]->table);
}
DBUG_RETURN(found);
}

When a match is found, the data are copied to the record buffer of the table. This associates the data in the current record buffer with the table. It also allows the use of the many MySQL methods to manipulate fields and send data to the client.

Join

The DBXP join operation is implemented in a method called do_join() of the Query_tree class. The code uses the join_expr member variable of the Query_tree class that contains an instantiation of the Expression helper class. The implementation of the evaluation of the join conditions is therefore simplified to calling the evaluate() method of the Expression class.

This method is the most complex of all of the DBXP code. The reason for the complexity is due in part to the many conditions under which a join must be evaluated. The theoretical join algorithm described previously and the examples shown illustrate the complexity. I will expand on that a bit here in preparation for your examination of the do_join() source code. Listing 14-22 presents the simplified pseudocode for the do_join() method.

Listing 14-22. The DBXP Join Algorithm

begin
  if preempt_pipeline
    do
      if no left child
        get next tuple from left relation
      else
        get next tuple from left child
      fi
      insert tuple in left buffer in order by join column for left relation
    until eof
    do
      if no right child
        get next tuple from right relation
      else
        get next tuple from right child
      fi
      insert tuple in right buffer in order by join column for right relation
    until eof
  fi
  if left record pointer is NULL
    get next tuple from left buffer
  fi
  if right record pointer is NULL
    get next tuple from right buffer
  fi
  if there are tuples to process
    write attribute data of both tuples to table record buffers
    if join column values match join conditions
      check rewind conditions
      clear record pointers
      check for end of file
      set return record to left record pointer (indicates a match)
    else if left join value < right tuple join value
      set return record to NULL (no match)
      set left record pointer to NULL
    else if left join value > right tuple join value
      set return record to NULL (no match)
      set right record pointer to NULL
    fi
  else
    set return record to NULL (no match)
  fi
end

Since the join method is called repeatedly from the get_next() method, the algorithm has been altered to use the preempt_pipeline member variable from the query_node. This variable is set to TRUE during the prepare() method prior to executing the query tree. This allows the join method to detect when the first call is made so that the temporary buffers can be created. In this way, the traversal of the tree is pre-empted until the join operation completes for the first match (or the end of the file if no matches).

Notice that the algorithm uses two buffers to store the ordered rows from the incoming tables. These buffers are used to read records for the join operation, and they are represented using a record pointer for each buffer. If a match is found, both record pointers are set to NULL, which forces the code to read the next record. If the evaluation of the join condition indicates that the join value from the left table is less than the right, the left record pointer is set to NULL so that on the next call to the do_join() method, the next record is read from the left record buffer. Similarly, if the left join value is greater than the right, the right record pointer is set to NULL and on the next call a new record is read from the right record buffer.
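
Because the pointer bookkeeping is the hardest part to visualize, here is a small standalone C++ sketch of the same advance-and-rewind idea using array indexes in place of the record_buff lists. The Tuple layout and the sample data are assumptions for illustration; this is not the check_rewind() implementation.

#include <iostream>
#include <string>
#include <utility>
#include <vector>

// Each tuple is (join key, payload). Both inputs must already be ordered on
// the join key, just like the in-memory buffers built by the DBXP join.
using Tuple = std::pair<std::string, std::string>;

// Merge two ordered inputs, pairing every left row with every right row that
// shares its key. When consecutive left rows share a key, the right-hand
// position is "rewound" to the start of the matching group so those rows
// can be reused.
std::vector<std::pair<std::string, std::string>>
merge_join(const std::vector<Tuple> &left, const std::vector<Tuple> &right)
{
  std::vector<std::pair<std::string, std::string>> result;
  size_t l = 0, r = 0;
  while (l < left.size() && r < right.size())
  {
    if (left[l].first < right[r].first)
      ++l;                                    // left value smaller: advance left pointer
    else if (left[l].first > right[r].first)
      ++r;                                    // right value smaller: advance right pointer
    else
    {
      size_t group_start = r;                 // remember where the matching group begins
      while (r < right.size() && right[r].first == left[l].first)
      {
        result.push_back({left[l].second, right[r].second});
        ++r;
      }
      ++l;                                    // move to the next left row...
      if (l < left.size() && left[l].first == left[l - 1].first)
        r = group_start;                      // ...and rewind if it reuses the same key
    }
  }
  return result;
}

int main()
{
  // Ordered as in Listing 14-5: several staff rows share one directorate row.
  std::vector<Tuple> staff = {
    {"123654321", "Walters"}, {"123654321", "Hong"},
    {"333444444", "Smith"},   {"333444444", "Borg"},
    {"333445555", "Engles"},  {"333445555", "Maxima"}};
  std::vector<Tuple> directorate = {
    {"123654321", "Human Resources"},
    {"333444444", "Management"},
    {"333445555", "Development"}};
  for (const auto &row : merge_join(staff, directorate))
    std::cout << row.first << " | " << row.second << "\n";
  return 0;
}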

Now that the basics of the do_join() method have been explained, take a look at the source code. Listing 14-23 shows the code for the do_join() method.

Note I chose not to use a helper function to create the temporary buffers for the first step of the join operation so that I could keep the code together for easier debugging. Thus, the decision was purely for convenience. You can save a bit of code if you want by making this part of the code a helper function.

Listing 14-23. DBXP Join Method

/*
Perform join operation.

SYNOPSIS
do_join()
query_node *qn IN the operational node in the query tree.
READ_RECORD *t -- the tuple to apply the operation to.

DESCRIPTION
This method performs the relational model operation entitled
"join". This operation is the combination of two relations to
form a composite view. This algorithm implements ALL variants
of the join operation.

NOTES
Returns 0 (null) if no tuple satisfies child operation (does
NOT indicate the end of the file or end of query operation).
Use Eof() to verify.

RETURN VALUE
Success = new tuple with correct attributes
Failed = NULL
*/
READ_RECORD *Query_tree::do_join(query_node *qn)
{
READ_RECORD *next_tup=0;
int i;
TABLE *ltable = NULL;
TABLE *rtable = NULL;
Field *fright = NULL;
Field *fleft = NULL;
record_buff *lprev=0;
record_buff *rprev=0;
expr_node *expr;

DBUG_ENTER("do_join");
if (qn == NULL)
DBUG_RETURN(NULL);

/* check join type because some joins require other processing */
switch (qn->join_type)
{
case (jnUNKNOWN) :
break;
case (jnINNER) :
case (jnLEFTOUTER) :
case (jnRIGHTOUTER) :
case (jnFULLOUTER) :
{

/*
preempt_pipeline == true means we need to stop the pipeline
and sort the incoming rows. We do that by making an in-memory
copy of the record buffers stored in left_record_buff and
right_record_buff
*/
if (qn->preempt_pipeline)
{
left_record_buff = NULL;
right_record_buff = NULL;
next_tup = NULL;

/* Build buffer for tuples from left child. */
do
{
/* if left child exists, get row from it */
if (qn->left != NULL)
lbuff = get_next(qn->left);

/* else, read the row from the table (the storage handler) */
else
{
/*
Create space for the record buffer and
store pointer in lbuff
*/
lbuff = (READ_RECORD *) my_malloc(sizeof(READ_RECORD),
MYF(MY_ZEROFILL | MY_WME));
lbuff->rec_buf =
(uchar *) my_malloc(qn->relations[0]->table->s->rec_buff_length,
MYF(MY_ZEROFILL | MY_WME));

/* check for end of file. Store result in eof array */
qn->eof[0] =
qn->relations[0]->table->file->ha_rnd_next(lbuff->rec_buf);
if (qn->eof[0] != HA_ERR_END_OF_FILE)
qn->eof[0] = false;
else
{
lbuff = NULL;
qn->eof[0] = true;
}
}
/* if the left buffer is not null, get a new row from table */
if (lbuff != NULL)
{
/* we need the table information for processing fields */
if (qn->left == NULL)
ltable = qn->relations[0]->table;
else
ltable = get_table(qn->left);
if (ltable != NULL)
memcpy((uchar *)ltable->record[0], (uchar *)lbuff->rec_buf,
ltable->s->rec_buff_length);

/* get the join expression */
expr = qn->join_expr->get_expression(0);
for (Field **field = ltable->field; *field; field++)
if (strcasecmp((*field)->field_name, ((Field *)expr->left_op)->field_name)==0)
fleft = (*field);

/*
If field was found, add the row to the in-memory buffer
ordered by the join column.
*/
if ((fleft != NULL) && (!fleft->is_null()))
insertion_sort(true, fleft, lbuff);
}
} while (lbuff != NULL);
/* Build buffer for tuples from right child. */
do
{
/* if right child exists, get row from it */
if (qn->right != NULL)
rbuff = get_next(qn->right);

/* else, read the row from the table (the storage handler) */
else
{
/*
Create space for the record buffer and
store pointer in rbuff
*/
rbuff = (READ_RECORD *) my_malloc(sizeof(READ_RECORD),
MYF(MY_ZEROFILL | MY_WME));
rbuff->rec_buf =
(uchar *) my_malloc(qn->relations[1]->table->s->rec_buff_length,
MYF(MY_ZEROFILL | MY_WME));

/* check for end of file. Store result in eof array */
qn->eof[1] =
qn->relations[1]->table->file->ha_rnd_next(rbuff->rec_buf);
if (qn->eof[1] != HA_ERR_END_OF_FILE)
qn->eof[1] = false;
else
{
rbuff = NULL;
qn->eof[1] = true;
}
}
/* if the right buffer is not null, get a new row from table */
if (rbuff != NULL)
{
/* we need the table information for processing fields */
if (qn->right == NULL)
rtable = qn->relations[1]->table;
else
rtable = get_table(qn->right);
if (rtable != NULL)
memcpy((uchar *)rtable->record[0], (uchar *)rbuff->rec_buf,
rtable->s->rec_buff_length);

/* get the join expression */
expr = qn->join_expr->get_expression(0);
for (Field **field = rtable->field; *field; field++)
if (strcasecmp((*field)->field_name, ((Field *)expr->right_op)->field_name)==0)
fright = (*field);

/*
If field was found, add the row to the in-memory buffer
ordered by the join column.
*/
if ((fright != NULL) && (!fright->is_null()))
insertion_sort(false, fright, rbuff);
}
} while (rbuff != NULL);
left_record_buffer_ptr = left_record_buff;
right_record_buffer_ptr = right_record_buff;
qn->preempt_pipeline = false;
}
/*
This is where the actual join code begins.
We get a tuple from each table and start the compare.
*/

/*
if lbuff is null and the left record buffer has data
get the row from the buffer
*/
if ((lbuff == NULL) && (left_record_buffer_ptr != NULL))
{
lbuff = left_record_buffer_ptr->record;
lprev = left_record_buffer_ptr;
left_record_buffer_ptr = left_record_buffer_ptr->next;
}

/*
if rbuff is null and the right record buffer has data
get the row from the buffer
*/
if ((rbuff == NULL) && (right_record_buffer_ptr != NULL))
{
rbuff = right_record_buffer_ptr->record;
rprev = right_record_buffer_ptr;
right_record_buffer_ptr = right_record_buffer_ptr->next;
}

/*
if the left buffer was null, check to see if a row is
available from left child.
*/
if (ltable == NULL)
{
if (qn->left == NULL)
ltable = qn->relations[0]->table;
else
ltable = get_table(qn->left);
}
/*
if the right buffer was null, check to see if a row is
available from right child.
*/
if (rtable == NULL)
{
if (qn->right == NULL)
rtable = qn->relations[1]->table;
else
rtable = get_table(qn->right);
}
/*
If there are two rows to compare, copy the record buffers
to the table record buffers. This transfers the data
from the internal buffer to the record buffer. It enables
us to reuse the MySQL code for manipulating fields.
*/
if ((lbuff != NULL) && (rbuff != NULL))
{
memcpy((uchar *)ltable->record[0], (uchar *)lbuff->rec_buf,
ltable->s->rec_buff_length);
memcpy((uchar *)rtable->record[0], (uchar *)rbuff->rec_buf,
rtable->s->rec_buff_length);

/* evaluate the join condition */
i = qn->join_expr->compare_join(qn->join_expr->get_expression(0),
ltable, rtable);

/* if there is a match...*/
if (i == 0)
{
/* return the row in the next_tup pointer */
next_tup = lbuff;

/* store next rows from buffer (already advanced 1 row) */
record_buff *left = left_record_buffer_ptr;
record_buff *right = right_record_buffer_ptr;

/*
Check to see if either buffer needs to be rewound to
allow us to process many rows on one side to one row
on the other
*/
check_rewind(left_record_buffer_ptr, lprev,
right_record_buffer_ptr, rprev);

/* set pointer to null to force read on next loop */
lbuff = NULL;
rbuff = NULL;

/*
If the left buffer has been changed and if the
buffer is not at the end, set the buffer to the next row.
*/
if (left != left_record_buffer_ptr)
{
if (left_record_buffer_ptr != NULL)
{
lbuff = left_record_buffer_ptr->record;
}
}

/*
If the right buffer has been changed and if the
buffer is not at the end, set the buffer to the next row.
*/
if (right != right_record_buffer_ptr)
{
if (right_record_buffer_ptr != NULL)
{
rbuff = right_record_buffer_ptr->record;
}
}

/* Now check for end of file and save results in eof array */
if (left_record_buffer_ptr == NULL)
qn->eof[2] = true;
else
qn->eof[2] = false;
if (right_record_buffer_ptr == NULL)
qn->eof[3] = true;
else
qn->eof[3] = false;
}

/* if the rows didn't match...*/
else
{
/* get next rows from buffers (already advanced) */
record_buff *left = left_record_buffer_ptr;
record_buff *right = right_record_buffer_ptr;

/*
Check to see if either buffer needs to be rewound to
allow us to process many rows on one side to one row
on the other. The results of this rewind must be
saved because there was no match and we may have to
reuse one or more of the rows.
*/
check_rewind(left_record_buffer_ptr, lprev,
right_record_buffer_ptr, rprev);

/*
If the left buffer has been changed and if the
buffer is not at the end, set the buffer to the next row
and copy the data into the record buffer.
*/
if (left != left_record_buffer_ptr)
{
if (left_record_buffer_ptr != NULL)
{
memcpy((uchar *)ltable->record[0],
(uchar *)left_record_buffer_ptr->record->rec_buf,
ltable->s->rec_buff_length);
lbuff = left_record_buffer_ptr->record;
}
}

/*
If the right buffer has been changed and if the
buffer is not at the end, set the buffer to the next row
and copy the data into the record buffer.
*/
if (right_record_buffer_ptr != NULL)
if ((right_record_buffer_ptr->next == NULL) &&
(right_record_buffer_ptr->prev == NULL))
lbuff = NULL;
if (right != right_record_buffer_ptr)
{
if (right_record_buffer_ptr != NULL)
{
memcpy((uchar *)rtable->record[0],
(uchar *)right_record_buffer_ptr->record->rec_buf,
rtable->s->rec_buff_length);
rbuff = right_record_buffer_ptr->record;
}
}

/* Now check for end of file and save results in eof array */
if (left_record_buffer_ptr == NULL)
qn->eof[2] = true;
else
qn->eof[2] = false;
if (right_record_buffer_ptr == NULL)
qn->eof[3] = true;
else
qn->eof[3] = false;
next_tup = NULL;
}
}
else
{
next_tup = NULL; /* at end, return null */
}
break;
}

/* placeholder for exercise... */
case (jnCROSSPRODUCT) :
{
break;
}
/*
placeholder for exercises...
Union and intersect are mirrors of each other -- same code will
work for both except the dupe elimination/inclusion part (see below)
*/
case (jnUNION) :
case (jnINTERSECT) :
{
break;
}
}
DBUG_RETURN(next_tup);
}

Notice in the code that under any condition other than a match, the record returned from the code is set to NULL. This allows the loop in the get_next() method to repeatedly call the do_join() method until a match is returned. This is similar to the way the do_restrict() method call is made.

I have not implemented the code for any of the other join operations. The main reason is that it allows you to experiment with the code (see the exercises at the end of this chapter). Fortunately, you should find that the code can be modified with a few simple alterations to allow the processing of the outer joins. Adding code for the cross-product, union, and intersect operations can be accomplished by implementing the theoretical algorithms described in the first part of this chapter.

After you have studied the pseudocode for the method, you should find reading the code easier. The most complex part of this code is the check_rewind() method. This is implemented as a function in the class in order to make the code less complex and easier to read. Several other helper methods are described in more detail in the following section.

Other Methods

Several helper methods make up the DBXP execution engine. Table 14-1 lists the new methods and their uses. The more complex methods are described in more detail in the text that follows.

Table 14-1. The DBXP Execution Engine Helper Methods

Query_tree::get_next(): Retrieves next tuple from child node.

Query_tree::insertion_sort(): Creates an ordered buffer of READ_RECORD pointers. Used in the join operations for ordering the incoming tuples.

Query_tree::eof(): Checks for the end-of-file condition for the storage engine or temporary buffers.

Query_tree::check_rewind(): Checks to see if the record buffers need to be adjusted to reread tuples for multiple matches.

send_data(): Sends data to the client. See sql_dbxp_parse.cc.

Expression::evaluate(): Evaluates the WHERE clause for a restrict operation.

Expression::compare_join(): Evaluates the join condition for a join operation.

Handler::rnd_init(): Initializes read from storage engine (see Chapter 10).

Handler::rnd_next(): Reads the next tuple from storage engine (see Chapter 10).
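
Of these helpers, insertion_sort() builds the ordered buffers used by do_join(). The following standalone C++ sketch shows the ordered-insert idea on a doubly linked list; the record_buff layout here (a string key plus next/prev pointers) is an assumption for illustration and is simpler than the DBXP structure.

#include <iostream>
#include <string>

// Simplified stand-in for the ordered buffer built by insertion_sort():
// records are kept in a doubly linked list, ordered by the join column value.
struct record_buff
{
  std::string key;        // value of the join column for this record
  record_buff *next = nullptr;
  record_buff *prev = nullptr;
};

// Insert a record into the list so that it stays ordered by key.
record_buff *ordered_insert(record_buff *head, record_buff *rec)
{
  if (head == nullptr || rec->key < head->key)
  {
    rec->next = head;
    if (head) head->prev = rec;
    return rec;                                  // new head of the buffer
  }
  record_buff *cur = head;
  while (cur->next != nullptr && cur->next->key < rec->key)
    cur = cur->next;
  rec->next = cur->next;
  rec->prev = cur;
  if (cur->next) cur->next->prev = rec;
  cur->next = rec;
  return head;
}

int main()
{
  record_buff a{"333445555"}, b{"123654321"}, c{"333444444"};
  record_buff *buffer = nullptr;
  buffer = ordered_insert(buffer, &a);
  buffer = ordered_insert(buffer, &b);
  buffer = ordered_insert(buffer, &c);
  for (record_buff *p = buffer; p != nullptr; p = p->next)
    std::cout << p->key << "\n";                 // prints keys in ascending order
  return 0;
}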

The get_next() Method

The get_next() method is the heart of the query-execution flow in DBXP. It is responsible for calling the do_... methods that implement the query operations. It is called once from the while loop in the DBXP_select_command() method. Once this method is initiated the first time, it performs the operation for the current node, calling its child nodes to get their results. The process is repeated in a recursive fashion until all the children in the current node have returned a single tuple. Listing 14-24 shows the code for the get_next() method.

Listing 14-24. The get_next() Method

/*
Get the next tuple (row) in the result set.

SYNOPSIS
get_next()
query_node *qn IN the operational node in the query tree.

DESCRIPTION
This method is used to get the next READ_RECORD from the pipeline.
The idea is to call prepare() after you've validated the query then call
get_next to get the first tuple in the pipeline.

RETURN VALUE
Success = next tuple in the result set
Failed = NULL
*/
READ_RECORD *Query_tree::get_next(query_node *qn)
{
READ_RECORD *next_tup = NULL;
DBUG_ENTER("get_next");

/*
For each of the possible node types, perform the query operation
by calling the method for the operation. These implement a very
high-level abstraction of the operation. The real work is left
to the methods.
*/
switch (qn->node_type)
{
/* placeholder for exercises... */
case Query_tree::qntDistinct :
break;

/* placeholder for exercises... */
case Query_tree::qntUndefined :
break;

/* placeholder for exercises... */
case Query_tree::qntSort :
if (qn->preempt_pipeline)
qn->preempt_pipeline = false;
break;

/*
For restrict, get a row (tuple) from the table and
call the do_restrict method looping until a row is returned
(data matches conditions), then return result to main loop
in DBXP_select_command.
*/
case Query_tree::qntRestrict :
do
{
/* if there is a child, get row from child */
if (qn->left != NULL)
next_tup = get_next(qn->left);

/* else get the row from the table stored in this node */
else
{
/* create space for the record buffer */
if (next_tup == NULL)
next_tup = (READ_RECORD *) my_malloc(sizeof(READ_RECORD),
MYF(MY_ZEROFILL | MY_WME));
next_tup->rec_buf = (uchar *) my_malloc(qn->relations[0]->table->s->rec_buff_length,
MYF(MY_ZEROFILL | MY_WME));

/* read row from table (storage handler) */
qn->eof[0] = qn->relations[0]->table->file->ha_rnd_next(next_tup->rec_buf);

/* check for end of file */
if (qn->eof[0] != HA_ERR_END_OF_FILE)
qn->eof[0] = false;
else
{
qn->eof[0] = true;
next_tup = NULL;
}
}

/* if there is a row, call the do_restrict method */
if (next_tup)
if(!do_restrict(qn, next_tup))
{
/* if no row to return, free memory used */
my_free(next_tup->rec_buf);
my_free(next_tup);
next_tup = NULL;
}
} while ((next_tup == NULL) && !Eof(qn));
break;

/*
For project, get a row (tuple) from the table and
call the do_project method. If successful,
return result to main loop in DBXP_select_command.
*/
case Query_tree::qntProject :
/* if there is a child, get row from child */
if (qn->left != NULL)
{
next_tup = get_next(qn->left);
if (next_tup)
if (!do_project(qn, next_tup))
{
/* if no row to return, free memory used */
my_free(next_tup->rec_buf);
my_free(next_tup);
next_tup = NULL;
}
}

/* else get the row from the table stored in this node */
else
{
/* create space for the record buffer */
if (next_tup == NULL)
next_tup = (READ_RECORD *) my_malloc(sizeof(READ_RECORD),
MYF(MY_ZEROFILL | MY_WME));
next_tup->rec_buf = (uchar *) my_malloc(qn->relations[0]->table->s->rec_buff_length + 20,
MYF(MY_ZEROFILL | MY_WME));

/* read row from table (storage handler) */
qn->eof[0] = qn->relations[0]->table->file->ha_rnd_next(next_tup->rec_buf);

/* check for end of file */
if (qn->eof[0] != HA_ERR_END_OF_FILE)
{
qn->eof[0] = false;
}
else
{
qn->eof[0] = true;
next_tup = NULL;
}

/* if there is a row, call the do_project method */
if (next_tup)
{
if (!do_project(qn, next_tup))
{
/* no row to return, free memory used */
my_free(next_tup->rec_buf);
my_free(next_tup);
next_tup = NULL;
}
}
}
break;

/*
For join, loop until either a row is returned from the
do_join method or we are at end of file for both tables.
If successful (data matches conditions),
return result to main loop in DBXP_select_command.
*/
case Query_tree::qntJoin :
do
{
if (next_tup)
{
/* if no row to return, free memory used */
my_free(next_tup->rec_buf);
my_free(next_tup);
next_tup = NULL;
}
next_tup = do_join(qn);
}
while ((next_tup == NULL) && !Eof(qn));
break;
}
DBUG_RETURN(next_tup);
}
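
To put get_next() in context, the following fragment is a simplified sketch of how the caller can drive the pipeline. It is illustrative only, not the chapter's verbatim code: it assumes the Query_tree instance qt, its root node pointer (called root here), and the result_fields list captured during build_query_tree(), and it glosses over the record-buffer memory management shown above.

/* Simplified sketch of the pull loop in DBXP_select_command() (illustration only). */
READ_RECORD *rec = qt->get_next(qt->root);            /* prime the pipeline    */
while (rec != NULL)
{
  send_data(thd->protocol, qt->result_fields, thd);   /* write the current row */
  rec = qt->get_next(qt->root);                       /* pull the next tuple   */
}

Each call loops inside the node operations until a tuple is produced or end of file is reached, which is why the do_... methods return NULL for non-matches.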

The send_data() Method

The send_data() method is a helper routine that writes data to the client, using the MySQL Protocol class to handle the communication chores. This method was borrowed from the MySQL source code and rewritten slightly to accommodate the (relatively) simplistic execution of the DBXP engine. In this case, the Item superclass is used to send the field values to the client via the item->send() method. Listing 14-25 shows the code for the send_data() method.

Listing 14-25. The send_data() Method

/*
Send data

SYNOPSIS
send_data()
Protocol *protocol IN the Protocol class
List<Item> &items IN the list of fields identified in the row
THD *thd IN the current thread

DESCRIPTION
This method sends the data to the client using the protocol class.

RETURN VALUE
Success = 0
Failed = 1
*/
bool send_data(Protocol *protocol, List<Item> &items, THD *thd)
{

/* use a list iterator to loop through items */
List_iterator_fast<Item> li(items);

char buff[MAX_FIELD_WIDTH];
String buffer(buff, sizeof(buff), &my_charset_bin);
DBUG_ENTER("send_data");

/* this call resets the transmission buffers */
protocol->prepare_for_resend();

/* for each item in the list (a field), send data to the client */
Item *item;
while ((item=li++))
{
/*
Use the MySQL send method for the item class to write to network.
If unsuccessful, free memory and send error message to client.
*/
if (item->send(protocol, &buffer))
{
protocol->free(); /* Free used buffer */
my_message(ER_OUT_OF_RESOURCES, ER(ER_OUT_OF_RESOURCES), MYF(0));
break;
}
}
/* increment row count */
thd->inc_sent_row_count(1);

/* if network write was ok, return */
if (thd->vio_ok())
DBUG_RETURN(protocol->write());

DBUG_RETURN(0);
}

The method uses the Item class, calling its send() method and passing in a pointer to an instance of the Protocol class. This is how the data for a field item are written to the client. The send_data() method is where the projection and join column lists are processed to complete those operations. This is one of the nicest touches in the MySQL source code. But how do the MySQL classes know which columns to send? Take a look back at the build_query_tree() method. Recall that there is a field list identified in the select_lex class. The DBXP code captures these fields in the line of code shown here; the list comes directly from the column list in the SELECT command and is populated by the parser code.

qt->result_fields = lex->select_lex.item_list;

These fields are captured in the thread extended structure. The MySQL code simply writes out any data that are present in this list of fields.

The check_rewind() Method

This method is the part of the join algorithm that is most often omitted in database texts. It adjusts the buffers for rows coming from the tables so the algorithm can reuse rows it has already processed. This is necessary because one row from one table may match more than one row from another (in the test data, for example, a single directorate row matches several staff rows). While the concept is relatively straightforward, it can be a challenge to write the code yourself. Fortunately, I’ve saved you the trouble.

The code works by examining the rows in the record buffers. It takes as input pointers to the record buffers along with the previous record pointer in the buffer. The record buffer is implemented as a doubly linked list to allow movement forward and back through the buffers.
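
The declaration below sketches what such a buffer node might look like. The member names follow the way the buffer is used in this chapter's code (record, field_ptr, field_length, next, prev), but the exact types are inferred (uchar and uint are the usual MySQL typedefs), so treat it as illustrative rather than the definition from the DBXP headers.

/* Illustrative sketch of a join record-buffer node (member types inferred). */
struct record_buff
{
  uchar *record;             /* the row image read from the storage engine     */
  uchar *field_ptr;          /* points at the join-column value inside record  */
  uint   field_length;       /* length of the join-column value (for memcmp)   */
  record_buff *next;         /* next row in the buffer                         */
  record_buff *prev;         /* previous row in the buffer (enables rewinding) */
};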

This code must process several conditions in order to keep the flow of data to the do_join() method. These conditions are the result of the evaluation of the join condition(s) after a match has been detected. The result of a failed match is handled in the do_join() method.

· If the next record in the left buffer is a match to the right buffer, rewind the right buffer until the join condition of the right buffer is less than the left.

· If the next record in the left buffer is not a match to the right buffer, set the right buffer to the previous right record pointer.

· If the left record buffer is at the end and there are still records in the right buffer, and if the join value of the previous left record pointer is a match to the right record pointer, set the left record pointer to the previous left record pointer.

The method is implemented with a bias to the left record buffer. In other words, the code keeps the right buffer synchronized with the left buffer (also called a left-deep join execution). Listing 14-26 shows the code for the check_rewind() method.

Listing 14-26. The check_rewind() Method

/*
Adjusts pointers to record buffers for join.

SYNOPSIS
check_rewind()
record_buff *cur_left IN the left record buffer
record_buff *curr_left_prev IN the previous left record buffer
record_buff *cur_right IN the right record buffer
record_buff *curr_right_prev IN the previous right record buffer

DESCRIPTION
This method is used to push a tuple back into the buffer
during a join operation that preempts the pipeline.

NOTES
Now, here's where we have to check the next tuple in each
relation to see if they are the same. If one of them is the
same and the other isn't, push one of them back.

We need to rewind if one of the following is true:
1. The next record in R2 has the same join value as R1
2. The next record in R1 has the same join value as R2
3. The next record in R1 has the same join value and R2 is
different (or EOF)
4. The next record in R2 has the same join value and R1 is
different (or EOF)

RETURN VALUE
Success = int index number
Failed = -1
*/
int Query_tree::check_rewind(record_buff *cur_left,
record_buff *curr_left_prev,
record_buff *cur_right,
record_buff *curr_right_prev)
{
record_buff *left_rcd_ptr = cur_left;
record_buff *right_rcd_ptr = cur_right;
int i;
DBUG_ENTER("check_rewind");

/*
If the next tuple in right record is the same as the present tuple
AND the next tuple in right record is different, rewind until
it is the same
else
Push left record back.
*/

/* if both buffers are at EOF, return -- nothing to do */
if ((left_rcd_ptr == NULL) && (right_rcd_ptr == NULL))
DBUG_RETURN(0);

/* if the currently processed record is null, get the one before it */
if (cur_right == NULL)
right_rcd_ptr = curr_right_prev;

/*
if left buffer is not at end, check to see
if we need to rewind right buffer
*/
if (left_rcd_ptr != NULL)
{
/* compare the join conditions to check order */
i = memcmp(left_rcd_ptr->field_ptr, right_rcd_ptr->field_ptr,
left_rcd_ptr->field_length < right_rcd_ptr->field_length ?
left_rcd_ptr->field_length : right_rcd_ptr->field_length);

/*
i == 0 means the rows are the same. In this case, we need to
check to see if we need to advance or rewind the right buffer.
*/
if (i == 0)
{
/*
If there is a next row in the right buffer, check to see
if it matches the left row. If the right row is greater
than the left row, rewind the right buffer to one previous
to the current row or until we hit the start.
*/
if (right_rcd_ptr->next != NULL)
{
right_rcd_ptr = right_rcd_ptr->next;
i = memcmp(left_rcd_ptr->field_ptr, right_rcd_ptr->field_ptr,
left_rcd_ptr->field_length < right_rcd_ptr->field_length ?
left_rcd_ptr->field_length : right_rcd_ptr->field_length);
if (i > 0)
{
do
{
if (right_rcd_ptr->prev != NULL)
{
right_rcd_ptr = right_rcd_ptr->prev;
i = memcmp(left_rcd_ptr->field_ptr, right_rcd_ptr->field_ptr,
left_rcd_ptr->field_length < right_rcd_ptr->field_length ?
left_rcd_ptr->field_length : right_rcd_ptr->field_length);
}
}
while ((i == 0) && (right_rcd_ptr->prev != NULL));

/* now advance one more to set pointer to correct location */
if (right_rcd_ptr->next != NULL)
right_rcd_ptr = right_rcd_ptr->next;
}
/* if no next right row, rewind to previous row */
else
right_rcd_ptr = right_rcd_ptr->prev;
}
/*
If there is a next row in the left buffer, check to see
if it matches the right row. If there is a match and the right
buffer is not at start, rewind the right buffer to one previous
to the current row.
*/
else if (left_rcd_ptr->next != NULL)
{
if (right_rcd_ptr->prev != NULL)
{
i = memcmp(left_rcd_ptr->field_ptr, right_rcd_ptr->prev->field_ptr,
left_rcd_ptr->field_length < right_rcd_ptr->prev->field_length ?
left_rcd_ptr->field_length : right_rcd_ptr->prev->field_length);
}
if ((i == 0) && (right_rcd_ptr->prev != NULL))
right_rcd_ptr = right_rcd_ptr->prev;
}
}
/* if the left row is less than right row, rewind right buffer */
else if (i < 0)
{
if (right_rcd_ptr->prev != NULL)
right_rcd_ptr = right_rcd_ptr->prev;
}
/* if the right row is less than the left row, advance right row */
else
{
if (right_rcd_ptr->next != NULL)
right_rcd_ptr = right_rcd_ptr->next;
}
}
/*
Rows don't match, so advance the right buffer and check match again.
if they still match, rewind left buffer.
*/
else
{
if (right_rcd_ptr->next != NULL)
{
i = memcmp(curr_left_prev->field_ptr, right_rcd_ptr->field_ptr,
curr_left_prev->field_length < right_rcd_ptr->field_length ?
curr_left_prev->field_length : right_rcd_ptr->field_length);
if (i == 0)
left_rcd_ptr = curr_left_prev;
}
}
/* set buffer pointers to adjusted rows from buffers */
left_record_buffer_ptr = left_rcd_ptr;
right_record_buffer_ptr = right_rcd_ptr;
DBUG_RETURN(0);
}

Now that you’ve had a close look at the source code for the DBXP query execution, it’s time to compile the code and take it for a test ride.

Compiling and Testing the Code

If you haven’t already, download the source code for this chapter and place the files in the /sql directory off the root of your source tree. In the example code, you will also find a difference file you can use to apply the changes to the server source-code files (e.g., mysqld.cc, sql_cmd.h, etc.). Alternatively, you can reuse the changes from Chapter 13, because the changes to the server code are the same.

Spend a few moments looking through the source code so that you are familiar with the methods. Taking the time now will help if you need to debug the code to work with your configuration, or if you want to add other enhancements or work the exercises.

image Tip See Chapter 11 for details on how to add the source files to the project and compile them.

Once you have the new code installed and compiled, you can run the server and perform the tests. You can run the test you created earlier, or you can enter the commands in a MySQL client utility. Listing 14-27 shows the expected output of running the commands listed in the test.

Listing 14-27. Example Test Runs

mysql> DBXP_SELECT first_name, last_name, sex, id FROM staff;

+------------+------------+------+-----------+
| first_name | last_name | sex | id |
+------------+------------+------+-----------+
| John | Smith | M | 333445555 |
| William | Walters | M | 123763153 |
| Alicia | St.Cruz | F | 333444444 |
| Goy | Hong | F | 921312388 |
| Rajesh | Kardakarna | M | 800122337 |
| Monty | Smythe | M | 820123637 |
| Richard | Jones | M | 830132335 |
| Edward | Engles | M | 333445665 |
| Beware | Borg | F | 123654321 |
| Wilma | Maxima | F | 123456789 |
+------------+------------+------+-----------+
10 rows in set (0.01 sec)

mysql> DBXP_SELECT id FROM staff;
+-----------+
| id |
+-----------+
| 333445555 |
| 123763153 |
| 333444444 |
| 921312388 |
| 800122337 |
| 820123637 |
| 830132335 |
| 333445665 |
| 123654321 |
| 123456789 |
+-----------+
10 rows in set (0.00 sec)

mysql> DBXP_SELECT dir_name FROM directorate;
+-----------------+
| dir_name |
+-----------------+
| Development |
| Human Resources |
| Management |
+-----------------+
3 rows in set (0.00 sec)

mysql> DBXP_SELECT id, dir_name FROM staff
JOIN directorate ON staff.mgr_id = directorate.dir_head_id;
+-----------+-----------------+
| id | dir_name |
+-----------+-----------------+
| 123763153 | Human Resources |
| 921312388 | Human Resources |
| 333445555 | Management |
| 123654321 | Management |
| 800122337 | Development |
| 820123637 | Development |
| 830132335 | Development |
| 333445665 | Development |
| 123456789 | Development |
+-----------+-----------------+
9 rows in set (0.00 sec)

mysql> DBXP_SELECT id, dir_name FROM staff, directorate
WHERE staff.mgr_id = directorate.dir_head_id;
+-----------+-----------------+
| id | dir_name |
+-----------+-----------------+
| 123763153 | Human Resources |
| 921312388 | Human Resources |
| 333445555 | Management |
| 123654321 | Management |
| 800122337 | Development |
| 820123637 | Development |
| 830132335 | Development |
| 333445665 | Development |
| 123456789 | Development |
+-----------+-----------------+
9 rows in set (0.00 sec)

mysql> DBXP_SELECT * FROM staff WHERE staff.id = '123456789';
+-----------+------------+----------+-----------+------+--------+-----------+
| id | first_name | mid_name | last_name | sex | salary | mgr_id |
+-----------+------------+----------+-----------+------+--------+-----------+
| 123456789 | Wilma | N | Maxima | F | 43000 | 333445555 |
+-----------+------------+----------+-----------+------+--------+-----------+
1 row in set (0.00 sec)

mysql> DBXP_SELECT first_name, last_name FROM staff join directorate ON staff.mgr_id = directorate.dir_head_id WHERE directorate.dir_code = 'N41';
+------------+------------+
| first_name | last_name |
+------------+------------+
| Rajesh | Kardakarna |
| Monty | Smythe |
| Richard | Jones |
| Edward | Engles |
| Wilma | Maxima |
+------------+------------+
5 rows in set (0.00 sec)

mysql> DBXP_SELECT * FROM directorate JOIN building ON directorate.dir_code = building.dir_code;
+----------+-----------------+-------------+----------+----------+
| dir_code | dir_name | dir_head_id | dir_code | building |
+----------+-----------------+-------------+----------+----------+
| M00 | Management | 333444444 | M00 | 1000 |
| N01 | Human Resources | 123654321 | N01 | 1453 |
| N41 | Development | 333445555 | N41 | 1300 |
| N41 | Development | 333445555 | N41 | 1301 |
| N41 | Development | 333445555 | N41 | 1305 |
+----------+-----------------+-------------+----------+----------+
5 rows in set (0.00 sec)

mysql> DBXP_SELECT directorate.dir_code, dir_name, building, dir_head_id
FROM directorate JOIN building ON directorate.dir_code = building.dir_code;
+----------+-----------------+----------+-------------+
| dir_code | dir_name | building | dir_head_id |
+----------+-----------------+----------+-------------+
| M00 | Management | 1000 | 333444444 |
| N01 | Human Resources | 1453 | 123654321 |
| N41 | Development | 1300 | 333445555 |
| N41 | Development | 1301 | 333445555 |
| N41 | Development | 1305 | 333445555 |
+----------+-----------------+----------+-------------+
5 rows in set (0.00 sec)

mysql>

Summary

In this chapter, I presented the internal database query-execution operations. You learned how to expand the concept of the query tree to incorporate a query-execution engine that uses the tree structure in the execution process. This knowledge should give you a greater understanding of the DBXP engine and how it can be used to study database technologies in more depth.

You’ve reached the end of the book and may be wondering what else there is to do. This part of the book has provided you with an experimental engine based in MySQL that will allow you to explore your own implementation of the internal-database technologies. Best of all, you can tweak the DBXP code any way you wish. Perhaps you just want to experiment, but you may also want to implement the union and intersect operations, or just expand the DBXP engine to implement the full set of query features in MySQL. Whatever you choose to do with what you have learned from this section of the book, you can always amaze your friends and coworkers by implementing an alternative query engine for MySQL!

EXERCISES

The following lists several areas for further exploration. They represent the types of activities you might want to conduct as experiments (or as a class assignment) to explore relational-database technologies.

1. Complete the code for the do_join() method to support all of the join types supported in MySQL. Hint: You must be able to identify the type of join before you begin optimization. Look to the parser for details.

2. Examine the code for the check_rewind() method in the Query_tree class. Change the implementation to use temporary tables to avoid high memory usage when joining large tables.

3. Evaluate the performance of the DBXP query engine. Run multiple test runs and record execution times. Compare these results to the same queries using the native MySQL query engine. How does the DBXP engine compare to MySQL?

4. Why is the remove duplicates operation not necessary for the intersect operation? Are there any conditions where this is false? If so, what are they?

5. (advanced) MySQL does not currently support a cross-product or intersect operation (as defined by Date). Change the MySQL parser to recognize these new keywords and process queries such as SELECT * FROM A CROSS B and SELECT * FROM A INTERSECT B, and add these functions to the execution engine. Hint: See the do_join() method.

6. (advanced) Form a more complete list of test queries and examine the limitations of the DBXP engine. What modifications are necessary to broaden the capabilities of the DBXP engine?

1 For simplicity, I use the attribute/column, tuple/row, and relation/table terms interchangeably.

2 Natural joins are equi-joins with the superfluous (duplicate) condition attribute values removed.

3 Forgive my generalization. They may not be mismatches exactly, because it depends on the data stored and their interpretation.

4 Set operations (intersect, union) and sorting are implemented as specialized forms of join operations.

5 Actually, all tuples are passed by reference so the item returned is the same pointer.

6 Some would argue that this is an inherent weakness, and I am inclined to agree.