Optimizer Transformations - The Cost-Based Optimizer - Expert Oracle SQL: Optimization, Deployment, and Statistics (2014)

Expert Oracle SQL: Optimization, Deployment, and Statistics (2014)

PART 3. The Cost-Based Optimizer

CHAPTER 13. Optimizer Transformations

We have seen throughout this book that there are many ways to transform one SQL query (or DML subquery) into another without in any way changing the query’s meaning. For all but the most trivial of queries the CBO will consider a number of such transformations, sometimes estimating costing, to determine the best SQL statement on which to perform the final state optimization that we just covered in Chapter 12. This chapter is dedicated to the topic of optimizer transformations; you will see that there are a bewildering number of such transformations. Although not all optimizer transformations are included in this chapter, the vast majority that you will encounter in your day-to-day life are, as well as some that you will rarely encounter.

You may wonder why all this effort has been put into CBO transformations. After all, you may feel that it is a programmer’s responsibility to write SQL in an efficient way, and you would be correct in that feeling. However, you will see as this chapter progresses that there are some ways to rewrite a SQL statement that a mere mortal is unlikely to consider, and that these esoteric rewrites sometimes yield highly efficient execution plans.

However, some CBO transformations, such as join elimination, which I will cover later in this chapter, seem only to apply to very poorly written code. There are several reasons that such transformations are important:

· Queries often reference data dictionary views designed for a multitude of purposes. Once these views are expanded into a statement, the result is often obviously inefficient.

· Sometimes SQL code is generated by graphical tools or other language processors. Automatically generated code often seems hopelessly naïve.

· Well, there are actually quite a lot of bad SQL programmers out there!

There is a special class of transformations that I call no-brainers that I will discuss very shortly. With the exception of the no-brainers, all optimizer transformations can be disabled in one fell swoop by the NO_QUERY_TRANSFORMATION hint. This hint is sometimes useful for analysis and debugging, but I have never used it in a production SQL statement. However, in some cases, you will need to force or disable specific transformations in particular contexts. With the exception of the no-brainers, all optimizer transformations can be forced, where legal, with a hint that is specific to that transformation and can be disabled with another hint that is usually, but not always, generated by prefixing the characters NO_ to the start of the hint. For example, we have already discussed view merging in Chapter 8: the MERGE hint forces view merging while theNO_MERGE hint disables it. Two exceptions to this rule are:

· The or-expansion transformation is forced with the USE_CONCAT hint and disabled with the NO_EXPAND hint.

· Factored subquery materialization is forced with the MATERIALIZE hint and disabled with the INLINE hint.

Most of the query transformations explained in the chapter can be expressed in terms of legal SQL; in other words, you might have written the transformed SQL yourself in the first place. However, we have already seen in Chapter 11 that subquery unnesting can generate semi-joins andanti-joins that have no equivalent legal SQL syntax, and here we will discuss group by pushdown, another example of a transformation that cannot be expressed by legal SQL. Please bear in mind as you read the listings in this chapter that are designed to demonstrate individual transformations that the transformed query text may not be an exact representation. The listings show the original and, where possible, the transformed query together with the associated execution plans and the hints that force and disable the transformations in question.

The fact that the CBO can perform a number of transformations, one after the other, makes the task of grasping what the CBO is up to somewhat difficult. The good news is that, with the exception of the no-brainers, the hints that are associated with the transformations that the CBO has chosen will appear in the outline section of the plans displayed as a result of the DBMS_XPLAN functions, as I described in Chapter 8. You will, however, never see hints like NO_MERGE or NO_EXPAND in the outline section of a displayed execution plan because the presence of OUTLINEand OUTLINE_LEAF hints means that the absence of hints like MERGE or USE_CONCAT implies that the transformations haven’t been applied.

This introduction has had to make special mention of no-brainer optimizer transformations on several occasions. Let us get to those now.

No-brainer Transformations

I am sure you will not be surprised to read that Oracle does not officially refer to any of the CBO transformations as no-brainers; this is just a term that I have coined myself to refer to a special set of transformations that have some unusual properties. These transformations are so fundamental that they have no associated hints, so they cannot be forced or disabled. In fact, these transformations aren’t even disabled by the NO_QUERY_TRANSFORMATION hint. So invisible are these transformations that there are only two ways to know that they exist:

· You can wade through a 10053 trace file. That is what Christian Antognini did.

· You can read Christian’s book, Troubleshooting Oracle Performance. That is what I did!

There is some merit to discussing a couple of these no-brainers, but I won’t attempt to be exhaustive. Let us start with the count transformation.

Count Transformation

Take a look at the two queries and their associated execution plans, shown in Listing 13-1.

Listing 13-1. Count transformation

SELECT COUNT (cust_income_level) FROM sh.customers;
----------------------------------------
| Id | Operation | Name |
----------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | SORT AGGREGATE | |
| 2 | TABLE ACCESS FULL| CUSTOMERS |
----------------------------------------

SELECT COUNT (cust_id) FROM sh.customers;
--------------------------------------------------------------
| Id | Operation | Name |
--------------------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | SORT AGGREGATE | |
| 2 | BITMAP CONVERSION COUNT | |
| 3 | BITMAP INDEX FAST FULL SCAN| CUSTOMERS_GENDER_BIX |
--------------------------------------------------------------

Both the execution plans and the results of the two queries are different. It turns out that the CUST_INCOME_LEVEL column in SH.CUSTOMERS does not have a NOT NULL constraint and as such the column may be, and indeed is, NULL for some rows. Such rows are not counted in the first query of Listing 13-1. Since CUST_INCOME_LEVEL is not indexed, the only way to execute COUNT (CUST_INCOME_LEVEL) is to perform a full table scan. However, CUST_ID is NOT NULL and as such COUNT (CUST_ID) can be transformed to COUNT (*). We could implement COUNT (*) by performing a full table scan, but we can also implement COUNT (*) by means of a bitmap index or a B-tree index on a NOT NULL column. In the case of the second query, the index CUSTOMERS_GENDER_BIX is a bitmap index on a non-null column so it is a doubly suitable, cheap alternative to a full table scan.

Getting bored? The next transformation is more interesting, but not by much.

Predicate Move-around

Consider the query and execution plan in Listing 13-2.

Listing 13-2. Predicate move-around

WITH cust_q
AS ( SELECT cust_id, promo_id, SUM (amount_sold) cas
FROM sh.sales
GROUP BY cust_id, promo_id)
,prod_q
AS ( SELECT prod_id, promo_id, SUM (amount_sold) pas
FROM sh.sales
WHERE promo_id = 999
GROUP BY prod_id, promo_id)
SELECT promo_id
,prod_id
,pas
,cust_id
,cas
FROM cust_q NATURAL JOIN prod_q;

----------------------------------------
| Id | Operation | Name |
----------------------------------------
| 0 | SELECT STATEMENT | |
|* 1 | HASH JOIN | |
| 2 | VIEW | |
| 3 | HASH GROUP BY | |
| 4 | PARTITION RANGE ALL| |
|* 5 | TABLE ACCESS FULL | SALES |
| 6 | VIEW | |
| 7 | HASH GROUP BY | |
| 8 | PARTITION RANGE ALL| |
|* 9 | TABLE ACCESS FULL | SALES |
----------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("CUST_Q"."PROMO_ID"="PROD_Q"."PROMO_ID")
5 - filter("PROMO_ID"=999)
9 - filter("PROMO_ID"=999)

The query in Listing 13-2 doesn’t do anything meaningful. For each combination of product and customer it lists the total amount of sales for PROMO_ID 999 for the customer and the total amount of sales for PROMO_ID 999 for the product. There are two factored subqueries and only one WHERE clause, but the natural join means that, by using transitive closure, we can apply the predicate in the other subquery as well. However, Listing 13-3 makes the query a little more complicated and thus the predicate move-around transformation ceases to function.

Listing 13-3. Predicate move-around not recognized

WITH cust_q
AS ( SELECT cust_id
,promo_id
,SUM (amount_sold) cas
,MAX (SUM (amount_sold)) OVER (PARTITION BY promo_id) max_cust
FROM sh.sales
GROUP BY cust_id, promo_id)
,prod_q
AS ( SELECT prod_id
,promo_id
,SUM (amount_sold) pas
,MAX (SUM (amount_sold)) OVER (PARTITION BY promo_id) max_prod
FROM sh.sales
WHERE promo_id = 999
GROUP BY prod_id, promo_id)
SELECT promo_id
,prod_id
,pas
,cust_id
,cas
FROM cust_q NATURAL JOIN prod_q
WHERE cust_q.cas = cust_q.max_cust AND prod_q.pas = prod_q.max_prod;

------------------------------------------
| Id | Operation | Name |
------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | MERGE JOIN | |
|* 2 | VIEW | |
| 3 | WINDOW BUFFER | |
| 4 | SORT GROUP BY | |
| 5 | PARTITION RANGE ALL | |
| 6 | TABLE ACCESS FULL | SALES |
|* 7 | SORT JOIN | |
|* 8 | VIEW | |
| 9 | WINDOW BUFFER | |
| 10 | SORT GROUP BY | |
| 11 | PARTITION RANGE ALL| |
|* 12 | TABLE ACCESS FULL | SALES |
------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("CUST_Q"."CAS"="CUST_Q"."MAX_CUST")
7 - access("CUST_Q"."PROMO_ID"="PROD_Q"."PROMO_ID")
filter("CUST_Q"."PROMO_ID"="PROD_Q"."PROMO_ID")
8 - filter("PROD_Q"."PAS"="PROD_Q"."MAX_PROD")
12 - filter("PROMO_ID"=999)

Listing 13-3 does something a lot more meaningful: it identifies the one product that sold the most for PROMO_ID 999 and the one customer for PROMO_ID 999 that bought the most. But the addition of the analytic functions into the subqueries has caused the CBO to lose confidence in the correctness of predicate move-around. To make the query efficient you will have to add the WHERE clause to the CUST_Q factored subquery yourself.

There are one or two other no-brainer transformations:

· Filter pushdown is a simpler variant of predicate move-around.

· Distinct elimination removes a redundant DISTINCT operation from a query when the operands involve all columns of the primary key, all columns of a unique key that are NOT NULL, or the ROWID.

· Select list pruning removes items from the select list of a subquery that aren’t referenced.

The key point involving these transformations is that there is no possible downside to their application. For example, there is no chance that you can make a query slower by removing unreferenced items from the select list in a subquery. All of this is of some interest, but not very much, so let us move on to the second group of transformations: set and join transformations.

Set and Join Transformations

There are several transformations related to set and join operations, most of which have only been implemented recently, which is surprising given how useful many of them are. Let us get going with join elimination.

Join Elimination

Join elimination is one of those transformations that can be demonstrated with inline views and factored subqueries but is really intended for general purpose data dictionary views. I will demonstrate this focus on data dictionary views in Listing 13-4, but I will assume the point is understood in future examples.

Listing 13-4. Join elimination with data dictionary views

CREATE OR REPLACE VIEW cust_sales
AS
WITH sales_q
AS ( SELECT cust_id
,SUM (amount_sold) amount_sold
,AVG (amount_sold) avg_sold
,COUNT (*) cnt
FROM sh.sales s
GROUP BY cust_id)
SELECT c.*
,s.cust_id AS sales_cust_id
,s.amount_sold
,s.avg_sold
,cnt
FROM sh.customers c JOIN sales_q s ON c.cust_id = s.cust_id;

SELECT sales_cust_id
,amount_sold
,avg_sold
,cnt
FROM cust_sales;

-----------------------------------------------
| Id | Operation | Name |
-----------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | NESTED LOOPS | |
| 2 | VIEW | |
| 3 | HASH GROUP BY | |
| 4 | PARTITION RANGE ALL| |
| 5 | TABLE ACCESS FULL | SALES |
|* 6 | INDEX UNIQUE SCAN | CUSTOMERS_PK |
-----------------------------------------------

CREATE OR REPLACE VIEW cust_sales
AS
WITH sales_q
AS ( SELECT cust_id
,SUM (amount_sold) amount_sold
,AVG (amount_sold) avg_sold
,COUNT (*) cnt
FROM sh.sales s
GROUP BY cust_id)
SELECT c.*
,s.cust_id AS sales_cust_id
,s.amount_sold
,s.avg_sold
,cnt
FROM sh.customers c RIGHT JOIN sales_q s ON c.cust_id = s.cust_id;

SELECT /*+ eliminate_join(@SEL$13BD1B6A c@sel$2) */
/* no_eliminate_join(@SEL$13BD1B6A c@sel$2) */sales_cust_id
,amount_sold
,avg_sold
,cnt
FROM cust_sales;

-- Untransformed execution plan (NO_ELIMINATE_JOIN)

-----------------------------------------------
| Id | Operation | Name |
-----------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | NESTED LOOPS OUTER | |
| 2 | VIEW | |
| 3 | HASH GROUP BY | |
| 4 | PARTITION RANGE ALL| |
| 5 | TABLE ACCESS FULL | SALES |
|* 6 | INDEX UNIQUE SCAN | CUSTOMERS_PK |
-----------------------------------------------

-- Transformed query

SELECT cust_id AS sales_cust_id
,SUM (amount_sold) amount_sold
,AVG (amount_sold) avg_sold
,COUNT (*) cnt
FROM sh.sales s
GROUP BY cust_id;

-- Transformed execution plan (default)

--------------------------------------
| Id | Operation | Name |
--------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | HASH GROUP BY | |
| 2 | PARTITION RANGE ALL| |
| 3 | TABLE ACCESS FULL | SALES |
--------------------------------------

Listing 13-4 begins by creating data dictionary view CUST_SALES that adds some columns to the SH.CUSTOMERS example schema table, which are obtained by performing aggregation on the SH.SALES table. The query then uses the CUST_SALES view but is only interested in the columns from the SH.SALES table. Unfortunately, the aggregation operation has caused the CBO to lose track of the significance of the referential integrity constraint that ensures that all rows in SH.SALES have exactly one corresponding row in SH.CUSTOMERS. As a result of this the CBO does not apply the join elimination optimization.

Listing 13-4 continues by redefining the CUST_SALES view to use an outer join. This confirms to the CBO that the join of the aggregated SH.SALES data with SH.CUSTOMERS will not result in the loss of any rows and the fact that the join column CUST_ID is the primary key ofSH.CUSTOMERS means that the join will not result in the addition of any rows either. The CBO is now confident that the elimination of the join from SH.SALES to SH.CUSTOMERS will not affect the result and proceeds with the transformation.

The second query in Listing 13-4 includes two comments. The first is a hint to force the default behavior. If you remove the first comment and add a plus sign to the second you can disable the transformation. Unfortunately, because the join happens inside a view you have to look at the outline section of DBMS_XPLAN.DISPLAY to identify the query block and generate a global hint if you really want to disable the transformation. Listing 13-4 also shows the execution plan associated with the untransformed query, the transformed query, and the transformed execution plan.

Have a careful look at Listing 13-4 because I will be using this approach as I go through other transformations.

There is one other interesting application of the join elimination that applies to nested tables. Because nested tables offer no performance or administrative benefits they aren’t seen much in schemas designed in the 21st century. But just in case you are ever need to work with a 20th-century schema, take a look at Listing 13-5, which references the PM.PRINT_MEDIA example schema.

Listing 13-5. Join elimination on a nested table

SELECT /*+ eliminate_join(parent_tab) */
/* no_eliminate_join(parent_tab) */
nested_tab.document_typ, COUNT (*) cnt
FROM pm.print_media parent_tab
,TABLE (parent_tab.ad_textdocs_ntab)nested_tab
GROUP BY nested_tab.document_typ;

-- Untransformed execution plan (NO_ELIMINATE_JOIN)

--------------------------------------------------
| Id | Operation | Name |
--------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | HASH GROUP BY | |
| 2 | NESTED LOOPS | |
| 3 | TABLE ACCESS FULL| TEXTDOCS_NESTEDTAB |
|* 4 | INDEX UNIQUE SCAN| SYS_C0011792 |
--------------------------------------------------

-- Transformed execution plan (default)

------------------------------------------------------------------------
| Id | Operation | Name |
------------------------------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | HASH GROUP BY | |
| 2 | TABLE ACCESS BY INDEX ROWID BATCHED| TEXTDOCS_NESTEDTAB |
|* 3 | INDEX FULL SCAN | SYS_FK0000091775N00007$ |
------------------------------------------------------------------------

What is interesting about Listing 13-5 is that prior to the introduction of the join elimination transformation in 10gR2 the possibility of referencing a nested table without its parent didn’t exist. There is no valid SQL syntax to express the results of the join elimination transformation!

You might think that join elimination should be a no-brainer: there can never be any performance benefit to retaining the join. However, I suspect that the creation of hints to control transformations is now a standard, and, if so, I approve. Seeing hints in the outline section ofDBMS_XPLAN functions means that you can work out what is happening more easily and allows you to disable the transformation for educational or investigative purposes.

Outer Join to Inner Join

Sometimes an outer join can’t be eliminated, but it can be converted to an inner join. Let us take another look at the second definition of the CUST_SALES view in Listing 13-4. Although the change to the view definition helped queries that only needed aggregated SH.SALES data, it made queries that do require columns from SH.CUSTOMERS less efficient. If we change such queries we can eliminate that overhead. Take a look at Listing 13-6.

Listing 13-6. Outer join to inner join transformation

SELECT /*+ outer_join_to_inner(@SEL$13BD1B6A c@sel$2) */
/* no_outer_join_to_inner(@SEL$13BD1B6A c@sel$2) */
*
FROM cust_sales c
WHERE cust_id IS NOT NULL;

-- Untransformed execution plan (NO_OUTER_JOIN_TO_INNER)
---------------------------------------------
| Id | Operation | Name |
---------------------------------------------
| 0 | SELECT STATEMENT | |
|* 1 | FILTER | |
|* 2 | HASH JOIN OUTER | |
| 3 | VIEW | |
| 4 | HASH GROUP BY | |
| 5 | PARTITION RANGE ALL| |
| 6 | TABLE ACCESS FULL | SALES |
| 7 | TABLE ACCESS FULL | CUSTOMERS |
---------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("C"."CUST_ID" IS NOT NULL)
2 - access("C"."CUST_ID"(+)="S"."CUST_ID")

-- Transformed query

WITH sales_q
AS ( SELECT cust_id
,SUM (amount_sold) amount_sold
,AVG (amount_sold) avg_sold
,COUNT (*) cnt
FROM sh.sales s
GROUP BY cust_id)
SELECT c.*
,s.cust_id AS sales_cust_id
,s.amount_sold
,s.avg_sold
,cnt
FROM sh.customers c JOIN sales_q s ON c.cust_id = s.cust_id;

-- Transformed execution plan (default)

--------------------------------------------
| Id | Operation | Name |
--------------------------------------------
| 0 | SELECT STATEMENT | |
|* 1 | HASH JOIN | |
| 2 | VIEW | |
| 3 | HASH GROUP BY | |
| 4 | PARTITION RANGE ALL| |
| 5 | TABLE ACCESS FULL | SALES |
| 6 | TABLE ACCESS FULL | CUSTOMERS |
--------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("C"."CUST_ID"="S"."CUST_ID")

Leaving aside the referential integrity constraint for a minute, the potential difference between an outer join and an inner join with SH.CUSTOMERS is that extra rows might be added with C.CUST_ID NULL. The presence of the WHERE cust_id IS NOT NULL predicate means that any such additional rows would be eliminated, so the CBO will apply the transformation from an outer join to an inner join unless prohibited from doing so by a hint.

Full Outer Join to Outer Join

A logical extension to the outer-join-to-inner-join transformation is the full-outer-join-to-outer-join transformation. Precisely the same principles apply. If you understood Listing 13-6, Listing 13-7 should be self-explanatory.

Listing 13-7. Full outer join to outer join

SELECT /*+ full_outer_join_to_outer(cust) */
/* no_full_outer_join_to_outer(cust) */
*
FROM sh.countries FULL OUTER JOIN sh.customers cust USING (country_id)
WHERE cust.cust_id IS NOT NULL;

-- Untransformed execution plan (NO_FULL_OUTER_JOIN_TO_OUTER)
-------------------------------------------
| Id | Operation | Name |
-------------------------------------------
| 0 | SELECT STATEMENT | |
|* 1 | VIEW | VW_FOJ_0 |
|* 2 | HASH JOIN FULL OUTER| |
| 3 | TABLE ACCESS FULL | COUNTRIES |
| 4 | TABLE ACCESS FULL | CUSTOMERS |
-------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("CUST"."CUST_ID" IS NOT NULL)
2 - access("COUNTRIES"."COUNTRY_ID"="CUST"."COUNTRY_ID")

-- Transformed query

SELECT *
FROM sh.countries RIGHT OUTER JOIN sh.customers cust USING (country_id);

-- Transformed execution plan (default)

-------------------------------------------
| Id | Operation | Name |
-------------------------------------------
| 0 | SELECT STATEMENT | |
|* 1 | HASH JOIN RIGHT OUTER| |
| 2 | TABLE ACCESS FULL | COUNTRIES |
| 3 | TABLE ACCESS FULL | CUSTOMERS |
-------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("COUNTRIES"."COUNTRY_ID"(+)="CUST"."COUNTRY_ID")

Of course, if we were to add a predicate such as WHERE COUNTRIES.COUNTRY_NAME IS NOT NULL to Listing 13-7 we could apply the outer-join-to-inner-join transformation as well and we would get an inner join.

Semi-Join to Inner Join

Although the semi-join-to-inner-join transformation has been around for some time, the hint to manage the transformation is new to 11.2.0.3. Listing 13-8 demonstrates.

Listing 13-8. Semi-join to inner join

SELECT *
FROM sh.sales s
WHERE EXISTS
(SELECT /*+ semi_to_inner(c) */
/* no_semi_to_inner(c) */
1
FROM sh.customers c
WHERE c.cust_id = s.cust_id
AND cust_first_name = 'Abner'
AND cust_last_name = 'Everett');

-- Untransformed execution plan (NO_SEMI_TO_INNER)

------------------------------------------
| Id | Operation | Name |
------------------------------------------
| 0 | SELECT STATEMENT | |
|* 1 | HASH JOIN RIGHT SEMI| |
|* 2 | TABLE ACCESS FULL | CUSTOMERS |
| 3 | PARTITION RANGE ALL| |
| 4 | TABLE ACCESS FULL | SALES |
------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("C"."CUST_ID"="S"."CUST_ID")
2 - filter("CUST_FIRST_NAME"='Abner' AND "CUST_LAST_NAME"='Everett')

-- Transformed query (approximate)

WITH q1
AS (SELECT DISTINCT cust_id
FROM sh.customers
WHERE cust_first_name = 'Abner' AND cust_last_name = 'Everett')
SELECT s.*
FROM sh.sales s JOIN q1 ON s.cust_id = q1.cust_id;

-- Transfomed execution plan (default)
-------------------------------------------------------------
| Id | Operation | Name |
-------------------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | NESTED LOOPS | |
| 2 | NESTED LOOPS | |
| 3 | SORT UNIQUE | |
|* 4 | TABLE ACCESS FULL | CUSTOMERS |
| 5 | PARTITION RANGE ALL | |
| 6 | BITMAP CONVERSION TO ROWIDS | |
|* 7 | BITMAP INDEX SINGLE VALUE | SALES_CUST_BIX |
| 8 | TABLE ACCESS BY LOCAL INDEX ROWID| SALES |
-------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - filter("CUST_FIRST_NAME"='Abner' AND "CUST_LAST_NAME"='Everett')
7 - access("C"."CUST_ID"="S"."CUST_ID")

The transformation shown in Listing 13-8 allows additional join orders for nested loops and merge semi-joins. Prior to the creation of this transformation semi-joins required that the unnested subquery be the probe row source of a nested loops or merge semi-join (join inputs can be swapped for hash joins). Now that we can transform the semi-join to an inner join, we can use the subquery as the driving row source.

Notice that there is a SORT UNIQUE operation in the transformed execution plan. The purpose of this operation is to ensure that any row from SH.SALES appears at most once in the result set. However, if you take the transformed query and run DBMS_XPLAN.DISPLAY on that query you will not see this SORT UNIQUE operation. This is because the no-brainer distinct elimination transformation is performed if the transformed query is directly executed or explained. I imagine that the redundant SORT UNIQUE operation introduced by the transformation will disappear in a future release.

Subquery Unnesting

We have already seen the subquery unnesting transformation in the context of semi-joins and anti-joins in Chapter 11. We will also be looking at unnesting of subqueries in the select list in Chapter 14. However, there are one or two other variants of subquery unnesting that I want to address here. Take a look at Listing 13-9.

Listing 13-9. Subquery unnesting with decorrelation

SELECT c.cust_first_name, c.cust_last_name, s1.amount_sold
FROM sh.customers c, sh.sales s1
WHERE s1.amount_sold = (SELECT /* unnest */
/*+ no_unnest */
MAX (amount_sold)
FROM sh.sales s2
WHERE s1.cust_id= s2.cust_id)
AND c.cust_id = s1.cust_id;

-- Untransformed execution plan (NO_UNNEST)

-----------------------------------------------------------------------
| Id | Operation | Name |
-----------------------------------------------------------------------
| 0 | SELECT STATEMENT | |
|* 1 | FILTER | |
|* 2 | HASH JOIN | |
| 3 | TABLE ACCESS FULL | CUSTOMERS |
| 4 | PARTITION RANGE ALL | |
| 5 | TABLE ACCESS FULL | SALES |
| 6 | SORT AGGREGATE | |
| 7 | PARTITION RANGE ALL | |
| 8 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| SALES |
| 9 | BITMAP CONVERSION TO ROWIDS | |
|* 10 | BITMAP INDEX SINGLE VALUE | SALES_CUST_BIX |
-----------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("S1"."AMOUNT_SOLD"= (SELECT /*+ NO_UNNEST */
MAX("AMOUNT_SOLD") FROM "SH"."SALES" "S2" WHERE "S2"."CUST_ID"=:B1))
2 - access("C"."CUST_ID"="S1"."CUST_ID")
10 - access("S2"."CUST_ID"=:B1

-- Transformed query

WITH vw_sq_1
AS ( SELECT cust_id AS sq_cust_id, MAX (amount_sold) max_amount_sold
FROM sh.sales s2
GROUP BY cust_id)
SELECT c.cust_first_name, c.cust_last_name, s1.amount_sold
FROM sh.sales s1, sh.customers c, vw_sq_1
WHERE s1.amount_sold = vw_sq_1.max_amount_sold
AND s1.cust_id = vw_sq_1.sq_cust_id
AND s1.cust_id = c.cust_id;

-- Transfomed execution plan (default)
-----------------------------------------------------
| Id | Operation | Name |
-----------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | NESTED LOOPS | |
| 2 | NESTED LOOPS | |
|* 3 | HASH JOIN | |
| 4 | VIEW | VW_SQ_1 |
| 5 | HASH GROUP BY | |
| 6 | PARTITION RANGE ALL | |
| 7 | TABLE ACCESS FULL | SALES |
| 8 | PARTITION RANGE ALL | |
| 9 | TABLE ACCESS FULL | SALES |
|* 10 | INDEX UNIQUE SCAN | CUSTOMERS_PK |
| 11 | TABLE ACCESS BY INDEX ROWID| CUSTOMERS |
-----------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("S1"."AMOUNT_SOLD"="MAX(AMOUNT_SOLD)" AND
"S1"."CUST_ID"="ITEM_1")
10 - access("C"."CUST_ID"="S1"."CUST_ID")

The original subquery in the WHERE clause is a correlated subquery, meaning that in the untransformed query that subquery is logically executed for every row returned by the join. However, in the transformed query the aggregation is performed just once, aggregating for every value ofCUST_ID.

Subquery unnesting is rather unusual in that it is sometimes applied as a cost-based transformation and sometimes as a heuristic transformation. If you use a construct such as WHERE C1 = (<subquery>), subquery unnesting is cost based. However, in all the listings in this book, subquery unnesting is applied as a heuristic transformation, meaning that the CBO does not use the cost-based framework to determine whether to apply the transformation. In common with all the other transformations we have discussed so far (except the semi-to-inner transformation), subquery unnesting is always performed (unless hinted) regardless of whether or not performance is expected to improve. In the cases of all the optimizer transformations that we have discussed so far, there hasn’t been any chance that the transformations would harm performance. It definitely is possible for subquery unnesting to harm performance in some cases. We will return to the topic of the potential performance impact of heuristic transformations in Chapters 14 and 18, but for now just make a mental note of it. We need to turn our attention to another case of subquery unnesting, seen in Listing 13-10.

Listing 13-10. Subquery unnesting using window functions

SELECT c.cust_first_name, c.cust_last_name, s1.amount_sold
FROM sh.customers c, sh.sales s1
WHERE amount_sold = (SELECT /* unnest */
/*+ no_unnest */
MAX (amount_sold)
FROM sh.sales s2
WHERE c.cust_id= s2.cust_id)
AND c.cust_id = s1.cust_id;

-- Untransformed execution plan (NO_UNNEST)
-----------------------------------------------------------------------
| Id | Operation | Name |
-----------------------------------------------------------------------
| 0 | SELECT STATEMENT | |
|* 1 | FILTER | |
|* 2 | HASH JOIN | |
| 3 | TABLE ACCESS FULL | CUSTOMERS |
| 4 | PARTITION RANGE ALL | |
| 5 | TABLE ACCESS FULL | SALES |
| 6 | SORT AGGREGATE | |
| 7 | PARTITION RANGE ALL | |
| 8 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| SALES |
| 9 | BITMAP CONVERSION TO ROWIDS | |
|* 10 | BITMAP INDEX SINGLE VALUE | SALES_CUST_BIX |
-----------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("AMOUNT_SOLD"= (SELECT /*+ NO_UNNEST */
MAX("AMOUNT_SOLD") FROM "SH"."SALES" "S2" WHERE "S2"."CUST_ID"=:B1))
2 - access("C"."CUST_ID"="S1"."CUST_ID")
10 - access("S2"."CUST_ID"=:B1)

-- Transformed query

WITH vw_wif_1
AS (SELECT c.cust_first_name
,c.cust_last_name
,s.amount_sold
,MAX (amount_sold) OVER (PARTITION BY s.cust_id) AS item_4
FROM sh.customers c, sh.sales s
WHERE s.cust_id = c.cust_id)
SELECT cust_first_name, cust_last_name, amount_sold
FROM vw_wif_1
WHERE CASE WHEN item_4 = amount_sold THEN ROWID END IS NOT NULL;

-- Transformed execution plan (default)
--------------------------------------------
| Id | Operation | Name |
--------------------------------------------
| 0 | SELECT STATEMENT | |
|* 1 | VIEW | VW_WIF_1 |
| 2 | WINDOW SORT | |
|* 3 | HASH JOIN | |
| 4 | TABLE ACCESS FULL | CUSTOMERS |
| 5 | PARTITION RANGE ALL| |
| 6 | TABLE ACCESS FULL | SALES |
--------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("VW_COL_4" IS NOT NULL)
3 - access("C"."CUST_ID"="S1"."CUST_ID")

The difference in the queries shown in Listing 13-9 and Listing 13-10 is difficult to spot. In fact, the correlating column in Listing 13-9 is S1.CUST_ID and in Listing 13-10 it is C.CUST_ID. Given that there is also a predicate in the queries that equates S1.CUST_ID andC.CUST_ID, it may be a surprise that the resulting execution plan in Listing 13-10 differs so much from that in Listing 13-9. This is an example of a problem with transitive closure, and we will return to this topic in Chapter 14. For now, let us just accept that because the correlating column(C.CUST_ID) is from one table and the subquery is equated with a column from another table (S1.AMOUNT_SOLD), Listing 13-10 uses a different variant of subquery unnesting than did Listing 13-9. We can see that in Listing 13-9 we have two full scans of SH.SALES (bad) but no sort (good). In Listing 13-10 we have only one scan of SH.SALES (good) as well as a sort (bad). If you have a query like the ones in the last two listings it might be best to experiment with a code change to see which execution plan performs best.

The transformed query in Listing 13-10 sorts all the rows from the result of the join and then selects just the highest ranking. Why the convoluted predicate involving a case expression? I wouldn’t want to speculate.

Partial Joins

Partial joins are a new feature of 12cR1, but in my opinion they haven’t been explained well. Let me see if I can provide another angle on things. Take a look at Listing 13-11.

Listing 13-11. Partial join transformation

SELECT /*+ partial_join(iv) */
/* no_partial_join(iv) */
product_id, MAX (it.quantity)
FROM oe.order_items it JOIN oe.inventories iv USING (product_id)
GROUP BY product_id;

-- Untransformed execution plan (NO_PARTIAL_JOIN)
-----------------------------------------------
| Id | Operation | Name |
-----------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | HASH GROUP BY | |
|* 2 | HASH JOIN | |
| 3 | TABLE ACCESS FULL | ORDER_ITEMS |
| 4 | INDEX FAST FULL SCAN| INVENTORY_IX |
-----------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("IT"."PRODUCT_ID"="IV"."PRODUCT_ID")
-- Transformed query

SELECT product_id, MAX (quantity)
FROM oe.order_items it
WHERE EXISTS
(SELECT 1
FROM oe.inventories iv
WHERE it.product_id = iv.product_id)
GROUP BY it.product_id;

-- Transformed execution plan (default)
-----------------------------------------------
| Id | Operation | Name |
-----------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | HASH GROUP BY | |
|* 2 | HASH JOIN SEMI | |
| 3 | TABLE ACCESS FULL | ORDER_ITEMS |
| 4 | INDEX FAST FULL SCAN| INVENTORY_IX |
-----------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("IT"."PRODUCT_ID"="IV"."PRODUCT_ID")

As you can see, the partial join converts an inner join to a semi-join. But earlier we had the semi-to-inner-join transformation, didn’t we? Are you confused? I was. Let me unravel the mystery.

The benefits of the partial join are best explained with a join of two tables that have a many-to-many relationship, so in Listing 13-11 I have temporarily abandoned the SH example schema and used two tables from the OE example schema that have a many-to-many relationship.

Let us take PRODUCT_ID 1797 as an example and see first how the untransformed query processes it. This product forms part of three orders, and the values of QUANTITY in the three matching rows in OE.ORDER_ITEMS are 7, 9, and 12, respectively. PRODUCT_ID 1797 is also in stock at three warehouses, so there are three rows matching PRODUCT_ID 1797 in OE.INVENTORIES. When we join OE.ORDER_ITEMS and OE.INVENTORIES using the PRODUCT_ID we get nine rows for PRODUCT_ID 1797 with values of QUANTITY 7, 9, 12, 7, 9, 12, 7, 9, and 12 (the three rows from OE.ORDER_ITEMS being replicated three times). When we input these nine rows into our HASH GROUP BY operation we get a value of 12 for MAX (QUANTITY). If we had included MIN (QUANTITY), SUM (DISTINCT QUANTITY), AVG (QUANTITY), and SUM (QUANTITY) in our select list we would have got 7, 28, 9⅓, and 84, respectively.

The transformation to the semi-join prevents the duplication of rows from OE.ORDER_ITEMS as a result of any duplicate values of PRODUCT_ID 1797 in OE.INVENTORIES, so the values input to the HASH GROUP BY operation are just 7, 9, and 12. The value of MAX (PRODUCT_ID) is 12, precisely the same as if we had processed all nine rows. The values of MIN (QUANTITY), SUM (DISTINCT QUANTITY), AVG (QUANTITY), and SUM (QUANTITY) based on these rows would be 7, 28, 9⅓, and 28, respectively. All my example aggregate values are the same for the three rows in the transformed query as for the nine rows in the untransformed query, except SUM (QUANTITY), which yields a different result. As a consequence, if you try adding SUM (QUANTITY) to the select list of the query in Listing 13-11 you will find that the partial join is suddenly illegal. In 12.1.0.1 the AVG (QUANTITY) function will also disable partial joins, and I speculate that this is because COUNT (QUANTITY) and SUM (QUANTITY) are both genuinely illegal. This is the sort of thing that might get fixed in a later release.

The main benefit of the partial join is to reduce the number of rows being input to a GROUP BY or DISTINCT operation. In the case of the HASH JOIN SEMI operation in the transformed execution plan, this is accomplished by removing the entries from the hash cluster as soon as one match is found. From then on, no further rows from OE.INVENTORIES with the same PRODUCT_ID will match.

The partial join also has some interesting side benefits when applied to nested loops joins. Take a look at Listing 13-12, which displays edited runtime statistics along with the execution plan.

Listing 13-12. Partial join transformation with nested loops join

BEGIN
FOR r IN ( SELECT/*+ TAGPJ gather_plan_statistics */
cust_id, MAX (s.amount_sold)
FROM sh.customers c JOIN sh.sales s USING (cust_id)
WHERE s.amount_sold > 1782 AND s.prod_id = 18
GROUP BY cust_id)
LOOP
NULL;
END LOOP;
END;

SET LINES 200 PAGES 0

SELECT p.*
FROM v$sql s
,TABLE (
DBMS_XPLAN.display_cursor (s.sql_id
,s.child_number
,'BASIC IOSTATS LAST')) p
WHERE sql_text LIKE 'SELECT /*+ TAGPJ%';

----------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 83 |
| 1 | HASH GROUP BY | | 1 | 3 | 83 |
| 2 | NESTED LOOPS SEMI | | 1 | 3 | 108 |
| 3 | PARTITION RANGE ALL | | 1 | 3 | 108 |
|* 4 | TABLE ACCESS BY LOCAL IND| SALES | 28 | 3 | 108 |
| 5 | BITMAP CONVERSION TO ROW| | 16 | | 9591 |
|* 6 | BITMAP INDEX SINGLE VAL| SALES_PROD_BIX | 16 | | 16 |
|* 7 | INDEX UNIQUE SCAN | CUSTOMERS_PK | 83 | 55500 | 83 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - filter("SALES"."AMOUNT_SOLD">1782)
6 - access("SALES"."PROD_ID"=18)
7 - access("CUSTOMERS"."CUST_ID"="SALES"."CUST_ID")

At first glance the partial join applied to the nested loops join in Listing 13-12 seems pointless. There is a primary key constraint on the CUST_ID column of SH.CUSTOMERS, so there will only be one row obtained by the INDEX UNIQUE SCAN on line 7, so what benefit comes from converting NESTED LOOPS to NESTED LOOPS SEMI? The answer can be seen by looking at the STARTS column of operation 7. Despite the fact that there are 108 rows from SH.SALES that match the predicates SH.PROD_ID=18 and SH.AMOUNT_SOLD > 1782, the INDEX UNIQUE SCAN was only performed 83 times! This is because there are only 83 distinct values of CUST_ID in those 108 rows and the NESTED LOOPS SEMI and NESTED LOOPS ANTI join operations can take advantage of scalar subquery caching. A regular NESTED LOOPS join cannot!

But perhaps we are getting distracted. The main purpose of the partial join transformation is to eliminate rows as early in the process as possible that would otherwise be pointlessly input to a DISTINCT or GROUP BY operation. This requires the conversion of an inner join to a semi-join. On the other hand, the purpose of the semi-to-inner-join transformation is to allow the subquery to act as the driving row source in a NESTED LOOPS or MERGE join operation. These two goals are mutually exclusive. Fortunately, both the partial join and the semi-to-inner-join transformations are cost based, and the CBO will not perform a partial join transformation when it thinks that doing so would create a disadvantageous join order.

Join Factorization

The purpose of the join factorization transformation is to remove common components from the operands of a UNION ALL operation and then apply them once—after the operation. Listing 13-13 provides an example.

Listing 13-13. Join factorization

SELECT /*+ factorize_join(@set$1(s@u1 S@u2)) qb_name(u1) */
/* no_factorize_join(@set$1) */
*
FROM sh.sales s, sh.customers c
WHERE c.cust_first_name = 'Abner'
AND c.cust_last_name = 'Everett'
AND s.cust_id = c.cust_id
/*AND prod_id = 13
AND time_id = DATE '2001-09-13' */

UNION ALL
SELECT /*+ qb_name(u2) */
*
FROM sh.sales s, sh.customers c
WHERE c.cust_first_name = 'Abigail'
AND c.cust_last_name = 'Ruddy'
AND s.cust_id = c.cust_id
/*AND prod_id = 13
AND time_id = DATE '2001-09-13' */;

-- Untransformed execution plan (NO_FACTORIZE_JOIN)

--------------------------------------------------------------
| Id | Operation | Name |
--------------------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | UNION-ALL | |
| 2 | NESTED LOOPS | |
| 3 | NESTED LOOPS | |
|* 4 | TABLE ACCESS FULL | CUSTOMERS |
| 5 | PARTITION RANGE ALL | |
| 6 | BITMAP CONVERSION TO ROWIDS | |
|* 7 | BITMAP INDEX SINGLE VALUE | SALES_CUST_BIX |
| 8 | TABLE ACCESS BY LOCAL INDEX ROWID| SALES |
| 9 | NESTED LOOPS | |
| 10 | NESTED LOOPS | |
|* 11 | TABLE ACCESS FULL | CUSTOMERS |
| 12 | PARTITION RANGE ALL | |
| 13 | BITMAP CONVERSION TO ROWIDS | |
|* 14 | BITMAP INDEX SINGLE VALUE | SALES_CUST_BIX |
| 15 | TABLE ACCESS BY LOCAL INDEX ROWID| SALES |
--------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - filter("C"."CUST_FIRST_NAME"='Abner' AND
"C"."CUST_LAST_NAME"='Everett')
7 - access("S"."CUST_ID"="C"."CUST_ID")
11 - filter("C"."CUST_FIRST_NAME"='Abigail' AND
"C"."CUST_LAST_NAME"='Ruddy')
14 - access("S"."CUST_ID"="C"."CUST_ID")

-- Transformed query

WITH vw_jf
AS (SELECT *
FROM sh.customers c
WHERE c.cust_first_name = 'Abner' AND c.cust_last_name = 'Everett'
UNION ALL
SELECT *
FROM sh.customers c
WHERE c.cust_first_name = 'Abigail' AND c.cust_last_name = 'Ruddy')
SELECT *
FROM sh.sales s, vw_jf
WHERE s.cust_id = vw_jf.cust_id;

-- Transformed execution plan (default)

---------------------------------------------------
| Id | Operation | Name |
---------------------------------------------------
| 0 | SELECT STATEMENT | |
|* 1 | HASH JOIN | |
| 2 | VIEW | VW_JF_SET$F472D255 |
| 3 | UNION-ALL | |
|* 4 | TABLE ACCESS FULL| CUSTOMERS |
|* 5 | TABLE ACCESS FULL| CUSTOMERS |
| 6 | PARTITION RANGE ALL| |
| 7 | TABLE ACCESS FULL | SALES |
---------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

6 - access("TIME_ID"=TO_DATE(' 2001-09-13 00:00:00', 'syyyy-mm-dd
hh24:mi:ss'))
7 - access("PROD_ID"=13)
10 - filter("C"."CUST_FIRST_NAME"='Abner' AND
"C"."CUST_LAST_NAME"='Everett')
11 - access("C"."CUST_ID"="S"."CUST_ID")
12 - filter("C"."CUST_FIRST_NAME"='Abigail' AND
"C"."CUST_LAST_NAME"='Ruddy')
13 - access("C"."CUST_ID"="S"."CUST_ID")

The two branches of the original query in Listing 13-13 both reference the SH.SALES table in identical ways. The transformed query extracts the joins in the two branches and performs just one join after the UNION ALL operation. As you can imagine, this transformation may generate huge savings, but in this case we have replaced two indexed accesses with one full table scan and the benefit is small. In fact, if you add the commented out predicates it would actually be disadvantageous to perform join factorization. Fortunately, join factorization is a cost-based transformation and you will find that when the additional predicates are included the transformation will not be applied unless forced with a hint.

Hinting join factorization is tricky because the hint applies to a set query block. This creates a series of cascading complications.

1. It is not possible to use a local FACTORIZE_JOIN hint, and we need to resort to global hint syntax.

2. The global hint syntax means that we need to refer to the row sources being factorized using the query blocks in which they are contained. Although it wasn’t strictly necessary, I have named the query blocks in the UNION ALL branches as U1 and U2 using QB_NAMEhints to make the example clearer.

3. QB_NAME hinting is only practical with traditional join syntax so I have had to abandon ANSI join syntax in Listing 13-13.

Set to Join

It is possible to express INTERSECT and MINUS set operations as joins. Listing 13-14 shows how this is done.

Listing 13-14. Set to join transformation

SELECT /*+ set_to_join(@set$1) */ /* no_set_to_join(@set$1) */
prod_id
FROM sh.sales s1
WHERE time_id < DATE '2000-01-01' AND cust_id = 13
MINUS
SELECT prod_id
FROM sh.sales s2
WHERE time_id >=DATE '2000-01-01';

-- Untransformed execution plan (default)

------------------------------------------------------------------------------------
| Id | Operation | Name | Cost (%CPU)|
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2396 (99)|
| 1 | MINUS | | |
| 2 | SORT UNIQUE | | 24 (0)|
| 3 | PARTITION RANGE ITERATOR | | 24 (0)|
| 4 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| SALES | 24 (0)|
| 5 | BITMAP CONVERSION TO ROWIDS | | |
|* 6 | BITMAP INDEX SINGLE VALUE | SALES_CUST_BIX | |
| 7 | SORT UNIQUE | | 2372 (1)|
| 8 | PARTITION RANGE ITERATOR | | 275 (2)|
| 9 | TABLE ACCESS FULL | SALES | 275 (2)|
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

6 - access("CUST_ID"=13)

-- Transformed query

SELECT DISTINCT s1.prod_id
FROM sh.sales s1
WHERE time_id < DATE '2000-01-01'
AND cust_id = 13
AND prod_id NOT IN (SELECT prod_id
FROM sh.sales s2
WHERE time_id >=DATE '2000-01-01');

-- Transfomed execution plan (SET_TO_JOIN)

------------------------------------------------------------------------------------
| Id | Operation | Name | Cost (%CPU)|
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 301 (2)|
| 1 | HASH UNIQUE | | 301 (2)|
|* 2 | HASH JOIN ANTI | | 301 (2)|
| 3 | PARTITION RANGE ITERATOR | | 24 (0)|
| 4 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| SALES | 24 (0)|
| 5 | BITMAP CONVERSION TO ROWIDS | | |
|* 6 | BITMAP INDEX SINGLE VALUE | SALES_CUST_BIX | |
| 7 | PARTITION RANGE ITERATOR | | 275 (2)|
| 8 | TABLE ACCESS FULL | SALES | 275 (2)|
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("PROD_ID"="PROD_ID")
6 - access("CUST_ID"=13)

The query in Listing 13-14 identifies the products that CUST_ID 13 bought in the 20th century that were not sold to anyone in the 21st century. The untransformed query performs unique sorts on each of the two branches of the UNION ALL operation before identifying the elements of the result set; this is how MINUS and INTERSECT set operations are always implemented. The application of the transformation means that one of these two sorts can be avoided. But take a look at operation 2 in the execution plan of the transformed query—the workarea for the HASH JOIN ANTI operation includes duplicate values, so we need to ask ourselves whether this transformation is, in fact, worthwhile. In this case CUST_ID 13 only made 19 transactions for 6 products in the 20th century, so eliminating 13 rows from this workarea isn’t a big deal. However, the transformation has avoided a unique sort of the 72 products from the 492,064 rows from SH.SALES relating to transactions in the 21st century. The transformation is a good idea on this occasion.

Despite the fact that the cost estimates show that the CBO believes (correctly) that the set-to-join transformation results in a superior execution plan, the transformation will not be applied to the unhinted query and will normally never be applied unless hinted!

To understand the reason why we need to use the SET_TO_JOIN hint in Listing 13-14, we need to understand a little more about the difference between cost-based transformations and heuristic transformations. When the CBO considers cost-based transformations, it attempts to determine whether the transformation will reduce the elapsed time of the query; if the transformation will reduce the estimated elapsed time of the query—i.e., the cost is lower—the transformation will be applied, otherwise the transformation will not be applied. A heuristic transformation works by applying a rule of some kind. So far all the heuristic transformations that we have come across in this chapter are applied unconditionally unless disabled by a hint. The working assumption in these cases is that the hint will probably, if not certainly, help.,

In the case of the set-to-join heuristic transformation, the heuristic rule is that the transformation is never applied unless hinted, the working assumption being that the cases where the transformation is needed are rare. Given the need for the CBO to come up with a plan in milliseconds, the CBO will not waste its time looking at the set-to-join transformation unless you hint it.

If you like playing around with hidden initialization parameters and don’t care about the support implications of doing so, you can change the value of "_convert_set_to_join". If you change the value of this parameter from its default value of FALSE to TRUE at the session level, the heuristic rule will be altered so that the transformation is always applied. This is the only circumstance in which you would need to use the NO_SET_TO_JOIN hint to disable the transformation.

The set-to-join transformation provides a fitting end to our discussion of set- and join-related transformations. We can now turn our attention to a new class of transformations relating to aggregate operations.

Aggregation Transformations

Aggregations are a surprisingly complex matter in Oracle databases and figure heavily in the life of a SQL tuning specialist. We need to understand what information needs to be aggregated, when it is to be aggregated, and how it is to be aggregated. In recent releases Oracle has included a number of aggregate-related transformations that not only help query performance but also act as excellent educational guides that you can use to aid your analysis of queries containing aggregations. Let us get started with a look at how distinct aggregations can be optimized.

Distinct Aggregation

I don’t know about you, but I have always felt that an aggregate function including the DISTINCT keyword is inherently inefficient and that using it is just a bit lazy. I don’t have to worry now because these days the CBO will convert my concise query into something less concise but more efficient. Take a look at Listing 13-15.

Listing 13-15. Distinct aggregation transformation

SELECT /*+ transform_distinct_agg */
/* no_transform_distinct_agg */
COUNT (DISTINCT cust_id) FROM sh.sales;

-- Untransformed execution plan (NO_TRANSFORM_DISTINCT_AGG)
----------------------------------------------------------------------
| Id | Operation | Name | Cost (%CPU)|
----------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 407 (0)|
| 1 | SORT GROUP BY | | |
| 2 | PARTITION RANGE ALL | | 407 (0)|
| 3 | BITMAP CONVERSION TO ROWIDS | | 407 (0)|
| 4 | BITMAP INDEX FAST FULL SCAN| SALES_CUST_BIX | |
----------------------------------------------------------------------
-- Transformed query

WITH vw_dag
AS ( SELECT cust_id
FROM sh.sales
GROUP BY cust_id)
SELECT COUNT (cust_id)
FROM vw_dag;

-- Transformed execution plan (default)

------------------------------------------------------------------------
| Id | Operation | Name | Cost (%CPU)|
------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 428 (5)|
| 1 | SORT AGGREGATE | | |
| 2 | VIEW | VW_DAG_0 | 428 (5)|
| 3 | HASH GROUP BY | | 428 (5)|
| 4 | PARTITION RANGE ALL | | 407 (0)|
| 5 | BITMAP CONVERSION TO ROWIDS | | 407 (0)|
| 6 | BITMAP INDEX FAST FULL SCAN| SALES_CUST_BIX | |
------------------------------------------------------------------------

The distinct aggregation transformation was introduced in 11gR2 and, as far as I can see, is a heuristic transformation that is unconditionally applied where legal. The estimated cost is higher with the transformation than without, but by separating the job of identifying the distinct values from the job of counting them the query is actually much faster.

Distinct Placement

Distinct placement is a cost-based transformation used from 11gR2 onwards to eliminate duplicate rows as soon as possible. Conceptually, distinct placement and group by placement, which we will discuss in the next section, are very similar. Listing 13-16 shows the basic idea.

Listing 13-16. Distinct placement

SELECT /*+no_partial_join(s1) no_partial_join(s2) place_distinct(s1) */
/* no_partial_join(s1) no_partial_join(s2) no_place_distinct(s1) */
DISTINCT cust_id, prod_id
FROM sh.sales s1 JOIN sh.sales s2 USING (cust_id, prod_id)
WHERE s1.time_id < DATE '2000-01-01' AND s2.time_id >=DATE '2000-01-01';

-- Untransformed execution plan (NO_PLACE_DISTINCT)
---------------------------------------------------------
| Id | Operation | Name | Cost (%CPU)|
---------------------------------------------------------
| 0 | SELECT STATEMENT | | 5302 (1)|
| 1 | HASH UNIQUE | | 5302 (1)|
|* 2 | HASH JOIN | | 1787 (1)|
| 3 | PARTITION RANGE ITERATOR| | 246 (2)|
| 4 | TABLE ACCESS FULL | SALES | 246 (2)|
| 5 | PARTITION RANGE ITERATOR| | 275 (2)|
| 6 | TABLE ACCESS FULL | SALES | 275 (2)|
---------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("S1"."PROD_ID"="S2"."PROD_ID" AND
"S1"."CUST_ID"="S2"."CUST_ID")
-- Transformed query

WITH vw_dtp
AS (SELECT /*+ no_partial_join(s1) no_merge */
DISTINCT s1.cust_id, s1.prod_id
FROM sh.sales s1
WHERE s1.time_id < DATE '2000-01-01')
SELECT /*+ no_partial_join(s2) */
DISTINCT cust_id, prod_id
FROM vw_dtp NATURAL JOIN sh.sales s2
WHERE s2.time_id >=DATE '2000-01-01';

-- Transformed execution plan (default)

---------------------------------------------------------------------
| Id | Operation | Name | Cost (%CPU)|
---------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5595 (1)|
| 1 | HASH UNIQUE | | 5595 (1)|
|* 2 | HASH JOIN | | 3326 (1)|
| 3 | VIEW | VW_DTP_6DE9D1A7 | 2138 (1)|
| 4 | HASH UNIQUE | | 2138 (1)|
| 5 | PARTITION RANGE ITERATOR| | 246 (2)|
| 6 | TABLE ACCESS FULL | SALES | 246 (2)|
| 7 | PARTITION RANGE ITERATOR | | 275 (2)|
| 8 | TABLE ACCESS FULL | SALES | 275 (2)|
---------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("ITEM_2"="S2"."PROD_ID" AND "ITEM_1"="S2"."CUST_ID")

The query in Listing 13-16 finds the combinations of customer and product that occurred in both the 20th and 21st centuries. To keep things simple I have disabled the partial join transformation that would have taken place without the NO_PARTIAL_JOIN hints. The addition of the extraHASH UNIQUE operation means that the workarea used by the HASH JOIN operation on line 2 in the transformed execution plan contains only 72 rows as opposed to the 426,779 rows from SH.SALES associated with 20th century sales, which are placed in the workarea associated with operation 2 of the untransformed query. The cost estimates suggest that the transformation isn’t worth the trouble, but somehow the CBO performs the transformation anyway and the query runs measurably faster.

Group by Placement

Group by placement is implemented in the same way and for the same reasons as distinct placement. Listing 13-17 highlights some new points, however.

Listing 13-17. Group by placement

SELECT /*+ place_group_by((s p)) */
/* no_place_group_by */
cust_id
,c.cust_first_name
,c.cust_last_name
,c.cust_email
,p.prod_category
,SUM (s.amount_sold) total_amt_sold
FROM sh.sales s
JOIN sh.customers c USING (cust_id)
JOIN sh.products p USING (prod_id)
GROUP BY cust_id
,c.cust_first_name
,c.cust_last_name
,c.cust_email
,p.prod_category;

-- Untransformed execution plan (NO_PLACE_GROUP_BY)

----------------------------------------------------------------------
| Id | Operation | Name | Cost (%CPU)|
----------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 19957 (1)|
| 1 | HASH GROUP BY | | 19957 (1)|
|* 2 | HASH JOIN | | 2236 (1)|
| 3 | VIEW | index$_join$_004 | 2 (0)|
|* 4 | HASH JOIN | | |
| 5 | INDEX FAST FULL SCAN| PRODUCTS_PK | 1 (0)|
| 6 | INDEX FAST FULL SCAN| PRODUCTS_PROD_CAT_IX | 1 (0)|
|* 7 | HASH JOIN | | 2232 (1)|
| 8 | TABLE ACCESS FULL | CUSTOMERS | 423 (1)|
| 9 | PARTITION RANGE ALL | | 517 (2)|
| 10 | TABLE ACCESS FULL | SALES | 517 (2)|
----------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("S"."PROD_ID"="P"."PROD_ID")
4 - access(ROWID=ROWID)
7 - access("S"."CUST_ID"="C"."CUST_ID")

-- Transformed query

WITH vw_gbc
AS ( SELECT /*+ no_place_group_by */
s.cust_id
,p.prod_category
,SUM (s.amount_sold) total_amt_sold
FROM sh.sales s JOIN sh.products p USING (prod_id)
GROUP BY s.cust_id, p.prod_category, prod_id)
SELECT /*+ no_place_group_by leading(vw_gbc)
use_hash(c) no_swap_join_inputs(c) */
cust_id
,c.cust_first_name
,c.cust_last_name
,c.cust_email
,vw_gbc.prod_category
,SUM (vw_gbc.total_amt_sold) total_amt_sold
FROM vw_gbc JOIN sh.customers c USING (cust_id)
GROUP BY cust_id
,c.cust_first_name
,c.cust_last_name
,c.cust_email
,vw_gbc.prod_category;

-- Transformed execution plan (default)

-------------------------------------------------------------------------
| Id | Operation | Name | Cost (%CPU)|
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4802 (1)|
| 1 | HASH GROUP BY | | 4802 (1)|
|* 2 | HASH JOIN | | 4319 (1)|
| 3 | VIEW | VW_GBC_1 | 3682 (1)|
| 4 | HASH GROUP BY | | 3682 (1)|
|* 5 | HASH JOIN | | 521 (2)|
| 6 | VIEW | index$_join$_004 | 2 (0)|
|* 7 | HASH JOIN | | |
| 8 | INDEX FAST FULL SCAN| PRODUCTS_PK | 1 (0)|
| 9 | INDEX FAST FULL SCAN| PRODUCTS_PROD_CAT_IX | 1 (0)|
| 10 | PARTITION RANGE ALL | | 517 (2)|
| 11 | TABLE ACCESS FULL | SALES | 517 (2)|
| 12 | TABLE ACCESS FULL | CUSTOMERS | 423 (1)|
-------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("ITEM_1"="C"."CUST_ID")
5 - access("S"."PROD_ID"="P"."PROD_ID")
7 - access(ROWID=ROWID)

The query in Listing 13-17 identifies the total sales by each customer for each product category. The transformed query performs an aggregation after the join of SH.SALES with SH.PRODUCTS so that the workarea for the HASH JOIN on line 2 is small.

There seem to be some restrictions1 on the application of the group by placement transformation, not all of which I have been able to work out. However, it does seem that on this occasion we can’t group the rows from SH.SALES without joining them at least once.

In Listing 13-17 the two operands of the join that produce the final result are:

· The intermediate result set formed by the join of SH.SALES and SH.PRODUCTS

· The table SH.CUSTOMERS

It is possible to perform group by placement on either or both of these operands. You can see the syntax for forcing group placement on the first of these operands in Listing 13-17. The hint PLACE_GROUP_BY ((S P) (C)) would be used to perform group placement on both operands.

Listing 13-18 shows what happens when we remove the hints from the transformed query in Listing 13-17 and then run EXPLAIN PLAN.

Listing 13-18. Group placement in a subquery

WITH vw_gbc
AS ( SELECT /*+ place_group_by((s)) */
/* no_place_group_by */
s.cust_id, p.prod_category, SUM (s.amount_sold) total_amt_sold
FROM sh.sales s JOIN sh.products p USING (prod_id)
GROUP BY s.cust_id, p.prod_category, prod_id)
SELECT /*+ place_group_by((vw_gbc)) */
/* no_place_group_by leading(vw_gbc)use_hash(c) no_swap_join_inputs(c) */
cust_id
,c.cust_first_name
,c.cust_last_name
,c.cust_email
,vw_gbc.prod_category
,SUM (vw_gbc.total_amt_sold) total_amt_sold
FROM vw_gbc JOIN sh.customers c USING (cust_id)
GROUP BY cust_id
,c.cust_first_name
,c.cust_last_name
,c.cust_email
,vw_gbc.prod_category;

-- Untransformed execution plan (two NO_PLACE_GROUP_BY)

-------------------------------------------------------------------------
| Id | Operation | Name | Cost (%CPU)|
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 29396 (1)|
| 1 | HASH GROUP BY | | 29396 (1)|
|* 2 | HASH JOIN | | 11675 (1)|
| 3 | VIEW | | 9046 (1)|
| 4 | HASH GROUP BY | | 9046 (1)|
|* 5 | HASH JOIN | | 521 (2)|
| 6 | VIEW | index$_join$_002 | 2 (0)|
|* 7 | HASH JOIN | | |
| 8 | INDEX FAST FULL SCAN| PRODUCTS_PK | 1 (0)|
| 9 | INDEX FAST FULL SCAN| PRODUCTS_PROD_CAT_IX | 1 (0)|
| 10 | PARTITION RANGE ALL | | 517 (2)|
| 11 | TABLE ACCESS FULL | SALES | 517 (2)|
| 12 | TABLE ACCESS FULL | CUSTOMERS | 423 (1)|
-------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("VW_GBC"."CUST_ID"="C"."CUST_ID")
5 - access("S"."PROD_ID"="P"."PROD_ID")
7 - access(ROWID=ROWID)

-- Transformed query

WITH vw_gbc_1
AS ( SELECT s.cust_id, s.prod_id, SUM (s.amount_sold) AS total_amt_sold
FROM sh.sales s
GROUP BY s.cust_id, s.prod_id)
,vw_gbc_2
AS ( SELECT vw_gbc_1.cust_id, vw_gbc_1.total_amt_sold, p.prod_category
FROM vw_gbc_1 JOIN sh.products p USING (prod_id)
GROUP BY vw_gbc_1.cust_id
,vw_gbc_1.total_amt_sold
,p.prod_category
,prod_id)
,vw_gbc
AS ( SELECT vw_gbc_2.cust_id
,SUM (vw_gbc_2.total_amt_sold) AS total_amt_sold
,vw_gbc_2.prod_category
FROM vw_gbc_2
GROUP BY vw_gbc_2.cust_id, vw_gbc_2.prod_category)
SELECT cust_id
,c.cust_first_name
,c.cust_last_name
,c.cust_email
,vw_gbc.prod_category
,SUM (vw_gbc.total_amt_sold) total_amt_sold
FROM vw_gbc JOIN sh.customers c USING (cust_id)
GROUP BY cust_id
,c.cust_first_name
,c.cust_last_name
,c.cust_email
,vw_gbc.prod_category;

-- Transformed execution plan (default)

---------------------------------------------------------------------------
| Id | Operation | Name | Cost (%CPU)|
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 7224 (1)|
| 1 | HASH GROUP BY | | 7224 (1)|
|* 2 | HASH JOIN | | 6741 (1)|
| 3 | VIEW | VW_GBC_3 | 6104 (1)|
| 4 | HASH GROUP BY | | 6104 (1)|
| 5 | VIEW | | 6104 (1)|
| 6 | HASH GROUP BY | | 6104 (1)|
|* 7 | HASH JOIN | | 3022 (1)|
| 8 | VIEW | index$_join$_002 | 2 (0)|
|* 9 | HASH JOIN | | |
| 10 | INDEX FAST FULL SCAN| PRODUCTS_PK | 1 (0)|
| 11 | INDEX FAST FULL SCAN| PRODUCTS_PROD_CAT_IX | 1 (0)|
| 12 | VIEW | VW_GBC_2 | 3019 (1)|
| 13 | HASH GROUP BY | | 3019 (1)|
| 14 | PARTITION RANGE ALL | | 517 (2)|
| 15 | TABLE ACCESS FULL | SALES | 517 (2)|
| 16 | TABLE ACCESS FULL | CUSTOMERS | 423 (1)|
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("ITEM_1"="C"."CUST_ID")
7 - access("ITEM_1"="P"."PROD_ID")
9 - access(ROWID=ROWID)

Listing 13-18 shows that the transformed query from Listing 13-17 will undergo further transformations if we run or explain it directly. Now that we have two separate query blocks, the CBO can aggregate the data from SH.SALES before any join. The data is aggregated again after we join with SH.PRODUCTS and before we join with SH.CUSTOMERS. There is an extra and unnecessary aggregation introduced at this stage: the operations on lines 4 and 6 could be merged.

This analysis leads to an interesting SQL tuning technique: by looking at what transformations the CBO has performed, we may be able to further improve matters by running the transformation process iteratively. It may be that not all of these transformations will work out in practice, but at least you will have some ideas to try out!

Group by Pushdown

Group by pushdown does for parallel aggregations what distinct and group by placements do for join aggregations. The idea is the same: do as much aggregation as possible as early in the process as you can. Listing 13-19 shows the group by pushdown transformation in action.

Listing 13-19. Group by pushdown

SELECT /*+ parallel gby_pushdown */ /* parallel no_gby_pushdown */
prod_id
,cust_id
,promo_id
,COUNT (*) cnt
FROM sh.sales
WHERE amount_sold > 100
GROUP BY prod_id, cust_id, promo_id;
-- Untransformed execution plan (NO_GBY_PUSHDOWN)

---------------------------------------------------------
| Id | Operation | Name | Cost (%CPU)|
---------------------------------------------------------
| 0 | SELECT STATEMENT | | 289 (3)|
| 1 | PX COORDINATOR | | |
| 2 | PX SEND QC (RANDOM) | :TQ10001 | 289 (3)|
| 3 | HASH GROUP BY | | 289 (3)|
| 4 | PX RECEIVE | | 288 (2)|
| 5 | PX SEND HASH | :TQ10000 | 288 (2)|
| 6 | PX BLOCK ITERATOR | | 288 (2)|
|* 7 | TABLE ACCESS FULL| SALES | 288 (2)|
---------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

7 - filter("AMOUNT_SOLD">100)

-- Concept of the transformed query

WITH pq
AS ( SELECT prod_id
,cust_id
,promo_id
,COUNT (*) cnt
FROM sh.sales
WHERE amount_sold > 100 AND time_id < DATE '2000-01-01'
GROUP BY prod_id, cust_id, promo_id
UNION ALL
SELECT prod_id
,cust_id
,promo_id
,COUNT (*) cnt
FROM sh.sales
WHERE amount_sold > 100 AND time_id >=DATE '2000-01-01'
GROUP BY prod_id, cust_id, promo_id)
SELECT prod_id
,cust_id
,promo_id
,SUM (cnt) AS cnt
FROM pq
GROUP BY prod_id, cust_id, promo_id;

-- Transformed execution plan

----------------------------------------------------------
| Id | Operation | Name | Cost (%CPU)|
----------------------------------------------------------
| 0 | SELECT STATEMENT | | 325 (1)|
| 1 | PX COORDINATOR | | |
| 2 | PX SEND QC (RANDOM) | :TQ10001 | 325 (1)|
| 3 | HASH GROUP BY | | 325 (1)|
| 4 | PX RECEIVE | | 325 (1)|
| 5 | PX SEND HASH | :TQ10000 | 325 (1)|
| 6 | HASH GROUP BY | | 325 (1)|
| 7 | PX BLOCK ITERATOR | | 144 (2)|
|* 8 | TABLE ACCESS FULL| SALES | 144 (2)|
----------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

8 - filter("AMOUNT_SOLD">100)

The query in Listing 13-19 is a straightforward COUNT aggregation performed on SH.SALES in parallel. Without the transformation, each row from SH.SALES where AMOUNT_SOLD > 100 would be sent from one parallel query server to another for aggregation. The concept of this transformation is shown in Listing 13-19 by a UNION ALL factored subquery. Each branch of the UNION ALL is supposed to relate to a parallel query server that reads a portion of the SH.SALES table. The conceptual query uses centuries to partition the data between branches, whereas the parallel query would use block-range granules.

In the conceptual query, each branch of the UNION ALL performs its own COUNT aggregation before the main query performs a SUM aggregation on the result to obtain the final result. In the parallel query, each parallel query slave that reads SH.SALES performs a COUNT aggregation on its subset of data so that the number of rows sent to DFO:TQ10001 for a final SUM aggregation is much reduced.

Once again, we can see that the CBO has performed a transformation that actually increases the estimated cost. The evidence is mounting that the cost-based transformation framework doesn’t actually perform final state optimization on each possible transformation, but instead performs some kind of approximate costing calculation.

Group by pushdown is a cost-based transformation that works for DISTINCT operations as well as for GROUP BY operations. Since GROUP BY and DISTINCT operations almost always significantly reduce cardinality, group by pushdown is almost always applied where legal.

The group by pushdown transformation is the last of the aggregation transformations that I wanted to cover. It is time now to look at subqueries.

Subquery Transformations

As I explained in Chapter 7, I have used the term subquery throughout this book in an informal way to indicate either a set query block or any part of a SQL statement that begins with the keyword SELECT and that is not the main query block of a query. So, for example, an inline view or data dictionary view is considered to be a subquery for the purposes of discussion here.

We have already discussed subquery unnesting both in Chapter 11 and earlier in this chapter, so I won’t be revisiting that topic again. But we still have a lot of other transformations to cover. Let us get started by briefly revisiting simple view merging.

Simple View Merging

We have already seen countless examples of simple view merging in this book, but let us take a brief formal look at the transformation in Listing 13-20.

Listing 13-20. Simple view merging

WITH q1
AS (SELECT /*+ MERGE */
/* NO_MERGE */
CASE prod_category
WHEN 'Electronics' THEN amount_sold * 0.9
ELSE amount_sold
END
AS adjusted_amount_sold
FROM sh.sales JOIN sh.products USING (prod_id))
SELECT adjusted_amount_sold, COUNT (*) cnt
FROM q1
GROUP BY adjusted_amount_sold;

-- Untransformed execution plan (NO_MERGE)

-----------------------------------------------------------------------
| Id | Operation | Name | Cost (%CPU)|
-----------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 542 (6)|
| 1 | HASH GROUP BY | | 542 (6)|
| 2 | VIEW | | 521 (2)|
|* 3 | HASH JOIN | | 521 (2)|
| 4 | VIEW | index$_join$_002 | 2 (0)|
|* 5 | HASH JOIN | | |
| 6 | INDEX FAST FULL SCAN| PRODUCTS_PK | 1 (0)|
| 7 | INDEX FAST FULL SCAN| PRODUCTS_PROD_CAT_IX | 1 (0)|
| 8 | PARTITION RANGE ALL | | 517 (2)|
| 9 | TABLE ACCESS FULL | SALES | 517 (2)|
-----------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("SALES"."PROD_ID"="PRODUCTS"."PROD_ID")
5 - access(ROWID=ROWID)

-- Transformed query

SELECT CASE prod_category
WHEN 'Electronics' THEN amount_sold * 0.9
ELSE amount_sold
END
AS adjusted_amount_sold
FROM sh.sales JOIN sh.products USING (prod_id)
GROUP BY CASE prod_category
WHEN 'Electronics' THEN amount_sold * 0.9
ELSE amount_sold
END;

-- Transformed execution plan (default)

----------------------------------------------------------------------
| Id | Operation | Name | Cost (%CPU)|
----------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3233 (1)|
| 1 | HASH GROUP BY | | 3233 (1)|
|* 2 | HASH JOIN | | 521 (2)|
| 3 | VIEW | index$_join$_002 | 2 (0)|
|* 4 | HASH JOIN | | |
| 5 | INDEX FAST FULL SCAN| PRODUCTS_PK | 1 (0)|
| 6 | INDEX FAST FULL SCAN| PRODUCTS_PROD_CAT_IX | 1 (0)|
| 7 | PARTITION RANGE ALL | | 517 (2)|
| 8 | TABLE ACCESS FULL | SALES | 517 (2)|
----------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("SALES"."PROD_ID"="PRODUCTS"."PROD_ID")
4 - access(ROWID=ROWID)

The concepts of subquery unnesting and view merging are easy to confuse:

· View merging applies to inline views, factored subqueries, and data dictionary views that appear as row sources in the FROM clause of an enclosing query block. View merging is controlled by the MERGE and NO_MERGE hints.

· Subquery unnesting relates to subqueries in the SELECT list, WHERE clause, or anywhere else that Oracle may in the future support. Subquery unnesting is controlled by the UNNEST and NO_UNNEST hints.

Listing 13-20 shows how the use of a factored subquery can save typing; we didn’t have to replicate the GROUP BY expression to the select list. We just used the column alias ADJUSTED_AMOUNT_SOLD from the subquery. Once again, Oracle transformed our concise query into something conceptually simpler, if less concise.

Simple view merging is a heuristic transformation that is unconditionally applied where legal. The main benefit of simple view merging is that additional join orders are possible (though not in Listing 13-20).

I won’t go into the restrictions on simple view merging. If you just remember that simple row sources can be merged and complex ones can’t then you will be okay. There is one special case where a complex row source can be merged, but not by simple view merging. Let us look at that now.

Complex View Merging

If a subquery contains a GROUP BY clause and/or a DISTINCT keyword, and that is the only reason that simple view merging can’t be used, then complex view merging is an alternative. Listing 13-21 gives an example.

Listing 13-21. Complex view merging

WITH agg_q
AS ( SELECT /*+ merge */
/* no_merge */
s.cust_id, s.prod_id, SUM (s.amount_sold) total_amt_sold
FROM sh.sales s
GROUP BY s.cust_id, s.prod_id)
SELECT cust_id
,c.cust_first_name
,c.cust_last_name
,c.cust_email
,p.prod_name
,agg_q.total_amt_sold
FROM agg_q
JOIN sh.customers c USING (cust_id)
JOIN sh.countries co USING (country_id)
JOIN sh.products p USING (prod_id)
WHERE co.country_name = 'Japan'
AND prod_category = 'Photo'
AND total_amt_sold > 20000;

-- Untransformed execution plan (NO_MERGE)

---------------------------------------------------------------------
| Id | Operation | Name | Cost (%CPU)|
---------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 541 (6)|
| 1 | NESTED LOOPS | | |
| 2 | NESTED LOOPS | | 541 (6)|
| 3 | NESTED LOOPS | | 540 (6)|
| 4 | NESTED LOOPS | | 539 (6)|
| 5 | VIEW | | 538 (6)|
|* 6 | FILTER | | |
| 7 | HASH GROUP BY | | 538 (6)|
| 8 | PARTITION RANGE ALL | | 517 (2)|
| 9 | TABLE ACCESS FULL | SALES | 517 (2)|
| 10 | TABLE ACCESS BY INDEX ROWID| CUSTOMERS | 1 (0)|
|* 11 | INDEX UNIQUE SCAN | CUSTOMERS_PK | 0 (0)|
|* 12 | TABLE ACCESS BY INDEX ROWID | COUNTRIES | 1 (0)|
|* 13 | INDEX UNIQUE SCAN | COUNTRIES_PK | 0 (0)|
|* 14 | INDEX UNIQUE SCAN | PRODUCTS_PK | 0 (0)|
|* 15 | TABLE ACCESS BY INDEX ROWID | PRODUCTS | 1 (0)|
---------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

6 - filter(SUM("S"."AMOUNT_SOLD")>20000)
11 - access("AGG_Q"."CUST_ID"="C"."CUST_ID")
12 - filter("CO"."COUNTRY_NAME"='Japan')
13 - access("C"."COUNTRY_ID"="CO"."COUNTRY_ID")
14 - access("AGG_Q"."PROD_ID"="P"."PROD_ID")
15 - filter("P"."PROD_CATEGORY"='Photo')

-- Transformed query

SELECT cust_id
,c.cust_first_name
,c.cust_last_name
,c.cust_email
,p.prod_name
,SUM (s.amount_sold) AS total_amt_sold
FROM sh.sales s
JOIN sh.customers c USING (cust_id)
JOIN sh.countries co USING (country_id)
JOIN sh.products p USING (prod_id)
WHERE co.country_name = 'Japan' AND prod_category = 'Photo'
GROUP BY cust_id
,c.cust_first_name
,c.cust_last_name
,c.cust_email
,p.prod_name
HAVING SUM (s.amount_sold) > 20000;

-- Transformed execution plan (default)

------------------------------------------------------------------------------------
| Id | Operation | Name | Cost (%CPU)|
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 948 (2)|
|* 1 | FILTER | | |
| 2 | HASH GROUP BY | | 948 (2)|
|* 3 | HASH JOIN | | 948 (2)|
| 4 | TABLE ACCESS BY INDEX ROWID BATCHED| PRODUCTS | 3 (0)|
|* 5 | INDEX RANGE SCAN | PRODUCTS_PROD_CAT_IX | 1 (0)|
|* 6 | HASH JOIN | | 945 (2)|
|* 7 | HASH JOIN | | 426 (1)|
|* 8 | TABLE ACCESS FULL | COUNTRIES | 3 (0)|
| 9 | TABLE ACCESS FULL | CUSTOMERS | 423 (1)|
| 10 | PARTITION RANGE ALL | | 517 (2)|
| 11 | TABLE ACCESS FULL | SALES | 517 (2)|
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(SUM("S"."AMOUNT_SOLD")>20000)
3 - access("S"."PROD_ID"="P"."PROD_ID")
5 - access("P"."PROD_CATEGORY"='Photo')
6 - access("S"."CUST_ID"="C"."CUST_ID")
7 - access("C"."COUNTRY_ID"="CO"."COUNTRY_ID")
8 - filter("CO"."COUNTRY_NAME"='Japan')

Listing 13-21 provides the total sales of photographic products in Japan to each customer. The benefit of view merging here is that the aggregation is deferred until the rows from other countries and other product categories have been filtered out.

Now, you may be wondering why simple and complex view merging are considered separately. After all, both transformations do the same thing, and both are managed with MERGE and NO_MERGE hints. The crucial difference is that simple view merging is a heuristic transformation and complex view merging is a cost-based transformation.

If you don’t merge a complex view you have the opportunity to aggregate early and reduce the rows that are input to the join. If you do merge then your join can reduce the number of rows being aggregated. So whether you do the join or the aggregation first depends on which operation will reduce the intermediate result set the most. In fact, if you remove the WHERE clause from the original query in Listing 13-21, complex view merging will not happen unless you include a MERGE hint.

Does this sound familiar? It should. The discussion we are having here is the exact same discussion as the one we had with the distinct placement and group by placement transformations, but in reverse! The downloadable materials show that if you take the transformed query from Listing 13-16 and add a MERGE hint you can recreate the original query by reversing the original transformation.2

Factored Subquery Materialization

In Chapter 1 I explained that one of the key benefits of factored subqueries is that they can be used more than once in a query. In fact, when they are used more than once a temporary table is created that is specific to the statement; this table holds the results of the factored subquery. Listing 13-22 demonstrates.

Listing 13-22. Factored subquery materialization

WITH q1
AS ( SELECT prod_name, prod_category, SUM (amount_sold) total_amt_sold
FROM sh.sales JOIN sh.products USING (prod_id)
WHERE prod_category = 'Electronics'
GROUP BY prod_name, prod_category)
,q2
AS (SELECT 1 AS order_col, prod_name, total_amt_sold FROM q1
UNION ALL
SELECT 2, 'Total', SUM (total_amt_sold) FROM q1)
SELECT prod_name, total_amt_sold
FROM q2
ORDER BY order_col, prod_name;

------------------------------------------------------------------------------------------
| Id | Operation | Name | Cost (%CPU)|
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 529 (3)|
| 1 | TEMP TABLE TRANSFORMATION | | |
| 2 | LOAD AS SELECT | SYS_TEMP_0FD9D672C_4E5E5D | |
| 3 | HASH GROUP BY | | 525 (3)|
|* 4 | HASH JOIN | | 522 (2)|
| 5 | TABLE ACCESS BY INDEX ROWID BATCHED| PRODUCTS | 3 (0)|
|* 6 | INDEX RANGE SCAN | PRODUCTS_PROD_CAT_IX | 1 (0)|
| 7 | PARTITION RANGE ALL | | 517 (2)|
| 8 | TABLE ACCESS FULL | SALES | 517 (2)|
| 9 | SORT ORDER BY | | 4 (0)|
| 10 | VIEW | | 4 (0)|
| 11 | UNION-ALL | | |
| 12 | VIEW | | 2 (0)|
| 13 | TABLE ACCESS FULL | SYS_TEMP_0FD9D672C_4E5E5D | 2 (0)|
| 14 | SORT AGGREGATE | | |
| 15 | VIEW | | 2 (0)|
| 16 | TABLE ACCESS FULL | SYS_TEMP_0FD9D672C_4E5E5D | 2 (0)|
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - access("SALES"."PROD_ID"="PRODUCTS"."PROD_ID")
6 - access("PRODUCTS"."PROD_CATEGORY"='Electronics')

Listing 13-22 is a little different from the other listings in this chapter, as I am using an unhinted query and showing only the transformed execution plan. The query shows the total sales by product, with a grand total given at the end. Listing 13-22 shows the sort of query that might be written by somebody unfamiliar with the use of the ROLLUP keyword in GROUP BY operations.

The TEMP TABLE TRANSFORMATION operation signals the creation of one or more temporary tables that exist purely for the duration of the operation. The children of this operation are always one or more LOAD AS SELECT operations that insert rows into the temporary tables along with one extra child for the main query that utilizes the temporary table or tables created.

Listing 13-22 uses the factored subquery Q1 twice, but the subquery is only evaluated once; the results of the query are stored in a temporary table with a bizarre name (SYS_TEMP_0FD9D672C_4E5E5D in this case), and that temporary table is used in the two branches of the UNION ALL operation.

Factored subquery materialization is not possible when an object type or an LOB is selected, and the transformation is also illegal in distributed transactions.

When legal, factored subquery materialization is a heuristic optimization. If a subquery is referenced once then it will not be materialized, but if it is referenced twice or more it normally will be.3 You can see in Listing 13-22 that factored subquery Q2 is referenced only once and so it is not materialized. The CBO chooses instead to perform simple view merging of the factored subquery with the main query.

All this sounds very sensible, but don’t think for one minute that factored subquery materialization is a no-brainer. We will discuss in Chapter 18 why you might need to override the heuristic rules, but for now we need to focus on how we override the default behavior.

The hinting process is a little strange in several ways:

· To force factored subquery materialization you use the MATERIALIZE hint, but to disable it you use the INLINE hint.

· Neither the MATERIALIZE hint nor the INLINE hint support global hint syntax. The hints have to be placed in the factored subquery itself.

· A consequence of the above restriction is that MATERIALIZE and INLINE hints never appear in outline hints, SQL baselines, SQL profiles, etc. Placing a hint in the code is the only way!

Bearing these points in mind, Listing 13-23 shows how you might prevent the materialization of factored subquery Q1 and materialize Q2 (sort of).

Listing 13-23. Use of MATERIALIZE and INLINE hints

WITH q1
AS ( SELECT /*+ inline */
prod_name, prod_category, SUM (amount_sold) total_amt_sold
FROM sh.sales JOIN sh.products USING (prod_id)
WHERE prod_category = 'Electronics'
GROUP BY prod_name, prod_category)
,q2
AS (SELECT 1 AS order_col, prod_name, total_amt_sold FROM q1
UNION ALL
SELECT 2, 'Total', SUM (total_amt_sold) FROM q1)
,q3AS (SELECT /*+ materialize */
* FROM q2)
SELECT prod_name, total_amt_sold
FROM q3
ORDER BY order_col, prod_name;

----------------------------------------------------------------------------------------------
| Id | Operation | Name | Cost (%CPU)|
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1053 (3)|
| 1 | TEMP TABLE TRANSFORMATION | | |
| 2 | LOAD AS SELECT | SYS_TEMP_0FD9D672D_4E5E5D | |
| 3 | VIEW | | 1051 (3)|
| 4 | UNION-ALL | | |
| 5 | HASH GROUP BY | | 525 (3)|
|* 6 | HASH JOIN | | 522 (2)|
| 7 | TABLE ACCESS BY INDEX ROWID BATCHED | PRODUCTS | 3 (0)|
|* 8 | INDEX RANGE SCAN | PRODUCTS_PROD_CAT_IX | 1 (0)|
| 9 | PARTITION RANGE ALL | | 517 (2)|
| 10 | TABLE ACCESS FULL | SALES | 517 (2)|
| 11 | SORT AGGREGATE | | |
| 12 | VIEW | | 525 (3)|
| 13 | HASH GROUP BY | | 525 (3)|
|* 14 | HASH JOIN | | 522 (2)|
| 15 | TABLE ACCESS BY INDEX ROWID BATCHED| PRODUCTS | 3 (0)|
|* 16 | INDEX RANGE SCAN | PRODUCTS_PROD_CAT_IX | 1 (0)|
| 17 | PARTITION RANGE ALL | | 517 (2)|
| 18 | TABLE ACCESS FULL | SALES | 517 (2)|
| 19 | SORT ORDER BY | | 2 (0)|
| 20 | VIEW | | 2 (0)|
| 21 | TABLE ACCESS FULL | SYS_TEMP_0FD9D672D_4E5E5D | 2 (0)|
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

6 - access("SALES"."PROD_ID"="PRODUCTS"."PROD_ID")
8 - access("PRODUCTS"."PROD_CATEGORY"='Electronics')
14 - access("SALES"."PROD_ID"="PRODUCTS"."PROD_ID")
16 - access("PRODUCTS"."PROD_CATEGORY"='Electronics')

The use of the MATERIALIZE and INLINE hints in Listing 13-23 has resulted in the join of SH.PRODUCTS and SH.SALES being done twice and has also resulted in the results of the UNION ALL being placed in a temporary table, even though said temporary table is referenced only once.

I had to cheat a bit here. You will notice that I have surreptitiously added a third factored subquery, Q3, to Listing 13-23. This is because the query block from Q2 was the set query block SET$1, which could only be hinted using a global hint (a hint applied after the first SELECT would apply to the first branch of the UNION ALL). Given that MATERIALIZE only supports local hinting, the only thing to do was to create a new query block into which SET$1 could be merged and then materialize that.

I want to emphasize that this discussion has focused on the mechanics of factored subquery materialization. I don’t want to leave you with the impression that the hinting that I have applied in Listing 13-23 makes any sense from a performance perspective. We will return to performance issues in Chapter 18.

Subquery Pushdown

When a subquery appears in a WHERE clause, the CBO will usually unnest it if it can. If the subquery cannot be unnested for some reason then the CBO is left with a similar question to the one it has with aggregations: should the subquery be evaluated as early as possible or be deferred until after any joins? Curiously, this decision is made by a query transformation.

Although subquery pushdown is implemented as a query transformation it isn’t logically a transformation. Indeed, I assumed for some time that the CBO decided when a subquery should be evaluated as part of the final state transformation logic that we discussed in Chapter 12. It certainly isn’t possible to provide any kind of transformed SQL that shows the result of the so-called transformation.

It works like this: the CBO begins by assuming that subqueries in a WHERE clause should be evaluated after all joins have been performed. Only if the cost-based transformation framework decides that this should change (or if there is a hint) will this behavior change. Listing 13-24 might help explain the point a bit better.

Listing 13-24. Subquery pushdown

WITH q1
AS ( SELECT prod_id
FROM sh.sales
GROUP BY prod_id
ORDER BY SUM (amount_sold) DESC)
SELECT c.cust_first_name
,c.cust_last_name
,c.cust_email
,p.prod_name
FROM sh.sales s
JOIN sh.customers c USING (cust_id)
JOIN sh.products p USING (prod_id)
WHERE prod_id = (SELECT /*+ push_subq */
/* no_push_subq */
prod_id
FROM q1
WHERE ROWNUM = 1)
GROUP BY c.cust_first_name
,c.cust_last_name
,c.cust_email
,p.prod_name
HAVING SUM (s.amount_sold) > 20000;

-- Subquery not pushed (NO_PUSH_SUBQ)

-------------------------------------------------------------
| Id | Operation | Name | Cost (%CPU)|
-------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10277 (1)|
|* 1 | FILTER | | |
| 2 | HASH GROUP BY | | 10277 (1)|
|* 3 | FILTER | | |
|* 4 | HASH JOIN | | 2237 (1)|
| 5 | TABLE ACCESS FULL | PRODUCTS | 3 (0)|
|* 6 | HASH JOIN | | 2232 (1)|
| 7 | TABLE ACCESS FULL | CUSTOMERS | 423 (1)|
| 8 | PARTITION RANGE ALL | | 517 (2)|
| 9 | TABLE ACCESS FULL | SALES | 517 (2)|
|* 10 | COUNT STOPKEY | | |
| 11 | VIEW | | 559 (9)|
|* 12 | SORT ORDER BY STOPKEY| | 559 (9)|
| 13 | SORT GROUP BY | | 559 (9)|
| 14 | PARTITION RANGE ALL| | 517 (2)|
| 15 | TABLE ACCESS FULL | SALES | 517 (2)|
-------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(SUM("S"."AMOUNT_SOLD")>20000)
3 - filter("P"."PROD_ID"= (SELECT /*+ NO_PUSH_SUBQ */ "PROD_ID" FROM
(SELECT "PROD_ID" "PROD_ID" FROM "SH"."SALES" "SALES" GROUP BY
"PROD_ID" ORDER BY SUM("AMOUNT_SOLD") DESC) "Q1" WHERE ROWNUM=1))
4 - access("S"."PROD_ID"="P"."PROD_ID")
6 - access("S"."CUST_ID"="C"."CUST_ID")
10 - filter(ROWNUM=1)
12 - filter(ROWNUM=1)

-- Subquery pushed (default)

--------------------------------------------------------------------------------------
| Id | Operation | Name | Cost (%CPU)|
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 865 (1)|
|* 1 | FILTER | | |
| 2 | HASH GROUP BY | | 865 (1)|
|* 3 | HASH JOIN | | 864 (1)|
| 4 | NESTED LOOPS | | 441 (0)|
| 5 | TABLE ACCESS BY INDEX ROWID | PRODUCTS | 1 (0)|
|* 6 | INDEX UNIQUE SCAN | PRODUCTS_PK | 0 (0)|
|* 7 | COUNT STOPKEY | | |
| 8 | VIEW | | 559 (9)|
|* 9 | SORT ORDER BY STOPKEY | | 559 (9)|
| 10 | SORT GROUP BY | | 559 (9)|
| 11 | PARTITION RANGE ALL | | 517 (2)|
| 12 | TABLE ACCESS FULL | SALES | 517 (2)|
| 13 | PARTITION RANGE ALL | | 441 (0)|
| 14 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| SALES | 441 (0)|
| 15 | BITMAP CONVERSION TO ROWIDS | | |
|* 16 | BITMAP INDEX SINGLE VALUE | SALES_PROD_BIX | |
| 17 | TABLE ACCESS FULL | CUSTOMERS | 423 (1)|
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(SUM("S"."AMOUNT_SOLD")>20000)
3 - access("S"."CUST_ID"="C"."CUST_ID")
6 - access("P"."PROD_ID"= (SELECT /*+ PUSH_SUBQ */ "PROD_ID" FROM (SELECT
"PROD_ID" "PROD_ID" FROM "SH"."SALES" "SALES" GROUP BY "PROD_ID" ORDER BY
SUM("AMOUNT_SOLD") DESC) "Q1" WHERE ROWNUM=1))
7 - filter(ROWNUM=1)
9 - filter(ROWNUM=1)
16 - access("S"."PROD_ID"="P"."PROD_ID")

The presence of the ROWNUM=1 predicate in the subquery of Listing 13-24 precludes any subquery unnesting. If the subquery pushdown transformation is suppressed by the NO_PUSH_SUBQ hint then a FILTER operation appears in the execution plan. Theoretically, if you see a FILTERoperation with more than one operand then the second and subsequent operands (the subquery or subqueries) are evaluated for each row returned by the first operand (the main query). In fact, in Listing 13-24 the subquery is not correlated, and scalar subquery caching will ensure that the subquery is evaluated only once.

But in the case of Listing 13-24, suppressing the transformation slows down the query. The point is that without the transformation we will be joining all rows from SH.SALES with the SH.PRODUCTS and SH.CUSTOMERS tables, discarding most of the joined rows at the end. What we want to do is to evaluate the subquery early and apply the filtering to the rows from SH.PRODUCTS so as to get the one row back that we want. We can then use a nested loops join and indexed access to get just the rows matching the one value of PROD_ID from SH.SALES.

When subquery pushdown is applied, the subquery gets applied as a child of an operation that you wouldn’t normally expect to have any children. In this case, the INDEX UNIQUE SCAN of operation 6 obtains the value of PROD_ID that it needs by evaluating its child.

image Caution The documentation in the SQL Language Reference manual for the PUSH_SUBQ hint claims that the hint will have no effect when applied to a table that is joined using a merge join. The actual restriction only applies when the subquery is correlated to a second table, in which case the second correlated table must precede the first in the join order, and the first table must be accessed with a nested loops join.

Listing 13-25 shows an example in which subquery pushdown is illegal because of the join method.

Listing 13-25. Subquery pushdown as illegal and as a bad idea

SELECT /* leading(p s1) use_nl(s1) */
-- Add plus sign for hint to be recognised
*
FROM sh.sales s1
JOIN sh.customers c USING (cust_id)
JOIN sh.products p ON s1.prod_id = p.prod_id
WHERE s1.amount_sold > (SELECT /*+ push_subq */
-- Hint inapplicable to hash joins
AVG (s2.amount_sold)
FROM sh.sales s2
WHERE s2.prod_id = p.prod_id)
AND p.prod_category = 'Electronics'
AND c.cust_year_of_birth = 1919;

-- Execution plan with hash join (push_subq hint ignored)

------------------------------------------------------------------------------------
| Id | Operation | Name | Cost (%CPU)|
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 942 (2)|
|* 1 | FILTER | | |
|* 2 | HASH JOIN | | 610 (2)|
|* 3 | TABLE ACCESS FULL | CUSTOMERS | 271 (1)|
|* 4 | HASH JOIN | | 339 (3)|
| 5 | TABLE ACCESS BY INDEX ROWID BATCHED| PRODUCTS | 3 (0)|
|* 6 | INDEX RANGE SCAN | PRODUCTS_PROD_CAT_IX | 1 (0)|
| 7 | PARTITION RANGE ALL | | 334 (3)|
| 8 | TABLE ACCESS FULL | SALES | 334 (3)|
| 9 | SORT AGGREGATE | | |
| 10 | PARTITION RANGE ALL | | 332 (2)|
|* 11 | TABLE ACCESS FULL | SALES | 332 (2)|
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("S1"."AMOUNT_SOLD"> (SELECT /*+ PUSH_SUBQ */
AVG("S2"."AMOUNT_SOLD") FROM "SH"."SALES" "S2" WHERE "S2"."PROD_ID"=:B1))
2 - access("S1"."CUST_ID"="C"."CUST_ID")
3 - filter("C"."CUST_YEAR_OF_BIRTH"=1919)
4 - access("S1"."PROD_ID"="P"."PROD_ID")
6 - access("P"."PROD_CATEGORY"='Electronics')
11 - filter("S2"."PROD_ID"=:B1)

-- Transformed query when nested loops used (push_subq hint honored)

---------------------------------------------------------------------------------------------
| Id | Operation | Name | Cost (%CPU)|
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 6166 (1)|
|* 1 | HASH JOIN | | 5726 (1)|
| 2 | TABLE ACCESS BY INDEX ROWID BATCHED | CUSTOMERS | 10 (0)|
| 3 | BITMAP CONVERSION TO ROWIDS | | |
|* 4 | BITMAP INDEX SINGLE VALUE | CUSTOMERS_YOB_BIX | |
| 5 | NESTED LOOPS | | 5716 (1)|
| 6 | TABLE ACCESS BY INDEX ROWID BATCHED | PRODUCTS | 3 (0)|
|* 7 | INDEX RANGE SCAN | PRODUCTS_PROD_CAT_IX | 1 (0)|
| 8 | PARTITION RANGE ALL | | 5716 (1)|
|* 9 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED | SALES | 5716 (1)|
| 10 | BITMAP CONVERSION TO ROWIDS | | |
|* 11 | BITMAP INDEX SINGLE VALUE | SALES_PROD_BIX | |
| 12 | SORT AGGREGATE | | |
| 13 | PARTITION RANGE ALL | | 440 (0)|
| 14 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| SALES | 440 (0)|
| 15 | BITMAP CONVERSION TO ROWIDS | | |
|* 16 | BITMAP INDEX SINGLE VALUE | SALES_PROD_BIX | |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("S1"."CUST_ID"="C"."CUST_ID")
4 - access("C"."CUST_YEAR_OF_BIRTH"=1919)
7 - access("P"."PROD_CATEGORY"='Electronics')
9 - filter("S1"."AMOUNT_SOLD"> (SELECT /*+ PUSH_SUBQ */ AVG("S2"."AMOUNT_SOLD")
FROM "SH"."SALES" "S2" WHERE "S2"."PROD_ID"=:B1))
11 - access("S1"."PROD_ID"="P"."PROD_ID")
16 - access("S2"."PROD_ID"=:B1)

Listing 13-25 can’t use the USING clause in the join with SH.PRODUCTS because of the need to qualify the reference in the subquery with a table alias. The execution plan for the query in Listing 13-25 uses a hash join to access the SH.SALES table and so it is not possible to evaluate the subquery early. All the illegal hint did was to prevent subquery unnesting. However, if we apply LEADING and USE_NL hints to force a nested loops join, it is now perfectly legal, if inadvisable, to push the subquery so that it is evaluated before the join with SH.CUSTOMERS. The table access on line 9 now has two children. The first child obtains ROWIDs for each row returned by the bitmap index operation. Theoretically, the second child of operation 9 is evaluated for every ROWID, but subquery caching ensures that the subquery is evaluated just 13 times, once for each of the 13 products in the electronics category. If subquery evaluation had been deferred until after the join with SH.CUSTOMERS, the subquery would have been evaluated just 4 times, since customers born in 1919 only bought 4 of the 13 electronics products.

Join Predicate Pushdown

When view merging is inadvisable or impossible, the join predicate pushdown (JPPD) transformation is a fallback option. Take a look at Listing 13-26.

Listing 13-26. Join predicate pushdown

WITH agg_q
AS ( SELECT /*+ push_pred */
/* no_push_pred */
s.cust_id
,prod_id
,p.prod_name
,SUM (s.amount_sold) total_amt_sold
FROM sh.sales s JOIN sh.products p USING (prod_id)
GROUP BY s.cust_id, prod_id)
SELECT cust_id
,c.cust_first_name
,c.cust_last_name
,c.cust_email
,agg_q.total_amt_sold
FROM agg_q RIGHT JOIN sh.customers c USING (cust_id)
WHERE cust_first_name = 'Abner' AND cust_last_name = 'Everett';

-- Untransformed execution plan (NO_PUSH_PRED)

--------------------------------------------------------------
| Id | Operation | Name | Cost (%CPU)|
--------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5519 (1)|
|* 1 | HASH JOIN OUTER | | 5519 (1)|
|* 2 | TABLE ACCESS FULL | CUSTOMERS | 423 (1)|
| 3 | VIEW | | 5095 (1)|
| 4 | HASH GROUP BY | | 5095 (1)|
|* 5 | HASH JOIN | | 3021 (1)|
| 6 | INDEX FULL SCAN | PRODUCTS_PK | 1 (0)|
| 7 | VIEW | VW_GBC_6 | 3019 (1)|
| 8 | HASH GROUP BY | | 3019 (1)|
| 9 | PARTITION RANGE ALL| | 517 (2)|
| 10 | TABLE ACCESS FULL | SALES | 517 (2)|
--------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("C"."CUST_ID"="AGG_Q"."CUST_ID"(+))
2 - filter("C"."CUST_FIRST_NAME"='Abner' AND
"C"."CUST_LAST_NAME"='Everett')
5 - access("ITEM_1"="P"."PROD_ID")
-- Transformed query

SELECT c.cust_id
,agg_q.prod_id
,agg_q.prod_name
,c.cust_first_name
,c.cust_last_name
,c.cust_email
,agg_q.total_amt_sold
FROM sh.customers c
OUTER APPLY
( SELECT prod_id, p.prod_name, SUM (s.amount_sold) total_amt_sold
FROM sh.sales s JOIN sh.products p USING (prod_id)
WHERE s.cust_id = c.cust_id
GROUP BY prod_id, p.prod_name) agg_q
WHERE cust_first_name = 'Abner' AND cust_last_name = 'Everett';

-- Transformed execution plan (default)

----------------------------------------------------------------------------------------
| Id | Operation | Name | Cost (%CPU)|
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 480 (0)|
| 1 | NESTED LOOPS OUTER | | 480 (0)|
|* 2 | TABLE ACCESS FULL | CUSTOMERS | 423 (1)|
| 3 | VIEW PUSHED PREDICATE | | 58 (0)|
| 4 | SORT GROUP BY | | 58 (0)|
|* 5 | HASH JOIN | | 58 (0)|
| 6 | VIEW | VW_GBC_6 | 55 (0)|
| 7 | SORT GROUP BY | | 55 (0)|
| 8 | PARTITION RANGE ALL | | 55 (0)|
| 9 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| SALES | 55 (0)|
| 10 | BITMAP CONVERSION TO ROWIDS | | |
|* 11 | BITMAP INDEX SINGLE VALUE | SALES_CUST_BIX | |
| 12 | TABLE ACCESS FULL | PRODUCTS | 3 (0)|
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("C"."CUST_FIRST_NAME"='Abner' AND "C"."CUST_LAST_NAME"='Everett')
5 - access("ITEM_1"="P"."PROD_ID")
11 - access("S"."CUST_ID"="C"."CUST_ID")

The query in Listing 13-26 shows the total sales for each product made to the 14 customers named Abner Everett. The use of an outer join precludes both simple and complex view merging, and so without JPPD the GROUP BY operation would generate 328,395 rows for each combination of customer and product. The JPPD transformation requires a nested loops join from SH.CUSTOMERS into the view so that a predicate on CUST_ID can be applied each time the subquery is evaluated. Notice that both variants of the execution plan in Listing 13-26 show the application of the group by placement transformation; rows from SH.SALES are grouped by CUST_ID before joining with the products table.

According to the optimizer team’s blog, https://blogs.oracle.com/optimizer/entry/basics_of_join_predicate_pushdown_in_oracle, JPPD is only supported for the following types of view:

· UNION ALL/UNION view

· Outer-joined view

· Anti-joined view

· Semi-joined view

· DISTINCT view

· GROUP-BY view

Why is JPPD restricted in this way? Well, if none of these conditions apply it is almost certain that the heuristic simple view merging transformation will apply anyway, rendering the JPPD restriction moot. However, I will cover in Chapter 19 an obscure case in which neither JPPD nor simple view merging are supported.

JPPD is a cost-based transformation that can easily be recognized in an execution plan by the presence of either the VIEW PUSHED PREDICATE or UNION ALL PUSHED PREDICATE operations. The benefit of filtering at an early stage comes at the cost of having to execute the subquery multiple times. If evaluating the subquery multiple times is too costly, the CBO may elect not to apply the transformation and then use a hash join so that the subquery is evaluated only once.

Although JPPD has been around for a long time, it is only with 12c SQL syntax that we have the ability to show the results of the transformation. The transformed query in Listing 13-26 uses the ANSI syntax variant for an outer lateral join, which should have precisely the same effect as the join predicate pushdown transformation. In 12.1.0.1, however, there seems to be a bug that prevents the CBO from generating an optimal execution plan for the lateral join, even when hinted.

Subquery Decorrelation

JPPD and subquery decorrelation are inverse transformations of each other in the same way that complex view merging and group by placement are inverse transformations of each other. Subquery decorrelation is shown in Listing 13-27.

Listing 13-27. Subquery decorrelation

SELECT o.order_id
,o.order_date
,o.order_mode
,o.customer_id
,o.order_status
,o.order_total
,o.sales_rep_id
,o.promotion_id
,agg_q.max_quantity
FROM oe.orders o
CROSS APPLY(SELECT /*+ decorrelate */
/* no_decorrelate */
MAX (oi.quantity) max_quantity
FROM oe.order_items oi
WHERE oi.order_id = o.order_id) agg_q
WHERE o.order_id IN (2458, 2397);

-- Untransformed execution plan (NO_DECORRELATE)

-------------------------------------------------------------------------------
| Id | Operation | Name | Cost (%CPU)|
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 8 (0)|
| 1 | NESTED LOOPS | | 8 (0)|
| 2 | INLIST ITERATOR | | |
| 3 | TABLE ACCESS BY INDEX ROWID | ORDERS | 2 (0)|
|* 4 | INDEX UNIQUE SCAN | ORDER_PK | 1 (0)|
| 5 | VIEW | VW_LAT_535DE542 | 3 (0)|
| 6 | SORT AGGREGATE | | |
| 7 | TABLE ACCESS BY INDEX ROWID BATCHED| ORDER_ITEMS | 3 (0)|
|* 8 | INDEX RANGE SCAN | ITEM_ORDER_IX | 1 (0)|
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - access("O"."ORDER_ID"=2397 OR "O"."ORDER_ID"=2458)
8 - access("OI"."ORDER_ID"="O"."ORDER_ID")

-- Transformed query

SELECT o.order_id order_id
,o.order_date order_date
,o.order_mode order_mode
,o.customer_id customer_id
,o.order_status order_status
,o.order_total order_total
,o.sales_rep_id sales_rep_id
,o.promotion_id promotion_id
,MAX (oi.quantity) max_quantity
FROM oe.orders o
LEFT JOIN oe.order_items oi
ON oi.order_id = o.order_id
AND (oi.order_id = 2397 OR oi.order_id = 2458)
WHERE (o.order_id = 2397 OR o.order_id = 2458)
GROUP BY o.order_id
,o.ROWID
,o.promotion_id
,o.sales_rep_id
,o.order_total
,o.order_status
,o.customer_id
,o.order_mode
,o.order_date
,o.order_id;

-- Transformed execution plan (default)

-------------------------------------------------------------------
| Id | Operation | Name | Cost (%CPU)|
-------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5 (0)|
| 1 | HASH GROUP BY | | 5 (0)|
|* 2 | HASH JOIN OUTER | | 5 (0)|
| 3 | INLIST ITERATOR | | |
| 4 | TABLE ACCESS BY INDEX ROWID| ORDERS | 2 (0)|
|* 5 | INDEX UNIQUE SCAN | ORDER_PK | 1 (0)|
|* 6 | TABLE ACCESS FULL | ORDER_ITEMS | 3 (0)|
-------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("OI"."ORDER_ID"(+)="O"."ORDER_ID")
5 - access("O"."ORDER_ID"=2397 OR "O"."ORDER_ID"=2458)
6 - filter("OI"."ORDER_ID"(+)=2397 OR "OI"."ORDER_ID"(+)=2458)

I could have used the transformed query from Listing 13-26 to demonstrate subquery decorrelation, but some considerable fancy hinting would have been needed to prevent JPPD from being applied to the decorellated result! Instead of doing that, I have used a lateral join using theOE.ORDERS and OE.ORDER_ITEMS tables to demonstrate some additional points.

Listing 13-27 lists the two rows from orders 2397 and 2458 in OE.ORDERS together with the maximum value of QUANTITY for the matching order from OE.ORDER_ITEMS.

The transformed query can now use a hash join and is something a mere mortal is unlikely to dream up. Here are some interesting points to note:

· Even though this was an inner lateral join (CROSS APPLY) and not an outer lateral join (OUTER APPLY), the transformed query still used an outer join. This is because the subquery is guaranteed to return exactly one row, even if there were no line items for the order.

· The GROUP BY clause includes O.ROWID. This isn’t necessary in this case because a primary key is in the GROUP BY list, but in general this is needed to ensure that there is no aggregation of rows from the left side of the join; aggregation only applies to the right side, in this case OE.ORDER_ITEMS.

· The transformation has successfully applied transitive closure to allow filtering of the OE.ORDERS and OE.ORDER_ITEMS separately.

Subquery Coalescing

Subquery coalescing is a transformation introduced in 11gR2 and is still in its infancy. Listing 13-28 shows one of the few cases where the transformation can be applied.

Listing 13-28. Subquery coalescing

SELECT *
FROM sh.sales s1
WHERE EXISTS
(SELECT /*+ coalsesce_sq */
/* no_coalesce_sq */
*
FROM sh.sales s2
WHERE s1.time_id = s2.time_id
AND s2.amount_sold > s1.amount_sold + 100)
OR EXISTS
(SELECT /*+ coalsesce_sq */
/* no_coalesce_sq */
*
FROM sh.sales s3
WHERE s1.time_id = s3.time_id
AND s3.amount_sold < s1.amount_sold - 100);

-- Untransformed execution plan (NO_COALESCE_SQ)

-----------------------------------------------------------------------------------
| Id | Operation | Name | Cost (%CPU)|
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1205K (1)|
|* 1 | FILTER | | |
| 2 | PARTITION RANGE ALL | | 518 (2)|
| 3 | TABLE ACCESS FULL | SALES | 518 (2)|
| 4 | PARTITION RANGE SINGLE | | 1 (0)|
|* 5 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| SALES | 1 (0)|
| 6 | BITMAP CONVERSION TO ROWIDS | | |
|* 7 | BITMAP INDEX SINGLE VALUE | SALES_TIME_BIX | |
| 8 | PARTITION RANGE SINGLE | | 1 (0)|
|* 9 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| SALES | 1 (0)|
| 10 | BITMAP CONVERSION TO ROWIDS | | |
|* 11 | BITMAP INDEX SINGLE VALUE | SALES_TIME_BIX | |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter( EXISTS (SELECT /*+ NO_COALESCE_SQ */ 0 FROM "SH"."SALES"
"S2" WHERE "S2"."TIME_ID"=:B1 AND "S2"."AMOUNT_SOLD">:B2+100) OR EXISTS
(SELECT 0 FROM "SH"."SALES" "S3" WHERE "S3"."TIME_ID"=:B3 AND
"S3"."AMOUNT_SOLD"<:B4-100))
5 - filter("S2"."AMOUNT_SOLD">:B1+100)
7 - access("S2"."TIME_ID"=:B1)
9 - filter("S3"."AMOUNT_SOLD"<:B1-100)
11 - access("S3"."TIME_ID"=:B1)

-- Transformed query

SELECT *
FROM sh.sales s1
WHERE EXISTS
(SELECT *
FROM sh.sales s2
WHERE s2.amount_sold > s1.amount_sold + 100
AND s2.time_id = s1.time_id
OR s2.amount_sold < s1.amount_sold - 100
AND s2.time_id = s1.time_id);

-- Transformed execution plan (default)

-----------------------------------------------------------------------------------
| Id | Operation | Name | Cost (%CPU)|
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 358K (1)|
|* 1 | FILTER | | |
| 2 | PARTITION RANGE ALL | | 517 (2)|
| 3 | TABLE ACCESS FULL | SALES | 517 (2)|
| 4 | PARTITION RANGE INLIST | | 5 (0)|
|* 5 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| SALES | 5 (0)|
| 6 | BITMAP CONVERSION TO ROWIDS | | |
| 7 | BITMAP OR | | |
|* 8 | BITMAP INDEX SINGLE VALUE | SALES_TIME_BIX | |
|* 9 | BITMAP INDEX SINGLE VALUE | SALES_TIME_BIX | |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter( EXISTS (SELECT 0 FROM "SH"."SALES" "S2" WHERE
("S2"."TIME_ID"=:B1 OR "S2"."TIME_ID"=:B2) AND ("S2"."TIME_ID"=:B3 AND
"S2"."AMOUNT_SOLD">:B4+100 OR "S2"."TIME_ID"=:B5 AND
"S2"."AMOUNT_SOLD"<:B6-100)))
5 - filter("S2"."TIME_ID"=:B1 AND "S2"."AMOUNT_SOLD">:B2+100 OR
"S2"."TIME_ID"=:B3 AND "S2"."AMOUNT_SOLD"<:B4-100)
8 - access("S2"."TIME_ID"=:B1)
9 - access("S2"."TIME_ID"=:B1)

The fact that the two subqueries in Listing 13-28 are combined using OR means that they cannot be unnested. However, the two subqueries can be coalesced. Notice that the execution plan for the untransformed query contains a FILTER operation with three children, implying the separate application of the two subquery filters. The FILTER operation in the transformed execution plan has just two children, implying one filter.

You will see that in the transformed execution plan we have unnecessarily combined the two identical bitmaps from lines 8 and 9. If you rewrite the query yourself this unnecessary step can be avoided.

Subquery coalescing is the last of the subquery-related transformations that I want to discuss. There are however, a few miscellaneous transformations still to cover.

Miscellaneous Transformations

In this chapter I have tried to place the transformations into logical groups, but there are a few that don’t conveniently fit into any category. Let us begin by looking at or expansion.

Or Expansion

Sometimes it makes sense to change a predicate concisely written with OR and process the bits separately, as Listing 13-29 demonstrates.

Listing 13-29. Or expansion

SELECT /*+ use_concat */
/* no_expand */
SUM (amount_sold) cnt
FROM sh.sales JOIN sh.products USING (prod_id)
WHERE time_id = DATE '1998-03-31' OR prod_name = 'Y Box';

-- Untransformed execution plan (NO_EXPAND)

-------------------------------------------------------
| Id | Operation | Name | Cost (%CPU)|
-------------------------------------------------------
| 0 | SELECT STATEMENT | | 522 (2)|
| 1 | SORT AGGREGATE | | |
|* 2 | HASH JOIN | | 522 (2)|
| 3 | TABLE ACCESS FULL | PRODUCTS | 3 (0)|
| 4 | PARTITION RANGE ALL| | 517 (2)|
| 5 | TABLE ACCESS FULL | SALES | 517 (2)|
-------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("SALES"."PROD_ID"="PRODUCTS"."PROD_ID")
filter("SALES"."TIME_ID"=TO_DATE(' 1998-03-31 00:00:00',
'syyyy-mm-dd hh24:mi:ss') OR "PRODUCTS"."PROD_NAME"='Y Box')

-- Transformed query (approximate)

WITH q1
AS (SELECT amount_sold
FROM sh.sales JOIN sh.products USING (prod_id)
WHERE prod_name = 'Y Box'
UNION ALL
SELECT amount_sold
FROM sh.sales JOIN sh.products USING (prod_id)
WHERE time_id = DATE '1998-03-31' AND LNNVL (prod_name = 'Y Box'))
SELECT SUM (amount_sold) cnt
FROM q1;

-- Transformed execution plan (default)

-------------------------------------------------------------------------------------
| Id | Operation | Name | Cost (%CPU)|
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 472 (0)|
| 1 | SORT AGGREGATE | | |
| 2 | CONCATENATION | | |
| 3 | NESTED LOOPS | | |
| 4 | NESTED LOOPS | | 445 (0)|
|* 5 | TABLE ACCESS FULL | PRODUCTS | 3 (0)|
| 6 | PARTITION RANGE ALL | | |
| 7 | BITMAP CONVERSION TO ROWIDS | | |
|* 8 | BITMAP INDEX SINGLE VALUE | SALES_PROD_BIX | |
| 9 | TABLE ACCESS BY LOCAL INDEX ROWID | SALES | 445 (0)|
|* 10 | HASH JOIN | | 27 (0)|
|* 11 | TABLE ACCESS FULL | PRODUCTS | 3 (0)|
| 12 | PARTITION RANGE SINGLE | | 24 (0)|
| 13 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| SALES | 24 (0)|
| 14 | BITMAP CONVERSION TO ROWIDS | | |
|* 15 | BITMAP INDEX SINGLE VALUE | SALES_TIME_BIX | |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

5 - filter("PRODUCTS"."PROD_NAME"='Y Box')
8 - access("SALES"."PROD_ID"="PRODUCTS"."PROD_ID")
10 - access("SALES"."PROD_ID"="PRODUCTS"."PROD_ID")
11 - filter(LNNVL("PRODUCTS"."PROD_NAME"='Y Box'))
15 - access("SALES"."TIME_ID"=TO_DATE(' 1998-03-31 00:00:00', 'syyyy-mm-dd
hh24:mi:ss'))

Without the availability of the cost-based or-expansion transformation, the presence of the two independent predicates in Listing 13-29 leaves us with no choice but to use a full table scan on SH.SALES. By adding a UNION ALL set operator we can use two indexes to separately obtain the rows that match the individual predicates.

There is, however, a slight semantic difference between an OR expression and a UNION ALL set operator. It turns out that there was one sale of a Y Box on March 31, 1998. A UNION ALL set operator would normally list this row twice, and the OR condition would list it just once. To avoid such duplicates, rows matching both conditions are excluded from the second branch of the UNION ALL by the means of the LNNVL function. If you are not familiar with the function, LNNVL (PRODUCTS.PROD_NAME='Y Box')) means (PROD_NAME != 'Y Box' OR PROD_NAME IS NULL).

Here are a few key points about the or-expansion transformation:

· As mentioned earlier, the way to suppress or-expansion is with the NO_EXPAND hint. There is no NO_USE_CONCAT hint.

· When the UNION ALL operation is created by the or-expansion transformation it is shown as CONCATENATION. However, the UNION ALL and CONCATENATION operations function identically.

· When you interpret an execution plan that contains a CONCATENATION operation bear in mind that the children are typically listed in reverse order to that in which they appear in the original query.

· The documented variant of USE_CONCAT is an all or nothing approach. On the one hand, you can’t decide to expand just one of the OR conditions in a WHERE clause that has several. On the other hand, if you look at the outline section of your displayed execution plan you will see that USE_CONCAT has some undocumented parameters that would allow more control, as would the alternative OR_EXPAND hint. However, I haven’t worked out how these work. The CBO will, of its own accord, sometimes expand a subset of the ORconditions.

Materialized View Rewrite

The topic of materialized views is a large one, and it would be possible to write an entire chapter about them. In fact, the Data Warehousing Guide has two! At this point I just want to briefly demonstrate the principle of how a query written against some base tables can be transformed into a query against a materialized view. Listing 13-30 provides a simple example.

Listing 13-30. Materialized view rewrite

EXEC dbms_mview.refresh('SH.CAL_MONTH_SALES_MV');

ALTER SESSION SET query_rewrite_integrity=trusted;

SELECT /*+ rewrite(sh.cal_month_sales_mv) */
/* no_rewrite */
t.calendar_month_desc, SUM (s.amount_sold) AS dollars
FROM sh.sales s, sh.times t
WHERE s.time_id = t.time_id
GROUP BY t.calendar_month_desc;

-- Untransfomed execution plan (NO_REWRITE)

---------------------------------------------------------
| Id | Operation | Name | Cost (%CPU)|
---------------------------------------------------------
| 0 | SELECT STATEMENT | | 367 (8)|
| 1 | HASH GROUP BY | | 367 (8)|
|* 2 | HASH JOIN | | 367 (8)|
| 3 | VIEW | VW_GBC_5 | 355 (8)|
| 4 | HASH GROUP BY | | 355 (8)|
| 5 | PARTITION RANGE ALL| | 334 (3)|
| 6 | TABLE ACCESS FULL | SALES | 334 (3)|
| 7 | TABLE ACCESS FULL | TIMES | 12 (0)|
---------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("ITEM_1"="T"."TIME_ID")

-- Transformed query

SELECT *
FROM sh.cal_month_sales_mv mv;

-- Transformed execution plan (default)

------------------------------------------------------------------------
| Id | Operation | Name | Cost (%CPU)|
------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 (0)|
| 1 | MAT_VIEW REWRITE ACCESS FULL| CAL_MONTH_SALES_MV | 3 (0)|
------------------------------------------------------------------------

Even though we refresh the SH.CAL_MONTH_SALES_MV materialized view before we start, we still have to set query_rewrite_integrity=trusted before the query can be rewritten. The problem is that the referential integrity constraint between SH.SALES and SH.TIMESisn’t validated. Although working out when and where a materialized view rewrite is legal and advisable is tricky, the basic concept is straightforward. In this example, the legwork of aggregating the monthly sales data has been done in advance and doesn’t need to be repeated.

There is one more materialized view transformation that I want to discuss. This one is quite obscure.

Grouping Sets to Union Expansion

The grouping-sets-to-union-expansion transformation was introduced in 9iR2, but there has been very little written about it. The transformation only applies to queries that involve grouping sets where the rolled-up data can be obtained from a materialized view. Listing 13-31 demonstrates.

Listing 13-31. Grouping sets to union expansion

ALTER SESSION SET query_rewrite_integrity=trusted;

SELECT /*+ expand_gset_to_union */
/* no_expand_gset_to_union */
DECODE (GROUPING (t.time_id), 1, 'Month total', t.time_id) AS time_id
,t.calendar_month_desc
,SUM (s.amount_sold) AS dollars
FROM sh.sales s, sh.times t
WHERE s.time_id = t.time_id
GROUP BY ROLLUP (t.time_id), t.calendar_month_desc
ORDER BY calendar_month_desc, time_id;

-- Untransformed execution plan (NOEXPAND_GSET_TO_UNION)

---------------------------------------------------------------
| Id | Operation | Name | Cost (%CPU)|
---------------------------------------------------------------
| 0 | SELECT STATEMENT | | 6340 (1)|
| 1 | SORT ORDER BY | | 6340 (1)|
| 2 | SORT GROUP BY ROLLUP | | 6340 (1)|
|* 3 | HASH JOIN | | 537 (2)|
| 4 | PART JOIN FILTER CREATE | :BF0000 | |
| 5 | TABLE ACCESS FULL | TIMES | 18 (0)|
| 6 | PARTITION RANGE JOIN-FILTER| | 517 (2)|
| 7 | TABLE ACCESS FULL | SALES | 517 (2)|
---------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("S"."TIME_ID"="T"."TIME_ID")

-- Transformed query

WITH vw_gbc
AS ( SELECT s.time_id, SUM (s.amount_sold) AS dollars
FROM sh.sales s
GROUP BY s.time_id)
,gset_union
AS ( SELECT TO_CHAR (vw_gbc.time_id) AS time_id
,t.calendar_month_desc
,SUM (vw_gbc.dollars) AS dollars
FROM vw_gbc, sh.times t
WHERE vw_gbc.time_id = t.time_id
GROUP BY vw_gbc.time_id, t.calendar_month_desc
UNION ALL
SELECT 'Month Total' AS time_id, mv.calendar_month_desc, mv.dollars
FROM sh.cal_month_sales_mv mv)
SELECT *
FROM gset_union u
ORDER BY u.calendar_month_desc, u.time_id;

-- Transformed execution plan (default)

---------------------------------------------------------------------------
| Id | Operation | Name | Cost (%CPU)|
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 369 (8)|
| 1 | SORT ORDER BY | | 369 (8)|
| 2 | VIEW | | 369 (8)|
| 3 | UNION-ALL | | |
| 4 | HASH GROUP BY | | 367 (8)|
|* 5 | HASH JOIN | | 367 (8)|
| 6 | VIEW | VW_GBC_6 | 355 (8)|
| 7 | HASH GROUP BY | | 355 (8)|
| 8 | PARTITION RANGE ALL | | 334 (3)|
| 9 | TABLE ACCESS FULL | SALES | 334 (3)|
| 10 | TABLE ACCESS FULL | TIMES | 12 (0)|
| 11 | MAT_VIEW REWRITE ACCESS FULL| CAL_MONTH_SALES_MV | 2 (0)|
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

5 - access("ITEM_1"="T"."TIME_ID")

The query in Listing 13-31 provides the total value of sales per day with a rolled-up total by month. The untransformed query includes some variant of bloom-filter pruning despite being a serial query, but let us not get distracted.

The transformed query performs a simple GROUP BY with no ROLLUP and then adds rows for the monthly totals from the materialized view. What is interesting is that apart from replacing the SORT GROUP BY ROLLUP operation with a more straightforward HASH GROUP BYoperation, the transformation has made group-by-placement possible! Now that the new group-by-placement transformation has been created, the old expand-grouping-sets-to-union transformation might be revisited so as to operate even in the absence of any materialized view. But for now this transformation seems only to operate when a materialized view is available.

Order by Elimination

It is perfectly legal and quite common for ORDER BY expressions to appear in subqueries, and you are entitled to rely on the ordering of any results from such subqueries. For example, in Listing 13-24 I used a ROWNUM=1 predicate on the results of a subquery that included an ORDER BYclause.

Sometimes, however, you may not rely on the ORDER BY clause of a subquery because the result queries are input to another join, aggregation, or set operation. In such cases the CBO feels entitled to remove your apparently superfluous ORDER BY clause.

The question is this: why would you add an ORDER BY clause and not use it? Almost always, superfluous ORDER BY clauses appear in data dictionary views, and it is these cases that the order-by-elimination transformation is intended to address. On the other hand, it is possible to demonstrate the order-by-elimination transformation with inline views, as Listing 13-32 demonstrates.

Listing 13-32. Order-by-elimination

SELECT COUNT (*)
FROM ( SELECT /*+ eliminate_oby */
/* no_eliminate_oby */
o1.*
FROM oe.order_items o1
WHERE product_id = (SELECT MAX (o2.product_id)
FROM oe.order_items o2
WHERE o2.order_id = o1.order_id)
ORDER BY order_id) v;

-- Untransformed execution plan (NO_ELIMINATE_OBY)

------------------------------------------------------------------
| Id | Operation | Name | Cost (%CPU)|
------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5 (0)|
| 1 | SORT AGGREGATE | | |
| 2 | VIEW | | 5 (0)|
| 3 | SORT ORDER BY | | 5 (0)|
|* 4 | HASH JOIN | | 5 (0)|
| 5 | VIEW | VW_SQ_1 | 2 (0)|
| 6 | HASH GROUP BY | | 2 (0)|
| 7 | INDEX FAST FULL SCAN| ORDER_ITEMS_UK | 2 (0)|
| 8 | TABLE ACCESS FULL | ORDER_ITEMS | 3 (0)|
------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - access("PRODUCT_ID"="MAX(O2.PRODUCT_ID)" AND
"ITEM_1"="O1"."ORDER_ID")

-- Transformed query

SELECT COUNT (*)
FROM (SELECT o1.*
FROM oe.order_items o1
WHERE product_id = (SELECT MAX (o2.product_id)
FROM oe.order_items o2
WHERE o2.order_id = o1.order_id)) v;

-- Transformed execution plan (default)

----------------------------------------------------------------
| Id | Operation | Name | Cost (%CPU)|
----------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 (0)|
| 1 | SORT AGGREGATE | | |
| 2 | NESTED LOOPS | | 2 (0)|
| 3 | VIEW | VW_SQ_1 | 2 (0)|
| 4 | HASH GROUP BY | | 2 (0)|
| 5 | INDEX FAST FULL SCAN| ORDER_ITEMS_UK | 2 (0)|
|* 6 | INDEX UNIQUE SCAN | ORDER_ITEMS_UK | 0 (0)|
----------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

6 - access("ITEM_1"="O1"."ORDER_ID" AND
"PRODUCT_ID"="MAX(O2.PRODUCT_ID)")

The query in Listing 13-32 contains an unused ORDER BY clause buried within an inline view. The untransformed query includes a SORT ORDER BY operation, which sorts, and a SORT AGGREGATE operation, which doesn’t. The transformed query has removed the redundant ORDER BY clause, and the resulting execution plan includes only the non-sorting SORT AGGREGATE operation that evaluates COUNT (*).

You may wonder why I have suddenly abandoned my beloved factored subqueries in Listing 13-32. The reason is that order-by-elimination does not occur with factored subqueries! I actually think this is a sensible thing. If you put an ORDER BY clause in a factored subquery you did so for a reason. It is really only data dictionary views that need the order-by-elimination transformation.

Table Expansion

Table expansion is a transformation introduced in 11gR2 that is specific to partitioned tables. Prior to the introduction of the table-expansion transformation the unusability of a local index on one partition might preclude the use of other usable indexes on other partitions.

To demonstrate this transformation I have to change the usability of index partitions, and so I will create a new table rather than change the state of example schema data. Listing 13-33 creates a partitioned table with an unusable local index and then rebuilds just one partition.

Listing 13-33. Creation of a table for table expansion demonstration

CREATE TABLE t
PARTITION BY RANGE
(d)
(
PARTITION
t_q1_2013 VALUES LESS THAN (TO_DATE ('2013-04-01', 'yyyy-mm-dd'))
,PARTITION
t_q2_2013 VALUES LESS THAN (TO_DATE ('2013-07-01', 'yyyy-mm-dd'))
,PARTITION
t_q3_2013 VALUES LESS THAN (TO_DATE ('2013-10-01', 'yyyy-mm-dd'))
,PARTITION
t_q4_2013 VALUES LESS THAN (TO_DATE ('2014-01-01', 'yyyy-mm-dd')))
PCTFREE 99
PCTUSED 1
AS
SELECT DATE '2013-01-01' + ROWNUM - 1 d, ROWNUM n
FROM DUAL
CONNECT BY LEVEL <= 365;

CREATE INDEX i
ON t (n)
LOCAL
UNUSABLE;

ALTER INDEX i
REBUILD PARTITION t_q3_2013;

Now that we have our partitioned table available, Listing 13-34 can show how the table expansion transformation works.

Listing 13-34. Table expansion transformation

SELECT /*+ expand_table(t) */
/* no_expand_table(t) */
COUNT (*)
FROM t
WHERE n = 8;

-- Untransformed execution plan (NO_EXPAND_TABLE)

--------------------------------------------------
| Id | Operation | Name | Cost (%CPU)|
--------------------------------------------------
| 0 | SELECT STATEMENT | | 45 (0)|
| 1 | SORT AGGREGATE | | |
| 2 | PARTITION RANGE ALL| | 45 (0)|
|* 3 | TABLE ACCESS FULL | T | 45 (0)|
--------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - filter("N"=8)

-- Transformed query

WITH vw_te
AS (SELECT *
FROM t PARTITION (t_q3_2013)
WHERE n = 8
UNION ALL
SELECT *
FROM t
WHERE n = 8
AND ( d < TO_DATE ('2013-07-01', 'yyyy-mm-dd')
OR d >=TO_DATE ('2013-10-01', 'yyyy-mm-dd')))
SELECT COUNT (*)
FROM vw_te;

-- Transformed execution plan (default)

----------------------------------------------------------
| Id | Operation | Name | Cost (%CPU)|
----------------------------------------------------------
| 0 | SELECT STATEMENT | | 34 (0)|
| 1 | SORT AGGREGATE | | |
| 2 | VIEW | VW_TE_1 | 34 (0)|
| 3 | UNION-ALL | | |
| 4 | PARTITION RANGE SINGLE| | 1 (0)|
|* 5 | INDEX RANGE SCAN | I | 1 (0)|
| 6 | PARTITION RANGE OR | | 33 (0)|
|* 7 | TABLE ACCESS FULL | T | 33 (0)|
----------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

5 - access("N"=8)
7 - filter("N"=8 AND ("T"."D"<TO_DATE(' 2013-07-01 00:00:00',
'syyyy-mm-dd hh24:mi:ss') OR "T"."D">=TO_DATE(' 2013-10-01 00:00:00',
'syyyy-mm-dd hh24:mi:ss') AND "T"."D"<TO_DATE(' 2014-01-01 00:00:00',
'syyyy-mm-dd hh24:mi:ss')))

Without the table-expansion transformation the query in Listing 13-34 would be unable to take advantage of the local index because some of its partitions are unusable. The application of the table-expansion transformation splits the query into two branches of a UNION-ALL operation. The first branch is specific to the partition that has a usable local index, and the second branch is specific to the partitions that have no choice but to use a full table scan.

This transformation is likely to be of benefit to many data warehousing applications that regularly rebuild partitions of local bitmap indexes.

The table expansion transformation is the last of our miscellaneous transformations, but I have saved the best for last. The famous star transformation is quite literally in a class of its own. Let us approach the conclusion of this chapter with an explanation of what this most famous of optimizer transformations does and how it can help us.

Star Transformation

The star transformation is the most difficult to grasp of the optimizer transformations, so we will walk through it carefully. We will begin by considering the problem the CBO faces and then move on to looking at how it may be addressed.

The Distributed Join Filter Problem

Let us begin by looking at Listing 13-35 and considering what we would do if we were the CBO.

Listing 13-35. Candidate query for a star transformation

SELECT prod_id
,cust_id
,p.prod_name
,c.cust_first_name
,s.time_id
,s.amount_sold
FROM sh.sales s
JOIN sh.customers c USING (cust_id)
JOIN sh.products p USING (prod_id)
WHERE c.cust_last_name = 'Everett' AND p.prod_category = 'Electronics';

Let us look at some statistics:

· There are 918,843 rows in the SH.SALES table.

· 116,267 of these 918,843 rows match the product filter.

· 740 of the 918,843 rows match the customer filter.

· 115 rows match both customer and product filters.

So, what join order do we pick?

· If we begin by joining the SH.SALES table with either SH.PRODUCTS or SH.CUSTOMERS we end up reading far too many rows from SH.SALES; over 80% of the rows read from SH.SALES will subsequently be discarded.

· If we join SH.PRODUCTS and SH.CUSTOMERS to start with we end up with 5,760 rows, making even a multi-column index expensive.

This is an example of a distributed filter where two or more filtering conditions combine in a way that is far more selective than any individual filter is.

There is a strong similarity between this problem and the problem addressed by the INDEX_COMBINE and AND_EQUAL hints that we discussed in Chapter 10; we want to identify the rows matching both our filters and then just read the rows that match both criteria. The difference here is that our filtering conditions are derived from other tables.

So that is our problem statement. Let us now turn our attention to how we might address it.

Solving the Distributed Join Filter Problem

Listing 13-36 is an attempt at rewriting Listing 13-35.

Listing 13-36. First attempt at solving distributed join filters

ALTER SESSION SET star_transformation_enabled=temp_disable;

SELECT /* no_table_lookup_by_nl(s) */
s.time_id, s.amount_sold
FROM sh.sales s
WHERE s.cust_id IN (SELECT c.cust_id
FROM sh.customers c
WHERE cust_last_name = 'Everett')
AND s.prod_id IN (SELECT p.prod_id
FROM sh.products p
WHERE prod_category = 'Electronics');

----------------------------------------------------------------------------
| Id | Operation | Name | Cost (%CPU)|
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 923 (1)|
| 1 | VIEW | VW_ST_8230C7ED | 923 (1)|
| 2 | NESTED LOOPS | | 498 (1)|
| 3 | PARTITION RANGE ALL | | 440 (1)|
| 4 | BITMAP CONVERSION TO ROWIDS| | 440 (1)|
| 5 | BITMAP AND | | |
| 6 | BITMAP MERGE | | |
| 7 | BITMAP KEY ITERATION | | |
| 8 | BUFFER SORT | | |
|* 9 | TABLE ACCESS FULL | CUSTOMERS | 423 (1)|
|* 10 | BITMAP INDEX RANGE SCAN| SALES_CUST_BIX | |
| 11 | BITMAP MERGE | | |
| 12 | BITMAP KEY ITERATION | | |
| 13 | BUFFER SORT | | |
|* 14 | VIEW | index$_join$_051 | 2 (0)|
|* 15 | HASH JOIN | | |
|* 16 | INDEX RANGE SCAN | PRODUCTS_PROD_CAT_IX | 1 (0)|
| 17 | INDEX FAST FULL SCAN| PRODUCTS_PK | 1 (0)|
|* 18 | BITMAP INDEX RANGE SCAN| SALES_PROD_BIX | |
| 19 | TABLE ACCESS BY USER ROWID | SALES | 483 (1)|
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

9 - filter("CUST_LAST_NAME"='Everett')
10 - access("S"."CUST_ID"="C"."CUST_ID")
14 - filter("PROD_CATEGORY"='Electronics')
15 - access(ROWID=ROWID)
16 - access("PROD_CATEGORY"='Electronics')
18 - access("S"."PROD_ID"="P"."PROD_ID")

Note that because CUST_ID is unique in SH.CUSTOMERS and PROD_ID is unique in SH.PRODUCTS the number of rows returned by Listing 13-35 and Listing 13-36 should be the same; if there were more than one row in SH.CUSTOMERS for the same CUST_ID then Listing 13-36would return fewer rows than Listing 13-35 did.

This rewritten query can’t use index combining because, as I mentioned in Chapter 10, the technique doesn’t support IN lists. The execution plan in Listing 13-36 shows the star transformation, and this alternative approach gives an efficient plan.

The execution plan shows that an index join on the SH.PRODUCTS table has been used to obtain the list of values for PROD_ID and that these values have been used to make several lookups in the SALES_PROD_BIX index of the SH.SALES table in order to obtain multiple bitmaps. These bitmaps are obtained using the BITMAP KEY ITERATION operation on line 12. The use of BITMAP KEY ITERATION is a signature of the star transformation: every time you have a BITMAP KEY ITERATION you have a star transformation and vice versa.4 As these bitmaps are obtained, they are merged on line 11 to produce a single bitmap that identifies all the rows in SH.SALES from products in the Electronics category.

A similar process applies to the values of CUST_ID using the SALES_CUST_BIX index. The bitmap on line 6 identifies all sales to the 14 Abner Everetts. The bitmaps generated on lines 6 and 11 are fed into the BITMAP AND operation on line 5 to generate a single bitmap that identifies all the rows that match both predicates. Now that we have the final list of ROWIDs we can access the SH.SALES table to obtain just the rows we need.

image Note The SH.SALES table is accessed by a nested loops join and a TABLE ACCESS BY USER ROWID operation. The approach is actually the result of another transformation introduced in 11gR2 that can be disabled by adding a plus sign to the comment in bold. If the table-lookup-by-nested-loops transformation is disabled, the BITMAP CONVERSION TO ROWIDS operation on line 4 becomes a child of the TABLE ACCESS BY LOCAL INDEX ROWID BATCHED operation and the nested loops join disappears. I have no idea what benefit the table-lookup-by-nested-loops transformation provides.

Be aware that for the star transformation to be applied you must have enterprise edition, and for unhinted queries the initialization parameter STAR_TRANSFORMATION_ENABLED must be set to TRUE or TEMP_DISABLE (I will explain the difference shortly).

But the query in Listing 13-36 is not the same as the query in Listing 13-35: we have lost the ability to select dimension columns. We can address this by adding the dimension tables back into our list of tables, as shown by Listing 13-37.

Listing 13-37. Adding dimension columns with a star transformation

ALTER SESSION SET star_transformation_enabled=temp_disable;

SELECT prod_id
,cust_id
,prod_name
,cust_first_name
,time_id
,amount_sold
FROM sh.sales s
JOIN sh.products USING (prod_id)
JOIN sh.customers USING (cust_id)
WHERE cust_id IN (SELECT c.cust_id
FROM sh.customers c
WHERE cust_last_name = 'Everett')
AND prod_id IN (SELECT p.prod_id
FROM sh.products p
WHERE prod_category = 'Electronics');

----------------------------------------------------------------------------------
| Id | Operation | Name | Cost (%CPU)|
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1348 (1)|
|* 1 | HASH JOIN | | 1348 (1)|
| 2 | TABLE ACCESS BY INDEX ROWID BATCHED| PRODUCTS | 3 (0)|
|* 3 | INDEX RANGE SCAN | PRODUCTS_PROD_CAT_IX | 1 (0)|
|* 4 | HASH JOIN | | 1345 (1)|
|* 5 | TABLE ACCESS FULL | CUSTOMERS | 423 (1)|
| 6 | VIEW | VW_ST_C72FA945 | 923 (1)|
| 7 | NESTED LOOPS | | 498 (1)|
| 8 | PARTITION RANGE ALL | | 440 (1)|
| 9 | BITMAP CONVERSION TO ROWIDS | | 440 (1)|
| 10 | BITMAP AND | | |
| 11 | BITMAP MERGE | | |
| 12 | BITMAP KEY ITERATION | | |
| 13 | BUFFER SORT | | |
|* 14 | TABLE ACCESS FULL | CUSTOMERS | 423 (1)|
|* 15 | BITMAP INDEX RANGE SCAN | SALES_CUST_BIX | |
| 16 | BITMAP MERGE | | |
| 17 | BITMAP KEY ITERATION | | |
| 18 | BUFFER SORT | | |
|* 19 | VIEW | index$_join$_055 | 2 (0)|
|* 20 | HASH JOIN | | |
|* 21 | INDEX RANGE SCAN | PRODUCTS_PROD_CAT_IX | 1 (0)|
| 22 | INDEX FAST FULL SCAN | PRODUCTS_PK | 1 (0)|
|* 23 | BITMAP INDEX RANGE SCAN | SALES_PROD_BIX | |
| 24 | TABLE ACCESS BY USER ROWID | SALES | 483 (1)|
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("ITEM_1"="P"."PROD_ID")
3 - access("PROD_CATEGORY"='Electronics')
4 - access("ITEM_2"="C"."CUST_ID")
5 - filter("CUST_LAST_NAME"='Everett')
14 - filter("CUST_LAST_NAME"='Everett')
15 - access("S"."CUST_ID"="C"."CUST_ID")
19 - filter("PROD_CATEGORY"='Electronics')
20 - access(ROWID=ROWID)
21 - access("PROD_CATEGORY"='Electronics')
23 - access("S"."PROD_ID"="P"."PROD_ID")

The execution plan shown in Listing 13-37 results when STAR_TRANSFORMATION_ENABLED is set to TEMP_DISABLE.

You will notice that lines 6 to 24 in Listing 13-37 are identical to lines 1 to 19 in Listing 13-36, but this time we access the SH.PRODUCTS and SH.CUSTOMERS tables a second time each to get the dimension columns. Although we have accessed the dimension tables twice each, we have avoided most of the expensive table fetches from the SH.SALES table.

Although this query is starting to look a little difficult to read, it still doesn’t look optimal: why do we have to access the dimension tables twice? Couldn’t we cache the columns for our select list when we read the SH.PRODUCTS and SH.CUSTOMERS tables the first time?

Suppose we ran the script shown in Listing 13-38.

Listing 13-38. A star transformation with manual creation of temporary tables

CREATE GLOBAL TEMPORARY TABLE cust_cache ON COMMIT PRESERVE ROWS
AS
SELECT cust_id, cust_first_name
FROM sh.customers c
WHERE cust_last_name = 'Everett';

CREATE GLOBAL TEMPORARY TABLE prod_cache ON COMMIT PRESERVE ROWS
AS
SELECT prod_id, prod_name
FROM sh.products p
WHERE prod_category = 'Electronics';

SELECT s.prod_id
,s.cust_id
,p.prod_name
,c.cust_first_name
,s.time_id
,s.amount_sold
FROM sh.sales s
JOIN cust_cache c ON s.cust_id = c.cust_id
JOIN prod_cache p ON s.prod_id = p.prod_id
WHERE s.prod_id IN (SELECT prod_id FROM prod_cache)
AND s.cust_id IN (SELECT cust_id FROM cust_cache);

------------------------------------------------------------------------
| Id | Operation | Name | Cost (%CPU)|
------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 358 (0)|
|* 1 | HASH JOIN | | 358 (0)|
| 2 | TABLE ACCESS FULL | CUST_CACHE | 2 (0)|
|* 3 | HASH JOIN | | 356 (0)|
| 4 | TABLE ACCESS FULL | PROD_CACHE | 2 (0)|
| 5 | VIEW | VW_ST_456F1C80 | 354 (0)|
| 6 | NESTED LOOPS | | 350 (0)|
| 7 | PARTITION RANGE ALL | | 58 (0)|
| 8 | BITMAP CONVERSION TO ROWIDS| | 58 (0)|
| 9 | BITMAP AND | | |
| 10 | BITMAP MERGE | | |
| 11 | BITMAP KEY ITERATION | | |
| 12 | BUFFER SORT | | |
| 13 | TABLE ACCESS FULL | PROD_CACHE | 2 (0)|
|* 14 | BITMAP INDEX RANGE SCAN| SALES_PROD_BIX | |
| 15 | BITMAP MERGE | | |
| 16 | BITMAP KEY ITERATION | | |
| 17 | BUFFER SORT | | |
| 18 | TABLE ACCESS FULL | CUST_CACHE | 2 (0)|
|* 19 | BITMAP INDEX RANGE SCAN| SALES_CUST_BIX | |
| 20 | TABLE ACCESS BY USER ROWID | SALES | 296 (0)|
------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("ITEM_1"="C"."CUST_ID")
3 - access("ITEM_2"="P"."PROD_ID")
14 - access("S"."PROD_ID"="P"."PROD_ID")
19 - access("S"."CUST_ID"="C"."CUST_ID")

You may wonder why I created real temporary tables rather than use factored subqueries. Well, the logic flow I am using is, in fact, quite different than the logic that the CBO is really using, and as of 12cR1 it doesn’t seem the CBO can cope with such factored subqueries; the CBO even gets confused by the USING clause of an ANSI join in this example.

Note the BUFFER SORT operations on lines 12 and 17. It seems that the BITMAP KEY ITERATION operation requires the inputs be sorted. This happens automatically when the input is an index (as has been the case so far); we could have avoided the sort by making theCUST_CACHE and PROD_CACHE tables index organized, but I am drifting off the point.

The good news is that in real life you don’t have to create the temporary tables or do anything else fancy. If you set STAR_TRANSFORMATION_ENABLED to TRUE (as opposed to TEMP_DISABLE), the CBO can sort this all out for you. Listing 13-39 shows the execution plan forListing 13-35 when STAR_TRANSFORMATION_ENABLED is set to TRUE.

Listing 13-39. Final star transformation with star_transformation_enabled=true

-----------------------------------------------------------------------------------------
| Id | Operation | Name | Cost (%CPU)|
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 509 (1)|
| 1 | TEMP TABLE TRANSFORMATION | | |
| 2 | LOAD AS SELECT | SYS_TEMP_0FD9D6747_4E5E5D | |
|* 3 | TABLE ACCESS FULL | CUSTOMERS | 423 (1)|
|* 4 | HASH JOIN | | 87 (2)|
| 5 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6747_4E5E5D | 2 (0)|
|* 6 | HASH JOIN | | 85 (2)|
| 7 | TABLE ACCESS BY INDEX ROWID BATCHED| PRODUCTS | 3 (0)|
|* 8 | INDEX RANGE SCAN | PRODUCTS_PROD_CAT_IX | 1 (0)|
| 9 | VIEW | VW_ST_B49D23E2 | 82 (2)|
| 10 | NESTED LOOPS | | 78 (2)|
| 11 | PARTITION RANGE ALL | | 19 (0)|
| 12 | BITMAP CONVERSION TO ROWIDS | | 19 (0)|
| 13 | BITMAP AND | | |
| 14 | BITMAP MERGE | | |
| 15 | BITMAP KEY ITERATION | | |
| 16 | BUFFER SORT | | |
| 17 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6747_4E5E5D | 2 (0)|
|* 18 | BITMAP INDEX RANGE SCAN | SALES_CUST_BIX | |
| 19 | BITMAP MERGE | | |
| 20 | BITMAP KEY ITERATION | | |
| 21 | BUFFER SORT | | |
|* 22 | VIEW | index$_join$_052 | 2 (0)|
|* 23 | HASH JOIN | | |
|* 24 | INDEX RANGE SCAN | PRODUCTS_PROD_CAT_IX | 1 (0)|
| 25 | INDEX FAST FULL SCAN | PRODUCTS_PK | 1 (0)|
|* 26 | BITMAP INDEX RANGE SCAN | SALES_PROD_BIX | |
| 27 | TABLE ACCESS BY USER ROWID | SALES | 62 (0)|
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - filter("C"."CUST_LAST_NAME"='Everett')
4 - access("ITEM_1"="C0")
6 - access("ITEM_2"="P"."PROD_ID")
8 - access("P"."PROD_CATEGORY"='Electronics')
18 - access("S"."CUST_ID"="C0")
22 - filter("P"."PROD_CATEGORY"='Electronics')
23 - access(ROWID=ROWID)
24 - access("P"."PROD_CATEGORY"='Electronics')
26 - access("S"."PROD_ID"="P"."PROD_ID")

Listing 13-39 shows that even though the CBO can’t deal with factored subqueries that you write, it can generate them itself and use them just fine!

Let us go through this step by step. At lines 2 and 3 we scan the customers table, find the matching rows, and store the CUST_FIRST_NAME and CUST_ID columns in a temporary table. Lines 9 through 27 are very similar to the other plans we have seen with the exception of the fact that at line 17 we have used the temporary table to retrieve the CUST_ID. Similarly, we have used the same temporary table at line 5 to pick up the CUST_FIRST_NAME for the select list.

Notice that no temporary table was generated for SH.PRODUCTS; the CBO reckoned that it was better to use an index join for the PROD_ID (lines 22 to 25) and then repeat the index lookup to get PROD_NAME (at lines 7 and 8). This was probably a very close call in terms of cost.

There are many weird things about star transformations. Various restrictions are documented in the Data Warehousing Guide, such as the fact that they don’t work if bind variables are in use. However, one documented restriction is untrue. It turns out that star transformations can be used when there are no bitmap indexes: Oracle can convert a b-tree index to a bitmap index and use that.

The star transformation is the last of the optimizer transformations that I want to discuss in this chapter, but before wrapping up I want to give you a quick peek at what may be coming very soon.

In the Future

This chapter is by far the longest in the book and has covered over two dozen optimizer transformations. However, you can expect more to come. In 2009 the VLDB endowment published an academic paper authored by the CBO development team entitled “Enhanced Subquery Optimization in Oracle,” which showed what might be coming down the road in the future. Take a look at the two queries and associated execution plans in Listing 13-40.

Listing 13-40. The future of subquery coalescing

-- Original query

SELECT prod_id
,p1.prod_name
,p1.prod_desc
,p1.prod_category
,p1.prod_list_price
FROM sh.products p1
WHERE EXISTS
(SELECT 1
FROM sh.products p2
WHERE p1.prod_category = p2.prod_category
AND p1.prod_id <> p2.prod_id)
AND NOT EXISTS
(SELECT 1
FROM sh.products p3
WHERE p1.prod_category = p3.prod_category
AND p1.prod_id <> p3.prod_id
AND p3.prod_list_price > p1.prod_list_price);

---------------------------------------------------------------------
| Id | Operation | Name | Cost (%CPU)|
---------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 8 (0)|
|* 1 | HASH JOIN RIGHT SEMI | | 8 (0)|
| 2 | VIEW | index$_join$_002 | 2 (0)|
|* 3 | HASH JOIN | | |
| 4 | INDEX FAST FULL SCAN| PRODUCTS_PK | 1 (0)|
| 5 | INDEX FAST FULL SCAN| PRODUCTS_PROD_CAT_IX | 1 (0)|
|* 6 | HASH JOIN ANTI | | 6 (0)|
| 7 | TABLE ACCESS FULL | PRODUCTS | 3 (0)|
| 8 | TABLE ACCESS FULL | PRODUCTS | 3 (0)|
---------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("P1"."PROD_CATEGORY"="P2"."PROD_CATEGORY")
filter("P1"."PROD_ID"<>"P2"."PROD_ID")
3 - access(ROWID=ROWID)
6 - access("P1"."PROD_CATEGORY"="P3"."PROD_CATEGORY")
filter("P3"."PROD_LIST_PRICE">"P1"."PROD_LIST_PRICE" AND
"P1"."PROD_ID"<>"P3"."PROD_ID")

-- A transformation of the future

SELECT p1.prod_id
,p1.prod_name
,p1.prod_desc
,prod_category
,p1.prod_list_price
FROM sh.products p1 JOIN sh.products p2 USING (prod_category)
WHERE p1.prod_id <> p2.prod_id
GROUP BY p1.prod_id
,p1.prod_name
,p1.prod_desc
,prod_category
,p1.prod_list_price
HAVING SUM (
CASE
WHEN p2.prod_list_price > p1.prod_list_price THEN 1
ELSE 0
END) = 0;

------------------------------------------------------
| Id | Operation | Name | Cost (%CPU)|
------------------------------------------------------
| 0 | SELECT STATEMENT | | 6 (0)|
|* 1 | FILTER | | |
| 2 | HASH GROUP BY | | 6 (0)|
|* 3 | HASH JOIN | | 6 (0)|
| 4 | TABLE ACCESS FULL| PRODUCTS | 3 (0)|
| 5 | TABLE ACCESS FULL| PRODUCTS | 3 (0)|
------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(SUM(CASE WHEN "P2"."PROD_LIST_PRICE">"P1"."PROD_LIST_PRIC
E" THEN 1 ELSE 0 END )=0)
3 - access("P1"."PROD_CATEGORY"="P2"."PROD_CATEGORY")
filter("P1"."PROD_ID"<>"P2"."PROD_ID"

Both versions of the query in Listing 13-40 identify products that have the highest list price in their category but are not in a category of their own. The latter query in Listing 13-40 has both coalesced the two subqueries and also unnested them. I must say that I had to stare at the published queries for several minutes5 to convince myself that they were semantically equivalent6 and that real benefits are obtained by such a transformation.

Summary

The CBO has a wide range of optimizer transformations at its disposal. This chapter has covered over two dozen of these transformations, but in the unlikely event that the list is comprehensive at the time of writing it probably won’t be by the time the book goes to press. Nevertheless, learning how to recognize, and if necessary influence, the choice of optimizer transformations is crucial to tuning complex queries.

An optimizer transformation may be part of the cost-based framework and may be selected on the basis of calculated benefits in cost. However, several of the optimizer transformations are heuristic, meaning that they are applied or not based on some fixed rules. This means that a transformation can be applied that results in an increase in estimated cost. The increased estimated cost may, of course, be irrelevant if the actual elapsed time of the query is reduced.

One thing that should have become clear throughout this chapter is that for a human being to identify the right way to construct an SQL statement so that an optimal execution plan can be derived by final state optimization alone sometimes requires almost superhuman SQL prowess. Just bear in mind that although we should all be very happy that all these optimizer transformations exist, we can’t rest on our laurels and pray that the CBO will transform any bad code we write into good code; really bad code will still confuse the CBO.

Nevertheless, now that we understand the power and flexibility of both the final state optimization process that we covered in Chapter 12 and the breadth and sophistication of the transformations available to the CBO, you may be wondering why there are so many SQL tuning specialists out there. The truth is that in the real world things go wrong—all the time. Let us look at that next in Chapter 14.

__________________

1I have reason to believe that there may actually be a bug in 12.1.0.1.

2Restrictions on complex view merging preclude reversing the transformation in Listing 13-17.

3Occasionally an INDEX FAST FULL SCAN will not be materialized unless hinted even when referenced multiple times.

4Actually Jonathan Lewis has shown me a counterexample that uses the SEMIJOIN_DRIVER hint but the CBO will never generate such a plan without such nefarious hinting.

5The academic paper didn’t use the SH example schema but I have produced an equivalent example.

6They are equivalent after the correction of what appears to be a typographical error.