Expert Oracle SQL: Optimization, Deployment, and Statistics (2014)

PART 3. The Cost-Based Optimizer

CHAPTER 12. Final State Optimization

When I introduced the CBO in Chapter 2 I explained that the CBO followed two processes to come up with an execution plan for the runtime engine: query transformation and final state optimization. In this chapter I will cover final state optimization, and transformations will be the focus ofChapter 13. Put another way, this chapter looks at how the CBO evaluates different candidate execution plans after any query transformations have been performed.

In Chapters 10 and 11 I covered the mechanics of accessing and joining the row sources specified in a query. In this chapter I will provide a high-level overview of the approach used by the CBO to determine the optimal join order, join method, and access method. I will also look at inlist iteration. Strictly speaking, parallelization is a part of final state transformation, but in chapters 2 and 8 I covered parallelization as a separate topic, and I don’t need to revisit it here.

When reading this chapter please bear in mind that none of the topics covered are independent of each other. You cannot always know the best index to access a table, if any, until you know the join order. But the best join order may depend on the access methods for the joined tables. This effectively means that the CBO has to make some evaluations multiple times. Indeed, the whole final state optimization process covered in this chapter needs to be repeated for each candidate query transformation!

Fortunately, as I said in Chapter 2, we don’t need to know the finer details of how the CBO sorts all these dependencies out or any shortcuts it uses to get things done in a reasonable amount of time. We just need enough of an idea of how the CBO operates so that when things go wrong we can work out the reason why and figure out what to do about it.

The goal of final state optimization is to come up with the execution plan with the least cost; in other words, the execution plan that the runtime engine is likely to be able execute in the least amount of time. The main ways to achieve this are:

· Avoid retrieving rows and later throwing them away

· Avoid accessing blocks and then throwing away all the rows in the block

· Avoid reading the same block multiple times

· Read multiple consecutive blocks with multi-block reads rather than single-block reads

This theme of avoiding waste is key to this chapter. Let us begin with a look at how the CBO selects an optimal join order.

Join Order

The vagaries of determining the optimal join order for a SQL query are vast and one could write an entire book about them. In fact, in 2003 Dan Tow did exactly that! His book, SQL Tuning, published by O’Reilly, is almost entirely dedicated to a description of a systematic approach to determining the optimal join order for a query. That book was published before Oracle starting supporting either hash join input swapping or the star transformation, which we will come onto in Chapter 13. Today the subject of join order optimization is even more complex than it was in 2003.

Fortunately, these days the problems of join ordering are way down on the list of issues the Oracle SQL performance specialist needs to worry about. This is because the CBO does a pretty good job of determining join order itself if (and it is a big if) it can make accurate cardinality estimates for each object access and each join. The reliability of the join order algorithm used by the CBO is the key behind Wolfgang Breitling’s famous “Tuning by Cardinality Feedback” paper that I mentioned in a different context in Chapter 6. Wolfgang’s tuning approach is to correct the cardinality estimates that the CBO makes and then cross your fingers and hope that the CBO does something sensible. And these days the CBO generally will do something sensible!

So what makes one join order better than another? Have a look at Listing 12-1, which provides a very simple example.

Listing 12-1. Simple join order optimization

SELECT /*+ gather_plan_statistics */
COUNT (*)
FROM oe.order_items i JOIN oe.product_descriptions p USING (product_id)
WHERE i.order_id = 2367 AND p.language_id = 'US';

SET LINES 200 PAGES 0

SELECT *
FROM TABLE (DBMS_XPLAN.display_cursor (format => 'BASIC ROWS IOSTATS LAST'));

-------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows |
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |
| 2 | NESTED LOOPS | | 1 | 13 | 8 |
|* 3 | INDEX RANGE SCAN | ORDER_ITEMS_UK | 1 | 8 | 8 |
|* 4 | INDEX UNIQUE SCAN| PRD_DESC_PK | 8 | 2 | 8 |
-------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("I"."ORDER_ID"=2367)
4 - access("I"."PRODUCT_ID"="P"."PRODUCT_ID" AND "P"."LANGUAGE_ID"='US')

Listing 12-1 joins the ORDER_ITEMS and PRODUCT_DESCRIPTIONS tables from the example OE schema. The query has selection predicates on both tables as well as an equijoin predicate on PRODUCT_ID. The only hint in the query is GATHER_PLAN_STATISTICS, which enables runtime statistics to be gathered.

According to the execution plan from DBMS_XPLAN.DISPLAY_CURSOR, the CBO decided on this join order: ORDER_ITEMS PRODUCT_DESCRIPTIONS. The CBO estimated that there would be eight rows matching from ORDER_ITEMS, and since the E-ROWS and A-ROWScolumns from operation 3 match, we know that that was a perfectly accurate guess. Despite the fact that operation 4 is an INDEX UNIQUE SCAN on an index of the PRODUCT_DESCRIPTIONS table, the CBO assumes that more than one row might be returned; 13 rows are estimated to be input to the SORT AGGREGATE operation on line 1 rather than the eight rows that are returned in practice. This doesn’t matter. The CBO has selected the right join order. Listing 12-2 lets us see what happens if we try to override the selected join order.

Listing 12-2. Inappropriate join order

SELECT /*+ gather_plan_statistics leading(p)*/
COUNT (*)
FROM oe.order_items i JOIN oe.product_descriptions p USING (product_id)
WHERE i.order_id = 2367 AND p.language_id = 'US';

SET LINES 200 PAGES 0

SELECT *
FROM TABLE (DBMS_XPLAN.display_cursor (format => 'BASIC ROWS IOSTATS LAST'));

----------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |
| 2 | NESTED LOOPS | | 1 | 13 | 8 |
|* 3 | INDEX FAST FULL SCAN| PRD_DESC_PK | 1 | 288 | 288 |
|* 4 | INDEX UNIQUE SCAN | ORDER_ITEMS_UK | 288 | 1 | 8 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - filter("P"."LANGUAGE_ID"='US')
4 - access("I"."ORDER_ID"=2367 AND "I"."PRODUCT_ID"="P"."PRODUCT_ID")

The LEADING hint in Listing 12-2 lets us see how the join order PRODUCT_DESCRIPTIONS ORDER_ITEMS works out. It turns out that 288 rows match the predicate on LANGUAGE_ID, once again perfectly forecasted by the CBO. However, 280 of these 288 product descriptions end up being discarded by the NESTED LOOPS operation on line 2. What a waste. Now we know why PRODUCT_DESCRIPTIONS ORDER_ITEMS is a bad join order.

Tip If you expect the CBO to pick plan A and it picks plan B for reasons that you don’t immediately understand, try adding hints to force plan B. Then look at the cardinalities and cost estimates for plan B. You will probably see the rationale. You should then be able to see where the “misunderstanding” is. Did you miss something, or did the CBO?

I recently worked on a query that returned about 1,000,000 rows from the first row source and 400,000,000 rows after joining with the second row source; the cardinality dropped back down to 1,000,000 rows after the second join with the third row source. If ever you see this kind of effect (cardinalities rising dramatically and then falling) you should strongly suspect a suboptimal join order.

Despite not being current, there are still things to be learned from Tow’s book. Let me tell you the two biggest things I learned from the book.

· Get the first row source correct. In most cases that is enough to get a decent execution plan.

· The best choice for the leading row source is usually the one with the strongest selectivity. This is not necessarily the row source with the lowest number of rows.

Join Method

Chapter 11 explained the benefits and drawbacks of each of the four join mechanisms, so I won’t repeat them here. However, in case I have given the impression so far in this book that a human expert can always outsmart the CBO, take a look at Listing 12-3.

Listing 12-3. A non-intuitive join method

SELECT /*+ gather_plan_statistics */ *
FROM hr.employees e JOIN hr.departments d USING (department_id)
WHERE e.job_id = 'SH_CLERK';

SET LINES 200 PAGES 0

SELECT *
FROM TABLE (
DBMS_XPLAN.display_cursor (format => 'BASIC ROWS IOSTATS LAST'));
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | Bufs |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 20 | 5 |
| 1 | MERGE JOIN | | 1 | 20 | 20 | 5 |
| 2 | TABLE ACCESS BY INDEX ROWID | DEPARTMENTS | 1 | 27 | 6 | 2 |
| 3 | INDEX FULL SCAN | DEPT_ID_PK | 1 | 27 | 6 | 1 |
|* 4 | SORT JOIN | | 6 | 20 | 20 | 3 |
| 5 | TABLE ACCESS BY INDEX ROWID BATCHED| EMPLOYEES | 1 | 20 | 20 | 3 |
|* 6 | INDEX RANGE SCAN | EMP_JOB_IX | 1 | 20 | 20 | 1 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - access("E"."DEPARTMENT_ID"="D"."DEPARTMENT_ID")
filter("E"."DEPARTMENT_ID"="D"."DEPARTMENT_ID")
6 - access("E"."JOB_ID"='SH_CLERK'

This example identifies all of the employees with a JOB_ID of SH_CLERK and lists them with their associated department information. I must confess I would never have come up with the execution plan that the CBO came up with in a month of Sundays. However, on close inspection it looks pretty good!

Why use a MERGE JOIN? The tables are quite small, and both tables have an index on DEPARTMENT_ID, so surely a NESTED LOOPS join is appropriate? None of the simple rules that I advertised in Chapter 11 seem to suggest that a MERGE JOIN is even remotely appropriate. What is going on? Let us take a deeper look.

Operations 2 and 3 only seem to deepen the confusion. The execution plan performs an INDEX FULL SCAN with no filter predicate and then accesses the table. This means that operations 2 and 3 will access every row in the table. Why not use a TABLE FULL SCAN? Try and work it out before reading on.

Operations 4, 5, and 6 access the 20 rows from EMPLOYEES that match the JOB_ID predicate through an index and then sort them as required by the MERGE JOIN. No mystery there. But what happens next? Well, the MERGE JOIN probes those 20 rows from EMPLOYEES for each row from DEPARTMENTSin order. That is the key to the access of the DEPARTMENTS table via an INDEX FULL SCAN: the results are in DEPARTMENT_ID order and require no further sort! Not only that, but it turns out that the JOB_ID value SH_CLERK means “Shipping clerk,” and all shipping clerks are part of the “Shipping” department with DEPARTMENT_ID 50. So when the runtime engine looks at DEPARTMENT_ID 60 (the sixth department) it finds that there are no more rows in EMPLOYEES to probe and stops further access to DEPARTMENTS. This is why theA-ROWS column displays a value of 6; only departments 10, 20, 30, 40, 50, and 60 were accessed.

In case you are interested, the execution plan I would have guessed would be better involves a NESTED LOOPS join. Using the technique of forcing the execution plan that I thought would be better, I could prove that it wasn’t. Take a look at Listing 12-4.

Listing 12-4. My inferior execution plan

SELECT /*+ gather_plan_statistics leading(e) use_nl(d)*/
*
FROM hr.employees e JOIN hr.departments d USING (department_id)
WHERE e.job_id = 'SH_CLERK';

SET LINES 200 PAGES 0

SELECT *
FROM TABLE (
DBMS_XPLAN.display_cursor (format => 'BASIC ROWS IOSTATS LAST'));
------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | Buffs |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 20 | 27 |
| 1 | NESTED LOOPS | | 1 | | 20 | 27 |
| 2 | NESTED LOOPS | | 1 | 20 | 20 | 7 |
| 3 | TABLE ACCESS BY INDEX ROWID BATCHED| EMPLOYEES | 1 | 20 | 20 | 3 |
|* 4 | INDEX RANGE SCAN | EMP_JOB_IX | 1 | 20 | 20 | 1 |
|* 5 | INDEX UNIQUE SCAN | DEPT_ID_PK | 20 | 1 | 20 | 4 |
| 6 | TABLE ACCESS BY INDEX ROWID | DEPARTMENTS | 20 | 1 | 20 | 20 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - access("E"."JOB_ID"='SH_CLERK')
5 - access("E"."DEPARTMENT_ID"="D"."DEPARTMENT_ID")

You can see that the NESTED LOOPS join repeats the lookup on DEPT_ID_PK 20 times, resulting in a total of 27 buffer accesses rather than the 5 from Listing 12-3.

This is all pretty cool. Did the CBO work all this out? No. You can see that the CBO had no idea that the runtime engine would stop at DEPARTMENT_ID 60. We know this because there are 27 rows in DEPARTMENTS and the value for E-ROWS for operation 2 is 27. So the CBO was guessing and got lucky. But the CBO’s guess was far better than mine for all my years of SQL tuning experience.

Tip This book spends a lot of time telling you why the CBO gets things wrong a lot of the time. When the CBO gets things horribly wrong it generally does so because it is basing its decisions on information that is either entirely absent or horribly wrong. Listen to Wolfgang. Correct the CBO’s misunderstandings and it will usually do a better job than you and I working together would ever do. The only problem is that it is sometimes quite impractical to correct all the CBO's bad assumptions, as we shall see in Chapter 14.

Access Method

As you read in Chapter 10, there are myriad ways to access a table, but the access method choices that are likely to have the most impact on the performance of a query, one way or another, are those related to index range scans: Which index, if any, should be selected?

Just because the join method is nested loops does not mean that access via an index on the join predicate is inevitable, and just because a hash join is used does not mean that a full table scan on the probe table is inevitable. Take a look at Listing 12-5, which uses a hash join but no full table scan.

Listing 12-5. Hash joins with no full table scans

SELECT *
FROM sh.sales s, sh.customers c
WHERE c.cust_marital_status = 'married'
AND c.cust_gender = 'M'
AND c.cust_year_of_birth = 1976
AND c.cust_id = s.cust_id
AND s.prod_id = 136;

-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 40 |
|* 1 | HASH JOIN | | 40 |
| 2 | TABLE ACCESS BY INDEX ROWID BATCHED | CUSTOMERS | 38 |
| 3 | BITMAP CONVERSION TO ROWIDS | | |
| 4 | BITMAP AND | | |
|* 5 | BITMAP INDEX SINGLE VALUE | CUSTOMERS_YOB_BIX | |
|* 6 | BITMAP INDEX SINGLE VALUE | CUSTOMERS_MARITAL_BIX | |
|* 7 | BITMAP INDEX SINGLE VALUE | CUSTOMERS_GENDER_BIX | |
| 8 | PARTITION RANGE ALL | | 710 |
| 9 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| SALES | 710 |
| 10 | BITMAP CONVERSION TO ROWIDS | | |
|* 11 | BITMAP INDEX SINGLE VALUE | SALES_PROD_BIX | |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("C"."CUST_ID"="S"."CUST_ID")
5 - access("C"."CUST_YEAR_OF_BIRTH"=1976)
6 - access("C"."CUST_MARITAL_STATUS"='married')
7 - access("C"."CUST_GENDER"='M')
11 - access("S"."PROD_ID"=136)

The query in Listing 12-5 performs a join of the CUSTOMERS and SALES tables in the SH example schema. The CBO estimates that there are 23 married male customers born in 1976 and that amongst those 23 customers, 24 sales will have been made for product 136. Working on these assumptions the CBO reckoned that a NESTED LOOPS join would need to make 23 probes on SALES. The choice between nested loops with 23 probes and a hash join with a full table scan might have been a close thing, but on this occasion the hash join appears much faster. The reason for this is that there are selective predicates on CUSTOMERSindependent of the join and a predicate on the SALES table (s.prod_id = 136) that is independent of the join so the SALES table can be accessed via an index even with the hash join. Notice the PARTITION RANGE ALLoperation on line 8. Regardless of the join method, all partitions need to be accessed as there is no predicate on the partitioning column TIME_ID.

The CBO’s analysis of the appropriate join method and access method can’t be faulted, but as it happens there are 284 married male customers born in 1976 and the join inputs should be swapped, but it is not a big deal on this occasion.

Listing 12-6 uses nested loops even though there is no index on the joined column.

Listing 12-6. Nested loops with un-indexed join column

SELECT *
FROM oe.order_items i1, oe.order_items i2
WHERE i1.quantity = i2.quantity
AND i1.order_id = 2392
AND i1.line_item_id = 4
AND i2.product_id = 2462;

------------------------------------------------------------------------
| Id | Operation | Name | Rows |
------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 |
| 1 | NESTED LOOPS | | 1 |
| 2 | TABLE ACCESS BY INDEX ROWID | ORDER_ITEMS | 1 |
|* 3 | INDEX UNIQUE SCAN | ORDER_ITEMS_PK | 1 |
|* 4 | TABLE ACCESS BY INDEX ROWID BATCHED| ORDER_ITEMS | 1 |
|* 5 | INDEX RANGE SCAN | ITEM_PRODUCT_IX | 2 |
------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("I1"."ORDER_ID"=2392 AND "I1"."LINE_ITEM_ID"=4)
4 - filter("I1"."QUANTITY"="I2"."QUANTITY")
5 - access("I2"."PRODUCT_ID"=2462)

Listing 12-6 performs a self-join on the ORDER_ITEMS table in the OE example schema. The query looks for order items that match the order quantity for a specified line item. The join column is QUANTITY, which is not typically in a WHERE clause and is not indexed. The query is perhaps a little unusual but might be used to investigate a suspected data entry issue.

As with Listing 12-5, both row sources in Listing 12-6 have highly selective predicates independent of the join. On this occasion, one of the row sources, I1, is being accessed by a unique index, so a nested loop is appropriate.

So far in this chapter we have covered join order, join method, and access method. However, there are one or two more factors that the CBO might need to consider as part of final state optimization. Let us begin with a discussion of queries with IN lists.

IN List Iteration

When an IN list is specified in a query on column or columns supported by an index, the CBO has a choice of how many columns to use. Look at the query in Listing 12-7.

Listing 12-7. IN list execution plan options—no iteration

SELECT *
FROM hr.employees e
WHERE last_name = 'Grant' AND first_name IN ('Kimberely', 'Douglas')
ORDER BY last_name, first_name;

----------------------------------------------------------------
| Id | Operation | Name | Cost (%CPU)|
----------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 (0)|
| 1 | TABLE ACCESS BY INDEX ROWID| EMPLOYEES | 2 (0)|
|* 2 | INDEX RANGE SCAN | EMP_NAME_IX | 1 (0)|
----------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("LAST_NAME"='Grant')
filter("FIRST_NAME"='Douglas' OR "FIRST_NAME"='Kimberely')

The EMP_NAME_IX index on the EMPLOYEES table in the HR example schema includes both LAST_NAME and FIRST_NAME columns. There aren’t likely to be too many choices for FIRST_NAME for LAST_NAME = 'Grant', so the CBO has chosen a single INDEX RANGE SCAN on LAST_NAME and filtered out any index entries that don’t match either of the supplied values for FIRST_NAME. Suppose there were loads and loads of FIRST_NAMES—then an alternative plan might be suitable. Look at the hinted version of the query in Listing 12-8.

Listing 12-8. IN list execution plan options—with iteration

SELECT /*+ num_index_keys(e emp_name_ix 2)*/
*
FROM hr.employees e
WHERE last_name = 'Grant' AND first_name IN ('Kimberely','Douglas')
ORDER BY last_name,first_name ;

-----------------------------------------------------------------
| Id | Operation | Name | Cost (%CPU)|
-----------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 (0)|
| 1 | INLIST ITERATOR | | |
| 2 | TABLE ACCESS BY INDEX ROWID| EMPLOYEES | 2 (0)|
|* 3 | INDEX RANGE SCAN | EMP_NAME_IX | 1 (0)|
-----------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("LAST_NAME"='Grant' AND ("FIRST_NAME"='Douglas' OR
"FIRST_NAME"='Kimberely'))

The hint NUM_INDEX_KEYS can be used to indicate how many columns to use when performing an INDEX RANGE SCAN when an IN list is present. The supplied hint specifies that two columns are used. This means that we need to run two INDEX RANGE SCAN operations, driven by the INLIST ITERATOR operation. The first INDEX RANGE SCAN uses LAST_NAME = 'Grant' and FIRST_NAME = 'Douglas' as access predicates and the second INDEX RANGE SCAN uses LAST_NAME = 'Grant' and FIRST_NAME = 'Kimberly' as access predicates. I don’t personally find the description of the access predicates in the DBMS_XPLAN display particularly helpful in this case, so I hope my explanation has helped.

Notice that no sort is present in the execution plan in either Listing 12-7 or Listing 12-8. In the former case the index entries are already appropriately ordered. In the latter case, the members of the IN list have been reordered to avoid the need for a sort. The estimated cost of the alternative plans is virtually identical in this case, but on larger tables there may be a substantial benefit from iteration.

Summary

Final state optimization is the name I have given to the most well-known part of the CBO: the part that deals with join order, join method, and access method. However, final state optimization also deals with inlist iteration.

There is, however, a lot more to the CBO than final state optimization. The CBO also has the ability to transform a query that you write into something that may look completely different from the one you supply. This rather large topic is the focus of Chapter 13.