Advanced Tuning Techniques - Optimization - Expert Oracle SQL: Optimization, Deployment, and Statistics (2014)

Expert Oracle SQL: Optimization, Deployment, and Statistics (2014)

PART 4. Optimization

CHAPTER 19. Advanced Tuning Techniques

When I first planned out this book I imagined that this chapter would be a lot longer than it is. In truth, most of the advanced techniques that I use in SQL tuning have already been addressed in earlier chapters. I just have a couple of educational, if obscure, cases to run through. Let us begin with some unusual ways to use the ROWID pseudocolumn.

The availability of the ROWID pseudocolumn is a distinguishing feature of the Oracle database product. This pseudocolumn can be used to get around quite a few of the characteristics of the product. I use the word characteristics rather than a more disparaging noun because no criticism is intended. In truth, the CBO has to make a decision on an execution plan in as short a time as possible and it makes little sense to consider options that only make sense once in a blue moon.

Leveraging an INDEX FAST FULL SCAN

In Chapter 10 I explained that the INDEX FAST FULL SCAN access method could only be used when all the columns selected from the table were in the index. However, sometimes this can be a nuisance, and Listing 19-1 shows how to work around the problem in extreme cases.

Listing 19-1. Self-join to use an INDEX FAST FULL SCAN

SELECT /*+ no_eliminate_join(t1) no_eliminate_join(t2) leading(t1) use_nl(t2) rowid(t2) */
t2.*
FROM sh.sales t1, sh.sales t2
WHERE MOD (t1.cust_id, 10000) = 1 AND t2.ROWID = t1.ROWID;

------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)|
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 9188 | 9597 (1)|
| 1 | NESTED LOOPS | | 9188 | 97 (1)|
| 2 | PARTITION RANGE ALL | | 9188 | 407 (0)|
| 3 | BITMAP CONVERSION TO ROWIDS | | 9188 | 407 (0)|
| 4 | BITMAP INDEX FAST FULL SCAN| SALES_CUST_BIX | | |
| 5 | TABLE ACCESS BY USER ROWID | SALES | 1 | 1 (0)|
------------------------------------------------------------------------------

Listing 19-1 selects all the columns from SH.SALES yet somehow has managed to use an INDEX FAST FULL SCAN. The point here is that we have used a self-join, and the only columns referenced from the copy of SH.SALES with alias T1 are CUST_ID and ROWID. Since there is no need to reference the table data from T1, an INDEX FAST FULL SCAN can be used. We then use the ROWID obtained from the T1 copy of SH.SALES to access the T2 copy and from there pick up the rest of the columns.

It is extremely unusual to need to use this sort of approach, which is almost certainly why the CBO doesn’t consider it unless a self-join is explicitly coded. In the case of Listing 19-1 we only read three rows (not the arbitrary 1% selectivity that the CBO assumes), but the index entries for these three rows are scattered about the index. An INDEX RANGE SCAN is not an option, and so in this exceptional case we have no good alternative to the INDEX FAST FULL SCAN.

There are two reasons why we need to use all these hints1 to get the execution plan that we want: first, the estimate for the number of rows that match the MOD function call is too high, and second, join elimination is a heuristic optimizer transformation, as I explained in Chapter 13. Remember that heuristic transformations are applied even if they result in an execution plan with a higher estimated cost.

Because of the small size of the SALES_CUST_BIX index you may find that the use of a traditional INDEX FULL SCAN without a self-join will perform equivalently to Listing 19-1. But when the index is large and isn’t cached you should see a significant improvement in performance when using an INDEX FAST FULL SCAN and a self-join.

An INDEX FAST FULL SCAN isn’t the only access method with restrictions that we can work around by means of a self-join. Let me now make true on a promise I made in Chapter 13 and show you how users of Oracle database standard edition can simulate a star transformation in extreme circumstances.

Simulating a Star Transformation

Users of Oracle database standard edition have no access to either bitmap indexes or partitioned tables. Listing 19-2 begins by copying the SH.SALES table to an unpartitioned table in the current schema before issuing a rewrite of the query in Listing 13-35.

Listing 19-2. Star transformations for standard edition users

CREATE TABLE t1
AS
SELECT *
FROM sh.sales s
ORDER BY time_id, cust_id, prod_id;

CREATE INDEX t1_cust_ix
ON t1 (cust_id)
COMPRESS 1;

CREATE INDEX t1_prod_ix
ON t1 (prod_id)
COMPRESS 1;

WITH q1
AS (SELECT /*+ no_merge */
c.cust_first_name, s.ROWID rid
FROM sh.customers c JOIN t1 s USING (cust_id)
WHERE c.cust_last_name = 'Everett')
,q2
AS (SELECT /*+ no_merge */
p.prod_name, s.ROWID rid
FROM sh.products p JOIN t1 s USING (prod_id)
WHERE p.prod_category = 'Electronics')
SELECT /*+ no_eliminate_join(s) leading(q1 q2 s) use_nl(s) */
q2.prod_name
,q1.cust_first_name
,s.time_id
,s.amount_sold
FROM q1 NATURAL JOIN q2 JOIN t1 s ON rid = s.ROWID;

---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)|
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1591 | 2420 (1)|
| 1 | NESTED LOOPS | | 1591 | 2420 (1)|
| 2 | HASH JOIN | | 1591 | 829 (1)|
| 3 | VIEW | | 7956 | 545 (1)|
| 4 | NESTED LOOPS | | 7956 | 545 (1)|
| 5 | TABLE ACCESS FULL | CUSTOMERS | 61 | 423 (1)|
| 6 | INDEX RANGE SCAN | T1_CUST_IX | 130 | 2 (0)|
| 7 | VIEW | | 183K| 284 (1)|
| 8 | NESTED LOOPS | | 183K| 284 (1)|
| 9 | TABLE ACCESS BY INDEX ROWID BATCHED| PRODUCTS | 14 | 3 (0)|
| 10 | INDEX RANGE SCAN | PRODUCTS_PROD_CAT_IX | 14 | 1 (0)|
| 11 | INDEX RANGE SCAN | T1_PROD_IX | 12762 | 20 (0)|
| 12 | TABLE ACCESS BY USER ROWID | T1 | 1 | 1 (0)|
---------------------------------------------------------------------------------------------

The query in Listing 19-2 uses two factored subqueries, Q1 and Q2. Q1 joins the dimension table SH.CUSTOMERS to the T1 fact table in order to identify the ROWIDs from T1 that match the specified customers. Only the index on T1.CUST_ID is required; no access to T1 itself is required. Factored subquery Q2 does the same sort of thing with the SH.PRODUCTS dimension table. The main query joins Q1 and Q2 using the ROWIDs from T1 to identify just the rows from T1 that match both sets of conditions before reading those rows from T1.

As usual, we have to litter our code with hints to force this execution plan, which on this sample data is quite inefficient. The main point to note is that the plan contains no use of enterprise edition features such as partitioning or bitmap conversions.

I must admit that I haven’t worked with a client that uses standard edition for many years, so the fact that I have never used this technique in a real-life production environment doesn’t mean anything. However, from a theoretical point of view I suspect legitimate uses of this type of query are as rare as hens’ teeth. To explain why, I will provide a simpler example of this sort of technique in the next section.

Simulating an INDEX JOIN

As I mentioned in Chapter 10, the index join access method only works when all the columns referenced by the query are in the indexes being joined. But as we saw in Listing 19-2, we can code our own hash join on ROWID. Given the availability of the INDEX_COMBINE hint for enterprise edition users, the technique used in Listing 19-3 is probably of most interest to users of standard edition.

Listing 19-3. Simulating an index join

WITH q1
AS (SELECT /*+ no_merge */
ROWID rid
FROM t1
WHERE cust_id = 462)
,q2
AS (SELECT /*+ no_merge */
ROWID rid
FROM t1
WHERE prod_id = 19)
SELECT /*+ leading(q1 q2) use_nl(t1) */
t1.*
FROM q1, q2, t1
WHERE q1.rid = q2.rid AND q2.rid = t1.ROWID;

-----------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)|
-----------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 130 | 155 (0)|
| 1 | NESTED LOOPS | | 130 | 155 (0)|
| 2 | HASH JOIN | | 130 | 25 (0)|
| 3 | VIEW | | 130 | 3 (0)|
| 4 | INDEX RANGE SCAN | T1_CUST_IX | 130 | 3 (0)|
| 5 | VIEW | | 12762 | 22 (0)|
| 6 | INDEX RANGE SCAN | T1_PROD_IX | 12762 | 22 (0)|
| 7 | TABLE ACCESS BY USER ROWID| T1 | 1 | 1 (0)|
-----------------------------------------------------------------------

Listing 19-3 uses essentially the same technique as Listing 19-2, except that we have eliminated any reference to dimension tables. For the execution plans shown in Listing 19-2 and Listing 19-3 to be optimal, all of the following conditions must be met:

1. The option to use the INDEX_COMBINE hint is unavailable.

2. Neither index can be very selective, otherwise we wouldn’t need to join the second index.

3. The combined selectivity of the two indexes must be strong, otherwise we would use a full table scan.

4. The cost savings in accessing fewer rows from the table must be more than the overhead of the additional indexed access and the hash join.

Theoretically it is possible to find a situation in which all these conditions are met. It is the fourth and final condition in the list above that is the most difficult to meet, as the hash join is far more expensive than the BITMAP AND operation used in conjunction with bitmap indexes; an index with poor selectivity is going to generate a large hash table, and the hash join may even spill to disk.

So why am I spending time talking about techniques that I have never used and that you probably will not either? There are two reasons. First, there is a lot of old code out there that uses the AND_EQUAL hint (which is pretty much the same sort of thing), and you should realize that these hints are likely to be counterproductive. The second reason is that there is another scenario where self-joins on ROWID can actually be useful. Let us look at that now.

Joining Multi-Column Indexes

As of release 12.1.0.1 neither the INDEX_JOIN nor the INDEX_COMBINE hints work when the indexes referenced are multi-column indexes with a common prefix. Listing 19-4 replaces the two indexes on T1 that we created in Listing 19-2 with two multi-column indexes that are both prefixed with the TIME_ID column.

Listing 19-4. Attempting to combine multi-column indexes

DROP INDEX t1_cust_ix;
DROP INDEX t1_prod_ix;

CREATE INDEX t1_cust_ix2
ON t1 (time_id, cust_id);

CREATE INDEX t1_prod_ix2
ON t1 (time_id, prod_id);

SELECT /*+ index_join(t1) index_combine(t1) */
COUNT (*)
FROM t1
WHERE time_id = DATE '2001-12-28' AND cust_id = 1673 AND prod_id = 44;

---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)|
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 (0)|
| 1 | SORT AGGREGATE | | 1 | |
| 2 | TABLE ACCESS BY INDEX ROWID BATCHED| T1 | 1 | 4 (0)|
| 3 | INDEX RANGE SCAN | T1_CUST_IX2 | 6 | 3 (0)|
---------------------------------------------------------------------------------

As you can see from Listing 19-4, we end up accessing the T1 table itself despite the presence of INDEX_JOIN and INDEX_COMBINE hints. Neither hint is considered legal. When I came across this situation recently I was glad that I understood how to combine indexes by hand.Listing 19-5 shows a legitimate implementation of a self-join to simulate an index join.

Listing 19-5. Simulating an index join to avoid table access

WITH q1
AS (SELECT /*+ no_merge */
ROWID rid
FROM t1
WHERE time_id = DATE '2001-12-28' AND cust_id = 1673)
,q2
AS (SELECT /*+ no_merge */
ROWID rid
FROM t1
WHERE time_id = DATE '2001-12-28' AND prod_id = 44)
SELECT /*+ leading(q1 q2) use_nl(t1) */
COUNT (*)
FROM q1 NATURAL JOIN q2;
----------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)|
----------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 6 (0)|
| 1 | SORT AGGREGATE | | 1 | |
| 2 | HASH JOIN | | 6 | 6 (0)|
| 3 | VIEW | | 6 | 3 (0)|
| 4 | INDEX RANGE SCAN| T1_CUST_IX2 | 6 | 3 (0)|
| 5 | VIEW | | 25 | 3 (0)|
| 6 | INDEX RANGE SCAN| T1_PROD_IX2 | 25 | 3 (0)|
----------------------------------------------------------------

Due to the combination of the strong selectivity and the strong clustering factor of the T1_CUST_IX2 index, there is no big difference in the performance of the queries in Listing 19-4 and Listing 19-5, but in real life the gains can be substantial, particularly when the indexes are cached and the table itself is not.

I have one last usage case for the ROWID pseudocolumn, and that is to support application-coded parallel execution. Let us look at that case now.

Using ROWID Ranges for Application-Coded Parallel Execution

Understanding the problem is one of the key themes of this book, and when faced with a query like that in Listing 19-4 it is worthwhile trying to find out why the query is needed in the first place. One possible reason an application might want to know the number of rows satisfying a particular set of conditions is to identify the number of application threads needed to perform some processing in parallel.

This sort of application-driven parallel processing is quite common and is the raison d’être for the DBMS_PARALLEL_EXECUTE package. This package allows you to identify ranges of ROWIDs, referred to as chunks. The idea is that each chunk will be processed by a different application thread. The potential problems with the aforementioned package are that all rows in the table are considered and that the number of rows processed by each chunk may vary wildly.

When I created table T1 in Listing 19-2 I sorted the rows by TIME_ID, CUST_ID, and PROD_ID, reflecting a clustering of data that might well arise in real life. Now suppose the query in Listing 19-5 returned 3,000,000 rows rather than the 3 rows that are actually returned, and that we wanted to split these into three batches of 1,000,000 rows each for some kind of further processing. Listing 19-6 should give you some idea of how to do this.

Listing 19-6. Application-driven parallel execution for clustered data

DECLARE
CURSOR c1
IS
WITH q1
AS (SELECT /*+ no_merge */
ROWID rid
FROM t1
WHERE time_id = DATE '2001-12-28' AND cust_id > 1)
,q2
AS (SELECT /*+ no_merge */
ROWID rid
FROM t1
WHERE time_id = DATE '2001-12-28' AND prod_id > 1)
,q3
AS (SELECT /*+ leading(q1 q2) use_nl(t1) */
rid, ROW_NUMBER () OVER (ORDER BY rid) rn
FROM q1 NATURAL JOIN q2)
SELECT TRUNC ( (rn - 1) / 100) + 1 chunk
,MIN (rid) min_rowid
,MAX (rid) max_rowid
,COUNT (*) chunk_size
FROM q3
GROUP BY TRUNC ( (rn - 1) / 100);

CURSOR c2 (
min_rowid ROWID
,max_rowid ROWID)
IS
SELECT /*+ rowid(t1) */
*
FROM t1
WHERE ROWID BETWEEN min_rowid AND max_rowid
AND time_id = DATE '2001-12-28'
AND prod_id > 1
AND cust_id > 1;
BEGIN
FOR r1 IN c1
LOOP
FOR r2 IN c2 (r1.min_rowid, r1.max_rowid)
LOOP
NULL;
END LOOP;
END LOOP;
END;
/

-- Execution plan for cursor C1 below

------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)|
------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 628 | 9 (0)|
| 1 | HASH GROUP BY | | 628 | 9 (0)|
| 2 | VIEW | | 628 | 9 (0)|
| 3 | WINDOW SORT | | 628 | 9 (0)|
| 4 | HASH JOIN | | 628 | 9 (0)|
| 5 | VIEW | | 629 | 5 (0)|
| 6 | INDEX RANGE SCAN| T1_CUST_IX2 | 629 | 5 (0)|
| 7 | VIEW | | 629 | 4 (0)|
| 8 | INDEX RANGE SCAN| T1_PROD_IX2 | 629 | 4 (0)|
------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - access("Q1"."RID"="Q2"."RID")
6 - access("TIME_ID"=TO_DATE(' 2001-12-28 00:00:00', 'syyyy-mm-dd
hh24:mi:ss') AND "CUST_ID">1)
8 - access("TIME_ID"=TO_DATE(' 2001-12-28 00:00:00', 'syyyy-mm-dd
hh24:mi:ss') AND "PROD_ID">1)

-- Execution plan for cursor C2 below

------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)|
------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 1206 (1)|
| 1 | FILTER | | | |
| 2 | TABLE ACCESS BY ROWID RANGE| T1 | 2 | 1206 (1)|
------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(CHARTOROWID(:B2)>=CHARTOROWID(:B1))
2 - access(ROWID>=CHARTOROWID(:B1) AND ROWID<=CHARTOROWID(:B2))
filter("TIME_ID"=TO_DATE(' 2001-12-28 00:00:00', 'syyyy-mm-dd
hh24:mi:ss') AND "PROD_ID">1 AND "CUST_ID">1)

Listing 19-6 shows a PL/SQL block that contains two cursors. The query in cursor C1 identifies the rows in T1 that match our selection predicates (I modified them slightly so that 1,196 rows are returned rather than 3) and then groups them into chunks with a maximum of 100 rows each. Eleven of the 12 chunks in the results of the query relate to 100 rows from T1, and the last chunk relates to 96 rows. The minimum and maximum ROWIDs for each chunk are identified.

The PL/SQL block executes the cursor C2 12 times in sequence, once for each of the 12 chunks identified by cursor C1. In real life these executions would be performed in parallel by separate application threads. Only the small portion of the table containing our clustered data is scanned, and for larger numbers of rows the threads would not compete for blocks in T1 and multi-block reads could be used to read the clustered rows efficiently.

Notice that cursor C2 has to include the original predicates. Not all rows in the ROWID range are guaranteed to be from our set.

That pretty much wraps up what I wanted to say about the ROWID pseudocolumn. I want to switch tack now and talk about a different SQL tuning experience that seems too bizarre to be true.

Converting an Inner Join to an Outer Join

You may think that there is a typographical error in the title of this section. There isn’t! This is the story of a real-life experience that involved John, my client working in risk management whom I introduced in the previous chapter.

John (whom you may have guessed is a composite character) came to me with yet another poorly performing query. This query involved a view defined in the data dictionary. Listing 19-7 shows the basic idea.

Listing 19-7. Suboptimal plan involving a data dictionary view

CREATE OR REPLACE VIEW sales_simple
AS
SELECT cust_id
,time_id
,promo_id
,amount_sold
,p.prod_id
,prod_name
FROM sh.sales s, sh.products p
WHERE s.prod_id = p.prod_id;

SELECT *
FROM sales_simple v JOIN sh.customers c ON c.cust_id = v.cust_id
WHERE v.time_id = DATE '1998-03-31'
AND c.cust_first_name = 'Madison'
AND cust_last_name = 'Roy'
AND cust_gender = 'M';

---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)|
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 448 (1)|
| 1 | NESTED LOOPS | | | |
| 2 | NESTED LOOPS | | 1 | 448 (1)|
| 3 | HASH JOIN | | 1 | 447 (1)|
| 4 | TABLE ACCESS FULL | CUSTOMERS | 1 | 423 (1)|
| 5 | PARTITION RANGE SINGLE | | 1188 | 24 (0)|
| 6 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| SALES | 1188 | 24 (0)|
| 7 | BITMAP CONVERSION TO ROWIDS | | | |
| 8 | BITMAP INDEX SINGLE VALUE | SALES_TIME_BIX | | |
| 9 | INDEX UNIQUE SCAN | PRODUCTS_PK | 1 | 0 (0)|
| 10 | TABLE ACCESS BY INDEX ROWID | PRODUCTS | 1 | 1 (0)|
---------------------------------------------------------------------------------------------

The query in Listing 19-6 is suboptimal because there is only one row in the SH.CUSTOMERS table that matches the supplied predicate. The CBO should have picked a join order that allowed it to access the SH.SALES table via the SALES_CUST_BIX index but didn’t for reasons that aren’t immediately obvious. Not to worry—a couple of hints should do the trick, as shown in Listing 19-8.

Listing 19-8. Data dictionary query with hints in the calling code

CREATE OR REPLACE VIEW sales_simple
AS
SELECT /*+ qb_name(q1) */
cust_id
,time_id
,promo_id
,amount_sold
,p.prod_id
,prod_name
FROM sh.sales s, sh.products p
WHERE s.prod_id = p.prod_id;

SELECT /*+ merge(q1) leading(c s@q1 p@q1) use_nl(s@q1) use_nl(p@q1) index(s@q1 (cust_id))*/
*
FROM sales_simple v JOIN sh.customers c ON c.cust_id = v.cust_id
WHERE v.time_id = DATE '1998-03-31'
AND c.cust_first_name = 'Madison'
AND cust_last_name = 'Roy'
AND cust_gender = 'M';

---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)|
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 854 (1)|
| 1 | NESTED LOOPS | | | |
| 2 | NESTED LOOPS | | 1 | 854 (1)|
| 3 | NESTED LOOPS | | 1 | 853 (1)|
| 4 | TABLE ACCESS FULL | CUSTOMERS | 1 | 423 (1)|
| 5 | PARTITION RANGE SINGLE | | 1 | 853 (1)|
| 6 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| SALES | 1 | 853 (1)|
| 7 | BITMAP CONVERSION TO ROWIDS | | | |
| 8 | BITMAP INDEX SINGLE VALUE | SALES_CUST_BIX | | |
| 9 | INDEX UNIQUE SCAN | PRODUCTS_PK | 1 | 0 (0)|
| 10 | TABLE ACCESS BY INDEX ROWID | PRODUCTS | 1 | 1 (0)|
---------------------------------------------------------------------------------------------

The hints in Listing 19-8 are a bit difficult to read but they do the trick: we access the SH.SALES table through the SALES_CUST_BIX index. At this stage John reminded me that he was using his data visualization tool and that all hints would have to go into the view. I could have used global hints or I could have defined a “wrapper” view that encompassed the join with SH.CUSTOMERS. But since John wanted some flexibility to change his queries, neither of these options was particularly appealing. Is there another way?

It occurred to me that if I prevented view merging I might be able to do what I wanted. Listing 19-9 was my first attempt.

Listing 19-9. Attempting to push a predicate into a simple view

CREATE OR REPLACE VIEW sales_simple
AS
SELECT /*+ no_merge leading(s p) push_pred index(s (cust_id)) */
cust_id
,time_id
,promo_id
,amount_sold
,p.prod_id
,prod_name
FROM sh.sales s, sh.products p
WHERE s.prod_id = p.prod_id;

SELECT *
FROM sales_simple v JOIN sh.customers c ON c.cust_id = v.cust_id
WHERE v.time_id = DATE '1998-03-31'
AND c.cust_first_name = 'Madison'
AND cust_last_name = 'Roy'
AND cust_gender = 'M';

---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)|
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 450 (1)|
|* 1 | HASH JOIN | | 1 | 450 (1)|
|* 2 | TABLE ACCESS FULL | CUSTOMERS | 1 | 423 (1)|
| 3 | VIEW | SALES_SIMPLE | 1188 | 27 (0)|
|* 4 | HASH JOIN | | 1188 | 27 (0)|
| 5 | PARTITION RANGE SINGLE | | 1188 | 24 (0)|
| 6 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| SALES | 1188 | 24 (0)|
| 7 | BITMAP CONVERSION TO ROWIDS | | | |
|* 8 | BITMAP INDEX SINGLE VALUE | SALES_TIME_BIX | | |
| 9 | TABLE ACCESS FULL | PRODUCTS | 72 | 3 (0)|
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("C"."CUST_ID"="V"."CUST_ID")
2 - filter("C"."CUST_FIRST_NAME"='Madison' AND "C"."CUST_LAST_NAME"='Roy' AND
"C"."CUST_GENDER"='M')
4 - access("S"."PROD_ID"="P"."PROD_ID")
8 - access("TIME_ID"=TO_DATE(' 1998-03-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

The idea behind the failed optimization attempt of Listing 19-9 was as follows: if we prevent view merging we can force the query to use the SALES_CUST_BIX index using the predicate explicitly pushed into the view with a PUSH_PRED hint. Well, most of what I hinted came to pass: the view was not merged and the desired join order was selected. But the predicate on SH.CUSTOMERS was not pushed into the view! Why on earth not?

After half an hour or so of Googling I found the following blog from the optimizer team:

https://blogs.oracle.com/optimizer/entry/basics_of_join_predicate_pushdown_in_oracle

This blog post explains the circumstances under which the join predicate pushdown (JPPD) transformation will operate and the list of conditions I listed in Chapter 13 were shown. To help you recall, this is the list:

· UNION ALL/UNION view

· Outer-joined view

· Anti-joined view

· Semi-joined view

· DISTINCT view

· GROUP-BY view

As I explained in Chapter 13, this list of conditions is almost identical to the list of conditions that prevent simple view merging. The idea seems to be that either a view is merged or JPPD will be legal. But notably absent from the list I have just presented is the presence of a NO_MERGEhint! Now we have what I gather is an unforeseen possibility that simple view merging is suppressed but JPPD is illegal.

With this understanding in mind, I redefined the view and then asked John to click a couple of radio buttons in his data visualization tool to change the query to an outer join. John was somewhat incredulous, pointing out that in his case the results of the query would be unchanged and suggesting that an outer join couldn’t possibly outperform an inner join. But nothing ventured, nothing gained. The modified view definition and the modified query generated by the data visualization tool are shown in Listing 19-10.

Listing 19-10. Pushing predicates into an outer join

CREATE OR REPLACE VIEW sales_simple
AS
SELECT /*+ no_merge no_index(s (time_id)) */
cust_id
,time_id
,promo_id
,amount_sold
,p.prod_id
,prod_name
FROM sh.sales s, sh.products p
WHERE s.prod_id = p.prod_id;

SELECT *
FROM sales_simple v
RIGHT JOIN sh.customers c
ON c.cust_id = v.cust_id AND v.time_id = DATE '1998-03-31'
WHERE c.cust_first_name = 'Madison'
AND cust_last_name = 'Roy'
AND cust_gender = 'M';

----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)|
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 427 (0)|
| 1 | NESTED LOOPS OUTER | | 1 | 427 (0)|
|* 2 | TABLE ACCESS FULL | CUSTOMERS | 1 | 423 (1)|
| 3 | VIEW PUSHED PREDICATE | SALES_SIMPLE | 1 | 5 (0)|
| 4 | NESTED LOOPS | | | |
| 5 | NESTED LOOPS | | 1 | 5 (0)|
| 6 | PARTITION RANGE SINGLE | | 1 | 4 (0)|
|* 7 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| SALES | 1 | 4 (0)|
| 8 | BITMAP CONVERSION TO ROWIDS | | | |
|* 9 | BITMAP INDEX SINGLE VALUE | SALES_CUST_BIX | | |
|* 10 | INDEX UNIQUE SCAN | PRODUCTS_PK | 1 | 0 (0)|
| 11 | TABLE ACCESS BY INDEX ROWID | PRODUCTS | 1 | 1 (0)|
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("C"."CUST_FIRST_NAME"='Madison' AND "C"."CUST_LAST_NAME"='Roy' AND
"C"."CUST_GENDER"='M')
7 - filter("TIME_ID"=TO_DATE(' 1998-03-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
9 - access("CUST_ID"="C"."CUST_ID")
10 - access("S"."PROD_ID"="P"."PROD_ID")

The query in Listing 19-10 has been modified to use an outer join. Notice that I have had to include the predicate on TIME_ID in the outer join conditions to prevent the heuristic outer-join-to-inner transformation.

Now that we have an outer join in place, the conditions for JPPD have been met and I only need two hints defined in the view: the NO_MERGE hint to prevent view merging and the NO_INDEX hint to prevent the SALES_TIME_BIX index being unnecessarily combined with theSALES_CUST_BIX!

I must say that converting an inner join to an outer join to improve performance is one of the weirdest experiences in my SQL tuning career, and I think it a fitting way to end this chapter on advanced techniques.

Summary

This chapter has explained some of the lesser-used techniques in SQL tuning that haven’t been covered in previous chapters. As always, you should look at the techniques more as examples of how to think about tuning problems more than as principles that need to be learned by rote. If you do find the need to draw on the lessons learned in this chapter you will probably be solving a problem that is subtly different from the examples found here, so your problem will likely require a solution that differs, subtly or not, from the solutions provided in this chapter.

We have now concluded our discussion of SQL optimization, and it is now time to turn our attention to the problem of stabilizing our finely tuned application in the production environment. I introduced the concept of TSTATS in Chapter 6, and in Chapter 20 we will discuss the details.

__________________

1Although the hints I have used make my intentions explicit, a single NO_QUERY_TRANSFORMATION hint is a potentially more future-proof alternative to the three NO_ELIMINATE_JOIN hints.