Expert Oracle SQL: Optimization, Deployment, and Statistics (2014)

PART 1. Basic Concepts

CHAPTER 2. The Cost-Based Optimizer

Chapter 1 introduced the CBO. It is the job of the CBO to determine how to set about running your SQL statement. An approach to executing an SQL statement is known as an execution plan, and for very simple statements, there may only be one way to set about doing the task. Therefore, there will only be one possible execution plan. However, for the vast majority of SQL statements, there are different possible execution plans from which the CBO must choose. In English language, optimization means picking the best from a number of options, and in this case the CBO is trying to pick the optimal execution plan from all of the possible plans it identifies.

THE RULE-BASED OPTIMIZER

When the database is mounted but not open, queries cannot use the CBO and the rule-based optimizer (RBO) is used. It is possible to use the RBO in other situations, but since the release of Oracle database 10gR1, the RBO is no longer supported for general purpose use. Therefore, I won’t be discussing the RBO further in this book.

The vast majority of the CBO’s work revolves around optimizing queries or the subqueries embedded in DML or Data Definition Language (DDL) statements. For DML and DDL statements, there are one or two other things to consider, such as whether to perform direct-path or conventional inserts, but these are normally determined by relatively fixed rules. For the sake of simplicity, I will ignore these and continue on the assumption that the CBO is there purely to optimize queries.

I begin this chapter with a discussion of the criteria that the CBO uses to determine the optimal execution plan. After that, I’ll introduce the concept of cost, which is how the CBO quantifies the benefits of one plan over another. Finally, I’ll give a brief overview of the process that the CBO uses to identify and cost the set of plans from which it must choose. I’ll return to the topics introduced in this chapter later in the book and go into far more detail.

The Optimal Execution Plan

So if optimization is about picking the optimal execution plan, it is important to understand what an optimal execution plan is. Fortunately, or unfortunately, the CBO has a very simple view of what constitutes optimal: the optimal execution plan is the one that runs the quickest. That’s it. Well that’s almost it. If the optimizer is running using one of the FIRST_ROWS optimizer modes, then the optimal plan is the one that returns the first few rows the quickest rather than the one that returns all the rows the quickest. Be that as it may, the point is that the CBO does not encumber itself with concerns about resource consumption and the like. So, for example, if a serial execution plan is estimated to take 4.98 seconds and a plan involving 128 parallel query slaves is estimated to complete in 4.97 seconds, then the CBO will pick the latter! That is, it will do that if you let it. I’ll discuss the optimization process a little later in this chapter, and as part of that discussion I’ll return to the topic of parallel operations.

The Definition of Cost

So if the optimal execution plan is the one that would finish first, then the CBO needs to estimate how long each plan would take to run and then pick the one with the lowest predicted execution time. That estimated execution time is known as the cost of the plan. Now you might assume that the unit of execution time might be nanoseconds, milliseconds, seconds, or some such standard measurement. Unfortunately, for historical reasons, the unit of cost is defined as the length of time that a single block read takes to complete! So, if the CBO thinks that a query will take 5 seconds to run and that a single block read takes 10 milliseconds, then the assigned cost will be 500 as this is 5,000 milliseconds divided by 10 milliseconds.

CBO COST VS. REAL COST

The term cost in this book is used exclusively to refer to the estimate that the CBO makes of how long an SQL statement will take to execute. That estimate, of course, may be inaccurate. It is important to realize that when it comes to tuning, the human optimization process that attempts to better the CBO, the goal is to reduce the actual elapsed time a statement takes to run. It isn’t important how high the CBO cost estimate is as long as the statement runs fast, and similarly you won’t be impressed by a low cost estimate if the statement takes ages to run.

To arrive at the cost of an execution plan, the CBO doesn’t just estimate the number of single block reads. It also tries to work out how long any multiblock read operations in the plan will take and how long central processing unit (CPU)-related operations, such as in-memory sorts, take to run, and it incorporates these times in the overall cost. Oddly enough, though, it translates all these times into equivalent single block reads. Here’s an example. Suppose that a particular plan is estimated to involve:

· 400 single block reads

· 300 multiblock reads

· 5 seconds of CPU processing

Let’s further assume that the CBO estimates that:

· A single block read takes 10 milliseconds

· A multiblock read takes 30 milliseconds

The cost of this plan is calculated as:

((400 x 10) + (300 x 30) + 5,000)/10 = 1800

These days, execution plans are typically printed with both a cost and an estimated elapsed time in hours, minutes, and seconds, so you don’t have to do quite as much math as you used to!

The CBO’s Cost-Estimating Algorithm

There are two big questions to answer at this point: How is the cost of an execution plan determined? How accurate is that guess? Let’s take a look at these questions now.

Calculating Cost

The main inputs to the CBO’s estimating process are the object statistics that are held in the data dictionary. These statistics will indicate, for example, how many blocks are in a table and, therefore, how many multiblock reads would be required to read it in its entirety. Statistics are also held for indexes, so the CBO has some basis for estimating how many single block reads will be required to read data from a table using an index.

Object statistics are the most important inputs to the CBO’s costing algorithm but by no means the only ones. Initialization parameter settings, system statistics, dynamic sampling, SQL profiles, and SQL baselines are all examples of other things the CBO considers when making its cost estimates.

The Quality of the CBO’s Plan Selection

Let’s look at the accuracy of the CBO’s cost-estimation process and why it sometimes matters but sometimes doesn’t.

The Accuracy of the CBO’s Cost Estimates

You may be surprised to learn that the CBO’s cost calculations are frequently wildly inaccurate! Fortunately, these errors often turn out to be irrelevant. Suppose that the CBO is choosing between two possible execution plans for a query. It thinks that plan A will take 5 seconds to run and plan B 10 seconds, so it chooses plan A. Now it may be that when the query is run using plan A, it actually takes 50 seconds to run and the CBO was off by a factor of 10. Well that error won’t be of any consequence if the estimate for plan B is a big underestimate as well. As long as when the query runs with plan B it isn’t noticeably faster than when it runs with plan A, then no harm is done: the CBO made an acceptable selection of plan despite making wildly inaccurate cost estimates.

In many cases, particularly with interactive online transaction processing (OLTP)-style queries, the choice of plan is obvious and the CBO couldn’t get it wrong even if the object statistics were very inaccurate or nonexistent. For example, if you are reading a single row from a huge table using equality predicates on all primary key columns, it is very difficult to see how the CBO could do anything but use the primary key index. The alternative approach of a full table scan would take hundreds or even thousands of times longer to run, so the correct plan would still be selected even if the CBO’s cost estimates were way off.

Suboptimal Execution Plans

Although the CBO will often pick an acceptable execution plan, it is also quite common for it to pick a severely suboptimal plan, even when statistics are gathered correctly and are up to date. In other words, the CBO picks a plan that takes much longer to run than an alternative plan would have. I’ll explain the reasons why this happens so often later in this book.

There are two main reasons these bad plan selections often go unnoticed:

· The first is that the performance of many SQL statements is of no concern. Suppose you have a five-hour batch that runs overnight and executes several thousand SQL statements. If one SQL statement takes 30 seconds to run instead of one second, it won’t matter much, particularly if other statements in the batch are running concurrently and this statement isn’t on the critical path.

· The second reason that these bad plan selections go unnoticed is that there is no easy way to know how long an SQL statement should take to run; people’s expectations are often very low. The exception, of course, is when a plan changes for the worse; you would know that the new plan is bad because you experienced the previous good plan! I would hazard a conservative guess that the elapsed time of most overnight batch runs that have not been professionally tuned could be at least halved without any code changes (other than adding optimizer hints) given enough time for analysis. In real life, of course, tuning exercises usually involves physical database changes, such as adding indexes and rewriting some code. In these cases, it isn’t uncommon to see batch times reduce by a factor of 10 or more!

The assertion that the CBO often gets it wrong may come as a surprise. But remember that the CBO is just making a guess as to what the right plan is. It can’t be certain. Common sense tells us that when you guess over and over again you will guess wrong some of the time and some of the times you guess wrong you will be very, very wrong.

Now that you understand the concepts of optimization and cost, let’s look at the three main tasks involved in both identifying and costing the various executions plan for a query.

The Optimization Process

The three main tasks for identifying and costing the various execution plans for a query are:

· Determine the degree of parallelism to be used.

· Determine what transformations are to be applied.

· Determine what I will refer to as the final state optimization. I have invented this term because the term optimization is ambiguous. On the one hand, optimization can be used to refer to the entire process including identifying the degree of parallelism and the necessary transformations. On the other hand, it can refer to the process of identifying and costing the various execution plans once transformations and the degree of parallelism have been settled on. To avoid confusion, I will use the term final state optimization to refer to the latter, narrower, concept.

These three tasks are not independent of one another. You can’t really work out what degree of parallelism to use until you know what the serial execution plan looks like. After all, a table and its various indexes may have different degrees of parallelism specified in the data dictionary. But you often can’t work out whether an index or a full table scan is better unless you know the extent to which a full table scan will be parallelized. Fortunately, you and I don’t need to understand how the CBO sorts all this out. We just need to understand the basics, so let’s get started.

Parallelism

If the CBO is entirely unconcerned about resource consumption, you may be wondering why it doesn’t use parallel query most, if not all, of the time. The short answer is that by default the CBO is precluded from considering parallel query, parallel DML, and parallel DDL. To allow the CBO to consider parallel query, you need to do one of the following:

· Add the PARALLEL optimizer hint to the query.

· Alter one or more tables or indexes involved in the query to specify a nondefault degree of parallelism.

· Use an ALTER SESSION statement to specify a degree of parallelism for the session.

· Set up automatic degree of parallelism (auto DOP)on your system.

If you do any of these things, the CBO will be allowed to consider parallel operations, but in all cases the maximum degree of parallelism will be limited to ensure that resource consumption doesn’t get too much out of control. Listing 2-1 provides examples of enabling parallel query operations.

Listing 2-1. Different ways to enable parallel query

CREATE TABLE t
AS
SELECT ROWNUM c1 FROM all_objects;

CREATE INDEX i
ON t (c1);

--
-- Use optimizer hints to allow the CBO to consider parallel operations
--
EXPLAIN PLAN
FOR
SELECT /*+ parallel(10) */
* FROM t;

SELECT * FROM TABLE (DBMS_XPLAN.display);
--
-- Enable parallel operations on operations involving table T
--
ALTER TABLE t PARALLEL (DEGREE 10);
--
-- Enable parallel operations on index I
--

ALTER INDEX i
PARALLEL (DEGREE 10);

--
-- Attempt to force parallel queries of degree 10 on all SQL statements.
-- Be aware that FORCE doesn’t mean parallelism is guaranteed!
--
ALTER SESSION FORCE PARALLEL QUERY PARALLEL 10;
--
-- Setup Auto DOP for the system (logged in as SYS user, RESTART REQUIRED)
--

DELETE FROM resource_io_calibrate$;

INSERT INTO resource_io_calibrate$
VALUES (CURRENT_TIMESTAMP
,CURRENT_TIMESTAMP
,0
,0
,200
,0
,0);

COMMIT;

ALTER SYSTEM SET parallel_degree_policy=auto;

Let’s go through these examples one by one. After creating a table and an associated index, the next statement uses the variant of the PARALLEL optimizer hint introduced in Oracle database 11gR2 to enable parallel query with degree 10 for the statement. Be aware that contrary to the SQL Language Reference manual, this hint does not force a parallel query; it just allows the CBO to consider it. The point is that some operations, such as index range scans on non-partitioned indexes, can’t be parallelized, and a serial index range scan might be more efficient than a parallel full table scan at the specified degree.

If you really want to force a parallel query, you might try the ALTER SESSION statement given in Listing 2-1. Here too I have specified the degree of parallelism to be used and here too parallelism isn’t guaranteed for the same reasons! All the ALTER SESSION statement does is to implicitly add a PARALLEL hint to all statements in the session.

FORCING PARALLEL OPERATIONS

If the CBO picks a serial index range scan and you want to force a parallel full table scan, then using the ALTER SESSION statement as in Listing 2-1 may be insufficient despite the misleading FORCE keyword in the statement. You might also have to hint the code to force the full table scan as well.

I didn’t have to specify the degree of parallelism in any of the examples in Listing 2-1. If I hadn’t, then the CBO would pick the ideal degree of parallelism (ideal DOP), which is the degree of parallelism that allows the query to complete the fastest providing that degree is no more than the value specified by the initialization parameter PARALLEL_DEGREE_LIMIT.

DEGREE OF PARALLELISM AND COST

At some point, increasing the degree of parallelism becomes counterproductive for various reasons. Although the ideal DOP reflects this reality, the reported cost of plans with a degree of parallelism higher than the ideal DOP will be lower than the plan that uses the ideal DOP.

The final way to enable parallel query for a statement is to use the auto DOP feature introduced in Oracle database 11gR2. This feature basically means that the ideal DOP is used on all statements that are estimated to run for more than 10 seconds. Officially you are supposed to set up auto DOP using the DBMS_RESOURCE_MANAGER.CALIBRATE_IO routine so that the CBO can more accurately calculate the ideal DOP. Unofficially, the CBO team has suggested using the code in Listing 2-1 instead that specifies 200 MB per second throughput for the input/output (I/O) subsystem. All other metrics, like the number of disks, meaningless with enterprise storage, are left unspecified. Either way, you have to restart the database instance.

I have tried auto DOP using the unofficial approach on a large system with 64 processors and a 64GB System Global Area (SGA) and it worked really well. I have heard mixed reports from those who have used the documented approach. Some are happy and others have reported problems.

Query Transformation

The second of the CBO’s main tasks in optimizing a query is to consider the potential transformations that might be applied to the statement. In other words, the CBO looks at rewriting the statement in a simpler or more efficient way. These days there are a lot of possible transformations that the CBO can consider, and I’ll go through most of them in Chapter 13.

There are two types of query transformation:

· Cost-based transformation: The CBO should only apply the transformation when the optimal plan obtainable from the transformed query has a lower cost than the optimal plan available from the untransformed query.

· Heuristic transformation: This type of transformation is applied either unconditionally or based on some simple rules.

These days the vast majority of transformations are cost based, but there are one or two that are not. Regardless of the type of the transformation, you can always force it or suppress it with the aid of an optimizer hint. Listing 2-2 shows an example query where the CBO applies a heuristic query transformation inappropriately.

Listing 2-2. Simple view merging query transformation

CREATE TABLE t1
AS
SELECT 1 c1
FROM DUAL
CONNECT BY LEVEL <= 100;

CREATE TABLE t2
AS
SELECT 1 c2
FROM DUAL
CONNECT BY LEVEL <= 100;

CREATE TABLE t3
AS
SELECT 1 c3
FROM DUAL
CONNECT BY LEVEL <= 100;

CREATE TABLE t4
AS
SELECT 1 c4
FROM DUAL
CONNECT BY LEVEL <= 100;

EXPLAIN PLAN
FOR
WITH t1t2
AS (SELECT t1.c1, t2.c2
FROM t1, t2
WHERE t1.c1 = t2.c2)
,t3t4
AS (SELECT t3.c3, t4.c4
FROM t3, t4
WHERE t3.c3 = t4.c4)
SELECT COUNT (*)
FROM t1t2 j1, t3t4 j2
WHERE j1.c1 + j1.c2 = j2.c3 + j2.c4;

SELECT * FROM TABLE (DBMS_XPLAN.display (format => 'BASIC +COST'));

PAUSE

-- The above query is transformed into the query below

EXPLAIN PLAN
FOR
SELECT COUNT (*)
FROM t1
,t2
,t3
,t4
WHERE t1.c1 = t2.c2
AND t3.c3 = t4.c4
AND t1.c1 + t2.c2 = t3.c3 + t4.c4;

SELECT * FROM TABLE (DBMS_XPLAN.display (format => 'BASIC +COST'));
-- The resulting execution plan is this:

Plan hash value: 2241143226

-----------------------------------------------------
| Id | Operation | Name | Cost (%CPU)|
-----------------------------------------------------
| 0 | SELECT STATEMENT | | 293 (89)|
| 1 | SORT AGGREGATE | | |
| 2 | HASH JOIN | | 293 (89)|
| 3 | TABLE ACCESS FULL | T4 | 2 (0)|
| 4 | HASH JOIN | | 36 (9)|
| 5 | TABLE ACCESS FULL | T2 | 2 (0)|
| 6 | MERGE JOIN CARTESIAN| | 31 (0)|
| 7 | TABLE ACCESS FULL | T1 | 2 (0)|
| 8 | BUFFER SORT | | 29 (0)|
| 9 | TABLE ACCESS FULL | T3 | 0 (0)|
-----------------------------------------------------

Listing 2-2 creates four tables and then queries them. The query has been explicitly written to suggest that T1 and T2 are joined first; T3 and T4 are joined next; and then the two intermediate results joined. However, the CBO has applied simple view merging and replaced the original query with the simple join of four tables highlighted in bold. I’ll explain this in detail and how to generate and analyze execution plans in Chapter 3, but now you just need to focus on the estimated cost for the query as a whole, which is reported on line 0 of the plan–293. Listing 2-3 adds some hints.

Listing 2-3. Simple view merging suppressed

WITH t1t2
AS (SELECT /*+ NO_MERGE */
t1.c1, t2.c2
FROM t1, t2
WHERE t1.c1 = t2.c2)
,t3t4
AS (SELECT /*+ NO_MERGE */
t3.c3, t4.c4
FROM t3, t4
WHERE t3.c3 = t4.c4)
SELECT COUNT (*)
FROM t1t2 j1, t3t4 j2
WHERE j1.c1 + j1.c2 = j2.c3 + j2.c4;

---------------------------------------------------
| Id | Operation | Name | Cost (%CPU)|
---------------------------------------------------
| 0 | SELECT STATEMENT | | 11 (28)|
| 1 | SORT AGGREGATE | | |
| 2 | HASH JOIN | | 11 (28)|
| 3 | VIEW | | 4 (0)|
| 4 | HASH JOIN | | 4 (0)|
| 5 | TABLE ACCESS FULL| T1 | 2 (0)|
| 6 | TABLE ACCESS FULL| T2 | 2 (0)|
| 7 | VIEW | | 4 (0)|
| 8 | HASH JOIN | | 4 (0)|
| 9 | TABLE ACCESS FULL| T3 | 2 (0)|
| 10 | TABLE ACCESS FULL| T4 | 2 (0)|
---------------------------------------------------

The two NO_MERGE hints have suppressed the simple view merging transformation and now the selected execution plan has a cost of 11! This is interesting, isn’t it? Most people assume that the only reason to apply optimizer hints is to get the CBO to select a plan that it considers suboptimal but they know is optimal. However, on this occasion the CBO believes the hinted plan is better than the one it chose. It didn’t pick the hinted plan itself because the unconditional nature of the simple view merging transformation meant that the optimal plan wasn’t even considered! How does the query perform in real life? All of this discussion about CBO costing is all very interesting but at the end of the day you are most interested in actual elapsed times, not cost estimates. In fact, on this occasion there is nothing much wrong with the costing algorithm. If you try running these statements, you are likely to find that the hinted query runs much faster than the unhinted one, but perhaps not 26 times faster as the cost estimates might suggest.

Final State Query Optimization

Let’s turn our attention briefly to the basic job of the CBO. Final state optimization involves determining:

· The join order for the various rows sources

· The join mechanisms

· The access method for each of the tables in a query

· Whether to use in-list iteration

I’ll discuss all of these items in some detail in Chapter 12.

Summary

This chapter briefly introduced some basic CBO concepts such as cost and described at a very high level the sort of decisions the CBO has to make and how it makes them. Part 3 of this book is entirely dedicated to the CBO, and I’ll explain query transformation and final state optimization in far more detail there. For now, let’s wrap up this introduction with three key messages:

· The CBO is trying to do a very complex job in a short amount of time.

· The CBO is just a (very) fancy guessing machine, and although its guesses are usually acceptable, often they aren’t and significant performance problems can arise.

· When the CBO costs an execution plan wrongly, it should not be of concern. But if it picks a slow running plan when a significantly faster one is available, this is of concern.

Now that I have introduced the CBO, it is time to start looking at what it produces. Chapter 3 takes a first look at execution plans.