Chapter 9. Redo and Undo

This chapter describes two of the most important pieces of data in an Oracle database: redo and undo. Redo is the information Oracle records in online (and archived) redo log files in order to “replay” your transaction in the event of a failure. Undo is the information Oracle records in the undo segments in order to reverse, or roll back, your transaction.

In this chapter, we will discuss topics such as how redo and undo (rollback) are generated, and how they fit into transactions, recovery, and so on. We’ll start off with a high-level overview of what redo and undo are and how they work together. We’ll then drill down into each topic, covering each in more depth and discussing what you, the developer, need to know about them.

The chapter is slanted toward the developer perspective in that we will not cover issues that a DBA should be exclusively in charge of figuring out and tuning. For example, we won’t cover how to find the optimum setting for RECOVERY_PARALLELISM or the FAST_START_MTTR_TARGET parameters. Nevertheless, redo and undo are topics that bridge the DBA and developer roles. Both need a good fundamental understanding of the purpose of redo and undo, how they work, and how to avoid potential issues with regard to their use. Knowledge of redo and undo also helps both DBAs and developers better understand how the database operates, in general.

In this chapter, I will present the pseudo-code for these mechanisms in Oracle and a conceptual explanation of what actually takes place. I will not cover every internal detail of what files get updated with what bytes of data. What actually takes place is a little more involved, but having a good understanding of the flow of how it works is valuable and will help you to understand the ramifications of your actions.

Note Time and time again, I get questions regarding the exact bits and bytes of redo and undo. People seem to want to have a very detailed specification of exactly, precisely, what is in there. I never answer those questions. Instead, I focus on the intent of redo and undo, the concepts behind redo and undo. I focus on the use of redo and undo—not on the bits and bytes. I myself do not “dump” redo log files or undo segments. I do use the supplied tools, such as LogMiner to read redo and flashback transaction history to read undo, but that presents the information to me in a human-readable format. So, we won’t be doing internals in this chapter but rather building a strong foundation.

What Is Redo?

Redo log files are crucial to the Oracle database. These are the transaction logs for the database. Oracle maintains two types of redo log files: online and archived. They are used for recovery purposes; their main purpose in life is to be used in the event of an instance or media failure.

If the power goes off on your database machine, causing an instance failure, Oracle will use the online redo logs to restore the system to exactly the committed point it was at immediately prior to the power outage. If your disk drive fails (a media failure), Oracle will use both archived redo logs and online redo logs to recover a backup of the data that was on that drive to the correct point in time. Moreover, if you “accidentally” truncate a table or remove some critical information and commit the operation, you can restore a backup of the affected data and recover it to the point in time immediately prior to the “accident” using online and archived redo log files.

Archived redo log files are simply copies of old, full online redo log files. As the system fills up log files, the ARCn process makes a copy of the online redo log file in another location, and optionally puts several other copies into local and remote locations as well. These archived redo log files are used to perform media recovery when a failure is caused by a disk drive going bad or some other physical fault. Oracle can take these archived redo log files and apply them to backups of the data files to catch them up to the rest of the database. They are the transaction history of the database.

Note With the advent of Oracle 10g, we now have flashback technology. This allows us to perform flashback queries (query the data as of some point in time in the past), un-drop a database table, put a table back the way it was some time ago, and so on. As a result, the number of occasions in which we need to perform a conventional recovery using backups and archived redo logs has decreased. However, the ability to perform a recovery is the DBA’s most important job. Database recovery is the one thing a DBA is not allowed to get wrong.

Every Oracle database has at least two online redo log groups with at least a single member (redo log file) in each group. These online redo log groups are written to in a circular fashion. Oracle will write to the log files in group 1, and when it gets to the end of the files in group 1, it will switch to log file group 2 and begin writing to that one. When it has filled log file group 2, it will switch back to log file group 1 (assuming you have only two redo log file groups; if you have three, Oracle would, of course, proceed to the third group).
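
If you would like to see this circular reuse on your own system, a quick query against the V$LOG view (access to the V$ views is required) shows each redo log group, its size, and which group LGWR is currently writing to; the output will of course vary from system to system:

select group#, sequence#, bytes, members, archived, status
  from v$log
 order by group#;

The group with a STATUS of CURRENT is the one LGWR is writing to right now; ACTIVE groups are still needed for crash recovery, and INACTIVE groups are candidates for reuse.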

Redo logs, or transaction logs, are one of the major features that make a database a database. They are perhaps its most important recovery structure, although without the other pieces such as undo segments, distributed transaction recovery, and so on, nothing works. They are a major component of what sets a database apart from a conventional file system. The online redo logs allow us to effectively recover from a power outage—one that might happen while Oracle is in the middle of a write. The archived redo logs let us recover from media failures when, for instance, the hard disk goes bad or human error causes data loss. Without redo logs, the database would not offer any more protection than a file system.

What Is Undo?

Undo is conceptually the opposite of redo. Undo information is generated by the database as you make modifications to data so that the data can be put back the way it was before the modifications took place. This might be done in support of multiversioning, as we learned in Chapter 7; in the event the transaction or statement you are executing fails for any reason; or because you requested it with a ROLLBACK statement. Whereas redo is used to replay a transaction in the event of failure—to recover the transaction—undo is used to reverse the effects of a statement or set of statements. Undo, unlike redo, is stored internally in the database in a special set of segments known as undo segments.

Note “Rollback segment” and “undo segment” are considered synonymous terms. Using manual undo management, the DBA will create “rollback segments.” Using automatic undo management, the system will automatically create and destroy “undo segments” as necessary. These terms should be considered the same for all intents and purposes in this discussion.
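
If you are curious which mode your database is using and where its undo lives, a couple of quick checks (shown here as a sketch; they require access to the relevant parameter and dictionary views) are:

show parameter undo_management

select tablespace_name, status
  from dba_tablespaces
 where contents = 'UNDO';

With automatic undo management (the default in current releases), UNDO_MANAGEMENT will be AUTO and you will see one or more undo tablespaces in which the system manages the undo segments for you.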

It is a common misconception that undo is used to restore the database physically to the way it was before the statement or transaction executed, but this is not so. The database is logically restored to the way it was—any changes are logically undone—but the data structures, the database blocks themselves, may well be different after a rollback. The reason for this lies in the fact that, in any multiuser system, there will be tens or hundreds or thousands of concurrent transactions. One of the primary functions of a database is to mediate concurrent access to its data. The blocks that our transaction modifies are, in general, being modified by many other transactions as well. Therefore, we can’t just put a block back exactly the way it was at the start of our transaction—that could undo someone else’s work!

For example, suppose our transaction executed an INSERT statement that caused the allocation of a new extent (i.e., it caused the table to grow). Our INSERT would cause us to get a new block, format it for use, and put some data into it. At that point, some other transaction might come along and insert data into this block. If we roll back our transaction, obviously we can’t unformat and unallocate this block. Therefore, when Oracle rolls back, it is really doing the logical equivalent of the opposite of what we did in the first place. For every INSERT, Oracle will do a DELETE. For every DELETE, Oracle will do an INSERT. For every UPDATE, Oracle will do an “anti-UPDATE,” or an UPDATE that puts the row back the way it was prior to our modification.

Note This undo generation is not true for direct-path operations, which have the ability to bypass undo generation on the table. We’ll discuss these operations in more detail shortly.

How can we see this in action? Perhaps the easiest way is to follow these steps:

1. Create an empty table.

2. Full-scan the table and observe the amount of I/O performed to read it.

3. Fill the table with many rows (no commit).

4. Roll back that work and undo it.

5. Full-scan the table a second time and observe the amount of I/O performed.

So, let’s create an empty table:

EODA@ORA12CR1> create table t
2 as
3 select *
4 from all_objects
5 where 1=0;
Table created.

And now we’ll query it, with AUTOTRACE enabled in SQL*Plus to measure the I/O.

Note In this example, we will full-scan the table twice each time. The goal is to only measure the I/O performed the second time in each case. This avoids counting additional I/Os performed by the optimizer during any parsing and optimization that may occur.

The query initially takes no I/Os to full-scan the table:

EODA@ORA12CR1> select * from t;
no rows selected

EODA@ORA12CR1> set autotrace traceonly statistics
EODA@ORA12CR1> select * from t;
no rows selected

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
0 consistent gets
0 physical reads

EODA@ORA12CR1> set autotrace off

The fact that there are zero I/Os against the table might surprise you at first—especially if you are an Oracle user dating back to versions before Oracle 11g Release 2. This is due to a feature introduced in Oracle 11g Release 2: deferred segment creation.

Note The deferred segment creation feature is available only with the Enterprise Edition of Oracle. This feature is enabled by default in Oracle 11g Release 2 and higher. You can override this default behavior when creating the table. See Chapter 10 for further details.

If you run this example in older releases, you’ll likely see three or so I/Os performed. We’ll discuss that in a moment, but for now let’s continue this example. Next, we’ll add lots of data to the table. We’ll make it “grow,” then roll it all back:

EODA@ORA12CR1> insert into t select * from all_objects;
18371 rows created.

EODA@ORA12CR1> rollback;
Rollback complete.

Now, if we query the table again, we’ll discover that it takes considerably more I/Os to read the table this time:

EODA@ORA12CR1> select * from t;
no rows selected

EODA@ORA12CR1> set autotrace traceonly statistics
EODA@ORA12CR1> select * from t;
no rows selected

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
317 consistent gets
0 physical reads

EODA@ORA12CR1> set autotrace off

The blocks that our INSERT caused to be added under the table’s high-water mark (HWM) are still there—formatted, but empty. Our full scan had to read them to see if they contained any rows. Moreover, the first time we ran the query, we observed zero I/Os. That was due to the default mode of table creation in Oracle 11g Release 2—using deferred segment creation. When we issued that CREATE TABLE, no storage, not a single extent, was allocated. The segment creation was deferred until the INSERT took place, and when we rolled back, the segment persisted. You can see this easily with a smaller example. This time, I’ll explicitly request deferred segment creation, even though it is enabled by default in 11g Release 2 and above:

EODA@ORA12CR1> drop table t purge;
Table dropped.

EODA@ORA12CR1> create table t ( x int )
2 segment creation deferred;
Table created.

EODA@ORA12CR1> select extent_id, bytes, blocks
2 from user_extents
3 where segment_name = 'T'
4 order by extent_id;
no rows selected

EODA@ORA12CR1> insert into t(x) values (1);
1 row created.

EODA@ORA12CR1> rollback;
Rollback complete.

EODA@ORA12CR1> select extent_id, bytes, blocks
2 from user_extents
3 where segment_name = 'T'
4 order by extent_id;

EXTENT_ID BYTES BLOCKS
---------- ---------- ----------
0 65536 8

As you can see, after the table was initially created there was no allocated storage—no extents were used by this table. Upon performing an INSERT, followed immediately by ROLLBACK, we can see the INSERT allocated storage—but the ROLLBACK does not “release” it.

Those two things together—that the segment was actually created by the INSERT but not “uncreated” by the ROLLBACK, and that the new formatted blocks created by the INSERT were scanned the second time around—show that a rollback is a logical “put the database back the way it was” operation. The database will not be exactly the way it was, just logically the same.

How Redo and Undo Work Together

Now let’s take a look at how redo and undo work together in various scenarios. We will discuss, for example, what happens during the processing of an INSERT with regard to redo and undo generation, and how Oracle uses this information in the event of failures at various points in time.

An interesting point to note is that undo information, stored in undo tablespaces or undo segments, is protected by redo as well. In other words, undo data is treated just like table data or index data—changes to undo generate redo, which is logged (to the log buffer and then the redo log file). Why this is so will become clear in a moment when we discuss what happens when a system crashes. Undo data is added to the undo segment and is cached in the buffer cache, just like any other piece of data would be.

Example INSERT-UPDATE-DELETE-COMMIT Scenario

For this example, assume we’ve created a table with an index as follows:

create table t(x int, y int);
create index ti on t(x);

And then we will investigate what might happen with a set of statements like this:

insert into t (x,y) values (1,1);
update t set x = x+1 where x = 1;
delete from t where x = 2;

We will follow this transaction down different paths and discover the answers to the following questions:

· What happens if the system fails at various points in the processing of these statements?

· What happens if the buffer cache fills up?

· What happens if we ROLLBACK at any point?

· What happens if we succeed and COMMIT?

The INSERT

The initial INSERT INTO T statement will generate both redo and undo. The undo generated will be enough information to make the INSERT “go away.” The redo generated by the INSERT INTO T will be enough information to make the INSERT “happen again.”

After the INSERT has occurred, we have the scenario illustrated in Figure 9-1.


Figure 9-1. State of the system after an INSERT

There are some cached, modified undo blocks, index blocks, and table data blocks. Each of these blocks is protected by entries in the redo log buffer.

Hypothetical Scenario: The System Crashes Right Now

In this scenario, the system crashes before a COMMIT is issued or before the redo entries are written to disk (see Chapter 4 for details on mechanisms that cause redo to be written to disk). Everything is OK. The SGA is wiped out, but we don’t need anything that was in the SGA. It will be as if this transaction never happened when we restart. None of the blocks with changes got flushed to disk, and none of the redo got flushed to disk. We have no need of any of this undo or redo to recover from an instance failure.

Hypothetical Scenario: The Buffer Cache Fills Up Right Now

The situation is such that DBWn must make room and our modified blocks are to be flushed from the cache. In this case, DBWn will start by asking LGWR to flush the redo entries that protect these database blocks. Before DBWn can write any of the blocks that are changed to disk, LGWR must flush (to disk) the redo information related to these blocks. This makes sense: if we were to flush the modified blocks for table T (but not the undo blocks associated with the modifications) without flushing the redo entries associated with the undo blocks, and the system failed, we would have a modified table T block with no undo information associated with it. We need to flush the redo log buffers before writing these blocks out so that we can redo all of the changes necessary to get the SGA back into the state it is in right now, so that a rollback can take place.

This second scenario shows some of the foresight that has gone into all of this. The set of conditions described by “If we flushed table T blocks and did not flush the redo for the undo blocks and the system failed” is starting to get complex. It only gets more complex as we add users, and more objects, and concurrent processing, and so on.

At this point, we have the situation depicted in Figure 9-1. We have generated some modified table and index blocks. These have associated undo segment blocks, and all three types of blocks have generated redo to protect them. If you recall from our discussion of the redo log buffer in Chapter 4, it is flushed at least every three seconds, when it is one-third full or contains 1MB of buffered data, or whenever a COMMIT or ROLLBACK takes place. It is very possible that at some point during our processing, the redo log buffer will be flushed. In that case, the picture will look like Figure 9-2.


Figure 9-2. State of the system after a redo log buffer flush

That is, we’ll have modified blocks representing uncommitted changes in the buffer cache and redo for those uncommitted changes on disk. This is a very normal scenario that happens frequently.

The UPDATE

The UPDATE will cause much of the same work as the INSERT to take place. This time, the amount of undo will be larger; we have some “before” images to save as a result of the UPDATE. Now we have the picture shown in Figure 9-3 (the dark rectangle in the redo log file represents the redo generated by the INSERT; the redo for the UPDATE is still in the SGA and has not yet been written to disk).


Figure 9-3. State of the system after the UPDATE

We have more new undo segment blocks in the block buffer cache (needed to undo the UPDATE, if necessary), as well as modified database table and index blocks in the cache. We have also generated more redo log buffer entries. Let’s assume that the redo generated from the INSERT statement (discussed in the prior section) is on disk (in the redo log file) and the redo generated from the UPDATE is in cache.

Hypothetical Scenario: The System Crashes Right Now

Upon startup, Oracle would read the redo log files and find some redo log entries for our transaction. Given the state in which we left the system, we have the redo entries generated by the INSERT in the redo log files (which include the redo for the undo segments associated with the INSERT). However, the redo for the UPDATE was only in the log buffer and never made it to disk (and was wiped out when the system crashed). That’s okay: the transaction was never committed, and the data files on disk reflect the state of the system before the UPDATE took place.

However, the redo for the INSERT was written to the redo log file. Therefore Oracle would “roll forward” the INSERT. We would end up with a picture much like Figure 9-1, with modified undo blocks (information on how to undo the INSERT), modified table blocks (right after the INSERT), and modified index blocks (right after the INSERT). Oracle will discover that our transaction never committed and will roll it back since the system is doing crash recovery and, of course, our session is no longer connected.

To roll back the uncommitted INSERT, Oracle will use the undo it just rolled forward (from the redo and now in the buffer cache) and apply it to the data and index blocks, making them look as they did before the INSERT took place. Now everything is back the way it was. The blocks that are on disk may or may not reflect the INSERT (it depends on whether or not our blocks got flushed before the crash). If the blocks on disk do reflect the INSERT, then the INSERT will be undone when the blocks are flushed from the buffer cache. If they do not reflect the undone INSERT, so be it—they will be overwritten later anyway.

Note See Chapter 3 for a full discussion on checkpointing and when modified (dirty) buffers are written from the buffer cache to disk.

This scenario covers the rudimentary details of a crash recovery. The system performs this as a two-step process. First it rolls forward, bringing the system right to the point of failure, and then it proceeds to roll back everything that had not yet committed. This action will resynchronize the data files. It replays the work that was in progress and undoes anything that has not yet completed.

Hypothetical Scenario: The Application Rolls Back the Transaction

At this point, Oracle will find the undo information for this transaction either in the cached undo segment blocks (most likely) or on disk if they have been flushed (more likely for very large transactions). It will apply the undo information to the data and index blocks in the buffer cache; if those blocks are no longer in the cache, they are read from disk into the cache to have the undo applied to them. These blocks will later be flushed to the data files with their original row values restored.

This scenario is much more common than the system crash. It is useful to note that during the rollback process, the redo logs are never involved. The only time redo logs are read is during recovery and archival. This is a key tuning concept: redo logs are written to. Oracle does not read them during normal processing. As long as you have sufficient devices so that when ARCn is reading a file, LGWR is writing to a different device, there is no contention for redo logs. Many other databases treat the log files as “transaction logs.” They do not have this separation of redo and undo. For those systems, the act of rolling back can be disastrous—the rollback process must read the logs their log writer is trying to write to. They introduce contention into the part of the system that can least stand it. Oracle’s goal is to make it so that redo logs are written sequentially, and no one ever reads them while they are being written.

The DELETE

Again, undo is generated as a result of the DELETE, blocks are modified, and redo is sent over to the redo log buffer. This is not very different from before. In fact, it is so similar to the UPDATE that we are going to move right on to the COMMIT.

The COMMIT

We’ve looked at various failure scenarios and different paths, and now we’ve finally made it to the COMMIT. Here, Oracle will flush the redo log buffer to disk, and the picture will look like Figure 9-4.


Figure 9-4. State of the system after a COMMIT

The modified blocks are in the buffer cache; maybe some of them have been flushed to disk. All of the redo necessary to replay this transaction is safely on disk and the changes are now permanent. If we were to read the data directly from the data files, we probably would see the blocks as they existed before the transaction took place, as DBWn most likely has not yet written them. That’s OK—the redo log files can be used to bring those blocks up to date in the event of a failure. The undo information will hang around until the undo segment wraps around and reuses those blocks. Oracle will use that undo to provide for consistent reads of the affected objects for any session that needs them.

Commit and Rollback Processing

It is important to understand how redo log files might impact us as developers. We will look at how the different ways we can write our code affect redo log utilization. We’ve already seen the mechanics of redo earlier in the chapter, and now we’ll look at some specific issues. You might detect many of these scenarios, but they would be fixed by the DBA as they affect the database instance as a whole. We’ll start with what happens during a COMMIT, and then get into commonly asked questions and issues surrounding the online redo logs.

What Does a COMMIT Do?

As a developer, you should have a good understanding of exactly what goes on during a COMMIT. In this section, we’ll investigate what happens during the processing of the COMMIT statement in Oracle. A COMMIT is generally a very fast operation, regardless of the transaction size. You might think that the bigger a transaction (in other words, the more data it affects), the longer a COMMIT would take. This is not true. The response time of a COMMIT is generally “flat,” regardless of the transaction size. This is because a COMMIT does not really have too much work to do, but what it does do is vital.

One of the reasons this is an important fact to understand and embrace is that it will lead to letting your transactions be as big as they should be. As we discussed in the previous chapter, many developers artificially constrain the size of their transactions, committing every so many rows, instead of committing when a logical unit of work has been performed. They do this in the mistaken belief that they are preserving scarce system resources, when in fact they are consuming more of them. If a COMMIT of one row takes X units of time, and the COMMIT of 1,000 rows takes the same X units of time, then performing work in a manner that does 1,000 one-row COMMITs will take roughly an additional 999*X units of time to perform. By committing only when you have to (when the logical unit of work is complete), you will not only increase performance, you’ll also reduce contention for shared resources (log files, various internal latches, and the like). A simple example demonstrates that committing more often necessarily takes longer. We’ll use a Java application, although you can expect similar results from most any client—except, in this case, PL/SQL (we’ll discuss why that is after the example). To start, here is the sample table we’ll be inserting into:

SCOTT@ORA12CR1> create table test
2 ( id number,
3 code varchar2(20),
4 descr varchar2(20),
5 insert_user varchar2(30),
6 insert_date date
7 )
8 /
Table created.

Our Java program (stored in a file named perftest.java) will accept two inputs: the number of rows to INSERT (iters) and how many rows between commits (commitCnt). It starts by connecting to the database, setting autocommit off (which should be done in all Java code), and then calling a doInserts() method a total of two times:

· Once just to warm up the routine (make sure all of the classes are loaded)

· A second time, with SQL Tracing on, specifying the number of rows to INSERT along with how many rows to commit at a time (i.e., commit every N rows)

It then closes the connection and exits. The main method is as follows:

import java.sql.*;

public class perftest
{
  public static void main (String arr[]) throws Exception
  {
    DriverManager.registerDriver(new oracle.jdbc.OracleDriver());
    Connection con = DriverManager.getConnection
        ("jdbc:oracle:thin:@csxdev:1521:ORA12CR1", "scott", "tiger");
    Integer iters = new Integer(arr[0]);
    Integer commitCnt = new Integer(arr[1]);

    con.setAutoCommit(false);

    doInserts( con, 1, 1 );

    Statement stmt = con.createStatement ();
    stmt.execute( "begin dbms_monitor.session_trace_enable(waits=>true); end;" );

    doInserts( con, iters.intValue(), commitCnt.intValue() );

    con.close();
  }

Note The SCOTT account or whatever account you use to test this with will need to have the EXECUTE privilege granted on the DBMS_MONITOR package.
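
For reference, a DBA (or any suitably privileged account) could grant that privilege with a statement along these lines; substitute whatever schema you are actually testing with:

grant execute on dbms_monitor to scott;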

Now, the method doInserts() is fairly straightforward. It starts by preparing (parsing) an INSERT statement so we can bind and execute it over and over:

  static void doInserts(Connection con, int count, int commitCount )
  throws Exception
  {
    PreparedStatement ps =
        con.prepareStatement
        ("insert into test " +
         "(id, code, descr, insert_user, insert_date)"
         + " values (?,?,?, user, sysdate)");

It then loops over the number of rows to insert, binding and executing the INSERT over and over. Additionally, it checks a row counter to see if it needs to COMMIT or not inside the loop:

    int rowcnt = 0;
    int committed = 0;

    for (int i = 0; i < count; i++ )
    {
      ps.setInt(1,i);
      ps.setString(2,"PS - code" + i);
      ps.setString(3,"PS - desc" + i);
      ps.executeUpdate();
      rowcnt++;
      if ( rowcnt == commitCount )
      {
        con.commit();
        rowcnt = 0;
        committed++;
      }
    }
    con.commit();

    System.out.println
        ("pstatement rows/commitcnt = " + count + " / " + committed );
  }
}

Now we’ll run this code repeatedly with different inputs and review the resulting TKPROF file. We’ll run with 100,000 row inserts—committing 1 row at a time, then 10, and so on. The resulting TKPROF files produced the results in Table 9-1.

Table 9-1. Results from Inserting 100,000 Rows


As you can see, the more often you commit, the longer you wait (your mileage will vary on this). And the amount of time you wait is more or less directly proportional to the number of times you commit. Remember, this is just a single-user scenario; with multiple users doing the same work, all committing too frequently, the numbers will go up rapidly.

We’ve heard the same story, time and time again, with similar situations. For example, we’ve seen how not using bind variables and performing hard parses often severely reduces concurrency due to library cache contention and excessive CPU utilization. Even when we switch to using bind variables, soft parsing too frequently—caused by closing cursors even though we are going to reuse them shortly—incurs massive overhead. We must perform operations only when we need to—a COMMIT is just another such operation. It is best to size our transactions based on business need, not based on misguided attempts to lessen resource usage on the database.

There are two factors contributing to the expense of the COMMIT in this example:

· We’ve obviously increased the round-trips to and from the database. If we commit every record, we are generating that much more traffic back and forth. I didn’t even measure that traffic separately, but it would add to the overall runtime.

· Every time we commit, we must wait for our redo to be written to disk. This will result in a “wait.” In this case, the wait is named “log file sync.”

So, because we committed after every INSERT, we waited every time for a short period of time—and if you wait a little bit of time but you wait often, it all adds up. Fully thirty seconds of our runtime was spent waiting for a COMMIT to complete when we committed 100,000 times—in other words, waiting for LGWR to write the redo to disk. In stark contrast, when we committed once, we didn’t wait very long (not a measurable amount of time actually). This proves that a COMMIT is a fast operation; we expect the response time to be more or less flat, not a function of the amount of work we’ve done.
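
If you would like to see this wait in your own testing, the cumulative time your session has spent on “log file sync” is visible in V$SESSION_EVENT; a sketch of such a query (it assumes you have been granted access to the underlying V$ views) is:

select event, total_waits, time_waited
  from v$session_event
 where sid = (select sid from v$mystat where rownum = 1)
   and event = 'log file sync';

TIME_WAITED is reported in hundredths of seconds; the more often you commit, the faster you should see these numbers grow.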

So, why is a COMMIT’s response time fairly flat, regardless of the transaction size? It is because before we even go to COMMIT in the database, we’ve already done the really hard work. We’ve already modified the data in the database, so we’ve already done 99.9 percent of the work. For example, operations such as the following have already taken place:

· Undo blocks have been generated in the SGA.

· Modified data blocks have been generated in the SGA.

· Buffered redo for the preceding two items has been generated in the SGA.

· Depending on the size of the preceding three items and the amount of time spent, some combination of the previous data may be flushed onto disk already.

· All locks have been acquired.

When we COMMIT, all that is left to happen is the following:

· A System Change Number (SCN) is generated for our transaction. In case you are not familiar with it, the SCN is a simple timing mechanism Oracle uses to guarantee the ordering of transactions and to enable recovery from failure. It is also used to guarantee read-consistency and checkpointing in the database. Think of the SCN as a ticker; every time someone COMMITs, the SCN is incremented by one. (A quick way to watch the SCN advance is sketched just after this list.)

· LGWR writes all of our remaining buffered redo log entries to disk and records the SCN in the online redo log files as well. This step is actually the COMMIT. If this step occurs, we have committed. Our transaction entry is “removed” from V$TRANSACTION—this shows that we have committed.

· All locks recorded in V$LOCK held by our session are released, and everyone who was enqueued waiting on locks we held will be woken up and allowed to proceed with their work.

· Some of the blocks our transaction modified will be visited and “cleaned out” in a fast mode if they are still in the buffer cache. Block cleanout refers to the lock-related information we store in the database block header. Basically, we are cleaning out our transaction information on the block, so the next person who visits the block won’t have to. We are doing this in a way that need not generate redo log information, saving considerable work later (this is discussed more fully in the upcoming “Block Cleanout” section).

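As mentioned in the first bullet, you can watch the SCN move forward across commits. A minimal sketch follows (it assumes access to V$DATABASE; the exact increments you observe will vary, since many operations advance the SCN, not just your commit):

select current_scn from v$database;

-- ...do some work in a transaction and commit, then query again...

select current_scn from v$database;
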
As you can see, there is very little to do to process a COMMIT. The lengthiest operation is, and always will be, the activity performed by LGWR, as this is physical disk I/O. The amount of time spent by LGWR here will be greatly reduced by the fact that it has already been flushing the contents of the redo log buffer on a recurring basis. LGWR will not buffer all of the work you do for as long as you do it. Rather, it will incrementally flush the contents of the redo log buffer in the background as you are going along. This is to avoid having a COMMIT wait for a very long time in order to flush all of your redo at once.

So, even if we have a long-running transaction, much of the buffered redo log it generates would have been flushed to disk, prior to committing. On the flip side is the fact that when we COMMIT, we must typically wait until all buffered redo that has not been written yet is safely on disk. That is, our call to LGWR is by default a synchronous one. While LGWR may use asynchronous I/O to write in parallel to our log files, our transaction will normally wait for LGWR to complete all writes and receive confirmation that the data exists on disk before returning.

Note Oracle 11g Release 1 and above offer an asynchronous commit, as described in Chapter 8. However, that style of commit has limited general-purpose use, as discussed. Commits in any end-user-facing application should be synchronous.

Now, earlier I mentioned that we were using a Java program and not PL/SQL for a reason—and that reason is a PL/SQL commit-time optimization as discussed in Chapter 8. I said that our call to LGWR is by default a synchronous one and that we wait for it to complete its write. That is true in Oracle 12c Release 1 and before for every programmatic language except PL/SQL. The PL/SQL engine, realizing that the client does not know whether or not a COMMIT has happened in the PL/SQL routine until the PL/SQL routine is completed, does an asynchronous commit. It does not wait for LGWR to complete; rather, it returns from the COMMIT call immediately. However, when the PL/SQL routine is completed, when we return from the database to the client, the PL/SQL routine will wait for LGWR to complete any of the outstanding COMMITs. So, if you commit 100 times in PL/SQL and then return to the client, you will likely find you waited for LGWR once—not 100 times—due to this optimization. Does this imply that committing frequently in PL/SQL is a good or OK idea? No, not at all—just that it is not as bad an idea as it is in other languages. The guiding rule is to commit when your logical unit of work is complete—not before.

Note This commit-time optimization in PL/SQL may be suspended when you are performing distributed transactions or using Data Guard in maximum availability mode. Since there are two participants, PL/SQL must wait for the commit to actually be complete before continuing. Also, it can be suspended by directly invoking COMMIT WORK WRITE WAIT in PL/SQL in Oracle 11g Release 1 and above.
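
For reference, the commit forms being contrasted here look like this; the asynchronous form trades the durability guarantee for speed and, as noted, is not appropriate for end-user-facing work:

commit work write wait;   -- synchronous: the session waits for LGWR to flush its redo to disk
commit work write nowait; -- asynchronous: return before the redo is known to be on disk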

To demonstrate that a COMMIT is a “flat response time” operation, we’ll generate varying amounts of redo and time the INSERTs and COMMITs. As we do these INSERTs and COMMITs, we’ll measure the amount of redo our session generates using this small utility function:

EODA@ORA12CR1> create or replace function get_stat_val( p_name in varchar2 ) return number
2 as
3 l_val number;
4 begin
5 select b.value
6 into l_val
7 from v$statname a, v$mystat b
8 where a.statistic# = b.statistic#
9 and a.name = p_name;
10
11 return l_val;
12 end;
13 /
Function created.

Note The owner of the previous function will need to have been directly granted the SELECT privilege on the V$ views V_$STATNAME and V_$MYSTAT.

Drop the table T (if it exists) and create an empty table T of the same structure as BIG_TABLE:

EODA@ORA12CR1> drop table t purge;

EODA@ORA12CR1> create table t
2 as
3 select *
4 from big_table
5 where 1=0;
Table created.

Note Directions on how to create and populate the BIG_TABLE table used in many examples are in the “Setting Up Your Environment” section at the very front of this book.

And we’ll measure the CPU and Elapsed time used to commit our transaction using the DBMS_UTILITY package routines GET_CPU_TIME and GET_TIME. The actual PL/SQL block used to generate the workload and report on it is:

EODA@ORA12CR1> declare
2 l_redo number;
3 l_cpu number;
4 l_ela number;
5 begin
6 dbms_output.put_line
7 ( '-' || ' Rows' || ' Redo' ||
8 ' CPU' || ' Elapsed' );
9 for i in 1 .. 6
10 loop
11 l_redo := get_stat_val( 'redo size' );
12 insert into t select * from big_table where rownum <= power(10,i);
13 l_cpu := dbms_utility.get_cpu_time;
14 l_ela := dbms_utility.get_time;
15 commit work write wait;
16 dbms_output.put_line
17 ( '-' ||
18 to_char( power( 10, i ), '9,999,999') ||
19 to_char( (get_stat_val('redo size')-l_redo), '999,999,999' ) ||
20 to_char( (dbms_utility.get_cpu_time-l_cpu), '999,999' ) ||
21 to_char( (dbms_utility.get_time-l_ela), '999,999' ) );
22 end loop;
23 end;
24 /

- Rows Redo CPU Elapsed
- 10 7,072 0 1
- 100 10,248 0 0
- 1,000 114,080 0 0
- 10,000 1,146,484 0 2
- 100,000 11,368,512 0 2
- 1,000,000 113,800,488 1 2

PL/SQL procedure successfully completed.

* This test was performed on a single-user machine with a 1.7 MB log buffer and three 500MB online redo log files. Times are in hundredths of seconds.

As you can see, as we generate varying amounts of redo from 7,072 bytes to 113MB, the difference in time to COMMIT is not measurable using a timer with one-hundredth-of-a-second resolution. As we were processing and generating the redo log, LGWR was constantly flushing our buffered redo information to disk in the background. So, when we generated 113MB of redo log information, LGWR was busy flushing every 1MB or so. When it came to the COMMIT, there wasn’t much left to do—not much more than when we created ten rows of data. You should expect to see similar (but not exactly the same) results, regardless of the amount of redo generated.

What Does a ROLLBACK Do?

By changing the COMMIT to ROLLBACK, we can expect a totally different result. The time to roll back is definitely a function of the amount of data modified. I changed the script developed in the previous section to perform a ROLLBACK instead (simply change the COMMIT to ROLLBACK) and the timings are very different. Look at the results now:

EODA@ORA12CR1> declare
2 l_redo number;
3 l_cpu number;
4 l_ela number;
5 begin
6 dbms_output.put_line
7 ( '-' || ' Rows' || ' Redo' ||
8 ' CPU' || ' Elapsed' );
9 for i in 1 .. 6
10 loop
11 l_redo := get_stat_val( 'redo size' );
12 insert into t select * from big_table where rownum <= power(10,i);
13 l_cpu := dbms_utility.get_cpu_time;
14 l_ela := dbms_utility.get_time;
15 --commit work write wait;
16 rollback;
17 dbms_output.put_line
18 ( '-' ||
19 to_char( power( 10, i ), '9,999,999') ||
20 to_char( (get_stat_val('redo size')-l_redo), '999,999,999' ) ||
21 to_char( (dbms_utility.get_cpu_time-l_cpu), '999,999' ) ||
22 to_char( (dbms_utility.get_time-l_ela), '999,999' ) );
23 end loop;
24 end;
25 /

- Rows Redo CPU Elapsed
- 10 7,180 0 0
- 100 10,872 0 0
- 1,000 121,880 0 0
- 10,000 1,224,864 0 0
- 100,000 12,148,416 2 4
- 1,000,000 121,733,580 25 36

PL/SQL procedure successfully completed.

This difference in CPU and Elapsed timings is to be expected, as a ROLLBACK has to undo the work we’ve done. Similar to a COMMIT, a series of operations must be performed. Before we even get to the ROLLBACK, the database has already done a lot of work. To recap, the following would have happened:

· Undo segment records have been generated in the SGA.

· Modified data blocks have been generated in the SGA.

· A buffered redo log for the preceding two items has been generated in the SGA.

· Depending on the size of the preceding three items and the amount of time spent, some combination of the previous data may be flushed onto disk already.

· All locks have been acquired.

When we ROLLBACK,

· We undo all of the changes made. This is accomplished by reading the data back from the undo segment and, in effect, reversing our operation and then marking the undo entry as applied. If we inserted a row, a ROLLBACK will delete it. If we updated a row, a rollback will reverse the update. If we deleted a row, a rollback will reinsert it again.

· All locks held by our session are released, and everyone who was enqueued waiting on locks we held will be released.

A COMMIT, on the other hand, just flushes any remaining data in the redo log buffers. It does very little work compared to a ROLLBACK. The point here is that you don’t want to roll back unless you have to. It is expensive since you spend a lot of time doing the work, and you’ll also spend a lot of time undoing the work. Don’t do work unless you’re sure you are going to want to COMMIT it. This sounds like common sense—of course I wouldn’t do all of the work unless I wanted to COMMIT it. However, I’ve often seen a developer use a “real” table as a temporary table, fill it up with data, report on it, and then roll back to get rid of the temporary data. Later we’ll talk about true temporary tables and how to avoid this issue.
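
As a quick preview of that technique (temporary tables are covered in detail later), a global temporary table lets you stage scratch data that simply disappears at COMMIT or at the end of the session, so no ROLLBACK of the staged rows is needed; the table name here is made up:

create global temporary table temp_report_data
( id    number,
  descr varchar2(30)
)
on commit delete rows;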

Investigating Redo

As a developer, it’s often important to be able to measure how much redo your operations generate. The more redo you generate, the longer your operations may take, and the slower the entire system might be. You are not just affecting your session, but every session. Redo management is a point of serialization within the database. There is just one LGWR in any Oracle instance, and eventually all transactions end up at LGWR, asking it to manage their redo and COMMIT their transaction. The more it has to do, the slower the system will be. By seeing how much redo an operation tends to generate, and testing more than one approach to a problem, you can find the best way to do things.

Note Starting with Oracle 12c (on multiprocessor systems), log writer worker processes (LG00) are automatically started to improve the performance of writing to the redo log file.

Measuring Redo

It is pretty straightforward to see how much redo is being generated, as shown earlier in the chapter. I’ve used the AUTOTRACE built-in feature of SQL*Plus. But AUTOTRACE works only with simple DML—it can’t, for example, be used to view what a stored procedure call did. I’ve also used my utility function presented earlier, GET_STAT_VAL, to retrieve the “redo size” value from V$ tables. We’ll continue to use that function for the following exercise.
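
For example, to measure the redo generated by a stored procedure call (something AUTOTRACE can’t show you), you might wrap the call with GET_STAT_VAL, much as we do elsewhere in this chapter; MY_PROCEDURE is simply a placeholder for whatever routine you are testing:

set serveroutput on
variable redo number
exec :redo := get_stat_val( 'redo size' );
exec my_procedure;
exec dbms_output.put_line( (get_stat_val('redo size')-:redo) || ' bytes of redo generated...' );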

Let’s take a look at the difference in redo generated by conventional-path INSERTs (the normal INSERTs you and I do every day) and direct-path INSERTs—used when loading large amounts of data into the database. We’ll use AUTOTRACE and the previously created tables T and BIG_TABLE for this simple example. First we’ll load the table using a conventional-path INSERT:

EODA@ORA12CR1> set autotrace traceonly statistics;
EODA@ORA12CR1> truncate table t;
Table truncated.

EODA@ORA12CR1> insert into t
2 select * from big_table;
1000000 rows created.

Statistics
----------------------------------------------------------
90 recursive calls
123808 db block gets
39407 consistent gets
13847 physical reads
113875056 redo size
1177 bytes sent via SQL*Net to client
1354 bytes received via SQL*Net from client
4 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
1000000 rows processed

As you can see, that INSERT generated about 113MB of redo; we were expecting that because of the prior example in PL/SQL.

Note The example in this section was performed on a NOARCHIVELOG-mode database. If you are in ARCHIVELOG mode, the table would have to be created or set as NOLOGGING to observe this dramatic change. We will investigate the NOLOGGING attribute in more detail shortly in the section “Setting NOLOGGING in SQL.” Please make sure to coordinate all nonlogged operations with your DBA on a “real” system.

When we use a direct-path load in a NOARCHIVELOG-mode database, we get the following results:

EODA@ORA12CR1> truncate table t;
Table truncated.

EODA@ORA12CR1> insert /*+ APPEND */ into t
2 select * from big_table;
1000000 rows created.

Statistics
----------------------------------------------------------
551 recursive calls
16645 db block gets
15242 consistent gets
13873 physical reads
220504 redo size
1160 bytes sent via SQL*Net to client
1368 bytes received via SQL*Net from client
4 SQL*Net roundtrips to/from client
86 sorts (memory)
0 sorts (disk)
1000000 rows processed

EODA@ORA12CR1> set autotrace off

That INSERT generated only about 220KB—kilobytes, not megabytes—of redo. The method I outline using the V$MYSTAT view is useful in general for seeing the side effects of various options. The GET_STAT_VAL script is useful for small tests, with one or two operations.

Can I Turn Off Redo Log Generation?

This question is often asked. The simple short answer is no, since redo logging is crucial for the database; it is not overhead and it is not a waste. You do need it, regardless of whether you believe you do or not. It is a fact of life, and it is the way the database works. If you turned off redo, then any temporary failure of disk drives, power, or a software crash, would render the entire database unusable and unrecoverable. That said, however, there are some operations that can be done without generating redo log in some cases.

Note As of Oracle9i Release 2, a DBA can place the database into FORCE LOGGING mode. In that case, all operations are logged. The query SELECT FORCE_LOGGING FROM V$DATABASE may be used to see if logging is going to be forced or not. This feature is in support of Data Guard, a disaster-recovery feature of Oracle that relies on redo to maintain a standby database copy.

Setting NOLOGGING in SQL

Some SQL statements and operations support the use of a NOLOGGING clause. This does not mean that all operations against the object will be performed without generating redo, just that some very specific operations will generate significantly less redo than normal. Note that I said “significantly less redo,” not “no redo.” All operations will generate some redo—all data dictionary operations will be logged regardless of the logging mode. The amount of redo generated can be significantly less, however. For this example of the NOLOGGING clause, I ran the following in a database running in ARCHIVELOG mode:

EODA@ORA12CR1> select log_mode from v$database;

LOG_MODE
------------
ARCHIVELOG

EODA@ORA12CR1> drop table t purge;
Table dropped.

EODA@ORA12CR1> variable redo number
EODA@ORA12CR1> exec :redo := get_stat_val( 'redo size' );
PL/SQL procedure successfully completed.

EODA@ORA12CR1> create table t
2 as
3 select * from all_objects;
Table created.

EODA@ORA12CR1> exec dbms_output.put_line( (get_stat_val('redo size')-:redo) || ' bytes of redo generated...' );
4487796 bytes of redo generated...
PL/SQL procedure successfully completed.

That CREATE TABLE generated about 4MB of redo information (your results will vary depending on how many rows are inserted into table T). We’ll drop and re-create the table, in NOLOGGING mode this time:

EODA@ORA12CR1> drop table t;
Table dropped.

EODA@ORA12CR1> variable redo number
EODA@ORA12CR1> exec :redo := get_stat_val( 'redo size' );
PL/SQL procedure successfully completed.

EODA@ORA12CR1> create table t
2 NOLOGGING
3 as
4 select * from all_objects;
Table created.

EODA@ORA12CR1> exec dbms_output.put_line( (get_stat_val('redo size')-:redo) || ' bytes of redo generated...' );
90108 bytes of redo generated...
PL/SQL procedure successfully completed.

This time, we generated only 90KB of redo. As you can see, this makes a tremendous difference—4MB of redo vs. 90KB. The 4MB written in the first example is a copy of the actual table data itself; it was written to the redo log when the table was created without the NOLOGGING clause.

If you test this on a NOARCHIVELOG-mode database, you will not see any differences between the two. The CREATE TABLE will not be logged, with the exception of the data dictionary modifications, in a NOARCHIVELOG-mode database. That fact also points out a valuable tip: test your system in the mode in which it will be run in production, as the behavior may be different. Your production system will be running in ARCHIVELOG mode; if you perform lots of operations that generate redo in this mode, but not in NOARCHIVELOG mode, you’ll want to discover this during testing, not during rollout to the users!

Of course, it is now obvious that you will do everything you can with NOLOGGING, right? In fact, the answer is a resounding no. You must use this mode very carefully, and only after discussing the issues with the person in charge of backup and recovery. Let’s say you create this table and it is now part of your application (e.g., you used a CREATE TABLE AS SELECT NOLOGGING as part of an upgrade script). Your users modify this table over the course of the day. That night, the disk that the table is on fails. “No problem,” the DBA says. “We are running in ARCHIVELOG mode and we can perform media recovery.” The problem is, however, that the initially created table, since it was not logged, is not recoverable from the archived redo log. This table is unrecoverable and this brings out the most important point about NOLOGGING operations: they must be coordinated with your DBA and the system as a whole. If you use them and others are not aware of that fact, you may compromise the ability of your DBA to recover your database fully after a media failure. NOLOGGING operations must be used judiciously and carefully.

The important things to note about NOLOGGING operations are as follows:

· Some amount of redo will be generated, as a matter of fact. This redo is to protect the data dictionary. There is no avoiding this at all. It could be of a significantly lesser amount than before, but there will be some.

· NOLOGGING does not prevent redo from being generated by all subsequent operations. In the preceding example, I did not create a table that is never logged. Only the single, individual operation of creating the table was not logged. All subsequent “normal” operations such as INSERTs, UPDATEs, DELETEs, and MERGEs will be logged. Other special operations, such as a direct-path load using SQL*Loader, or a direct-path INSERT using the INSERT /*+ APPEND */ syntax, will not be logged (unless and until you ALTER the table and enable full logging again). In general, however, the operations your application performs against this table will be logged.

· After performing NOLOGGING operations in an ARCHIVELOG-mode database, you must take a new baseline backup of the affected data files as soon as possible, in order to avoid losing the data created by the NOLOGGING operation due to media failure. Since the data created by the NOLOGGING operation is not in the redo log files, and is not yet in the backups, you have no way of recovering it!
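
One way the DBA can check whether any data files contain nonlogged changes made since the last backup is to look at the unrecoverable-change information Oracle records in V$DATAFILE (a sketch; compare the times reported here against your backup times):

select file#, name, unrecoverable_change#, unrecoverable_time
  from v$datafile
 where unrecoverable_time is not null;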

Setting NOLOGGING on an Index

There are two ways to use the NOLOGGING option. You have already seen one method—embedding the NOLOGGING keyword in the SQL command. The other method, which involves setting the NOLOGGING attribute on the segment (index or table), allows certain operations to be performed implicitly in a NOLOGGING mode. For example, I can alter an index or table to be NOLOGGING by default. For an index, this means that subsequent rebuilds of that index will not be logged (the index will not generate redo; other indexes and the table itself might, but this index will not). Using the table T we just created, we can observe:

EODA@ORA12CR1> select log_mode from v$database;

LOG_MODE
------------
ARCHIVELOG

EODA@ORA12CR1> create index t_idx on t(object_name);
Index created.

EODA@ORA12CR1> variable redo number
EODA@ORA12CR1> exec :redo := get_stat_val( 'redo size' );
PL/SQL procedure successfully completed.

EODA@ORA12CR1> alter index t_idx rebuild;
Index altered.

EODA@ORA12CR1> exec dbms_output.put_line( (get_stat_val('redo size')-:redo)
|| ' bytes of redo generated...');
672264 bytes of redo generated...
PL/SQL procedure successfully completed.

Note Again, this example was performed in an ARCHIVELOG-mode database. You would not see the differences in redo size in a NOARCHIVELOG mode database as the index CREATE and REBUILD operations are not logged in NOARCHIVELOG mode.

When the index is in LOGGING mode (the default), a rebuild of it generated roughly 670KB of redo. However, we can alter the index:

EODA@ORA12CR1> alter index t_idx nologging;
Index altered.

EODA@ORA12CR1> exec :redo := get_stat_val( 'redo size' );
PL/SQL procedure successfully completed.

EODA@ORA12CR1> alter index t_idx rebuild;
Index altered.

EODA@ORA12CR1> exec dbms_output.put_line( (get_stat_val('redo size')-:redo)
|| ' bytes of redo generated...');
39352 bytes of redo generated...

PL/SQL procedure successfully completed.

And now it generates a mere 39KB of redo. But that index is “unprotected” now. If the data files it was located in failed and had to be restored from a backup, we would lose that index data. Understanding that fact is crucial. The index is not recoverable right now—we need a backup to take place. Alternatively, the DBA could simply re-create the index, since an index can always be rebuilt directly from the table data.
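
Once the NOLOGGING rebuild is done and a backup (or a fresh re-creation of the index) has been arranged, it is a good idea to put the index back into LOGGING mode and verify the attribute, for example:

alter index t_idx logging;

select index_name, logging
  from user_indexes
 where index_name = 'T_IDX';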

NOLOGGING Wrap-up

The operations that may be performed in a NOLOGGING mode are as follows:

· Index creations and ALTERs (rebuilds).

· Bulk INSERTs into a table using a direct-path INSERT such as that available via the /*+ APPEND */ hint or SQL*Loader direct-path loads. The table data will not generate redo, but all index modifications will (the indexes on this nonlogged table will generate redo).

· LOB operations (updates to large objects do not have to be logged).

· Table creations via CREATE TABLE AS SELECT.

· Various ALTER TABLE operations such as MOVE and SPLIT.

Used appropriately on an ARCHIVELOG-mode database, NOLOGGING can speed up many operations by dramatically reducing the amount of redo log generated. Suppose you have a table you need to move from one tablespace to another. You can schedule this operation to take place immediately before a backup occurs—you would ALTER the table to be NOLOGGING, move it, rebuild the indexes (without logging as well), and then ALTER the table back to logging mode. Now, an operation that might have taken X hours can happen in X/2 hours perhaps (I’m not promising a 50-percent reduction in runtime!). The appropriate use of this feature includes involving the DBA, or whoever is responsible for database backup and recovery or any standby databases. If that person is not aware that you’re using this feature and a media failure occurs, you may lose data, or the integrity of the standby database might be compromised. This is something to seriously consider.
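
A sketch of that sequence follows; the table, index, and tablespace names are made up, and the entire operation should be coordinated with whoever is responsible for backup and recovery:

alter table big_tab nologging;
alter table big_tab move tablespace new_tbs;

alter index big_tab_idx rebuild nologging;

alter table big_tab logging;
alter index big_tab_idx logging;

-- take a fresh backup of the affected data files as soon as practical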

Why Can’t I Allocate a New Log?

I get this question all of the time. You are getting warning messages to this effect (this will be found in alert.log on your server):

Thread 1 cannot allocate new log, sequence 1466
Checkpoint not complete
Current log# 3 seq# 1465 mem# 0: /.../...redo03.log

It might say Archival required instead of Checkpoint not complete, but the effect is pretty much the same. This is really something the DBA should be looking out for. This message will be written to alert.log on the server whenever the database attempts to reuse an online redo log file and finds that it can’t. This happens when DBWn has not yet finished checkpointing the data protected by the redo log or ARCn has not finished copying the redo log file to the archive destination. At this point, the database effectively halts as far as the end user is concerned. It stops cold. DBWn or ARCn will be given priority to flush the blocks to disk. Upon completion of the checkpoint or archival, everything goes back to normal. The reason the database suspends user activity is that there is simply no place to record the changes the users are making. Oracle is attempting to reuse an online redo log file, but because either the file would be needed to recover the database in the event of a failure (Checkpoint not complete), or the archiver has not yet finished copying it (Archival required), Oracle must wait (and the end users will wait) until the redo log file can safely be reused.

If you see that your sessions spend a lot of time waiting on a “log file switch,” “log buffer space,” or “log file switch checkpoint or archival incomplete,” you are most likely hitting this. You will notice it during prolonged periods of database modifications if your log files are sized incorrectly, or because DBWn and ARCn need to be tuned by the DBA or system administrator. I frequently see this issue with the “starter” database that has not been customized. The “starter” database typically sizes the redo logs far too small for any significant amount of work (including the initial database build of the data dictionary itself). As soon as you start loading up the database, you will notice that the first 1,000 rows go fast, and then things start going in spurts: 1,000 go fast, then hang, then go fast, then hang, and so on. These are the indications you are hitting this condition.
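A quick way to confirm the symptom is to look at the system-wide wait events. This is only a sketch (the exact event names can vary slightly by release), but it will look something like the following:

-- time_waited is reported in centiseconds in this view
select event, total_waits, time_waited
  from v$system_event
 where event in ( 'log file switch (checkpoint incomplete)',
                  'log file switch (archiving needed)',
                  'log buffer space' )
 order by time_waited desc;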

There are a couple of things you can do to solve this issue:

· Make DBWn faster. Have your DBA tune DBWn by enabling ASYNC I/O, using DBWn I/O slaves, or using multiple DBWn processes. Look at the I/O on the system and see if one disk or a set of disks is “hot” and you need to therefore spread the data out. The same general advice applies for ARCn as well. The pros of this are that you get “something for nothing” here—increased performance without really changing any logic/structures/code. There really are no downsides to this approach.

· Add more redo log files. This will postpone the Checkpoint not complete in some cases and, after a while, it will postpone the Checkpoint not complete so long that it perhaps doesn’t happen (because you gave DBWn enough breathing room to checkpoint). The same applies to the Archival required message. The benefit of this approach is the removal of the “pauses” in your system. The downside is it consumes more disk, but the benefit far outweighs any downside here.

· Re-create the log files with a larger size. This will extend the amount of time between the time you fill the online redo log and the time you need to reuse it. The same applies to the Archival required message, if the redo log file usage is “bursty.” If you have a period of massive log generation (nightly loads, batch processes) followed by periods of relative calm, then having larger online redo logs can buy enough time for ARCn to catch up during the calm periods. The pros and cons are identical to the preceding approach of adding more files. Additionally, it may postpone a checkpoint from happening until later, since checkpoints happen at each log switch (at least), and the log switches will now be further apart.

· Make checkpointing happen more frequently and more continuously. Use a smaller block buffer cache (not entirely desirable) or various parameter settings such as FAST_START_MTTR_TARGET, LOG_CHECKPOINT_INTERVAL, and LOG_CHECKPOINT_TIMEOUT. This will force DBWn to flush dirty blocks more frequently. The benefit to this approach is that recovery time from a failure is reduced. There will always be less work in the online redo logs to be applied. The downside is that blocks may be written to disk more frequently if they are modified often. The buffer cache will not be as effective as it could be, and it can defeat the block cleanout mechanism discussed in the next section.

The approach you take will depend on your circumstances. This is something that must be fixed at the database level, taking the entire instance into consideration.
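For reference, here is a minimal sketch of the add/resize approach from the list above. The group numbers, file paths, and sizes are purely placeholders, and the DBA would verify in V$LOG that a group is INACTIVE before dropping it:

-- add new, larger online redo log groups
alter database add logfile group 4 ('/u01/oradata/redo04.log') size 1g;
alter database add logfile group 5 ('/u01/oradata/redo05.log') size 1g;

-- cycle through the logs until the old, small groups become INACTIVE
alter system switch logfile;

-- then drop the old groups (only when V$LOG reports them as INACTIVE)
alter database drop logfile group 1;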

Block Cleanout

In this section, we’ll discuss block cleanouts, or the removal of “locking”-related information on the database blocks we’ve modified. This concept is important to understand when we talk about the infamous ORA-01555: snapshot too old error in a subsequent section.

If you recall from Chapter 6, we talked about data locks and how they are managed. I described how they are actually attributes of the data, stored on the block header. A side effect of this is that the next time that block is accessed, we may have to clean it out—in other words, remove the transaction information. This action generates redo and causes the block to become dirty if it wasn’t already, meaning that a simple SELECT may generate redo and may cause lots of blocks to be written to disk with the next checkpoint. Under most normal circumstances, however, this will not happen. If you have mostly small- to medium-sized transactions (OLTP), or you have a data warehouse that performs direct-path loads or uses DBMS_STATS to analyze tables after load operations, you’ll find the blocks are generally cleaned for you. If you recall from the earlier section titled “What Does a COMMIT Do?” one of the steps of COMMIT-time processing is to revisit some blocks if they are still in the SGA and if they are accessible (no one else is modifying them), and then clean them out. This activity is known as a commit cleanout and is the activity that cleans out the transaction information on our modified blocks. Optimally, our COMMIT can clean out the blocks so that a subsequent SELECT (read) will not have to clean them out. Only an UPDATE of these blocks would truly clean out our residual transaction information, and since the UPDATE is already generating redo, the cleanout is not noticeable.

We can force a cleanout to not happen, and therefore observe its side effects, by understanding how the commit cleanout works. In a commit list associated with our transaction, Oracle will record lists of blocks we have modified. Each of these lists is 20 blocks long, and Oracle will allocate as many of these lists as it needs—up to a point. If the sum of the blocks we modify exceeds 10 percent of the block buffer cache size, Oracle will stop allocating new lists. For example, if our buffer cache is set to cache 3,000 blocks, Oracle will maintain a list of up to 300 blocks (10 percent of 3,000). Upon COMMIT, Oracle will process each of these lists of 20 block pointers, and if the block is still available, it will perform a fast cleanout. So, as long as the number of blocks we modify does not exceed 10 percent of the number of blocks in the cache and our blocks are still in the cache and available to us, Oracle will clean them out upon COMMIT. Otherwise, it just skips them (i.e., does not clean them out).

With this understanding, we can set up artificial conditions to see how the cleanout works. I set my DB_CACHE_SIZE to a low value of 16MB, which is sufficient to hold 2,048 8KB blocks (my blocksize is 8KB). Next I create a table such that a row fits on exactly one block—I’ll never have two rows per block. Then I fill this table up with 10,000 rows and COMMIT. We know that 10,000 blocks far exceeds 10 percent of 2,048, so the database will not be able to clean out all of these dirty blocks upon commit—most of them will not even be in the buffer cache anymore. I’ll measure the amount of redo I’ve generated so far, run a SELECT that will visit each block, and then measure the amount of redo that SELECT generated.

Image Note In order for this example to be reproducible and predictable, you’ll need to disable SGA automatic memory management. If that is enabled, there is a chance that the database will increase the size of your buffer cache—defeating the “math” I’ve worked out.

Surprisingly to many people, the SELECT will have generated redo. Not only that, but it will also have “dirtied” these modified blocks, causing DBWn to write them again. This is due to the block cleanout. Next, I’ll run the SELECT to visit every block once again and see that no redo is generated. This is expected, as the blocks are all “clean” at this point. We’ll start by creating our table:

EODA@ORA12CR1> create table t
2 ( id number primary key,
3 x char(2000),
4 y char(2000),
5 z char(2000)
6 )
7 /

Table created.

EODA@ORA12CR1> exec dbms_stats.set_table_stats( user, 'T', numrows=>10000, numblks=>10000 );

PL/SQL procedure successfully completed.

I used DBMS_STATS to set table statistics so as to avoid any side effects from hard parsing later (Oracle tends to scan objects that have no statistics during a hard parse and this side effect would interfere with my example!). So, this is my table with one row per block (in my 8KB blocksize database). Next, we’ll inspect the block of code we’ll be executing against this table:

EODA@ORA12CR1> declare
2 l_rec t%rowtype;
3 begin
4 for i in 1 .. 10000
5 loop
6 select * into l_rec from t where id=i;
7 end loop;
8 end;
9 /
declare
*
ERROR at line 1:
ORA-01403: no data found
ORA-06512: at line 6

That block failed, but that’s OK—we knew it would since there is no data in the table yet. I ran that block simply to get the hard parse of the SQL and PL/SQL performed so when we run it later, we won’t have to worry about side effects from hard parsing being counted. Now we are ready to load the data into our table and commit:

EODA@ORA12CR1> insert into t
2 select rownum, 'x', 'y', 'z'
3 from all_objects
4 where rownum <= 10000;
10000 rows created.

EODA@ORA12CR1> commit;
Commit complete.

And, finally, I’m ready to measure the amount of redo generated during the first read of the data:

EODA@ORA12CR1> variable redo number
EODA@ORA12CR1> exec :redo := get_stat_val( 'redo size' );
PL/SQL procedure successfully completed.

EODA@ORA12CR1> declare
2 l_rec t%rowtype;
3 begin
4 for i in 1 .. 10000
5 loop
6 select * into l_rec from t where id=i;
7 end loop;
8 end;
9 /
PL/SQL procedure successfully completed.

EODA@ORA12CR1> exec dbms_output.put_line( (get_stat_val('redo size')-:redo) || ' bytes of redo generated...');
802632 bytes of redo generated...
PL/SQL procedure successfully completed.

So, this SELECT generated about 802KB of redo during its processing. This represents the block headers it modified during the index read of the primary key index and the subsequent table read of T. DBWn will be writing these modified blocks back out to disk at some point in the future (actually, since the table doesn’t fit into the cache, we know that DBWn has already written out at least some of them). Now, if I run the query again

EODA@ORA12CR1> exec :redo := get_stat_val( 'redo size' );
PL/SQL procedure successfully completed.

EODA@ORA12CR1> declare
2 l_rec t%rowtype;
3 begin
4 for i in 1 .. 10000
5 loop
6 select * into l_rec from t where id=i;
7 end loop;
8 end;
9 /
PL/SQL procedure successfully completed.

EODA@ORA12CR1> exec dbms_output.put_line( (get_stat_val('redo size')-:redo) || ' bytes of redo generated...');
0 bytes of redo generated...
PL/SQL procedure successfully completed.

I see that no redo is generated—the blocks are all clean.

If we were to rerun the preceding example with the buffer cache set to hold a little more than 100,000 blocks, we’d find that we generate little to no redo on any of the SELECTs—we will not have to clean dirty blocks during either of our SELECT statements. This is because the 10,000-plus (remember the index was modified as well) blocks we modified fit comfortably into 10 percent of our buffer cache, and we are the only users. There is no one else mucking around with the data, and no one else is causing our data to be flushed to disk or accessing those blocks. In a live system, it would be normal for at least some of the blocks to not be cleaned out sometimes.

This behavior will most affect you after a large INSERT (as just demonstrated), UPDATE, or DELETE—one that affects many blocks in the database (anything more than 10 percent of the size of the cache will definitely do it). You’ll notice that the first query to touch the block after this will generate a little redo and dirty the block, possibly causing it to be rewritten if DBWn had already flushed it or the instance had been shut down, clearing out the buffer cache altogether. There is not too much you can do about it. It is normal and to be expected. If Oracle didn’t do this deferred cleanout of a block, a COMMIT could take as long to process as the transaction itself. The COMMIT would have to revisit each and every block, possibly reading them in from disk again (they could have been flushed).

If you are not aware of block cleanouts and how they work, they will be one of those mysterious things that just seem to happen for no reason. For example, say you UPDATE a lot of data and COMMIT. Now you run a query against that data to verify the results. The query appears to generate tons of write I/O and redo. It seems impossible if you are unaware of block cleanouts; it was to me the first time I saw it. You go and get someone to observe this behavior with you, but it is not reproducible as the blocks are now “clean” on the second query. You simply write it off as one of those database mysteries—a mystery that only happens when you are alone.

In an OLTP system, you’ll probably never see a block cleanout happening, since those systems are characterized by small, short transactions that affect only a few blocks. By design, all or most of the transactions are short and sweet. Modify a couple of blocks and they all get cleaned out. In a warehouse where you make massive UPDATEs to the data after a load, block cleanouts may be a factor in your design. Some operations will create data on “clean” blocks. For example, CREATE TABLE AS SELECT, direct-path loaded data, and direct-path inserted (using the /*+ APPEND */ hint) data will all create clean blocks. An UPDATE, normal INSERT, or DELETE may create blocks that need to be cleaned with the first read. This could really affect you if your processing consists of

· Bulk-loading lots of new data into the data warehouse

· Running UPDATEs on all of the data you just loaded (producing blocks that need to be cleaned out)

· Letting people query the data

You have to realize that the first query to touch the data will incur some additional processing if the block needs to be cleaned. Realizing this, you yourself should “touch” the data after the UPDATE. You just loaded or modified a ton of data—you need to analyze it at the very least. Perhaps you need to run some reports to validate the load. This will clean the block out and make it so the next query doesn’t have to. Better yet, since you just bulk-loaded the data, you now need to refresh the statistics anyway. Running the DBMS_STATS utility to gather statistics may well clean out all of the blocks as it just uses SQL to query the information and would naturally clean the blocks out as it goes along.
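As a sketch of that post-load “touch” (the table name BIG_TABLE is just a placeholder), either of the following will read the blocks and clean them out as a side effect:

-- gathering statistics reads the table, cleaning out blocks as it goes
begin
   dbms_stats.gather_table_stats( user, 'BIG_TABLE' );
end;
/

-- or any full read you were going to run anyway does the job
select /*+ full(t) */ count(*) from big_table t;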

Log Contention

This, like the cannot allocate new log message, is something the DBA must fix, typically in conjunction with the system administrator. However, it is something a developer might detect as well if the DBA isn’t watching closely enough.

If you are faced with log contention, what you might observe is a large wait time on the “log file sync” event and long write times evidenced in the “log file parallel write” event in a Statspack report. If you see this, you may be experiencing contention on the redo logs; they are not being written fast enough. This can happen for many reasons. One application reason (one the DBA can’t fix, but the developer must) is that you are committing too frequently—committing inside of a loop doing INSERTs, for example. As demonstrated in the “What Does a COMMIT Do?” section, committing too frequently, aside from being a bad programming practice, is a surefire way to introduce lots of log file sync waits. Assuming all of your transactions are correctly sized (you are not committing more frequently than your business rules dictate), the most common causes for log file waits that I’ve seen are as follows:

· Putting redo on a slow device: The disks are just performing poorly. It is time to buy faster disks.

· Putting redo on the same device as other files that are accessed frequently: Redo is designed to be written with sequential writes and to be on dedicated devices. If other components of your system—even other Oracle components—are attempting to read and write to this device at the same time as LGWR, you will experience some degree of contention. Here, you want to ensure LGWR has exclusive access to these devices if at all possible.

· Mounting the log devices in a buffered manner: Here, you are using a “cooked” file system (not RAW disks). The operating system is buffering the data, and the database is also buffering the data (redo log buffer). Double-buffering slows things down. If possible, mount the devices in a “direct” fashion. How to do this varies by operating system and device, but it is usually possible.

· Putting redo on a slow technology, such as RAID-5: RAID-5 is great for reads, but it is generally terrible for writes. As we saw earlier regarding what happens during a COMMIT, we must wait for LGWR to ensure the data is on disk. Using any technology that slows this down is not a good idea.
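Before chasing any of these causes, it is worth confirming that the waits are really there. A hedged sketch of checking the wait interface per session follows (a Statspack or AWR report presents the same information aggregated over an interval):

-- 'log file sync' is what foreground sessions wait on at COMMIT;
-- 'log file parallel write' is LGWR's own write time (time_waited is in centiseconds)
select sid, event, total_waits, time_waited
  from v$session_event
 where event in ( 'log file sync', 'log file parallel write' )
 order by time_waited desc;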

If at all possible, you really want at least five dedicated devices for logging and optimally six to mirror your archives as well. In these days of 200GB, 300GB, 1TB and larger disks, this is getting harder, but if you can set aside four of the smallest, fastest disks you can find and one or two big ones, you can affect LGWR and ARCn in a positive fashion. To lay out the disks, you would break them into three groups (see Figure 9-5):

· Redo log group 1: Disks 1 and 3

· Redo log group 2: Disks 2 and 4

· Archive: Disk 5 and optionally disk 6 (the big disks)

image

Figure 9-5. Optimal redo log configuration

You would place redo log group 1 with members A and B onto disks 1 and 3. You would place redo log group 2 with members C and D onto disks 2 and 4. If you have groups 3, 4, and so on, they’d go onto the odd and even groups of disks respectively. The effect of this is that LGWR, when the database is currently using group 1, will write to disks 1 and 3 simultaneously. When this group fills up, LGWR will move to disks 2 and 4. When they fill up, LGWR will go back to disks 1 and 3. Meanwhile, ARCn will be processing the full online redo logs and writing them to disks 5 and 6, the big disks. The net effect is neither ARCn nor LGWR is ever reading a disk being written to, or writing to a disk being read from, so there is no contention (see Figure 9-6).

image

Figure 9-6. Redo log flow

So, when LGWR is writing group 1, ARCn is reading group 2 and writing to the archive disks. When LGWR is writing group 2, ARCn is reading group 1 and writing to the archive disks. In this fashion, LGWR and ARCn each have their own dedicated devices and will not be contending with anyone, not even each other.
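As a rough sketch, the layout in Figure 9-5 could be created with statements like the following. The disk paths are hypothetical, and on an existing database you would use whatever group numbers are free:

-- redo log group 1 multiplexed across disks 1 and 3
alter database add logfile group 1
  ( '/disk1/redo_g1_m1.log', '/disk3/redo_g1_m2.log' ) size 512m;

-- redo log group 2 multiplexed across disks 2 and 4
alter database add logfile group 2
  ( '/disk2/redo_g2_m1.log', '/disk4/redo_g2_m2.log' ) size 512m;

-- archived redo goes to the big disks (the second destination is the optional mirror)
alter system set log_archive_dest_1 = 'LOCATION=/disk5/arch' scope=both;
alter system set log_archive_dest_2 = 'LOCATION=/disk6/arch' scope=both;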

Temporary Tables and Redo/Undo

Temporary tables have been a feature within Oracle for several releases now (introduced in Oracle8i version 8.1.5). Even though temporary tables have been around for a while, there is still some confusion surrounding them, in particular in the area of logging. In Chapter 10, we will cover how and why you might use temporary tables. In this section, we’ll explore only the question “How do temporary tables work with respect to logging of changes?”

In Oracle 12c, the processing of undo for temporary tables is significantly enhanced. Therefore I’ll break this topic into two sections: Prior to 12c and Starting with 12c.

Prior to 12c

Temporary tables generate no redo for their blocks. Therefore, an operation on a temporary table is not recoverable. When you modify a block in a temporary table, no record of this change will be made in the redo log files. However, temporary tables do generate undo, and the undo is logged. Hence, temporary tables will generate some redo. At first glance, this doesn’t seem to make total sense: Why would they need to generate undo? This is because you can roll back to a SAVEPOINT within a transaction. You might erase the last 50 INSERTs into a temporary table, leaving the first 50. Temporary tables can have constraints and everything else a normal table can have. They might fail a statement on the five-hundredth row of a 500-row INSERT, necessitating a rollback of that statement. Since temporary tables behave in general just like normal tables, temporary tables must generate undo. Since undo data must be logged, temporary tables will generate some redo log for the undo they generate.

This is not nearly as ominous as it seems. The primary SQL statements used against temporary tables are INSERTs and SELECTs. Fortunately, INSERTs generate very little undo (you need to restore the block to “nothing,” and it doesn’t take very much room to store “nothing”), and SELECTs generate no undo. Hence, if you use temporary tables for INSERTs and SELECTs exclusively, this section means nothing to you. It is only if you UPDATE or DELETE that you might be concerned about this.

I set up a small test to demonstrate the amount of redo generated while working with temporary tables, an indication therefore of the amount of undo generated for temporary tables, since only the undo is logged for them. To demonstrate, I’ll take identically configured permanent and temporary tables, and then perform the same operations on each, measuring the amount of redo generated each time. The tables I’ll use are as follows:

EODA@ORA11GR2> create table perm
2 ( x char(2000) ,
3 y char(2000) ,
4 z char(2000) )
5 /
Table created.

EODA@ORA11GR2> create global temporary table temp
2 ( x char(2000) ,
3 y char(2000) ,
4 z char(2000) )
5 on commit preserve rows
6 /
Table created.

I set up a small stored procedure to allow me to perform arbitrary SQL and report the amount of redo generated by that SQL. I’ll use this routine to perform INSERTs, UPDATEs, and DELETEs against both the temporary and permanent tables:

EODA@ORA11GR2> create or replace procedure do_sql( p_sql in varchar2 )
2 as
3 l_start_redo number;
4 l_redo number;
5 begin
6 l_start_redo := get_stat_val( 'redo size' );
7
8 execute immediate p_sql;
9 commit;
10
11 l_redo := get_stat_val( 'redo size' ) - l_start_redo;
12
13 dbms_output.put_line
14 ( to_char(l_redo,'99,999,999') ||' bytes of redo generated for "' ||
15 substr( replace( p_sql, chr(10), ' '), 1, 25 ) || '"...' );
16 end;
17 /
Procedure created.

Then I ran equivalent INSERTs, UPDATEs, and DELETEs against both the PERM and TEMP tables:

EODA@ORA11GR2> set serveroutput on format wrapped
EODA@ORA11GR2> begin
2 do_sql( 'insert into perm
3 select 1,1,1
4 from all_objects
5 where rownum <= 500' );
6
7 do_sql( 'insert into temp
8 select 1,1,1
9 from all_objects
10 where rownum <= 500' );
11 dbms_output.new_line;
12
13 do_sql( 'update perm set x = 2' );
14 do_sql( 'update temp set x = 2' );
15 dbms_output.new_line;
16
17 do_sql( 'delete from perm' );
18 do_sql( 'delete from temp' );
19 end;
20 /

3,313,088 bytes of redo generated for "insert into perm "...
72,584 bytes of redo generated for "insert into temp "...

3,268,384 bytes of redo generated for "update perm set x = 2"...
1,946,432 bytes of redo generated for "update temp set x = 2"...

3,245,112 bytes of redo generated for "delete from perm"...
3,224,460 bytes of redo generated for "delete from temp"...

PL/SQL procedure successfully completed.

As you can see,

· The INSERT into the “real” table generated a lot of redo, while almost no redo was generated for the temporary table. This makes sense—there is very little undo data generated for INSERTs and only undo data is logged for temporary tables.

· The UPDATE of the real table generated about twice the amount of redo as the temporary table. Again, this makes sense. About half of that UPDATE, the “before image,” had to be saved. The “after image” (redo) for the temporary table did not have to be saved.

· The DELETEs each took about the same amount of redo space. This makes sense, because the undo for a DELETE is big, but the redo for the modified blocks is very small. Hence, a DELETE against a temporary table takes place very much in the same fashion as a DELETE against a permanent table.

Image Note If you see the temporary table generating more redo than the permanent table with the INSERT statement, you are observing a product issue in the database that is fixed in at least Oracle 9.2.0.6 and 10.1.0.4 patch releases and above.

Therefore, the following generalizations can be made regarding DML activity on temporary tables:

· An INSERT will generate little to no undo/redo activity.

· An UPDATE will generate about half the redo as with a permanent table.

· A DELETE will generate the same amount of redo as with a permanent table.

There are notable exceptions to the next to last statement. For example, if I UPDATE a column that is entirely NULL with 2,000 bytes of data, there will be very little undo data generated. This UPDATE will behave like the INSERT. On the other hand, if I UPDATE a column with 2,000 bytes of data to be NULL, it will behave like the DELETE as far as redo generation is concerned. On average, you can expect an UPDATE against a temporary table to produce about 50 percent of the undo/redo you’d experience with a permanent table.
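A quick way to observe those exceptions, reusing the do_sql helper from earlier, is sketched below. It assumes the TEMP table has been reloaded with its 500 rows of CHAR(2000) data:

begin
   -- 2,000 bytes to NULL: the before image must be preserved, so it behaves like the DELETE
   do_sql( 'update temp set x = null' );

   -- NULL back to 2,000 bytes: there is almost nothing to preserve, so it behaves like the INSERT
   do_sql( 'update temp set x = rpad(''a'',2000,''a'')' );
end;
/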

In addition, you must consider any indexes in place on your temporary tables. Index modifications will also generate undo—which in turn generates redo. If you rerun the above example with these two indexes in place:

EODA@ORA11GR2> create index perm_idx on perm(x);
Index created.

EODA@ORA11GR2> create index temp_idx on temp(x);
Index created.

You will find redo generated in the order of (for brevity, all of the code from the previous example is not repeated here):

...
19 end;
20 /

11,735,576 bytes of redo generated for "insert into perm "...
3,351,864 bytes of redo generated for "insert into temp "...

9,257,748 bytes of redo generated for "update perm set x = 2"...
5,465,868 bytes of redo generated for "update temp set x = 2"...

4,434,992 bytes of redo generated for "delete from perm"...
4,371,620 bytes of redo generated for "delete from temp"...

PL/SQL procedure successfully completed.

The numbers hold true from what we saw before—but you can see that the index definitely added to the redo generated. The INSERT into the global temporary table went from generating almost no redo to generating 3.3MB of redo. All of this additional redo was related to the undo produced for the index maintenance.

Image Note This is an exaggerated example. The index in question was on a CHAR(2000) column; the index key is much larger than you’ll normally see in real life. Don’t expect this much additional redo typically.

In general, common sense prevails in estimating the amount of redo created. If the operation you perform causes undo data to be created, then determine how easy or hard it will be to reverse (undo) the effect of your operation. If you INSERT 2,000 bytes, the reverse of this is easy. You simply go back to no bytes. If you DELETE 2,000 bytes, the reverse is INSERTing 2,000 bytes. In this case, the redo is substantial.

Armed with this knowledge, you will avoid deleting from temporary tables. You can use TRUNCATE, bearing in mind, of course, that TRUNCATE is DDL that will commit your transaction, and in Oracle9i and before invalidate your cursors. Or just let the temporary tables empty themselves automatically after a COMMIT or when your session terminates. All of these methods generate no undo and, therefore, no redo. You should try to avoid updating a temporary table unless you really have to for some reason. You should use temporary tables mostly as something to be INSERTed into and SELECTed from. In this fashion, you’ll make optimum use of their unique ability to not generate redo.
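For example, instead of DELETEing you might do something like the following sketch (TEMP_WORK is a hypothetical table name; remember that TRUNCATE is DDL and will commit):

-- generates almost no undo (and therefore almost no redo) compared with a DELETE;
-- on a global temporary table it removes only the current session's rows
truncate table temp;

-- or define the table to empty itself at COMMIT in the first place
create global temporary table temp_work ( x int )
on commit delete rows;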

Starting with 12c

As you saw in the previous section, when issuing INSERT, UPDATE, and DELETE statements in a temporary table, the undo for those changes is recorded in the undo tablespace, which in turn will generate redo. With the advent of Oracle 12c, you can instruct Oracle to store the undo for a temporary table in a temporary tablespace via the TEMP_UNDO_ENABLED parameter. When blocks are modified in a temporary tablespace, no redo is generated. Therefore, when TEMP_UNDO_ENABLED is set to TRUE, any DML issued against a temporary table will generate little or no redo.

Image Note By default, TEMP_UNDO_ENABLED is set to FALSE. So unless otherwise configured, temporary tables will generate the same amount of redo in 12c as in prior releases.

The TEMP_UNDO_ENABLED parameter can be set at the session or system level. Here’s an example of setting it to TRUE at the session level:

EODA@ORA12CR1> alter session set temp_undo_enabled=true;

Once enabled for a session, any modifications to data in a temporary table in that session will have their undo logged to the temporary tablespace. Any modifications to permanent tables will still have undo logged to the undo tablespace. To see the impact of this, I’ll rerun the exact same code (from the “Prior to 12c” section) that displays the amount of redo generated when issuing transactions against a permanent table and a temporary table—with the only addition being that TEMP_UNDO_ENABLED is set to TRUE. Here is the output:

3,312,148 bytes of redo generated for "insert into perm "...
376 bytes of redo generated for "insert into temp "...

2,203,788 bytes of redo generated for "update perm set x = 2"...
376 bytes of redo generated for "update temp set x = 2"...

3,243,412 bytes of redo generated for "delete from perm"...
376 bytes of redo generated for "delete from temp"...

The results are dramatic: a trivial amount of redo is generated by the INSERT, UPDATE, and DELETE statements in a temporary table. For environments where you perform large batch operations that transact against temporary tables, you can expect to see a significant reduction in the amount of redo generated.

Image Note You may be wondering why there were 376 bytes of redo generated in the prior example’s output. As processes consume space within the database, Oracle does some internal housekeeping. These changes are recorded in the data dictionary, which in turn generates some redo and undo.
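If you want to verify that temporary undo is actually in use, a hedged sketch (12c and above) is to check the parameter and then look at the V$TEMPUNDOSTAT view, which tracks temporary undo usage in much the same way V$UNDOSTAT tracks regular undo:

show parameter temp_undo_enabled

-- rows appear here only after temporary undo has actually been generated
select * from v$tempundostat;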

Starting with Oracle 12c, in an Oracle Active Data Guard configuration, you can issue DML statements directly on a temporary table that exists in a standby database. We can view the amount of redo generated for a temporary table in a standby database by running the same code (from the “Prior to 12c” section) against a standby database. The only difference is that the statements issuing transactions against permanent tables must be removed (because you cannot issue DML on a permanent table in a standby database). Here is the output showing that 0 bytes of redo are generated:

0 bytes of redo generated for "insert into temp "...
0 bytes of redo generated for "update temp set x = 2"...
0 bytes of redo generated for "delete from temp"...

Image Note There’s no need to set TEMP_UNDO_ENABLED in the standby database. This is because temporary undo is always enabled in an Oracle Active Data Guard standby database.

Global temporary tables are often used for reporting purposes—like generating and storing intermediate query results. Oracle Active Data Guard is often used to offload reporting applications to the standby database. Couple global temporary tables with Oracle Active Data Guard, and you have a more powerful tool to address your reporting requirements.

Investigating Undo

We’ve already discussed a lot of undo segment topics. We’ve seen how they are used during recovery, how they interact with the redo logs, and how they are used for consistent, nonblocking reads of data. In this section, we’ll look at the most frequently raised issues with undo segments.

The bulk of our time will be spent on the infamous ORA-01555: snapshot too old error, as this single issue causes more confusion than any other topic in the entire set of database topics. Before we do this, however, we’ll investigate one other undo-related issue: the question of what type of DML operation generates the most and least undo (you might already be able to answer that yourself, given the preceding examples with temporary tables).

What Generates the Most and Least Undo?

This is a frequently asked but easily answered question. The presence of indexes (or the fact that a table is an index-organized table) may affect the amount of undo generated dramatically, as indexes are complex data structures and may generate copious amounts of undo information.

That said, an INSERT will, in general, generate the least amount of undo, since all Oracle needs to record for this is a rowid to “delete.” An UPDATE is typically second in the race (in most cases). All that needs to be recorded are the changed bytes. It is most common that you UPDATE some small fraction of the entire row’s data. Therefore, a small fraction of the row must be remembered in the undo. Many of the previous examples run counter to this rule of thumb, but that’s because they update large, fixed-sized rows and they update the entire row. It is much more common to UPDATE a row and change a small percentage of the total row. A DELETE will, in general, generate the most undo. For a DELETE, Oracle must record the entire row’s before image into the undo segment. The previous temporary table example, with regard to redo generation, demonstrated that fact: the DELETE generated the most redo, and since the only logged element of the DML operation on a temporary table is the undo, we in fact observed that the DELETE generated the most undo. The INSERT generated very little undo that needed to be logged. The UPDATE generated an amount equal to the before image of the data that was changed, and the DELETE generated the entire set of data written to the undo segment.

As previously mentioned, you must also take into consideration the work performed on an index. You’ll find that an update of an unindexed column not only executes much faster, it also tends to generate significantly less undo than an update of an indexed column. For example, we’ll create a table with two columns, both containing the same information, and index one of them:

EODA@ORA12CR1> create table t
2 as
3 select object_name unindexed,
4 object_name indexed
5 from all_objects
6 /
Table created.

EODA@ORA12CR1> create index t_idx on t(indexed);
Index created.

EODA@ORA12CR1> exec dbms_stats.gather_table_stats(user,'T');
PL/SQL procedure successfully completed.

Now we’ll update the table, first updating the unindexed column and then the indexed column. We’ll need a new V$ query to measure the amount of undo we’ve generated in each case. The following query accomplishes this for us. It works by getting our session ID (SID) from V$MYSTAT, using that to find our record in the V$SESSION view, and retrieving the transaction address (TADDR). It uses the TADDR to pull up our V$TRANSACTION record (if any) and selects the USED_UBLK column—the number of used undo blocks. Since we currently are not in a transaction, we expect it to return zero rows right now:

EODA@ORA12CR1> select used_ublk
2 from v$transaction
3 where addr = (select taddr
4 from v$session
5 where sid = (select sid
6 from v$mystat
7 where rownum = 1
8 )
9 )
10 /
no rows selected

But the query will return a row after the UPDATE starts a transaction:

EODA@ORA12CR1> update t set unindexed = lower(unindexed);
72077 rows updated.

EODA@ORA12CR1> select used_ublk
2 from v$transaction
3 where addr = (select taddr
4 from v$session
5 where sid = (select sid
6 from v$mystat
7 where rownum = 1
8 )
9 )
10 /

USED_UBLK
----------
151

EODA@ORA12CR1> commit;
Commit complete.

That UPDATE used 151 blocks to store its undo. The commit would free that up, or release it, so if we rerun the query against V$TRANSACTION, it would once again show us no rows selected. When we update the same data—only indexed this time—we’ll observe the following:

EODA@ORA12CR1> update t set indexed = lower(indexed);
72077 rows updated.

EODA@ORA12CR1> select used_ublk
2 from v$transaction
3 where addr = (select taddr
4 from v$session
5 where sid = (select sid
6 from v$mystat
7 where rownum = 1
8 )
9 )
10 /

USED_UBLK
----------
854

As you can see, updating that indexed column in this example generated several times as much undo. This is due to the inherent complexity of the index structure itself and the fact that we updated every single row in the table—moving every single index key value in this structure.

ORA-01555: Snapshot Too Old Error

In the last chapter, we briefly investigated the ORA-01555 error and looked at one cause of it: committing too frequently. Here we’ll take a much more detailed look at the causes and solutions for the ORA-01555 error. ORA-01555 is one of those errors that confound people. It is the foundation for many myths, inaccuracies, and suppositions.

Image Note ORA-01555 is not related to data corruption or data loss at all. It is a “safe” error in that regard; the only outcome is that the query that received this error is unable to continue processing.

The error is actually straightforward and has only two real causes, but since there’s a special case of one of them that happens so frequently, I’ll say that there are three:

· The undo segments are too small for the work you perform on your system.

· Your programs fetch across COMMITs (actually a variation on the preceding point). We covered this in Chapter 8.

· Block cleanout.

The first two points are directly related to Oracle’s read-consistency model. As you recall from Chapter 7, the results of your query are preordained, meaning they are well-defined before Oracle goes to retrieve even the first row. Oracle provides this consistent point in time “snapshot” of the database by using the undo segments to roll back blocks that have changed since your query began. Every statement you execute, such as the following:

update t set x = 5 where x = 2;
insert into t select * from t where x = 2;
delete from t where x = 2;
select * from t where x = 2;

will see a read-consistent view of T and the set of rows where X=2, regardless of any other concurrent activity in the database.

Image Note The four statements presented here are just examples of the types of statements that would see a read-consistent view of T. They are not meant to be run as a single transaction in the database, as the first update would cause the following three statements to see no records. They are purely illustrative.

All statements that “read” the table take advantage of this read consistency. In the example just shown, the UPDATE reads the table to find rows where x=2 (and then UPDATEs them). The INSERT reads the table to find rows where X=2, and then INSERTs them, and so on. It is this dual use of the undo segments, both to roll back failed transactions and to provide for read consistency that results in the ORA-01555 error.

The third item in the previous list is a more insidious cause of ORA-01555 in that it can happen in a database where there is a single session, and this session is not modifying the tables that are being queried when the ORA-01555 error is raised! This doesn’t seem possible—why would we need undo data for a table we can guarantee is not being modified? We’ll find out shortly.

Before we take a look at all three cases with illustrations, I’d like to share with you the solutions to the ORA-01555 error, in general:

· Set the parameter UNDO_RETENTION properly (larger than the amount of time it takes to execute your longest-running transaction). V$UNDOSTAT can be used to determine the duration of your long-running queries. Also, ensure sufficient space on disk has been set aside so the undo segments are allowed to grow to the size they need to be based on the requested UNDO_RETENTION.

· Increase the size of or add more undo segments when using manual undo management. This decreases the likelihood of undo data being overwritten during the course of your long-running query. This method goes toward solving all three of the previous points. Note that this is definitely not the preferred method; automatic undo management is highly recommended.

· Reduce the runtime of your query (tune it). This is always a good thing if possible, so it might be the first thing you try. It reduces the need for larger undo segments. This method goes toward solving all three of the previous points.

· Gather statistics on related objects. This helps avoid the third point listed earlier. Since the block cleanout is the result of a very large mass UPDATE or INSERT, statistics-gathering needs to be done anyway after a mass UPDATE or large load.

We’ll come back to these solutions, as they are important to know. It seemed appropriate to display them prominently before we begin.
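As a starting point for the first solution, here is a hedged sketch of checking your longest-running query and adjusting the retention accordingly (the three-hour value is purely illustrative):

-- MAXQUERYLEN is the longest-running query, in seconds, observed per collection interval
select max(maxquerylen) longest_query_seconds
  from v$undostat;

-- set the retention at least that high (the value is in seconds; 10800 = 3 hours)
alter system set undo_retention = 10800;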

Undo Segments Are in Fact Too Small

The scenario is this: you have a system where the transactions are small. As a result, you need very little undo segment space allocated. Say, for example, the following is true:

· Each transaction generates 8KB of undo on average.

· You do five of these transactions per second on average (40KB of undo per second, 2,400KB per minute).

· You have a transaction that generates 1MB of undo that occurs once per minute on average. In total, you generate about 3.5MB of undo per minute.

· You have 15MB of undo configured for the system.

That is more than sufficient undo for this database when processing transactions. The undo segments will wrap around and reuse space about every three to four minutes or so, on average. If you sized undo segments based on your transactions that do modifications, you did all right.

In this same environment, however, you have some reporting needs. Some of these queries take a really long time to run—five minutes, perhaps. Here is where the problem comes in. If these queries take five minutes to execute and they need a view of the data as it existed when the query began, you have a very good probability of the ORA-01555 error occurring. Since your undo segments will wrap during this query execution, you know that some undo information generated since your query began is gone—it has been overwritten. If you hit a block that was modified near the time you started your query, the undo information for this block will be missing, and you will receive the ORA-01555 error.

Here’s a small example. Let’s say we have a table with blocks 1, 2, 3, . . . 1,000,000 in it. Table 9-2 shows a sequence of events that could occur.

Table 9-2. Long-Running Query Timeline

Time (Minutes:Seconds)  Action

0:00  Our query begins.

0:01  Another session UPDATEs block 1,000,000. Undo information for this is recorded into some undo segment.

0:01  This UPDATE session COMMITs. The undo data it generated is still there, but is now subject to being overwritten if we need the space.

1:00  Our query is still chugging along. It is at block 200,000.

1:01  Lots of activity going on. We have generated a little over 14MB of undo by now.

3:00  Our query is still going strong. We are at block 600,000 or so by now.

4:00  Our undo segments start to wrap around and reuse the space that was active when our query began at time 0:00. Specifically, we have just reused the undo segment space that the UPDATE to block 1,000,000 used back at time 0:01.

5:00  Our query finally gets to block 1,000,000. It finds it has been modified since the query began. It goes to the undo segment and attempts to find the undo for that block to get a consistent read on it. At this point, it discovers the information it needs no longer exists. ORA-01555 is raised and the query fails.

This is all it takes. If your undo segments are sized such that they have a good chance of being reused during the execution of your queries, and your queries access data that will probably be modified, you stand a very good chance of hitting the ORA-01555 error on a recurring basis. If this is the case, you must set your UNDO_RETENTION parameter higher and let Oracle take care of figuring out how much undo to retain (this is the suggested approach; it’s much easier than trying to figure out the perfect undo size yourself) or resize your undo segments and make them larger (or have more of them). You need enough undo configured to last as long as your long-running queries. The system was sized for the transactions that modify data—you forgot to size for the other components of the system.

With Oracle9i and above, there are two methods to manage undo in the system:

· Automatic undo management: Here, Oracle is told how long to retain undo for, via the UNDO_RETENTION parameter. Oracle will determine how many undo segments to create based on concurrent workload and how big each should be. The database can even reallocate extents between individual undo segments at runtime to meet the UNDO_RETENTION goal set by the DBA. This is the recommended approach for undo management.

· Manual undo management: Here, the DBA does the work. The DBA determines how many undo segments to manually create, based on the estimated or observed workload. The DBA determines how big the segments should be based on transaction volume (how much undo is generated) and the length of the long-running queries.

Manual undo management, where a DBA figures out how many undo segments to have and how big each should be, is where one of the points of confusion comes into play. People say, “Well, we have XMB of undo configured, but this can grow. We have MAXEXTENTS set at 500 and each extent is 1MB, so the undo can get quite large.” The problem is that the manually managed undo segments will never grow due to a query; they will grow only due to INSERTs, UPDATEs, and DELETEs. The fact that a long-running query is executing does not cause Oracle to grow a manual undo segment to retain the data in case it might need it. Only a long-running UPDATE transaction would do this. In the preceding example, even if the manual undo segments had the potential to grow, they will not. What you need to do for this system is have manual undo segments that are already big. You need to permanently allocate space to the undo segments, not give them the opportunity to grow on their own.

The only solutions to this problem are to either make it so that the manual undo segments are sized so that they wrap only every six to ten minutes, or make it so your queries never take more than two to three minutes to execute. The first suggestion is based on the fact that you have queries that take five minutes to execute. In this case, the DBA needs to make the amount of permanently allocated undo two to three times larger. The second (perfectly valid) suggestion is equally appropriate. Any time you can make the queries go faster, you should. If the undo generated since the time your query began is never overwritten, you’ll avoid ORA-01555.

Under automatic undo management, things are much easier from the ORA-01555 perspective. Rather than having to figure out how big the undo space needs to be and then preallocating it, the DBA tells the database how long the longest-running query is and sets that value in the UNDO_RETENTION parameter. Oracle will attempt to preserve undo for at least that duration of time. If sufficient space to grow has been allocated, Oracle will extend an undo segment and not wrap around—in trying to obey the UNDO_RETENTION period. This is in direct contrast to manually managed undo, which will wrap around and reuse undo space as soon as it can. It is primarily for this reason, the support of the UNDO_RETENTION parameter, that I highly recommend automatic undo management whenever possible. That single parameter reduces the possibility of an ORA-01555 error greatly (when it is set appropriately).

When using manual undo management, it is also important to remember that the probability of an ORA-01555 error is dictated by the smallest undo segment in your system, not the largest and not the average. Adding one “big” undo segment will not make this problem go away. It only takes the smallest undo segment to wrap around while a query is processing, and this query stands a chance of an ORA-01555 error. This is why I was a big fan of equi-sized rollback segments when using the legacy rollback segments. In this fashion, each undo segment is both the smallest and the largest. This is also why I avoid using “optimally” sized undo segments. If you shrink an undo segment that was forced to grow, you are throwing away a lot of undo that may be needed right after that. It discards the oldest undo data when it does this, minimizing the risk, but still the risk is there. I prefer to manually shrink undo segments during off-peak times if at all.

I am getting a little too deep into the DBA role at this point, so we’ll move on to the next case. It’s just important that you understand that the ORA-01555 error in this case is due to the system not being sized correctly for your workload. The only solution is to size correctly for your workload. It is not your fault, but it is your problem since you hit it. It’s the same as if you run out of temporary space during a query. You either configure sufficient temporary space for the system, or you rewrite the queries so they use a plan that does not require temporary space.

To demonstrate this effect, we can set up a small, but somewhat artificial test. We’ll create a very small undo tablespace with one session that will generate many small transactions, virtually assuring us that it will wrap around and reuse its allocated space many times—regardless of the UNDO_RETENTION setting, since we are not permitting the undo tablespace to grow. The session that uses this undo segment will be modifying a table, T. It will use a full scan of T and read it from “top” to “bottom.” In another session, we will execute a query that will read the table T via an index. In this fashion, it will read the table somewhat randomly: it will read row 1, then row 1,000, then row 500, then row 20,001, and so on. In this way, we will tend to visit blocks very randomly and perhaps many times during the processing of our query. The odds of getting an ORA-01555 error in this case are virtually 100 percent. So, in one session we start with the following:

EODA@ORA12CR1> create undo tablespace undo_small
2 datafile '/tmp/undo.dbf' size 2m
3 autoextend off
4 /
Tablespace created.

EODA@ORA12CR1> alter system set undo_tablespace = undo_small;
System altered.

Now, we’ll set up the table T to query and modify. Note that we are ordering the data randomly in this table. The CREATE TABLE AS SELECT tends to put the rows in the blocks in the order it fetches them from the query. We’ll just scramble the rows up so they are not artificially sorted in any order, randomizing their distribution:

EODA@ORA12CR1> drop table t purge;
Table dropped.

EODA@ORA12CR1> create table t
2 as
3 select *
4 from all_objects
5 order by dbms_random.random;
Table created.

EODA@ORA12CR1> alter table t add constraint t_pk primary key(object_id);
Table altered.

EODA@ORA12CR1> exec dbms_stats.gather_table_stats( user, 'T', cascade=> true );
PL/SQL procedure successfully completed.

And now we are ready to do our modifications:

EODA@ORA12CR1> begin
2 for x in ( select rowid rid from t )
3 loop
4 update t set object_name = lower(object_name) where rowid = x.rid;
5 commit;
6 end loop;
7 end;
8 /

Now, while that PL/SQL block of code is running, we will run a query in another session. That other query will read table T and process each record. It will spend about 1/100 of a second processing each record before fetching the next (simulated using DBMS_LOCK.SLEEP(0.01)). We will use the FIRST_ROWS hint in the query to have it use the index we created to read the rows out of the table via the index sorted by OBJECT_ID. Since the data was randomly inserted into the table, we would tend to query blocks in the table rather randomly. This block will only run for a couple of seconds before failing:

EODA@ORA12CR1> declare
2 cursor c is
3 select /*+ first_rows */ object_name
4 from t
5 order by object_id;
6
7 l_object_name t.object_name%type;
8 l_rowcnt number := 0;
9 begin
10 open c;
11 loop
12 fetch c into l_object_name;
13 exit when c%notfound;
14 dbms_lock.sleep( 0.01 );
15 l_rowcnt := l_rowcnt+1;
16 end loop;
17 close c;
18 exception
19 when others then
20 dbms_output.put_line( 'rows fetched = ' || l_rowcnt );
21 raise;
22 end;
23 /
rows fetched = 159
declare
*
ERROR at line 1:
ORA-01555: snapshot too old: rollback segment number 16 with name
"_SYSSMU16_587457654$" too small
ORA-06512: at line 21

As you can see, it got to process only 159 records before failing with the ORA-01555: snapshot too old error. To correct this, we want to make sure two things are done:

· UNDO_RETENTION is set in the database to be at least long enough for this read process to complete. That will allow the database to grow the undo tablespace to hold sufficient undo for us to complete.

· The undo tablespace is allowed to grow or you manually allocate more disk space to it.

For this example, I have determined my long-running process takes about 720 seconds to complete (I have about 72,000 records in the table, so at 0.01 seconds per row we have 720 seconds). My UNDO_RETENTION is set to 900 (this is in seconds, so the undo retention is about 15 minutes). I altered the undo tablespace’s data file to permit it to grow by 1MB at a time, up to 2GB in size:

EODA@ORA12CR1> alter database
2 datafile '/tmp/undo.dbf'
3 autoextend on
4 next 1m
5 maxsize 2048m;
Database altered.

When I ran the processes concurrently again, both ran to completion. The undo tablespace’s data file grew this time, because it was allowed to and the undo retention I set up said to.

EODA@ORA12CR1> select bytes/1024/1024
2 from dba_data_files
3 where tablespace_name = 'UNDO_SMALL';

BYTES/1024/1024
---------------
21

So, instead of receiving an error, we completed successfully, and the undo grew to be large enough to accommodate our needs. It is true that in this example, getting the error was purely due to the fact that we read the table T via the index and performed random reads all over the table. If we had rapidly full-scanned the table instead, there is a good chance we would not have received the ORA-01555 error in this particular case. This is because both the SELECT and UPDATE would have been full-scanning T, and the SELECT could most likely race ahead of the UPDATE during its scan (the SELECT just has to read, but the UPDATE must read and update and therefore could go slower). By doing the random reads, we increase the probability that the SELECT will need to read a block, which the UPDATE modified and committed many rows ago. This just demonstrates the somewhat insidious nature of the ORA-01555 error. Its occurrence depends on how concurrent sessions access and manipulate the underlying tables.

Delayed Block Cleanout

This cause of the ORA-01555 error is hard to eliminate entirely, but it is rare anyway, as the circumstances under which it occurs do not happen frequently (at least not in Oracle8i and above anymore). We have already discussed the block cleanout mechanism, but to summarize, it is the process whereby the next session to access a block after it has been modified may have to check to see if the transaction that last modified the block is still active. Once the process determines that the transaction is not active, it cleans out the block so that the next session to access it does not have to go through the same process again. To clean out the block, Oracle determines the undo segment used for the previous transaction (from the block’s header) and then determines whether the undo header indicates that the transaction has been committed and, if so, when it committed. This confirmation is accomplished in one of two ways. One way is that Oracle can determine that the transaction committed a long time ago, even though its transaction slot has been overwritten in the undo segment transaction table. The other way is that the COMMIT SCN is still in the transaction table of the undo segment, meaning the transaction committed a short time ago, and its transaction slot hasn’t been overwritten.

To receive the ORA-01555 error from a delayed block cleanout, all of the following conditions must be met:

· A modification is made and COMMITed, and the blocks are not cleaned out automatically (e.g., the transaction modified more blocks than can fit in 10 percent of the SGA block buffer cache).

· These blocks are not touched by another session and will not be touched until our unfortunate query (displayed shortly) hits it.

· A long-running query begins. This query will ultimately read some of those blocks from earlier. This query starts at SCN t1, the read-consistent SCN it must roll data back to in order to achieve read consistency. The transaction entry for the modification transaction is still in the undo segment transaction table when we begin.

· During the query, many commits are made in the system. These transactions don’t touch the blocks in question (if they did, we wouldn’t have the impending problem as they would clean out the old transaction—solving the clean-out issue).

· The transaction tables in the undo segments roll around and reuse slots due to the high degree of COMMITs. Most important, the transaction entry for the original modification transaction is cycled over and reused. In addition, the system has reused undo segment extents, preventing a consistent read on the undo segment header block itself.

· Additionally, the lowest SCN recorded in the undo segment now exceeds t1 (it is higher than the read-consistent SCN of the query), due to the large number of commits.

When our query gets to the block that was modified and committed before it began, it is in trouble. Normally, it would go to the undo segment pointed to by the block and find the status of the transaction that modified it (in other words, it would find the COMMIT SCN of that transaction). If the COMMIT SCN is less than t1, our query can use this block as is. If the COMMIT SCN is greater than t1, our query must roll back that block. The problem, however, is that in this particular case our query is unable to determine whether the COMMIT SCN of the block is greater than or less than t1, so it cannot tell whether it may safely use that block image. The ORA-01555 error then results.
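Conceptually, the decision our query has to make for each such block boils down to something like the following. This is purely illustrative PL/SQL, not Oracle’s actual code; the two variables simply stand in for the query’s read-consistent SCN (t1) and the COMMIT SCN that block cleanout is trying to discover:

declare
    -- purely illustrative values, not real Oracle internals
    -- (set serveroutput on in SQL*Plus to see the output)
    l_query_scn  number := 1000;   -- t1, the query's read-consistent SCN
    l_commit_scn number := null;   -- NULL models "transaction slot overwritten, commit SCN unknown"
begin
    if l_commit_scn is null then
        dbms_output.put_line( 'cannot determine commit SCN -> ORA-01555 snapshot too old' );
    elsif l_commit_scn < l_query_scn then
        dbms_output.put_line( 'committed before t1 -> block image can be used as is' );
    else
        dbms_output.put_line( 'committed after t1 -> block must be rolled back using undo' );
    end if;
end;
/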

To see this, we will create many blocks in a table that need to be cleaned out. We will then open a cursor on that table and allow many small transactions to take place against some other table—not the table we just updated and opened the cursor on. Finally, we will attempt to fetch the data for the cursor. Now, we know that the data required by the cursor will be “OK”—we should be able to see all of it since the modifications to the table would have taken place and been committed before we open the cursor. When we get an ORA-01555 error this time, it will be because of the previously described issue with delayed block cleanout. To set up for this example, we’ll use

· The 4MB UNDO_SMALL undo tablespace.

· A 16MB buffer cache, which is enough to hold about 2,000 blocks. This is so we can get some dirty blocks flushed to disk to observe this phenomenon (one way to configure a cache of roughly that size is sketched just after this list).
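I won’t show my exact memory setup here, but one way to end up with roughly a 16MB buffer cache in a test instance is sketched below. Note that if SGA_TARGET or MEMORY_TARGET is in use, DB_CACHE_SIZE acts as a minimum rather than a fixed size:

alter system set db_cache_size = 16m scope=spfile;
-- restart the instance so the new buffer cache size takes effect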

Before we start, we’ll create the undo tablespace and the “big” table we’ll be querying:

EODA@ORA12CR1> create undo tablespace undo_small
2 datafile '/tmp/undo.dbf' size 4m
3 autoextend off
4 /
Tablespace created.

EODA@ORA12CR1> create table big
2 as
3 select a.*, rpad('*',1000,'*') data
4 from all_objects a;
Table created.

EODA@ORA12CR1> alter table big add constraint big_pk
2 primary key(object_id);
Table altered.

EODA@ORA12CR1> exec dbms_stats.gather_table_stats( user, 'BIG' );
PL/SQL procedure successfully completed.

Image Note You might wonder why I didn’t use CASCADE=>TRUE on the gather-statistics call to gather statistics on the index created by default by the primary key constraint. That is because, since Oracle 10g, a CREATE INDEX or ALTER INDEX REBUILD implicitly computes statistics whenever the table being indexed is not empty. So, the very act of creating the index has the side effect of gathering statistics on itself. There’s no need to regather the statistics we already have.
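Before moving on, if you would like to confirm just how many blocks the BIG table occupies, and how few rows land on each block because of that 1,000-character pad column, a query along these lines is a quick sanity check (a sketch using the DBMS_ROWID package):

select count(distinct dbms_rowid.rowid_block_number(rowid)) blocks,
       count(*) total_rows,
       round( count(*) / count(distinct dbms_rowid.rowid_block_number(rowid)), 1 ) rows_per_block
  from big;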

The previous table will have lots of blocks as we get about six or seven rows per block using that big data field, and my ALL_OBJECTS table has over 70,000 rows. Next, we’ll create the small table the many little transactions will modify:

EODA@ORA12CR1> create table small ( x int, y char(500) );
Table created.

EODA@ORA12CR1> insert into small select rownum, 'x' from all_users;
25 rows created.

EODA@ORA12CR1> commit;
Commit complete.

EODA@ORA12CR1> exec dbms_stats.gather_table_stats( user, 'SMALL' );
PL/SQL procedure successfully completed.

Now, we’ll dirty up that big table. We have a very small undo tablespace, so we want to update as many blocks of this big table as possible while generating the least amount of undo possible. To do that, and to avoid exceeding the capacity of our undo tablespace, we’ll craft an UPDATE statement that touches just the “first” row on each block. The subquery below returns one rowid per database block, identifying a single row on it; the ROW_NUMBER() built-in analytic function is instrumental here, assigning the number 1 to the “first” row in each block. We then update that one row, setting a small VARCHAR2(1) column (TEMPORARY) to itself. This lets us dirty every block in the table (3,064 of them in the run shown below), flooding the buffer cache with far more dirty blocks than the roughly 2,000 it can hold, so many of them will have to be written out to disk. We’ll also make sure we are using that small undo tablespace:

EODA@ORA12CR1> alter system set undo_tablespace = undo_small;
System altered.

EODA@ORA12CR1> update big
2 set temporary = temporary
3 where rowid in
4 (
5 select r
6 from (
7 select rowid r, row_number() over
8 (partition by dbms_rowid.rowid_block_number(rowid) order by rowid) rn
9 from big
10 )
11 where rn = 1
12 )
13 /
3064 rows updated.

EODA@ORA12CR1> commit;
Commit complete.
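At this point, if you want to convince yourself that the 16MB cache could not possibly be holding all of the blocks we just modified, a rough check is to count how many of BIG’s buffers are still sitting dirty in the cache. The following is only a sketch (it assumes you can query V$BH and USER_OBJECTS; the exact counts will vary from run to run):

select count(*) dirty_big_buffers
  from v$bh
 where objd = ( select data_object_id
                  from user_objects
                 where object_name = 'BIG' )
   and dirty = 'Y';

If that count comes back well below the 3,064 blocks we just updated, the remaining modified blocks must already have been written out to make room.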

OK, so now we know that we have lots of modified blocks, many of which have already been written to disk because the cache simply didn’t have room to hold them all. Next, we will open a cursor, but it won’t yet fetch a single row. Remember, when we open the cursor, the resultset is preordained: even though Oracle has not actually processed a row of data, the act of opening that resultset fixes the point in time the results must be “as of.” Since we’ll be fetching the data we just updated and committed, and we know no one else is modifying it, we should be able to retrieve the rows without needing any undo at all. But that’s where delayed block cleanout rears its head. The transaction that modified these blocks is so recent that Oracle will be obliged to verify that it committed before our query began, and if we overwrite that information (which is also stored in the undo tablespace), the query will fail. So, here is the opening of the cursor:

EODA@ORA12CR1> variable x refcursor
EODA@ORA12CR1> exec open :x for select * from big where object_id < 100;
PL/SQL procedure successfully completed.

EODA@ORA12CR1>
EODA@ORA12CR1> !./run.sh

run.sh is a shell script; it simply fired off nine SQL*Plus sessions in the background and then waited for them to finish, along these lines:

$ORACLE_HOME/bin/sqlplus eoda/foo @test2 1 &
$ORACLE_HOME/bin/sqlplus eoda/foo @test2 2 &
... (3-8 would go here )...
$ORACLE_HOME/bin/sqlplus eoda/foo @test2 9 &
wait   # wait here until all nine background sessions have finished

where each SQL*Plus session was passed a different number (the first was passed 1, the next 2, and so on up to 9). In the prior script, replace eoda/foo with a username and password valid for your environment. The script test2.sql that each session ran is as follows:

begin
for i in 1 .. 5000
loop
update small set y = i where x= &1;
commit;
end loop;
end;
/
exit

Image Note All of the scripts used in this example are available for download from the Apress web site for this book. In the ch09 folder, the demo11.sql script automates this example.

So, we had nine sessions, each in a tight loop, initiating many small transactions. The run.sh script waited for those nine SQL*Plus sessions to complete their work, and then control returned to our session, the one with the open cursor. Upon attempting to print it out, we observe the following:

EODA@ORA12CR1> print x
ERROR:
ORA-01555: snapshot too old: rollback segment number 17 with name
"_SYSSMU17_452567810$" too small
no rows selected

As I said, the preceding is a rare case. It took a lot of conditions, all of which must exist simultaneously, for it to occur. We needed blocks in need of a cleanout to exist, and such blocks are rare in Oracle8i and above. A DBMS_STATS call to collect statistics gets rid of them, so the most common causes (large mass updates and bulk loads) should not be a concern, since the tables need to be analyzed after such operations anyway. Most transactions tend to touch less than 10 percent of the blocks in the buffer cache; hence, they do not generate blocks that need to be cleaned out. If you believe you’ve encountered this issue, in which a SELECT against a table that has no other DML applied to it is raising the ORA-01555 error, try the following solutions:

· Ensure you are using “right-sized” transactions in the first place. Make sure you are not committing more frequently than you should.

· Use DBMS_STATS to scan the related objects, cleaning them out after the load. Since the block cleanout is the result of a very large mass UPDATE or INSERT, this needs to be done anyway.

· Allow the undo tablespace to grow by giving it the room to extend and increasing the undo retention. This decreases the likelihood of an undo segment transaction table slot being overwritten during the course of your long-running query. This is the same as the solution for the other cause of an ORA-01555 error (the two are very much related; you experience undo segment reuse during the processing of your query). In fact, I reran the preceding example with the undo tablespace set to autoextend 1MB at a time, with an undo retention of 900 seconds, and the query against the table BIG completed successfully. A sketch of those two settings follows this list.

· Reduce the runtime of your query—tune it. This is always good if possible, so it might be the first thing you try.
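The two settings mentioned in the third bullet amount to something like the following sketch. The data file name is the one used in this example, the 1MB increment and 900-second retention are simply the values from my rerun, and the MAXSIZE cap is an arbitrary choice you would size for your own system:

alter database datafile '/tmp/undo.dbf'
  autoextend on next 1m maxsize 2g;   -- maxsize is an arbitrary cap for illustration

alter system set undo_retention = 900;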

Summary

In this chapter, we explored redo and undo and took a look at what they mean to the developer. I’ve mostly presented here situations or conditions you should be on the lookout for, since it is actually the DBAs or SAs who must correct these issues. The key point to take away from this chapter is the importance of redo and undo, and the fact that they are not overhead—they are integral components of the database and are necessary and mandatory. Once you have a good understanding of how they work and what they do, you’ll be able to make better use of them. Understanding that you are not “saving” anything by committing more frequently than you should (you are actually wasting resources, as it takes more CPU, more disk, and more programming) is probably the most important point of all. Be aware of what the database needs to do, and then let the database do it.