Chapter 15. Data Loading and Unloading

In this chapter, we will discuss data loading and unloading—in other words, how to get data into and out of an Oracle database. The main focus of the chapter is on the following bulk data loading tools:

· External tables: This is a feature of Oracle9i and above that permits access to operating system files as if they were database tables and, in Oracle 10g and above, even allows for the creation of operating system files as extracts of tables.

· SQL*Loader (pronounced “sequel loader” and hereafter referred to as SQLLDR): This is Oracle’s historical data loader that is still a common method for loading data.

In the area of data unloading, we’ll look at the following techniques:

· Data Pump unload: Data Pump is a binary format proprietary to Oracle and accessible via the Data Pump tool and external tables.

· Flat file unload: The flat file unloads will be custom-developed implementations, but they provide you with a result that is portable to other types of systems (even a spreadsheet).

External Tables

External tables were first introduced in Oracle9i Release 1. Put simply, they allow us to treat an operating system file as if it is a read-only database table. They are not intended to be a replacement for a “real” table, or to be used in place of a real table; rather, they are intended to be used as a tool to ease the loading and, in Oracle 10g and above, unloading of data.

When the external tables feature was first unveiled, I often referred to it as “the replacement for SQLLDR.” This idea still holds true—most of the time. Having said this, you might wonder why there is material in this chapter that covers SQLLDR. The reason is that SQLLDR has been around for a long time, and there are many, many legacy control files lying around. SQLLDR is still a commonly used tool; it is what many people know and have used. We are still in a period of transition from the use of SQLLDR to external tables, thus SQLLDR is still very relevant.

What many DBAs don’t realize is that their knowledge of SQLLDR control files is readily transferable to the use of external tables. You’ll discover, as we work through the examples in this part of the chapter, that external tables incorporate much of the SQLLDR syntax and many of the techniques.

Having said that, SQLLDR should be chosen over external tables in the following situations:

· You have to load data over a network—in other words, when the input file is not on the database server itself. One of the restrictions of external tables is that the input file must be accessible on the database server.

· Multiple users must concurrently work with the same external table processing different input files.

With those exceptions in mind, in general I strongly recommend using external tables for their extended capabilities. SQLLDR is a fairly simple tool that generates an INSERT statement and loads data. Its ability to use SQL is limited to calling SQL functions on a row-by-row basis. External tables open the entire SQL set of functionality to data loading. Some of the key functionality features that external tables have over SQLLDR, in my experience, are as follows:

· The ability to use complex WHERE conditions to selectively load data. SQLLDR has a WHEN clause to select rows to load, but you are limited to using only AND expressions and expressions using equality—no ranges (greater than, less than), no OR expressions, no IS NULL, and so on.

· The ability to MERGE data. You can take an operating system file full of data and update existing database records from it (a sketch follows this list).

· The ability to perform efficient code lookups. You can join an external table to other database tables as part of your load process.

· The ability to load data sorted by including an ORDER BY clause in the CREATE TABLE AS SELECT or INSERT statement.

· Easier multitable inserts using INSERT. Starting in Oracle9i, an INSERT statement can insert into one or more tables using complex WHEN conditions. While SQLLDR can load into multiple tables, it can be quite complex to formulate the syntax.

· The ability to specify one or more operating system commands to be executed as the first step (preprocess) when selecting data from an external table.

· A shallower learning curve for new developers. SQLLDR is yet another tool to learn, in addition to the programming language, the development tools, the SQL language, and so on. As long as a developer knows SQL, he or she can immediately apply that knowledge to bulk data loading, without having to learn a new tool (SQLLDR).
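As a quick illustration of the MERGE point, here is a minimal sketch that updates or inserts DEPT rows straight from a file. It assumes the DEPT table and the SYS_SQLLDR_X_EXT_DEPT external table that we'll create later in this section:

merge into dept d
using sys_sqlldr_x_ext_dept x
on (d.deptno = x.deptno)
when matched then update
   set d.dname = x.dname, d.loc = x.loc
when not matched then insert
   (deptno, dname, loc)
   values (x.deptno, x.dname, x.loc);

Nothing here is specific to external tables—once the file is visible as a table, the full SQL language applies to it.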

So, with that in mind, let’s look at how to use external tables.

Setting Up External Tables

There are two simple methods for getting started with external tables:

· Executing SQLLDR with the EXTERNAL_TABLE parameter

· Starting with Oracle 12c, running SQLLDR in express mode

Interestingly, both techniques utilize the SQLLDR command-line tool. They are discussed in the following sections.

Executing SQLLDR with the EXTERNAL_TABLE Parameter

One of the easiest ways to get started with external tables is to use an existing legacy control file to provide the definition of the external table. As a first simple demonstration, a SQLLDR control file (complete details on SQLLDR control files come later in this chapter in the SQLLDR section) is used as follows:

LOAD DATA
INFILE *
INTO TABLE DEPT
FIELDS TERMINATED BY ','
(DEPTNO, DNAME, LOC)
BEGINDATA
10,Sales,Virginia
20,Accounting,Virginia
30,Consulting,Virginia
40,Finance,Virginia

First make sure a DEPT table exists:

EODA@ORA12CR1> create table dept as select * from scott.dept;
Table created.

Now the following SQLLDR command will generate the CREATE TABLE statement for our external table:

$ sqlldr eoda demo1.ctl external_table=generate_only
Password:
SQL*Loader: Release 12.1.0.1.0 - Production on Fri Mar 7 16:28:38 2014
Copyright (c) 1982, 2013, Oracle and/or its affiliates. All rights reserved.
Path used: External Table

The EXTERNAL_TABLE parameter has one of three values:

1. NOT_USED: This is self-evident in meaning and is the default value.

2. EXECUTE: This value means that SQLLDR will not generate a SQL INSERT statement and execute it. Rather, it will create an external table and use a single bulk SQL statement to load it.

3. GENERATE_ONLY: This value causes SQLLDR to not actually load any data, but only to generate the SQL DDL and DML statements it would have executed into the log file it creates.

Note DIRECT=TRUE overrides EXTERNAL_TABLE=GENERATE_ONLY. If you specify DIRECT=TRUE, the data will be loaded and no external table will be generated in Oracle 10g and before. In Oracle 11g Release 1 and above you will receive "SQL*Loader-144: Conflicting load methods: direct=true/external_table=generate_only specified" instead. Not that you would think to do that, but just be aware these two options are incompatible.

When using GENERATE_ONLY, we can see the following in the demo1.log file:

CREATE DIRECTORY statements needed for files
------------------------------------------------------------------------
CREATE DIRECTORY SYS_SQLLDR_XT_TMPDIR_00000 AS '/home/tkyte'

We may or may not see a CREATE DIRECTORY statement in the log file. SQLLDR connects to the database during the external table script generation and queries the data dictionary to see if a suitable directory already exists. In this case, there was no suitable directory in place, so SQLLDR generated a CREATE DIRECTORY statement for us. Next, it generated the CREATE TABLE statement for our external table:

CREATE TABLE statement for external table:
------------------------------------------
CREATE TABLE "SYS_SQLLDR_X_EXT_DEPT"
(
"DEPTNO" NUMBER(2),
"DNAME" VARCHAR2(14),
"LOC" VARCHAR2(13)
)

SQLLDR had logged into the database; that is how it knows the exact datatypes to be used in this external table definition (e.g., that DEPTNO is a NUMBER(2)). It picked them up right from the data dictionary. Next, we see the beginning of the external table definition:

ORGANIZATION external
(
TYPE oracle_loader
DEFAULT DIRECTORY SYS_SQLLDR_XT_TMPDIR_00000

The ORGANIZATION EXTERNAL clause tells Oracle this is not a “normal” table. We saw this clause before in Chapter 10 when we looked at IOTs. Currently there are three organization types: HEAP for a normal table, INDEX for an IOT, and EXTERNAL for an external table. The rest of the text starts to tell Oracle more about the external table. The ORACLE_LOADER type is one of two supported types (in Oracle9i it is the only supported type). The other type is ORACLE_DATAPUMP, the proprietary Data Pump format used by Oracle in Oracle 10g and later. We will take a look at that type in a subsequent section on data unloading—it is a format that can be used to both load and unload data. An external table may be used both to create a Data Pump format file and to subsequently read it.

The very next section we encounter is the ACCESS PARAMETERS section of the external table. Here we describe to the database how to process the input file. As you look at this, you should notice the similarity to a SQLLDR control file; this is no accident. For the most part, SQLLDR and external tables use very similar syntax:

ACCESS PARAMETERS
(
RECORDS DELIMITED BY NEWLINE CHARACTERSET US7ASCII
BADFILE 'SYS_SQLLDR_XT_TMPDIR_00000':'demo1.bad'
LOGFILE 'demo1.log_xt'
READSIZE 1048576
SKIP 6
FIELDS TERMINATED BY "," LDRTRIM
REJECT ROWS WITH ALL NULL FIELDS
(
"DEPTNO" CHAR(255)
TERMINATED BY ",",
"DNAME" CHAR(255)
TERMINATED BY ",",
"LOC" CHAR(255)
TERMINATED BY ","
)
)

These access parameters show how to set up an external table so that it processes files pretty much identically to the way SQLLDR would:

· RECORDS: Records are terminated by newlines by default, as they are for SQLLDR.

· BADFILE: There is a bad file (a file where records that fail processing are recorded to) set up in the directory we just created.

· LOGFILE: There is a log file that is equivalent to SQLLDR’s log file set up in the current working directory.

· READSIZE: This is the default buffer used by Oracle to read the input data file. It is 1MB in this case. This memory comes from the PGA in dedicated server mode and the SGA in shared server mode, and it is used to buffer the information in the input data file for a session (refer to Chapter 4, where we discussed PGA and SGA memory). Keep that shared server fact in mind if you’re using shared servers: the memory is allocated from the SGA.

· SKIP 6: This determines how many records in the input file should be skipped. You might be asking, “Why ‘skip 6’?” Well, we used INFILE * in this example; SKIP 6 is used to skip over the control file itself to get to the embedded data. If we did not use INFILE *, there would be no SKIP clause at all.

· FIELDS TERMINATED BY: This is just as we used in the control file itself. However, the external table did add LDRTRIM, which stands for LoaDeR TRIM. This is a trim mode that emulates the way in which SQLLDR trims data by default. Other options include LRTRIM, LTRIM, and RTRIM (for left/right trimming of whitespace), and NOTRIM to preserve all leading/trailing whitespace.

· REJECT ROWS WITH ALL NULL FIELDS: This causes the external table to log to the bad file any entirely blank lines and to not load that row.

· The column definitions themselves: This is the metadata about the expected input data values. They are all character strings in the data file to be loaded, they can be up to 255 characters in length (SQLLDR’s default size), and they are terminated by a comma and optionally enclosed by quotes.

Note For a comprehensive list of all options available to you when using external tables, review the Oracle Utilities Guide. This reference contains a section dedicated to external tables. The Oracle SQL Language Reference Guide provides the basic syntax, but not the details of the ACCESS PARAMETERS section.

Lastly, we get to the LOCATION section of the external table definition:

location
(
'demo1.ctl'
)
) REJECT LIMIT UNLIMITED

This tells Oracle the name of the file to load, which is demo1.ctl in this case since we used INFILE * in the original control file. The next statement in the log file is the default INSERT that can be used to load the table from the external table itself:

INSERT statements used to load internal tables:
-----------------------------------------------
INSERT /*+ append */ INTO DEPT
(
DEPTNO,
DNAME,
LOC
)
SELECT
"DEPTNO",
"DNAME",
"LOC"
FROM "SYS_SQLLDR_X_EXT_DEPT"

This would perform the logical equivalent of a direct path load if possible (assuming the APPEND hint may be obeyed; the existence of triggers or foreign key constraints may prevent the direct path operation from taking place).

Lastly, in the log file, we’ll see statements that may be used to remove the objects SQLLDR would have us create after the load was complete:

statements to cleanup objects created by previous statements:
-------------------------------------------------------------
DROP TABLE "SYS_SQLLDR_X_EXT_DEPT"
DROP DIRECTORY SYS_SQLLDR_XT_TMPDIR_00000

And that is it. If we take that log file and insert a / where appropriate to make it a valid SQL*Plus script, then we should be ready to go—or not, depending on the permissions in place. For example, assuming the schema I log into has the CREATE ANY DIRECTORY privilege or READ and WRITE access to an existing directory, I might observe the following:

EODA@ORA12CR1> INSERT /*+ append */ INTO DEPT
2 (
3 DEPTNO,
4 DNAME,
5 LOC
6 )
7 SELECT
8 "DEPTNO",
9 "DNAME",
10 "LOC"
11 FROM "SYS_SQLLDR_X_EXT_DEPT"
12 /
INSERT /*+ append */ INTO DEPT
*
ERROR at line 1:
ORA-29913: error in executing ODCIEXTTABLEOPEN callout
ORA-29400: data cartridge error
KUP-04063: unable to open log file demo1.log_xt
OS error Permission denied
ORA-06512: at "SYS.ORACLE_LOADER", line 19
ORA-06512: at line 1

Well, that doesn’t seem right at first. I’m logged into the operating system as TKYTE, the directory I’m logging in to is /home/tkyte, and I own that directory, so I can certainly write to it (I created the SQLLDR log file there, after all). What happened? The fact is that the external table code is running in the Oracle server software in my dedicated or shared server. The process trying to read the input data file is the Oracle software owner, not my account. The process trying to create the log file is the Oracle software owner, not my account. Apparently, Oracle did not have the privilege required to write into my directory, and hence the attempted access of the external table failed. This is an important point. To read a table, the account under which the database is running (the Oracle software owner) must be able to

· Read the file we are pointing to. In UNIX/Linux, that means the Oracle software owner must have read and execute permissions on all directory paths leading to the file. In Windows, the Oracle software owner must be able to read that file.

· Write to the directories where the log file will be written to (or bypass the generation of the log file altogether, but this is not generally recommended; see the sketch after this list). In fact, if the log file already exists, the Oracle software owner must be able to write to the existing file.

· Write to any of the bad files we have specified, just like the log file.
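If you do decide to bypass the log and bad files, the ORACLE_LOADER access parameters NOBADFILE and NOLOGFILE suppress them. A minimal sketch of what would replace the BADFILE and LOGFILE clauses in the generated access parameters shown earlier:

ACCESS PARAMETERS
(
RECORDS DELIMITED BY NEWLINE
NOBADFILE
NOLOGFILE
FIELDS TERMINATED BY ','
)

Of course, with no bad file you lose the record of any rejected rows, which is why bypassing it is generally not recommended.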

Returning to the example, the following command gives Oracle the ability to write into my directory:

EODA@ORA12CR1> host chmod a+rw .

Caution This command actually gives everyone the ability to write into our directory! This is just a demonstration; normally, we would use a special directory—perhaps owned by the Oracle software owner itself—to do this.

Next, I rerun my INSERT statement:

EODA@ORA12CR1> list
1 INSERT /*+ append */ INTO DEPT
2 (
3 DEPTNO,
4 DNAME,
5 LOC
6 )
7 SELECT
8 "DEPTNO",
9 "DNAME",
10 "LOC"
11* FROM "SYS_SQLLDR_X_EXT_DEPT"
EODA@ORA12CR1> /
4 rows created.

EODA@ORA12CR1> host ls -l demo1.log_xt
-rw-r----- 1 oracle dba 687 Mar 8 14:35 demo1.log_xt

You can see that this time I accessed the file, I successfully loaded four rows, and the log file was created and in fact is owned by “oracle,” not by my operating system account.

Running SQLLDR in Express Mode

Starting with Oracle 12c, SQLLDR express mode allows you to quickly load data from a CSV file into a table. If the schema you’re using has the CREATE ANY DIRECTORY privilege, then express mode will attempt to use an external table to load the data. Otherwise it will use SQLLDR in direct path mode.

A simple example will illustrate this. Suppose you have a table created as follows:

EODA@ORA12CR1> create table dept
2 ( deptno number(2) constraint dept_pk primary key,
3 dname varchar2(14),
4 loc varchar2(13)
5 )
6 /

And you have the following data stored in a CSV file named dept.dat:

10,Sales,Virginia
20,Accounting,Virginia
30,Consulting,Virginia
40,Finance,Virginia

First, we’ll run this example when the user does not have the CREATE ANY DIRECTORY privilege. SQLLDR express mode is invoked via the TABLE keyword:

$ sqlldr eoda table=dept
Password:
SQL*Loader: Release 12.1.0.1.0 - Production on Sat Mar 8 14:42:02 2014
Copyright (c) 1982, 2013, Oracle and/or its affiliates. All rights reserved.
Express Mode Load, Table: DEPT
Path used: External Table, DEGREE_OF_PARALLELISM=AUTO
SQL*Loader-816: error creating temporary directory object SYS_SQLLDR_XT_TMPDIR_00000 for file dept.dat
ORA-01031: insufficient privileges
SQL*Loader-579: switching to direct path for the load
SQL*Loader-583: ignoring trim setting with direct path, using value of LDRTRIM
SQL*Loader-584: ignoring DEGREE_OF_PARALLELISM setting with direct path, using value of NONE
Express Mode Load, Table: DEPT
Path used: Direct
Load completed - logical record count 4.
Table DEPT:
4 Rows successfully loaded.
Check the log file:
dept.log
for more information about the load.

The prior output tells us that the schema did not have the privilege to create a directory object; therefore, SQLLDR used its direct path method of loading the data. Querying the table confirms the data was successfully loaded:

EODA@ORA12CR1> select * from dept;

DEPTNO DNAME LOC
---------- -------------- -------------
10 Sales Virginia
20 Accounting Virginia
30 Consulting Virginia
40 Finance Virginia

Now we grant the CREATE ANY DIRECTORY privilege to the user:

SYS@ORA12CR1> grant create any directory to eoda;
Grant succeeded.

To set this up again, first remove the records from the DEPT table:

EODA@ORA12CR1> truncate table dept;

Now we invoke SQLLDR from the OS command line in express mode:

$ sqlldr eoda table=dept
Password:
SQL*Loader: Release 12.1.0.1.0 - Production on Sat Mar 8 14:45:53 2014
Copyright (c) 1982, 2013, Oracle and/or its affiliates. All rights reserved.
Express Mode Load, Table: DEPT
Path used: External Table, DEGREE_OF_PARALLELISM=AUTO
Table DEPT:
4 Rows successfully loaded.
Check the log files:
dept.log
dept_%p.log_xt
for more information about the load.

Since express mode was able to create a directory object, an external table using dept.dat as its data source is accessed by an INSERT statement to load the data into the DEPT table. After the load is finished, the external table is dropped.

All of the code required to perform the prior steps is generated and recorded in the dept.log file. This file is automatically populated for you when running SQLLDR in express mode. If you only want SQL*Loader to generate the log file and not execute its contents, then specify the EXTERNAL_TABLE=GENERATE_ONLY option. For example:

$ sqlldr eoda table=dept external_table=generate_only

If you inspect the dept.log file, you’ll see the code that was generated. First, there’s a SQLLDR control file (SQLLDR control files are described in detail in the SQLLDR section of this chapter):

OPTIONS(EXTERNAL_TABLE=EXECUTE, TRIM=LRTRIM)
LOAD DATA
INFILE 'dept'
APPEND
INTO TABLE DEPT
FIELDS TERMINATED BY ","
(
DEPTNO,
DNAME,
LOC
)

Next is the SQL that will create an external table:

CREATE TABLE "SYS_SQLLDR_X_EXT_DEPT"
(
"DEPTNO" NUMBER(2),
"DNAME" VARCHAR2(14),
"LOC" VARCHAR2(13)
)
ORGANIZATION external
(
TYPE oracle_loader
DEFAULT DIRECTORY SYS_SQLLDR_XT_TMPDIR_00000
ACCESS PARAMETERS
(
RECORDS DELIMITED BY NEWLINE CHARACTERSET US7ASCII
BADFILE 'SYS_SQLLDR_XT_TMPDIR_00000':'dept_%p.bad'
LOGFILE 'dept_%p.log_xt'
READSIZE 1048576
FIELDS TERMINATED BY "," LRTRIM
REJECT ROWS WITH ALL NULL FIELDS
(
"DEPTNO" CHAR(255),
"DNAME" CHAR(255),
"LOC" CHAR(255)
)
)
location
(
'dept.dat'
)
)REJECT LIMIT UNLIMITED

That is followed by a direct path INSERT statement that can be used to load data from the external table into the regular database table (DEPT in this example):

INSERT /*+ append parallel(auto) */ INTO DEPT
(
DEPTNO,
DNAME,
LOC
)
SELECT
"DEPTNO",
"DNAME",
"LOC"
FROM "SYS_SQLLDR_X_EXT_DEPT"

Lastly, the temporary table and directory object are dropped:

DROP TABLE "SYS_SQLLDR_X_EXT_DEPT"
DROP DIRECTORY SYS_SQLLDR_XT_TMPDIR_00000

The idea being that when you run SQLLDR in express mode with EXTERNAL_TABLE=GENERATE_ONLY, you can use the contents of the log file to run the SQL statements manually from SQL (if so desired).

Tip For full details of all options available with SQLLDR express mode, refer to the Oracle Database Utilities manual.

Dealing with Errors

In a perfect world, there would be no errors. The data in the input file would be perfect, and it would all load correctly. That almost never happens. So, how can we track errors with this process?

The most common method is to use the BADFILE option. Here, Oracle will record all records that failed processing. For example, if our control file contained a record with DEPTNO 'ABC', that record would fail and end up in the bad file because 'ABC' cannot be converted into a number. We’ll demonstrate that in the following example.

First, we add the following as the last line of demo1.ctl (this will add a line of data that cannot be loaded to our input):

ABC,XYZ,Hello

Next, we run the following command to prove that the demo1.bad file does not yet exist:

EODA@ORA12CR1> host ls -l demo1.bad
ls: demo1.bad: No such file or directory

Then we query the external table to display the contents:

EODA@ORA12CR1> select * from SYS_SQLLDR_X_EXT_DEPT;

DEPTNO DNAME LOC
---------- -------------- -------------
10 Sales Virginia
20 Accounting Virginia
30 Consulting Virginia
40 Finance Virginia

Now we will find that the file exists and we can review its contents:

EODA@ORA12CR1> host ls -l demo1.bad
-rw-r----- 1 oracle dba 14 Mar 9 10:38 demo1.bad

EODA@ORA12CR1> host cat demo1.bad
ABC,XYZ,Hello

But how can we programmatically inspect these bad records and the log that is generated? Fortunately, that is easy to do by using yet another external table. Suppose we set up this external table:

EODA@ORA12CR1> create table et_bad
2 ( text1 varchar2(4000) ,
3 text2 varchar2(4000) ,
4 text3 varchar2(4000)
5 )
6 organization external
7 (type oracle_loader
8 default directory SYS_SQLLDR_XT_TMPDIR_00000
9 access parameters
10 (
11 records delimited by newline
12 fields
13 missing field values are null
14 ( text1 position(1:4000),
15 text2 position(4001:8000),
16 text3 position(8001:12000)
17 )
18 )
19 location ('demo1.bad')
20 )
21 /
Table created.

This is just a table that can read any file without failing on a datatype error, as long as the lines in the file consist of fewer than 12,000 characters. If they are longer than 12,000 characters, then we can simply add more text columns to accommodate them.

We can clearly see the rejected records via a simple query:

EODA@ORA12CR1> select * from et_bad;

TEXT1 TEXT2 TEXT3
--------------- --------------- ---------------
ABC,XYZ,Hello

A COUNT(*) could tell us how many records were rejected. Another external table created on the log file associated with this external table could tell us why the record was rejected. We would need to go one step further to make this a repeatable process, however. The reason is that the bad file is not blanked out if there were no errors in our use of the external table. So, if there were some preexisting bad file with data in it and our external table generated no errors, we would be misled into thinking there were errors.

I’ve taken four approaches in the past to resolve this issue:

· Use UTL_FILE and reset the bad file—truncate it, in effect, by simply opening it for write and closing it (see the sketch after this list).

· Use UTL_FILE to rename any preexisting bad files, preserving their contents, but allowing us to create a new one.

· Incorporate the PID into the bad (and log) file names. We’ll demonstrate this later in the “Multiuser Issues” section.

· Manually use operating system commands to resolve issues (like renaming the file, removing it, and so on).
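Here is a minimal sketch of the first two approaches using UTL_FILE, assuming the directory object and bad file name from this example:

declare
    l_file utl_file.file_type;
begin
    -- approach 1: truncate the bad file by opening it for write and closing it
    l_file := utl_file.fopen( 'SYS_SQLLDR_XT_TMPDIR_00000', 'demo1.bad', 'w' );
    utl_file.fclose( l_file );
    -- approach 2 (alternative): rename a preexisting bad file, preserving its contents
    -- utl_file.frename( 'SYS_SQLLDR_XT_TMPDIR_00000', 'demo1.bad',
    --                   'SYS_SQLLDR_XT_TMPDIR_00000', 'demo1.bad.old',
    --                   overwrite => true );
end;
/

Note that opening the file for write creates an empty bad file if one did not already exist, which is harmless for this purpose.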

In that fashion, we’ll be able to tell if the bad records in the bad file were generated by us just recently or if they were left over from some older version of the file itself and are not meaningful.

PROJECT REFERENCED COLUMNS

The COUNT(*) earlier in this section made me think about a feature available in Oracle 10g and above: the ability to optimize external table access by only accessing the fields in the external file that are referenced in the query. That is, if the external table is defined to have 100 number fields, but you select out only one of them, you can direct Oracle to bypass converting the other 99 strings into numbers. It sounds great, but it can cause a different number of rows to be returned from each query. Suppose the external table has 100 lines of data in it. All of the data for column C1 is “valid” and converts to a number. None of the data for column C2 is “valid,” and it does not convert into a number. If you select C1 from that external table, you’ll get 100 rows back. If you select C2 from that external table, you’ll get 0 rows back.

You have to explicitly enable this optimization, and you should think about whether it is safe for you to use or not (only you know enough about your application and its processing to answer the question “Is it safe?”). Using the earlier example with the bad line of data added, we would expect to see the following output upon querying our external table:

EODA@ORA12CR1> select dname from SYS_SQLLDR_X_EXT_DEPT;

DNAME
--------------
Sales
Accounting
Consulting
Finance

EODA@ORA12CR1> select deptno from SYS_SQLLDR_X_EXT_DEPT;

DEPTNO
----------
10
20
30
40

We know the bad record has been logged into the BADFILE. But if we simply ALTER the external table and tell Oracle to only project (process) the referenced columns as follows, we get different numbers of rows from each query:

EODA@ORA12CR1> alter table SYS_SQLLDR_X_EXT_DEPT
2 project column referenced
3 /

Table altered.

EODA@ORA12CR1> select dname from SYS_SQLLDR_X_EXT_DEPT;
DNAME
--------------
Sales
Accounting
Consulting
Finance
XYZ

EODA@ORA12CR1> select deptno from SYS_SQLLDR_X_EXT_DEPT;

DEPTNO
----------
10
20
30
40

The DNAME field was valid for every single record in the input file, but the DEPTNO column was not. If we do not retrieve the DEPTNO column, it does not fail the record—the resultset is materially changed.
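You can confirm how an external table is currently set by querying the data dictionary; the PROPERTY column of USER_EXTERNAL_TABLES reports either ALL (the default) or REFERENCED:

select table_name, property
  from user_external_tables
 where table_name = 'SYS_SQLLDR_X_EXT_DEPT';

After the ALTER above, this query would report REFERENCED for our table.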

Using an External Table to Load Different Files

A common need is to use an external table to load data from differently named files over a period of time. That is, this week we must load file1.dat, and next week it will be file2.dat, and so on. So far, we’ve been loading from a fixed file name, demo1.ctl. What if we subsequently need to load from a second file, demo2.ctl?

Fortunately, that is pretty easy to accommodate. The ALTER TABLE command may be used to repoint the location setting of the external table:

EODA@ORA12CR1> alter table SYS_SQLLDR_X_EXT_DEPT location( 'demo2.ctl' );
Table altered.

And that would pretty much be it—the very next query of that external table would have it accessing the file demo2.ctl.

Multiuser Issues

In the introduction to this section, I described two situations where external tables might not be as useful as SQLLDR. One of them was a specific multiuser issue. We just saw how to change the location of an external table—how to make it read from file 2 instead of file 1 and so on. The problem arises when multiple users each try to concurrently use that external table and have it point to different files for each session.

This cannot be done. The external table will point to a single file (or set of files) at any given time. If I log in and alter the table to point to file 1 and you do the same at about the same time, and then we both query that table, we’ll both be processing the same file.

This issue is generally not one that you should encounter—external tables are not a replacement for database tables; they are a means to load data, and as such you would not use them on a daily basis as part of your application. They are generally a DBA or developer tool used to load information, either as a one-time event or on a recurring basis, as in a data warehouse load. If the DBA has ten files to load into the database using the same external table, she would not do them sequentially—that is, pointing the external table to file 1 and processing it, then file 2 and processing it, and so on. Rather, she would simply point the external table to both files and let the database process both of them:

EODA@ORA12CR1> alter table SYS_SQLLDR_X_EXT_DEPT
2 location( 'file1.dat', 'file2.dat' )
3 /
Table altered.

If parallel processing is required, then the database already has the built-in ability to do this, as demonstrated in the last chapter.
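For example, a minimal sketch using the external table from earlier: marking it with a PARALLEL clause allows queries against it—and therefore loads from it—to be processed by parallel execution servers:

alter table SYS_SQLLDR_X_EXT_DEPT parallel 4;

insert /*+ append */ into dept
select * from SYS_SQLLDR_X_EXT_DEPT;

(For the INSERT itself to run in parallel, you would additionally need to enable parallel DML in your session with ALTER SESSION ENABLE PARALLEL DML.)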

So the only multiuser issue would be if two sessions both tried to alter the location at about the same time (assuming they had the privilege to ALTER the table)—and this is just a possibility to be aware of, not something I believe you’ll actually run into very often.

Another multiuser consideration is that of the bad and log file names. What if you have many sessions concurrently looking at the same external table or using parallel processing (which, in some respects, is a multiuser situation)? It would be nice to be able to segregate these files by session, and fortunately you can do that. You may incorporate the following special strings:

· %p: PID.

· %a: Parallel execution server agent ID. The parallel execution servers have numbers 001, 002, 003, and so on assigned to them.

In this fashion, each session will tend to generate its own bad and log files. For example, let’s say that you used the following BADFILE syntax in the CREATE TABLE statement from earlier:

RECORDS DELIMITED BY NEWLINE CHARACTERSET US7ASCII
BADFILE 'SYS_SQLLDR_XT_TMPDIR_00000':'demo1_%p.bad'
LOGFILE 'demo1.log_xt'

You would expect to find a file named similarly to the following if any records failed loading:

$ ls *.bad
demo1_7108.bad

However, you still might have issues over lengthy periods of time. The PIDs will be reused on most operating systems. So the techniques outlined in dealing with errors may well still be relevant—you’ll need to reset your bad file or rename it if it exists if you determine this to be an issue.

Preprocessing

Preprocessing is an external table feature that allows you to execute one or more operating system commands as the first step when selecting from an external table. This feature was added in Oracle 11g Release 2, but was subsequently back-ported to version 10.2.0.5 as well (so this works in Oracle 10g Release 2 and above). You invoke preprocessing by specifying the PREPROCESSOR clause. The input of this clause can be an OS command or a shell script.

Sometimes the utility of preprocessing isn’t intuitive. Consider the following use cases:

· Dynamically display the output of an OS command (or a combination of commands) by selecting from an external table

· Search for files and filter columns and/or rows before displaying the data

· Process and modify the contents of a file before returning the data

Examples of these scenarios follow.

Monitoring the File System Through SQL

In some of my customer’s databases, the DBAs are using data files with automatic extension, but with many data files sharing the same file system, such as the following and so on:

tablespace A, datafiles /u01/oradata/ts_A_file01.dbf autoextend unlimited
tablespace B, datafiles /u01/oradata/ts_B_file01.dbf autoextend unlimited

The requirement is that all data files must be capable of growing by at least 20 percent of their current size; so if, for example, ts_A_file01.dbf currently is 100GB and ts_B_file01.dbf is 200GB, we must ensure that at least 20GB + 40GB = 60GB is free in the /u01/oradata file system.

The question is: How can we monitor this in a single query inside the database? Right now, we have a complex script gathering free space from the df command in a text file, opening a cursor, calculating the current allocated space from DBA_DATA_FILES, and reading the df data via an external table.

This can be done in a single SQL query. To accomplish this, I’ll need to be able to query disk free (df) interactively—without a complex set of maneuvers such as running a script or redirecting output. I’ll start by making it so that the df output can be queried as if it were in a table. I’ll use the preprocessor directive to achieve this.

To start I’ll create a directory where I can place a small shell script that will produce the df output:

EODA@ORA12CR1> create or replace directory exec_dir as '/orahome/oracle/bin';
Directory created.

To perform this operation safely, we need to grant EXECUTE on the directory object containing the program we want to invoke. This allows us to control precisely which program will be executed by Oracle database and avoid executing a “Trojan horse” program by accident. As a privileged account, I grant the following to the EODA user:

SYS@ORA12CR1> grant execute on directory exec_dir to eoda;

Next, I’ll create a shell script named run_df.bsh in that directory. This script will contain only the following:

#!/bin/bash
/bin/df -Pl

Also, I’ll make the shell script executable:

$ chmod +x run_df.bsh

And the output of that script will look something like this:

EODA@ORA12CR1> !./run_df.bsh
Filesystem 512-blocks Used Available Capacity Mounted on
rpool/ROOT/solaris-1 205406208 4576073 92886960 5% /
...
orapool1/ora01 629145600 382371882 246773718 61% /ora01
orapool2/ora02 629145600 429901328 199244272 69% /ora02
orapool1/ora03 629145600 415189806 213955794 66% /ora03
orapool2/ora04 629145600 343152972 285992628 55% /ora04

Note that in the run_df.bsh script, I used explicit pathnames to run df; I did not rely on the environment and on the path environment variable in particular. This is very important: when coding scripts for external tables—when coding scripts, in general—you always want to use explicit paths to run the program you actually intend to run. You don’t have any real control over the environment this script will run in, so relying on the environment being set a certain way is a recipe for disaster.

So, now that I have the script and the directory, I am ready to create the external table. As you can see from the following code, all I need to do is have the external table skip the first record and then parse each subsequent line, using white space as a delimiter. This is something an external table can do easily, as shown next:

EODA@ORA12CR1>create table df
2 (
3 fsname varchar2(100),
4 blocks number,
5 used number,
6 avail number,
7 capacity varchar2(10),
8 mount varchar2(100)
9 )
10 organization external
11 (
12 type oracle_loader
13 default directory exec_dir
14 access parameters
15 (
16 records delimited
17 by newline
18 preprocessor
19 exec_dir:'run_df.bsh'
20 skip 1
21 fields terminated by
22 whitespace ldrtrim
23 )
24 location
25 (
26 exec_dir:'run_df.bsh'
27 )
28 )
29 /
Table created.

With the df external table created, I can now review the df output easily in a query, as shown next:

EODA@ORA12CR1> select * from df;

FSNAME BLOCKS USED AVAIL CAPACITY MOUNT
-------------------- ---------- ---------- ---------- ---------- ------------------------------
orapool1/ora01 629145600 382371882 246773718 61% /ora01
orapool2/ora02 629145600 429901326 199244274 69% /ora02
orapool1/ora03 629145600 415189808 213955792 66% /ora03
orapool2/ora04 629145600 343152974 285992626 55% /ora04

Tip Hopefully, you see how this approach could work easily for ps, ls, du, and so on—all the UNIX/Linux utilities could easily be considered “tables” now!
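For instance, here is a minimal sketch of “ls as a table.” It assumes the same exec_dir and a hypothetical run_ls.bsh script in that directory that, like run_df.bsh above, ignores its argument and simply runs /bin/ls -l /orahome/oracle/bin (file names containing spaces would need a different field definition):

create table ls_dir
( perms varchar2(11),
  link_cnt number,
  owner varchar2(30),
  grp varchar2(30),
  bytes number,
  mon varchar2(3),
  day varchar2(2),
  time_or_year varchar2(5),
  fname varchar2(255)
)
organization external
( type oracle_loader
  default directory exec_dir
  access parameters
  ( records delimited by newline
    preprocessor exec_dir:'run_ls.bsh'
    skip 1
    fields terminated by whitespace ldrtrim
  )
  location
  ( exec_dir:'run_ls.bsh' )
)
/

The SKIP 1 clause skips the “total” line that ls -l prints first.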

With this data available to me in the df external table, it is easy to start working on the query. All I need to do is join df to DBA_DATA_FILES with a join condition that matches the longest mount point possible to each file name. My single-query solution to the issue in this example is shown in the next bit of code. Here’s what’s happening on some of the lines:

EODA@ORA12CR1> with fs_data
2 as
3 (select /*+ materialize */ *
4 from df
5 )
6 select mount,
7 file_name,
8 bytes/1024/1024 mbytes,
9 tot_bytes/1024/1024 tot_mbytes,
10 avail_bytes/1024/1024 avail_mbytes,
11 case
12 when 0.2 * tot_bytes < avail_bytes
13 then 'OK'
14 else 'Short on disk space'
15 end status
16 from (
17 select file_name, mount, avail_bytes, bytes,
18 sum(bytes) over
19 (partition by mount) tot_bytes
20 from (
21 select a.file_name,
22 b.mount,
23 b.avail*1024 avail_bytes, a.bytes,
24 row_number() over
25 (partition by a.file_name
26 order by length(b.mount) DESC) rn
27 from dba_data_files a,
28 fs_data b
29 where a.file_name
30 like b.mount || '%'
31 )
32 where rn = 1
33 )
34 order by mount, file_name
35 /

MOUNT FILE_NAME MBYTES TOT_MBYTES AVAIL_MBYTES STATUS
------- ---------------------------------------- ---------- ---------- ------------ --------
/ora01 /ora01/dbfile/ORA12CR1/cia_data_01.dbf 1024 93486 240989.959 OK
/ora01 /ora01/dbfile/ORA12CR1/config_tbsp_1_01.dbf 1500 93486 240989.959 OK
...
/ora04 /ora04/dbfile/ORA12CR1/dim_data08.dbf 30720 136202 279289.674 OK
/ora04 /ora04/dbfile/ORA12CR1/dim_data_02.dbf 30720 136202 279289.674 OK
48 rows selected.

On lines 3 and 4, I query the df external table. I purposely use a materialize hint to force the optimizer to load the df data into the equivalent of a global temporary table, because the query would tend to read and reread the external table over and over, and the results of the df table could change as the query runs. This provides the logical equivalent of a consistent read on the df data. Also, if the query plan did involve rereading the external table, I would receive an error message at runtime:

KUP-04108 unable to reread file string

The documentation explains the error:

· Cause: The query that is executing requires that a datasource for an external table be read multiple times. However, the datasource is a sequential device that cannot be reread. Examples of this type of datasource are a tape or a pipe.

· Action: There are a few ways around this problem. One is to rewrite the query so that the external table is referenced only once. Another option is to move the datasource to a rereadable device such as a disk file. A third option is to load the data for the external table into a temporary table and change the query so that it references the temporary table.

On lines 27–30, I join DBA_DATA_FILES to df data with a WHERE clause, using LIKE. This will join every file in DBA_DATA_FILES to every possible mount point in the df output. I know that the goal, however, is to find the “longest” matching mount point, so to accomplish that I assign—on lines 24–26—a ROW_NUMBER to each row. This ROW_NUMBER will be sequentially assigned to each duplicated row in DBA_DATA_FILES, so if the FILE_NAME matches more than one MOUNT, each FILE_NAME occurrence will be assigned a unique, increasing ROW_NUMBER value. This ROW_NUMBER will be assigned after the data is sorted by the length of the MOUNT, from big to small.

Once I have that data, I apply a WHERE clause to save only the first entry for each FILE_NAME value—that predicate is WHERE rn = 1, on line 32. At the same time, I’ve added another column—TOT_MBYTES—on lines 18–19. This will enable me to verify the 20 percent threshold.

The last step is to format and output the data. I print the columns I’m interested in and add a CASE statement on lines 11–15 to verify that 20 percent of the total bytes of storage allocated on a given mount point does not exceed the remaining available bytes of free storage.

So, now you see how to use external tables to query operating system output such as df, ps, find, and ls. Additionally, you can use them to query anything that writes to standard out, including gunzip, sed, and so on.

Reading and Filtering Compressed Files in a Directory Tree

When working with large data loads, it’s common to use compressed files. Normally when working with compressed files as part of the data load processing, you would have an initial step that uncompresses the file and then another step to load it into staging tables. With external tables, you can streamline this by instructing the external table to uncompress the data as it is read from the compressed file.

An example will illustrate this. To set this up, suppose you have multiple zipped files that you want to read and process. And furthermore, assume that the files are in different directories and that you need the process to search through the various levels of directories, find the zipped files, and make the data available by selecting from an external table.

To set this up, we’ll first create three directories:

$ mkdir /tmp/base
$ mkdir /tmp/base/base2a
$ mkdir /tmp/base/base2b

Next, we’ll create three test files, compress them, and place each in a separate directory:

$ echo 'base col1,base col2' | gzip > /tmp/base/filebase.csv.gz
$ echo 'base2a col1,base2a col2' | gzip > /tmp/base/base2a/filebase2a.csv.gz
$ echo 'base2b col1,base2b col2' | gzip > /tmp/base/base2b/filebase2b.csv.gz

Next, two directory objects are created—one that points to the directory that will contain a shell script and another one that points to a base directory that serves as a starting point for searching for files to be processed:

EODA@ORA12CR1> create or replace directory exec_dir as '/orahome/oracle/bin';
Directory created.

EODA@ORA12CR1> create or replace directory data_dir as '/tmp';
Directory created.

Next, a shell script is created named search_dir.bsh. The file is located in the /orahome/oracle/bin directory in this example. The following code is placed within the script:

#!/bin/bash
/usr/bin/find $* -name "*.gz" -exec /bin/zcat {} \; | /usr/bin/cut -f1 -d,

The prior script will search starting in a directory that gets passed into the script; it will search the base directory and all subdirectories underneath it for files with the extension .gz. Then, for each file that is found, zcat is used to view the uncompressed data. Finally, cut displays just the first column of data.

And this makes the shell script executable:

$ chmod +x search_dir.bsh

Now all we need is an external table that uses search_dir.bsh to display data in the compressed files:

EODA@ORA12CR1> create table csv
2 ( col1 varchar2(20)
3 )
4 organization external
5 (
6 type oracle_loader
7 default directory data_dir
8 access parameters
9 (
10 records delimited by newline
11 preprocessor exec_dir:'search_dir.bsh'
12 fields terminated by ',' ldrtrim
13 )
14 location
15 (
16 data_dir:'base'
17 )
18 )
19 /

Table created.

Now notice in the output that only the first column from the compressed files is displayed; this is because the code in search_dir.bsh uses cut to only display the first column:

EODA@ORA12CR1> select * from csv;

COL1
--------------------
base2a col1
base col1
base2b col1

Notice how we can dynamically alter the directory location where we want the search for files to begin:

EODA@ORA12CR1> create or replace directory data_dir as '/tmp/base';
EODA@ORA12CR1> alter table csv location( 'base2a' );

Now when we select from the table, it returns just one record:

COL1
--------------------
base2a col1

Also, we can easily modify the shell script code and have it filter the data based on other criteria, such as filtering rows by searching for a string within the CSV file (such as base2). Create a search_dir2.bsh script to do just that:

#!/bin/bash
/usr/bin/find $* -name "*.gz" -print0 | /usr/bin/xargs -0 -I {} /usr/bin/zgrep "base2" {}

And make it executable:

$ chmod +x search_dir2.bsh

Here a new external table definition allows for two columns in the output:

EODA@ORA12CR1> create table csv2
2 ( col1 varchar2(20)
3 ,col2 varchar2(20)
4 )
5 organization external
6 (
7 type oracle_loader
8 default directory data_dir
9 access parameters
10 (
11 records delimited by newline
12 preprocessor exec_dir:'search_dir2.bsh'
13 fields terminated by ',' ldrtrim
14 )
15 location
16 (
17 data_dir:'base'
18 )
19 )
20 /
Table created.

Select from the external table:

EODA@ORA12CR1> select * from csv2;

You can see from the output that only two rows are returned, the ones that contain the string base2:

COL1 COL2
-------------------- --------------------
base2a col1 base2a col2
base2b col1 base2b col2

FINDING THE LARGEST FILES

When dealing with disk space issues, sometimes it’s handy to display the top N largest files in a directory tree. You can use a preprocessing external table to accomplish this task. First, create two directory objects:

EODA@ORA12CR1> create or replace directory exec_dir as '/orahome/oracle/bin';
EODA@ORA12CR1> create or replace directory data_dir as '/';

Then create a shell script (place it in /orahome/oracle/bin) named flf.bsh with the following code:

#!/bin/bash
/usr/bin/find $1 -ls|/bin/sort -nrk7|/usr/bin/head -10|/bin/awk '{print $11,$7}'

Make flf.bsh executable:

$ chmod +x flf.bsh

Now create an external table with the PREPROCESSOR directive:

create table flf (fname varchar2(200), bytes number)
organization external (
type oracle_loader
default directory exec_dir
access parameters
( records delimited by newline
preprocessor exec_dir:'flf.bsh'
fields terminated by whitespace ldrtrim)
location (data_dir:'u01'));

Now select from the external table to return the ten largest files under the directory (and subdirectories) of /u01:

EODA@ORA12CR1> select * from flf;
FNAME BYTES
---------------------------------------- ----------
/u01/dbfile/ORA12CR1/temp01.dbf 1.0737E+10
...
/u01/app/oracle/unloaddir/big_table.dat 2786618287

Now suppose you want to change the starting point of the directory search from /u01 to /orahome/oracle. Do the following:

EODA@ORA12CR1> create or replace directory data_dir as '/orahome';
EODA@ORA12CR1> alter table flf location(data_dir:'oracle');

Now when selecting from the external table, the search for the largest files starts from the new directory path:

EODA@ORA12CR1> select * from flf;
FNAME BYTES
------------------------------------------------------------ ----------
/orahome/oracle/orainst/12.1.0.2/database1.zip 1652417511
...
/orahome/oracle/orainst/12.1.0.2/database2.zip 1212882524

So this is not a typical use of an external table; rather, it’s an illustration of what’s possible with preprocessing.

Trimming Characters Out of a File

I worked with a DBA who would receive an occasional fire-drill manager e-mail with an attached spreadsheet asking, “Can you load this spreadsheet into the production database?” In this environment, there wasn’t direct network access to the production database; so the DBA didn’t have the option of loading over the network via a tool like SQLLDR. In this situation, the steps for loading data are as follows:

1. Save the spreadsheet on a Windows laptop as a CSV file.

2. Copy the CSV file to a secure server configured specifically for file transfers to the production server, and then copy the CSV file to the production server.

3. Use an OS utility to remove hidden DOS characters that are embedded in the CSV file.

4. As the Oracle user, create an external table based on the CSV file.

5. Use SQL to insert into a production table by selecting from the external table.

Step 3 of the preceding list is what I’ll focus on here because this is where the preprocessor eliminates the need for that step (as for the other steps, the DBA will have to figure something else out long term).

Assume the file to be loaded is called load.csv located in the /tmp directory and contains the following data:

emergency data|load now^M
more data|must load data^M

The ^M characters are carriage-return characters from the Windows environment that should be removed before the data is loaded. To achieve this, we’ll set up a couple of directories and then use an external table with the PREPROCESSOR directive, which invokes a shell script to remove the special ^M characters before the data is retrieved:

EODA@ORA12CR1> create or replace directory data_dir as '/tmp';
Directory created.

EODA@ORA12CR1> create or replace directory exec_dir as '/orahome/oracle/bin';
Directory created.

Now create a shell script named run_sed.bsh that uses sed to remove ^M characters from a file. This example places the run_sed.bsh script in the /orahome/oracle/bin directory:

#!/bin/bash
/bin/sed -e 's/^M//g' $*

Tip Also consider using the dos2unix utility to remove unwanted characters when transferring from Windows/DOS to UNIX/Linux.

The ^M character in the run_sed.bsh script is placed in there by pressing CTRL+V and then CTRL+M (or instead of CTRL+M, you can press the Return key here); you can’t simply type in ^ and then an M. It needs to be the special character ^M.

And this makes the script executable:

$ chmod +x run_sed.bsh
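If you’d rather not embed a literal control character in the script, GNU sed also understands the \r escape for a carriage return, so on Linux the script could equivalently be written as follows (a sketch; verify that your platform’s sed supports \r before relying on it):

#!/bin/bash
# \r matches the carriage-return character in GNU sed
/bin/sed -e 's/\r//g' $*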

Next is the external table definition:

EODA@ORA12CR1> create table csv3
2 ( col1 varchar2(20)
3 ,col2 varchar2(20)
4 )
5 organization external
6 (
7 type oracle_loader
8 default directory data_dir
9 access parameters
10 (
11 records delimited by newline
12 preprocessor exec_dir:'run_sed.bsh'
13 fields terminated by '|' ldrtrim
14 )
15 location
16 (
17 data_dir:'load.csv'
18 )
19 )
20 /
Table created.

Now selecting from the external table:

EODA@ORA12CR1> select * from csv3;

COL1 COL2
-------------------- --------------------
emergency data load now
more data must load data

How do we know the extraneous ^M was removed from each line? Check the length of COL2:

EODA@ORA12CR1> select length(col2) from csv3;

LENGTH(COL2)
------------
8
14

If the hidden ^M characters had not been removed, the length of COL2 would have been at least one byte longer than the character data within the column (not to mention we could have had some surprises with searches and joins).
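If you ever need to see exactly what is in a column, the built-in DUMP function shows its byte-level contents; a stray carriage return would appear as an extra byte with value 13 at the end of each value:

EODA@ORA12CR1> select col2, dump(col2) from csv3;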

Preprocessing Wrap-Up

The prior examples show the power of the external table PREPROCESSOR directive. Sure, you could achieve the same results without preprocessing, but you would have more steps, more code to maintain, and more places where things can break.

Specific examples were chosen to illustrate the power and flexibility of preprocessing. In the first example, the file operated on was a shell script; there was no data file. The output of the OS command was the data returned when selecting from the external table.

In the second example, a base directory was operated on by the preprocessor script. This provided a starting point for a directory tree search of compressed files, on-the-fly uncompressing of the contents, and then additional filtering of columns and/or rows with commands like find, cut, and zgrep.

The third example showed a script operating on a CSV file to remove unwanted characters by using sed before displaying the data. This is a common need when transferring files to and from DOS and UNIX/Linux platforms.

These three examples demonstrate distinctive ways to use the PREPROCESSOR directive. This provides a base knowledge of the possible uses and allows you to build on this to simplify your loading requirements.

External Tables Summary

In this section, we explored external tables. They are a feature available with Oracle9i and later that may for the most part replace SQLLDR. We investigated the quickest way to get going with external tables: the technique of using SQLLDR to convert the control files we have from past experiences. We demonstrated some techniques for detecting and handling errors via the bad files, and we also explored some multiuser issues regarding external tables. Lastly, we demonstrated preprocessing techniques to execute operating system commands as a first step when selecting from an external table.

We are now ready to get into the next section in this chapter, which deals with unloading data from the database.

Data Pump Unload

Oracle9i introduced external tables as a method to read external data into the database. Oracle 10g introduced the ability to go the other direction and use a CREATE TABLE statement to create external data, to unload data from the database. Starting with Oracle 10g, this data is extracted in a proprietary binary format known as Data Pump format, which is the same format used by the EXPDP and IMPDP tools provided by Oracle to move data from database to database.

Using the external table unload is actually quite easy—as easy as a CREATE TABLE AS SELECT statement. To start, we need a DIRECTORY object:

EODA@ORA12CR1> create or replace directory tmp as '/tmp';
Directory created.

Now we are ready to unload data to this directory using a simple SELECT statement, for example:

EODA@ORA12CR1> create table all_objects_unload
2 organization external
3 ( type oracle_datapump
4 default directory TMP
5 location( 'allobjects.dat' )
6 )
7 as
8 select
9 *
10 from all_objects
11 /
Table created.

I purposely chose the ALL_OBJECTS view because it is a quite complex view with lots of joins and predicates. This shows you can use this Data Pump unload technique to extract arbitrary data from your database. We could add predicates or whatever else we wanted to extract a slice of data.
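For example, here is a minimal sketch of extracting just a slice, reusing the TMP directory from above (the column list and predicate are arbitrary, chosen purely for illustration):

create table user_objects_unload
organization external
( type oracle_datapump
  default directory TMP
  location( 'userobjects.dat' )
)
as
select owner, object_name, object_type
  from all_objects
 where owner = user
/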

Note From a security perspective, this does make it rather easy for someone with access to the information to take the information elsewhere. You need to control access to the set of people who have the ability to create DIRECTORY objects and write to them, and who have the necessary access to the physical server to get the unloaded data.

The final step would be to copy allobjects.dat onto another server, perhaps a development machine for testing with, and extract the DDL to re-create this table over there:

EODA@ORA12CR1> select dbms_metadata.get_ddl( 'TABLE', 'ALL_OBJECTS_UNLOAD' ) from dual;

DBMS_METADATA.GET_DDL('TABLE','ALL_OBJECTS_UNLOAD')
---------------------------------------------------

CREATE TABLE "EODA"."ALL_OBJECTS_UNLOAD"
( "OWNER" VARCHAR2(128),
"OBJECT_NAME" VARCHAR2(128),
"SUBOBJECT_NAME" VARCHAR2(128),
"OBJECT_ID" NUMBER,
"DATA_OBJECT_ID" NUMBER,
"OBJECT_TYPE" VARCHAR2(23),
"CREATED" DATE,
"LAST_DDL_TIME" DATE,
"TIMESTAMP" VARCHAR2(19),
"STATUS" VARCHAR2(7),
"TEMPORARY" VARCHAR2(1),
"GENERATED" VARCHAR2(1),
"SECONDARY" VARCHAR2(1),
"NAMESPACE" NUMBER,
"EDITION_NAME" VARCHAR2(128),
"SHARING" VARCHAR2(13),
"EDITIONABLE" VARCHAR2(1),
"ORACLE_MAINTAINED" VARCHAR2(1)
)
ORGANIZATION EXTERNAL
( TYPE ORACLE_DATAPUMP
DEFAULT DIRECTORY "TMP"
LOCATION
( 'allobjects.dat'
)
)

This makes it rather easy to load this extract on another database, as it would simply be the following, and you are done—the data is loaded:

EODA@ORA12CR1> insert /*+ append */ into some_table select * from all_objects_unload;

Note Starting with Oracle 11g, you can create compressed and encrypted dump files. These features require the Enterprise Edition of Oracle, as well as the Advanced Compression option (for compression) and the Advanced Security option (for encryption).

SQLLDR

SQLLDR is Oracle’s high-speed, bulk data loader. It is an extremely useful tool for getting data into an Oracle database from a variety of flat file formats. SQLLDR can be used to load enormous amounts of data in an amazingly short period of time. It has two modes of operation:

· Conventional path: SQLLDR will employ SQL inserts on our behalf to load data.

· Direct path: SQLLDR does not use SQL in this mode; it formats database blocks directly.

The direct path load allows you to read data from a flat file and write it directly to formatted database blocks, bypassing the entire SQL engine, undo generation and, optionally, redo generation at the same time. Parallel direct path load is among the fastest ways to go from having no data to a fully loaded database.
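For example, the mode is selected on the command line via the DIRECT parameter; assuming the demo1.ctl control file from earlier in this chapter, the same control file can be run either way:

$ sqlldr eoda control=demo1.ctl             # conventional path (the default)
$ sqlldr eoda control=demo1.ctl direct=true # direct path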

We will not cover every single aspect of SQLLDR. For all of the details, refer to the Oracle Database Utilities manual, which dedicates seven chapters to SQLLDR in Oracle 12c Release 1. The fact that it is covered in seven chapters is notable, since every other utility, such as DBVERIFY, DBNEWID, or LogMiner, gets one chapter or less. For complete syntax and all of the options, I will again refer you to the Oracle Database Utilities manual, as this chapter is intended to answer the “How do I . . .?” questions that a reference manual does not address.

It should be noted that the Oracle Call Interface (OCI) allows you to write your own direct path loader using C. This is useful when the operation you want to perform is not feasible in SQLLDR or when seamless integration with your application is desired. SQLLDR is a command-line tool (i.e., it’s a separate program). It is not an API or anything that can be “called from PL/SQL,” for example.

If you execute SQLLDR from the command line with no inputs, it gives you the following help:

$ sqlldr
SQL*Loader: Release 12.1.0.1.0 - Production on Sun Mar 9 11:57:29 2014
Copyright (c) 1982, 2013, Oracle and/or its affiliates. All rights reserved.
Usage: SQLLDR keyword=value [,keyword=value,...]
Valid Keywords:
userid -- ORACLE username/password
control -- control file name
log -- log file name
bad -- bad file name
data -- data file name
discard -- discard file name
discardmax -- number of discards to allow (Default all)
skip -- number of logical records to skip (Default 0)
load -- number of logical records to load (Default all)
errors -- number of errors to allow (Default 50)
rows -- number of rows in conventional path bind array or between direct path data saves
(Default: Conventional path 64, Direct path all)
bindsize -- size of conventional path bind array in bytes (Default 256000)
silent -- suppress messages during run (header,feedback,errors,discards,partitions)
direct -- use direct path (Default FALSE)
parfile -- parameter file: name of file that contains parameter specifications
parallel -- do parallel load (Default FALSE)
file -- file to allocate extents from
skip_unusable_indexes -- disallow/allow unusable indexes or index partitions (Default FALSE)
skip_index_maintenance -- do not maintain indexes, mark affected indexes as unusable (Default FALSE)
commit_discontinued -- commit loaded rows when load is discontinued (Default FALSE)
readsize -- size of read buffer (Default 1048576)
external_table -- use external table for load; NOT_USED, GENERATE_ONLY, EXECUTE
columnarrayrows -- number of rows for direct path column array (Default 5000)
streamsize -- size of direct path stream buffer in bytes (Default 256000)
multithreading -- use multithreading in direct path
resumable -- enable or disable resumable for current session (Default FALSE)
resumable_name -- text string to help identify resumable statement
resumable_timeout -- wait time (in seconds) for RESUMABLE (Default 7200)
date_cache -- size (in entries) of date conversion cache (Default 1000)
no_index_errors -- abort load on any index errors (Default FALSE)
...

For technical definitions of each individual parameter, please refer to Chapter 8 of the Oracle Database Utilities manual. I will demonstrate the usage of a few of these parameters in this chapter.

To use SQLLDR, you will need a control file. A SQLLDR control file simply contains information describing the input data—its layout, datatypes, and so on—as well as information about the target table(s).

Note Don’t confuse a SQLLDR control file with a database control file. Recall from Chapter 3 that a database control file is a small binary file that stores a directory of the files Oracle requires along with other information such as checkpoint data, the name of the database, and so on.

The control file can even contain the data to load. In the following example, we’ll build a simple control file in a step-by-step fashion, and I’ll provide an explanation of the commands. (Note that the parenthetical numbers to the left in the code are not part of this control file; they are just there for reference.) I’ll refer to the file later as demo1.ctl.

(1) LOAD DATA
(2) INFILE *
(3) INTO TABLE DEPT
(4) FIELDS TERMINATED BY ','
(5) (DEPTNO, DNAME, LOC)
(6) BEGINDATA
(7) 10,Sales,Virginia
(8) 20,Accounting,Virginia
(9) 30,Consulting,Virginia
(10) 40,Finance,Virginia

· LOAD DATA (1): This tells SQLLDR what to do (in this case, load data). The other thing SQLLDR can do is CONTINUE_LOAD to resume a load. You would use this latter option only when continuing a multitable direct path load.

· INFILE * (2): This tells SQLLDR the data to be loaded is actually contained within the control file itself as shown on lines 6 through 10. Alternatively, you could specify the name of another file that contains the data. You can override this INFILE statement using a command-line parameter if you wish. Be aware that command-line options override control file settings.

· INTO TABLE DEPT (3): This tells SQLLDR to which table you are loading data (in this case, the DEPT table).

· FIELDS TERMINATED BY ',' (4): This tells SQLLDR that the data will be in the form of comma-separated values. There are dozens of ways to describe the input data to SQLLDR; this is just one of the more common methods.

· (DEPTNO, DNAME, LOC) (5): This tells SQLLDR what columns you are loading, their order in the input data, and their datatypes. The datatypes are for the data in the input stream, not the datatypes in the database. In this case, they are defaulting to CHAR(255), which is sufficient.

· BEGINDATA (6): This tells SQLLDR you have finished describing the input data and that the very next lines, lines 7 to 10, are the actual data to be loaded into the DEPT table.

This is a control file in one of its most simple and common formats: to load delimited data into a table. We will take a look at some much more complex examples in this chapter, but this is a good one to get our feet wet with. To use this control file, which we will name demo1.ctl, all we need to do is create an empty DEPT table:

EODA@ORA12CR1> create table dept
2 ( deptno number(2) constraint dept_pk primary key,
3 dname varchar2(14),
4 loc varchar2(13)
5 )
6 /
Table created.

And run the following command:

$ sqlldr userid=eoda control=demo1.ctl
Password:
SQL*Loader: Release 12.1.0.1.0 - Production on Sun Mar 9 12:03:26 2014
Copyright (c) 1982, 2013, Oracle and/or its affiliates. All rights reserved.
Path used: Conventional
Commit point reached - logical record count 4
Table DEPT:
4 Rows successfully loaded.
Check the log file:
demo1.log

If the table is not empty, we will receive an error message to the following effect:

SQL*Loader-601: For INSERT option, table must be empty. Error on table DEPT

This is because we allowed almost everything in the control file to default, and the default load option is INSERT (as opposed to APPEND, TRUNCATE, or REPLACE). To INSERT, SQLLDR assumes the table is empty. If we wanted to add records to the DEPT table, we could have specified APPEND; to replace the data in the DEPT table, we could have used REPLACE or TRUNCATE. REPLACE uses a conventional DELETE FROM TABLE statement to remove records; hence, if the table to be loaded into already contains many records, it could be quite slow to perform. TRUNCATE uses the TRUNCATE SQL command and is typically faster, as it does not have to physically remove each row.
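For example, a minimal variant of demo1.ctl that adds rows to a nonempty DEPT would look like this (the department 50 row is invented for illustration):

LOAD DATA
INFILE *
INTO TABLE DEPT
APPEND
FIELDS TERMINATED BY ','
(DEPTNO, DNAME, LOC)
BEGINDATA
50,Marketing,Virginia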

Every load will generate a log file. The log file from our simple load looks like this:

SQL*Loader: Release 12.1.0.1.0 - Production on Sun Mar 9 12:03:26 2014
Copyright (c) 1982, 2013, Oracle and/or its affiliates. All rights reserved.

Control File: demo1.ctl
Data File: demo1.ctl
Bad File: demo1.bad
Discard File: none specified

(Allow all discards)

Number to load: ALL
Number to skip: 0
Errors allowed: 50
Bind array: 64 rows, maximum of 256000 bytes
Continuation: none specified
Path used: Conventional

Table DEPT, loaded from every logical record.
Insert option in effect for this table: INSERT

Column Name Position Len Term Encl Datatype
------------------------------ ---------- ----- ---- ---- ---------------------
DEPTNO FIRST * , CHARACTER
DNAME NEXT * , CHARACTER
LOC NEXT * , CHARACTER

Table DEPT:
4 Rows successfully loaded.
0 Rows not loaded due to data errors.
0 Rows not loaded because all WHEN clauses were failed.
0 Rows not loaded because all fields were null.

Space allocated for bind array: 49536 bytes(64 rows)
Read buffer bytes: 1048576

Total logical records skipped: 0
Total logical records read: 4
Total logical records rejected: 0
Total logical records discarded: 0

Run began on Sat Mar 01 10:10:35 2014
Run ended on Sat Mar 01 10:10:36 2014

Elapsed time was: 00:00:01.01
CPU time was: 00:00:00.01

The log file tells us about many of the aspects of our load. We can see the options we used (defaulted or otherwise). We can see how many records were read, how many loaded, and so on. The log files specify the locations of all BAD and DISCARD files. They even tell us how long it took. These log files are crucial for verifying that the load was successful, as well as for diagnosing errors. If the loaded data resulted in SQL errors (i.e., the input data was “bad” and created records in the BAD file), these errors would be recorded here. The information in the log files is largely self-explanatory, so we will not spend any more time on it.

Loading Data with SQLLDR FAQs

We will now cover what I have found to be the most frequently asked questions with regard to loading data in an Oracle database using SQLLDR.

Why Do I Receive “exceeds maximum length” in My Log File?

This is perhaps the most frequently recurring question I’ve heard about SQLLDR: why does my log file contain something similar to the following?

Record 4: Rejected - Error on table DEPT, column DNAME.
Field in data file exceeds maximum length

This is because the default datatype in SQLLDR for each field of the input record is a char(255). If any field in your input data exceeds that length, you’ll have to explicitly tell SQLLDR that the input field can contain more than 255 characters.

For example, suppose you add a column that can hold more than 255 characters:

EODA@ORA12CR1> alter table dept modify dname varchar2(1000);
Table altered.

And you had a control file such as the following (the line with more text repeated is a single line in that input file):

LOAD DATA
INFILE *
INTO TABLE DEPT
FIELDS TERMINATED BY ','
(DEPTNO, DNAME, LOC)
BEGINDATA
10,Sales,Virginia
20,Accounting,Virginia
30,Consulting,Virginia
40,Finance more text more text more text more ... <repeated many times> ...more text,Virginia

When you ran SQLLDR, you would receive the previous error message. The solution is rather simple:

LOAD DATA
INFILE *
INTO TABLE DEPT
FIELDS TERMINATED BY ','
(DEPTNO, DNAME char(1000), LOC)
BEGINDATA ...

That’s it! Just tell SQLLDR the maximum width of the field in the input record—in this case 1,000.

How Do I Load Delimited Data?

Delimited data, or data that is separated by some special character and perhaps enclosed in quotes, is the most popular data format for flat files today. On a mainframe, a fixed-length, fixed-format file would probably be the most recognized file format, but on UNIX/Linux and Windows, delimited files are the norm. In this section, we will investigate the popular options used to load delimited data.

The most popular format for delimited data is the comma-separated values (CSV) format. In this file format, each field of data is separated from the next by a comma. Text strings can be enclosed within quotes, thus allowing for the string itself to contain commas. If the string must contain a quotation mark as well, the convention is to double up the quotation mark (in the following code we use "" in place of just "). A typical control file to load delimited data will look much like our first example earlier, but the FIELDS TERMINATED BY clause would generally be specified like this:

FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'

It specifies that a comma separates the data fields, and that each field might be enclosed in double quotes. Let’s say that we were to modify the bottom of this control file to be as follows:

FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
(DEPTNO, DNAME, LOC)
BEGINDATA
10,Sales,"Virginia,USA"
20,Accounting,"Va, ""USA"""
30,Consulting,Virginia
40,Finance,Virginia

When we run SQLLDR using this control file, the results will be as follows:

EODA@ORA12CR1> select * from dept;

DEPTNO DNAME LOC
---------- -------------- -------------
10 Sales Virginia,USA
20 Accounting Va, "USA"
30 Consulting Virginia
40 Finance Virginia

Notice the following in particular:

· Virginia,USA in department 10: This results from input data that was "Virginia,USA". This input data field had to be enclosed in quotes to retain the comma as part of the data. Otherwise, the comma would have been treated as the end-of-field marker, and Virginia would have been loaded without the USA text.

· Va, "USA": This resulted from input data that was "Va, ""USA""". SQLLDR counted the double occurrence of " as a single occurrence within the enclosed string. To load a string that contains the optional enclosure character, you must ensure the enclosure character is doubled up.

Another popular format is tab-delimited data, which is data separated by tabs rather than commas. There are two ways to load this data using the TERMINATED BY clause:

· TERMINATED BY X'09' (the tab character using hexadecimal format; in ASCII, 9 is a tab character)

· TERMINATED BY WHITESPACE

The two are very different in implementation, as the following shows. Using the DEPT table from earlier, we’ll load using this control file:

LOAD DATA
INFILE *
INTO TABLE DEPT
REPLACE
FIELDS TERMINATED BY WHITESPACE
(DEPTNO, DNAME, LOC)
BEGINDATA
10 Sales Virginia

It is not readily visible on the page, but there are two tabs between each piece of data here. The data line is actually as follows, where the \t is the universally recognized tab escape sequence.

10\t\tSales\t\tVirginia

When you use this control file with the TERMINATED BY WHITESPACE clause as previously, the resulting data in the table DEPT is this:

EODA@ORA12CR1> select * from dept;

DEPTNO DNAME LOC
---------- -------------- -------------
10 Sales Virginia

TERMINATED BY WHITESPACE parses the string by looking for the first occurrence of whitespace (tab, blank, or newline), and then it continues until it finds the next non-whitespace character. Hence, when it parsed the data, DEPTNO had 10 assigned to it, the two subsequent tabs were considered as whitespace, Sales was assigned to DNAME, and so on.

On the other hand, suppose you were to use FIELDS TERMINATED BY X'09', as the following modified control file does:

...
FIELDS TERMINATED BY X'09'
(DEPTNO, DNAME, LOC)
...

You would find DEPT loaded with the following data:

EODA@ORA12CR1> select * from dept;

DEPTNO DNAME LOC
---------- -------------- -------------
10 Sales

Here, once SQLLDR encountered a tab, it output a value. Hence, 10 is assigned to DEPTNO, and DNAME gets NULL since there is no data between the first tab and the next occurrence of a tab. Sales gets assigned to LOC.

This is the intended behavior of TERMINATED BY WHITESPACE and TERMINATED BY <character>. The one that is more appropriate to use will be dictated by the input data and how you need it to be interpreted.

Lastly, when loading delimited data such as this, it is very common to want to skip over various columns in the input record. For example, you might want to load fields 1, 3, and 5, skipping columns 2 and 4. To do this, SQLLDR provides the FILLER keyword. This allows you to map a column in an input record, but not put it into the database. For example, given the DEPT table and the last control file from earlier, we can modify the control file to load the data correctly (skipping over the tabs) using the FILLER keyword:

LOAD DATA
INFILE *
INTO TABLE DEPT
REPLACE
FIELDS TERMINATED BY x'09'
(DEPTNO, dummy1 filler, DNAME, dummy2 filler, LOC)
BEGINDATA
10 Sales Virginia

The resulting DEPT table is now as follows:

EODA@ORA12CR1> select * from dept;

DEPTNO DNAME LOC
---------- -------------- -------------
10 Sales Virginia

How Do I Load Fixed Format Data?

Often, you have a flat file generated from some external system, and this file is a fixed-length file with positional data. For example, the NAME field is in bytes 1 to 10, the ADDRESS field is in bytes 11 to 35, and so on. We will look at how SQLLDR can import this kind of data for us.

This fixed-width, positional data is the optimal data format for SQLLDR to load. It will be the fastest way to process, as the input data stream is somewhat trivial to parse: SQLLDR works from fixed byte offsets and lengths within each record, so extracting a given field is very simple. If you have an extremely large volume of data to load, converting it to a fixed position format is generally the best approach. The downside to a fixed-width file is, of course, that it can be much larger than a simple, delimited file format.

To load fixed-width positional data, you will use the POSITION keyword in the control file, for example:

LOAD DATA
INFILE *
INTO TABLE DEPT
REPLACE
( DEPTNO position(1:2),
DNAME position(3:16),
LOC position(17:29)
)
BEGINDATA
10Accounting Virginia,USA

This control file does not employ the FIELDS TERMINATED BY clause; rather, it uses POSITION to tell SQLLDR where fields begin and end. Note that with the POSITION clause, we could use overlapping positions and go back and forth in the record. For example, suppose we were to alter the DEPT table as follows:

EODA@ORA12CR1> alter table dept add entire_line varchar2(29);
Table altered.

And then we used the following control file:

LOAD DATA
INFILE *
INTO TABLE DEPT
REPLACE
( DEPTNO position(1:2),
DNAME position(3:16),
LOC position(17:29),
ENTIRE_LINE position(1:29)
)
BEGINDATA
10Accounting Virginia,USA

The field ENTIRE_LINE is defined as POSITION(1:29). It extracts its data from all 29 bytes of input data, whereas the other fields are substrings of the input data. The outcome of this control file will be as follows:

EODA@ORA12CR1> select * from dept;

DEPTNO DNAME LOC ENTIRE_LINE
---------- -------------- ------------- -----------------------------
10 Accounting Virginia,USA 10Accounting Virginia,USA

When using POSITION, we can use relative or absolute offsets. In the preceding example, we used absolute offsets. We specifically denoted where fields begin and where they end. We could have written the preceding control file as follows:

LOAD DATA
INFILE *
INTO TABLE DEPT
REPLACE
( DEPTNO position(1:2),
DNAME position(*:16),
LOC position(*:29),
ENTIRE_LINE position(1:29)
)
BEGINDATA
10Accounting Virginia,USA

The * instructs the control file to pick up where the last field left off. Therefore (*:16) is just the same as (3:16) in this case. Notice that you can mix relative and absolute positions in the control file. Additionally, when using the * notation, you can add to the offset. For example, if DNAME started 2 bytes after the end of DEPTNO, we could have used (*+2:16). In this example, the effect would be identical to using (5:16).

The ending position in the POSITION clause must be the absolute column position where the data ends. At times, it can be easier to specify just the length of each field, especially if they are contiguous, as in the preceding example. In this fashion, we would just have to tell SQLLDR the record starts at byte 1, and then specify the length of each field. This will save us from having to compute start and stop byte offsets into the record, which can be hard at times. In order to do this, we’ll leave off the ending position and instead specify the length of each field in the fixed-length record as follows:

LOAD DATA
INFILE *
INTO TABLE DEPT
REPLACE
( DEPTNO position(1) char(2),
DNAME position(*) char(14),
LOC position(*) char(13),
ENTIRE_LINE position(1) char(29)
)
BEGINDATA
10Accounting Virginia,USA

Here we had to tell SQLLDR only where the first field begins and its length. Each subsequent field starts where the last one left off and continues for a specified length. It is not until the last field that we have to specify a position again, since this field goes back to the beginning of the record.

How Do I Load Dates?

Loading dates using SQLLDR is fairly straightforward, but it seems to be a common point of confusion. You simply need to use the DATE data type in the control file and specify the date mask to be used. This date mask is the same mask you use with TO_CHAR and TO_DATE in the database. SQLLDR will apply this date mask to your data and load it for you.

For example, let’s say we alter our DEPT table again, as follows:

EODA@ORA12CR1> alter table dept add last_updated date;
Table altered.

We can load it with the following control file:

LOAD DATA
INFILE *
INTO TABLE DEPT
REPLACE
FIELDS TERMINATED BY ','
(DEPTNO,
DNAME,
LOC,
LAST_UPDATED date 'dd/mm/yyyy'
)
BEGINDATA
10,Sales,Virginia,1/5/2014
20,Accounting,Virginia,21/6/2014
30,Consulting,Virginia,5/1/2013
40,Finance,Virginia,15/3/2014

The resulting DEPT table will look like this:

EODA@ORA12CR1> select * from dept;

DEPTNO DNAME LOC LAST_UPDA
---------- -------------- ------------- ---------
10 Sales Virginia 01-MAY-14
20 Accounting Virginia 21-JUN-14
30 Consulting Virginia 05-JAN-13
40 Finance Virginia 15-MAR-14

It is that easy. Just supply the format in the control file and SQLLDR will convert the date for us. In some cases, it might be appropriate to use a more powerful SQL function. For example, your input file might contain dates in many different formats: sometimes with the time component, sometimes without; sometimes in DD-MON-YYYY format; sometimes in DD/MM/YYYY format; and so on. You’ll learn in the next section how to use functions in SQLLDR to overcome these challenges.

How Do I Load Data Using Functions?

In this section, you’ll see how to refer to functions while loading data. Bear in mind, however, that the use of such functions (including database sequences) requires the SQL engine, and hence won’t work in a direct path load.

Using functions in SQLLDR is very easy once you understand how SQLLDR builds its INSERT statement. To have a function applied to a field in a SQLLDR script, simply add it to the control file in double quotes. For example, say you have the DEPT table from earlier, and you would like to make sure the data being loaded is in uppercase. You could use the following control file to load it:

LOAD DATA
INFILE *
INTO TABLE DEPT
REPLACE
FIELDS TERMINATED BY ','
(DEPTNO,
DNAME "upper(:dname)",
LOC "upper(:loc)",
LAST_UPDATED date 'dd/mm/yyyy'
)
BEGINDATA
10,Sales,Virginia,1/5/2014
20,Accounting,Virginia,21/6/2014
30,Consulting,Virginia,5/1/2013
40,Finance,Virginia,15/3/2014

The resulting data in the database will be as follows:

EODA@ORA12CR1> select * from dept;

DEPTNO DNAME LOC LAST_UPDA ENTIRE_LINE
---------- -------------- ------------- --------- -----------
10 SALES VIRGINIA 01-MAY-14
20 ACCOUNTING VIRGINIA 21-JUN-14
30 CONSULTING VIRGINIA 05-JAN-13
40 FINANCE VIRGINIA 15-MAR-14

Notice how you are able to easily uppercase the data just by applying the UPPER function to a bind variable. It should be noted that the SQL functions could refer to any of the columns, regardless of the column the function is actually applied to. This means that a column can be the result of a function on two or more of the other columns. For example, if you wanted to load the column ENTIRE_LINE, you could use the SQL concatenation operator. It is a little more involved than that, though, in this case. Right now, the input data set has four data elements in it. Let’s say that you were to simply add ENTIRE_LINE to the control file like this:

LOAD DATA
INFILE *
INTO TABLE DEPT
REPLACE
FIELDS TERMINATED BY ','
(DEPTNO,
DNAME "upper(:dname)",
LOC "upper(:loc)",
LAST_UPDATED date 'dd/mm/yyyy',
ENTIRE_LINE ":deptno||:dname||:loc||:last_updated"
)
BEGINDATA
10,Sales,Virginia,1/5/2014
20,Accounting,Virginia,21/6/2014
30,Consulting,Virginia,5/1/2013
40,Finance,Virginia,15/3/2014

You would find this error in your log file for each input record:

Record 1: Rejected - Error on table DEPT, column ENTIRE_LINE.
Column not found before end of logical record (use TRAILING NULLCOLS)

Here, SQLLDR is telling you that it ran out of data in the record before it ran out of columns. The solution is easy in this case, and in fact SQLLDR even tells us what to do: use TRAILING NULLCOLS. This will have SQLLDR bind a NULL value in for that column if no data exists in the input record. In this case, adding TRAILING NULLCOLS will cause the bind variable :ENTIRE_LINE to be NULL. So, you retry with this control file:

LOAD DATA
INFILE *
INTO TABLE DEPT
REPLACE
FIELDS TERMINATED BY ','
TRAILING NULLCOLS
(DEPTNO,
DNAME "upper(:dname)",
LOC "upper(:loc)",
LAST_UPDATED date 'dd/mm/yyyy',
ENTIRE_LINE ":deptno||:dname||:loc||:last_updated"
)
BEGINDATA
10,Sales,Virginia,1/5/2014
20,Accounting,Virginia,21/6/2014
30,Consulting,Virginia,5/1/2013
40,Finance,Virginia,15/3/2014

Now the data in the table is as follows:

EODA@ORA12CR1> select * from dept;

DEPTNO DNAME LOC LAST_UPDA ENTIRE_LINE
---------- -------------- ------------- --------- -----------------------------
10 SALES VIRGINIA 01-MAY-14 10SalesVirginia1/5/2014
20 ACCOUNTING VIRGINIA 21-JUN-14 20AccountingVirginia21/6/2014
30 CONSULTING VIRGINIA 05-JAN-13 30ConsultingVirginia5/1/2013
40 FINANCE VIRGINIA 15-MAR-14 40FinanceVirginia15/3/2014

What makes this feat possible is the way SQLLDR builds its INSERT statement. SQLLDR will look at the preceding and see the DEPTNO, DNAME, LOC, LAST_UPDATED, and ENTIRE_LINE columns in the control file. It will set up five bind variables named after these columns. Normally, in the absence of any functions, the INSERT statement it builds is simply:

INSERT INTO DEPT ( DEPTNO, DNAME, LOC, LAST_UPDATED, ENTIRE_LINE )
VALUES ( :DEPTNO, :DNAME, :LOC, :LAST_UPDATED, :ENTIRE_LINE );

It would then parse the input stream, assigning the values to its bind variables, and then execute the statement. When you begin to use functions, SQLLDR incorporates them into its INSERT statement. In the preceding example, the INSERT statement SQLLDR builds will look like this:

INSERT INTO DEPT (DEPTNO, DNAME, LOC, LAST_UPDATED, ENTIRE_LINE)
VALUES ( :DEPTNO, upper(:dname), upper(:loc), :last_updated,
:deptno||:dname||:loc||:last_updated );

It then prepares and binds the inputs to this statement, and executes it. So, pretty much anything you can think of doing in SQL, you can incorporate into your SQLLDR scripts. With the addition of the CASE statement in SQL, doing this can be extremely powerful and easy. For example, say your input file could have dates in the following formats:

· HH24:MI:SS: Just a time; the date should default to the first day of the current month.

· DD/MM/YYYY: Just a date; the time should default to midnight.

· HH24:MI:SS DD/MM/YYYY: The date and time are both explicitly supplied.

You could use a control file like this:

LOAD DATA
INFILE *
INTO TABLE DEPT
REPLACE
FIELDS TERMINATED BY ','
TRAILING NULLCOLS
(DEPTNO,
DNAME "upper(:dname)",
LOC "upper(:loc)",
LAST_UPDATED
"case
when length(:last_updated) > 9
then to_date(:last_updated,'hh24:mi:ss dd/mm/yyyy')
when instr(:last_updated,':') > 0
then to_date(:last_updated,'hh24:mi:ss')
else to_date(:last_updated,'dd/mm/yyyy')
end"
)
BEGINDATA
10,Sales,Virginia,12:03:03 17/10/2014
20,Accounting,Virginia,02:23:54
30,Consulting,Virginia,01:24:00 21/10/2014
40,Finance,Virginia,17/8/2014

This results in the following:

EODA@ORA12CR1> alter session set nls_date_format = 'dd-mon-yyyy hh24:mi:ss';
Session altered.

EODA@ORA12CR1> select deptno, dname, loc, last_updated from dept;

DEPTNO DNAME LOC LAST_UPDATED
------ ------------------------------ ------------- --------------------
10 SALES VIRGINIA 17-oct-2014 12:03:03
20 ACCOUNTING VIRGINIA 01-mar-2014 02:23:54
30 CONSULTING VIRGINIA 21-oct-2014 01:24:00
40 FINANCE VIRGINIA 17-aug-2014 00:00:00

Now, one of three date formats will be applied to the input character string (notice that you are not loading a DATE anymore; you are just loading a string). The CASE function will look at the length and the contents of the string to determine which of the masks it should use.

It is interesting to note that you can write your own functions to be called from SQLLDR. This is a straightforward application of the fact that PL/SQL can be called from SQL.
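For example, the CASE expression shown earlier could be packaged into a function of our own (MY_TO_DATE is a hypothetical name; this is a minimal sketch) and then referenced from the control file just like a built-in:

create or replace function my_to_date( p_str in varchar2 ) return date
as
begin
    return case when length( p_str ) > 9
                then to_date( p_str, 'hh24:mi:ss dd/mm/yyyy' )
                when instr( p_str, ':' ) > 0
                then to_date( p_str, 'hh24:mi:ss' )
                else to_date( p_str, 'dd/mm/yyyy' )
           end;
end;
/

The LAST_UPDATED field in the control file would then simply read LAST_UPDATED "my_to_date(:last_updated)".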

How Do I Load Data with Embedded Newlines?

This is something that has been problematic for SQLLDR historically: how to load free-form data that may include a newline in it. The newline character is the default end-of-line character to SQLLDR, and the ways around this did not offer much flexibility in the past. Fortunately, in Oracle 8.1.6 and later versions we have some new options. The options for loading data with embedded newlines are now as follows:

· Load the data with some other character in the data that represents a newline (e.g., put the string \n in the text where a newline should appear) and use a SQL function to replace that text with a CHR(10) during load time.

· Use the FIX attribute on the INFILE directive, and load a fixed-length flat file. In this case there is no record terminator; rather, the fact that each record is exactly as long as every other record is used to determine where records begin and end.

· Use the VAR attribute on the INFILE directive, and load a variable-width file that uses a format such that the first few bytes of each line specify the length in bytes of the line to follow.

· Use the STR attribute on the INFILE directive to load a variable-width file with some sequence of characters that represents the end of line, as opposed to just the newline character representing this.

The following sections demonstrate each in turn.

Use a Character Other Than a Newline

This is an easy method if you have control over how the input data is produced. If it is easy enough to convert the data when creating the data file, this will work fine. The idea is to apply a SQL function to the data on the way into the database, replacing some string of characters with a newline. Let’s add another column to our DEPT table:

EODA@ORA12CR1> alter table dept add comments varchar2(4000);
Table altered.

We’ll use this column to load text into. An example control file with inline data could be as follows:

LOAD DATA
INFILE *
INTO TABLE DEPT
REPLACE
FIELDS TERMINATED BY ','
TRAILING NULLCOLS
(DEPTNO,
DNAME "upper(:dname)",
LOC "upper(:loc)",
COMMENTS "replace(:comments,'\\n',chr(10))"
)
BEGINDATA
10,Sales,Virginia,This is the Sales\nOffice in Virginia
20,Accounting,Virginia,This is the Accounting\nOffice in Virginia
30,Consulting,Virginia,This is the Consulting\nOffice in Virginia
40,Finance,Virginia,This is the Finance\nOffice in Virginia

Notice how in the call to replace we had to use \\n, not just \n. This is because \n is recognized by SQLLDR as a newline, and SQLLDR would have converted it into a newline, not a two-character string. When we execute SQLLDR with the preceding control file, the table DEPT is loaded with the following:

EODA@ORA12CR1> select deptno, dname, comments from dept;

DEPTNO DNAME COMMENTS
---------- -------------- ------------------------------
10 SALES This is the Sales
Office in Virginia

20 ACCOUNTING This is the Accounting
Office in Virginia

30 CONSULTING This is the Consulting
Office in Virginia

40 FINANCE This is the Finance
Office in Virginia

Use the FIX Attribute

The FIX attribute is another method available to us. If we use this, the input data must appear in fixed-length records. Each record will be exactly the same number of bytes as any other record in the input data set. When using positional data, the FIX attribute is especially useful, as these files are typically fixed-length input files to begin with. When using free-form delimited data, it is less likely that we will have a fixed-length file, as these files are generally of varying length (this is the entire point of delimited files: to make each line only as big as it needs to be).

When using the FIX attribute, we must use an INFILE clause, as FIX is an option of the INFILE clause. Additionally, with this option the data must be stored externally, not in the control file itself. So, assuming we have fixed-length input records, we can use a control file such as this:

LOAD DATA
INFILE demo.dat "fix 80"
INTO TABLE DEPT
REPLACE
FIELDS TERMINATED BY ','
TRAILING NULLCOLS
(DEPTNO,
DNAME "upper(:dname)",
LOC "upper(:loc)",
COMMENTS
)

This file specifies an input data file that will have records that are 80 bytes each. This includes the trailing newline that may or may not be there. In this case, the newline is nothing special in the input data file. It is just another character to be loaded or not. This is the thing to understand: the newline at the end of the record (if present) will become part of the record. To fully understand this, we need a utility to dump the contents of a file on the screen so we can see what is really in there. Using any UNIX/Linux variant, this is pretty easy to do with od, a program to dump files to the screen in octal and other formats. We’ll use the following demo.dat file. Note that the first column in the following output is actually in octal (base 8), so the number 0000012 on the second line is in octal and represents the decimal number 10. This tells us what byte in the file we are looking at. I’ve formatted the output to show ten characters per line (using -w10), so 0, 12, 24, and 36 are really 0, 10, 20, and 30:

[tkyte@desktop tkyte]$ od -c -w10 -v demo.dat
0000000 1 0 , S a l e s , V
0000012 i r g i n i a , T h
0000024 i s i s t h e
0000036 S a l e s \n O f f i
0000050 c e i n V i r g
0000062 i n i a
0000074
0000106
0000120 2 0 , A c c o u n t
0000132 i n g , V i r g i n
0000144 i a , T h i s i s
0000156 t h e A c c o u
0000170 n t i n g \n O f f i
0000202 c e i n V i r g
0000214 i n i a
0000226
0000240 3 0 , C o n s u l t
0000252 i n g , V i r g i n
0000264 i a , T h i s i s
0000276 t h e C o n s u
0000310 l t i n g \n O f f i
0000322 c e i n V i r g
0000334 i n i a
0000346
0000360 4 0 , F i n a n c e
0000372 , V i r g i n i a ,
0000404 T h i s i s t h
0000416 e F i n a n c e \n
0000430 O f f i c e i n
0000442 V i r g i n i a
0000454
0000466
0000500
[tkyte@desktop tkyte]$

Notice that in this input file, the newlines (\n) are not there to indicate where the end of the record for SQLLDR is; rather, they are just data to be loaded in this case. SQLLDR is using the FIX width of 80 bytes to figure out how much data to read. In fact, if we look at the input data, the records for SQLLDR are not even terminated by \n in this input file. The character right before department 20’s record is a space, not a newline.

Now that we know each and every record is 80 bytes long, we are ready to load it using the control file listed earlier with the FIX 80 clause. When we do so, we can see the following:

EODA@ORA12CR1> select '"' || comments || '"' comments from dept;

COMMENTS
-------------------------------------------------------------------------------
"This is the Sales
Office in Virginia "

"This is the Accounting
Office in Virginia "

"This is the Consulting
Office in Virginia "

"This is the Finance
Office in Virginia "

You might need to trim this data, since the trailing whitespace is preserved. You can do that in the control file, using the TRIM built-in SQL function.
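For example, the COMMENTS field specification in the control file could be written as follows (a minimal sketch):

COMMENTS "trim(:comments)"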

A word of caution to those of you lucky enough to work on both Windows and UNIX/Linux: the end-of-line marker is different on these platforms. On UNIX/Linux, it is simply \n (CHR(10) in SQL). On Windows/DOS, it is \r\n (CHR(13)||CHR(10) in SQL). In general, if you use the FIX approach, make sure to create and load the file on a homogeneous platform (UNIX/Linux and UNIX/Linux, or Windows and Windows).

Use the VAR Attribute

Another method of loading data with embedded newline characters is to use the VAR attribute. When using this format, each record will begin with some fixed number of bytes that represent the total length of the incoming record. Using this format, we can load variable-length records that contain embedded newlines, but only if we have a record length field at the beginning of each and every record. So, suppose we use a control file such as the following:

LOAD DATA
INFILE demo.dat "var 3"
INTO TABLE DEPT
REPLACE
FIELDS TERMINATED BY ','
TRAILING NULLCOLS
(DEPTNO,
DNAME "upper(:dname)",
LOC "upper(:loc)",
COMMENTS
)

Then the VAR 3 says that the first 3 bytes of each input record contain the length of that input record. If we take a data file such as the following, we can load it using that control file:

[tkyte@desktop tkyte]$ cat demo.dat
05510,Sales,Virginia,This is the Sales
Office in Virginia
06520,Accounting,Virginia,This is the Accounting
Office in Virginia
06530,Consulting,Virginia,This is the Consulting
Office in Virginia
05940,Finance,Virginia,This is the Finance
Office in Virginia
[tkyte@desktop tkyte]$

In our input data file, we have four rows of data. The first row starts with 055, meaning that the next 55 bytes represent the first input record. This 55 bytes includes the terminating newline after the word Virginia. The next row starts with 065. It has 65 bytes of text, and so on. Using this format data file, we can easily load our data with embedded newlines.

Again, if you are using UNIX/Linux and Windows (the preceding example was with UNIX/Linux, where a newline is one character long), you would have to adjust the length field for each record. On Windows, the preceding example’s .dat file would have to contain 56, 66, 66, and 60 for the values in the length fields.

Use the STR Attribute

This is perhaps the most flexible method of loading data with embedded newlines. Using the STR attribute, we can specify a new end-of-line character (or sequence of characters). This allows us to create an input data file that has some special character at the end of each line—the newline is no longer “special.”

I prefer to use a sequence of characters, typically some special marker, and then a newline. This makes it easy to see the end-of-line character when viewing the input data in a text editor or some utility, as each record still has a newline at the end of it. The STR attribute is specified in hexadecimal, and perhaps the easiest way to get the exact hexadecimal string we need is to use SQL and UTL_RAW to produce the hexadecimal string for us. For example, assuming we are on UNIX/Linux where the end-of-line marker is CHR(10) (linefeed) and our special marker character is a pipe symbol (|), we can write this:

EODA@ORA12CR1> select utl_raw.cast_to_raw( '|'||chr(10) ) from dual;

UTL_RAW.CAST_TO_RAW('|'||CHR(10))
-------------------------------------------------------------------------------
7C0A

It shows us that the STR we need to use on UNIX/Linux is X'7C0A'.

Note On Windows, you would use UTL_RAW.CAST_TO_RAW( '|'||chr(13)||chr(10) ).

To use this, we might have a control file like this:

LOAD DATA
INFILE demo.dat "str X'7C0A'"
INTO TABLE DEPT
REPLACE
FIELDS TERMINATED BY ','
TRAILING NULLCOLS
(DEPTNO,
DNAME "upper(:dname)",
LOC "upper(:loc)",
COMMENTS
)

So, if our input data looks like the following, where each record in the data file ends with a |\n, the previous control file will load it correctly:

[tkyte@desktop tkyte]$ cat demo.dat
10,Sales,Virginia,This is the Sales
Office in Virginia|
20,Accounting,Virginia,This is the Accounting
Office in Virginia|
30,Consulting,Virginia,This is the Consulting
Office in Virginia|
40,Finance,Virginia,This is the Finance
Office in Virginia|
[tkyte@desktop tkyte]$

Embedded Newlines Wrap-up

We explored four ways to load data with embedded newlines in this section. In the upcoming section titled “Flat File Unload,” we will use one of these techniques, the STR attribute, in a generic unload utility to avoid issues with regard to newlines in text.

Additionally, one thing to be very aware of—and I’ve mentioned it previously a couple of times—is that in Windows (all flavors), text files may end in \r\n (ASCII 13 + ASCII 10, carriage return/linefeed). Your control file will have to accommodate this: that \r is part of the record. The byte counts in FIX and VAR, and the string used with STR, must account for it. For example, if you took any of the previous .dat files that currently contain just \n in them and FTP-ed them to Windows using an ASCII transfer (the default), every \n would turn into \r\n. The same control file that just worked in UNIX/Linux would not be able to load the data anymore. This is something you must be aware of and take into consideration when setting up the control file.

How Do I Load LOBs?

We will now consider some methods for loading into LOBs. This is not a LONG or LONG RAW field, but rather the preferred datatypes of BLOB and CLOB. These datatypes were introduced in Oracle 8.0, and they support a much richer interface/set of functionality than the legacy LONG and LONG RAW types, as discussed in Chapter 12.

We will investigate two methods for loading these fields: SQLLDR and PL/SQL. Others exist, such as Java streams, Pro*C, and OCI. We will begin working with the PL/SQL method of loading LOBs, and then we’ll look at using SQLLDR to load them as well.

Loading a LOB via PL/SQL

The DBMS_LOB package has entry points called LoadFromFile, LoadBLOBFromFile, and LoadCLOBFromFile. These procedures allow us to use a BFILE (which can be used to read operating system files) to populate a BLOB or CLOB in the database. There is not a significant difference between the LoadFromFile and LoadBLOBFromFile routines, other than that the latter returns OUT parameters that indicate how far into the BLOB column we have loaded data. The LoadCLOBFromFile routine, however, provides a significant feature: character set conversion. If you recall, in Chapter 12 we discussed some of the National Language Support (NLS) features of the Oracle database and the importance of character sets. LoadCLOBFromFile allows us to tell the database that the file it is about to load is in a character set different from the one the database is using, and that it should perform the required character set conversion. For example, you may have a UTF8-compatible database, but the files received to be loaded are encoded in the WE8ISO8859P1 character set, or vice versa. This function allows you to successfully load these files.

Note For complete details on the procedures available in the DBMS_LOB package and their full set of inputs and outputs, please refer to the Oracle PL/SQL Packages and Types Reference.

To use these procedures, we will need to create a DIRECTORY object in the database. This object will allow us to create BFILES (and open them) that point to a file existing on the file system that the database server has access to. This last phrase, “that the database server has access to,” is a key point when using PL/SQL to load LOBs. The DBMS_LOB package executes entirely in the server. It can see only the file systems the server can see. It cannot, in particular, see your local file system if you are accessing Oracle over the network.

So we need to begin by creating a DIRECTORY object in the database. This is a straightforward process. We will create two directories for this example (note that these examples are executed in a UNIX/Linux environment; you will use the syntax for referring to directories that is appropriate for your operating system):

EODA@ORA12CR1> create or replace directory dir1 as '/tmp/';
Directory created.

EODA@ORA12CR1> create or replace directory "dir2" as '/tmp/';
Directory created.

Note Oracle DIRECTORY objects are logical directories, meaning they are pointers to existing, physical directories in your operating system. The CREATE DIRECTORY command does not actually create a directory in the file system—you must perform that operation separately.

The user who performs this operation needs to have the CREATE ANY DIRECTORY privilege. The reason we create two directories is to demonstrate a common case-related (“case” as in uppercase versus lowercase characters) issue with regard to DIRECTORY objects. When Oracle created the first directory, DIR1, it stored the object name in uppercase, as that is the default. In the second example with dir2, it will have created the DIRECTORY object preserving the case we used in the name. The importance of this will be demonstrated shortly when we use the BFILE object.

Now, we want to load some data into either a BLOB or a CLOB. The method for doing so is rather easy, for example:

EODA@ORA12CR1> create table demo
2 ( id int primary key,
3 theClob clob
4 )
5 /
Table created.

EODA@ORA12CR1> host echo 'Hello World!' > /tmp/test.txt

EODA@ORA12CR1> declare
2 l_clob clob;
3 l_bfile bfile;
4 begin
5 insert into demo values ( 1, empty_clob() )
6 returning theclob into l_clob;
7
8 l_bfile := bfilename( 'DIR1', 'test.txt' );
9 dbms_lob.fileopen( l_bfile );
10
11 dbms_lob.loadfromfile( l_clob, l_bfile,
12 dbms_lob.getlength( l_bfile ) );
13
14 dbms_lob.fileclose( l_bfile );
15 end;
16 /
PL/SQL procedure successfully completed.

EODA@ORA12CR1> select dbms_lob.getlength(theClob), theClob from demo
2 /
DBMS_LOB.GETLENGTH(THECLOB) THECLOB
--------------------------- ---------------
13 Hello World!

Walking through the preceding code we see:

· On lines 5 and 6, we create a row in our table, set the CLOB to an EMPTY_CLOB(), and retrieve its value in one call. With the exception of temporary LOBs, LOBs live in the database—we cannot write to a LOB variable without having a pointer to either a temporary LOB or a LOB that is already in the database. An EMPTY_CLOB() is not a NULL CLOB; it is a valid non-NULL pointer to an empty structure. The other thing this did for us was to get a LOB locator, which points to data in a row that is locked. If we were to have selected this value out without locking the underlying row, our attempts to write to it would fail because LOBs must be locked prior to writing (unlike other structured data). By inserting the row, we have, of course, locked the row. If we were modifying an existing row instead of inserting, we would have used SELECT FOR UPDATE to retrieve and lock the row.

· On line 8, we create a BFILE object. Note how we use DIR1 in uppercase—this is key, as we will see in a moment. This is because we are passing to BFILENAME() the name of an object, not the object itself. Therefore, we must ensure the name matches the case Oracle has stored for this object.

· On line 9, we open the LOB. This will allow us to read it.

· On lines 11 and 12, we load the entire contents of the operating system file /tmp/test.txt into the LOB locator we just inserted. We use DBMS_LOB.GETLENGTH() to tell the LOADFROMFILE() routine how many bytes of the BFILE to load (all of them).

· Lastly, on line 14, we close the BFILE we opened, and the CLOB is loaded.

If we had attempted to use dir1 instead of DIR1 in the preceding example, we would have encountered the following error:

EODA@ORA12CR1> declare
...
6 returning theclob into l_clob;
7
8 l_bfile := bfilename( 'dir1', 'test.txt' );
9 dbms_lob.fileopen( l_bfile );
...
15 end;
16 /
declare
*
ERROR at line 1:
ORA-22285: non-existent directory or file for FILEOPEN operation
ORA-06512: at "SYS.DBMS_LOB", line 523
ORA-06512: at line 9

This is because the directory dir1 does not exist—DIR1 does. If you prefer to use directory names in mixed case, you should use quoted identifiers when creating them as we did for dir2. This will allow you to write code as shown here:

EODA@ORA12CR1> declare
2 l_clob clob;
3 l_bfile bfile;
4 begin
5 insert into demo values ( 1, empty_clob() )
6 returning theclob into l_clob;
7
8 l_bfile := bfilename( 'dir2', 'test.txt' );
9 dbms_lob.fileopen( l_bfile );
10
11 dbms_lob.loadfromfile( l_clob, l_bfile,
12 dbms_lob.getlength( l_bfile ) );
13
14 dbms_lob.fileclose( l_bfile );
15 end;
16 /
PL/SQL procedure successfully completed.

I mentioned earlier in this section that LoadCLOBFromFile allows us to tell the database that the file it is about to load is in a character set different from the one the database is using, and that it should perform the required character set conversion. If you run the prior examples and get this for your output:

DBMS_LOB.GETLENGTH(THECLOB) THECLOB
--------------------------- ------------------------------
6 ??????

then you most likely are running into a character set mismatch between your database and the encoding used for the file. This next example uses LoadCLOBFromFile to account for a file encoded in WE8ISO8859P1:

EODA@ORA12CR1> declare
2 l_clob clob;
3 l_bfile bfile;
4 dest_offset integer := 1;
5 src_offset integer := 1;
6 src_csid number := NLS_CHARSET_ID('WE8ISO8859P1');
7 lang_context integer := dbms_lob.default_lang_ctx;
8 warning integer;
9 begin
10 insert into demo values ( 1, empty_clob() )
11 returning theclob into l_clob;
12 l_bfile := bfilename( 'dir2', 'test.txt' );
13 dbms_lob.fileopen( l_bfile );
14 dbms_lob.loadclobfromfile( l_clob, l_bfile, dbms_lob.getlength( l_bfile ), dest_offset, src_offset,
src_csid, lang_context, warning );
15 dbms_lob.fileclose( l_bfile );
16 end;
17 /
PL/SQL procedure successfully completed.

Now selecting from the table we see this in the output:

EODA@ORA12CR1> select dbms_lob.getlength(theClob), theClob from demo;

DBMS_LOB.GETLENGTH(THECLOB) THECLOB
--------------------------- ------------------------------
13 Hello World!

There are methods other than the load from file routines by which you can populate a LOB using PL/SQL. Using DBMS_LOB and its supplied routines is by far the easiest if you are going to load the entire file. If you need to process the contents of the file while loading it, you may also use DBMS_LOB.READ on the BFILE to read the data. The use of UTL_RAW.CAST_TO_VARCHAR2 is handy here if the data you are reading is text, not RAW. You may then use DBMS_LOB.WRITE or WRITEAPPEND to place the data into a CLOB or BLOB.
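As a minimal sketch of that read-and-process approach (it assumes the DEMO table and DIR1 directory from earlier, and that the file is in the database character set):

declare
    l_clob   clob;
    l_bfile  bfile := bfilename( 'DIR1', 'test.txt' );
    l_raw    raw(32767);
    l_amt    binary_integer;
    l_offset integer := 1;
    l_text   varchar2(32767);
begin
    insert into demo values ( 2, empty_clob() )
    returning theclob into l_clob;

    dbms_lob.fileopen( l_bfile );
    loop
        l_amt := 32767;
        begin
            -- read the next chunk of the BFILE; raises NO_DATA_FOUND past end of file
            dbms_lob.read( l_bfile, l_amt, l_offset, l_raw );
        exception
            when no_data_found then exit;
        end;
        -- convert RAW to text (assumes the file is in the database character set)
        l_text := utl_raw.cast_to_varchar2( l_raw );
        -- this is where you would inspect or transform l_text before writing it
        dbms_lob.writeappend( l_clob, length( l_text ), l_text );
        l_offset := l_offset + l_amt;
    end loop;
    dbms_lob.fileclose( l_bfile );
end;
/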

Loading LOB Data via SQLLDR

We will now investigate how to load data into a LOB via SQLLDR. There is more than one method for doing this, but we will investigate the two most common:

· When the data is inline with the rest of the data.

· When the data is stored out of line, and the input data contains a file name to be loaded with the row. These are also known as secondary data files (SDFs) in SQLLDR terminology.

We will start with data that is inline.

Loading LOB Data That Is Inline

These LOBs will typically have newlines and other special characters embedded in them. Therefore, you will almost always use one of the four methods detailed in the “How Do I Load Data with Embedded Newlines?” section to load this data. Let’s begin by modifying the DEPT table to have a CLOB instead of a big VARCHAR2 field for the COMMENTS column:

EODA@ORA12CR1> truncate table dept;
Table truncated.

EODA@ORA12CR1> alter table dept drop column comments;
Table altered.

EODA@ORA12CR1> alter table dept add comments clob;
Table altered.

For example, say we have a data file (demo.dat) that has the following contents:

10, Sales,Virginia,This is the Sales
Office in Virginia|
20,Accounting,Virginia,This is the Accounting
Office in Virginia|
30,Consulting,Virginia,This is the Consulting
Office in Virginia|
40,Finance,Virginia,"This is the Finance
Office in Virginia, it has embedded commas and is
much longer than the other comments field. If you
feel the need to add double quoted text in here like
this: ""You will need to double up those quotes!"" to
preserve them in the string. This field keeps going for up to
1000000 bytes (because of the control file definition I used)
or until we hit the magic end of record marker,
the | followed by an end of line - it is right here ->"|

Each record ends with a pipe symbol (|), followed by the end-of-line marker. The text for department 40 is much longer than the rest, with many newlines, embedded quotes, and commas. Given this data file, we can create a control file such as this:

LOAD DATA
INFILE demo.dat "str X'7C0A'"
INTO TABLE DEPT
REPLACE
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(DEPTNO,
DNAME "upper(:dname)",
LOC "upper(:loc)",
COMMENTS char(1000000)
)

Note This example is from UNIX/Linux, where the end-of-line marker is 1 byte, hence the STR setting in the preceding control file. On Windows, it would have to be '7C0D0A'.

To load the data file, we specify CHAR(1000000) on column COMMENTS since SQLLDR defaults to CHAR(255) for any input field as discussed previously. The CHAR(1000000) will allow SQLLDR to handle up to 1,000,000 bytes of input text. You must set this to a value that is larger than any expected chunk of text in the input file. Reviewing the loaded data, we see the following:

EODA@ORA12CR1> select comments from dept;

COMMENTS
-------------------------------------------------------------------------------
This is the Consulting
Office in Virginia

This is the Finance
Office in Virginia, it has embedded commas and is
much longer than the other comments field. If you
feel the need to add double quoted text in here like
this: "You will need to double up those quotes!" to
preserve them in the string. This field keeps going for up to
1000000 bytes or until we hit the magic end of record marker,
the | followed by an end of line - it is right here ->

This is the Sales
Office in Virginia

This is the Accounting
Office in Virginia

The one thing to observe here is that the doubled-up quotes are no longer doubled up. SQLLDR removed the extra quotes we placed there.

Loading LOB Data That Is Out of Line

A common scenario is to have a data file that contains the names of files to load into the LOBs, instead of having the LOB data mixed in with the structured data. This offers a greater degree of flexibility, as the data file given to SQLLDR does not have to use one of the four methods to get around having embedded newlines in the input data, as would frequently happen with large amounts of text or binary data. SQLLDR calls this type of additional data file a LOBFILE.

SQLLDR can also support the loading of a structured data file that points to another, single data file. We can tell SQLLDR how to parse LOB data from this other file, so that each row in the structured data gets loaded with a piece of it. I find this mode to be of limited use (in fact, I’ve never found a use for it to date), and I will not discuss it here. SQLLDR refers to these externally referenced files as complex secondary data files.

LOBFILES are relatively simple data files aimed at facilitating LOB loading. The attribute that distinguishes LOBFILEs from the main data files is that in LOBFILEs, there is no concept of a record, hence newlines never get in the way. In LOBFILEs, the data is in any of the following formats:

· Fixed-length fields (e.g., load bytes 100 through 1000 from the LOBFILE)

· Delimited fields (terminated by something or enclosed by something)

· Length/value pairs, a variable-length field

The most common of these types is the delimited fields—ones that are terminated by an end-of-file (EOF), in fact. Typically, you have a directory full of files you would like to load into LOB columns, and each file in its entirety will go into a BLOB. The LOBFILE statement with TERMINATED BY EOF is what you will use.

So, let’s say we have a directory full of files we would like to load into the database. We would like to load the OWNER of the file, the TIME_STAMP of the file, the NAME of the file, and the file itself. The table we would load into would be created as follows:

EODA@ORA12CR1> create table lob_demo
2 ( owner varchar2(255),
3 time_stamp date,
4 filename varchar2(255),
5 data blob
6 )
7 /
Table created.

Using a simple ls -l on UNIX/Linux (or dir /q /n on Windows) and capturing that output, we can generate our input file and load it using a control file such as this on UNIX/Linux:

LOAD DATA
INFILE *
REPLACE
INTO TABLE LOB_DEMO
( owner position(14:19),
time_stamp position(31:42) date "Mon DD HH24:MI",
filename position(44:100),
data LOBFILE(filename) TERMINATED BY EOF
)
BEGINDATA
-rwxr-xr-x 1 oracle dba 14889 Jul 22 22:01 demo1.log_xt
-rwxr-xr-x 1 oracle dba 123 Jul 22 20:07 demo2.ctl
-rwxr-xr-x 1 oracle dba 712 Jul 23 12:11 demo.bad
-rwxr-xr-x 1 oracle dba 8136 Mar 9 12:36 demo.control_files
-rwxr-xr-x 1 oracle dba 825 Jul 23 12:26 demo.ctl
-rwxr-xr-x 1 oracle dba 1681 Jul 23 12:26 demo.log
-rw-r----- 1 oracle dba 118 Jul 23 12:52 dl.sql
-rwxr-xr-x 1 oracle dba 127 Jul 23 12:05 lob_demo.sql
-rwxr-xr-x 1 oracle dba 171 Mar 10 13:53 p.bsh
-rwxr-xr-x 1 oracle dba 327 Mar 10 11:10 prime.bsh
-rwxr-xr-x 1 oracle dba 24 Mar 6 12:09 run_df.sh

Now, if we inspect the contents of the LOB_DEMO table after running SQLLDR, we will discover the following:

EODA@ORA12CR1> select owner, time_stamp, filename, dbms_lob.getlength(data)
2 from lob_demo
3 /

OWNER TIME_STAM FILENAME DBMS_LOB.GETLENGTH(DATA)
---------- --------- -------------------- ------------------------
oracle 22-JUL-14 demo1.log_xt 14889
oracle 22-JUL-14 demo2.ctl 123
oracle 23-JUL-14 demo.bad 712
oracle 09-MAR-14 demo.control_files 8136
oracle 23-JUL-14 demo.ctl 825
oracle 23-JUL-14 demo.log 0
oracle 23-JUL-14 dl.sql 118
oracle 23-JUL-14 lob_demo.sql 127
oracle 10-MAR-14 p.bsh 171
oracle 10-MAR-14 prime.bsh 327
oracle 06-MAR-14 run_df.sh 24

11 rows selected.

Image Note You might ask, “Why is the size of demo.log apparently 0?” While SQLLDR was running, it had the demo.log file open for writing, which truncated the file to zero length. So at the moment the demo.log file itself was loaded, it was empty.

This works with CLOBs as well as BLOBs. Loading a directory of text files using SQLLDR in this fashion is easy.

Loading LOB Data into Object Columns

Now that we know how to load into a simple table we have created ourselves, we might also find the need to load into a table that has a complex object type with a LOB in it. This happens most frequently when using the image capabilities, which are implemented using a complex object type, ORDSYS.ORDIMAGE. We need to be able to tell SQLLDR how to load into this.

To load a LOB into an ORDIMAGE type column, we must understand a little more about the structure of the ORDIMAGE type. Using a table we want to load into, and a DESCRIBE on that table in SQL*Plus, we discover that we have a column called IMAGE of type ORDSYS.ORDIMAGE, which we want to ultimately load into IMAGE.SOURCE.LOCALDATA. The following examples will work only if you have Oracle Multimedia installed and configured; otherwise, the datatype ORDSYS.ORDIMAGE will be an unknown type:

EODA@ORA12CR1> create table image_load(
2 id number,
3 name varchar2(255),
4 image ordsys.ordimage
5 )
6 /
Table created.

EODA@ORA12CR1> desc image_load
Name Null? Type
---------------------------------------- -------- ----------------------------
ID NUMBER
NAME VARCHAR2(255)
IMAGE ORDSYS.ORDIMAGE

EODA@ORA12CR1> desc ordsys.ordimage
Name Null? Type
----------------------------------------------------- -------- -------------------------------
SOURCE ORDSYS.ORDSOURCE
HEIGHT NUMBER(38)
WIDTH NUMBER(38)
CONTENTLENGTH NUMBER(38)
FILEFORMAT VARCHAR2(4000)
...

EODA@ORA12CR1> desc ordsys.ordsource
Name Null? Type
----------------------------------------------------- -------- -------------------------------
LOCALDATA BLOB
SRCTYPE VARCHAR2(4000)
SRCLOCATION VARCHAR2(4000)
SRCNAME VARCHAR2(4000)
UPDATETIME DATE
...

Image Note You could issue SET DESC DEPTH ALL or SET DESC DEPTH <n> in SQL*Plus to have the entire hierarchy displayed at once. Given that the output from describing the ORDSYS.ORDIMAGE type would have been many pages long, I chose to do it piece by piece.

So a control file to load this might look like this:

LOAD DATA
INFILE *
INTO TABLE image_load
REPLACE
FIELDS TERMINATED BY ','
( ID,
NAME,
file_name FILLER,
IMAGE column object
(
SOURCE column object
(
LOCALDATA LOBFILE (file_name) TERMINATED BY EOF
NULLIF file_name = 'NONE'
)
)
)
BEGINDATA
1,icons,icons.gif

I have introduced two new constructs here:

· COLUMN OBJECT: This tells SQLLDR that this is not a column name; rather, it is part of a column name. It is not mapped to a field in the input file, but is used to build the correct object column reference to be used during the load. In the preceding file, we have two column object tags, one nested inside the other. Therefore, the column name that will be used is IMAGE.SOURCE.LOCALDATA, as we need it to be. Note that we are not loading any of the other attributes of these two object types (e.g., IMAGE.HEIGHT, IMAGE.CONTENTLENGTH, and IMAGE.SOURCE.SRCTYPE). We’ll shortly see how to get those populated.

· NULLIF FILE_NAME = 'NONE': This tells SQLLDR to load a NULL into the object column in the event that the field FILE_NAME contains the word NONE in it.
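For example, if the input contained an additional hypothetical record such as the following, the row would still be inserted, but the LOB would be set to NULL, since the FILE_NAME filler field contains NONE:

2,no_image,NONE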

Once you have loaded an Oracle Multimedia type, you will typically need to postprocess the loaded data using PL/SQL to have Oracle Multimedia operate on it. For example, with the preceding data, you would probably want to run the following to have the properties for the image set up correctly:

begin
    for c in ( select rowid rid, x.* from image_load x ) loop
        c.image.setproperties;
        -- persist the modified object; without this UPDATE, only the
        -- local copy fetched by the cursor would have been changed
        update image_load set image = c.image where rowid = c.rid;
    end loop;
end;
/

SETPROPERTIES is an object method provided by the ORDSYS.ORDIMAGE type, which processes the image itself and updates the remaining attributes of the object with appropriate values.

How Do I Call SQLLDR from a Stored Procedure?

The short answer is that you cannot do this. SQLLDR is not an API; it is not something that is callable. SQLLDR is a command-line program. You can definitely write an external procedure in Java or C that runs SQLLDR, but that won’t be the same as “calling” SQLLDR. The load will happen in another session, and it will not be subject to your transaction control. Additionally, you will have to parse the resulting log file to determine if the load was successful or not, and how successful (i.e., how many rows got loaded before an error terminated the load) it may have been. Invoking SQLLDR from a stored procedure is not something I recommend doing.

In the past, before Oracle9i, you might have implemented your own SQLLDR-like process. The options would have been as follows:

· Write a mini-SQLLDR in PL/SQL. It can use either BFILEs to read binary data or UTL_FILE to read text data to parse and load (a minimal sketch follows this list).

· Write a mini-SQLLDR in Java. This can be a little more sophisticated than a PL/SQL-based loader and can make use of the many available Java routines.

· Write a SQLLDR in C, and call it as an external procedure.
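As an illustration of the first option, here is a minimal sketch of such a PL/SQL loader. It assumes a hypothetical comma-delimited file data.txt in the MY_DIR directory object and a hypothetical table T with two VARCHAR2 columns, A and B; a real loader would also have to deal with enclosures, date conversion, bad records, and so on:

declare
    l_input  utl_file.file_type;
    l_line   varchar2(4000);
begin
    l_input := utl_file.fopen( 'MY_DIR', 'data.txt', 'r', 4000 );
    loop
        begin
            utl_file.get_line( l_input, l_line );
        exception
            when no_data_found then exit;   -- end of file reached
        end;
        -- naive comma split; does not handle enclosed fields,
        -- empty fields, or embedded commas
        insert into t ( a, b )
        values ( regexp_substr( l_line, '[^,]+', 1, 1 ),
                 regexp_substr( l_line, '[^,]+', 1, 2 ) );
    end loop;
    utl_file.fclose( l_input );
    commit;
end;
/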

I’d like to finish up the topic of SQLLDR by discussing a few topics that are not immediately intuitive.

SQLLDR Caveats

In this section, we will discuss some things you have to watch out for when using SQLLDR.

TRUNCATE Appears to Work Differently

The TRUNCATE option of SQLLDR might appear to work differently than TRUNCATE does in SQL*Plus, or any other tool. SQLLDR, working on the assumption you will be reloading the table with a similar amount of data, uses the extended form of TRUNCATE. Specifically, it issues the following:

truncate table t reuse storage

The REUSE STORAGE option does not release allocated extents—it just marks them as free space. If that is not the desired outcome, truncate the table yourself, without REUSE STORAGE, prior to executing SQLLDR.
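For comparison, the two forms of the statement are:

truncate table t;                  -- the default DROP STORAGE: deallocates the extents
truncate table t reuse storage;    -- what SQLLDR issues: the extents are kept for reuse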

SQLLDR Defaults to CHAR(255)

This issue comes up so often, I’ve decided to talk about it twice in this chapter. The default length of input fields is 255 characters. If your field is longer than this, you will receive an error message:

Record N: Rejected - Error on table T, column C.
Field in data file exceeds maximum length

This does not mean the data will not fit into the database column; rather, it indicates that SQLLDR was expecting at most 255 bytes of input data and it received more than that. The solution is to simply use CHAR(N) in the control file, where N is big enough to accommodate the largest field length in the input file. Refer to the very first item in the earlier section “Loading Data with SQLLDR FAQs” for an example.
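As a quick illustration, a control file for a hypothetical table T with a single large text column TXT would override the default like this:

LOAD DATA
INFILE 'big.dat'
INTO TABLE t
REPLACE
FIELDS TERMINATED BY ','
( txt char(4000) )

Here, char(4000) simply raises the input buffer for that field from the 255-byte default; it says nothing about the datatype of the database column.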

Command Line Overrides Control File

Many of the SQLLDR options may be either placed in the control file or used on the command line. For example, I can use INFILE FILENAME as well as SQLLDR ... DATA=FILENAME. The command line overrides any options in the control file. You cannot count on the options in a control file actually being used, as the person executing SQLLDR can override them.
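For example, given a control file demo.ctl that names its own INFILE, the following invocation (with hypothetical file names) loads override.dat instead of the file named in the control file:

$ sqlldr eoda control=demo.ctl data=override.dat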

SQLLDR Summary

In this section, we explored many areas of loading data. We covered the typical, everyday issues we will encounter: loading delimited files, loading fixed-length files, loading a directory full of image files, using functions on input data to transform the input, and so on. We did not cover massive data loads using the direct path loader in any detail; rather, we touched lightly on that subject. Our goal was to answer the questions that arise frequently with the use of SQLLDR and that affect the broadest audience.

Flat File Unload

One thing SQLLDR does not do, and that Oracle supplies no command-line tools for, is unloading data in a format understandable by SQLLDR or other programs. This would be useful for moving data from system to system without using Data Pump EXPDP/IMPDP. Using EXPDP/IMPDP to move data from system to system works fine for moderate amounts of data—as long as both systems are Oracle.

Image Note APEX provides a data export feature as part of its SQL Workshop, as does SQL Developer. You may easily export the information in a CSV format. This works well for a few megabytes of information, but it is not appropriate for many tens of megabytes or more.

We will develop a small PL/SQL utility that may be used to unload data on a server in a SQLLDR-friendly format. Also, equivalent tools for doing so in Pro*C and SQL*Plus are provided on the Ask Tom web site at http://tkyte.blogspot.com/2009/10/httpasktomoraclecomtkyteflat.html. The PL/SQL utility will work fine in most small cases, but better performance will be had using Pro*C. Note that Pro*C and SQL*Plus are also useful if you need the files to be generated on the client and not on the server, which is where PL/SQL will create them.

The specification of the package we will create is as follows:

EODA@ORA12CR1> create or replace package unloader
2 AUTHID CURRENT_USER
3 as
4 /* Function run -- unloads data from any query into a file
5 and creates a control file to reload that
6 data into another table
7
8 p_query = SQL query to "unload". May be virtually any query.
9 p_tname = Table to load into. Will be put into control file.
10 p_mode = REPLACE|APPEND|TRUNCATE -- how to reload the data
11 p_dir = directory we will write the ctl and dat file to.
12 p_filename = name of file to write to. I will add .ctl and .dat
13 to this name
14 p_separator = field delimiter. I default this to a comma.
15 p_enclosure = what each field will be wrapped in
16 p_terminator = end of line character. We use this so we can unload
17 and reload data with newlines in it. I default to
18 "|\n" (a pipe and a newline together) and "|\r\n" on NT.
19 You need only to override this if you believe your
20 data will have that default sequence of characters in it.
21 I ALWAYS add the OS "end of line" marker to this sequence, you should not
22 */
23 function run( p_query in varchar2,
24 p_tname in varchar2,
25 p_mode in varchar2 default 'REPLACE',
26 p_dir in varchar2,
27 p_filename in varchar2,
28 p_separator in varchar2 default ',',
29 p_enclosure in varchar2 default '"',
30 p_terminator in varchar2 default '|' )
31 return number;
32 end;
33 /
Package created.

Note the use of AUTHID CURRENT_USER. This permits this package to be installed once in a database and used by anyone to unload data. All a person needs is SELECT privileges on the table(s) he wants to unload and EXECUTE privileges on this package. If we did not use AUTHID CURRENT_USER in this case, the owner of this package would need direct SELECT privileges on all tables to be unloaded. Additionally, since this routine would be completely subject to “SQL Injection” attacks (it takes an arbitrary SQL statement as input), the CREATE statement must specify AUTHID CURRENT_USER. If it were a default definer’s rights routine, anyone with EXECUTE on it would be able to execute any SQL statement using the owner’s permissions. If you know a routine is SQL injectable, it had better be an invoker’s rights routine!

Image Note The SQL will execute with the privileges of the invoker of this routine. However, all PL/SQL calls will run with the privileges of the definer of the called routine; therefore, the ability to use UTL_FILE to write to a directory is implicitly given to anyone with execute permission on this package.
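A minimal sketch of the grants involved, assuming a hypothetical user ALICE:

-- as the package owner, done once:
grant execute on unloader to alice;

ALICE can now unload any table she herself can SELECT from; the package owner needs no privileges on her tables, and she needs nothing beyond EXECUTE on the package (plus, implicitly, the UTL_FILE write access described in the note).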

The package body follows. We use UTL_FILE to write a control file and a data file. DBMS_SQL is used to dynamically process any query. We use one datatype in our queries: a VARCHAR2(4000). This implies we cannot use this method to unload LOBs, and that is true if the LOB is greater than 4,000 bytes. We can, however, use this to unload up to 4,000 bytes of any LOB using SUBSTR. Additionally, since we are using a VARCHAR2 as the only output data type, we can handle RAWs up to 2,000 bytes in length (4,000 hexadecimal characters), which is sufficient for everything except LONG RAWs and LOBs. Additionally, any query that references a nonscalar attribute (a complex object type, nested table, and so on) will not work with this simple implementation. The following is a 90 percent solution, meaning it solves the problem 90 percent of the time:

EODA@ORA12CR1> create or replace package body unloader
2 as
3
4
5 g_theCursor integer default dbms_sql.open_cursor;
6 g_descTbl dbms_sql.desc_tab;
7 g_nl varchar2(2) default chr(10);
8

These are some global variables used in this package body. The global cursor is opened once, the first time we reference this package, and it will stay open until we log out. This avoids the overhead of getting a new cursor every time we call this package. The G_DESCTBL is a PL/SQL table that will hold the output of a DBMS_SQL.DESCRIBE call. G_NL is a newline character. We use this in strings that need to have newlines embedded in them. We do not need to adjust this for Windows—UTL_FILE will see the CHR(10) in the string of characters and automatically turn that into a carriage return/linefeed for us.

Next, we have a small convenience function used to convert a character to hexadecimal. It uses the built-in functions to do this:

9
10 function to_hex( p_str in varchar2 ) return varchar2
11 is
12 begin
13 return to_char( ascii(p_str), 'fm0x' );
14 end;
15
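A quick sanity check of the format mask (this query is not part of the package) shows where values such as the X'2c' in the generated control file will come from:

select to_char( ascii(','), 'fm0x' ) from dual;

This returns 2c, the hexadecimal code for a comma.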

Finally, we create one more convenience function, IS_WINDOWS, that returns TRUE or FALSE depending on whether we are on the Windows platform, where the end of line is a two-character string instead of the single character it is on most other platforms. We are using the built-in DBMS_UTILITY function GET_PARAMETER_VALUE, which can be used to read most any parameter. We retrieve the CONTROL_FILES parameter and look for the existence of a \ in it—if we find one, we are on Windows:

16 function is_windows return boolean
17 is
18 l_cfiles varchar2(4000);
19 l_dummy number;
20 begin
21 if (dbms_utility.get_parameter_value( 'control_files', l_dummy, l_cfiles )>0)
22 then
23 return instr( l_cfiles, '\' ) > 0;
24 else
25 return FALSE;
26 end if;
27 end;

Image Note The IS_WINDOWS function does rely on you using the \ in your CONTROL_FILES parameter. Be aware that you may use /, but it would be highly unusual.

The following is a procedure to create a control file to reload the unloaded data, using the DESCRIBE table generated by DBMS_SQL.DESCRIBE_COLUMNS. It takes care of the operating system specifics for us, such as whether the operating system uses a carriage return/linefeed (this is used for the STR attribute):

28
29 procedure dump_ctl( p_dir in varchar2,
30 p_filename in varchar2,
31 p_tname in varchar2,
32 p_mode in varchar2,
33 p_separator in varchar2,
34 p_enclosure in varchar2,
35 p_terminator in varchar2 )
36 is
37 l_output utl_file.file_type;
38 l_sep varchar2(5);
39 l_str varchar2(5) := chr(10);
40
41 begin
42 if ( is_windows )
43 then
44 l_str := chr(13) || chr(10);
45 end if;
46
47 l_output := utl_file.fopen( p_dir, p_filename || '.ctl', 'w' );
48
49 utl_file.put_line( l_output, 'load data' );
50 utl_file.put_line( l_output, 'infile ''' ||
51 p_filename || '.dat'' "str x''' ||
52 utl_raw.cast_to_raw( p_terminator ||
53 l_str ) || '''"' );
54 utl_file.put_line( l_output, 'into table ' || p_tname );
55 utl_file.put_line( l_output, p_mode );
56 utl_file.put_line( l_output, 'fields terminated by X''' ||
57 to_hex(p_separator) ||
58 ''' enclosed by X''' ||
59 to_hex(p_enclosure) || ''' ' );
60 utl_file.put_line( l_output, '(' );
61
62 for i in 1 .. g_descTbl.count
63 loop
64 if ( g_descTbl(i).col_type = 12 )
65 then
66 utl_file.put( l_output, l_sep || g_descTbl(i).col_name ||
67 ' date ''ddmmyyyyhh24miss'' ');
68 else
69 utl_file.put( l_output, l_sep || g_descTbl(i).col_name ||
70 ' char(' ||
71 to_char(g_descTbl(i).col_max_len*2) ||' )' );
72 end if;
73 l_sep := ','||g_nl ;
74 end loop;
75 utl_file.put_line( l_output, g_nl || ')' );
76 utl_file.fclose( l_output );
77 end;
78

Here is a simple function to return a quoted string using the chosen enclosure character. Notice how it not only wraps the string in the enclosure character, but also doubles up any occurrence of the enclosure character within the string, so those characters are preserved:

79 function quote(p_str in varchar2, p_enclosure in varchar2)
80 return varchar2
81 is
82 begin
83 return p_enclosure ||
84 replace( p_str, p_enclosure, p_enclosure||p_enclosure ) ||
85 p_enclosure;
86 end;
87
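Its effect is easy to see with a hypothetical value; the following evaluates the same expression the function does:

select '"' || replace( 'say "hi"', '"', '""' ) || '"' from dual;

The result is "say ""hi""", which SQLLDR will turn back into say "hi" upon reload.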

Next we have the main function, RUN. As it is fairly large, I’ll comment on it as we go along:

88 function run( p_query in varchar2,
89 p_tname in varchar2,
90 p_mode in varchar2 default 'REPLACE',
91 p_dir in varchar2,
92 p_filename in varchar2,
93 p_separator in varchar2 default ',',
94 p_enclosure in varchar2 default '"',
95 p_terminator in varchar2 default '|' ) return number
96 is
97 l_output utl_file.file_type;
98 l_columnValue varchar2(4000);
99 l_colCnt number default 0;
100 l_separator varchar2(10) default '';
101 l_cnt number default 0;
102 l_line long;
103 l_datefmt varchar2(255);
104 l_descTbl dbms_sql.desc_tab;
105 begin

We will save the NLS_DATE_FORMAT into a variable so we can change it to a format that preserves the date and time when dumping the data to disk. In this fashion, we will preserve the time component of a date. We then set up an exception block so that we can reset the NLS_DATE_FORMAT upon any error:

106 select value
107 into l_datefmt
108 from nls_session_parameters
109 where parameter = 'NLS_DATE_FORMAT';
110
111 /*
112 Set the date format to a big numeric string. Avoids
113 all NLS issues and saves both the time and date.
114 */
115 execute immediate
116 'alter session set nls_date_format=''ddmmyyyyhh24miss'' ';
117
118 /*
119 Set up an exception block so that in the event of any
120 error, we can at least reset the date format.
121 */
122 begin

Next we will parse and describe the query. The setting of G_DESCTBL to L_DESCTBL is done to reset the global table; otherwise, it might contain data from a previous DESCRIBE in addition to data for the current query. Once we have done that, we call DUMP_CTL to actually create the control file:

123 /*
124 Parse and describe the query. We reset the
125 descTbl to an empty table so .count on it
126 will be reliable.
127 */
128 dbms_sql.parse( g_theCursor, p_query, dbms_sql.native );
129 g_descTbl := l_descTbl;
130 dbms_sql.describe_columns( g_theCursor, l_colCnt, g_descTbl );
131
132 /*
133 Create a control file to reload this data
134 into the desired table.
135 */
136 dump_ctl( p_dir, p_filename, p_tname, p_mode, p_separator,
137 p_enclosure, p_terminator );
138
139 /*
140 Bind every single column to a varchar2(4000). We don't care
141 if we are fetching a number or a date or whatever.
142 Everything can be a string.
143 */

We are ready to dump the actual data out to disk. We begin by defining every column to be a VARCHAR2(4000) for fetching into. All NUMBERs, DATEs, RAWs—every type will be converted into VARCHAR2. Immediately after this, we execute the query to prepare for the fetching phase:

144 for i in 1 .. l_colCnt loop
145 dbms_sql.define_column( g_theCursor, i, l_columnValue, 4000);
146 end loop;
147
148 /*
149 Run the query - ignore the output of execute. It is only
150 valid when the DML is an insert/update or delete.
151 */

Now we open the data file for writing, fetch all of the rows from the query, and print it out to the data file:

152 l_cnt := dbms_sql.execute(g_theCursor);
153
154 /*
155 Open the file to write output to and then write the
156 delimited data to it.
157 */
158 l_output := utl_file.fopen( p_dir, p_filename || '.dat', 'w',
159 32760 );
160 loop
161 exit when ( dbms_sql.fetch_rows(g_theCursor) <= 0 );
162 l_separator := '';
163 l_line := null;
164 for i in 1 .. l_colCnt loop
165 dbms_sql.column_value( g_theCursor, i,
166 l_columnValue );
167 l_line := l_line || l_separator ||
168 quote( l_columnValue, p_enclosure );
169 l_separator := p_separator;
170 end loop;
171 l_line := l_line || p_terminator;
172 utl_file.put_line( l_output, l_line );
173 l_cnt := l_cnt+1;
174 end loop;
175 utl_file.fclose( l_output );
176

Lastly, we set the date format back to what it was (the exception block will do the same if any of the preceding code fails for any reason) and return:

177 /*
178 Now reset the date format and return the number of rows
179 written to the output file.
180 */
181 execute immediate
182 'alter session set nls_date_format=''' || l_datefmt || '''';
183 return l_cnt;
184 exception
185 /*
186 In the event of ANY error, reset the date format and
187 re-raise the error.
188 */
189 when others then
190 execute immediate
191 'alter session set nls_date_format=''' || l_datefmt || '''';
192 RAISE;
193 end;
194 end run;
195
196
197 end unloader;
198 /
Package body created.

To run this, we can simply use the following (note that the following does, of course, require that you have SELECT on SCOTT.EMP granted to one of your roles or to yourself directly):

EODA@ORA12CR1> set serveroutput on

EODA@ORA12CR1> create or replace directory my_dir as '/tmp';
Directory created.

EODA@ORA12CR1> declare
2 l_rows number;
3 begin
4 l_rows := unloader.run
5 ( p_query => 'select * from scott.emp order by empno',
6 p_tname => 'emp',
7 p_mode => 'replace',
8 p_dir => 'MY_DIR',
9 p_filename => 'emp',
10 p_separator => ',',
11 p_enclosure => '"',
12 p_terminator => '~' );
13
14 dbms_output.put_line( to_char(l_rows) ||
15 ' rows extracted to ascii file' );
16 end;
17 /
14 rows extracted to ascii file
PL/SQL procedure successfully completed.

The control file that was generated by this shows the following (note that the numbers in parentheses in bold on the right are not actually in the file; they are solely for reference purposes):

load data (1)
infile 'emp.dat' "str x'7E0A'" (2)
into table emp (3)
replace (4)
fields terminated by X'2c' enclosed by X'22' (5)
( (6)
EMPNO char(44), (7)
ENAME char(20), (8)
JOB char(18), (9)
MGR char(44), (10)
HIREDATE date 'ddmmyyyyhh24miss' , (11)
SAL char(44), (12)
COMM char(44), (13)
DEPTNO char(44) (14)
) (15)

The things to note about this control file are as follows:

· Line (2): We use the STR feature of SQLLDR. We can specify what character or string is used to terminate a record. This allows us to load data with embedded newlines easily. The string x'7E0A' is simply a tilde followed by a newline (the quick check following this list shows where that value comes from).

· Line (5): We use our separator character and enclosure character. We do not use OPTIONALLY ENCLOSED BY, since we will be enclosing every single field after doubling any occurrence of the enclosure character in the raw data.

· Line (11): We use a large numeric date format. This does two things: it avoids any NLS issues with regard to the data, and it preserves the time component of the date field.
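You can verify the STR value yourself; this is exactly the expression DUMP_CTL used to build it:

select utl_raw.cast_to_raw( '~' || chr(10) ) from dual;

The RAW result displays as 7E0A: 7E is the tilde, 0A the newline.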

The raw data (.dat) file generated from the preceding code looks like this:

"7369","SMITH","CLERK","7902","17121980000000","800","","20"~
"7499","ALLEN","SALESMAN","7698","20021981000000","1600","300","30"~
"7521","WARD","SALESMAN","7698","22021981000000","1250","500","30"~
"7566","JONES","MANAGER","7839","02041981000000","2975","","20"~
"7654","MARTIN","SALESMAN","7698","28091981000000","1250","1400","30"~
"7698","BLAKE","MANAGER","7839","01051981000000","2850","","30"~
"7782","CLARK","MANAGER","7839","09061981000000","2450","","10"~
"7788","SCOTT","ANALYST","7566","19041987000000","3000","","20"~
"7839","KING","PRESIDENT","","17111981000000","5000","","10"~
"7844","TURNER","SALESMAN","7698","08091981000000","1500","0","30"~
"7876","ADAMS","CLERK","7788","23051987000000","1100","","20"~
"7900","JAMES","CLERK","7698","03121981000000","950","","30"~
"7902","FORD","ANALYST","7566","03121981000000","3000","","20"~
"7934","MILLER","CLERK","7782","23011982000000","1300","","10"~

Things to note in the .dat file are as follows:

· Each field is enclosed in our enclosure character.

· The DATES are unloaded as large numbers.

· Each line of data in this file ends with a ~ as requested.

We can now reload this data easily using SQLLDR. You may add options to the SQLLDR command line as you see fit.
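For instance, a minimal invocation (with hypothetical credentials, and emp.ctl and emp.dat in the current directory) would be:

$ sqlldr eoda control=emp.ctl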

As stated previously, the logic of the unload package may be implemented in a variety of languages and tools. On the Ask Tom web site, you will find this example implemented not only in PL/SQL as it is here but also in Pro*C and SQL*Plus scripts. Pro*C is the fastest implementation, and it always writes to the client workstation file system. PL/SQL is a good all-around implementation (no need to compile and install on client workstations), but it always writes to the server file system. SQL*Plus is a good middle ground, offering fair performance and the ability to write to the client file system.

Summary

In this chapter, we covered many of the ins and outs of data loading and unloading. First we discussed the advantages that external tables have over SQLLDR. Then we looked at easy techniques to get started with external tables. We also showed examples of using the PREPROCESSOR directive to execute OS commands prior to loading the data.

Then we looked at an Oracle 10g and above feature, the external table unload, and the ability to easily create and move extracts of data from database to database. We wrapped this up by investigating how to unload data from a table into a dump file that can be used to move data from one database to another.

We discussed that in most scenarios you should be using external tables and not SQLLDR. However, there are some situations that may require the use of SQLLDR, like loading data over the network. We then examined many of the basic techniques for loading delimited data, fixed-width data, LOBs, and the like.

Finally, we looked at the reverse process, data unloading, and how to get data out of the database in a format that other tools—such as spreadsheets or the like—may use. In the course of that discussion, we developed a PL/SQL utility to demonstrate the process—one that unloads data in a SQLLDR (or external table) friendly format, but could easily be modified to meet your needs.