Retrieval: Multiple Tables and Aggregation - Beginning Oracle SQL: For Oracle Database 12c, Third Edition (2014)

Beginning Oracle SQL: For Oracle Database 12c, Third Edition (2014)

Chapter 8. Retrieval: Multiple Tables and Aggregation

This chapter resumes the discussion of the retrieval possibilities of the SQL language. It is a logical continuation of Chapters 4 and 5.

Section 8.1 introduces the concept of row or tuple variables. We did not discuss them so far, because we haven’t needed them up to now. By the way, most SQL textbooks don’t mention tuple variables at all—at least not the way this book does. When you start specifying multiple tables in the FROM clause of your SELECT statements, it is a good idea to start using tuple variables (also referred to as table aliases in Oracle) in a consistent way.

Section 8.2 explains joins, which specify a comma-separated list of table names in the FROM clause and filter the desired row combinations with the WHERE clause. Section 8.3 shows the ANSI/ISO standard syntax to produce joins, and Section 8.4 goes into more details about outer joins.

In large information systems (containing huge amounts of detailed information), it is quite common to be interested in aggregated (or summarized or condensed) information. For example, you may want to get a course overview for a specific year, showing the number of attendees per course, with the average evaluation scores. You can formulate the underlying queries you need for such reports by using the GROUP BY clause of the SELECT command. Group functions (such as COUNT, AVG, MIN, and MAX) play an important role in such queries. If you have aggregated your data with a GROUP BY clause, you can optionally use the HAVING clause to filter query results at the group level. Topics surrounding basic aggregation are covered in Sections 8.5, 8.6, and 8.7. Section 8.8 continues the discussion of aggregation to introduce some more advanced features of the GROUP BY clause, such as CUBE and ROLLUP. Section 8.9 introduces the concept of partitioned outer joins. Section 8.10 finishes with the three set operators of the SQL language: UNION, MINUS, and INTERSECT.

8.1 Tuple Variables

Until now, we have formulated our SQL statements as follows:

select ename, init, job
from employees
where deptno = 20;

Actually, this statement is rather incomplete. In this chapter, we must be a little more precise, because the SQL commands are getting slightly more complicated. To be complete and accurate, we should have written this statement as shown in Listing 8-1.

Listing 8-1. Using Tuple Variables in a Query

select e.ename, e.init, e.job
from employees e
where e.deptno = 20;

In this example, e is a tuple variable. Tuple is just a “dignified” term for row, derived from the relational theory. In Oracle, tuple variables are referred to as table aliases (which is actually rather confusing), and the ANSI/ISO standard talks about correlation names.

Note the syntax in Listing 8-1: You “declare” the tuple variable in the FROM clause, immediately following the table name, separated by white space only.

A tuple variable always ranges over a table, or a table expression. In other words, in the example in Listing 8-1, e is a variable representing one row from the EMPLOYEES table at any time. Within the context of a specific row, you can refer to specific column (or attribute) values, as shown in the SELECT and WHERE clauses of the example in Listing 8-1. The tuple variable precedes the column name, separated by a period. Figure 8-1 shows the column reference e.JOB and its value ADMIN for employee 7900.

9781430265566_Fig08-01.jpg

Figure 8-1. The EMPLOYEES table with a tuple variable

Do you remember those old-fashioned calendars with one page per month, with a transparent strip that could move up and down to select a certain week, and a little window that could move on that strip from the left to the right to select a specific day of the month? If not, Figure 8-2shows an example of such a calendar. The transparent strip would be the tuple variable in this metaphor.

9781430265566_Fig08-02.jpg

Figure 8-2. Calendar with sliding day indicator window

Using the concept of tuple variables, we can describe the execution of the SQL command in Listing 8-1 as follows:

1. The tuple variable e ranges (row by row) over the EMPLOYEES table (the row order is irrelevant).

2. Each row e is checked against the WHERE clause, and it is passed to an intermediate result set if the WHERE clause evaluates to TRUE.

3. For each row in the intermediate result set, the expressions in the SELECT clause are evaluated to produce the final query result.

As long as you are writing simple queries (as we have done so far in this book); you don’t need to worry about tuple variables. The Oracle DBMS understands your SQL intentions anyway. However, as soon as your SQL statements become more complicated, it might be wise (or even mandatory) to start using tuple variables. Tuple variables always have at least one advantage: they enhance the readability and maintainability of your SQL code.

8.2 Joins

You can specify multiple tables in the FROM component of a query. We start this section with an intended mistake, to evoke an Oracle error message. See what happens in Listing 8-2 where our intention is to discover which employees belong to which departments.

Listing 8-2. Ambiguously Defined Columns

select deptno, location, ename, init
from employees, departments;

select deptno, location, ename, init
*
ERROR at line 1:
ORA-00918: column ambiguously defined

The message, including the asterisk (*), reveals the problem here. The Oracle DBMS cannot figure out which DEPTNO column we are referring to. Both tables mentioned in the FROM clause have a DEPTNO column, and that’s why we get an error message.

Cartesian Products

See Listing 8-3 for a second attempt to find which employees belong to which departments. Because we fixed the ambiguity issue, we get query results, but these results don’t meet our expectations. The tuple variables e and d range freely over both tables, because there is no constraining theWHERE clause; therefore, the query result we get is the Cartesian product of both tables, resulting in 56 rows. We have 14 employees and 4 departments, and 14 times 4 results in 56 possible combinations of all rows of employees and all rows of departments.

Listing 8-3. The Cartesian Product of Two Tables

select d.deptno, d.location, e.ename, e.init
from employees e, departments d;

DEPTNO LOCATION ENAME INIT
-------- -------- -------- -----
10 NEW YORK SMITH N
10 NEW YORK ALLEN JAM
10 NEW YORK WARD TF
10 NEW YORK JONES JM
10 NEW YORK MARTIN P
10 NEW YORK BLAKE R
10 NEW YORK CLARK AB
10 NEW YORK SCOTT SCJ
...
40 BOSTON ADAMS AA
40 BOSTON JONES R
40 BOSTON FORD MG
40 BOSTON MILLER TJA

Equijoins

The results in Listing 8-3 reveal the remaining problem: the query lacks a WHERE clause. In Listing 8-4, we fix the problem by adding a WHERE clause, and we also add an ORDER BY clause to get the results ordered by department, and within each department, by employee name.

Listing 8-4. Joining Two Tables

select d.deptno, d.location, e.ename, e.init
from employees e, departments d
where e.deptno = d.deptno
order by d.deptno, e.ename;

DEPTNO LOCATION ENAME INIT
-------- -------- -------- -----
10 NEW YORK CLARK AB
10 NEW YORK KING CC
10 NEW YORK MILLER TJA
20 DALLAS ADAMS AA
20 DALLAS FORD MG
20 DALLAS JONES JM
20 DALLAS SCOTT SCJ
20 DALLAS SMITH N
30 CHICAGO ALLEN JAM
30 CHICAGO BLAKE R
30 CHICAGO JONES R
30 CHICAGO MARTIN P
30 CHICAGO TURNER JJ
30 CHICAGO WARD TF

Listing 8-4 shows a join or, to be more precise, an equijoin. This is the most common join type in SQL.

SQL LAYOUT CONVENTIONS

Your SQL statements should be correct in the first place, of course. As soon as SQL statements get longer and more complicated, it becomes more and more important to adopt a certain layout style. Additional white space (spaces, tabs, and new lines) has no meaning in the SQL language, but it certainly enhances code readability and maintainability. You could have spread the query in Listing 8-4 over multiple lines, as follows:

select d.deptno
, d.location
, e.ename
, e.init
from employees e
, departments d
where e.deptno = d.deptno
order by d.deptno
, e.ename;

This SQL layout convention has proved itself to be very useful in practice. Note the placement of the commas at the beginning of the next line as opposed to the end of the current line. This makes adding and removing lines easier, resulting in fewer unintended errors. Any other standard is fine, too. This is mostly a matter of taste. Just make sure to adopt a style and use it consistently.

Non-equijoins

If you use a comparison operator other than an equal sign in the WHERE clause in a join, it is called a non-equijoin or thetajoin. For an example of a thetajoin, see Listing 8-5, which calculates the total annual salary for each employee.

Listing 8-5. Thetajoin Example

select e.ename employee
, 12*e.msal+s.bonus total_salary
from employees e
, salgrades s
where e.msal between s.lowerlimit
and s.upperlimit;

EMPLOYEE TOTAL_SALARY
-------- ------------
SMITH 9600
JONES 9600
ADAMS 13200
WARD 15050
MARTIN 15050
MILLER 15650
TURNER 18100
ALLEN 19300
CLARK 29600
BLAKE 34400
JONES 35900
SCOTT 36200
FORD 36200
KING 60500

By the way, you can choose any name you like for your tuple variables. Listing 8-5 uses e and s, but any other names would work, including longer names consisting of any combination of letters and digits. Enhanced readability is the only reason why this book uses (as much as possible) the first characters of table names as tuple variables in SQL statements.

Joins of Three or More Tables

Let’s try to enhance the query of Listing 8-5. In a third column, we also want to see the name of the department that the employee works for. Department names are stored in the DEPARTMENTS table, so we add three more lines to the query, as shown in Listing 8-6.

Listing 8-6. Joining Three Tables

select e.ename employee
, 12*e.msal+s.bonus total_salary
, d.dname department
from employees e
, salgrades s
, departments d
where e.msal between s.lowerlimit
and s.upperlimit
and e.deptno = d.deptno;

EMPLOYEE TOTAL_SALARY DEPARTMENT
-------- ------------ ----------
SMITH 9600 TRAINING
JONES 9600 SALES
ADAMS 13200 TRAINING
WARD 15050 SALES
MARTIN 15050 SALES
MILLER 15650 ACCOUNTING
TURNER 18100 SALES
ALLEN 19300 SALES
CLARK 29600 ACCOUNTING
BLAKE 34400 SALES
JONES 35900 TRAINING
SCOTT 36200 TRAINING
FORD 36200 TRAINING
KING 60500 ACCOUNTING

The main principle is simple. We now have three free tuple variables (e, s, and d) ranging over three tables. Therefore, we need (at least) two conditions in the WHERE clause to get the correct row combinations in the query result.

For the sake of completeness, you should note that the SQL language supports table names as default tuple variables, without the need to declare them explicitly in the FROM clause. Look at the following example:

select employees.ename, departments.location
from employees, departments
where employees.deptno = departments.deptno;

This SQL statement is syntactically correct. However, we will avoid using this SQL “feature” in this book. It is rather confusing to refer to a table in one place and to refer to a specific row from a table in another place with the same name, without making a clear distinction between row and table references. Moreover, the names of the tables used in this book are long enough to justify declaring explicit tuple variables in the FROM clause and using them everywhere else in the SQL statement, thus reducing the number of keystrokes.

Self-Joins

In SQL, you can also join a table to itself. Although this join type is essentially the same as a regular join, it has its own name: autojoin or self-join. In other words, autojoins contain tables being referenced more than once in the FROM clause. This provides another good reason why you should use explicit tuple variables (as opposed to relying on table names as implicit tuple variables) in your SQL statements. In autojoins, the table names result in ambiguity issues. So why not use tuple variables consistently in all your SQL statements?

Listing 8-7 shows an example of an autojoin. The query produces an overview of all employees born after January 1, 1965, with a second column showing the name of their managers. (You may want to refer to Figure C-3 in Appendix C, which shows a diagram of the hierarchy of theEMPLOYEES table.)

Listing 8-7. Autojoin (Self-Join) Example

select e.ename as employee
, m.ename as manager
from employees m
, employees e
where e.mgr = m.empno
and e.bdate > date '1965-01-01';

EMPLOYEE MANAGER
-------- --------
TURNER BLAKE
JONES BLAKE
ADAMS SCOTT
JONES KING
CLARK KING
SMITH FORD

Because we have two tuple variables e and m, both ranging freely over the same table, we get 14 × 14 = 196 possible row combinations. The WHERE clause filters out the correct combinations, where row m reflects the manager of row e.

8.3 The JOIN Clause

The join examples shown in the previous section use the Cartesian product operator (the comma in the FROM clause) as a starting point, and then filter the rows using an appropriate WHERE clause. There’s absolutely nothing wrong with that approach, and the syntax is fully compliant with the ANSI/ISO SQL standard, but the ANSI/ISO SQL standard also supports alternative syntax to specify joins. This alternative join syntax is covered in this section.

First, let’s look again at the join statement in Listing 8-7. You could argue that the WHERE clause of that query contains two different condition types: line 5 contains the join condition to make sure you combine the right rows, and line 6 is a “real” (non-join) condition to filter the employees based on their birth dates.

Listing 8-8 shows an equivalent query, producing the same results, using a different syntax. Note the keywords JOIN and ON. Also note that this join syntax doesn’t use any commas in the FROM clause.

Listing 8-8. JOIN . . . ON Example

select e.ename as employee
, m.ename as manager
from employees m
JOIN
employees e
ON e.mgr = m.empno
where e.bdate > date '1965-01-01'
order by employee;

EMPLOYEE MANAGER
-------- --------
ADAMS SCOTT
CLARK KING
JONES BLAKE
JONES KING
SMITH FORD
TURNER BLAKE

The syntax of Listing 8-8 is more elegant than the syntax in Listing 8-7, because the join is fully specified in the FROM clause and the WHERE clause contains only the filtering (i.e., the non-join) condition.

Natural Joins

You can also use the NATURAL JOIN operator in the FROM clause. Listing 8-9 shows an example that joins the EMPLOYEES table with the HISTORY table.

Question: Before reading on, please try to answer, how is it possible that Listing 8-9 produces 15 rows in the result, instead of 14?

Listing 8-9. Natural Join Example

select ename, beginyear, msal, deptno
from employees
natural join
history;

ENAME BEGINYEAR MSAL DEPTNO
-------- --------- -------- --------
SMITH 2000 800 20
ALLEN 1999 1600 30
WARD 1992 1250 30
WARD 2000 1250 30
JONES 1999 2975 20
MARTIN 1999 1250 30
BLAKE 1989 2850 30
CLARK 1988 2450 10
SCOTT 2000 3000 20
KING 2000 5000 10
TURNER 2000 1500 30
ADAMS 2000 1100 20
JONES 2000 800 30
FORD 2000 3000 20
MILLER 2000 1300 10

Explanation: To understand what’s happening in Listing 8-9, you must know how the NATURAL JOIN operator is defined in the SQL language. Listing 8-9 illustrates the behavior of the NATURAL JOIN operator:

1. The NATURAL JOIN operator determines which columns the two tables (EMPLOYEES and HISTORY) have in common. In this case, these are the three columns EMPNO, MSAL, and DEPTNO.

2. It joins the two tables (using an equijoin) over all columns they have in common.

3. It suppresses the duplicate columns resulting from the join operation in the previous step. This is why you don’t get an error message about MSAL and DEPTNO in the SELECT clause being ambiguously defined.

4. Finally, the NATURAL JOIN operator evaluates the remaining query clauses. In Listing 8-9, the only remaining clause is the SELECT clause. The final result shows the desired four columns.

Apparently, every employee occurs only once in the result, except WARD. This means that this employee has been employed by the same department (30) for the same salary (1250) during two distinct periods of his career. This is a pure coincidence. If the query had returned 14 rows instead of 15, we would probably not have been triggered to investigate the query for correctness. Remember that some wrong queries may give “correct” results by accident.

This example shows that you should be very careful when using the NATURAL JOIN operator. Probably the biggest danger is that a natural join may “suddenly” start producing strange and undesirable results if you add new columns to your tables, or you rename existing columns, thus accidentally creating matching column names.

image Caution Natural joins are safe only if you practice a very strict column-naming standard in your database designs.

Equijoins on Columns with the Same Name

SQL offers an alternative way to specify equijoins, allowing you to explicitly specify the columns you want to participate in the equijoin operation. As you saw in Listing 8-8, you can use the ON clause followed by fully specified join predicates. You can also use the USING clause, specifying column names instead of full predicates. See Listing 8-10 for an example.

Listing 8-10. JOIN . . . USING Example

select e.ename, e.bdate
, h.deptno, h.msal
from employees e
join
history h
using (empno)
where e.job = 'ADMIN';

ENAME BDATE DEPTNO MSAL
-------- ----------- -------- --------
JONES 03-DEC-1969 30 800
MILLER 23-JAN-1962 10 1275
MILLER 23-JAN-1962 10 1280
MILLER 23-JAN-1962 10 1290
MILLER 23-JAN-1962 10 1300

Note that you need tuple variables again, because you join over only the EMPNO column; the columns h.DEPTNO and e.DEPTNO are now different.

Figure 8-3 shows the syntax diagram of the ANSI/ISO join syntax, including the NATURAL JOIN operator, the ON clause, and the USING clause.

9781430265566_Fig08-03.jpg

Figure 8-3. ANSI/ISO join syntax diagram

Note that you can also use a CROSS JOIN syntax. The result is identical to the effect of the comma operator in the FROM clause: the Cartesian product. The CROSS JOIN syntax prevents the use of the ON or USING clauses, but permits the use of WHERE clause predicates, which is how the Oracle SQL language syntax worked prior to the introduction of the ANSI/ISO SQL syntax. To put it another way, think of CROSS JOIN as the name for the “old-fashioned” join syntax in the context of the newer ANSI/ISO SQL join syntax.

The examples in the remainder of this book will show a mixture of “old-fashioned” joins (as introduced in Section 8.2) and the alternative ANSI/ISO SQL join syntax explained in this section.

8.4 Outer Joins

Earlier in the chapter, in Listing 8-4, we executed a regular join (an equijoin) similar to the one shown in Listing 8-11.

Listing 8-11. Regular Join

select d.deptno, d.location
, e.ename, e.init
from employees e, departments d
where e.deptno = d.deptno
order by d.deptno, e.ename;

DEPTNO LOCATION ENAME INIT
-------- -------- -------- -----
10 NEW YORK CLARK AB
10 NEW YORK KING CC
10 NEW YORK MILLER TJA
20 DALLAS ADAMS AA
20 DALLAS FORD MG
20 DALLAS JONES JM
20 DALLAS SCOTT SCJ
20 DALLAS SMITH N
30 CHICAGO ALLEN JAM
30 CHICAGO BLAKE R
30 CHICAGO JONES R
30 CHICAGO MARTIN P
30 CHICAGO TURNER JJ
30 CHICAGO WARD TF

The result in Listing 8-11 shows no rows for department 40, for an obvious reason: that department does exist in the DEPARTMENTS table, but it has no corresponding employees. In other words, if tuple variable d refers to department 40, there is not a single row e in the EMPLOYEEStable to make the WHERE clause evaluate to TRUE.

If you want the fact that department 40 exists to be reflected in your join results, you can make that happen with an outer join. For outer joins in Oracle, you can choose between two syntax options:

· The “old” outer join syntax, supported by Oracle since many releases, and implemented many years before the ANSI/ISO standard defined a more elegant outer join syntax

· The ANSI/ISO standard outer join syntax

We will discuss an example of both outer join syntax variants, based on the regular join in Listing 8-11.

Old Oracle-Specific Outer Join Syntax

First, change the fourth line of the command in Listing 8-11 and add a plus sign between parentheses, as shown in Listing 8-12.

Listing 8-12. The (+) Outer Join Syntax

select d.deptno, d.location
, e.ename, e.init
from employees e, departments d
where e.deptno(+) = d.deptno
order by d.deptno, e.ename;

DEPTNO LOCATION ENAME INIT
-------- -------- -------- -----
10 NEW YORK CLARK AB
10 NEW YORK KING CC
10 NEW YORK MILLER TJA
20 DALLAS ADAMS AA
20 DALLAS FORD MG
20 DALLAS JONES JM
20 DALLAS SCOTT SCJ
20 DALLAS SMITH N
30 CHICAGO ALLEN JAM
30 CHICAGO BLAKE R
30 CHICAGO JONES R
30 CHICAGO MARTIN P
30 CHICAGO TURNER JJ
30 CHICAGO WARD TF
40 BOSTON

As you can see, department 40 now also appears in the result. The effect of the addition (+)in the WHERE clause has combined department 40 with two null values for the employee data. The main disadvantage of this outer join syntax is that you must make sure to add the (+) operator in the right places in your SQL command, namely on the “outer” or optional side of the join condition. Failing to do so normally results in disabling the outer join effect. Another disadvantage of this outer join syntax is its lack of readability.

New Outer Join Syntax

The new ANSI/ISO outer join syntax is much more elegant and readable. Listing 8-13 shows the version to get the same results as in Listing 8-12.

Listing 8-13. ANSI/ISO Outer Join Example

select deptno, d.location
, e.ename, e.init
from employees e
right outer join
departments d
using (deptno)
order by deptno, e.ename;

DEPTNO LOCATION ENAME INIT
-------- -------- -------- -----
10 NEW YORK CLARK AB
10 NEW YORK KING CC
10 NEW YORK MILLER TJA
20 DALLAS ADAMS AA
20 DALLAS FORD MG
20 DALLAS JONES JM
20 DALLAS SCOTT SCJ
20 DALLAS SMITH N
30 CHICAGO ALLEN JAM
30 CHICAGO BLAKE R
30 CHICAGO JONES R
30 CHICAGO MARTIN P
30 CHICAGO TURNER JJ
30 CHICAGO WARD TF
40 BOSTON

In Listing 8-13 we used a RIGHT OUTER JOIN, because we suspect the presence of rows at the right-hand side (the DEPARTMENTS table) without corresponding rows at the left-hand side (the EMPLOYEES table). If you switched the two table names in the FROM clause, you would need the LEFT OUTER JOIN operator. Oracle also supports the FULL OUTER JOIN syntax, where both tables participating in the join operation handle rows without corresponding rows on the other side in a special way. Figure 8-4 shows all three outer join syntax possibilities.

The outer join operator is especially useful if you want to aggregate (summarize) data; for example, when you want to produce a course overview showing the number of attendees for each scheduled course. In such an overview, you obviously also want to see all scheduled courses for which no registrations are entered yet, so you might consider cancelling or postponing those courses. This type of query (with aggregation) is the topic of Section 8.5.

9781430265566_Fig08-04.jpg

Figure 8-4. ANSI/ISO outer join syntax diagram

Outer Joins and Performance

Although outer joins obviously imply some additional processing for the DBMS, there is no reason to avoid outer joins for performance reasons. The Oracle optimizer knows how to handle outer joins efficiently. Moreover, given a certain data model, you sometimes need outer joins. Don’t try to invent your own workarounds in such cases, and don’t believe unfounded statements like “outer joins are bad.” As always, seek ways to empirically prove or disprove such theories, using SQL traces and utilities such as SQL*Plus AUTOTRACE, rather than blindly accepting someone’s unfounded assertion.

In Section 8.9, we will revisit outer joins to discuss partitioned outer joins.

8.5 The GROUP BY Component

Until now, we have considered queries showing information about only individual rows. Each row in our query results so far had a one-to-one correspondence with some row in the database. However, in real life, you often want to produce aggregated information from a database, where the rows in the query results represent information about a set of database rows. For example, you might want to produce an overview showing the number of employees (the head count) per department. For this type of query, you need the GROUP BY clause of the SELECT command, as shown in Listing 8-14.

Listing 8-14. The GROUP BY Clause

select e.deptno as "department"
, count(e.empno) as "number of employees"
from employees e
group by e.deptno;

department number of employees
---------- -------------------
10 3
20 5
30 6

Listing 8-14 shows the COUNT function at work, to count the number of employees per department. COUNT is an example of a group function, and we’ll look at it and the other group functions in Section 8.6.

The result of this query is a table, of course—just like any result of a query. However, there is no one-to-one mapping anymore between the rows of the EMPLOYEES table and the three rows of the result. Instead, you aggregate employee data per department.

To explain how the GROUP BY operator works, and how the SQL language handles aggregation, Listing 8-15 shows an imaginary representation of an intermediate result. Listing 8-15 shows a pseudo-table, with three rows and six columns. For readability, some columns of theEMPLOYEES table are omitted. In the last column, you see the three different department numbers, and the other five columns show sets of attribute values. These sets are represented by enumerating their elements in a comma-separated list between braces. Some of these sets contain null values only, such as e.COMM for departments 10 and 20.

Listing 8-15. The Effect of GROUP BY e.DEPTNO

e.EMPNO e.JOB e.MGR e.MSAL e.COMM e.DEPTNO
======= ============ ====== ====== ====== ========
{7782 {'MANAGER' {7839 {2450 {NULL 10
,7839 ,'DIRECTOR' ,NULL ,5000 ,NULL
,7934} ,'ADMIN' } ,7782} ,1300} ,NULL}
-----------------------------------------------------------------
{7369 {'TRAINER' {7902 { 800 {NULL 20
,7566 ,'MANAGER' ,7839 ,2975 ,NULL
,7788 ,'TRAINER' ,7566 ,3000 ,NULL
,7876 ,'TRAINER' ,7788 ,1100 ,NULL
,7902} ,'TRAINER'} ,7566} ,3000} ,NULL}
-----------------------------------------------------------------
{7499 {'SALESREP' {7698 {1600 { 300 30
,7521 ,'SALESREP' ,7698 ,1250 , 500
,7654 ,'SALESREP' ,7698 ,1250 ,1400
,7698 ,'MANAGER' ,7839 ,2850 ,NULL
,7844 ,'SALESREP' ,7698 ,1500 , 0
,7900} ,'ADMIN' } ,7698} , 800} ,NULL}
-----------------------------------------------------------------

image Note The representation in Listing 8-15 is purely fictitious and only serves educational purposes. Data structures as shown in Listing 8-15 do not occur in reality.

Going back to Listing 8-14, it now becomes clear what the COUNT(e.EMPNO) function does: it returns the number of elements of each e.EMPNO set.

You could argue that (as an effect of the GROUP BY e.DEPTNO clause) the last column in Listing 8-15 (e.DEPTNO) contains “regular” values, and the other five columns become “set-valued” attributes. You can use only e.DEPTNO in the SELECT clause. If you want to see data from the other columns in your query result, you must use group functions (such as COUNT) to aggregate those sets into a single value. See the next section for a discussion of group functions.

image Note To be more precise, we should refer to multisets instead of sets in this context. Duplicate values are maintained, as you can see in Listing 8-15. We will discuss multisets in Chapter 12.

Multiple-Column Grouping

You can also group on multiple-column expressions, separated by commas. For example, the query in Listing 8-16 produces an overview of the number of registrations per course.

Listing 8-16. Grouping on Two Columns

select r.course, r.begindate
, count(r.attendee) as attendees
from registrations r
group by r.course, r.begindate;

COURSE BEGINDATE ATTENDEES
------ ----------- ---------
JAV 13-DEC-1999 5
JAV 01-FEB-2000 3
OAU 10-AUG-1999 3
OAU 27-SEP-2000 1
PLS 11-SEP-2000 3
SQL 12-APR-1999 4
SQL 04-OCT-1999 3
SQL 13-DEC-1999 2
XML 03-FEB-2000 2

This result shows one row for each different (COURSE, BEGINDATE) combination found in the REGISTRATIONS table.

image Note As you can see, the rows in Listing 8-16 are ordered on the columns of the GROUP BY clause. However, if you want a certain ordering of your query results, you should never rely on implicit DBMS behavior and always specify an ORDER BY clause.

GROUP BY and Null Values

If a column expression on which you apply the GROUP BY clause contains null values, these null values end up together in a separate group. See Listing 8-17 for an example.

Listing 8-17. GROUP BY and Null Values

select e.comm, count(e.empno)
from employees e
group by e.comm;

COMM COUNT(E.EMPNO)
-------- --------------
0 1
300 1
500 1
1400 1
10

Apparently, we have ten employees without commission.

8.6 Group Functions

In the previous section, we used the COUNT function to count the number of employees per department and the number of registrations per course. COUNT is an example of a group function. All group functions have two important properties in common:

· They can be applied only to sets of values.

· They return a single aggregated value, derived from that set of values.

That’s why group functions often occur in combination with GROUP BY (and optionally the HAVING clause, covered in Section 8.7) in SQL commands. The most important Oracle group functions are listed in Table 8-1.

Table 8-1. Common Oracle Group Functions

Function

Description

Applicable To

COUNT()

Number of values

All datatypes

SUM()

Sum of all values

Numeric data

MIN()

Minimum value

All datatypes

MAX()

Maximum value

All datatypes

AVG()

Average value

Numeric data

MEDIAN()

Median (middle value)

Numeric or date (time) data

STATS_MODE()

Modus (most frequent value)

All datatypes

STDDEV()

Standard deviation

Numeric data

VARIANCE()

Statistical variance

Numeric data

The last column in Table 8-1 shows the applicable datatypes for all group functions. The functions MIN and MAX are applicable to any datatype, including dates and alphanumeric strings. MIN and MAX need only an ordering (sorting) criterion for the set of values. Note also that you can apply the AVG function only to numbers, because the average is defined as the SUM divided by the COUNT, and the SUM function accepts only numeric data.

Let’s look at some group function examples in Listing 8-18.

Listing 8-18. Some Examples of Group Functions

select e.deptno
, count(e.job)
, sum(e.comm)
, avg(e.msal)
, median(e.msal)
from employees e
group by e.deptno;

DEPTNO COUNT(E.JOB) SUM(E.COMM) AVG(E.MSAL) MEDIAN(E.MSAL)
-------- ------------ ----------- ----------- --------------
10 3 2916.667 2450
20 5 2175 2975
30 6 2200 1541.667 1375

Group Functions and Duplicate Values

If you apply a group function to a set of column values, that set of values may contain duplicate values. By default, these duplicate values are all treated as individual values, contributing to the end result of all group functions applied to the set of values. For example, we have five employees in department 20, but we have only two different jobs in that department. Nevertheless, Listing 8-18 shows 5 as the result of COUNT(e.JOB) for department 20.

If you want SQL group functions to ignore duplicate values (except one, of course), you must specify the keyword DISTINCT immediately after the first parenthesis. Although it is syntactically correct, the addition of DISTINCT is meaningless for the MIN and MAX functions. Look atListing 8-19 for some examples.

Listing 8-19. Using the DISTINCT Option for Group Functions

select count(deptno), count(distinct deptno)
, avg(comm), avg(coalesce(comm,0))
from employees;
COUNT(DEPTNO) COUNT(DISTINCTDEPTNO) AVG(COMM) AVG(COALESCE(COMM,0))
------------- --------------------- --------- ---------------------
14 3 550 157.1429

Note that Listing 8-19 also shows that you can use group functions in the SELECT clause of a query without a GROUP BY clause. The absence of a GROUP BY clause in combination with the presence of group functions in the SELECT clause always results in a single-row result. In other words, the full table is aggregated into a single row. You can achieve precisely the same result by grouping on a constant expression. Try this yourself; for example, see what happens if you add GROUP BY 'x' to the query in Listing 8-19.

Group Functions and Null Values

The ANSI/ISO SQL standard postulates group functions to ignore null values completely. There is only one exception to this rule: the COUNT(*) function. This special case is discussed later in this section. This is a reasonable compromise. The only other consistent behavior for group functions would be to return a null value as soon as the input contains a null value. This would imply that all your SQL statements (containing group functions) should contain additional code to handle null values explicitly. So, ignoring null values completely is not a bad idea. Just make sure that you understand the consequences of this behavior. See Table 8-2 for some typical examples.

Table 8-2. Behavior of Group Functions and Null Values

Table8-2.jpg

The SUMfunction does not make any distinction between {1,2,3,NULL} and {1,2,3,0}. The MIN and AVG functions don’t make any distinction between {1,2,3,NULL} and {1,2,3,2}. The MAX function gives the same result on all three sets.

Looking back at Listing 8-19, you see an example of function nesting: the AVG function operates on the result of the COALESCE function. This is a typical method to handle null values explicitly. As you can see from Listing 8-19, the results of AVG(COMM) andAVG(COALESCE(COMM,0)) are obviously different. In this case, the Oracle DBMS replaces all null values by zeros before applying the AVG function, because the null values in the COMM column actually mean “not applicable.”

The next query, shown in Listing 8-20, tells us how many different courses are scheduled for each trainer and the total number of scheduled courses.

Listing 8-20. GROUP BY and DISTINCT

select trainer
, count(distinct course)
, count(*)
from offerings
group by trainer;

TRAINER COUNT(DISTINCTCOURSE) COUNT(*)
-------- --------------------- --------
7369 2 3
7566 2 2
7788 2 2
7876 1 1
7902 2 2
3 3

Apparently, we have three course offerings without a trainer being assigned.

Grouping the Results of a Join

The query in Listing 8-21 shows the average evaluation ratings for each trainer, over all courses delivered.

Listing 8-21. GROUP BY on a Join

select o.trainer, avg(r.evaluation)
from offerings o
join
registrations r
using (course,begindate)
group by o.trainer;

TRAINER AVG(R.EVALUATION)
-------- -----------------
7369 4
7566 4.25
7788
7876 4
7902 4

Notice the USING clause in line 5, with the COURSE and BEGINDATE columns. This USING clause with two columns is needed to get the correct join results.

The COUNT(*) Function

As mentioned earlier, group functions operate on a set of values, with one important exception. Besides column names, you can specify the asterisk (*) as an argument to the COUNT function. This widens the scope of the COUNT function from a specific column to the full row level.COUNT(*) returns the number of rows in the entire group.

image Note If you think that SELECT COUNT(1) is faster than SELECT COUNT(*), try a little experiment and prepare to be surprised—you will find out that there is no difference. Don’t trust opinions, look for ways to prove using facts. . .

Listing 8-20 already showed an example of using the COUNT(*) function, to get the total number of scheduled courses for each trainer from the OFFERINGS table. Listing 8-22 shows another example of using the COUNT(*) function, this time applied against the EMPLOYEES table.

Listing 8-22. Count Employees Per Department (First Attempt)

select e.deptno, count(*)
from employees e
group by e.deptno;

DEPTNO COUNT(*)
-------- --------
10 3
20 5
30 6

Obviously, department 40 is missing in this result. If you want to change the query into an outer join in order to show department 40 as well, you must be careful. What’s wrong with the query in Listing 8-23? Apparently, we suddenly have one employee working for department 40.

Listing 8-23. Count Employees Per Department (Second Attempt)

select deptno, count(*)
from employees e
right outer join
departments d
using (deptno)
group by deptno;

DEPTNO COUNT(*)
-------- --------
10 3
20 5
30 6
40 1

Compare the results in Listing 8-23 with the results in Listing 8-24. The only difference is the argument of the COUNT function. Listing 8-24 obviously shows the correct result, because department 40 has no employees. By counting over the primary key e.EMPNO, you are sure that all “real” employees are counted, while the null value introduced by the outer join is correctly ignored. You could have used any other NOT NULL column as well.

Listing 8-24. Count Employees Per Department (Third Attempt)

select deptno, count(e.empno)
from employees e
right outer join
departments d
using (deptno)
group by deptno;

DEPTNO COUNT(E.EMPNO)
-------- --------------
10 3
20 5
30 6
40 0

At the end of Chapter 5, you saw an example of a PL/SQL stored function to count all employees per department (Section 5.8, Listing 5-31). In that chapter, I mentioned that this counting problem is not trivial to solve in standard SQL. In Listings 8-22, 8-23, and 8-24, you see that you should indeed be careful. You need an outer join, and you should make sure to specify the correct argument for the COUNT function to get correct results.

image Caution You should be careful with the COUNT function, especially if null values might cause problems (since group functions ignore them) and you want to count row occurrences.

Valid SELECT and GROUP BY Clause Combinations

If your queries contain a GROUP BY clause, some syntax combinations are invalid and result in Oracle error messages, such as the following:

ORA-00937: not a single-group group function.

This always means that there is a mismatch between your SELECT clause and your GROUP BY clause.

To demonstrate valid versus invalid syntax, Table 8-3 shows one invalid and three valid syntax examples. Table 8-3 assumes you have a table T with four columns A, B, C, and D.

Table 8-3. Valid and Invalid GROUP BY Syntax Examples

Syntax

Valid?

select a, b, max(c) from t ... group by a

No

select a, b, max(c) from t ... group by a,b

Yes

select a, count(b), min(c) from t ... group by a

Yes

select count(c) from t ... group by a

Yes

The examples in Table 8-3 illustrate the following two general rules:

· You do not need to select the column expression you group on (see the last example).

· Any column expression that is not part of the GROUP BY clause can occur only in the SELECT clause as an argument to a group function. That’s why the first example is invalid.

By the way, all GROUP BY examples so far showed only column names, but you can also group on column expressions, such as in the example shown in Listing 8-25.

Listing 8-25. Grouping on Column Expressions

select case mod(empno,2)
when 0 then 'EVEN '
else 'ODD '
end as empno
sum(msal)
from employees
group by mod(empno,2);

EMPNO SUM(MSAL)
----- ---------
EVEN 20225
ODD 8650

This query shows the salary sums for employees with even and odd employee numbers.

8.7 The HAVING Clause

If you aggregate rows into groups with GROUP BY, you might also want to filter your query result further by allowing only certain groups into the final query result. You can achieve this with the HAVING clause. Normally, you use the HAVING clause only following a GROUP BY clause. For example, Listing 8-26 shows information about departments with more than four employees.

Listing 8-26. HAVING Clause Example

select deptno, count(empno)
from employees
group by deptno
having count(empno) >= 4;

DEPTNO COUNT(EMPNO)
-------- ------------
20 5
30 6

The Difference Between WHERE and HAVING

It is important to distinguish the WHERE clause from the HAVING clause. To illustrate this difference, Listing 8-27 shows a WHERE clause added to the previous query.

Listing 8-27. HAVING vs. WHERE

select deptno, count(empno)
from employees
where bdate > date '1960-01-01'
group by deptno
having count(empno) >= 4;

DEPTNO COUNT(EMPNO)
-------- ------------
30 5

The WHERE condition regarding the day of birth (line 3) can be checked against individual rows of the EMPLOYEES table. On the other hand, the HAVING COUNT(EMPNO) condition (line 5) makes sense only at the group level. That’s why group functions should never occur in aWHERE clause. They typically result in the following Oracle error message:

ORA-00934: group function is not allowed here.

You’ll see this error message in Listing 8-29, caused by a classic SQL mistake, as discussed shortly.

HAVING Clauses Without Group Functions

The SQL language allows you to write queries with a HAVING clause without a preceding GROUP BY clause. In that case, Oracle assumes an implicit GROUP BY on a constant expression, just as when you use group functions in the SELECT clause without specifying a GROUP BY clause; that is, the full table is treated as a single group.

On the other hand, valid HAVING clauses without group functions are very rare, and they should be rewritten. In Listing 8-28, the second query is much more efficient than the first one, because it is more efficient to filter out rows using the WHERE clause before aggregation rather than afterward using the HAVING clause. It is a general rule to filter as early as possible in a multi-step operation like an aggregated SQL query, because filtering earlier passes less data along to subsequent operations, saving both processing and memory resources.

Listing 8-28. HAVING Clause Without a Group Function

select deptno, count(*)
from employees
group by deptno
having deptno <= 20;

DEPTNO COUNT(*)
-------- --------
10 3
20 5

select deptno, count(*)
from employees
where deptno <= 20
group by deptno;

DEPTNO COUNT(*)
-------- --------
10 3
20 5

A Classic SQL Mistake

Take a look at the query in Listing 8-29. It looks very logical, doesn’t it? Who earns more than the average salary?

Listing 8-29. Error Message: Group Function Is Not Allowed Here

select empno
from employees
where msal > avg(msal);

where msal > avg(msal)
*
ERROR at line 3:
ORA-00934: group function is not allowed here

However, if you think in terms of tuple variables, the problem becomes obvious: the WHERE clause has only a single row as its context, turning the AVG function into something impossible to derive.

You can solve this problem in many ways. Listings 8-30 and 8-31 show two suggestions. In Listing 8-30, we use a sub-query to calculate average value.

Listing 8-30. One Way to Find Who Earns More Than the Average Salary

select e.empno
from employees e
where e.msal > (select avg(x.msal)
from employees x );

EMPNO
--------
7566
7698
7782
7788
7839
7902

Listing 8-31. Another Way to Find Who Earns More Than the Average Salary

select e1.empno
from employees e1
, employees e2
group by e1.empno
, e1.msal
having e1.msal > avg(e2.msal);

EMPNO
--------
7566
7698
7782
7788
7839
7902

The solution in Listing 8-31 would probably not win an SQL beauty contest, but it is certainly worth further examination. This solution is based on the Cartesian product of the EMPLOYEES table with itself. Notice that it doesn’t have a WHERE clause. Notice also that you group one1.EMPNO and e1.MSAL, which allows you to refer to this column in the HAVING clause.

Grouping on Additional Columns

You sometimes need this (apparently) superfluous grouping on additional columns. For example, suppose you want to see the employee number and the employee name, followed by the total number of course registrations. The query in Listing 8-32, which could be a first attempt to solve this problem, produces an Oracle error message.

Listing 8-32. Error Message: Not a GROUP BY Expression

select e.empno, e.ename, count(*)
from employees e
join
registrations r
on (e.empno = r.attendee)
group by e.empno;
select e.empno, e.ename, count(*)
*
ERROR at line 1:
ORA-00979: not a GROUP BY expression

The pseudo-intermediate result in Listing 8-33 explains what went wrong here and why you must also group on e.ENAME.

Listing 8-33. Pseudo-Intermediate GROUP BY Result

GROUP BY e.EMPNO GROUP BY e.EMPNO,e.ENAME

e.EMPNO e.ENAME e.INIT ... e.EMPNO e.ENAME e.INIT ...
======= ========= ====== ======= ======== ======
7369 {'SMITH'} {'N'} 7369 'SMITH' {'N'}
7499 {'ALLEN'} {'JAM'} 7499 'ALLEN' {'JAM'}
7521 {'WARD' } ... 7521 ... ...
7566 ... ...

The two results look similar; however, there is an important difference between sets consisting of a single element, such as {'SMITH'}, and a literal value, such as 'SMITH'. In mathematics, sets with a single element are commonly referred to as singleton sets, or just singletons.

Listing 8-34 shows another instructive mistake.

Listing 8-34. Error Message: Not a Single-Group Group Function

select deptno
, sum(msal)
from employees;

select deptno
*
ERROR at line 1:
ORA-00937: not a single-group group function

In the absence of a GROUP BY clause, the SUM function would return a single row, while DEPTNO would produce 14 department numbers. Two columns with different row counts cannot be presented side-by-side in a single result. After the correction in Listing 8-35, the error message disappears, and you get the desired results.

Listing 8-35. Correction of the Error Message in Listing 8-34

select deptno
, sum(msal)
from employees
group by deptno;

DEPTNO SUM(MSAL)
-------- -------------
10 8750
20 10875
30 9250

In summary, if your query contains a GROUP BY clause), the SELECT clause is allowed to contain only group expressions. A group expression is a column name that is part of the GROUP BY clause, or a group function applied to any other column expression. See also Table 8-3 at the end of Section 8.6.

8.8 Advanced GROUP BY Features

The previous sections showed examples of using “standard” GROUP BY clauses. You can also use some more advanced features of the GROUP BY clause. Here, we will look at GROUP BY CUBE and GROUP BY ROLLUP.

Let’s start with a regular GROUP BY example, shown in Listing 8-36.

Listing 8-36. Regular GROUP BY Example

select deptno, job
, count(empno) headcount
from employees
group by deptno, job;

DEPTNO JOB HEADCOUNT
-------- ---------- ---------
10 MANAGER 1
10 DIRECTOR 1
10 ADMIN 1
20 MANAGER 1
20 TRAINER 4
30 MANAGER 1
30 SALESREP 4
30 ADMIN 1

You get an overview with the number of employees per department and within each department per job. To keep things simple, let’s forget about department 40, the department without employees.

GROUP BY ROLLUP

Notice what happens if you change the GROUP BY clause and add the keyword ROLLUP, as shown in Listing 8-37.

Listing 8-37. GROUP BY ROLLUP Example

select deptno, job
, count(empno) headcount
from employees
group by ROLLUP(deptno, job);

DEPTNO JOB HEADCOUNT
-------- -------- ---------
10 ADMIN 1
10 MANAGER 1
10 DIRECTOR 1
>>> 10 3 <<<
20 MANAGER 1
20 TRAINER 4
>>> 20 5 <<<
30 ADMIN 1
30 MANAGER 1
30 SALESREP 4
>>> 30 6 <<<
>>> 14 <<<

The ROLLUP addition results in four additional rows, marked with >>> and <<< in Listing 8-37 for readability. Three of these four additional rows show the head count per department over all jobs, and the last row shows the total number of employees.

GROUP BY CUBE

You can also use the CUBE keyword in the GROUP BY clause. Listing 8-38 shows an example.

Listing 8-38. GROUP BY CUBE Example

select deptno, job
, count(empno) headcount
from employees
group by CUBE(deptno, job);

DEPTNO JOB HEADCOUNT
-------- -------- ---------
14
>>> ADMIN 2 <<<
>>> MANAGER 3 <<<
>>> TRAINER 4 <<<
>>> DIRECTOR 1 <<<
>>> SALESREP 4 <<<
10 3
10 MANAGER 1
10 DIRECTOR 1
10 ADMIN 1
20 5
20 MANAGER 1
20 TRAINER 4
30 6
30 MANAGER 1
30 SALESREP 4
30 ADMIN 1

This time, you get five more rows in the query result, marked in the same way with >>> and <<<, showing the number of employees per job, regardless of which department employs them.

image Tip Both GROUP BY CUBE and GROUP BY ROLLUP are special cases of the GROUP BY GROUPING SETS syntax, offering more flexibility. You can also merge the results of different grouping operations into a single GROUP BY clause by specifying them in a comma-separated list. For more details, see Oracle SQL Reference.

CUBE, ROLLUP, and Null Values

The CUBE and ROLLUP keywords generate many null values in query results, as you can see in Listings 8-37 and 8-38. You can distinguish these system-generated null values from other null values, for example, to replace them with some explanatory text. You can use the GROUPING andGROUPING_ID functions for that purpose.

The GROUPING Function

Listing 8-39 shows an example of the GROUPING function.

Listing 8-39. GROUPING Function Example

select deptno
, case GROUPING(job)
when 0 then job
when 1 then '**total**'
end job
, count(empno) headcount
from employees
group by rollup(deptno, job);

DEPTNO JOB HEADCOUNT
-------- --------- ---------
10 ADMIN 1
10 MANAGER 1
10 DIRECTOR 1
10 **total** 3
20 MANAGER 1
20 TRAINER 4
20 **total** 5
30 ADMIN 1
30 MANAGER 1
30 SALESREP 4
30 **total** 6
**total** 14

Unfortunately, the GROUPING function can return only two results: 0 or 1. That’s why the last two lines both show '**total**'.

The GROUPING_ID Function

The GROUPING_ID function is more flexible than the GROUPING function, because it can return several different results, as you can see in Listing 8-40.

Listing 8-40. GROUPING_ID Function Example with ROLLUP

select deptno
, case GROUPING_ID(deptno, job)
when 0 then job
when 1 then '**dept **'
when 3 then '**total**'
end job
, count(empno) headcount
from employees
group by rollup(deptno, job);

DEPTNO JOB HEADCOUNT
-------- --------- ---------
10 ADMIN 1
10 MANAGER 1
10 DIRECTOR 1
10 **dept ** 3
20 MANAGER 1
20 TRAINER 4
20 **dept ** 5
30 ADMIN 1
30 MANAGER 1
30 SALESREP 4
30 **dept ** 6
**total** 14

You may be puzzled by the value 3 being used on the fifth line in Listing 8-40. Things become clear when you convert 3 to a binary representation, which results in the binary number 11. The two ones in this number act as a flag to trap the situation in which both columns contain a null value. GROUP BY ROLLUP can produce only 1 (binary 01) and 3 (binary 11), but GROUP BY CUBE can also generate 2 (binary 10). Look at the results in Listing 8-41. Obviously, GROUPING_ID produces a 0 (zero) for all “regular” rows in the result.

Listing 8-41. GROUPING_ID Function Example with CUBE

select deptno, job
, GROUPING_ID(deptno, job) gid
from employees
group by cube(deptno, job);

DEPTNO JOB GID
-------- -------- --------
3
ADMIN 2
MANAGER 2
TRAINER 2
DIRECTOR 2
SALESREP 2
10 1
10 ADMIN 0
10 MANAGER 0
10 DIRECTOR 0
20 1
20 MANAGER 0
20 TRAINER 0
30 1
30 ADMIN 0
30 MANAGER 0
30 SALESREP 0

8.9 Partitioned Outer Joins

We discussed outer joins in Section 8.4. This section introduces partitioned outer joins. To explain what partitioned outer joins are, let’s start with a regular (right) outer join in Listing 8-42.

Listing 8-42. Regular Right Outer Join Example

break on department skip 1 on job

select d.dname as department
, e.job as job
, e.ename as employee
from employees e
right outer join
departments d
using (deptno)
order by department, job;

DEPARTMENT JOB EMPLOYEE
---------- -------- --------
ACCOUNTING ADMIN MILLER
DIRECTOR KING
MANAGER CLARK

HR <<<

SALES ADMIN JONES
MANAGER BLAKE
SALESREP ALLEN
WARD
TURNER
MARTIN

TRAINING MANAGER JONES
TRAINER SMITH
FORD
ADAMS
SCOTT

15 rows selected.

The SQL*Plus BREAK command allows you to enhance the readability of query results. In Listing 8-42, we use the BREAK command to suppress repeating values in the DEPARTMENT and JOB columns, and to insert an empty line between the departments. (See Chapter 11 for details about BREAK.) The result shows 15 rows, as expected. We have 14 employees, and the additional row (marked with <<<) is added by the outer join for the HR department without employees.

Look at Listing 8-43 to see what happens if we add one extra clause, just before the RIGHT OUTER JOIN operator.

Listing 8-43. Partitioned Outer Join Example

select d.dname as department
, e.job as job
, e.ename as employee
from employees e
PARTITION BY (JOB)
right outer join
departments d
using (deptno)
order by department, job;

DEPARTMENT JOB EMPLOYEE
---------- -------- --------
ACCOUNTING ADMIN MILLER
DIRECTOR KING
MANAGER CLARK
SALESREP <<<
TRAINER <<<

HR ADMIN <<<
DIRECTOR <<<
MANAGER <<<
SALESREP <<<
TRAINER <<<

SALES ADMIN JONES
DIRECTOR <<<
MANAGER BLAKE
SALESREP ALLEN
WARD
TURNER
MARTIN
TRAINER <<<

TRAINING ADMIN <<<
DIRECTOR <<<
MANAGER JONES
SALESREP <<<
TRAINER SMITH
FORD
ADAMS
SCOTT

Listing 8-43 shows at least one row for each combination of a department and a job. Compared with Listing 8-42, the single row for the HR department is replaced with 12 additional rows, highlighting all nonexisting department/job combinations. A regular outer join considers full tables when searching for matching rows in the other table. The partitioned outer join works as follows:

1. Split the driving table in partitions based on a column expression (in Listing 8-43, this column expression is JOB).

2. Produce separate outer join results for each partition with the other table.

3. Merge the results of the previous step into a single result.

Partitioned outer joins are especially useful when you want to aggregate information over the time dimension, a typical requirement for data warehouse reporting. See Oracle SQL Reference for more details and examples.

8.10 Set Operators

You can use the SQL set operators UNION, MINUS, and INTERSECT to combine the results of two independent query blocks into a single result. As you saw in Chapter 2, the set operators have the syntax shown in Figure 8-5.

9781430265566_Fig08-05.jpg

Figure 8-5. Set operators syntax diagram

These SQL operators correspond with the UNION, MINUS, and INTERSECT operators you know from mathematics. Don’t we all have fond memories of our teachers drawing those Venn diagrams on the whiteboard (or blackboard, for you older readers)? See also Figure 1-3 (in Chapter 1). The meanings of these set operators in SQL are listed in Table 8-4.

Table 8-4. Set Operators

Operator

Result

Q1 UNION Q2

All rows occurring in Q1 or in Q2 (or in both)

Q1 UNION ALL Q2

As UNION, retaining duplicate rows

Q1 MINUS Q2

The rows from Q1, without the rows from Q2

Q1 INTERSECT Q2

The rows occurring in Q1 and in Q2

By default, all three set operators suppress duplicate rows in the query result. The only exception to this rule is the UNION ALL operator, which does not eliminate duplicate rows. One important advantage of the UNION ALL operator is that the Oracle DBMS does not need to sort the rows. Sorting is needed for all other set operators to trace duplicate rows.

The UNION, MINUS, and INTERSECT operators cannot be applied to any arbitrary set of two queries. The intermediate (separate) results of queries Q1 and Q2 must be “compatible” in order to use them as arguments to a set operator. In this context, compatibility means the following:

· Q1 and Q2 must select the same number of column expressions.

· The datatypes of those column expressions must match.

Some other rules and guidelines for SQL set operators are the following:

· The result table inherits the column names (or aliases) from Q1.

· Q1 cannot contain an ORDER BY clause.

· If you specify an ORDER BY clause at the end of the Q2 query, it doesn’t refer to Q2, but rather to the total result of the set operator.

Set operators are very convenient when building new queries by combining the multiple query blocks you wrote (and tested) before, without writing completely new SQL code. This simplifies testing, because you have more control over correctness.

Listing 8-44 answers the following question: “Which locations host course offerings without having a department?”

Listing 8-44. MINUS Set Operator Example

select o.location from offerings o
MINUS
select d.location from departments d;

LOCATION
--------
SEATTLE

You can also try to solve this problem without using the MINUS operator. See Listing 8-45 for a suggestion.

Listing 8-45. Alternative Solution Without Using the MINUS Operator

select DISTINCT o.location
from offerings o
where o.locationnot in
(select d.location
from departments d)

Note that you must add a DISTINCT operator to handle situations where you have multiple course offerings in the same location. As explained before, the MINUS operator automatically removes duplicate rows.

Are the two queries in Listing 8-44 and 8-45 logically equivalent? They appear to be logically the same, but they are not quite as identical logically as they first appear. The first query will return two rows. One is for Seattle. The other is a null, representing the one course offering with an unknown location. The MINUS operator does not remove the null value, whereas that same null value fails to pass the WHERE condition in Listing 8-45. This is just one more example of the subtle pitfalls inherent in dealing with nulls in your data. You can also produce outer join results by using the UNION operator. You will see how to do this in Listings 8-46 and 8-47.

Listing 8-46. Regular Join

select d.deptno
, d.dname
, count(e.empno) as headcount
from employees e
, departments d
where e.deptno = d.deptno
group by d.deptno
, d.dname;

DEPTNO DNAME HEADCOUNT
-------- ---------- ---------
10 ACCOUNTING 3
20 TRAINING 5
30 SALES 6

We start with a regular join in Listing 8-46. In Listing 8-47 you add the additional department(s) needed for the outer join with a UNION operator, while assigning the right number of employees for those departments: zero.

Listing 8-47. Expansion to an Outer Join with a UNION Operator

select d.deptno
, d.dname
, count(e.empno) as headcount
from employees e
, departments d
where e.deptno = d.deptno
group by d.deptno
, d.dname
union
select x.deptno
, x.dname
, 0 as headcount
from departments x
where x.deptno not in (select y.deptno
from employees y);

DEPTNO DNAME HEADCOUNT
-------- ---------- ---------
10 ACCOUNTING 3
20 TRAINING 5
30 SALES 6
40 HR 0

8.11 Exercises

The following exercises will help you to better understand the topics covered in this chapter. The answers are presented in Appendix B

1. Produce an overview of all course offerings. Provide the course code, begin date, course duration, and name of the trainer.

2. Provide an overview, in two columns, showing the names of all employees who ever attended an SQL course, with the name of the trainer.

3. For all employees, list their name, initials, and yearly salary (including bonus and commission).

4. For all course offerings, list the course code, begin date, and number of registrations. Sort your results on the number of registrations, from high to low.

5. List the course code, begin date, and number of registrations for all course offerings in 1999 with at least three registrations.

6. Provide the employee numbers of all employees who ever taught a course as a trainer, but never attended a course as an attendee.

7. Which employees attended a specific course more than once?

8. For all trainers, provide their name and initials, the number of courses they taught, the total number of students they had in their classes, and the average evaluation rating. Round the evaluation ratings to one decimal.

9. List the name and initials of all trainers who ever had their own manager as a student in a general course (category GEN).

10.Did we ever use two classrooms at the same time in the same course location?

11.Produce a matrix report (one column per department, one row for each job) where each cell shows the number of employees for a specific department and a specific job. In a single SQL statement, it is impossible to dynamically derive the number of columns needed, so you may assume you have three departments only: 10, 20, and 30.

12.Listing 8-26 produces information about all departments with more than four employees. How can you change the query to show information about all departments with fewer than four employees?

13.Look at Listings 8-44 and 8-45. Are those two queries logically equivalent? Investigate the two queries and explain the differences, if any.