Transform and Summarize Data With Functions - SQL - The Shortest Route For Beginners (2015)

SQL - The Shortest Route For Beginners (2015)

Chapter 4. Transform and Summarize Data With Functions

A function can be defined as an operation that is performed with the objective to transform the appearance of data. Like other computer languages, SQL also supports the use of functions to manipulate data. Functions are usually applied to:

· Convert data types of columns

· Data calculation

· Alter display formats

· Provide collective output for groups of rows

Functions are divided into the following two categories:

Single Row functions : As the name implies, the functions under this category act only on individual rows, and return a single result for every processed row. You can use these functions in SELECT, WHERE, and ORDER BY clauses. Single row functions are further categorized as follows:

· Character functions: These functions accept character data as input and can return both character and number values.

· Number functions: Number functions manipulate numeric data. They receive numeric input and return only numeric values.

· Date functions: Besides presenting date and time data in some desirable format, these functions are also helpful in comparing date values and computing intervals between dates.

· Conversion functions: With the help of these functions you can convert data type of some data to another data type. For example, to concatenate a number value to a character string, you convert the numeric value to character data type.

Aggregate Functions : In contrast to the single row functions, the functions under this category operate on groups of rows, and return one result per group. These functions are used to retrieve summarized data for analysis and reporting purposes. In this book, you’ll be practicing with the following most common aggregate functions, supported by all platforms.

· Average function – AVG

· Count function – COUNT

· Maximum function – MAX

· Minimum function – MIN

· Sum function – SUM

CONCAT Function

This function is equivalent to the concatenation operator (||) and is used to concatenate first character value to the second character value.

Syntax :

CONCAT (char1, char2)

The following example concatenates first and last names of all employees.

SQL Statement :

SELECT CONCAT(first_name, last_name)

FROM employees;

Output :

NOTE: In order to put a space between the two names, nest another CONCAT() function into an existing one, like this:

Select CONCAT(CONCAT(first_name, ' '), last_name) Employee From employees;

Consider the following example if you wish to join two column values with some meaningful text in between:

Select CONCAT(CONCAT(last_name, ' is working as '), job_id) "Name and Job" From employees;

INITCAP Function

INITCAP returns character, with the first letter of each word in uppercase, all other letters in lowercase. The following example converts the first letter of the job id column to uppercase, and the rest to lower case. Note that the values stored in this table column are all upper case.

Syntax :

INITCAP (character)

SQL Statement :

SELECT INITCAP(job_id)

FROM employees;

Output :

LENGTH Function

The LENGTH function returns the length of character. Note that it is a character function that returns the answer in numbers, as shown in the following example where it is used to count number of character in the first names of employees.

NOTE: In Microsoft SQL Server and Microsoft Access use DATALENGTH() and LEN() functions, respectively.

Syntax :

LENGTH (character)

SQL Statement :

SELECT first_name, LENGTH(first_name) Length

FROM employees;

Output :

LOWER Function

The LOWER function returns character, with all letters lowercase. The example presented below transforms all data in first name and job id columns to lower case.

NOTE: Use LCASE() function for the same purpose in Microsoft Access.

Syntax :

LOWER (character)

SQL Statement :

SELECT LOWER(first_name||’ ‘||last_name) Name, LOWER(job_id) Job

FROM employees;

Output :

NOTE: As mentioned earlier, single row functions can also be used in the WHERE clause. Let’s see an example. Running the first statement below will not yield any result. Change it so that it matches the second statement. Now you’ll get some rows. The LOWER() function in the second statement fetched the result, because all the job ids are stored in uppercase in the employees table, and you forced the query (by using the LOWER function) to first convert all the column values to lower case, match each value with the provided lower case string (pu_clerk), and then return the result.

Select first_name, last_name, job_id from employees where job_id='pu_clerk';

Select first_name, last_name, job_id from employees where lower(job_id)='pu_clerk';

NVL Function

When you query a table, the null values stored in a column (e.g commission_pct) are shown as (null). Using the NVL function, you can replace these nulls with some meaningful values. According to the following syntax, if expr1 is null, then NVL returns expr2. If expr1 is not null, then NVL returns expr1. In the following example, you get a list of employees along with their commissions, replacing null values with zero if the employees are not entitled to get commissions.

Syntax :

NVL (expr1, expr2)

SQL Statement :

SELECT last_name, NVL(commission_pct, 0) commission

FROM employees

WHERE last_name LIKE 'B%'

ORDER BY last_name;

Output :

NOTE: In the presented example we used a number column and replaced all null values with zero. Null values in a character column must be replaced as shown in the following example, where a character value (‘None’) is used to replace all null values in the state province column.

SELECT city,NVL(state_province, 'None') Province

FROM locations;

SUBSTR Function

The SUBSTR function returns specified characters from character value, starting from P, L characters long. The following statement fetches records of employees who have ‘lex’ (three characters) in their names starting from second position.

NOTE: In Microsoft Access use MID() function. In Microsoft SQL Server, MySQL, and MariaDB you have to use SUBSTRING().

Syntax :

SUBSTR (character, P,L)

SQL Statement :

SELECT first_name, last_name, salary

FROM employees

WHERE substr(first_name,2,3)='lex';

Output :

UPPER Function

The UPPER function is contrary to the LOWER function and returns all characters in uppercase letters as shown in the following statement where each employee’s first name is displayed in uppercase.

NOTE: Microsoft Access uses UCASE() function.

Syntax :

UPPER (character)

SQL Statement :

SELECT UPPER(first_name)

FROM employees;

Output :

ROUND Function

ROUND() is a numeric function which rounds the column, expression, or value defined in "n" to "integer" places to the right of the decimal point. If you omit integer, then n is rounded to zero places. If integer is negative, then n is rounded off to the left of the decimal point. n can be any numeric data type or any nonnumeric data type that can be implicitly converted to a numeric data type.

The following statement rounds the value (47.842) to hundredth, zero, and ten decimal places.

NOTE: DUAL is a table provided with Oracle Database. It resides in the SYS schema but can be accessed by any user. The major benefit of this table is that you can use it in your SELECT statements to compute constant expressions, as is done in this exercise.

Syntax :

ROUND (n,integer)

SQL Statement :

SELECT ROUND(47.842,2), ROUND(47.842,0), ROUND(47.842,-1)

FROM SYS.DUAL;

Output :

NOTE: There are over two dozen numeric functions provided by Oracle and other vendors. These functions are primarily used for algebraic, geometric, or trigonometric calculations and, therefore, are not as frequently used as character or date functions.

TRUNC Function

TRUNC() is also a numeric function and works just like the ROUND() function. It returns n1 truncated to n2 decimal places. If n2 is omitted, then n1 is truncated to 0 places. n2 can be negative to truncate (make zero) n2 digits left of the decimal point. Similar to the ROUND() function, it also takes as an argument any numeric data type or any nonnumeric data type that can be implicitly converted to a numeric data type. If you omit n2, then the function returns the same data type as the numeric data type of the argument. If you include n2, then the function returns NUMBER.

Syntax :

TRUNC (n1,n2)

SQL Statement :

SELECT TRUNC(47.842310,2), TRUNC(47.842310,0), TRUNC(47.842310,-1)

FROM SYS.DUAL;

Output :

Date Time Functions

Every DBMS allows you to store date and time values in tables using specific date data types and in specific formats. Each implementation follows its own storage format to save the date and time, for example, Oracle’s default display and input format for any date is DD-MON-YY. Unfortunately, this default storage varies among different implementations and is, therefore, least portable.

Similar to the character functions, date and time functions are also used to manipulate the representation of data. These function are used not only to present date and time data in some desirable format, but are also helpful in comparing date values, and computing intervals between dates.

To start with, here is an example in Oracle which fetches current date from the system, followed by a list of functions that perform the same task for other platforms.

SQL Statement :

SELECT SYSDATE

FROM DUAL;

Output :

The following table lists respective functions as employed by various DBMSs to get current system date:

DBMS

FUNCTION

SQL STATEMENT

OUTPUT (Format)

MySQL/MariaDB

CURDATE()

SELECT CURDATE();

2014-11-19 (YYYY-MM-DD)

PostgreSQL

CURRENT_DATE

SELECT CURRENT_DATE;

2014-11-19 (YYYY-MM-DD)

Microsoft Access

NOW()

SELECT Now() FROM Products;

11/19/2014 2:28:17 AM

Microsoft SQL Server

GETDATE()

SELECT GETDATE();

2014-11-19 13:10:02.047

SQLite

DATE()

SELECT DATE('now');

2014-11-19 (YYYY-MM-DD)

DB2

CURRENT DATE

SELECT CURRENT DATE

FROM sysibm.sysdummy1;

2014-11-19 (YYYY-MM-DD)

Date Manipulation Function s

The following examples present date manipulation functions for different implementations, starting with Oracle. The Employees table contains a date column named Hire_Date. All the examples in this section retrieve a list of all employees who were hired in 2003.

The first one below is for Oracle. In this example we used nested functions to first convert the year portion in the hire date column to character – to_char(hire_date,’YYYY’). The character value is then converted to a number with the help of the to_number function, and is matched with the specified year.

NOTE: TO_NUMBER, TO_CHAR, and TO_DATE are known as conversion functions and will be discussed shortly.

SQL Statement :

SELECT first_name, hire_date

FROM employees

WHERE to_number(to_char(hire_date, 'YYYY')) = 2003;

Output :

The table that follows describes how you would issue the above statement in other DBMSs to get the same output using platform specific date functions.

DBMS

SQL STATEMENT WITH SPECIFIC DATE FUNCTIONS

Alternate in Oracle using the BETWEEN operator

SELECT first_name, hire_date

FROM employees

WHERE hire_date BETWEEN to_date('01-JAN-2003') and to_date('31-DEC-2003');

Microsoft SQL Server

SELECT first_name, hire_date

FROM employees

WHERE DATEPART(yy, hire_date) = 2003;

Microsoft Access

SELECT first_name, hire_date

FROM employees

WHERE DATEPART('yyyy', hire_date) = 2003;

MariaDB and MySQL

SELECT first_name, hire_date

FROM employees

WHERE YEAR(hire_date) = 2003;

SQLite

SELECT first_name, hire_date

FROM employees

WHERE strftime('%Y', hire_date) = 2003;

PostgreSQL

SELECT first_name, hire_date

FROM employees

WHERE DATE_PART('year', hire_date) = 2003;

There is a long list of such functions for each platform. Refer to your DBMS documentation for the list of the date-time manipulation functions it supports. Since we’re connected to an Oracle database session, we’ll explore some date functions specific to Oracle in the next few sections.

MONTHS_BETWEEN Function

The first date function in Oracle that we’ll be experimenting with is the MONTHS_BETWEEN function. As the name implies, this function is useful in determining number of months between two dates. The output of this function can be positive or negative. For the result to be positive, date1 must be later than date2. Conversely, negative result is displayed when date1 is earlier than date2.

Syntax :

MONTHS_BETWEEN (date1,date2)

The following example statement evaluates employment tenure in months for employee number 200. The result (134.084…) is based on the difference between the current system date (in my case it is 19-NOV-2014) and the date value stored in the hire date column (i.e. 17-SEP-2003).

SQL Statement :

SELECT first_name,hire_date,MONTHS_BETWEEN(sysdate,hire_date) Months_Employed

FROM employees

WHERE employee_id=200;

Output :

ADD_MONTHS Function

Sometimes you need to assess a future date or even a date in the past for some reasons. The ADD_MONTHS function assists you in this kind of calculation. In order to find a future date, add integer number of months to the date. Similarly, to get previous date, enter a negative value for integer, as demonstrated in the following two examples. Note that the system date for both these example is assumed as 19-NOV-2014.

Syntax :

ADD_MONTHS (date,integer)

SQL Statement :

SELECT ADD_MONTHS(sysdate,1) Month

FROM dual;

SELECT ADD_MONTHS(sysdate,-1) Month

FROM dual;

Output :

NEXT_DAY Function

The NEXT_DAY function returns the next day of the week specified in char. The return type is always date. The argument char must be a day of the week, and can be passed as the full name (‘Friday’), an abbreviation (‘Fri’), or a number representing a day (6). SYSDATE value in this example too is 19-NOV-2014.

Syntax :

NEXT_DAY (date,char)

SQL Statement :

SELECT 'Next Friday will be on '||NEXT_DAY(sysdate,'Friday') "Next Friday"

FROM dual;

Output :

LAST_DAY Function

The LAST_DAY function returns the date of the last day of the month specified in the date argument. The following statement demonstrates two different uses of this function and displays data in three columns. The first one, SYSDATE, returns current system data to supplement the result appearing in the next two columns. The second column shows the last day (30th November) of the current month (based on SYSDATE), while the last one displays the last day of the date specified as character string.

Syntax :

LAST_DAY (date)

SQL Statement :

SELECT sysdate, LAST_DAY(sysdate), LAST_DAY('16-DEC-14')

FROM dual;

Output :

Conversion Functions

Conversion functions allow you to convert a data type into another data type. For example, your table contains some data stored in character format that you want to convert to numeric to perform some mathematical operation. Similarly, you can also covert numeric data to character, but representation of such converted data doesn't support mathematical functions and computations.

You can convert data in the following ways:

1. Date to character

2. Number to character

3. Character to number

4. Character to date

As of this writing, Oracle provides 38 conversion functions to perform various tasks. To keep things simple for the time being, we'll be looking at the following four functions that are most commonly used, and fulfill the four conversion tasks mentioned above.

· TO_CHAR (datetime)

· TO_CHAR (number)

· TO_NUMBER

· TO_DATE

NOTE: In many cases, a database provides its own conversion functions to convert one data type into another. But, there are several occasions when users want to explicitly specify themselves what data type they need to change data into. For such scenarios, SQL includes an ANSI CAST function that converts data types into other data types of your choice. The basic syntax is as follows:

Syntax: CAST (EXPRESSION AS NEW_DATA_TYPE)

Example: SELECT CAST(hire_date AS CHAR(25)) AS "Hired On" FROM employees

DATETIME Data Types and Elements

Before we get our feet wet, let's go through some standard data types for date and time. There are three standard SQL data types for date and time storage:

· DATE: DATE is formatted as YYYY-MM-DD and ranges from 0001-01-01 to 9999-12-31.

· TIME: TIME is formatted as HH:MI:SS.nn... and ranges from 00:00:00... to 23:59:61.999....

· TIMESTAMP: TIMESTAMP is formatted as YYYY-MM-DD HH:MI:SS.nn... and ranges from

0001-01-01 00:00:00... to 9999-12-31 23:59:61.999....

The seconds value 61.999 is provided due to the possible insertion or omission of a leap second in a minute.

The following tables list date and time format elements for Oracle. To remind you, in Oracle, the default format to display date values is DD-MON-YY. You have already seen this format in the preceding examples.

Date Elements :

ELEMENT

DESCRIPTION

OUTPUT

YYYY

Represents the full year in numbers.

2014

YEAR

Represents the year spelled out.

TWENTY FOURTEEN

MM

Represents the 2 digits value for month.

11

MONTH

Represents the full name of the month.

NOVEMBER

DD

Represents 2 digits day of the month.

20

DY

Represents the 3 letter abbreviation of the day of the week.

THU (for Thursday)

DAY

Represents the full name of the day.

THURSDAY

Time Elements :

ELEMENT

DESCRIPTION

AM or PM

Represents meridian indicator.

A.M. or P.M.

Represents meridian indicator with dots.

HH or HH12 or HH24

Represents hour of day or hour (1-12) or hour (0-23).

MI

Represents minutes (0-59).

SS

Represents seconds (0-59).

TO_CHAR (datetime) Function – Oracle

The TO_CHAR(datetime) function is used to covert a date value to a character string. The conversion of dates can take place for any number of reasons. Typical reasons for date conversions are as follows:

· To compare date values of different data types.

· To convert a date value from its default format to one specified by you.

The following query retrieves the system date in 24 hours time format and with full year. Change the query by replacing the elements defined in the previous section and note the outcome. For example, changing the format model to 'fmDD "of" MONTH YYYY' would yield: 20 of November 2014. The fm prefix (which stands for fill mode) is added to remove padded blanks.

Syntax :

TO_CHAR (datetime,[’fmt’])

SQL Statement :

SELECT TO_CHAR(SYSDATE, 'DD-MON-YYYY HH24:MI:SS') AS "Current Date"

FROM dual;

Output :

NOTE

- The format model (fmt) is case-sensitive (mon will return nov for November).

- Format model must be enclosed in single quotes.

- Use the fill mode (fm) to remove padded blanks or leading zeros.

TO_CHAR (number) Function – Oracle

The TO_CHAR(number) function converts "n" (a number) to a character data type, using the optional number format (fmt). This function is normally utilized when you intend to concatenate number values to a string, as demonstrated in the example below where a complete sentence is generated by concatenating two numeric values (employee id and salary) to character strings. Use the elements listed in the following table to set the output format.

Number Elements :

ELEMENTS

DESCRIPTION

EXAMPLE

RESULT

9

Represents numeric position and determines display width.

999999

1234

0

Used to display leading zeros.

099999

001234

$

Displays values with a leading dollar sign.

$999999

$1234

.

Places a decimal point in the specified position.

999999.99

1234.00

,

Positions commas in defined positions.

999,999

1,234

MI

Returns negative value with a trailing minus sign (-).

999999MI

1234-

PR

Returns negative value in <angle brackets>.

999999PR

<1234>

Syntax :

TO_CHAR (n,[’fmt’])

SQL Statement :

SELECT 'Employee number '||TO_CHAR(employee_id)||' gets '||TO_CHAR(salary,'fm$9,999,999') “Salaries”

FROM employees;

Output :

TO_NUMBER Function – Oracle

TO_NUMBER converts a character expression (expr) to a value of NUMBER data type. For a character string to be converted to a number, the expr must typically be 0 through 9. For example, the string HELLO cannot be converted to a number, whereas postal/zip codes stored as character string in a table can be converted to numbers using this function. You can also use (+),(-), and (.) symbols to represent positive, negative, and decimal values. The following statement calculates an incremented salary for employee number 200 by converting and then adding a character value (‘600.00’) to actual salary. In order to reduce the salary, enter a negative number like this: -600.00.

Syntax :

TO_NUMBER (expr,[’fmt’])

SQL Statement :

SELECT salary, salary+TO_NUMBER('600.00') "INCREMENTED SALARY"

FROM employees

WHERE employee_id=200;

Output :

TO_DATE Function – Oracle

TO_DATE converts character data type (char) to a value of DATE data type. The fmt is a datetime model format specifying the format of char. It is good practice always to specify a format mask (fmt) with TO_DATE, as shown in the example that follows. The statement fetches record of an employee who was hired on March 3, 2005. The TO_DATE function converts the char value (‘March 3, 2005’) into date data type, formats it as ‘Month DD, YYYY’, and then compares it to the hire date column value in the employees table. The output date is presented in the default format. If you wish to also see the output in the same format, change the hire_date column in the SELECT clause like this:

TO_CHAR(hire_date,'fmMonth DD, YYYY') "HIRE DATE".

Syntax :

TO_DATE (char,[’fmt’])

SQL Statement :

SELECT first_name, salary, hire_date

FROM employees

WHERE hire_date=TO_DATE('March 3, 2005','Month DD, YYYY');

Output :

Calculating Dates

Since dates are stored as numbers in database table, you can perform calculations on date values using common arithmetic operators like (+), (-), and (/), as mentioned in the following table. The date value for SYSDATE in the Example column is assumed to be 21-NOV-2014 (12PM).

Operation

Description

Result

Example

Date + Number

Add number of days to a date.

Date

SYSDATE+5 returns 26-NOV-14

Date – Number

Subtract number of days from a date.

Date

SYSDATE-5 returns 16-NOV-14

Date – Date

Subtract one date from another to find number of days between those dates.

Number of days

SYSDATE-to_date('01-NOV-14') returns 20.50

Date + Number/24

Add number of hours to a date.

Date

SYSDATE+24/24 returns 22-NOV-14

The following example returns number of employment weeks for each employee by using the division arithmetic operator.

SQL Statement :

SELECT first_name, (sysdate-hire_date)/4 "WEEKS"

FROM employees;

Output :

Aggregate Functions

The focus of this section is on how data can be grouped and aggregated to allow you to interact with it at some higher level of granularity than what is stored in the database, using aggregate functions. Aggregate functions return a single result row based on groups of rows, rather than on single rows. You use these functions in your SQL queries to retrieve data for analysis and reporting purposes. For example, you can use these functions to:

· Evaluate total number of records in a table, or number of rows that meet some specific criteria.

· Retrieve summary information such as total, average, highest, or lowest value from a table column.

You'll use the following aggregate functions in the upcoming sections to achieve the objectives mentioned above. Note that unlike single row functions, these aggregate functions are supported by all SQL implementations.

FUNCTION

SYNTAX

AVG

AVG([DISTINCT|ALL]|expr)

COUNT

COUNT([DISTINCT|ALL]|expr)

MAX

MAX([DISTINCT|ALL]|expr)

MIN

MIN([DISTINCT|ALL]|expr)

SUM

SUM([DISTINCT|ALL]|expr)

Aggregate functions are usually added to the SELECT list, but can also appear in ORDER BY and HAVING clauses. They are commonly used in conjunction with the GROUP BY clause in a query to divide the rows of a queried table into groups.

AVG() Function

The AVG() function is used to obtain the average value of data in a specific column. The function calculates the average value by dividing the sum of values in the specified column by the number of rows in the table. This function takes as an argument any numeric data type or any nonnumeric data type that can be implicitly converted to a numeric data type. The first statement below uses the AVG() function in its simplest form to return the average salaries of all employees as a single value.

NOTE:

To obtain the average value of multiple columns, you can use multiple AVG() functions in a query, each separated by a comma, like this: AVG(salary),AVG(commission_pct).

The AVG() function ignores column rows that contain NULL values.

All the five aggregate functions include DISTINCT|ALL clauses. ALL is the default clause which applies the function to all rows; to only include unique values, specify the DISTINCT clause, as done in the second example.

The use of DISTINCT clause within aggregate functions is not supported in Microsoft Access.

SQL Statement :

SELECT AVG(salary) “Average Salary”

FROM employees;

Output :

In the following example, a higher average salary is

returned due to the use of DISTINCT clause. There are

multiple salaries with the same values in the Employees table, therefore, excluding these duplicates resulted in a higher average.

SQL Statement :

SELECT AVG(DISTINCT salary) “Average Salary”

FROM employees;

Output :

AVG() Function (continued)

Using the AVG() function you can also assess the average value of some specific criteria. For example, the following query determines the average salary of employees in department number 50. This filtration is based on the WHERE clause which filters only salaries for employees working in the specified department.

SQL Statement :

SELECT AVG(salary) “Average Salary”

FROM employees

WHERE department_id=50;

Output :

COUNT() Function

As the name implies, the COUNT() function counts total rows in a table or it returns number of rows for the criterion specified in expr. You can use this function in the following two formats:

COUNT(*): It counts the number of rows in a table, including duplicate rows and rows containing

null values.

COUNT(expr): It returns the count of non-null rows in the column specified in expr.

The first example here counts all rows, irrespective of values, and returns the total number of records in the Employees table:

SQL Statement :

SELECT COUNT(*)

FROM employees;

Output :

This one counts the number of employees who are entitled to get commission. To narrows the result further, you can add the WHERE clause to these statements, like this: WHERE department_id=80.

SQL Statement :

SELECT COUNT(commission_pct)

FROM employees;

Output :

NOTE: There are 35 such records in the Employees table that carry values in the commission percent column. Other records from the table are eliminated, because they all have null values in this column.

MIN() and MAX() Functions

These two functions are used to get highest and lowest values from a column. A general perception is that these two functions are used to only retrieve high and low numeric values, which is not correct. In fact, you can apply them to date data and even to character values, as demonstrated in the following examples.

SQL statement to obtain high and low NUMERIC data :

SELECT MIN(salary) “Minimum”, MAX(salary) “Maximum”

FROM employees;

Output :

SQL statement to obtain high and low DATE data :

SELECT MIN(hire_date) “Minimum”, MAX(hire_date) “Maximum”

FROM employees;

Output :

The following example shows how you can apply high and low values to a character column. In this statement we applied the two functions on the first name column to fetch alphabetized high and low values.

SQL statement to obtain high and low CHARACTER data :

SELECT MIN(first_name) “Minimum”, MAX(first_name) “Maximum”

FROM employees;

Output :

Aggregate functions can also be nested into each other. For example, the following statement calculates the average of the maximum salaries of all the departments. The GROUP BY function will be discussed in a while.

SQL Statement :

SELECT AVG(MAX(salary))

FROM employees

GROUP BY department_id ;

Output :

SUM() Function

The SUM function returns the total of values of expr. The first function in the following example statement - SUM(salary) – is applied to a table column (salary) to calculate the total amount of salaries being paid in department number 80, while the second function - SUM(salary * commission_pct) – is an expression that retrieves the total amount of commission being paid in this department by multiplying each salary with each value in the commission percent column.

SQL Statement :

SELECT SUM(salary), SUM(salary * commission_pct)

FROM employees

WHERE department_id =80;

Output :

Aggregate Functions Used Together

So far, you have used aggregate functions individually (except for one used in the nested example). A specific situation may demand the use of all these functions in a single statement. Have a look at it in the following example which presents a complete summarized picture of the Employees table by congregating all the five functions in a single statement with respective return values.

SQL Statement :

SELECT COUNT(*) “Total Employees”, MIN(salary) “Minimum Salary”,

MAX(salary) “Maximum Salary”, AVG(salary) “Average Salary”,

SUM(salary) “Total Salary”

FROM employees;

Output :

The GROUP BY Clause

The SELECT syntax contains two more clauses: GROUP BY and HAVING. In the next few sections, you'll go through the use of these two important clauses, starting with the GROUP BY clause. The GROUP BY clause is used in a SELECT statement to divide the rows in a table into smaller groups. Aggregate functions are commonly used with this clause to produce summarized subsets of data. DBMSs apply the aggregate functions to each group of rows and return a single result row for each group. If this clause is omitted, then the aggregate functions are applied to all rows in the queried table - as you saw in the previous sections. For example, in the COUNT() function example you executed the statement (SELECT COUNT(*) FROM EMPLOYEES;) to get total number of employees from the table. But what will you do if you are asked to prepare a statement which returns number of employees working under each department? In situations like this, you make use of the GROUP BY clause that helps you get aggregate figures for each group. Let’s go through an example to handle the problem situation.

In the example presented on the next page, the DBMS groups the data by department number and then calculates the aggregate figures for each group using the GROUP BY clause. As a result, employees are counted for each department rather than for the entire table. The output shows 45 employees in department number 50 and 1 employee in department number 10.

Guidelines for the GROUP BY clause :

· Place the GROUP BY clause between the WHERE and HAVING clauses. By using a WHERE clause, you pre-exclude rows before dividing them into groups, whereas, HAVING filters data after the groups are formed.

· The current example uses just one column in the GROUP BY clause. However, you can create nested groups by adding as many columns as you need.

· Leaving the column used in the aggregate function, you must include all the columns or expressions (specified in the SELECT list) in the GROUP BY clause.

· If the grouping column contains a row with a NULL value, a separate NULL group will be created for such record. See record number 12 in the output screenshot on the next page. This null group is created because the department number is not specified in the table.

SQL Statement :

SELECT department_id “Department”, Count(*) “Number of Employees”

FROM employees

GROUP BY department_id

ORDER BY department_id;

Output :

Using WHERE and GROUP BY Together

As stated above, you can use a WHERE clause in a SELECT statement along with the GROUP BY clause to pre-exclude rows before dividing them into groups. For this, the WHERE clause must be placed before the GROUP BY clause. Here is an example which uses both these clauses together to display a list of job categories and total monthly salaries for each category, excluding the sales force: SA_MAN and SA_REP.

As you can see, the WHERE clause is used before GROUP BY and carries a condition to exclude irrelevant rows. The condition excludes the two categories of employees (Sales Managers and Sales Representatives), mentioned above. Also note the ORDER BY clause which uses the expression (specified in the SELECT list) to sort the output.

WHERE vs. HAVING

WHERE should be used for standard filtering, and HAVING (discussed next) must be used with GROUP BY to restrict groups. You cannot use the WHERE clause to restrict groups. The following two examples further elaborates these principles:

Incorrect:

select department_id, AVG(salary) from employees where AVG(salary)>5000 group by department_id;

Correct:

select department_id, AVG(salary) from employees group by department_id having AVG(salary)>5000;

SQL Statement :

SELECT job_id “Job Category”, sum(salary) “Salary”

FROM employees

WHERE job_id NOT LIKE ‘SA%’

GROUP BY job_id

ORDER BY sum(salary);

Output :

The HAVING Clause

As stated above, the HAVING clause restricts the groups of returned rows to those matching the specified condition. If you omit this clause, then the DBMS returns summarized result for all groups, as you saw in the example presented on Page 79. The DBMS performs the following steps when it sees this clause in a SELECT statement:

· Exclude rows not satisfying the WHERE clause.

· Transforms rows into groups.

· Applies the group function.

· Displays the groups for which the condition specified in the HAVING clause is true.

You already went through an example of this clause presented on the previous page; let’s have some more on this. The following example was also demonstrated on the previous page, and is repeated here by adding the HAVING clause to further restrict the result on the basis of aggregate information. Compare the result of this statement with the previous one. The HAVING clause condition eliminated the first three groups that appeared in the previous result.

SQL Statement :

SELECT job_id “Job Category”, sum(salary) “Salary”

FROM employees

WHERE job_id NOT LIKE ‘SA%’

GROUP BY job_id

HAVING sum(salary) >= 8300

ORDER BY sum(salary);

Output :

One more thing that needs to be clarified with respect to the GROUP BY and HAVING clauses is that, the GROUP BY clause can be used without using any aggregation function in the SELECT list. In the following example, both these clauses exist without using any aggregation function in the SELECT list. Because the SUM aggregate function is referenced in the HAVING clause, the GROUP BY clause is also specified.

SQL Statement :

SELECT department_id

FROM employees

GROUP BY department_id

HAVING sum(salary) < 8000;

Output :

Test Your Skill

1. Create a query that calculates 10% incremented salary for each employee. Display the new values under the name “INCREMENTED_SALARY” and as whole numbers.

2. Create a query that produces the following output for all employees. Use the CONCAT function to join names.

3. Create a query to display last name of employees with hire date as follows. Restrict the output to those who were hired in 2003.

4. Count number of characters in each department’s name as shown below:

5. Provide a list of employees as illustrated below. The output displays the first name of each employee with his/her manager number. If the manager number is null, display a text “No Manager” instead.

6. Pick three letters from the first name of each employee starting from the second position, like this:

7. Display hire dates of Jennifer in yyyy.dd.mon format.

8. Produce the following list which displays employees whose employment tenure is greater than 100 months.

9. Create a statement to display the number of employees for each job category within each department.

10. Display the minimum salary for each job id having a minimum salary more than 15000.