Chapter 5

CHAPTER 5

------

AGGREGATE ROW FUNCTIONS

Business managers often need aggregate information in order to support various decision-making tasks. These types of managerial questions tend to focus on information that represents exceptional situations. This is information that falls outside of the normal expectations for business operations. Sometimes, exception data represents undesirable situations, such as low product sales. At other times, exception data represents very good situations, such as sales that exceed corporate goals. At still other times, managers may need information about averages or counting values in order to compare performance across time periods, across departments, or across employee groups.

Typical managerial questions include: What product experienced the largest sales volume last quarter? What group of products were the worst sellers last month? What is the average salary for employees within a specific job type? How many employees in a specific department are female or male? What is the average number of hours employees devoted to completing a specific project or group of projects? These questions require your SQL queries to display data that is exceptional in some fashion, and this information must be aggregated in some fashion. SQL has numerous predefined aggregate functions that can be used to write queries to produce exactly this kind of information.

Objectives

In this chapter, you will learn to use many of SQL's aggregate functions to write queries. You will also learn the GROUP BY and HAVING clauses. The GROUP BY clause specifies how to group rows from a data table when aggregating information, while the HAVING clause filters out rows that do not belong in specified groups. Your learning objectives for the chapter are:

  • Write queries with aggregate functions: SUM, AVG, COUNT, MAX, and MIN.
  • Use the GROUP BY clause to answer complex managerial questions.
  • Nest aggregate functions.
  • Use the GROUP BY clause with NULL values.
  • Use the GROUP BY clause with the WHERE and ORDER BY clauses.
  • Use the HAVING clause to filter out rows from a result table.

IMPLEMENTING AGGREGATE FUNCTIONS

Aggregate functions perform a variety of actions such as counting all the rows in a table, averaging a column's data, and summing numeric data. Aggregates can also search a table to find the highest "MAX" or lowest "MIN" values in a column. As with other types of queries, you can restrict, or filter out the rows these functions act on with the WHERE clause.

For example, if a manager needs to know how many employees work in an organization, the aggregate function named COUNT(*) can be used to produce this information. The COUNT(*) function shown in SQL Example 5.1 counts all rows in a table. The wild card asterisk (*) is used as the parameter in the function—it means to count all of the rows.

/* SQL Example 5.1 */

SELECT COUNT(*)

FROM Employee;

COUNT(*)

------

24

The result table for the COUNT(*) function is a single column from a single row known as a scalar result or value. Notice that the result table has a column heading that corresponds to the name of the aggregate function specified in the SELECT clause. The output column can be assigned a more meaningful column name as is shown in SQL Example 5.2. Here an alias column name is specified after the aggregate function. Remember that if the alias column name is more than one word, then the column name must be enclosed in double-quotes.

/* SQL Example 5.2 */

SELECT COUNT(*) "Number of Employees"

FROM Employee;

Number of Employees

------

24

Table 5.1 lists some of the commonly used aggregate functions including their syntax and use.

Table 5.1

Function Syntax / Function Use
SUM( [ALL | DISTINCT] expression ) / The total of the (distinct) values in a numeric column/expression.
AVG( [ALL | DISTINCT] expression ) / The average of the (distinct) values in a numeric column/expression.
COUNT( [ALL | DISTINCT] expression ) / The number of (distinct) non-NULL values in a column/expression.
COUNT(*) / The number of selected rows.
MAX(expression) / The highest value in a column/expression.
MIN(expression) / The lowest value in a column/expression.

The ALL and DISTINCT keywords are optional, and perform as they do with the SELECT clauses that you have learned to write. The ALL keyword is the default where the option is allowed. The expression listed in the syntax can be a constant, a function, or any combination of column names, constants, and functions connected by arithmetic operators. However, aggregate functions are most often used with a column name.

There are two rules that you must understand and follow when using aggregates:

  • Aggregate functions can be used in both the SELECT and HAVING clauses (the HAVING clause is covered later in this chapter).
  • Aggregate functions cannotbe used in a WHERE clause. For example, the query in SQL Example 5.2 is wrong and will produce the Oracle ORA-00934group function is not allowed here error message.

/* SQL Example 5.2 */

SELECT *

FROM Employee

WHERE AVG(Salary) > 40000;

ERROR at line 3: ORA-00934: group function is not allowed here.

If you think about what an aggregate function does, and what the purpose of a WHERE clause is, then this error makes perfect sense. Remember, a WHERE clause includes or excludes rows from a result table based on user-defined criteria. The aggregate function then acts upon all rows or a subset of rows that satisfy the criteria specified by a WHERE clause. Since the WHERE clause must execute before the aggregate function takes effect, you cannot include an aggregate function in a WHERE clause. Later in this chapter you will learn how to use the HAVING clause to filter out rows with grouped data—after aggregate functions have been calculated.

Using the AVG Function

Suppose that managers need to determine the average salary of employees for the firm for budgeting or some similar purpose. You can use the AVG function to compute the average value for the Salary column in the employee table. For example, the query in SQL Example 5.3 returns the average of the employee salaries. The output column in the result table is formatted and renamed "Average Employee Salary."

/* SQL Example 5.3 */

COLUMN "Average Employee Salary" FORMAT $999,999;

SELECT AVG(Salary) "Average Employee Salary"

FROM Employee;

Average Employee Salary

------

$15,694

Now suppose that a similar, yet different management question is posed. What is the average salary offered to employees? This question asks you to incorporate the concept of computing the average of the distinct salaries paid by the organization. The same query with the DISTINCT keyword in the aggregate function returns a different average as shown in SQL Example 5.4.

/* SQL Example 5.4 */

SELECT AVG(DISTINCT Salary) "Average Employee Salary"

FROM Employee;

Average Employee Salary

------

$16,336

The difference between the two queries occurs because the DISTINCT keyword causes SQL to omit duplicate values from the processing. In SQL Example 5.3, there were two employee rows, each with a salary of $2,200 that were selected and used in computing the average displayed in the result table. In SQL Example 5.4, the salary of $2,200 or any other salary figure that is paid to more than one employee is only used once in computing the average DISTINCT salary. In both queries, NULL values stored in the Salary column are ignored.

Using the Sum (SUM) Function

Suppose that a senior manager needs to know the total salary being paid currently by the organization. This question can be answered by using the SUM function. This function can compute the total of a numeric table column. The SELECT statement shown in SQL Example 5.5 returns the requested total of the Salary column from the employee table.

/* SQL Example 5.5 */

COLUMN "Total Salary" FORMAT $999,999;

SELECT SUM(Salary) "Total Salary"

FROM Employee;

Total Salary

------

$345,265

If management is preparing a budget for various departments, you may be asked to write a query to compute the total salary for different departments. The query shown in SQL Example 5.6 computes the total Salary for employees assigned to department 8.

/* SQL Example 5.6 */

COLUMN "Total Salary Dept8" FORMAT $999,999;

SELECT SUM(Salary) "Total Salary Dept8"

FROM Employee

WHERE DepartmentNumber = 8;

Total Salary Dept 8

------

$45,020

Keep in mind that SQL is not case-sensitive with respect to keywords. Query 5.6 could also have been typed in lower case as is shown in SQL Example 5.7.

/* SQL Example 5.7 */

select sum(Salary) "Total Salary Dept8"

fromemployee

where DepartmentNumber = 8;

Regardless, the result table will be identical. This is just to remind you that keywords may be entered in either lowercase or uppercase, and this rule also applies to aggregate function names. By convention, we type keywords in uppercase to differentiate between keywords and column/table names.

Using the Minimum (MIN) and Maximum (MAX) Functions

At times, managers need to know which value in a column is the largest or smallest of all values. Questions such as what product was sold the most or least, which employee is paid the largest or smallest salary, and similar questions arise continually in business. SQL provides two aggregate functions to assist you in writing queries to answer these types of questions. The MIN function returns the lowest value stored in a data column. Similarly, the MAX function returns the largest value stored in a data column. However, unlike SUM and AVG, the MIN and MAX functions work with numeric, character, and date data columns.

SQL Example 5.8 gives a query that uses the MIN function to find the lowest value stored in the LastName column of the employee table. This is analogous to determining which employee's last name comes first in the alphabet. Conversely, MAX() lists the employee row where last name comes last (highest) in the alphabet.

/* SQL Example 5.8 */

SELECT MIN(LastName), MAX(LastName)

FROM Employee;

MIN(LASTNAME) MAX(LASTNAME)

------

Adams Zumwalt

More often you will use the MIN and MAX functions to manipulate numeric data columns. Let's return to the management question of what are the highest and lowest salaries paid to employees of the firm. The query shown in SQL Example 5.9 uses the MIN and MAX function to answer this question. Notice that the query does not provide the actual names of the employees with these salary values. We will learn later in this chapter how to combine aggregate functions with column names to answer such a question.

/* SQL Example 5.9 */

COLUMN "Highest Salary" FORMAT $999,999

COLUMN "Lowest Salary" FORMAT $999,999

SELECT MAX(Salary) "Highest Salary",MIN(Salary) "Lowest Salary"

FROM Employee;

Highest Salary Lowest Salary

------

32500 $2,200

SQL Example 5.9b uses MIN and MAX to display information from the DateHired column of the employee table. Notice that the MIN and MAX functions appear to work backwards, but if you consider the dates listed, clearly the date 14-Dec-79 is chronologically earlier (smaller) than the date 15-Oct-01. Unfortunately the query doesn't inform us as to which has worked longest for the firm or which employee is the most recent hire. You will learn to write such queries later in your studies.

/* SQL Example 5.9b */

COLUMN "Oldest Hire" FORMAT A11;

COLUMN "Newest Hire" FORMAT A11;

SELECT MAX(DateHired) "Newest Hire",MIN(DateHired) "Oldest Hire"

FROM Employee;

Newest Hire Oldest Hire

------

15-OCT-01 14-DEC-79

Using the COUNT Function

At the beginning of this chapter, you learned to use the COUNT(*) function to count the number of rows in a table. The COUNT function does essentially the same thing. The difference is that you can define a specific column to be counted. When the COUNT function processes a specified column, rows containing a NULL value in the named column are omitted from the count. Recall that a NULL value stands for "unknown," and that this should not be confused with a blank or a zero value. The query shown in SQL Example 5.10 counts the number of employees that are assigned a supervisor. Employees not assigned a supervisor will have a NULL value for the supervisor's identifying number column (SupervisorID).

/* SQL Example 5.10 */

SELECT COUNT(SupervisorID) "Number Supervised Employees"

FROM Employee;

Number Supervised Employees

------

23

In contrast, the COUNT(*) function in SQL Example 5.11 counts each employee row regardless of NULL values.

/* SQL Example 5.11 */

SELECT COUNT(*) "Number of Employees"

FROM Employee;

Number of Employees

------

24

USING GROUP BY WITH AGGREGATE FUNCTIONS

Now that you've gained familiarity with aggregate functions, you're ready to add power to your queries. Aggregate functions are normally used in conjunction with a GROUP BY clause. While it is possible to use the GROUP BY clause without aggregates, such a construction has very limited functionality, and could lead to a result table that is confusing or misleading. We will focus on using the GROUP BY clause with aggregate clauses.

When properly used, the GROUP BY clause enables you to use aggregate functions to answer more complex managerial questions such as:

  • What is the average salary of employees in each department?
  • How many employees work in each department?
  • How many employees are working on a particular project?

Review the queries posed in the previous section. You'll find queries that use aggregate functions to answer questions regarding a single department in the MadisonHospitaldatabase. However, if a manager requests information about the average salary for each department, you would need to write a separate query for each department using the approaches demonstrated thus far. Clearly there has to be a better way! SQL comes to the rescue with the GROUP BY clause and enables you to answer questions about each department with a single query.

The query in SQL Example 5.12 answers the managerial question, how many employees work for each department? The count produced by the COUNT(*) aggregate function is grouped by department based on the DepartmentNumber column value.

/* SQL Example 5.12 */

SELECTDepartmentNumber"Department", COUNT(*) "Employee Count"

FROM Employee

GROUP BY DepartmentNumber;

Department Employee Count

------

1 2

2 3

3 5

more rows will be displayed . . .

Further, Oracle provides considerable flexibility in specifying the GROUP BY clause. The column name used in a GROUP BY need not be listed in the SELECT clause; however, it must be a column name from one of the tables listed in the FROM clause. We can rewrite the query in SQL Example 5.12 without specifying the DepartmentNumber column as part of the result table, but as you can see from the result table for SQL Example 5.13, the results are rather cryptic without the DepartmentNumber column to identify the meaning of the aggregate count.

/* SQL Example 5.13 */

SELECT COUNT(*) "Employee Count"

FROM Employee

GROUP BY DepartmentNumber;

Employee Count

------

2

3

5

more rows will be displayed . . .

Note, however, that if your SELECT clause includes both column names and aggregate functions (as was the case with the query in SQL Example 5.12) then you must also have a GROUP BY clause in your query. Further, the column name(s) in the GROUP BY clause must match the column name(s) listed in the SELECT clause. Otherwise, Oracle will return error messages as shown below in SQL Example 5.14 and SQL Example 5.15.

/* SQL Example 5.14 */

SELECTDepartmentNumber"Department", COUNT(*) "Employee Count"

FROM Employee;

ERROR at line 1: ORA-00937: not a single-group group function

/* SQL Example 5.15 */

SELECTDepartmentNumber"Department", COUNT(*) "Employee Count"

FROM Employee

GROUP BY Gender;

ERROR at line 1: ORA-00979: not a GROUP BY expression

The GROUP BY clause does have some limitations. For example, you cannot use an aggregate function in a GROUP BY clause. Oracle will return the ORA-00934: group function not allowed here error message as demonstrated by SQL Example 5.16.

/* SQL Example 5.16 */

SELECT AVG(Salary), Salary * 1.25

FROM Employee

GROUP BY AVG(Salary);

ERROR at line 3: ORA-00934: group function is not allowed here

Using GROUP BY with Expressions

In addition to column names, any expression listed in a SELECT clause can also be used with a GROUP BY clause. Suppose one of our managers needs to know the average salary of employees for our organization. Further, the manager needs to know what the new average salary figures will be if all employees receive a 25% raise (a great business year for the firm)! SQL Example 5.17 produces the needed information by using a GROUP BY clause to group on the expression, Salary * 1.25, as opposed to a column name.

/* SQL Example 5.17 */