1

MIS 3500 DBMS * Instructor: Bob Travica

Lab on Advanced Querying and Other Uses of SQL

Updated 2018

The purpose of this lab is to complete the topic of querying and provide you with some methods that could be useful for developing your term project application. Specifically, you willexplorethese topics:

- Writing Outer Joins

-Writing complex queries that use other queries and checking query accuracy

-Writing Crosstab Query

-Writing SQL statements for modifying data (Update Query)

-Writing SQL statements for deleting data (DELETE Query)

-Writing SQL statements for inserting data into another table (INSERT Query)

To go through this lab you need to use the Sally’s Pet Store 2007 system. You can

download it here.Its database is different from the 2010 version we used in other occasions in the Animal line of products (there are animal orders, sales, etc.); take a look the schema before you proceed to querying.

1. Outer Joins

INNER JOIN displays just the rows matched between two tables, that is, the rows that have the same values in columns for the primary key and its matching foreign key. In contrast, OUTER JOIN displays all the rows from one table as well as the matched rows from another table. You can think of INNER JOIN as symmetric and OUTER JON as asymmetric. A useful consequence of OUTER JOIN is that it can uncover rows that do not match on the values of the primary and the foreign key.

Consider the question: What animals have not been sold? In the class, it was queried by using the INNER JOIN combined with NOT IN and IS Null on the tables Animal and Sales. (Recall that the table Animal stores the data on the animals inventory without deleting data when an animal is sold – the archival purpose.)

Here is such a query using the Boolean operator NOT with the close IN and applied to PetStore 2007:

SELECT Animal.AnimalID FROM Animal WHERE Animal.AnimalID NOT IN (SELECT SaleAnimal.AnimalID FROM SaleAnimal);

Output: 7 rows of column AnimalID, values:4, 12, 19, 25, 34, 88, 181.

To simplify the query by avoiding the query-subquery format, the user can call up the outer join:

SELECT Animal.AnimalID, SaleAnimal.AnimalID

FROM Animal LEFT JOIN SaleAnimal ON Animal.AnimalID=SaleAnimal.AnimalID

ORDER BY SaleAnimal.AnimalID, Animal.AnimalID Asc;

The left join outputs all the rows from the “left table” (the one cited first in the SQL statement) and the matching rows from the other table. The unmatched values of the primary key (AnimalID) from the left table will be associated with null values of the foreign key (AnimalID) in the other table (blank cells or somehow else designated). Note that these rows and blank FK cells do not really exist in table SaleAnimal; they are just created in the RAM for the sake of outputting results of the LEFT JOIN query. Check the table SaleAnimal (open it and then sort on SaleAnimal.AnimalID as shown in Figure 1).

Figure 1. Result of LEFT JOIN

Therefore, the left join will output all rows from the table cited before the command LEFT JOIN and all the matched from the joined table, along with blank rows not containing FK values.

The following modification of the query above will give you just the unmatched rows, those in which the value of AnimalID is null:

SELECT Animal.AnimalID As [Unsold Animals]

FROM Animal LEFT JOIN SaleAnimal ON Animal.AnimalID=SaleAnimal.AnimalID

WHERE SaleAnimal.AnimalID IS null;

Since there is LEFT JOIN, there must be RIGHT JOIN as well. Think of the question: Which customers have not made any purchase? To answer this question, you need to retrieve the tables Sale and Customer. Again, you can use the NOT IN option. Or you can take the perspective of the Sale table and ask which Customer IDs exist in the Customer table but do not appear in the Sale table. A right join comes handy in this case:

SELECT Sale.CustomerID, SaleDate, Customer.CustomerID

FROM Sale RIGHT JOIN Customer ON Customer.CustomerID=Sale.CustomerID

ORDER BY Sale.CustomerID, Customer.CustomerID, SaleDate Asc ;

The right join will output all the rows from the table cited after the command RIGHT JOIN (Customer) and all the rowsthat are matched between the two tables on the values of Customer ID. See Figure 2.

Note that SaleDate is included just as an additional detail proving that a sale did or did not occur.

Figure 2. Result of RIGHT JOIN

Notice also that the same output could be obtained by simply reversing the order of tables in the FROM line and using a LEFT JOIN instead. RIGHT JOIN is provided for convenience, but it is equivalent to LEFT JOIN when the order of tables cited is reversed.

2. Complex Queries and Checks

Complex queries do not need necessarily to involve a large number of tables. A complex query is also one that involves a bigger number of processing steps needed to create an output.

Consider this standard business question: What is the total value of animal orders, including shipping costs, placed in February?

If you try to write a single query for answering this question, it may not be that easy to get a correct result. In particular, due to the 1:M relationship between AnimalOrder and AnimalOrderItem, shipping costs could be calculated wrongly for each animal appearing on an order rather than for the entire order (check the Pet Store schema). More on this problem in queries 4-6 below.

Fortunately, you can always simplify query writing by breaking a question into pieces, and addressing these by separate queries. At the end, you either automatically or manually put together the results of these individual, simpler quires. Therefore, to answer the question above, you can write two simpler queries—one to calculate shipping costs, and the other toget the value of animals ordered. Follow the procedure below and try it out.

1. Write the query:

SELECT SUM(AnimalOrder.ShippingCost) AS [Total Shipping Costs]

FROM AnimalOrder

WHERE Month(AnimalOrder.OrderDate)=2;

Output = $894.92. Save this query as QueryTotalShipping. (Use File/Save As/Save Object As.)

2. Write the query:

SELECT SUM(AnimalOrderItem.Cost) AS [Total Animal Orders]

FROM AnimalOrder

INNER JOIN AnimalOrderItem ON AnimalOrderItem.OrderID=AnimalOrder.OrderID

WHERE Month(AnimalOrder.OrderDate)=2 ;

Output = $5,475.33. Save this query as QueryTotalAnimalsPurchased.

3. Therefore, you get the total of animal orders and shipping costs by adding outputs from steps1 and 2: $894.92+$5,475.33= $6,370.25. This total could be retrieved by the third query:

SELECT SUM([QueryTotalShipping].[Total Shipping Costs] + [QueryTotalAnimalsPurchased].[Total Animal Orders]) AS [Animals & Shipping Total]

FROM QueryTotalShipping, QueryTotalAnimalsPurchased;

Output: $6,370.25.

Save this query as QueryTotalAnimal-Purchases-Shipping. When it is run, this query will trigger the two associated queries and deliver the end result.

Note: The FROM line resembles the older SQL syntax, and there is no PK-FK relationship.

However,if you try to write the query in one step without exercising caution, you may get inflated totals as with the following queries:

4.

SELECT SUM(AnimalOrderItem.Cost + AnimalOrder.ShippingCost) AS [Sum Animal Orders]

FROM AnimalOrder

INNER JOIN AnimalOrderItem ON AnimalOrderItem.OrderID=AnimalOrder.OrderID WHERE Month(AnimalOrder.OrderDate)=2;

The output: $8,867.99.

5. Same with:

SELECT sum(AnimalOrderItem.Cost)+sum(AnimalOrder.ShippingCost) AS TotalCosts

FROM AnimalOrderItem INNER JOIN AnimalOrderON

AnimalOrderItem.OrderID=AnimalOrder.OrderID

WHERE Month(AnimalOrder.OrderDate)=2;

6. And with this one; here you can see the inflated shipping costs:

SELECT SUM(AnimalOrderItem.Cost) as TotalValueOfAnimalOrders, SUM(AnimalOrder.ShippingCost) AS TotalShippingCosts

FROM AnimalOrderItem INNER JOIN AnimalOrder ON AnimalOrderItem.OrderID=AnimalOrder.OrderID

WHERE Month(AnimalOrder.OrderDate)=2;

In queries 4-6, shipping costs are inflated since they are associated with each animal that appears on an order, rather than with an order as a whole.

2.1 Exercises

Here are some query exercises. If you need help, check Solutions at the bottom of this file.

1. Find totalsales per customer for any (chosen) month.Hint: a month can be specified by a month function - Month(Sale.SaleDate). Think of the parameterquery. Also, pay attention to the GROUP BY line or to the associated error messages: it must list the attributes used for grouping records.

2. Which merchandise items with more than 100 units on hand have not been sold? Hint: Use an OUTER JOIN to answer the question.

Output: 2 items

3. Which animal orders contained more than one animal?

Output: 22 items

3. Self-Referencing a Table in a Query

Sometimes even a query of a single table can be tricky. Consider the common business question: Who manages whom? Or specifically in the Pert Store business, which employees are managed by manager Bill Gibson?

The problem with this query is that there are no separate tables on employees and managers. This would be waste of space because managers are also employees that have all the employee data. Therefore, the background class diagram is of the type unary, containing just the class Employee. In the corresponding Employee table, each employee has the attribute ManagerID, which takes values of the key attribute EmployeeID (ManagerID in a particular record = EmployeeID in some other record). Since every employee has some manager, with the exception of the top manager, the column ManagerID will be populated in all but 1 record (the employee who is a top executive).

The goal, therefore, is to match each employee with their manager, which on the level of attributes involves associating an EmployeeID with a ManagerID (another employee). The value of ManagerID functionally depends on the value of EmployeeID. To answer “who manages whom”, you need to self-reference the table, or build a reflexive join. In other words, youjoin the table with itself (actually, with its copy). This is done by calling this table once by its real name and then by a made-up name (an alias). The aliased table is a copy of this table that the system makes dynamically (in the main memory) and joins it to table Employee. Consider this query to find out who is managed by manager Bill Gibson:

SELECT Employee.EmployeeID, Employee.LastName, Employee.FirstName,

Employee.ManagerID

FROM Employee AS Manager

INNER JOIN Employee ON Manager.EmployeeID=Employee.ManagerID

WHERE (Manager.LastName="Gibson") AND (Manager.FirstName="Bill");

Output:

EmployeeID / LastName / FirstName
5 / James / Leisha
6 / Eaton / Anissa
7 / Farris / Dustin
9 / O'Connor / Jessica
10 / Shields / Howard

Analysis: Focus on the table Employee. We can figure out from a brief inspection of the Employee table the underlying business rule: each manager can manage many employees, and each employee has only one manager. Therefore, Manager and Employee are in the 1:M relationship. This implies that the same ManagerID repeats many times in table Employee where it is matched with different values of EmployeeID. We want to pick out employee rows where name attributes are Bill Gibson, but we do not know what hisEmployeeID is.

The figure below shows how the system simulates a transient (temporary) table Manager that is in 1:M relationship with the existing table Employee. Assuming that Gibson has no manager, you can see that Gibson is the manager of the other two employees as Gibson’s ID is associated with them. We need table Manager to find Gibson on the name attributes. The tricky part is to figure the names of matching columns (PK, FK). Since EmployeeID remains PK of the new table Manager, the table’s key fully named is Manager.EmployeeID. In table Employee, the FK is ManagerID, and so its full name is Employee.ManagerID.

Manager
EmployeeID / LastName / FirstName
5 / Gibson / Bill
6 / Trump / …
7 / Putin / …
/ Employee
EmployeeID / LastName / ManagerID
5 / Gibson
6 / Trump / 5
7 / Putin / 5

4. Crosstab Query

Crosstab queries are useful for representing multidimensional data. For example, you may want to see total sales per animal category by each employee. This can be very useful for management purposes. One can get this output via a crosstab query. The crosstab query in this exercise can be useful for building reports atop of it (this will be addressed in the lab on system development).

To design a crosstab query, you first need to write a SELECT query. Then you use this query as the input for the crosstab query. A SELECT query can find total sales per animal category and employee, and then

a crosstab query regroups records, cross-tabulates them, and performs calculations.

SELECT Animal.Category, Employee.LastName, Sum(SaleAnimal.SalePrice) AS [Value Sold]

FROM Employee, Sale, SaleAnimal, Animal

WHERE Employee.EmployeeID=Sale.EmployeeID AND Sale.SaleID=SaleAnimal.SaleID AND SaleAnimal.AnimalID=Animal.AnimalID

GROUP BY Animal.Category, Employee.LastName;

Output, partial (43 rows in total):

Category / LastName / Value Sold
Bird / Carpenter / $379.88
Bird / Eaton / $153.07
Bird / Farris / $202.03
Bird / Gibson / $326.21
Bird / Hopkins / $310.82

Save this query under the title QueryCategoryEmployeeValue. Now you can build a crosstab query from this query by using this procedure:

1. In the main menu at the top, click Create

2. In the Create menu, click Query Wizard

3. In the New Query dialog box, click Crosstab Query Wizard and the OK button

4. In the Crosstab Query Wizard dialog box, check the Queries label in the View section

5. In the list of queries available, find your QueryCategoryEmployeeValue; click it and then click the Next button.

6. On the next screen, select the field Last Name for the row headings; click Next button

7. On the next screen, select Category for the column headings; click Next

8. On the next screen, click Value Sold (provided that you named this way the sum of the

SalePrice field in the QueryCategoryEmployeeValue). Then, select Sum function and

click Next.

10. On the next screen, accept the choice for naming the crosstab query – it should be
QueryCategoryEmployeeValue_Crosstab – and click the Finish button

Figure 3. shows the output and its SQL statement (note the new commands).

Figure 3. Crosstab Query

TRANSFORM Sum(QueryCategoryEmployeeValue.[Value Sold]) AS [SumOfValue Sold]

SELECT QueryCategoryEmployeeValue.[LastName], Sum(QueryCategoryEmployeeValue.[Value Sold]) AS [Total Of Value Sold]

FROM QueryCategoryEmployeeValue

GROUP BY QueryCategoryEmployeeValue.[LastName]

PIVOT QueryCategoryEmployeeValue.[Category];

5. Writing SQL statements for modifying data (Update Query)

Many tasks in developing and managing database systems can be performed with SQL queries. They cover all the CRUD operations you already know, just a bit differently named (Create=INSERT, Read=SELECT, and UPDATE and DELETE labeled as in “CRUD”). The queries performing other than retrieval tasks are sometimes called Action Queries. As the retrieval query is named after the command SELECT, action queries are named after theirkey command words; for example, UPDATE query.

The UPDATE query changes values of attributes in already existing records. To practice it, let us first add a record to the table Animal in the Sally’s Pet Store system. Open the table and start typing in the blank row at the bottom as shown below:

Animal
AnimalID / Name / Category / Breed / DateBorn / Gender / Registered / Color / ListPrice / Photo
202 / Fify / Dog / Mastiff / 2/2/2007 / Male / ACK / Brown / $100.00

Save the table. The new record with AnimalID 202 appears in the table (or some other number that comes after the last number of AnimalID in your version of Pet Store). Next, assume that you want to change the Registered attribute (it is mistyped) and Color (say, “Gold” is a more precise description). Instead of going directly to records and retyping these modified data, an UPDATE query can do the job:

UPDATE Animal

SET Color = "GOLD"

WHERE AnimalID=202;

If your Animal table is opened on the screen, you may need to click in the row you just updated to see the change. Save the table.

To practice this type of query more, make a copy of table Animal, and then work with it. Write an UPDATE query to set the Color to “Gold” for all animals and to increase the List Price for 5%. If you get stuck, check the solutions at the bottom of this file.While “goldening” all the animals is just for fun, you can easily see a business value of the massive adjustment of prices that the other part of this query does.

If you get stuck, check Solutions. After running a query, you may need (again) to click in the table and/or to close and re-open it in order to see the changes.

6. Writing SQL statements for deleting data (DELETE Query)

The following DELETE query will delete a record you want from table Animal:

DELETE

FROM Animal

WHERE AnimalID=[ ];

Note that rows can be deleted directly in a table: likewith Excel, right-click the first, unnamed column in a row, and select Delete Record in the pop-up menu.

As with the UPDATE query, real power of the DELETE query surfaces with processing many rows. For example, the query below will delete all the rows on spiders in your table Copy of Animal. But first, let’s find quickly what spiders are listed in the catalogue. You can use the filtering function for this.Filtering is like retrieving data but without using SQL or QBE statements; also, you work directly with rows in tables (or query outputs).

Try this:

1. Open table Copy of Animal

2. Find the first occurrence of the word “spider”; for example, right-click the