Problem 1 12 : Conceptual Schema Design

Final Exam

COSC 3480

Tu., December 13, 11a

Your Name:

Your SSN:

Problem 1 [12]: Conceptual Schema Design

Problem 2 [12]: RAID and Buffer Management

Problem 3 [18]: Physical Database Design

Problem 4 [16]: Database Programming

Problem 5 [13]: XML

Problem 6 [6]: Normalization

Problem 7 [15]: B+-Trees

S [92]:

Grade:

The exam is “open books and notes” and you have 125 minutes to complete the exam; it counts approx. 30% towards the overall grade.

1) Conceptual Schema Design [12]

Design an Entity-Relationship Diagram that models the following objects and relationships in the world of baseball (MLB): teams, players, games, owners and contracts for a single season. Each team has a unique team name, and a city it plays in. Each person being part of the baseball world has a unique ssn and a name and a birth date. Additionally, for players their position and height are of interest. Players under contract with a team receive a monthly salary for their services, and teams have at least 9 and at most 30 players under contract; some players are unemployed and players can only play for at most one team. Each team has between one and five owners, and persons are not allowed to more than one team. Players can be owners. A game involves a home-team and visiting-team; additionally, the day of the game, the starting time of the game and the score of the game are of importance; teams might play each other multiple times (even on the same day, but the starting time for each game is different for multiple games played on the same day).

Indicate the cardinalities for each relationship type; assign roles (role names) to each relationship if there are ambiguities! Use sub-types, if helpful to express constraints!

2) RAID and Buffer Management [12]

a) Explain what data striping is and why it is useful to reduce disk access time [3]

b) Some RAID architectures use redundancy and parity schemes to enhance the reliability of a disk array. Give an example that illustrates how such an approach works! [5]

c) The buffer manager maintains a pin count for each frame in the buffer pool. What purpose(s) does the pin count serve for the buffer manager? What role does it play in the block manager’s decision making?[4]

3) Physical Database Design [18]

Assume a relation R(A, B, C) is given; R is stored as an unordered file and contains 1000000 (1 million) tuples. Attributes A, B and C need 4 byte of storage each, and blocks have a size of 4000 Byte. Furthermore, you can assume that if static hashing is used blocks are 80% full and do not contain any overflow pages. Each A value occurs at an average 20 times in the database, each B value occurs 100 times in the database, and each C value occurs 100000 times in the database. Moreover, we assume that static hashing is also used to implement index structures, and that index pointers require 4 byte of storage. What additional index structures would you create to speed up the following 3 SQL statements (value is a placeholder for an integer constant)?

S1: Delete S2: Select A S3: Delete

from R from R from R

where A=value where B=value where B=value

and C=value and C=value;

deletes 2 tuples deletes 10 tuples deletes 100 tuples

Describe which index structures you would create trying to minimize the sum of the cost of executing the three sql-statements; indicate for each SQL-statement how you would implement it and its cost (in number of block accesses including the formula you used to compute the cost).

4) Database Programming [16]

The following relational schema is given: employee(ssn, name, education-level, age), works-for(employee, employer, salary) and company(C#, c-name, location, sector) with ssn being used as a foreign key for employee in works-for, and C# being used as a foreign key for employer in works-for. You can assume that persons are employed at most once.

Create a table X(number_of_employees, avg-salary_paid), or X(n_e, a_s_p) for short, that gives the average salary that is paid by companies that have a certain number of employees. For example, if the database contains (one or multiple) companies that employ 10, 20, 50, and 10000 employees, X would contain 4 tuples in this case, specifying the average salary of persons working for companies with 10 employees, for companies with 20 employees, for companies with 50 employees and for companies with 10000 employees. If you prefer, you can write a sequence of SQL-statement, instead of a single query, to solve this problem.

5) XML [13]

a) One focus of XML is being portable. How does XML achieve portability? [3]

b) Why are domain specific DTD important for many businesses? [3]

c) Transform the E/R schema into an equivalent XML-DTD document definition. Also preserve the cardinality constraints in your mapping (if possible). Assume that location is a multi-valued attribute that gives the cities in which the company has offices [7].

(2,*)

(1,1)

6) Normalization [6]

Assume a relation R(A,B,C) is given with AàC, CàA. Is R in BCNF? If not transform R into a relational schema that is in BCNF and which does not have any lost functional dependencies!

7) B+-trees [15]

a) Assume a relation R(A,B,C) is stored as a B+-tree of height 4 (root, 2 intermediate layer, leaf) using attribute A as the search key with p=200 and m=100. Moreover, one node of the B+-tree is stored in one block on the disk. Let us assume the following SQL query is given:

SELECT B,C

FROM R

WHERE A>8888 OR A<444

which returns 500 answers (200 with A-values less than 444 and 300 with A-values greater than 8888). How would you implement the query? Be specific! Also compute the number of block access your chosen implementation needs. [8]

b) Assume that the following B+-tree with p=5 and k=4 is given. Furthermore, assume that the keys 88, 42, 43, 55, 40 are deleted in the indicated order. Show how the tree looks like after each deletion. [7]