SI 654 Database Application Design

Winter 2003

School of Information

University of Michigan

Instructor: Dragomir Radev

Assignment #3 (100 points)

The purpose of this assignment is to give you hands-on experience with database administration, XML databases, and data mining.

Exercises

1. (40 points) In this exercise, you will be working with the Personal edition of Oracle. Please refer to Chapter 12 of the book for some helpful instructions. Submit as much documentation as needed to show that you have understood the assignment. In particular, submit any SQL code that you have written as well as the output of SELECT * for any table or view that you create. Also include any metadata query that shows what you’ve done. Refer to Table 12-22 for a description of the data dictionary that Oracle uses to store database metadata.

a. Follow the installation instructions that come up when you start the CD-ROM.
b. Create the three tables shown in Figure 12-8. Populate them with a few sample records.
c. Create relationships as indicated on pages 338-339.
d. Create indexes as indicated on pages 339-340.
e. Modify the table structure as shown on page 341.
f. Create the view shown on page 343.
g. (10 points extra credit) Create stored procedures and triggers as shown on pages 344-353. Briefly discuss what they do.

2. (10 points) In one page, describe how Oracle performs concurrency control. Refer to chapter 12.

3. (20 points) Imagine that you are working in the capacity of a consultant to an information processing company. Use the Web to answer the following two questions. For each question, turn in 2-3 pages of informative text.

a. What are the main standards/projects for representing and querying XML in databases? Give specific examples of software products that implement them.
b. What are some top data mining software packages and what are their characteristics (functionality, availability, hardware platform, price, etc.)?

4. (30 points) The table below is a very small database consisting of six records about the London stock market. Each record corresponds to a particular day; each field corresponds to a yes/no question. We are interested in predicting the value of the last field from the other four. (Note: the RID attribute is only a record identifier and should not be used as a predictor.)

RID / NY rises today? / Interest rate high? / Unemployment rate high? / It rose yesterday? / It rises today?
1 / Y / N / N / Y / Y
2 / N / Y / Y / Y / Y
3 / N / N / Y / N / Y
4 / N / Y / N / Y / N
5 / N / N / N / N / N
6 / N / Y / N / N / N

a. Induce a decision tree from the data in the table, using what you learned in class. Assume that all four predictor attributes as well as the target concept (“It rises today”) are Boolean (“Y” or “N”). Show all your entropy and information gain calculations.

b. Suppose a 7th record is inserted into the table, with the following values: (‘N’, ‘Y’, ‘Y’, ‘N’, NULL). Replace the “NULL” with the value predicted by your decision tree.
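
If you would like to check the hand calculations requested in part a, the following is a minimal Python sketch of the standard entropy and information gain formulas. The function names, the dictionary-based record layout, and the toy data are assumptions made for illustration; they are not from the book or the lecture notes, and the toy data is deliberately unrelated to the stock market table.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    base = entropy([row[target] for row in rows])
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


# Hypothetical toy records (not the assignment data).
toy = [
    {"windy": "Y", "humid": "Y", "play": "N"},
    {"windy": "Y", "humid": "N", "play": "N"},
    {"windy": "N", "humid": "Y", "play": "Y"},
    {"windy": "N", "humid": "N", "play": "Y"},
]

for attribute in ("windy", "humid"):
    print(attribute, information_gain(toy, attribute, "play"))
```

In this toy example, "windy" separates the two classes perfectly while "humid" does not separate them at all, so a tree builder that picks the attribute with the highest gain would split on "windy" first. Your submission should still show each calculation by hand.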

PLEASE start early! I will not accept late submissions except in medical emergencies.

And … HAVE FUN!