CIS 211C iLearning text notes associated with text slides.
Database Design
Section 6
6.1.4 A composite UID is a set of columns that you define which makes the data in your table unique.When you define more than one column as your primary key on a table, it is called a composite primary key. composite primary keys are very important when designing a data model with integrity.
6.1.5 There are three types of primary keys. A simple UID is a primary key that consists of a column that uniquely identifies an entity, for instances a person's Social Security number or a vehicle identification number (VIN). An artificial key is a column created for an entity to serve solely as the primary key and is visible to users. Examples of this would include student number, customer number or invoice number.
6.1.10 In an entity, there may be more than one attribute that could identify the row of data uniquely. However, in those instances, only one can be selected as the Primary UID and the others would be Secondary UIDs. In this instance where several attributes could be used to identify a record uniquely, we call these candidate UIDs, because each could be a candidate for becoming a Primary UID in and of themselves.
6.2.5 The last topic for this week’s discussion is normalization. Although we’ll look at three levels of normalization, and we call them first normal form, second normal form, and third normal form, this week we’ll look at the first normal form. The purposes of normalization are: to ensure that data is stored in only one location in the database; to increase data integrity; and, to identify missing tables, columns, and constraints in the original model. First normal form states that all attributes must be single-valued or that there be no multivalued attributes. In order for a data model to be in first normal form, this rule has to be followed. To validate first normal form, we check that each attribute has a single value for each occurrence of the entity. This means that no attribute should have repeating values. Sometimes when a database is first modeled, the first normal form is automatic. But, sometimes there are attributes that violate first normal form. We need to decide what to do with those. In this example, the classroom attribute of SCHOOL BUILDING has multiple values. Therefore, to put the entity in first normal form, a separate entity CLASSROOM is created that has a one-to-many relationship with SCHOOL BUILDING and has its own attributes. Let’s examine additional examples.
6.2.6
The examples on this slide are not in first normal form. In Example 1 there is a CLIENT entity that has event as an attribute. With this model the CLIENT could request more than one event, therefore violating first normal form. In example 2, the SHOPPING MALL entity has attributes labelled name and floor for the shops. There is usually more than one shop in a SHOPPING MALL demonstrating that this entity has repeating groups of name and floor. Finally, in example 3, the player attribute of the TEAM entity is a repeating group since a TEAM should have more than one player.
6.2.7 To correct the ER diagrams in the previous slide, we can see that in example 1, an EVENT entity is created and the attribute event is removed from the CLIENT entity. Additionally, a one-to-many relationship is created between the two entities. Likewise, in example 2, the SHOPPING MALL entity now has a one-to-many relationship with the new STORE entity, which has the attributes name and floor. Finally, example 3 is now in first normal form because a new entity PLAYER was created with its own attributes and the player attribute was removed from the TEAM entity. A one-to-many relationship was created to show that a TEAM may have one or more PLAYERs.
6.3.5
In the process of efficiently storing data, and eliminating redundancy, tables in a database are designed and created to be in one of five possible normal forms. Each normal form contains and enforces the rule of the previous form, and, in turn, applies some stricter rules on the design of tables. A set of tables in a database are initially said to be in zero normal form. Tables in first normal form have a primary key, no single column has multiple values and the columns that do not make up the primary key (non-key attributes) depend on the primary key. Now, second normal form is when the tables meet the criteria of first normal form AND, if the primary key is a composite of attributes (columns), the non-key attributes (columns) must depend on the whole key. Note that any table with a primary key that is composed of a single attribute (column) is automatically in second normal form.
6.3.6 Let’s review what we know about unique identifiers. First of all, every instance of an entity must be identified by a unique value. In an ERD, we use the pound or hash symbol to commonly identify the attribute that is the unique identifier. All unique identifiers must be unique and not null. In most cases a number is typically assigned as a unique identifier. In some cases unique identifiers are composites, this means that they are composed of more than one attribute, or by a combination of attributes in relationships, or even by relationships themselves. In this example, a barred unique id is used to show that the unique ID for the ACCOUNT entity is the combination of the ACCOUNT number and BANK number. Because no part of the UID may ever be null, the bar is placed on the solid line near the ACCOUNT entity.
6.3.7 The ACCOUNT side of this ERD shows a second normal form violation. In this case we have a relationship between ACCOUNT and BANK. Each ACCOUNT has a number, balance and date opened as attributes. And, the attribute ‘location’ is included with the ACCOUNT. However, the location is actually an attribute of BANK. In this case the attribute location is not dependent on the ACCOUNT number because the location is only relevant to the BANK. The location really belongs to the BANK id and is dependent upon what BANK the account is held in. By moving this attribute over to the BANK entity, we fix this second normal form violation.
6.3.9
Examine the intersection entity that is created between EVENT and SONG. Because each event can have more than one SONG, and each SONG can be played at more than one EVENT, this is a many-to-many relationship. The relationship was resolved by adding an intersection entity labelled PLAY LIST ITEM. We can identify each PLAY LIST ITEM uniquely by knowing at what EVENT it will be played and what SONG that’s being played. Therefore, we put a small bar at the many side of this relationship to show that the unique identifier for PLAY LIST ITEM is actually the EVENT id and the SONG id combined together. ()Because no part of the unique identifier may ever be null, the bars are placed on the solid lines. Both of these relationships are mandatory at the many side. You will never find an UID bar on a dashed line. Notice the second normal form violation, and how it is corrected, with the event date attribute being moved into the EVENT entity, and the duration attribute being moved into the SONG entity.
6.4.4 Let’s look at third normal form. Third normal form states that a non-UID attribute cannot be dependent on another non-UID attribute. Let’s examine what that means by looking at a violation. Look at the entity CD. CD has the attribute labelled id marked as the unique identifier. CD also has the attributes title, producer and year. Additional attributes provide information on the store from which the CD was purchased, the store name and address. The third normal form violation occurs with the attribute ‘store name’ and ‘store address’ because these attributes are not dependent upon the CD’s id attribute. Rather, they are dependent upon the store, and therefore are better modeled as attributes of a new entity STORE.
6.4.5 Although the CD entity is in 1NF and 2NF, it violates 3NF. To correct this violation, we place store name and store address as attributes under a new STORE entity. We can give STORE its own UID attribute of number and we can build a relationship between CD and STORE. The many-to-one relationship would then read as each CD may be in one and only one STORE, and each STORE may have one or more CDs.() We need to say one more thing about normalization before we move on. The three normal forms are progressive. What this means is you cannot have a database in second normal form unless it is already in first normal form. And, the database can only be in third normal form if it is already in first and second normal form. Some database designs include a fourth and a fifth normal form, but our training stops at third normal form.
Section 7
7.1.5 An arc is a diagramming method used to model two or more mutually exclusive relationships from the same entity. The arc is drawn between the two relationships connecting both of them and demonstrating mutual exclusivity. The arc indicates that any instance of that entity can have only one valid relationship of the relationship in the arc at, any one time. The arc can be drawn between two or more relationships. An arc is sometimes known as an exclusive arc. In programming we call this ‘x-or’ or ‘exclusive or’ type of situation. In this example, the BIILLBOARD can feature only one of the following items at a time: a MOVIE, a PRODUCT ADVERTISEMENT, or a PUBLIC ANNOUNCEMENT. To denote this on the ER model, we use the arc and circles at the junctions to indicate mutual exclusivity.
7.1.8 Let’s examine this example of an arc relationship. This example models a mailing LIST for an email account. The mailing LIST may contain individuals, and it may also contain groups. Each LIST item is either referring to an individual USER or a LIST. It may not refer to both. Notice that the arc is drawn between two relationships. The relationship in between the arc, the relationship that says each LIST item must be contained in one and only one LIST, does not participate in the arc. This is an independent relationship. The two relationships that participate in the arc are shown by the presence of a circle intersecting the relationships. This page is primarily used to demonstrate how to draw the arc relationship.
7.1.9 This example shows the relationship between a MEMBERSHIP and either a COMPANY or a CUSTOMER. Let’s read the sentences that describe the relationship. In this case, watch the way the relationship is read. Each MEMBERSHIP may be either held by one and only one COMPANY, or held by one and only one CUSTOMER. The relationship is normal from COMPANY to MEMBERSHIP. Each COMPANY may be the holder of one or more MEMBERSHIPs and also from CUSTOMER. Each CUSTOMER may be the holder of one or more MEMBERSHIPs. A MEMBERSHIP may not be held by both a COMPANY and a CUSTOMER. This is the mutual exclusivity that is modeled by the arc. Arcs may be drawn for optional relationships and for mandatory relationships. But, the two relationships participating in the arc must be of the same cardinality.
7.1.11 Arcs are similar to supertypes and subtypes and can sometimes be modeled as such. The difference is that we use supertypes and subtypes when we want to represent classifications or types of things, that have some things in common and some things different. We use arcs to represent mutually exclusive relationships between entities.
7.2.5 It is not uncommon that an entity has a barred relationship to another entity that has a barred relationship to a third entity and so on. This is called a cascading composite UID. Look carefully at this example. The highest level of this hierarchy is BUILDING. The BUILDING has an id, name, and address. A FLOOR is represented by a floor number in a particular building, but since every BUILDING has a first floor, the FLOOR number is not enough to be a unique identifier. It must be associated with a particular BUILDING. A SUITE is then identified by its number, the FLOOR number containing the SUITE and the BUILDING id. In sequence, the ROOM is identified by its id, the SUITE number, the FLOOR number, and the BUILDING id. All four of those attributes are part of the unique identifier for the room.
7.2.6 Next we will review two methods for modeling a hierarchical relationship. The first method is through a strict hierarchy, which shows levels of organization. The second method is called a recursive relationship, which shows a single entity related to itself. This circular relationship is sometimes known in data modeling circles as a ‘pig’s ear.’ The example on the left shows a typical corporate hierarchy. The company president or chief executive officer is at the top of the hierarchy. With directors in the next level and so forth until we find the workers that are at the bottom of the hierarchy. This hierarchy is modeled through an ER diagram showing different levels with many-to-one relationships. More commonly you will see this model inverted with the most volatile entity on the top, in this case ENTRY-LEVEL EMPLOYEE. And the least volatile entity, the entity with the fewest instances at the bottom, in this case PRESIDENT. Notice that this is a chain of many-to-one relationships.
7.2.7 A recursive relationship is often used to model employee relationships. Examine this recursive relationship with PARTNER. Notice PARTNERs have their own attributes: id, first name, and last name. The subtypes are all PARTNERs. In this case each PARTNER (as EVENT PLANNER or DJ) may be managed by one and only one MANAGER and going the other way around, each PARTNER (with the subtype MANAGER) may be the manager of one or more PARTNERs (such as EVENT PLANNER or DJ). Recursive relationships must be optional on both sides. Notice that the line is dashed from one end to the other. You might be asked why. Well, let’s think what would happen to the first and last employees.() If the MANAGER of the event must be supervised by someone, that doesn’t make sense. Or, if a DJ at the bottom of the seniority level must supervise somewhere else, that can’t be true also. The recursive relationship is optional at both ends because someone has to be at the top of the relationship. And someone else has to be at the bottom. In other words, all PARTNERs will have a manager_id except the MANAGER. So, if the entity is PARTNER and the relationship is ‘supervised by’ or ‘supervises’ then the MANAGER will not have a MANAGER. Also, all PARTNERs will not necessarily be MANAGERs, especially those at the bottom of the hierarchy.
7.2.8 This example shows a recursive relationship that is a many-to-many type of relationship. This relationship is common when modeling COMPONENTs. For example, in the automotive industry we could think of a fan as being a COMPONENT. A fan has parts such as, blades, bolts, and pulleys. So the fan would be made up of those other parts. But, a fan is also a COMPONENT or part of an engine. Because this is a many-to-many relationship, we need to add an intersection entity called ASSEMBLY RULE to successfully resolve this many-to-many relationship.
7.3.4 Modeling data over time requires special consideration. When you have attributes that change over time you have to question whether there is a need to keep historical data. Sometimes dates may be modeled as attributes and other times dates must be modeled as separate entities with a relationship to the entity in question. We’ll examine some examples of this. Some considerations must be made when time and dates are being modeled. If dates are stored as attributes, then every update means a loss of historical information. If you choose to model time as separate entities, your model can become more complex. It’s often difficult to join tables together when time conditions are modeled. But, an advantage is that users can work in advance to model a database that can be implemented at a future time.
7.3.5 This example discusses the weekly salary attribute for the EMPLOYEE entity, which would be overwritten when updated. However, with the new guideline that historical salary data be maintained, another entity for the historical salary must be created to hold that information.
7.3.6 To model the historical salary data, the SALARY HISTORY entity is created, and a relationship is made between EMPLOYEE and SALARY HISTORY. Notice that the salary start date is mandatory, and it is in fact part of the unique id, and the salary end date is optional. This allows the database to contain SALARY HISTORYs that are not yet completed. The unique identifier for SALARY HISTORY is the barred relationship to EMPLOYEE; that is, the UID is the EMPLOYEE id and the SALARY HISTORY salary start date since the date and employee id are tied together and may change over time as a new salary is entered.
7.3.7
This example details the way to model historical lending. In this case a JEWELRY PIECE is named and has a code as a unique identifier. It also has attributes of who is renting the piece at any given moment, but not who has rented it, the historical data. Also, a JEWELRY PIECE may have one or more MOVIE STARs who rent the piece depending upon what time is being examined. Therefore, the model must be revised to a many-to-many relationship.