Normalization Example
University Enrollment Database
When thinking about the logical normalization process we first look at all of the data required to accomplish a task.
Now consider the following object (data file or table), named ENROL, that contains the data fields (attributes) required to enroll the student in a class. (Note: this object contains all data fields whereas the above examples illustrated only the first four fields)
ENROLL (Course_Code, Course_Description, Student_Number, Student_Name, Address, City, Zipcode, Course_level, Course_Start_Date, Course_End_Date, Class_Room_Number, Building_Number, Building_Name, Building_Address, Lecturer_Number, Lecturer_Name, Department_Code, Department_Name)
Consider the following (part of a) database:
EnrollCourse Code / Course Description / Student Number / Student Name
503 / DBMS / 00001
00003
00005 / Rick
Smith
Jones
540 / MIS / 00002
00003
00004 / Fred
Smith
Sterling
Each course code can have any number of students in it, so the students' information makes a group what is called a repeating group. Data cannot be stored or processed in a database when it is in this form. What we must have is one record containing all the data for each student who is enrolled in a class. There can be no "gaps" in the data when stored in a file.
The following table (data file) illustrates the data in First Normal Form (1NF)
EnrollCourse Code / Course Description / Student Number / Student Name
503 / DBMS / 00001 / Rick
503 / DBMS / 00003 / Smith
503 / DBMS / 00005 / Jones
540 / MIS / 00002 / Fred
540 / MIS / 00003 / Smith
540 / MIS / 00004 / Sterling
Converting to 1NF basically requires that we "flatten" the database table given above so that each row (record) contains no repeating groups. Only one value per field can be entered and no "gaps" exist in the data.
A relation is said to be in 1NF if there are no repeating groups of attributes/columns.
It is simply a collection of data fields necessary to complete the job of enrolling the student in class, with each record in the file containing all data necessary data for the enrollment.
The problem with 1NF is that there is redundancy with respect to entering all of the data into a computer for each and every course in which you enroll. For example, your name, address, etc., will have to be entered for each course that you take. If you take four courses, your name will have to be entered four times. Developing a logical method of avoiding the entry of your name four times leads to the definition of what is called Second Normal Form (2NF).
We must next introduce the concept of a "KEY" field. A key field is the one that is used to uniquely identify each record in a data file.
For example, the Student_Number field can be used to uniquely identify each student's record in a student table. However, since one student may be enrolled in more than one class each quarter, the Student_Number field alone is not sufficient to uniquely identify each record in the ENROL file illustrated above. The combination of the Student_Number field and the Course_Code field forms a unique combination and can therefore be considered as the key field for the ENROLL table.
A relation is in 2NF if, and only if, it is in 1NF and every non-key attribute (field) is fully functionally dependent upon the key field. This means that all data attributes (fields) that are not used to uniquely identify records or rows in a table should not be entered into the database more than once. Any non-identifying data fields should be placed into separate objects (files). For example, we could remove the name, address, etc. columns into a relation named STUDENT and remove them from the ENROLL object. The result will give two relations (tables):
STUDENT(Student_Number, Student_Name, Address, City, State, Zipcode)
ENROLL (Student_Number, Course_Code, Course_Description, Course_level, Course_Start_Date, Course_End_Date, Class_Room_Number, Building_Number, Building_Name, Building_Address, Lecturer_Number, Lecturer_Name, Department_Code, Department_Name)
Here we see that the student name, address, etc., are functionally dependent upon the student number in the STUDENT file, and that the course description, start date, building name, etc., are functionally dependent upon the Student Number and the Course Code in the ENROLL file.
The relation between these objects (files) is said to be in 2NF. The relation is the logical linkage between the files so that all data necessary to enroll students in courses is available and may be uniquely retrieved when necessary.
While getting the data files into 2NF is better than 1NF, there are still some problems with the form. For example, if the location of the course (class room) changes buildings, all records in the ENROLL file for that course will have to be updated. The building name and address are "transitively dependent" upon the building number. Resolving the "transitive dependency" leads us to Third Normal Form (3NF).
A relation is in 3NF if, and only if, it is in 2NF and no non-key fields are transitively dependent upon the key field(s). That is, one non-key attribute cannot be functionally dependent upon another non-key field.
Our example is clearly not in 3NF since the building name (non-key field) depends upon the building number (non-key field). The relation can be resolved into 3NF by dividing it into component relations, each meeting 3NF form.
Not only this, even the class description, start time, and start date are transitively dependent upon the class code, which is not considered a key field here because it forms only part of the key field for the ENROL table. We can also recognize that Lecturer name is functionally dependent upon the Lecturer code, which is not a key field. The building code and Lecturer code fields are not key fields because they are not used to uniquely identify each record in the ENROLL file.
ENROLL (Student_Number, Course_Code)
BUILDING (Building_Number, Building_Name, Building_Address)
COURSE (Course_Code, Course_Description, Course_level, Course_Start_Date, Course_End_Date, Class_Room_Number, Building_Number, Lecturer_Code)
LECTURER (Lecturer_Code, Lecturer_Name, Department_Code, Department_Name)
PROGRAM (Program_Code, Program_Description)
STUDENT(Student_Number, Student_Name, Address, City, State, Zipcode)
Note also that the LECTURER relation is not in 3NF since the Department Name is transitively dependent upon the Department Code. We resolve this into:
LECTURER (Lecturer_Code, Lecturer_Name, Department_Code)
DEPARTMENT (Department_Code, Department_Name, Dept_Head)
This exercise illustrates that you must consider ALL relationships within the organization’s database and resolves ALL relations into 3NF. This can take some time and effort. A very important point here is that no data may be lost during the normalization process. We must always be able to reconstruct the original data after the normalization. To lose data will cause problems and will be the result of an invalid normalization process.