Business Database Systems Name:

PROFESSOR CHEN

ENTITY AND NORMALIZATION

I.ENTITY

An entity is something about which we store data. It may be tangible object such as employee, a part, a customer, a machine tool, or an office. It may be intangible, such as a job title, a profit center, an association, a financial allowance, a purchase, an estimate, or an insurance claim.

In analyzing information we study the entities of the enterprise in question. A typical corporation has several hundred entity types. Its set of entity types does not change much as time goes by unless the corporation moves into a fundamentally different type of business.

An entity has various attributes which we wish to record, such as size, value, date, color, usage code, address, quality, performance code, and so on. Often in data processing we are concerned with a collection of similar entities, such as employees, and we wish to record information about the same attributes of each of them.

An information specialist commonly maintains a record about each entity, and a data item in each record relates to each attributes. Similar records are grouped into files. The result is a two-dimensional array, or sometime is referred to as a flat file.

II.NORMALIZATION

The use of a good DBMS does not, by itself, give us the protection we need. We also need good logical design of the data structures used. Unless controlled, system analysts tend to design records that group together any collection of data items which they perceive as being useful. All manners of anomalies can arise because of inappropriate grouping of data items (i.e., fields or attributes).

The term normalized implies that the data has a correct logical structure representing the inherent properties of the data so that the data can be used for multiple types of applications.

If the data in a database is not correctly normalized, a new application may not be able to use the data in the database. It would be too expensive to restructure the data because that would require the rewriting of applications that use the data. In this situation a new file may be built for the application. Many new applications have the same problem, so many new files are built and the database concept breaks down. Correctly normalized data, then are desirable for the concept of shared database to work.

The data model, if correctly designed, is likely to be stable even though the procedures that use the model change frequently. When data is consolidated in database systems, data modeling is the key to success. The data structures become more complex, but the data flows are greatly simplified.

Until recently, database specialists have lacked a comprehensive technique for logical database design. As a result, database design has historically been an intuitive and often haphazard process.

How can we provide a foundation for logical database design?

The term normalization of data refers to the way data items are grouped together into record structures. Third normal form (3NF) (or fourth normal form, 4NF) is a grouping of data designed to avoid the anomalies and problems that can occur with data.

Data exists in real life as groups of data items. It exists on invoices, weigh bills, tax forms, driving licenses, and so on. These groupings are usually not in a normalized form. Not surprisingly, systems analysts have often implemented computer records that are also not normalized. However, data that is not normalized can lead to various subtle problems in the future.

Experience has shown that when computer data is organized in 3NF (or 4NF), the resulting data structures are more stable and able to accommodate change. Each attribute relates to its own entity and is not mixed up with attributes relating to different entities. The actions that create and update data can then be applied with simple structured design to one normalized record at a time.

Reacting to the perceived benefits from normal forms, some corporations have incorporated into their database standards manuals the requirement that all database structures be designed in third (or fourth) normal form. The physical implementation may occasionally deviate from 3NF (or 4NF) if the trade-off is fully explored and documented.

Usually, normalized data is better in terms of machine requirements as well as in logical structuring, but this is not always the case. Sometimes the physical designer (e.g., database administrator) finds it desirable to deviate from 3NF (or 4NF). A compromise is then needed. Which is preferable: somewhat better machine performance, or better protection from maintenance cost? Usually, the potential maintenance costs are much the more expensive.

In summary, the basic ideas of the normalization or data are simple, but the ramifications are many and subtle, and vary from one type of database usage to another. It is important to note that normalization describes the logical representation of data, not the physical. There are multiple ways of implementing it physically.

Remove Separate nonkey Remove

repeating attributes that are Transitive

Unnormalized group dependent on partial key Dependency

Relation ------> 1NF ------> 2NF ------> 3NF

(unstable) (unstable) (less stable) (stable)

Entity and Normalization - 1

The Normalization of Data

------

<----| Unnormalized Data |

| | (Def: records with repeating groups) |

| ------

| | 1. Decompose all nonflat data structures

| | into two-dimensional records.

| |

| V

| ------

| | First Normal Form (1NF) |

| | (Def: records with no repeating groups) |

| ------

| | 2. For records whose keys have more than one data item,

| | ensure that all other data items are dependent on the

| | whole key. Spilt the records, if necessary, to achieve this.

| |

| V

| ------

| | Second Normal Form (2NF) |

| | (Def: all nonkey attributes fully functionally |

| | dependent on the entire primary key) |

| ------

| | 3. Remove all the transitive dependencies,

| | splitting the record, if necessary, to achieve this.

| |

| V

| ------

| | Third Normal Form (3NF) |

| | (Def: all nonkey attributes fully functionally dependent |

| | on the entire primary key and nontransitively |

| | dependent on each candidate key) |

| ------

| | 4. Remove any conditional dependencies and/or

| | separate multi-valued dependencies, splitting

| | the records, if necessary, to achieve this

| V

| ------

| | Fourth Normal Form (4NF) |

| | (Def: a minor variant of 3NF, which is often ignored) |

| |------

|

| 5. Make every determinant is a candidate key

| ------

V | Boyce-Codd Normal Form (BCNF) |

--->| (Def: 1NF and if and only if every determinant is a candidate key |

------

(Every attribute in a record is dependent on the key, the whole key, and

nothing but the key.)

Note that a determinant is any attribute (simple or composite) on which some other attribute is fully functionally dependent.

Entity and Normalization - 1