Topic: the Relational Database Model and Database Development Process

IST459: Notes: The Relational Database Model and Database Development Process¶

Table of Contents [Hide/Show]
Learning Unit 02 Notes - The Relational Database Model and Database Development Process
Learning Objectives
Part 1: The Relational Database Model
Letâ€™s start with a little history lessonâ€¦
In Theory... and in Practice
Part 2: Components of the Relational Model
Relations
Characteristics of Relations
Okayâ€¦what makes a table a relation?
Domains in theory and practice
Implementing Logical Domain in a DBMS
Keys
In practice, what makes a good primary key?
Null and Flags
Integrity Rules
Part 3: Database Development
Systems Development Lifecycle
Database Development Lifecycle
Various Systems Development Lifecycle Strategies
An alternative development model â€“ Prototyping

Topic: The Relational Database Model and Database Development Process

Table of Contents

Topic: The Relational Database Model and Database Development Process 1

Learning Objectives 2

Part 1: The Relational Database Model 2

Let’s start with a little history lesson… 2

In Theory... and in Practice 3

Part 2: Components of the Relational Model 3

Relations 3

Characteristics of Relations 3

Okay… But what makes a table a relation? 4

Domains in theory and practice 4

Implementing Logical Domain in a DBMS 5

Keys 6

In practice, what makes a good primary key? 7

Null and Flags 8

Integrity Rules 8

Part 3: Database Development 9

Systems Development Lifecycle 9

Database Development Lifecycle 10

Various Systems Development Lifecycle Strategies 11

An alternative development model - Prototyping 11

Learning Objectives¶

In this learning unit we will learn the about the relational database model and how it is put into practice on the modern DBMS. We will also explore the models and intricacies of database design and implementation. Some of these objectives will be covered in this document, others in the class lecture, assigned readings, and labs.

· Describe the Relational Model

· Define relational terms and understand the terminology in practice.

· Explain the System Development Life Cycle (SDLC)

· Explain the Database Life Cycle (DBLC)

· Explain how database development fits within the SDLC

· Compare and contrast various database SDLC strategies

Part 1: The Relational Database Model¶

The relational data model was originally conceived in a paper titled “A Relational Model for Large Shared Data Banks” by IBM’s E. F. Codd in 1970. This paper, and its popularity, started a trend towards commercial Relational database management systems (RDBMS) products. These applications were conceived during the “disco era” of the 70’s and early 80’s, and have grown into a $20B industry (Forrester research, 2005) with the major players being Oracle, IBM and Microsoft. Today, the RDBMS is the cornerstone of just about every business application and is used by a wide variety of market segments.

Let’s start with a little history lesson…¶

The relational data model or simply the relational model comes to us from the work of a number of mathematicians and computer researchers. An American mathematician, D.L. Childs, started the movement with his 1968 work the "Description of a Set-Theoretic Data Structures”, which used set theory as the basis for querying data. Childs’ work inspired E.F. “Ted” Codd, a researcher working at the IBM San Jose Research Laboratory to develop, what we now take for granted, the relational data model. In the 1970s Codd went on to build early database management system and query language prototypes predicated on a set-based model. Codd’s early work is documented in a landmark paper, “A Relational Model for Large Shared Data Banks” , where he describes the benefits of the relational model and its significance in managing large data structures. Codd identifies the following advantages of relational structures over the traditional data structures used by file access methods that were commonplace at the time.

1. Data Independence – “Provides a means of describing data with its natural structure only -- that is, without superimposing any additional structure for machine representation poses.”

2. Data Consistency – “A further advantage of the relational view is that it forms a sound basis for treating derivability, redundancy, and consistency of relations…”

3. Ease of Use – “… the relational view permits a clearer evaluation of the scope and logical limitations of present formatted data systems, and also the relative merits (from a logical standpoint) of competing representations of data within a single system.”

In Theory... and in Practice¶

It’s important to differentiate between actual relational theory and that which a relational database management system can accomplish in practice. Generally speaking, a DBMS can implement the elements of the relational model, but can also accomplish much, much more. Why? Flexibility. Not every task is suited to the rigidity of the relational model, and DBMS vendors are trying to make a business offering their customers the most bang for the buck.
Of course, all this flexibility comes at a price. The layperson often misinterprets understanding the intricacies of a software product, such as Microsoft Access, with the nuances of good relational design when in reality the two concepts are mutually exclusive. This allows the layperson to “shoot themselves in the foot” so to speak, and create a poor database design containing redundant data. So the bottom line is if you want to design efficient databases with minimal redundancy, learning the relational model will give you far more mileage than learning a particular DBMS.

FUDGE’S “MORE MILEAGE” STATEMENT:
IN THIS CLASS, WE WILL EMPHASIZE RELATIONAL DATABASE DESIGN OVER THE SPECIFICIS OF A PARTICULAR DBMS APPLICATION. WE WILL ALWAYS DIFFERENTIATE AMONG ELEMENTS OF THE RELATIONAL MODEL AND HOW THOSE ELEMENTS ARE TYPICALLY IMPLEMENTED BY VARIOUS DBMS APPLICATIONS, SUCH AS ORACLE OR MS SQL SERVER.

Part 2: Components of the Relational Model¶

Relations¶

The relational model focuses on a logical representation of data, rather than a physical one. This means that in an implementation, the programmer can focus on what rather than on how. For example, if you were to write a program to collect information from a webpage and save it into a text file, you would be responsible for the data management tasks, such as how the data gets saved in the file. If you were placing the same form data in a relational DBMS; you only need to tell it what to add to the database, and how to add it.
The relational model is based on set theory - you know {c, a, b, s} ∩ {b, a, d, s} = {a, b, s}. There are three main components to the relational model:

1. A structural component - sets of relations, which are tables of data, consisting of rows (tuples) and columns (attributes).

2. A manipulative component of operations which act upon the relations.

3. A set of integrity rules for maintaining integrity within the database.

Characteristics of Relations¶

At the heart of the relational model is the relation, better known in its implementation as a table. Think of a table as a 2 dimensional structure of common data. Each row in the table corresponds to a specific entity while each column corresponds to a particular domain. (The difference between a relation and a table will be discussed below, but for now, you may assume they are synonymous.) All relations:

· Have a unique name among the set of all relations in the database.

· Are 2 dimensional structures consisting of rows and columns of data. The specific order of the rows and columns is irrelevant.

· Each column (or attribute) must have a unique name (within the scope of the relation, not the database).

· Each column (or attribute) is drawn from a domain, or set of possible values from which the actual values are taken.

· Have a column or set of columns which uniquely identify each row in the table.

Okay… But what makes a table a relation?¶

What makes a table a relation? The last mentioned characteristic does - the fact that a column or set of columns must uniquely identify each row. In a DBMS such as MS SQL Server you can create a table where no column (or group of columns) uniquely identifies each row, but if you do, technically it won’t be a relation. Why is this a requirement and why should it matter? Remember, the relational model based on set theory, and all operations on relations are performed over sets of data. In set theory the order of the elements in a set does not matter, so the only means you have to retrieve a specific row is if you can find something which will uniquely identify it. For example, in the figure below, you can’t delete the 2nd Emmitt Smith because there is no way to differentiate it from the 1st row containing the same data.

So, is there some way we can force a table to be a relation, for example, so that we would not be able to add Emmit Smith twice? Yes, we can do this with meta-data that we add to the table itself. It will be discussed in the sections below.

Domains in theory and practice¶

The concept of domain, or set of possible values from which the actual values are taken, can be difficult to implement in practice. Sure, for ordinal values such as GPA domain easy to implement (decimals between 0.000 and 4.000, for example). But what about a column of employee first names? Now despite all the crazy names that are out there, most humans can easily differentiate among what is and isn’t a last name, such as when an address like “One Park Place” is inadvertently added to the name column. But how can we make the computer understand what is and isn’t a name? It’s a problem with a non-trivial solution.
For this reason, we separate domain into two categories. The current definition of domain is called logical domain, or the set of acceptable values. And the DBMS also implements physical domain representing the type of data acceptable in the columns. Physical domain has no place in relational theory, but is required in the DBMS because of how computers sort and compare data encoded in different forms. For example “December” comes before “January” when you compare them as text strings, but when you consider them as dates, January comes before December. To further complicate the issue, each DBMS implementation addresses physical domain, also known as data types! For example to store a number in the DBMS IBM DB2 you could use the data type INTEGER, but in Microsoft SQL server it would be int. Yes, they’re similar, but just different enough to make your life difficult!

Implementing Logical Domain in a DBMS¶

Let’s take a moment to refresh what we’ve learned so far:

· Tables are collections of similar entities. Tables are organized into rows and columns. (cars, for example)

· The entities are stored in the rows of the table and represent single instances of data (for example, my car, or your car)

· An attribute describes one particular facet of that entity. (my car’s color is green, for example)

· A column in the table represents the domain of data for an entity’s attribute ( the car’s color, or mpg rating)

· Domain is represented as both physical and logical domain. Physical domain represents the type of data in the column, logical domain represents the acceptable values. (For example the physical domain of a car’s color would be varchar(20) (in the SQL server DBMS), but the logical domain would be colors. With just physical domain you could enter “Mike” as a color, which would be unacceptable)

To implement logical domain we use constraints. Constraints are rules and conditions that must be met before data can be added or updated. There are techniques you can use to implement logical domain over DBMS tables. These include:

· Default Value Constraint - data used for an attribute when one isn’t specified.

· Check Constraint - an expression which must evaluate to “true” prior to insert or update

· Unique Constraint - a condition that forbids duplicate values are in a column or group of columns.

· Lookup table - a separate table containing all of the acceptable values for a given column. The lookup table is implemented using primary key and foreign key constraints.

PHYSICAL AND LOGICAL DOMAINS ARE METADATA - THEY HELP US DESCRIBE, DEFINE, AND STRUCURE OUR DATA, AS WELL AS HELP ENFORCE BUSINESS RULES.

Keys¶

If Yogi Berra was well versed in the relational model, I have no doubt he would say something like “Keys are the key to understanding relational databases. Yes, it’s true keys are the key!”
A key is a special kind of constraint on an attribute or set of attributes which can be used to lookup other attributes. Keys are used to look-up a set of rows in a table, or find one specific row in a table, and find rows in one table based on the attributes of another table. As you will see, the true power of the relational model comes from keys. Here are the types of keys found in relational databases:

· Super Key – Any combination of attributes which can be used to uniquely identify a row in the table.

· Candidate Key The smallest number of attributes required to uniquely identify each row in a table. For example, in an employee table the name, address, and home phone could be a super key, but the employee social-security number would be a candidate key, because it’s minimal. In practice, single-attribute candidate keys should be enforced with a unique constraint (after all you wouldn’t want two different employee rows with the same social security number, right?)

· Primary Key - A candidate key which has been chosen by the database designer to uniquely identify each row in the table. The primary key cannot contain null (empty) values, and is implemented as a constraint by the DBMS.

· Surrogate Key - A primary key whose values are automatically generated by the DBMS there is no need on the user’s part to enter a value for primary key - the system does it for you. Surrogate keys can be an auto-incremented integer, or a GUID or UUID (Read the RFC). Some DBMSs such as Postgres and Oracle permit the database designer to create global surrogate keys, called sequences.