Section Title Page 45
UNIT 2: Fundamentals of Relational and Object Model
SCOPE
Context
The invention of the relational database is largely attributed to Edward Codd who presented the idea in a seminal paper written in 1970. The relational database has an underlying mathematical basis which has been important in its development and acceptance. The aim of this unit is to explain the basic principles underlying the Relational and Object Model of database systems. The first prototypes of relational systems were created in the early seventies, and the first commercial relational system appeared on the market in the early eighties. It has now become a de facto standard for data storage and manipulation. Currently all of the major suppliers of database systems, such as Oracle, IBM, Sybase, Informix etc. base their products on the relational model, however, they support object model as well.
The importance of Object model can not be ignored due to emergence of large and very complex data. The concepts concerning what constitutes a relational database system are well understood and the robustness of the relational model due to the widespread adoption over the last 15 years by the commercial world. However, the concepts and technology that constitute an object model arestill emerging and are considered a priority in the database research domain.
In this unit, we will illustrate the structural aspects of the relational and object models, that is, the way they are used to organise data. We show how the concept of relation and object can be used to represent the information in a database. Then, we briefly discuss the techniques for the representation of incomplete information and we examine the integrity constraints, which allow the specification of additional properties that must be satisfied by the database. The unit also discusses the algebraic operations that are core to the relational and object database query.
Relationship to Other Units
The concept of relational and object model underpins most of the major database systems in use today including the database applications on the Web. Therefore, a clear understanding of the principles described in this unit is fundamental to all of the following units. Unit 3 describes the techniques of modelling on the basis of the concepts described in this unit. Unit 4 on Structured Query Language (SQL), the language used in real systems for the definition, updating and querying of relational databases completes the topic on relational and object model.
Learning Outcomes
Having successfully completed this unit you should be able to:
· Analyse the structure of the relational model and illustrate basic concepts of the relational model, such as table, relation, attribute, null value, primary key and foreign key.
· Apply integrity constraints and referential constraints to retain business data quality.
· Explain the operators used in Relational Algebra and apply Relational Algebra to express queries on Relational databases.
· Illustrate the concepts of objects, identifier, abstract data type, methods, class and inheritance in the context of object oriented database management
· Define the operators used in object oriented database query
Required Study Time
You should plan to spend approximately 18 hours studying this unit. You may find it convenient to break up your study as follows:
Activity / TimePreparation (Introduction and On-line Planning): / 0.5 hrs
Disk-based Content: / 4 hrs
Application: / 4 hrs
Set textbook Content: / 4 hrs
Thinking (On-line discussions, Review questions): / 4 hrs
Tutorial Work: / 1.5 hrs
Related Course Work: / 0 hrs
Total: / 18 Hours
Equipment/Software Required
· A Web browser – for browsing Web sites and Web-based database applications. Internet Explorer 5.0 is recommended.
· Word Processor, i.e. Microsoft Word
Learning Journal
You will be expected to keep a learning journal throughout this module. It will help, if you keep a record of new/difficult programming, unusual rules and lessons learnt from the activities. You can refer to your Learning Journal at any point.
Reading Materials
· Connolly, T.M., and Begg, C.E., Database Systems: A Practical Approach to Design, Implementation and Management, Addison-Wesley, 4th Edition, ISBN: 0321210255;
CONTENT
2.1 Structure of the Relational Model
The relational model was proposed by E. F. Codd in 1970. It deals with database management from an abstract point of view. The model provides specifications of an abstract database management system. The precision is derived from solid theoretical foundation that includes predicate calculus and theory of relations. To use the database management systems based on the relational model, however, users do not need to master the theoretical foundations. Codd defined the model as consisting of the following three components:
- Data Structure - a collection of data structure types for building the database.
- Data Manipulation - a collection of operators that may be used to retrieve, derive or modify data stored in the data structures.
- Data Integrity - a collection of rules that implicitly or explicitly define a consistent database state or changes of states. We discuss the three components.
Data Structure
Often the information that an organisation wishes to store in a computer and process is complex and unstructured. For example, we may know that a school in a university has 200 students, most are full-time with an average age of 22 years, and most are female. Since natural language is not a good language for machine processing, the information must be structured for efficient processing. In the relational model the information is structured in a very simple way.
The beauty of the relational model is its simplicity of structure. Its fundamental property is that all information about the entities and their attributes as well as the relationships is presented to the user as tables (called relations) and nothing but tables. The rows of the tables may be considered as records and the columns as fields. Each row therefore consists of an entity occurrence or a relationship occurrence. Each column refers to an attribute. The model is called relational and not tabular because tables are a lower level of abstractions than the mathematical concept of relation. Tables give the impression that positions of rows and columns are important. In the relational model, it is assumed that no ordering of rows and columns is defined.
We consider the following database to illustrate the basic concepts of the relational data model.
A student relation is represented by student id, student name and student address and a module relation can be represented by its id, name and school’s name. To establish a link in between these two relations we can create another relation, called enrolment, to show the students who have taken the modules.
The above database could be mapped into the following relational schema which consists of three relation schemes. Each relation scheme presents the structure of a relation by specifying its name and the names of its attributes enclosed in parenthesis. Often the primary key of a relation is marked by underlining.
student(student_id, student_name, address)
enrolment(student_id, module_id)
module(module_id, module_name, school)
An example of a database based on the above relational model is:
Student_id / Student_name / Address9912311
2045332
9812121
9923433
9967576
2011189
9972133 / Julie Macdonald
David Cohn
Julie Marshall
Ahmad Khan
Majeed Naeem
Scott William
Tony Stockman / 22, Platt Hall
22, Platt Hall
33, Usher Hall
21, Sasson Hall
21, Sasson Hall
56, Sasson Hall
127, Writtle House
Table 1. The relation student
student_id / Module_id9912311
2045332
9967576
9812121
9967576
9972133 / BIS2020
BIS2020
BIS2030
ACC2010
STX2000
MUS2000
Table 2: The relation enrolment
module_id / module_name / SchoolBIS2020
BIS2030
MUS2000
STX2000
ACC2010 / Database Management
System Analysis
Jazz Performance II
Applied Statistics
Accounting for Bus. / Computing
Computing
Art
Business
Business
Table 3: The relation module
We list a number of properties of relations:
- Each relation contains only one record type.
- Each relation has a fixed number of columns that are explicitly named. Each attribute name within a relation is unique.
- No two rows in a relation are the same.
- Each item or element in the relation is atomic, that is, in each row, every attribute has only one value that cannot be decomposed and therefore no repeating groups are allowed.
- Rows have no ordering associated with them.
- Columns have no ordering associated with them (although most commercially available systems do).
The above properties are simple and based on practical considerations. The first property ensures that only one type of information is stored in each relation. The second property involves naming each column uniquely. This has several benefits. The names can be chosen to convey what each column is and the names enable one to distinguish between the column and its domain. Furthermore, the names are much easier to remember than the position of each column if the number of columns is large.
The third property of not having duplicate rows appears obvious but is not always accepted by all users and designers of DBMS. The property is essential since no sensible context free meaning can be assigned to a number of rows that are exactly the same.
The next property requires that each element in each relation be atomic that cannot be decomposed into smaller pieces. In the relation model, the only composite or compound type (data that can be decomposed into smaller pieces) is a relation. This simplicity of structure leads to relatively simple query and manipulative languages.
A relation is a set of tuples. As in other types of sets, there is no ordering associated with the tuples in a relation. Just like the second property, the ordering of the rows should not be significant since in a large database no user should be expected to have any understanding of the positions of the rows of a relation. Also, in some situations, the DBMS should be permitted to reorganise the data and change row orderings if that is necessary for efficiency reasons. The columns are identified by their names. The tuples on the other hand are identified by their contents, possibly the attributes that are unique for each tuple.
The relation is a set of tuples and is closely related to the concept of relation in mathematics. Each row in a relation may be viewed as an assertion. For example, the relation student asserts that a student by the name of Majed Naeem has student_id 9967576 and lives at 21, Sasson Hall. Similarly the relation module asserts that one of the modules offered by the School of Computing is BIS2020 Database Management.
Each row also may be viewed as a point in a n-dimensional space (assuming n attributes). For example, the relation enrolment may be considered a 2-dimensional space with one dimension being the student_id and the other being module_id. Each tuple may then be looked at as a point in this 2-dimensional space.
In the relational model, a relation is the only compound data structure since relation does not allow repeating groups or pointers.
We now define the relational terminology:
Relation / essentially a tableTuple / a row in the relation
Attribute / a column in the relation
Degree of a relation / number of attributes in the relation
Cardinality of a relation / number of tuples in the relation (N-ary relations- a relation with degree N)
Domain / a set of values that an attribute is permitted to take. Same domain may be used by a number of different attributes
Primary key / as discussed each relation must have an attribute (or a set of attributes) that uniquely identifies each tuple
Candidate key
Each such attribute (or a set of attributes) is called a candidate key of the relation if it satisfies the following properties:
· the attribute or the set of attributes uniquely identifies each tuple in the relation (called uniqueness), and
· if the key is a set of attributes then no subset of these attributes has property (a) (called minimality).
There may be several distinct set of attributes that may serve as candidate keys. One of the candidate keys is arbitrarily chosen as the primary key of the relation.
The primary key of the relation student is student_id, of relation enrolment is (student_id, module_id), and finally the primary key of relation module is module_id. The relation student probably has another candidate key. If we can assume the names to be unique than the student_name is a candidate key. If the names are not unique but the names and address together are unique, then the two attributes (student_id, address) is a candidate key. Note that both student_id and (student_id, address) cannot be candidate keys, only one has the ability to be a candidate key. Similarly, for the relation module, module_name would be a candidate key if the module names are unique.
Foreign key
In the relational model, information about entities and relationships are represented in the same way i.e. by relations. Since the structure of all information is the same, the same operators may be applied to them. We should note that not all relationship types need to be represented by separate relations. In many cases, the relationship can be established by propagating keys. Key propagation involves placing the primary key of one entity within the relation of another entity. However, many-to-many (m:m) relationships do require a separate relation to represent the association.
This can be illustrated further by the following example:
Let the relationship occupies be many-to-one, that is, one staff may occupy only one room but a room may be occupied by more than one staff. Let the attributes of the entity staff be staff number ( s_num), staff name ( name) and status. Attributes of the entity room are room number ( r_num), capacity, building. occupies has no attributes. s_num and r_num are the primary keys of the two entities staff and room.
One way to represent the above database is to have the following three relations:
staff(s_num, name, status)
occupies(s_num, r_num)
room(r_num, capacity, building)
There is of course another possibility. Since the relationship occupies is many-to-one, that is, each staff occupies only one room, we can represent the relationship by including the primary key of relation room in the entity staff. This is propagation of the key of the entity room to entity staff. We then obtain the following database: