Revision of Key Database Concepts
(Part 1 of 3)
Why Database Systems?
Data
One of the most important resources in ALL organisations.
Without information, hence data, how could we:
- Control manufacturing processes?
- Process sales of goods?
- Diagnose patients’ illnesses?
- Forecast sales?
- Run the University of Teesside?
Information
To be useful, information must be:
- Accurate
- Timely
- Relevant
Therefore, you need adequate facilities for:
- Storing (and verifying) data
- Manipulating data
- Extracting data
Traditional Approach
From the earliest days computers were used to store files of information.
Separate systems, ie separate files and programs, were developed for each application, eg payroll files, personnel files, accounts files, etc.
Problems
- Inconsistency
- Redundancy
- Lack of integration and control
Solution?
The Database Approach
Instead of having separate files for separate applications, data are organised into a single set of underlying files from which the applications draw the data that are relevant to them.
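The idea of several applications drawing on one shared set of data can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module; the employee table, its columns and the two "application" functions are hypothetical names chosen for the example and are not part of these notes.

```python
import sqlite3

# Hypothetical shared store: one employee table used by every application.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    "id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL)"
)
conn.execute("INSERT INTO employee VALUES (1, 'A. Smith', 'Sales', 21000.0)")


def payroll_rows(conn):
    """Payroll application: draws only the name and salary."""
    return conn.execute(
        "SELECT name, salary FROM employee ORDER BY id"
    ).fetchall()


def personnel_rows(conn):
    """Personnel application: draws only the name and department."""
    return conn.execute(
        "SELECT name, dept FROM employee ORDER BY id"
    ).fetchall()


# One update to the shared data is seen by both applications,
# so the two cannot drift out of step as separate files could.
conn.execute("UPDATE employee SET name = 'A. Jones' WHERE id = 1")
payroll = payroll_rows(conn)
personnel = personnel_rows(conn)
```

Because the name is stored once, there is no second copy to forget to update, which is the redundancy and inconsistency problem of the traditional approach.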
What is a Database?
“A shared collection of logically related data, and a description of this data, designed to meet the information needs of an organisation.”
‘Database Systems’ by Connolly & Begg
Addison-Wesley, ISBN 0-201-70857-4
“A database system can be thought of as a computerised record-keeping system. Such a system involves the data itself (stored in the database), hardware, software and – most important! – users.
… Databases are integrated and (usually) shared; they are used to store persistent data.”
‘An Introduction to Database Systems’ by
C J Date, Addison-Wesley, ISBN 0-201-38590-2
(7th Edition, 2000)
What is a Database Management System (DBMS)?
“A software system that enables users to define, create, maintain, and control access to the database.”
- Provides the interface between the user and the data in the database.
- Allocates storage to data and maintains indices so that any required data can be retrieved.
- Protects data against unauthorised access.
- Safeguards data against corruption.
- Provides recovery and restart facilities after a hardware or software failure.
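Two of the points above, safeguarding data against corruption and recovering from a failure part-way through an operation, can be illustrated with a small sketch. It assumes SQLite via Python's sqlite3 module; the account table and the transfer function are hypothetical, and a failed constraint stands in for a real hardware or software failure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint lets the DBMS reject invalid (negative) balances.
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, balance REAL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES (1, 100.0)")
conn.execute("INSERT INTO account VALUES (2, 50.0)")
conn.commit()


def transfer(amount):
    """Move money from account 1 to account 2 as one transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = 2",
                (amount,),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = 1",
                (amount,),
            )
        return True
    except sqlite3.IntegrityError:
        return False


# Account 1 cannot cover 500, so the second update fails and the
# first update is undone rather than left half-applied.
ok = transfer(500.0)
balances = conn.execute(
    "SELECT id, balance FROM account ORDER BY id"
).fetchall()
```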
Advantages of the Database Approach
- No unnecessary duplication of data.
- Greater consistency of data.
- Wider availability of data.
- Greater flexibility of use of data.
- Improved data integrity.
- Improved security.
- Improved backup and recovery services.
- Can change the data structure without altering associated programs.
- A database is dynamic: it can grow and change.
- Data management can be more consistent and systematic.
The Three Level ANSI-SPARC Architecture
In 1975 the ANSI Standards Planning and Requirements Committee (ANSI-SPARC) proposed a standard terminology and general architecture for database systems. The objective is to separate each user’s view of the database from the way it is physically represented.
3 levels or views of data within a database:
- External Level
The users’ view of the database. Also known as the applications view.
- Conceptual Level
The overall view of the database. Also known as the global view.
- Internal Level
The physical representation of the database on the computer. Also known as the storage view.
Schemas
The overall description of the database is called the database schema. There are 3 different types of schema in the database.
External Schema
There are multiple external schemas (or subschemas), each one corresponding to a different view of the data.
Conceptual Schema
There is one conceptual schema, which describes the data stored in the database, the relationships and the integrity constraints.
Internal Schema
There is one internal schema, which describes how the data are stored in the database and how they are accessed.
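In a relational DBMS, one common way to realise the three kinds of schema is to treat base tables as the conceptual schema and SQL views as external schemas, with the internal schema left to the DBMS. The sketch below assumes SQLite via Python's sqlite3 module; the staff table and the two views are hypothetical names used only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual schema (hypothetical): one base table with integrity constraints.
conn.execute("""
    CREATE TABLE staff (
        staff_no INTEGER PRIMARY KEY,
        name     TEXT NOT NULL,
        salary   REAL CHECK (salary >= 0),
        branch   TEXT
    )
""")
conn.executemany(
    "INSERT INTO staff VALUES (?, ?, ?, ?)",
    [(1, "Ann", 30000.0, "B1"), (2, "Bob", 18000.0, "B2")],
)

# External schemas: each view exposes only what one group of users needs.
conn.execute("CREATE VIEW payroll_staff AS SELECT staff_no, salary FROM staff")
conn.execute("CREATE VIEW directory AS SELECT name, branch FROM staff")

payroll = conn.execute(
    "SELECT * FROM payroll_staff ORDER BY staff_no"
).fetchall()
directory = conn.execute(
    "SELECT * FROM directory ORDER BY name"
).fetchall()

# Internal schema: how rows are laid out on disk is decided by SQLite
# itself and is not visible in any of the statements above.
```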
Mapping
Provides the translation between the schemas at different levels. The DBMS is responsible for mapping between the 3 types of schema.
The DBMS must ensure that each external schema is derivable from the conceptual schema.
The DBMS must use the information in the conceptual schema to map between each external schema and the internal schema.
Data Independence
A major objective for the 3-level architecture is to provide data independence, ie upper levels must be unaffected by changes to lower levels.
There are 2 kinds of data independence:
Logical data independence refers to the immunity of the external schemas to changes in the conceptual schema.
It should be possible to alter tables, columns or relationships without having to alter existing external schemas or rewrite application programs (other than those that are directly affected).
Physical data independence refers to the immunity of the conceptual schema to changes in the internal schema.
It should be possible to alter file organisations, storage devices, indexes, etc, without having to alter the conceptual or external schemas.
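Both kinds of data independence can be demonstrated on a small scale. The sketch below assumes SQLite via Python's sqlite3 module, with a hypothetical staff table and directory view: adding an index stands for an internal-level change, and adding a column stands for a conceptual-level change. In both cases the query written against the view is left untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE staff (staff_no INTEGER PRIMARY KEY, name TEXT, branch TEXT)"
)
conn.executemany(
    "INSERT INTO staff VALUES (?, ?, ?)",
    [(1, "Ann", "B1"), (2, "Bob", "B2")],
)
conn.execute("CREATE VIEW directory AS SELECT name, branch FROM staff")


def read_directory():
    """Application query written against the external view only."""
    return conn.execute(
        "SELECT name, branch FROM directory ORDER BY name"
    ).fetchall()


before = read_directory()

# Physical data independence: an index changes how data are accessed,
# not what the conceptual or external schemas look like.
conn.execute("CREATE INDEX idx_staff_branch ON staff (branch)")
after_index = read_directory()

# Logical data independence: a new column in the base table does not
# disturb a view (or application) that never mentions it.
conn.execute("ALTER TABLE staff ADD COLUMN email TEXT")
after_column = read_directory()
```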
The System Catalogue
The database schema is defined using a special language called a Data Definition Language (DDL).
The result of the compilation of the DDL statements is a set of tables stored in special files collectively called the system catalogue.
This is a repository of meta-data (data about data), ie information describing the data in the database, typically containing the name, description, source and usage information for each data item.
The system catalogue is also known as the data dictionary or the data directory.
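Most DBMSs let you query the catalogue like any other data. As one concrete example, SQLite keeps its schema in a table called sqlite_master and reports column details through PRAGMA table_info. The sketch below uses Python's sqlite3 module with a hypothetical staff table; note that SQLite's catalogue is fairly minimal and does not hold the description, source or usage information that a fuller data dictionary might.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE staff (staff_no INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)

# sqlite_master is SQLite's catalogue table: one row per table, index,
# view or trigger, including the DDL text that created it.
catalogue = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY name"
).fetchall()

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = [
    (row[1], row[2]) for row in conn.execute("PRAGMA table_info(staff)")
]
```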
Database Languages
- A Data Definition Language (DDL) is used to define the structure of the data in the database (the schema).
- A Data Manipulation Language (DML) is used to access the data.
- A Data Control Language (DCL) is used to control access to the data.
Some databases have a combined DDL, DML and DCL (often called a Query Language), eg SQL.
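The three roles can be seen in ordinary SQL statements. The sketch below runs the DDL and DML parts against SQLite using Python's sqlite3 module; the product table is hypothetical. SQLite has no user accounts and so does not implement GRANT or REVOKE, so the DCL statement is shown only as text, with a hypothetical user name, to indicate what it would look like in a multi-user DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the structure of the data.
conn.execute(
    "CREATE TABLE product (code TEXT PRIMARY KEY, price REAL NOT NULL)"
)

# DML: insert, update, delete and retrieve the data.
conn.execute("INSERT INTO product VALUES ('P1', 9.99)")
conn.execute("INSERT INTO product VALUES ('P2', 4.50)")
conn.execute("UPDATE product SET price = 5.00 WHERE code = 'P2'")
conn.execute("DELETE FROM product WHERE code = 'P1'")
remaining = conn.execute("SELECT code, price FROM product").fetchall()

# DCL: control who may access the data. Not executed here because
# SQLite does not support it; 'clerk' is a hypothetical user.
dcl_example = "GRANT SELECT ON product TO clerk"
```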
Types of Database
5 main logical structures (in terms of how data are organised, stored and manipulated):
1. Hierarchical
2. Network
3. Relational
4. Object-oriented
5. Object-relational