CS 426 - Advanced Database
CS 426 - Advanced Database
DBMS (Database Management System)
A database-management system (DBMS) is
- A collection of interrelated data and
- A set of programs to access those data.
- An environment that is both convenient and efficient to use
Database System Applications
Databases are widely used. Here are some representative applications:
Enterprise Information
•Sale
•Accounting
•Human resources
•Manufacturing
•Online Retailers
Banking and Finance
•Banking
•Credit Card Transactions
•Finance
Universities
Airlines
Telecommunications
Data Manipulation Language
A data-manipulation language (DML)is a language that enables users to access or manipulate data as organized by the appropriate data model. The types of access are:
- Retrieval of information stored in the database
- Insertion of new information into the database
- Deletion of information from the database
- Modification of information stored in the database
Example:
selectinstructor.name frominstructor
whereinstructor.dept name = ’History’;
Data Definition Language
To specify a database schema by a set of definitions expressed by a special language called a data-definition language (DDL).
The DDL is also used to specify additional properties of the data.
Example:
create table department
(dept_name char (20),
building char (15),
budget numeric (12,2));
Relational Database
A relational database is based on the relational model and uses a collection of tables to represent both data and the relationships among those data.
Each table has multiple columns and each column has a unique name.
Example:
Structured Query Language
SQL can define the structure of the data, modify data in the database, and specify security constraints.
The SQL language has several parts:
• Data-definition language (DDL). The SQL DDL provides commands for defining relation schemas, deleting relations, and modifying relation schemas.
• Data-manipulation language (DML). The SQL DML provides the ability to query information from the database and to insert tuples into, delete tuples from, and modify tuples in the database.
• Integrity. The SQL DDL includes commands for specifying integrity constraints that the data stored in the database must satisfy. Updates that violate integrity constraints are disallowed.
• View definition. The SQL DDL includes commands for defining views.
• Transaction control. SQL includes commands for specifying the beginning and ending of transactions.
• Embedded SQL and dynamic SQL. Embedded and dynamic SQL define how SQL statements can be embedded within general-purpose programming languages, such as C, C++, and Java.
• Authorization. The SQL DDL includes commands for specifying access rights to relations and views.
SQL Data Types
The SQL standard supports a variety of built-in types, including:
• char(n): A fixed-length character string with user-specified length n. The full form, character, can be used instead.
• varchar(n): A variable-length character string with user-specified maximum length n. The full form, character varying, is equivalent.
• int: An integer (a finite subset of the integers that is machine dependent). The full form, integer, is equivalent.
• smallint: A small integer (a machine-dependent subset of the integer type).
• numeric(p, d):A fixed-point number with user-specified precision. The number consists of p digits (plus a sign), and d of the p digits are to the right of the decimal point. Thus, numeric(3,1) allows 44.5 to be stored exactly, but neither 444.5 or 0.32 can be stored exactly in a field of this type.
• real, double precision: Floating-point and double-precision floating-point numbers with machine-dependent precision.
• float(n): A floating-point number, with precision of at least n digits.
Advanced SQL
1. Accessing SQL From a Programming Language
There are two approaches to accessing SQL from a general-purpose programming language:
• Dynamic SQL:
- Dynamic SQL allows the program to construct an SQL query as a character string at runtime, submit the query, and then retrieve the result into program variables a tuple at a time.
- The dynamic SQL component of SQL allows programs to construct and submit SQL queries at runtime.
• Embedded SQL:
- Like dynamic SQL, embedded SQL provides a means by which a program can interact with a database server.
- However, under embedded SQL, the SQL statements are identified at compile time using a preprocessor.
- The preprocessor submits the SQL statements to the database system for pre-compilation and optimization; then it replaces the SQL statements in the application program with appropriate code and function calls before invoking the programming-language compiler.
2. Functions and Procedures
Procedures and functions allow “business logic” to be stored in the database, and executed from SQL statements.
Example:
create function dept count(dept name varchar(20))
returns integer
begin
declare d count integer;
select count(*) into d count
from instructor
where instructor.dept name= dept name
return d count;
end
3. Triggers
A trigger is a statement that the system executes automatically as a side effect of a modification to the database. To design a trigger mechanism, we must meet two requirements:
1. Specify when a trigger is to be executed. This is broken up into an event that causes the trigger to be checked and a condition that must be satisfied for trigger execution to proceed.
2. Specify the actions to be taken when the trigger executes.
Object Models
An object typically has two components: state (value) and behavior (operations).
One goal of an ODMS (Object Data Management System) is to maintain a direct correspondence between real-world and database objects so that objects do not lose their integrity and identity and can easily be identified and operated upon.
define type EMPLOYEE
tuple ( Fname: string;
Minit: char;
Lname: string;
Ssn: string;
Birth_date: DATE;
Address: string;
Sex: char;
Salary: float;
Supervisor: EMPLOYEE;
Dept: DEPARTMENT;);
define type DATE
tuple ( Year: integer;
Month: integer;
Day: integer; );
Object Database Modeling Group
The ODMG object model is the data model upon which the object definition language (ODL) and object query language (OQL) are based.
It is meant to provide a standard data model for object databases, just as SQL describes a standard data model for relational databases.
It also provides a standard terminology in a field where the same terms were sometimes used to describe different concepts.
Object Definition Language
The ODL is designed to support the semantic constructs of the ODMG object model and is independent of any particular programming language.
Its main use is to create object specifications—that is, classes and interfaces. Hence, ODL is not a full programming language.
A user can specify a database schema in ODL independently of any programming language, and then use the specific language bindings to specify how ODL constructs can be mapped to constructs in specific programming languages, such as C++, Smalltalk, and Java.
Object Query Language
The object query language OQL is the query language proposed for the ODMG object model.
It is designed to work closely with the programming languages for which an ODMG binding is defined, such as C++, Smalltalk, and Java.
Hence, an OQL query embedded into one of these programming languages can return objects that match the type system of that language.
Additionally, the implementations of class operations in an ODMG schema can have their code written in these programming languages.
The OQL syntax for queries is similar to the syntax of the relational standard query language SQL, with additional features for ODMG concepts, such as object identity, complex objects, operations, inheritance, polymorphism, and relationships.
Transaction
Collections of operations that form a single logical unit of work are called transactions.
A database system must ensure proper execution of transactions despite failures—either the entire transaction executes, or none of it does.
Furthermore, it must manage concurrent execution of transactions in a way that avoids the introduction of inconsistency.
A transaction is a unit of program execution that accesses and possibly updates various data items.
Usually, a transaction is initiated by a user program written in a high-level data-manipulation language (typically SQL), or programming language (for example, C++, or Java), with embedded database accesses in JDBC or ODBC.
Begin Transaction and End Transaction
A transaction is delimited by statements (or function calls) of the form begin transaction and end transaction.
The transaction consists of all operations executed between the begin transaction and end transaction.
Example
Let Ti be a transaction that transfers $50 from account A to account B. This transaction can be defined as:
Ti : read(A);
A := A − 50;
write(A);
read(B);
B := B + 50;
write(B).
Properties of Transactions
Atomicity - Either all operations of the transaction are reflected properly in the database, or none are.
Consistency - Execution of a transaction in isolation (that is, with no other transaction executing concurrently) preserves the consistency of the database.
Isolation - Even though multiple transactions may execute concurrently, the system guarantees that, for every pair of transactions Ti and Tj , it appears to Ti that either Tj finished execution before Ti started or Tj started execution after Ti finished. Thus, each transaction is unaware of other transactions executing concurrently in the system.
Durability - After a transaction completes successfully, the changes it has made to the database persist, even if there are system failures.
States of Transaction
A transaction must be in one of the following states:
Active, the initial state; the transaction stays in this state while it is executing.
Partially committed, after the final statement has been executed.
Failed, after the discovery that normal execution can no longer proceed.
Aborted, after the transaction has been rolled back and the database has been restored to its state prior to the start of the transaction.
Committed, after successful completion.
States Diagram of Transactions
- We say that a transaction has committed only if it has entered the committed state.
- Similarly, we say that a transaction has aborted only if it has entered the abortedstate.
- Atransaction is said to have terminated if it has either committed or aborted.
- A transaction starts in the active state.
- When it finishes its final statement, itenters the partially committed state. At this point, the transaction has completedits execution, but it is still possible that it may have to be aborted, since the actualoutput may still be temporarily residing in main memory, and thus a hardwarefailure may preclude its successful completion.
- The database system then writes out enough information to disk that, even inthe event of a failure, the updates performed by the transaction can be re-createdwhen the system restarts after the failure.
- When the last of this information iswritten out, the transaction enters the committed state.
Concurrency Control
When several transactions execute concurrently in the database, however, the isolation property may no longer be preserved.
To ensure that it is, the system must control the interaction among the concurrent transactions; this control is achieved through one of a variety of mechanisms called concurrency control schemes.
In practice, the most frequently used schemes are
- Two-phase locking and
- Snapshot isolation.
Lock Based Protocols
One way to ensure isolation is to require that data items be accessed in a mutually exclusive manner; that is, while one transaction is accessing a data item, no other transaction can modify that data item.
The most common method used to implement this requirement is to allow a transaction to access a data item only if it is currently holding a lock on that item.
Timestamp Based Protocols
The locking protocols that we have described thus far determine the order between every pair of conflicting transactions at execution time by the first lock that both members of the pair request that involves incompatible modes.
Another method for determining the serializability order is to select an ordering among transactions in advance.
The most common method for doing so is to use a timestamp-ordering scheme.
The Two Phase Locking Protocols
One protocol that ensures serializability is the two-phase locking protocol. This protocol requires that each transaction issue lock and unlock requests in two phases:
◦1. Growing phase. A transaction may obtain locks, but may not release any lock.
◦2. Shrinking phase. A transaction may release locks, but may not obtain any new locks.
Initially, a transaction is in the growing phase. The transaction acquires locks as needed. Once the transaction releases a lock, it enters the shrinking phase, and it can issue no more lock requests.
Snapshot Isolation
Snapshot isolation involves giving a transaction a “snapshot” of the database at the time when it begins its execution.
It then operates on that snapshot in complete isolation from concurrent transactions.
The data values in the snapshot consist only of values written by committed transactions.
This isolation is ideal for read-only transactions since they never wait and are never aborted by the concurrency manager.
Transactions that update the database must, of course, interact with potentially conflicting concurrent update transactions before updates are actually placed in the database.
Updates are kept in the transaction’s private workspace until the transaction successfully commits, at which point the updates are written to the database.
When a transaction T is allowed to commit, the transition of T to the committed state and the writing of all of the updates made by T to the database must be done as an atomic action so that any snapshot created for another transaction either includes all updates by transaction T or none of them.
Recovery System
A computer system, like any other device, is subject to failure from a variety of causes: disk crash, power outage, software error, a fire in the machine room, even sabotage.
In any failure, information may be lost. Therefore, the database system must take actions in advance to ensure that the atomicity and durability properties of transactions, are preserved.
An integral part of a database system is a recovery scheme that can restore the database to the consistent state that existed before the failure.
Failure Classification
There are various types of failure that may occur in a system, each of which needs to be dealt with in a different manner.
We shall consider only the following types of failure:
• Transaction failure. There are two types of errors that may cause a transaction to fail:
◦Logical error. The transaction can no longer continue with its normal execution because of some internal condition, such as bad input, data not found, overflow, or resource limit exceeded.
◦System error. The system has entered an undesirable state (for example, deadlock), as a result of which a transaction cannot continue with its normal execution. The transaction, however, can be re-executed at a later time.
• System crash. There is a hardware malfunction, or a bug in the database software or the operating system, that causes the loss of the content of volatile storage, and brings transaction processing to a halt. The content of nonvolatile storage remains intact, and is not corrupted.
The assumption that hardware errors and bugs in the software bring the system to a halt, but do not corrupt the nonvolatile storage contents, is known as the fail-stop assumption. Well-designed systems have numerous internal checks, at the hardware and the software level that bring the system to a halt when there is an error. Hence, the fail-stop assumption is a reasonable one.
• Disk failure. A disk block loses its content as a result of either a head crash or failure during a data-transfer operation. Copies of the data on other disks, or archival backups on tertiary media, such as DVD or tapes, are used to recover from the failure.
Week-7
Revision From Week-1 to Week-6
TEST-1
Week-8
Database System Architecture:
- The architecture of a database system determines its capability, reliability, effectiveness and efficiency in meeting user requirements.
- But besides the visible functions seen through some data manipulation language, a good database architecture should provide:
- a) Independence of data and programs
- b) Ease of system design
- c) Ease of programming
- d) Powerful query facilities
- e) Protection of data
Centralized System:
- Run on a single computer system and do not interact with other computer systems.
- General-purpose computer system: one to a few CPUs and a number of device controllers that are connected through a common bus that provides access to shared memory.
- Single-user system (e.g., personal computer or workstation): desk-top unit, single user, usually has only one CPU and one or two hard disks; the OS may support only one user.
- Multi-user system: more disks, more memory, multiple CPUs, and a multi-user OS. Serve a large number of users who are connected to the system vie terminals. Often called server systems.
Fig: A Centralized Computer System
Client Server Systems:
- A centralized systems act as server systems that satisfy requests generated by client systems.
Fig: General Structure for a Client-Server System
- Database functionality can be divided into:
- Back-end: manages access structures, query evaluation and optimization, concurrency control and recovery.
- Front-end: consists of tools such as forms, report-writers, and graphical user interface facilities.
- The interface between the front-end and the back-end is through SQL or through an application program interface.