13

11. Constraints and Indexes

11.1 Background on General Constraints

11.2 Syntax for General Constraints

11.3 What are Indexes?

11.4 Syntax for Indexes

11.5 Graphical User Interface Tools in Microsoft Access

11.1 Background for General Constraints

1. Here is an example of simple SQL syntax to create a table where nothing besides the fields and their types and sizes are defined. Although it is clear which field should be the primary key, it is not specified in the table definition:

CREATE TABLE Person

(SSN TEXT(9),

lastname TEXT(12),

dob DATE)

In a complete table definition it would be desirable to specify the primary key. Remember that the primary key value has to be a unique identifier for each record in the table and no part of the primary key can be null. These requirements together are formally known as entity integrity.

2. It may be desirable to require that other fields in a table besides the primary key be unique or not null. It is possible to have a situation like this: The Person table is redefined so that it has a personid field which is different from the SSN, but the SSN is still included:

CREATE TABLE Person

(personid TEXT(12),

lastname TEXT(12),

dob DATE,

SSN TEXT(9))

In a situation like this, not only would the personid be unique and not null, because it's the key, but it would also be desirable for the SSN to be unique and probably not null.

It is also possible to have a situation where it is possible for a field to have duplicate values in different records, but you don't want to allow null values. For example, in the Person table, you may not wish to allow entries for people who do not have names.

3. Referential integrity defines the requirements for a primary key to foreign key relationship between two tables. Consider this alternative definition of the Person table:

CREATE TABLE Person

(SSN TEXT(9),

lastname TEXT(12),

motherSSN TEXT(9),

dob DATE)

Let there be a Mother table which also has a primary key named SSN. The motherSSN field in the Person table is known as a foreign key and it refers to the SSN field in the Mother table. Referential integrity states that the motherSSN field in the Person table cannot contain values which do not exist in the SSN field in the Mother table.

4. When including additional specifications or conditions in a table definition, these are known generally as constraints. In general, it is also possible to add constraints to table definitions after the tables have been created. If constraints are named, this makes it possible to refer to them later on, in particular, so that they can be removed from the table. There are various forms of the syntax for constraints. Not all of the forms will be shown below, just a consistent set of forms that should be relatively easy to remember.

11.2 Syntax for General Constraints

1. This example shows the syntax for specifying a primary key in a table definition:

CREATE TABLE Person

(SSN TEXT(9),

lastname TEXT(12),

dob DATE,

CONSTRAINT personpkSSN PRIMARY KEY(SSN))

As usual, the keywords are capitalized. The field name SSN happens to be capitalized too in this example, but that is a coincidence. It is a good idea to give the constraint a descriptive name. The name can't have spaces in it.

If the Person table had been created without a primary key to begin with, the primary key specification could be added later by entering this command:

ALTER TABLE Person

ADD CONSTRAINT personpkSSN PRIMARY KEY(SSN)

You would only want to try and add a primary key constraint later if you had already entered data values into that field for all records, and the data values in each record were unique.

If it were desirable to remove the primary key constraint specified for the Person table above, it could be done as shown below. Note that in order to do this the constraint had to have been named in the first place:

ALTER TABLE Person

DROP CONSTRAINT personpkSSN

It would be unusual for the primary key field of a table to change over its lifetime, but anything is possible. If there were already a primary key field specified, you would want to make sure that the values in the new primary key field were set up first, then remove the old primary key constraint, and then specify the new primary key constraint.

2. Recall that it is possible to have a table with a concatenated key field. This means that the unique identifier for a record in the table is the combination of the values of two different fields in the table. This can happen when there is a many-to-many relationship, and the primary keys of both of the tables in the many-to-many relationship are embedded as foreign keys in a table in the middle. Assuming that there was a Chimpanzee table with chimpid as its primary key, the relationships between chimps could be captured by the table design shown below. The table's primary key would be the concatenation of chimpid1 and chimpid2.

CREATE TABLE Chimprelationships

(chimpid1 TEXT(6),

chimpid2 TEXT(6),

beginningdate DATE,

enddate DATE)

You could specify the primary key by including the following line in the table definition. All you have to do is list the concatenated key fields inside the parentheses, separated by commas:

CONSTRAINT chimppk PRIMARY KEY(chimpid1, chimpid2)

The primary key could also be added after the table was created and removed using the syntax shown with earlier examples.

3. This example shows the syntax for specifying the primary key and also for specifying that another field in the table be unique:

CREATE TABLE Person

(personid TEXT(12),

lastname TEXT(12),

dob DATE,

SSN TEXT(9),

CONSTRAINT personpkpersonid PRIMARY KEY(personid),

CONSTRAINT SSNunique UNIQUE(SSN))

The personid field will be constrained to be unique because it's the primary key. The SSN field will be constrained to be unique by the separate uniqueness constraint on it. As before, the key words are shown capitalized. It's a coincidence that the field SSN is also capitalized.

A uniqueness constraint could be added later by entering this command:

ALTER TABLE Person

ADD CONSTRAINT SSNunique UNIQUE(SSN)

For the field in question, this won't allow non-unique values to be added to the table and it won't allow updates where the new value is not unique. If you try to add a uniqueness constraint after there are already data in the table, the constraint will not be accepted if there are non-unique values in the relevant field.

The uniqueness constraint can be removed by entering this command:

ALTER TABLE Person

DROP CONSTRAINT SSNunique

4. Specifying NOT NULL as a constraint on a table is slightly different from the other constraints, because it is not named. All you have to do is put the constraint after the relevant field in the table definition:

CREATE TABLE Person

(SSN TEXT(9),

lastname TEXT(12) NOT NULL,

dob DATE)

A NOT NULL constraint could be added later by entering this command:

ALTER TABLE Person

ALTER COLUMN lastname TEXT(12) NOT NULL

This won't allow nulls to be added in new records that are added to the table and it won't allow values in this field to be updated to null. However, if the table already contains nulls in the field in question when the command is run, no error message will result, and those null values will remain in the table undisturbed.

Notice that in the ALTER TABLE version, what follows the key word COLUMN is exactly what you see in the version where the constraint is included in the table definition. This is important. You have to give the name, type, and size of the field in question. The ALTER COLUMN command has the capability of changing all of the characteristics of a column. If you don't repeat the column's size in the command, for example, the command will reset the size to its default maximum.

The NOT NULL constraint can be removed by entering this command:

ALTER TABLE Person

ALTER COLUMN lastname TEXT(12) NULL

It is aggravating, but true, that in Microsoft Access SQL this command will run without an error message, but it will not change the field so that it will accept null values. It is apparent that the command is accepted and runs, because if the size of the field is changed in the statement, this does have an effect on the table design. If you want to make this change and make sure that it happens, you have to use Access's graphical user interface. Information on this is given in the last section of these notes.

5. When putting a referential integrity constraint into a database design, the constraint goes into the foreign key table, not the primary key table. Let there be a table named Mother with at least this much in its design:

CREATE TABLE Mother

(SSN TEXT(9),

…,

CONSTRAINT motherpkSSN PRIMARY KEY(SSN))

Then a foreign key constraint in the Person table would be given as shown here:

CREATE TABLE Person

(SSN TEXT(9),

lastname TEXT(12),

motherSSN TEXT(9),

dob DATE,

CONSTRAINT personpkSSN PRIMARY KEY(SSN),

CONSTRAINT personfkmother FOREIGN KEY(motherSSN)

REFERENCES Mother(SSN))

A foreign key constraint can be added later with this command:

ALTER TABLE Person

ADD CONSTRAINT personfkmother FOREIGN KEY(motherSSN)

REFERENCES Mother(SSN)

If this is done after there are data already in the table, and there are values for the motherSSN field in the Person table that don't exist in the Mother table, the command will not be accepted. As long as nulls are allowed in the motherSSN field, it would be possible to change the unacceptable values to null first, and then the command would run. It would then also apply to all new records entered into the table and it would apply to updates of existing data.

Notice that there are two sides to the foreign key constraint. It is not possible to enter new values into the foreign key table, or update values in the foreign key table to ones that don't exist in the primary key table. It would also violate referential integrity if there were changes in the primary key table that left values in the foreign key table without matches in the primary key table.

Referential integrity is so important that the system also protects the database contents from changes in the primary key table. There are two possibilities: 1) If a primary key record is deleted, if it had corresponding foreign key records, they would be orphaned. It is most common in this case to disallow such deletions. This is known as "ON DELETE RESTRICT". 2) If the primary key value is updated, if that value had matches in the foreign key table, they would be orphaned. It is most common in this case to specify that the corresponding foreign key records be updated to match. This is known as "ON UPDATE CASCADE".

This is how the foreign key constraint example would look with these protections explicitly specified:

CREATE TABLE Person

(SSN TEXT(9),

lastname TEXT(12),

motherSSN TEXT(9),

dob DATE,

CONSTRAINT personpkSSN PRIMARY KEY(SSN),

CONSTRAINT personfkmother FOREIGN KEY(motherSSN)

REFERENCES Mother(SSN)

ON DELETE RESTRICT

ON UPDATE CASCADE)

Notice that with these options set, the system is doing a lot of work on behalf of the user, protecting the integrity of the data in the related tables.

11.3 What are Indexes?

1. An index can be described as a construct that supports two-column lookup. Suppose you're interested in words and their locations in a book. You look up the word, and what you find is its page number. This is a somewhat more detailed description of the situation: A) The words in a book don't occur in sorted order. They appear in sentences and paragraphs in an order that is determined by the topic under discussion and the rules of grammar. B) The index of a book consists of the important words in the book sorted in alphabetic order, followed by the page numbers where those words appear. This is your two column lookup. You look up the word, and what you find is the page where it occurs.

2. Being able to look things up is critical to the internal operation of a database management system and the execution of queries. Remember that technically tables are like sets: Their contents do not have to be kept in any particular order. If you want to see the contents of tables in sorted order, you know that you can put the key words ORDER BY in a query, but this doesn't change the order in which the records are stored.

You may have noticed that if you don't specify ORDER BY in a query, the results tend to come out sorted in primary key order. This still doesn't signify that the contents of the table are maintained in that order. It just means that that order may be the default order for results in some cases. It is generally the case that the records in a table are simply stored in the same order that they were entered into the table. Shown below is a very simple example of a situation like this. The records are not sorted on any of the data fields. In order to keep track of what's going on, an extra field is shown which simply indicates which is first, which is second, and so on. The technical term for this extra field is the relative record number, and it is abbreviated RRN:

Simple Person Table
RRN / idno / name / age
1 / 2 / Pam / 20
2 / 4 / Lee / 18
3 / 1 / Ned / 21
4 / 3 / Kim / 22

It should be clear that if you know the RRN for a given record, then it is easy to find that record and all of the data values that go with it. The RRN is analogous to a page number in the book index example. It is important to emphasize that the RRN is invisible to the database user. You never see this as part of a table. However, it is conceptually useful to talk about RRN's when trying to explain what a database index is and how it works.