Object Database Systems Coursework

CO42009

Schema Evolution and

version control in OODB

24/05/2002

Group Members:

Lynne Ward – 98014226

Alistair Hamilton – 97000981

Ben Hall – 98055895

Joe W Falke – 98057278

Kriss Paul – 98016377

Schema Evolution in Object-Oriented Databases

Introduction

Databases frequently have long lives. During a database’s lifetime, the database schema is likely to undergo significant change as new demands are placed on the data. The database schema serves two purposes. First, it defines an interface for programs and users to query the data contained within the database. Second, it determines how the database management system physically stores the data on disk. When the schema is changed so that the data can be used for a new purpose, this also affects the way the data is physically stored. The goal of schema evolution research is to allow schema definitions to change while maintaining access to data that has already been stored on disk.

What is a schema? In computer programming, a schema (pronounced SKEE-mah) is the organization or structure for a database.

What is Schema Evolution? The term schema evolution refers to the changes undergone by a database's schema during the course of the database's existence. It refers especially to schema changes that potentially require changing the representations of objects already stored in the database.

There are two major issues involved in schema evolution. The first issue is understanding how a schema has changed. The second issue involves deciding when and how to modify the database to address such concerns as efficiency, availability, and impact on existing code. Most research efforts have been aimed at this second issue and assume a small set of schema changes that are easy to support, such as adding and removing record fields, while requiring the maintainer to provide translation routines for more complicated changes.

Schema evolution is the term used for the change of data structure over time. It is the ability of a database system to respond to changes in the real world by allowing the schema to evolve. In many systems this property also implies retention of past states of the schema. This latter property is necessary if data recorded during the lifetime of one version of the schema is not to be made obsolete as the schema changes. Most database systems at some time or another require a change to their schema, due to changes in the real world, a change in the application requirements, or mistakes made during systems analysis or design. In many commercially available systems the database administrator must also decide whether the data already held in the database is valid under the new schema. In many cases data is deleted unnecessarily or misleadingly left in the database, or the schema is made unnecessarily complicated by the retention of obsolete attributes.

Example

As an example, consider a salary relation that holds the following fields:

Staff ID    Position Code    Salary
21677       G55              £33,000
21678       G56              £37,000
21680       A05              £45,500
21683       A09              £65,400
21687       G51              £32,000

Suppose that the existing position codes are to be replaced with new codes based on new domains, for example a position code drawn entirely from a domain of four-digit integers.

The database administrator faces significant problems arising from the retention of the current data, such as:

I. Is the position code attribute to be defined as alphanumeric despite the new position codes being numeric?
II. Is another field required to store the old codes? If so, for how long do we retain this field?
III. What about position histories and retired employees?

This example represents one of the simpler reasons for a schema change: a domain change for a single attribute has caused semantic problems for the existing data. In cases like this, it is clear that if the schema could evolve we could retain the old data as applicable to an old schema definition and store new data under a new schema definition. More complex schema reorganisations (the deletion of a relation or an amendment to the class lattice) are accompanied by more severe schema evolution problems.
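The idea of keeping old data under the old schema definition and new data under a new one can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a feature of any particular database system: the names SCHEMAS and SalaryRecord, the staff ID 21690 and the code 4105 are hypothetical, and "four-digit integer" is assumed to mean 1000 to 9999.

```python
from dataclasses import dataclass

# Hypothetical schema definitions: version 1 uses alphanumeric position
# codes, version 2 uses four-digit integers.
SCHEMAS = {
    1: {"position_code": str},
    2: {"position_code": int},
}


@dataclass(frozen=True)
class SalaryRecord:
    staff_id: int
    position_code: object   # interpreted according to schema_version
    salary: int
    schema_version: int

    def is_valid(self) -> bool:
        """Check the position code against the domain of its own schema version."""
        domain = SCHEMAS[self.schema_version]["position_code"]
        if not isinstance(self.position_code, domain):
            return False
        if domain is int:
            # Assumption: a four-digit code lies between 1000 and 9999.
            return 1000 <= self.position_code <= 9999
        return True


# An old record stays valid under schema version 1 ...
old = SalaryRecord(21677, "G55", 33000, schema_version=1)
# ... while a new (hypothetical) record is stored under schema version 2.
new = SalaryRecord(21690, 4105, 35000, schema_version=2)
```

Because each record carries the schema version it was written under, neither record has to be rewritten or deleted when the domain of the position code changes.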

Changes to the class lattice can be categorised as (1) changes to the contents of a node, (2) changes to an edge and (3) changes to a node. ORION allows all three types of change. These changes can be classified further; in particular, changing the contents of a node means adding or dropping instance variables or methods, or changing the properties of existing instance variables or methods.

Planning Your Schema Evolution

As you develop and implement your schema evolution plan, you must ensure that no unanticipated results affect the data and that the evolution is complete.

When planning your schema evolution, it is essential to anticipate all your requirements for the evolution, and to plan your application carefully around the desired outcome. A good rule of thumb is to try your schema evolution application on small databases to investigate the process, determine what works, and locate anything that might introduce complications.

Planning in the development phase of schema evolution is critical, but of equal importance are careful testing and validation of your implementation using a variety of methods. A conservative approach is best, so plan for the stages of your schema evolution project in the following sequence.

Sequence of Planning Your Schema Evolution

Determine if you can use the ossevol utility to update a database's schema, or if you must design a special schema evolution application. You cannot, for example, use ossevol to update a database if the database contains instances of os_Dictionary or os_rDictionary.
Plan the schema evolution model in cases where you require a special application.
Implement your design.
Test your implementation and troubleshoot.
If the facility is to be used to upgrade databases currently in use, obtain some active databases for predeployment validation.
Limit your initial deployment to validated customers, followed by general deployment to all customers.

Schema evolution decision tree

Schema Evolution with ossevol

The ossevol utility modifies a database and its schema so that it matches a revised application schema. It handles many common cases of schema evolution. Running the ossevol utility changes the physical structure of your database, so it is critical to back up your database before running this utility.

Use this utility when you are performing simple operations such as adding or deleting data members that do not require a special evolving application.

Implementing Schema Evolution

Because of the potential complexity of the schema evolution process, it is important to incorporate as many safeguards as possible into your schema evolution application, to test it thoroughly using small databases, and to validate that the schema evolution has been successful. This chapter provides basic guidelines, examples, and validation techniques.

The Schema Evolving Application

What to include

Regardless of the evolution you intend to perform, make sure you plan to include the following in your evolution application:

Before evolution starts, use osverifydb -all to make sure the database returns a 0 result code indicating no errors (that is, you are starting with a clean, error-free database).
Tag any databases with a state block in which the states are, for example, operational, evolving, or validating.
Tag any databases that have version information that is only updated after an evolution has been deemed successful (osverifydb -all returned 0).
In your application, include an instantiation for each os_Dictionary or os_rDictionary instantiation in the database being evolved.
Ensure that an application tests the version information and the state information before resuming normal operations.
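The state block and version check described above could be modelled roughly as follows. This is a hedged sketch in plain Python that does not call any ObjectStore API: DbState, may_resume and finish_evolution are hypothetical names, and verify_result simply stands for the result code of a verification run such as osverifydb -all.

```python
from enum import Enum


class DbState(Enum):
    OPERATIONAL = "operational"
    EVOLVING = "evolving"
    VALIDATING = "validating"


def may_resume(state: DbState, db_schema_version: int, expected_version: int) -> bool:
    """Normal operations resume only for an operational database
    whose version tag matches the version the application expects."""
    return state is DbState.OPERATIONAL and db_schema_version == expected_version


def finish_evolution(verify_result: int, old_version: int) -> tuple:
    """Update the version tag only if verification returned 0;
    otherwise leave the database in the validating state."""
    if verify_result == 0:
        return DbState.OPERATIONAL, old_version + 1
    return DbState.VALIDATING, old_version


# Example: a clean verification moves version 3 to version 4.
state, version = finish_evolution(verify_result=0, old_version=3)
```

Keeping the version tag unchanged until verification succeeds means an application that checks both tags can never mistake a half-evolved database for a finished one.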

What to avoid

Delete os_cursor objects before schema evolution (schema evolution cannot handle os_cursor objects).

Unions require a very complicated custom schema evolution application.

Validation Activities

The following is a checklist of validation tasks you should perform to confirm that the schema evolution actually accomplished your objectives and did not make unexpected alterations to the database.

The first part of the validation stage should rerun osverifydb -all and again return a 0 result code indicating no errors.
Inspect the database in light of the semantics of the data stored there (as much data as possible should be validated).
If the database is very large, do some statistical probing of the data.
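One simple way to do statistical probing is to validate a random sample of records rather than the whole database. The sketch below assumes the records can be read into a Python sequence and that a semantic check is available as a function; probe and its parameters are hypothetical names chosen for this illustration.

```python
import random


def probe(records, check, sample_size, seed=0):
    """Apply check() to a random sample of records and return those that fail."""
    rng = random.Random(seed)          # fixed seed so a probe can be repeated
    records = list(records)
    sample = rng.sample(records, min(sample_size, len(records)))
    return [r for r in sample if not check(r)]


# Example: probe 10 of 100 salaries for a non-negative value.
salaries = list(range(30000, 30100))
failures = probe(salaries, lambda s: s >= 0, sample_size=10)
```

An empty result does not prove the evolution was correct; it only raises confidence in proportion to the sample size.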

Testing

Additional testing you can perform to ensure that the database is as you expect includes writing a test harness to exercise the database completely.

Troubleshooting

If you encounter difficulties when performing or testing the results of any schema evolution operation, you must debug carefully with the assistance of Object Design Technical Support. You must supply Technical Support with the pre-evolution database, your schema evolution application, and a stack trace from the time of failure.

The schema taxonomy is as follows:

1. Changes to the contents of a node (a class)
1.1. Changes to an instance variable
1.1.1. Add a new instance variable to a class
1.1.2. Drop an existing instance variable from a class
1.1.3. Change the name of an instance variable of a class
1.1.4. Change the domain of an instance variable of a class
1.1.5. Change the inheritance (parent) of an instance variable (inherit another instance variable with the same name)
1.1.6. Change the default value of an instance variable
1.1.7. Manipulate the shared value of an instance variable
1.1.7.1. Add a shared value
1.1.7.2. Change the shared value
1.1.7.3. Drop the shared value
1.2. Changes to a method
1.2.1. Add a new method to a class
1.2.2. Drop an existing method from a class
1.2.3. Change the name of a method in a class
1.2.4. Change the inheritance of a method

2. Changes to an edge
2.1. Make a class S a superclass of a class C
2.2. Remove a class S from the superclass list of a class C
2.3. Change the order of superclasses of a class C

3. Changes to a node
3.1. Add a new class
3.2. Drop an existing class
3.3. Change the name of a class
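To make the instance-variable changes in the taxonomy concrete, the following Python sketch applies three of them (add, drop and rename) to a class definition and its stored instances. It is an illustration only: ClassDef is a hypothetical in-memory structure, and converting every existing instance immediately is an assumption made here for simplicity, since a real system may defer the conversion.

```python
class ClassDef:
    """A hypothetical in-memory class definition with its stored instances."""

    def __init__(self, name):
        self.name = name
        self.variables = {}     # variable name -> default value
        self.instances = []     # each instance is a dict of variable values

    def add_variable(self, var, default=None):
        """Add a new instance variable; existing instances get the default."""
        if var in self.variables:
            raise ValueError(f"{var} already defined in {self.name}")
        self.variables[var] = default
        for obj in self.instances:
            obj[var] = default

    def drop_variable(self, var):
        """Drop an instance variable and discard its stored values."""
        del self.variables[var]
        for obj in self.instances:
            obj.pop(var, None)

    def rename_variable(self, old, new):
        """Rename an instance variable, keeping the stored values."""
        if new in self.variables:
            raise ValueError(f"{new} already defined in {self.name}")
        self.variables[new] = self.variables.pop(old)
        for obj in self.instances:
            obj[new] = obj.pop(old)


vehicle = ClassDef("Vehicle")
vehicle.add_variable("weight", 0)
vehicle.instances.append({"weight": 1200})
vehicle.add_variable("colour", "unknown")    # existing instance receives the default
vehicle.rename_variable("weight", "mass")    # stored value is preserved
```

The name checks in add_variable and rename_variable reject changes that would give a class two variables with the same name.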

Invariants of Schema Evolution:

Any changes to the class definitions and to the structure of the class lattice must preserve these properties:

Class Lattice Invariant: The class lattice is a rooted and connected directed acyclic graph with labelled edges. The directed acyclic graph (DAG) has exactly one root, the class OBJECT. The DAG is connected; that is, there are no isolated nodes. Edges are labelled such that all edges directed to any given node have distinct labels (the labels are used to aid conflict resolution).

Distinct Name Invariant: All instance variables and methods of a class, whether locally defined or inherited, must have distinct names.

Distinct Identity Invariant: All instance variables and methods of a class have a distinct origin.

Full Inheritance Invariant: A class must inherit all instance variables and methods from each of its superclasses. There is no selective inheritance, unless full inheritance would lead to a violation of the distinct name and distinct identity invariants.

Domain Compatibility Invariant: If an instance variable V2 of a class C is inherited from an instance variable V1 of a superclass of C, then the domain of V2 must either be the same as that of V1 or a subclass of the domain of V1. For example, if the domain of instance variable Manufacturer in the Vehicle class is the Company class, then the Manufacturer of a MotorizedVehicle can be a Company or a subclass of Company, for example, a MotorizedVehicleCompany.
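The domain compatibility rule maps directly onto a subclass test. The sketch below uses Python classes to stand in for the domains named in the example; Person is a hypothetical unrelated class added only to show a rejected case, and domain_compatible is a hypothetical helper name.

```python
class Company:
    pass


class MotorizedVehicleCompany(Company):
    pass


class Person:   # hypothetical unrelated domain, used only as a counter-example
    pass


def domain_compatible(inherited_domain, redefined_domain):
    """True if the redefined domain is the inherited domain or a subclass of it."""
    return issubclass(redefined_domain, inherited_domain)


# Manufacturer in Vehicle has domain Company, so MotorizedVehicle may keep
# Company or narrow it to MotorizedVehicleCompany, but may not use Person.
same_ok = domain_compatible(Company, Company)
narrowed_ok = domain_compatible(Company, MotorizedVehicleCompany)
unrelated_ok = domain_compatible(Company, Person)
```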

The invariants of the class lattice hold at every quiet state of the schema, that is, before and after a schema change operation. They guide the definition of the semantics of every meaningful schema change operation by ensuring that the change does not leave the schema in an inconsistent state (one that violates an invariant). Occasionally, however, several meaningful ways of interpreting a schema change will result in a consistent schema.
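A check of the class lattice invariant before and after a schema change might look like the following sketch. It assumes the lattice is held as a Python dictionary that maps each class name to the list of its direct superclasses; check_lattice is a hypothetical function, and edge labels are not modelled here.

```python
def check_lattice(superclasses, root="OBJECT"):
    """Return True if the lattice is a rooted, connected, acyclic graph.

    superclasses maps each class name to the list of its direct superclasses.
    """
    # Exactly one class may have no superclasses, and it must be the root.
    roots = [c for c, supers in superclasses.items() if not supers]
    if roots != [root]:
        return False
    # Every superclass that is mentioned must itself be a known class.
    for supers in superclasses.values():
        if any(s not in superclasses for s in supers):
            return False
    # Depth-first search along superclass edges to detect cycles.
    state = {}   # class name -> "visiting" or "done"

    def acyclic_from(c):
        if state.get(c) == "done":
            return True
        if state.get(c) == "visiting":
            return False            # reached a class already on the current path
        state[c] = "visiting"
        ok = all(acyclic_from(s) for s in superclasses[c])
        state[c] = "done"
        return ok

    return all(acyclic_from(c) for c in superclasses)


good = {"OBJECT": [], "Vehicle": ["OBJECT"], "MotorizedVehicle": ["Vehicle"]}
cyclic = {"OBJECT": [], "A": ["B"], "B": ["A"]}
isolated = {"OBJECT": [], "Orphan": []}
```

Connectivity does not need a separate test: in an acyclic graph whose only class without superclasses is OBJECT, every chain of superclasses must end at OBJECT.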

Version Control in Object-Oriented Databases

Introduction

An object version, as described by Chou and Kim (1986), is ‘a semantically significant snapshot, taken at a given point in time.’

Systems that adopt Object-Oriented Database (OODB) technology, such as Computer-Aided Design (CAD) systems, rely on large numbers of object versions.

Engineering concepts, such as the ones implemented in CAD, concern the design of individual artefacts. The evolution of such an artefact is never likely to be an ordered sequence of steps. Instead, the design process will involve a large number of individual steps, and the designer will need to revisit any one, or a combination, of them.

These steps taken by the designer are separate versions of the design and will, in this context, represent objects.

Together, these steps produce a large number of highly interdependent object versions.

The refinement of these versions over time can be exemplified by looking at a real-world object, for instance a car.

A car consists of multiple top-level objects, for example an engine, a seat, a steering wheel, and many more.

Each of these objects has further combinations of object versions associated with them. These are sub-objects and, collectively, form a configuration for the car. For example, an engine version and a seat version are specific to one car. Furthermore, if a new engine version were to be created, this may constitute the creation of a new car version.

Changes made to object versions, like the ones discussed in this example, must be relayed to the designer who created the versions.

The description supplied above is the motivation for producing methods for implementing Version Control.

Versions

There is a general consensus that version control is one of the most important functions in various data-intensive application domains. Users in these environments often need to generate and experiment with multiple versions of an object before selecting one that satisfies their requirements. Versions fall into two types according to the operations allowed on them: transient versions and working versions.

A transient version has the following properties:

It can be updated by the user who created it.
It can be deleted by the user who created it.
A new transient version may be derived from an existing transient version. The existing transient version then is ‘promoted’ to a working version.

A working version has the following properties:

It can be considered stable and cannot be updated.
It can be deleted by its owner.
A transient version can be derived from a working version.

We impose the update restriction on the working version because it is considered stable, and thus transient versions can be derived from it. If a working version were to be updated directly after one or more transient versions had been derived from it, we would need a set of careful update algorithms (for insert, delete, update) to ensure that the derived versions do not see the updates made to the working version.
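The rules for transient and working versions can be sketched as a small Python class. This is an illustration only: the Version class, its attribute names and the user name "alice" are hypothetical, and copying the parent's data into the derived version is an assumption made so that later updates cannot leak between versions.

```python
class Version:
    def __init__(self, number, owner, parent=None):
        self.number = number
        self.owner = owner
        self.parent = parent
        self.status = "transient"    # every new version starts as transient
        self.data = {}

    def update(self, user, **changes):
        """Only the creator may update, and only while the version is transient."""
        if self.status == "working":
            raise PermissionError("working versions cannot be updated")
        if user != self.owner:
            raise PermissionError("only the creator may update a transient version")
        self.data.update(changes)

    def derive(self, number, user):
        """Derive a new transient version; a transient parent is promoted first."""
        if self.status == "transient":
            self.status = "working"
        child = Version(number, user, parent=self)
        child.data = dict(self.data)     # assumption: the child starts from a copy
        return child


v1 = Version(1, "alice")
v1.update("alice", colour="red")
v2 = v1.derive(2, "alice")               # v1 is promoted to a working version
v2.update("alice", colour="blue")        # does not affect v1
```

Because a working version rejects updates outright, the derived version never needs to be shielded from later changes to its parent.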

Version Name Binding

There are two ways to bind an object with another versioned object: static and dynamic. In static binding, the reference to a versioned object includes the name of the object, the object identifier, and the version number. In dynamic binding, the reference needs to specify only the object identifier and may leave the version number unspecified; the system then selects the default version number. Clearly, dynamic binding is useful, since transient or working versions that are referenced may be deleted and new versions created. Due to reference differences between models, the user must be allowed to specify a particular version on the version-derivation hierarchy as the default version. In the absence of a user-specified default, the system selects the version with the most recent timestamp as the default.
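The difference between static and dynamic binding can be shown with a short resolution function. The sketch assumes the versions of one object are held as a Python dictionary from version number to creation timestamp; resolve is a hypothetical name, and falling back to the latest version when the user-specified default no longer exists is an assumption of this sketch.

```python
def resolve(versions, version_number=None, user_default=None):
    """Return the version number that a reference binds to.

    versions maps version number -> creation timestamp.
    Static binding: version_number is given and must exist.
    Dynamic binding: version_number is None, so the user-specified default
    is used if present, otherwise the version with the latest timestamp.
    """
    if version_number is not None:
        if version_number not in versions:
            raise KeyError(f"version {version_number} does not exist")
        return version_number
    if user_default is not None and user_default in versions:
        return user_default
    return max(versions, key=versions.get)


versions = {1: 100, 2: 250, 3: 180}            # version 2 is the most recent
static_ref = resolve(versions, version_number=1)
dynamic_ref = resolve(versions)
dynamic_with_default = resolve(versions, user_default=3)
```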

Implementation

Because of the performance overhead of supporting versions, we require the application to indicate whether a class is versionable. When an instance of a versionable class is created, a generic object for that instance is created, along with the first version of that instance. A generic object consists of the following system-defined instance variables:

an object identifier;
a default version number;
a next-version number;
a version count, and
a set of version descriptors, one for each existing version on the version-derivation hierarchy of the object.

The default version number determines which existing version on the version-derivation hierarchy should be chosen when a partially specified reference is dynamically bound. The next-version number is the version number to be assigned to the next version of the object that will be created. It is incremented after being assigned to the new version.

A version descriptor contains control information for each version on a version derivation hierarchy. It includes:

the version number of the version,
the version number of the parent version,
the identifier of the versioned object, and
the schema version number associated with the version.

After a transient version is derived, the user may modify the schema for the transient version. The original version and the transient version will then use different schemas. This is the reason for including the schema version number for each versioned object. However, if a transient version is derived from a working version, both versions may use the same schema version.

A generic object is also an object and as such has an object identifier. Each version of an instance object of a versionable class contains three system defined instance variables. One is the identifier of the generic object. The others are the version number of the version and the version status (transient or working). The generic object identifier is required, so that given a version of an instance object, any other versions of the instance object may be efficiently found. The version number is needed simply to distinguish a version of an instance object from other versions of the instance object. The version status is necessary so that the system may easily reject an update on working versions.
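The generic object and its version descriptors can be pictured as two small record types. The following Python sketch is one possible layout under stated assumptions: the class and field names are hypothetical, the identifier format for versioned objects is invented for the example, and the version count is derived from the descriptor set rather than stored separately.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class VersionDescriptor:
    version_number: int
    parent_version: Optional[int]    # None for the first version
    object_id: str                   # identifier of the versioned object
    schema_version: int


@dataclass
class GenericObject:
    object_id: str
    default_version: int = 1
    next_version: int = 1
    descriptors: Dict[int, VersionDescriptor] = field(default_factory=dict)

    @property
    def version_count(self) -> int:
        return len(self.descriptors)

    def new_version(self, parent: Optional[int], schema_version: int) -> VersionDescriptor:
        """Register a new version and return its descriptor."""
        number = self.next_version
        self.next_version += 1       # incremented after being assigned
        # Hypothetical identifier format for the versioned object.
        desc = VersionDescriptor(number, parent, f"{self.object_id}.v{number}", schema_version)
        self.descriptors[number] = desc
        return desc


car = GenericObject("car42")
first = car.new_version(parent=None, schema_version=1)
second = car.new_version(parent=1, schema_version=2)   # derived under a changed schema
```

Given any version, its generic object leads to every other version through the descriptor set, which is why each version stores the generic object identifier.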

Version Evolution as a Hierarchy

Within an OODB, the evolution of object versions can produce a complex hierarchical structure that records the history of an object.

At a basic level, a typical hierarchical structure representing the history of an object will look as it does in Fig 1.1 below.