VI. Migration: Maintaining content, context, and structure

Records systems should be designed so that records will remain authentic, reliable and useable throughout any kind of system change, including format conversion, migration between hardware and operating systems or specific software applications, for the entire period of their retention.

ISO 15489, Part 1, Section 8.3.5 Conversion and migration

Preservation strategies can include copying, conversion and migration of records….

c) Migration involves a set of organized tasks designed to periodically transfer digital material from one hardware/software configuration to another, or from one generation of technology to another. The purpose of migration is to preserve the integrity of the records and to retain the ability for clients to retrieve, display and otherwise use them. Migration may occur when hardware and/or software becomes obsolete or may be used to move electronic records from one file format to another.

ISO 15489, Technical Report, Section 4.3.8.1 Continuing retention

1.0  Distinguishing between born-digital and reformatted records

It is vital first to distinguish between digital records that were created digitally and those that were made digital through a reformatting process, and further between reformatted records whose originals have been destroyed and reformatted records whose originals still exist physically. Without explicit authenticating actions, the evidentiary status of each of these is potentially different, even though the care required to carry them forward into the future may be substantially the same.

1.1 Born-digital objects

The born-digital record requires that its reliability be guaranteed by the creating software environment and its authenticity preserved by abstraction from that environment and subsequent secure handling.

1.2 Reformatted digital records replacing the originals

The reliability of a reformatted record depends directly upon the reliability of the original from which it is reformatted, plus the quality and reliability of the reformatting process, whose accuracy and degree of retention of the evidentiary qualities of the original must be audited by a third party and documented. Once the reformatting takes place and the original is destroyed, whatever degree of reliability the record has as a result of those two factors is the most it will ever have. From that time forward, its authenticity is dependent upon its abstraction from the reformatting environment and subsequent secure handling.

1.3 Reformatted digital records existing alongside the originals

If the original record is being preserved in physical form, the reformatted record raises no concern beyond convenience, unless it is being depended upon as a “copy in form of the original” in any substantial way. If such is the case, then the authenticity of both forms of the record must be a concern.

2.0 Migration definition and varieties of technological threshold

Migration in the context of the present guidelines really means taking electronic records forward into the future in such a way as to preserve not only their reliability and authenticity, not only their content, context, and structure, but also their usability, what Charles Dollar has called their processability. This is an issue in itself, because a record that remains fully processable also remains fully alterable, often without trace. Hence in fact the preservation of a record’s reliability as originally created depends upon making sure that it is no longer editable, which means providing it with a new environment that is not in this important way the same as the original one (this is exactly what people are trying to do when they convert a word-processed file to PDF format). In other words, we need to be able to find things and look at them, but we must not be able to change them.

All workers in the field of digital preservation seem to be in at least tacit agreement that whether it can be read or not, a copy of the original record’s bitstream should be preserved, securely fenced around with a message digest to guarantee that it has not been changed and accompanied by open-standards metadata to guarantee its reliability. The ability to produce this “original record” will then stand as a means of authenticating the custodian so that the custodian can in turn reauthenticate the migrated record, a convention with which traditional archives are very familiar.
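The message digest mentioned above can be illustrated with a minimal sketch. It assumes SHA-256 as the digest algorithm and uses hypothetical function names (bitstream_digest, is_unchanged); the guidelines themselves do not prescribe an algorithm or a tool, so this is one possible arrangement rather than a recommended implementation.

```python
import hashlib
import hmac


def bitstream_digest(bitstream: bytes, algorithm: str = "sha256") -> str:
    """Return the hexadecimal message digest of a record's bitstream."""
    return hashlib.new(algorithm, bitstream).hexdigest()


def is_unchanged(bitstream: bytes, recorded_digest: str,
                 algorithm: str = "sha256") -> bool:
    """Check a bitstream against the digest recorded when it entered custody."""
    current = bitstream_digest(bitstream, algorithm)
    # compare_digest performs a constant-time comparison of the two hex strings
    return hmac.compare_digest(current, recorded_digest.lower())


# Hypothetical record content, used only for illustration.
original = b"Memorandum text exactly as created"
recorded = bitstream_digest(original)  # stored alongside the record's metadata

print(is_unchanged(original, recorded))         # True
print(is_unchanged(original + b" ", recorded))  # False
```

A digest of this kind shows only that the bits have not changed since the digest was recorded. It says nothing about whether the record was reliable when created, which is why the digest must itself be kept under secure custody together with the accompanying metadata.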

Migration is the term that has been applied to this process of carrying records forward over the very long term, because it is generally used to mean generational transfer—transfer across what might be termed a hardware/software paradigm shift or generation gap. But there is an entire continuum of processes for the “carrying forward” task, and their uses all have associated costs and benefits, because the continuum basically instantiates the observation that you can have as much preservation as you are willing to pay for.

2.1 Emulation

The gold standard for carrying digital records forward is emulation. No changes are made to the original record, and software support is provided to make it accessible exactly as it was when created. This means that software “emulators” may have to be written to carry out the functions of the original hardware, the original operating system, and the retrieval and display capabilities of the original hardware and software. It should be pointed out that even the systems on which emulators depend will themselves be subject to the need for conversion and migration. Nor should emulators used to preserve reliable records be perfect: full computability, which includes the ability to alter the record, should not be preserved even under emulation.

2.2 Conversion

The next best thing to emulation is never to let the record become non-computable, carrying it forward in little increments by converting it to run with each succeeding version of the original software, what might be termed “serial conversion.” If one is willing to let someone else decide what elements of the original record will be preserved, this is an attractive option, since software is routinely updated over time anyway. Further, it means that the software maker can probably be waited out for a few upgrade iterations, since the original record will remain upward compatible for some time until the software maker decides upon a generational shift to force major new purchases. Since software makers are loath to lose current users, however, it will probably be possible to carry the records forward with two conversions via the last version before the shift, which can probably still read the record and write it in a form that can be read by the new generation. If this chance is missed, however, a migration will be required instead. A major drawback to the practice of serial conversion is that the record, though modified slightly, remains fully computable and can be altered with little or no evidence left behind. This almost demands that it be removed to a secure environment.

2.3 Migration

The term “migration” is generally applied to the third and most drastic activity, which is the carrying forward of records to an entirely different system. This happens either because the original system is entirely obsolete (usually because it has been abandoned by its manufacturer) or because the cost of maintaining many different formats is prohibitive and a decision has been made to migrate to a single neutral format for the sake of economy of maintenance. Migration has been much maligned through its association with digital archaeology, the necessity to write special programs to rescue obsolete files. This is because the only application of large-scale migration procedures with any track record is the transfer of active data from one database system to another (which also requires migration of the database infrastructure and operational programming), where full computability, including the ability to change the record, has been achieved even in the face of paradigm shifts like hierarchical to relational or relational to object-relational. If full computability is not required, on the other hand, migration means using a special-purpose software tool to rewrite the record so that it remains computable only in a limited way: it can be found and displayed but not changed. Although commercial tools may be available to do this task, they will also be designed to preserve full computability and may not be satisfactory for this reason.

3.0 Authenticity and varieties of authority thresholds

It has become clear that most users of digital records are prepared to accept minor alterations in the content of the original bitstream, but there are as yet no precise, universally applicable specifications (and there may never be) as to what substantive elements of the record must be preserved, that is, which parts of the bitstream represent “recordness.” It is certain, however, that if the record does undergo alteration, it will be necessary to document that alteration precisely and define clearly how it may change the reliability of the record. Furthermore, this whole process implies an authority threshold or “Chinese wall” that the record must cross: a file management or archival function, separate from the functions that led to the records’ creation, that will constitute a fiduciary agent with respect to the records creator. Such an agent will guarantee that the records have provably not been changed in any sense that would alter their evidentiary value, so that the records creator can securely acquit herself of any documentary obligation through the records in question. There are thus several authority thresholds between custodians, and who does what in the migration process will depend on when the migration becomes necessary:

1)  Creator custody (e.g., desktop hard drive)

2)  Provisional fiduciary custody (e.g., file server under control of records manager)

3)  Permanent fiduciary custody (e.g., file server under control of archivist)

3.1 Auditing the migration process

No matter who has custody of the record or what degree of perfection is sought in its preservation, the following steps must be performed if the evidentiary value of the record is to be guaranteed. Clearly an economy of scale would argue, therefore, that ideally creators keep the record copy of the records they create for as short a period as possible, and at worst that they provide the record to the provisional fiduciary before any effort at carrying the record into the future is undertaken.

1)  All efforts at carrying the record into the future, whether they alter the record or not, must be externally audited, even if they take place within a records management or archival environment, and especially if they do not. This kind of auditing is by no means foreign to the IT field as a whole, but is seldom carried out as a matter of routine because it is expensive.

2)  On the basis of such an audit, which guarantees that all and only the specified changes were made to the records in question, the migrated records may be certified as to authenticity.

3)  Both the current migrated version of the record and the original bitstream must be guaranteed by means of message digests or equivalent.

4)  Any conversion that stays on the same platform and merely moves from one version of the software to its successor must be analyzed in advance to document precisely what alterations will be effected, and once the conversion is carried out, it should be audited as above.
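The four steps above might be recorded along the following lines. This is a sketch under stated assumptions: the MigrationAuditEntry structure, its field names, and the log_migration and can_certify helpers are all hypothetical, and SHA-256 stands in for "message digests or equivalent." The digests can show only that the stored bitstreams are the ones that were audited; establishing that all and only the specified changes were made remains the auditor's task.

```python
import hashlib
from dataclasses import dataclass
from typing import List


def sha256_hex(bitstream: bytes) -> str:
    """Hexadecimal SHA-256 digest of a bitstream."""
    return hashlib.sha256(bitstream).hexdigest()


@dataclass(frozen=True)
class MigrationAuditEntry:
    """Hypothetical log entry for one conversion or migration event."""
    record_id: str
    original_digest: str               # step 3: digest of the original bitstream
    migrated_digest: str               # step 3: digest of the migrated version
    documented_alterations: List[str]  # step 4: alterations documented in advance
    auditor: str                       # step 1: who performed the audit
    externally_audited: bool           # step 1: was the auditor external?


def log_migration(record_id: str, original: bytes, migrated: bytes,
                  alterations: List[str], auditor: str,
                  externally_audited: bool) -> MigrationAuditEntry:
    """Build an audit entry at the time the migration is carried out."""
    return MigrationAuditEntry(record_id, sha256_hex(original),
                               sha256_hex(migrated), list(alterations),
                               auditor, externally_audited)


def can_certify(entry: MigrationAuditEntry, original: bytes,
                migrated: bytes) -> bool:
    """Step 2: certify only if audited externally and both digests still match."""
    return (entry.externally_audited
            and sha256_hex(original) == entry.original_digest
            and sha256_hex(migrated) == entry.migrated_digest)


# Hypothetical example data.
old = b"record in superseded format"
new = b"record in successor format"
entry = log_migration("rec-0001", old, new,
                      ["embedded fonts replaced"], "External Auditor", True)

print(can_certify(entry, old, new))          # True
print(can_certify(entry, old, new + b"!"))   # False
```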

Draft, 7/15/2002

Patricia Galloway
