SI 655 Consulting Team 5: Archiving the SI Website

Table of Contents

I. The Problem

II. Goals and Objectives

III. Strategies

Policies

Information systems

Management of the recordkeeping function

Procedures and practices

Education and training needs

Other considerations

IV. Implementation issues

Time Frames

Obstacles

Opportunities

V. Methodology

Roles and Responsibilities

The Information Harvest

The Search for Ideas

Strategies considered

VI. Conclusion

VII. Attachments

1. Principles and Best Practices: Kansas Electronic Recordkeeping Strategy: A White Paper.

2. Draft Policy: Archiving of the School of Information Website

3. Metadata “Boot Strap”

4. Assessment and “Recordness” appraisal of the SI Website.

5. Project Plan: SI Website Archiving Team

6. Working Draft: Methodology for deciding whether a website should be preserved in a digital or paper format

I. The Problem

The advent of the Internet age has raised considerable concern about the fragility of digital records. This concern has prompted much research into storing electronic records as data, but not into storing web “records.” Among the most important questions are what should be saved from a website (i.e., what is “record” worthy) and how often it should be saved.

The focus of our project is to examine the SI Website and its stakeholders and to propose a solution for archiving the site. We have attempted to speak with all the relevant personnel of the SI Website to learn what they feel should be saved, by what means, and how often. We have also spoken with the Bentley Historical Library to determine what it expects from departments with regard to electronic records. In addition, we have considered “best practices” and sought out literature relevant to our dilemma.

Through our initial interviews we have determined a few challenges:

  • The SI Website has never been officially archived
  • SI Computing does not have a current methodology to archive the site in its current form
  • The Bentley is not “really” prepared to accept electronic records
  • The Bentley does not have a dedicated archival server, nor the staff to deal with such items or records
  • The SI Website is extremely complex and dynamic with its use of relational databases and ColdFusion
  • The Bentley, and all literature, place the responsibility of initial archiving in the hands of the creator, in this case SI
  • The Bentley has not defined the format in which it wishes to receive the information

In this report we hope to break these problems down into workable parts and to present a thorough plan under which SI Computing can deliver, and the Bentley can receive, the SI Website on a regular basis.

II. Goals and Objectives

Charged with developing strategies and policies for archiving the SI Website, we began to explore and examine the current situation from various perspectives. One of our first goals was to expose existing practices, as well as to uncover what official policies and procedures were already in place to govern the creation, management and preservation of the SI Website. Furthermore, we needed to understand the SI Website’s technologies for accessing information, including the software, relational databases, and frequently updated and offloaded content. Our ultimate goal was to develop a workable solution that could be used for archiving this website.

An initial interview with Frank DeSanto, SI Media Production Coordinator, revealed that Frank works closely with Jay Jackson, SI Editor, and with a half-time student to update and maintain the SI Website. This team is responsible for the development and deployment of resources and new web-based information systems. Frank recommended we contact two additional people: Jon Leonard, a member of the SI Computing staff who handles the servers and knows how the website back end is managed; and John Ringold, an SI graduate who administers one of the primary database servers and manages the database-driven content. The goal of these interviews was to understand how the Website works. They clearly indicated that no archiving of the SI Website is being done or has ever been done. Furthermore, it became obvious that no more than passing thought had been given to a website preservation plan or policy.

Our investigation into existing University of Michigan policies and guidelines pertaining to the preservation of electronic records led us directly to the U-M Standard Practice Guide (SPG), specifically to SPG Online, Section II, “Guidelines for the Creation, Maintenance, and Preservation of Electronic Records,” which states:

"Offices and Individuals should consult with the University Archives to establish a system of periodic transfer of electronic records which have been deemed appropriate for preservation. Special technical considerations which apply to records in electronic format include…

  • Periodic reformatting of electronic [records] so that they may be accessed by succeeding generations of hardware and software, or alternately, for the storage of electronic records in a machine and software independent electronic format;
  • Timely transfer of historical records in electronic format to the archives;
  • Development of technical identification to establish the provenance of electronic records that have been altered by numerous individuals; and,
  • Provision of adequate software documentation and identification to allow access at a later date."[1]

These guidelines further shaped our objective of creating a solution.

The Bentley Historical Library houses the University Archives. The Bentley's University Archives and Records Program (UARP) staff administers, preserves, and services the university's records. Since these individuals not only have key expertise, but also have knowledge of preservation policy and practices, interviewing these staff became another objective for gathering information. UARP staff were delighted by our interest in the SI Website and eager to meet with us to discuss best practices and to share their research analysis. Nancy Bartlett, Head of the UARP, informed us that the University Archives and Records Policy is currently undergoing significant revisions. Although Appendix 8.1 of the Manual provides distinct guidelines for the creation, maintenance, and disposition of paper records, at this time there are no comparable guidelines for electronic records.

Nancy Deromedi, also a member of the UARP, presented us with an unreleased Working Draft document entitled “Methodology for deciding whether a website should be preserved in a digital or paper format,” which she had co-authored with Christine DiBella, a former SI student. Additionally, Nancy shared with us several other draft documents that have yet to be reviewed or adopted. One of these documents outlines a proposed assessment strategy for determining the "recordness" of various web-based documents, while another summarizes procedures for the direct transfer of print and electronic publications to the Bentley.

During the course of our project, our exact goals evolved to include objectives identified during the initial information gathering. However, the ultimate goal of providing a realistic and implementable solution for archiving the SI Website remained paramount.

III. Strategies

This consulting team recommends a combined approach to archiving the SI Website, one that includes policies, technology, and changes in organizational practice.

Policies

Currently, the School of Information does not employ a policy or practice on archiving the website. In fact, the only saving of website information takes place through a daily backup tape for recovery purposes. While SI Computing does save one backup copy of the web server’s data per month for the “long term” (indefinitely), this is not a usable archival copy because the data is not software or hardware independent. Furthermore, retrieval of specified data on these tapes is nearly impossible, and extremely time consuming at best.[2]

While the School of Information is not employing any policies, the Bentley is in the process of creating new guidelines and policies for electronic material, including websites. Its staff have created a working draft for the assessment of website record material and have completed a case study of the Law School Website using that methodology.[3] They also have a policy and procedure for electronic format records, though this policy is eight years old and under revision.[4] However, the Bentley has had significant difficulty enforcing archival policies. In reality, the Bentley makes recommendations to schools and departments about archiving, but only accessions “voluntary” material.[5] The Bentley hopes to accession the significant records created by the School of Information, but has received few documents, especially in the past few years. In fact, the Bentley staff noted that since the renaming of the school, they have received only email newsletters. No other record material has been sent.[6]

In consideration of the current state of archival policy at the School of Information and the Bentley, this consulting team recommends that the creators and maintainers of the School of Information’s Website adopt an archiving policy. Buy-in from such authority figures as the Dean of the School of Information (currently John King), the Editor of the SI Website (currently Frank DeSanto), and the head of SI Computing (currently Rich Boys) will be instrumental in adopting and enforcing such a policy. To help that adoption along, we contacted the Dean, who responded:

“I'm strongly supportive of our determining the right strategy for SI in this area, and I'd stand behind whatever actions we take in consequence. SI is in the interesting position to set a standard for U-M in this regard, so we ought to step up to the plate. This is new territory. There are no ‘best practices’ for us to follow. We have to create them. So, let's do it.”[7]

We have also spoken with Rich Boys and Frank DeSanto, and both are in favor of a reasonable policy for archiving the SI Website.[8]

A recent report prepared by M.R. Talbot for the Library Services Branch Management Meeting of the State Library of South Australia considers the main elements that need to be addressed when archiving websites. These concerns include:

  • Definition of preservation
  • Levels of archiving
  • Content
  • Look and Feel of the Website
  • Coding
  • Format for preservation
  • Software retention
  • Copyright on archived software
  • Changing hardware requirements[9]

Each of these concerns is of direct importance in the consideration of a policy for the SI Website. Talbot’s concerns and preservation issues should be considered in the drafting of any Website archiving policy.

Using Talbot’s discussion of policy considerations, the Bentley’s working draft on digital preservation, and other resources, the authors of this document recommend an archiving policy that includes the following elements:

  • Schedule for archiving the website
  • Needed levels of archiving
  • “Recordness” criteria
  • Suggested technical methods of archiving, including file format and physical media
  • Technical Obsolescence Solutions
  • Needed metadata or “bootstrap” information to accompany the archived Website

A draft policy for the archiving of the SI Website is provided in Attachment 2 of this document for review and editing by the creators of the SI Website, as well as by the archivists at the Bentley.

This policy includes a schedule of archiving three times a year (once a semester); this frequency was chosen because content changes significantly between semesters and because the site’s creators consider it reasonable. The policy defines the necessary levels of archiving by priority: retaining the user-viewable content is most critical, the look and feel is of secondary importance, and retention of the actual code is of tertiary importance. To assess “recordness,” the policy recommends adopting the process put forth in the Bentley Working Draft. In terms of technologies, the policy does not require conversion of the data or the use of any specific technology, since the technologies of both the website and the archives will change over time. However, the policy recommends recording the website files in a format that is as “software independent” as possible. In terms of archiving strategies, this is the “snapshot” method of preservation (versus a paper copy or archival web server).[10] Any software on which the archival copy depends (such as the web browser) should be included with it. The media used should be in the format recommended by the Bentley. This team recommends CDs for an initial implementation of archiving the website. The policy also makes technological suggestions for migrating the website, including the use of the software “WebZip” (discussed below).

In addition, a draft of the metadata to accompany the archived website is included in Attachment 3 of this document. Because of the potential loss of data about data, this consulting team recommends that all metadata be preserved in paper form. The recommended metadata includes the following: name of the website, date of snapshot, URLs included, all file formats, any software used to convert or migrate the files, a description of any migration performed on the files, any software required to view the files (e.g. web browser), any software included with the archival files, the operating systems on which the software will run, the records included in the website, and the names and contact information of those who submitted the archival version. The form also includes additional information that the Bentley may want to record, such as a migration/refreshment schedule and indexing information.
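As an illustration only, the metadata fields listed above could be captured in a simple structured record before being printed for the paper file. The sketch below is not the form in Attachment 3; the class name, field names, and the `to_printable` helper are hypothetical choices made for this example.

```python
from dataclasses import dataclass, asdict
from datetime import date
from typing import List


@dataclass
class SnapshotMetadata:
    """Hypothetical record mirroring the metadata fields listed in the text."""
    website_name: str
    snapshot_date: date
    urls_included: List[str]
    file_formats: List[str]
    conversion_software: List[str]   # tools used to convert or migrate the files
    migration_description: str       # empty string if no migration was done
    viewing_software: List[str]      # e.g. a web browser
    bundled_software: List[str]      # software shipped with the archival copy
    operating_systems: List[str]     # systems on which that software runs
    records_included: List[str]
    submitted_by: str
    submitter_contact: str

    def to_printable(self) -> str:
        """Render one 'field: value' line per field, suitable for a paper copy."""
        lines = []
        for name, value in asdict(self).items():
            if isinstance(value, list):
                value = ", ".join(value)
            label = name.replace("_", " ")
            lines.append(f"{label}: {value if value != '' else '(none)'}")
        return "\n".join(lines)
```

Calling `to_printable()` on a filled-in record yields twelve plain lines, one per field, with `(none)` standing in for empty entries, so that a missing value is visible on the printed page rather than silently blank.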

Information systems

The SI Website currently uses a combination of information systems to offer a dynamic website. While some of the pages are static HTML, most are dynamically generated through ColdFusion (.cfm) files that pull information from a variety of databases. While this setup offers ease of maintenance, as well as an up-to-date view for users, it poses several problems for archiving. First and foremost is the need for proprietary software in order to view the site. This software includes the web server application, ColdFusion, and the database software. ColdFusion is a proprietary piece of software with a potentially limited future, given the short lifecycle of new products. However, the SI Website creators plan to continue and expand the use of dynamically generated pages through ColdFusion into the foreseeable future because of its desirable functionality and the cost of moving to a new technology.[11]

At the same time, the Bentley’s resources for maintaining a dynamic website are limited. Not only does it lack an archival web server, it also lacks the staff to maintain such a server. Furthermore, it does not have any of the proprietary software required to enable the interactivity of the website. These constraints limit the formats and software that can be used to preserve an archival version of the website.

To accommodate the dynamic nature of the website, and to enable viewing and access of the pages without a large amount of proprietary software, the authors of this paper recommend the immediate migration of the website to a more “software independent” format. Furthermore, the website should be usable from an offline source that does not require specific hardware, such as a web server. This can be accomplished by converting the webpage files to standard, static HTML with software tools such as WebZip.[12] WebZip is available online for under $40; it saves and converts entire websites into standard HTML, including dynamically generated pages. The tool follows the links from a root page defined by the user. This consulting team’s trial attempts to save the SI Website with it were relatively successful. A savvy user with extended knowledge of the SI Website could probably identify mishandled pages and possibly reconfigure WebZip to handle them correctly. While this tool has great potential for saving the SI Website, it changes the actual coded content of the pages; these changes will need to be recorded. Moreover, the conversion may introduce unavoidable errors, and information, including records, may be lost. Finally, this tool may itself become obsolete, in which case the method for migration will have to be reassessed.
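The general idea of a “snapshot” taken from a root page can be sketched in a few lines. This is a minimal illustration of the technique, not a description of how WebZip works internally: the function and class names are hypothetical, the `fetch` callable is assumed to be supplied by the user, and the sketch does not rewrite links for offline browsing or download images, both of which a real tool would need to do.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def snapshot_site(root_url: str,
                  fetch: Callable[[str], str],
                  max_pages: int = 500) -> Dict[str, str]:
    """Breadth-first crawl from root_url; returns {url: HTML as served}.

    `fetch` returns the HTML the server sends for a URL, so a dynamically
    generated page is stored as the static output a visitor would have seen.
    Only links on the same host as root_url are followed.
    """
    host = urlparse(root_url).netloc
    pages: Dict[str, str] = {}
    queue = deque([root_url])
    seen = {root_url}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # a real tool would log the failure for the metadata record
        pages[url] = html
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.links:
            # Resolve relative links and drop "#fragment" parts.
            target, _fragment = urldefrag(urljoin(url, href))
            if urlparse(target).netloc == host and target not in seen:
                seen.add(target)
                queue.append(target)
    return pages
```

Passing `fetch` in as a parameter keeps the crawl logic separate from the network: in practice it might wrap `urllib.request.urlopen`, while for a trial run it can simply look pages up in a dictionary. Pages that fail to load are skipped here, which is exactly the kind of conversion loss the text says must be recorded.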

True long-term preservation of the SI Website will require continuous refreshing and migration. This consulting team recommends that the website information be placed on a migration and/or refreshment schedule from the time of initial acquisition in order to prevent loss through media deterioration or technical obsolescence.
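A refreshment schedule can be as simple as a list of snapshots with the date each was last copied to fresh media. The sketch below shows one way such a check might look; the names are hypothetical, and the refresh interval is left as a parameter because this report does not prescribe one.

```python
from datetime import date
from typing import List, NamedTuple


class ArchivedSnapshot(NamedTuple):
    label: str            # e.g. the semester the snapshot represents
    last_refreshed: date  # date the snapshot was last copied to fresh media


def due_for_refresh(snapshots: List[ArchivedSnapshot],
                    today: date,
                    interval_days: int) -> List[str]:
    """Return labels of snapshots last refreshed interval_days or more ago."""
    return [
        s.label
        for s in snapshots
        if (today - s.last_refreshed).days >= interval_days
    ]
```

Whoever holds the archival copies would choose `interval_days` based on the expected life of the storage media in use.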

Management of the recordkeeping function

The Bentley Historical Library recommends a methodology for assessing the “recordness” of website files. We used this working draft methodology to consider the management of the recordkeeping function. The method assesses the following four elements:

  • Record Structure
  • Record Functionality and Presentation
  • Record Content
  • Risk Assessment[13]

Using this methodology, two aspects of the SI Website emerge as critical. The first is functionality, specifically the dynamic generation and hypertext, which contributes significantly to the understanding of the information found on the site. The dynamic generation of pages and the database querying of information have a significant impact on the value and realized content of the website information. The second is that it is difficult to discern what constitutes a record on the SI Website. The information on the site is linked in many different ways, and it becomes hard to determine what is a record and what is not. Traditional archivists view records as solitary objects, but the website creates a new dynamic through the linking of many records.