PURL Pilot: DRAFT

Last revised 2/11/02

Note 1: The procedures written by Renee Chin & Becky Culbertson (UCSD) were used for many parts of this draft.

Note 2: This is a draft document. Changes will be made as we learn more about cooperative cataloging through the PURL server software. See also for answers to Frequently Asked Questions.

Definition

A PURL (Persistent Uniform Resource Locator) is a substitute for a URL. The PURL points to a look-up table (or resolution service) which redirects the query to the URL.
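
To illustrate the definition above, here is a minimal sketch of a look-up table that redirects a PURL to its current URL. It is not the OCLC PURL software: the table contents, the function names, and the choice of HTTP status 302 for the redirect are assumptions made only for this example.

```python
# Hypothetical resolution table: PURL path -> current URL (illustrative values only).
RESOLUTION_TABLE = {
    "/example/journal": "http://www.example.org/journal/index.html",
}


def resolve(purl_path, table=RESOLUTION_TABLE):
    """Return (status, target URL) for a PURL path; 404 if the PURL is unknown."""
    target = table.get(purl_path)
    if target is None:
        return 404, None
    # Assumption: the resolver answers with a redirect to the stored URL.
    return 302, target


def update_target(purl_path, new_url, table=RESOLUTION_TABLE):
    """Point an existing PURL at a new URL; the PURL itself never changes."""
    table[purl_path] = new_url
```

The point of the design is that when a resource moves, only the table entry is updated; the PURL recorded in bibliographic records stays the same.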

Proposal

That a group of CONSER participants test the concept of cooperative maintenance of URLs for freely-available e-resources through an OCLC-hosted PURL server. The pilot will be conducted with the intention, if successful, to forward a recommendation to PCC regarding use of a PURL server for records maintained by BIBCO/CONSER institutions.

Overview

Each CONSER participant would create at least 20 PURLs using freely-available resources. OCLC would email error reports on a weekly basis. Participants would agree to maintain PURLs, based on the error reports, at least monthly; that is, participants would review at least 3 reports over a 3-month period (those able to do so would be encouraged to review reports more frequently). Participants would also submit a brief Monthly Report form (see below), and review/critique the final report to CONSER, due by the end of April 2002.

Time frame

May-November 2001: Set up PURL server, agree on procedure

November-December 2001: Revise procedure; notify CONSER pilot participants

January-March 2002: Test PURL server/procedure

April 2002: Write evaluation

Evaluation questions

  1. What are the benefits of this strategy? What problems were encountered?
  2. (For administrators) What would it mean for CONSER to endorse the use of a shared PURL server?
    (a) Additional time required to do cataloging? (b) Maintenance aspects?
  3. Would this work as a PCC tool? Should non-PCC catalogers have access to the PURL server, to create PURLs? Should all OCLC users have access to maintain existing PURLs (though not to create them) on PCC records?
  4. CONSER participation: Could CONSER record maintenance through the PURL server be counted towards the institutional commitment (as "maintenance of authenticated records")?
  5. Statistics:
    (a) How many PURLs were created?
    (b) How many CONSER PURLs were created? non-CONSER?
    (c) How many CONSER PURLs required revision during the test period? non-CONSER?
    (d) How many CONSER resources were withdrawn? non-CONSER?
  6. What enhancements to the PURL software or configuration would improve the usefulness of a shared PURL server? (e.g., batch-creation of PURLs?) Should there be better integration of the PURL server software & the OCLC interface for (1) creation of PURLs as part of the cataloging process; or (2) reference to URLs from the PURL server in duplicate detection during record creation?

Exit strategy

After the evaluation, if OCLC decides not to support the PURL server, then pilot participants would receive a file of their PURLs in order to replace PURLs with URLs. In addition to an institutional file, a complete file would be sent to the pilot organizer. After three months, all records would be reviewed, to make sure that all PURLs have been reverted to URLs. If necessary, I would revise or report to OCLC any remaining URL reversions needed.

Questions Related to Procedures for the Pilot

  1. Creation of PURLs:
  2. What is the scope of the Pilot?…

1)…In terms of types of resources?
The pilot is limited to freely-available resources, excluding federal documents (which are already covered by a PURL server). Note: PURLs for federal documents are obtained through Theodore deFosse ( )

2)…In terms of types of URLs?
Pilot participants need to be aware that the following characters will always report an error message: #, ~,

3) Note that "/" is a legitimate final character, and it is required in a partial redirect name.

4)…In terms of types of records?
The pilot is limited to URLs in bibliographic records. Aside from the limitation to freely-available web sites (excl. federal documents), pilot participants may assign PURLs to:

a) records for CONSER serials, new or existing (as well as serial bib records for related versions);

b) records for monographs, new or existing; or

c) records for integrating resources, new or existing.

  1. What methods of PURL creation are available?

Two methods are available: one for creating single PURLs; another for batch-creation of PURLs.

c. How should subfield $u’s be entered in the 856 field in an OCLC record?
Each 856 should have 2 subfield $u’s: the first subfield $u contains the PURL, and the second contains the original URL. No maintenance need be done on the second URL; it is there only for CORC duplicate detection. (Note: If the PURL pilot were successful, the PURL software could be integrated in the new OCLC interface & the double subfield $u would be discontinued.)

  1. Maintenance of PURLs
  2. How often will the PURL link checking software be run?
    The PURL link-check software will be run once a week during the test period.
  3. How will pilot participants be notified of problem PURLs?
    Pilot participants will receive email messages with the subject line:
    PURL VALIDATION REPORT
    The report will consist of an html file attachment with the following fields:

1) Status code

2) Number of hops

3) "Ownership" (PURL maintainer groups): CONSER, PCC, [institution code(s)]. Ideally, the ownership field would include all holdings symbols, so that any institution with holdings could maintain the PURL

4) PURL

5) URL

  1. What are the expectations for problem resolution?
    Pilot participants will review validation reports at least once monthly to: (1) Correct URL or (2) Redirect URL to "withdrawn URL" page. (Susan, can OCLC set up a "withdrawn URL page" for the participants? ) Note: Weekly review of validation reports by participants is recommended; but at the very least, the reports should be reviewed monthly.
  2. What should be done about web sites that refuse access by robots ("disallow")?
    Web sites that refuse access by robots cannot have PURLs. Please report such sites to the pilot organizer(s) who will notify the inputting library, inactivate the PURL, and send it to the withdrawn page.
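
Before establishing a PURL, a participant may want to see whether a site's robots.txt would block a link checker. The sketch below uses Python's standard robots.txt parser on text that has already been retrieved; the function name, the sample rules, and the example.org URLs are hypothetical, and the user-agent string actually sent by the OCLC link-check software is not given in this document, so "*" is assumed.

```python
from urllib.robotparser import RobotFileParser


def robots_allow(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt contents for two different sites.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
BLOCK_PRIVATE_ONLY = "User-agent: *\nDisallow: /private/\n"
```

A result of False for the URL in question suggests the site would show up as "disallowed" in the validation report and should not be given a PURL.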
  1. Withdrawal of PURLs
    From time to time, someone may create a PURL and later regret it. For example, if a cataloger establishes a PURL for an e-resource and later realizes that the resource is a federal document or a commercial site, then the PURL would need to be withdrawn. In these cases, the URL should be re-entered in the bib record and the PURL should be withdrawn. To withdraw a PURL, go to the PURL server. Select Modify PURLs. Enter ID/Password. Enter /[PURL] in the PURL box. Delete URL. Type in reason for change, e.g., created in error. Click on Modify PURL. Click Confirm. This process will associate the PURL with a "null URL." PURLs may be withdrawn in this way, but they should never be deleted. ( )

Example of a page returned by a withdrawn PURL:

The requested PURL (/3314) has been deactivated and cannot be resolved.

Current information

PURL: /3314
ID: RYANUCSD
Creation Date: Mon Jun 5 11:49:07 2000
Last Modified: Wed May 16 10:38:30 2001
Change Category: User_Edited

History

URL:
Modified: Wed Nov 15 11:25:48 2000
Change Category: User_Edited
Reason for Change: No longer available via LINK

Change Category: Batch Next
Modified: Mon Jun 5 11:49:07 2000
URL:

Please report withdrawn PURLs to the pilot organizer as part of monthly statistics.

  1. Withdrawal of e-resources
    Pilot participants should edit bib records (see under "III. PURL Maintenance: 4. Withdrawn"). In the PURL server, withdraw the PURL by the above process, except modify the PURL to change it to a PURL for an HTML page for withdrawn resources.

SAMPLE

Two examples of PURLs that lead to "withdrawn resource" pages are:

While these examples are based on the California Digital Library PURL server, a similar page would be established for the OCLC PURL server.

  1. Monthly report: What reports should pilot participants submit monthly?
    Each month, pilot participants should submit a web form:
  2. number of PURLs established that month (total & CONSER);
  3. number of PURLs in CONSER records requiring problem resolution;
  4. number of other PURLs requiring problem resolution;
  5. number of PURLs withdrawn;
  6. Comments
  1. Documentation: Where will documentation on the Pilot project be posted?
    Files related to the pilot will be posted to
    The pilot page includes:
  2. Monthly report form
  3. List of names/email addresses for pilot participants
  4. General procedures
  5. PURL server link
  6. Final report
  7. Final statistics

We anticipate that these documents will be available through the CONSER page by May 2002.

Procedures

  1. PURL Creation: See:

Note: Before beginning, open two web browser sessions with

[Put in picture of PURL server menu here]

  1. Logon: For security reasons, CONSER participants must log on with their CONSER authorizations and passwords. Pilot participants will become registered users the first time the server is accessed.

NOTE1: Anyone attempting to register on the server who is not using a CONSER authorization and password will receive an error message.

NOTE2: Once a participant has logged into the PURL server, s/he will be able to create, modify, etc., several PURLs without having to enter an authorization/password separately for each action.

All registered CONSER participants will become members of the CONSER group on the server & will be able to maintain one another's PURLs.

  1. Open the Search section of the PURL software to check for duplication. Search for URLs by using URL key words instead of whole URLs. Note that Keywords are case sensitive. (This strategy yields more inclusive results than checking whole URLs. Different catalogers could enter slightly different URLs; if one entered an exact string match, one could easily miss equivalent URLs.)
  1. Go to PURL home page at:
  1. Check for duplication (i.e., check to see if PURL has already been established):
  1. Search PURL resolver to see if a PURL already exists for each URL. Search by segments of a URL (e.g., “psyeta” in ) for the most effective match.
  2. If a PURL already exists, use that PURL
  3. If no PURL exists, create a new PURL
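
The reasoning behind searching by URL keyword rather than by whole URL can be sketched as follows. This is only an illustration of case-sensitive substring matching, not the search code of the PURL server; the function name and the example.org URLs are hypothetical.

```python
def find_by_keyword(keyword, known_urls):
    """Return every known URL containing keyword (case-sensitive substring match)."""
    return [url for url in known_urls if keyword in url]


# Two hypothetical entries that different catalogers might have made for one resource.
KNOWN_URLS = [
    "http://www.example.org/psyeta/sa/",
    "http://www.example.org/psyeta/sa/index.html",
    "http://www.example.org/other/",
]

# A keyword finds both variants; an exact-string comparison would find at most one.
matches = find_by_keyword("psyeta", KNOWN_URLS)
```

Because matching is case sensitive, searching for "Psyeta" in this sketch would return nothing, which is why the keyword should be typed exactly as it appears in the URL.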
  1. Create a single PURL
  2. Choose "Create single PURL"
  3. Enter ID/password if you have not already logged into the PURL server [note that ID/password are case sensitive]
  4. Type/paste the URL in the "URL" box.
    (NOTE: Since the URL used in the PURL server is not in the MARC record, there is no need for certain substitutions. Do NOT substitute %5F for underscore, or %7E for tilde.)

Note:

# will result in an error report by the link-check software

// multiple slashes will be collapsed into a single slash

/ is legitimate as a final character; it is required in the case of partial redirects

  1. Click on "Create PURL" under “PURL Creation and Maintenance”
  2. Click on URL here to verify URL accuracy.
  3. Use back button to get back to browser.
  4. Click on "confirm"
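
The notes on URL characters in the steps above could be turned into a quick pre-check before a URL is pasted into the "URL" box. The sketch below is an assumption-laden illustration: the function name is hypothetical, and it assumes that the collapsing of multiple slashes applies only to the part of the URL after the "://" that follows the scheme.

```python
import re


def precheck_url(url):
    """Return (url as the server would likely store it, list of warnings)."""
    warnings = []
    if "#" in url:
        warnings.append("contains '#': the link-check software will report an error")
    # Assumption: the '//' right after the scheme is left alone; only later runs collapse.
    scheme, sep, rest = url.partition("://")
    if sep:
        collapsed = scheme + sep + re.sub(r"/{2,}", "/", rest)
    else:
        collapsed = re.sub(r"/{2,}", "/", url)
    if collapsed != url:
        warnings.append("multiple slashes will be collapsed into a single slash")
    # A trailing '/' is legitimate (and required for partial redirects), so no warning.
    return collapsed, warnings
```

For example, calling precheck_url on a URL with a doubled slash in its path returns the collapsed form together with one warning, while a URL that merely ends in "/" passes with no warnings.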
  1. Batch Creation of PURLs:
  2. Click on “Batch Add PURLs” under “PURL Creation and Maintenance”
  3. Use “Add List,” if you choose to add a list of URLs within the window.
  4. At the beginning and end of the list, use <recs> and </recs> respectively
  5. For each PURL, add a record:
    <rec>
    <purl>[PURL]</purl>
    <url>[URL]</url>
    </rec>
  6. Example of complete list:
    (NOTE: Indentation is used just to make reading easier; it is not necessary)
    <recs>
      <rec>
        <purl>[PURL]</purl>
        <url>[URL]</url>
      </rec>
      <rec>
        <purl>[PURL]</purl>
        <url>[URL]</url>
      </rec>
    </recs>

  1. Click “Add Batch” button to send batch request
  2. [OCLC PURL server will respond with a list of PURLs & corresponding URLs]
  3. You may wish to copy the list to Notepad or Word, if the PURLs are not to be used immediately.
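
For a long list, the batch text could be generated rather than typed. The following is a minimal sketch that uses only the element names given above (recs, rec, purl, url); the function name and the sample PURL/URL pair are hypothetical, and whether the server expects characters such as "&" to be escaped is not covered here, so values are written out unchanged.

```python
def build_batch(pairs):
    """Build the text for the batch window from (purl, url) pairs."""
    lines = ["<recs>"]
    for purl, url in pairs:
        lines.append("<rec>")
        lines.append(f"<purl>{purl}</purl>")
        lines.append(f"<url>{url}</url>")
        lines.append("</rec>")
    lines.append("</recs>")
    return "\n".join(lines)


# Hypothetical pair, for illustration only.
batch_text = build_batch([("/example/journal", "http://www.example.org/journal/")])
```

The resulting text can be pasted into the batch window and checked by eye before the request is sent.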
  1. Bibliographic record:
  2. In the 530 other formats note (single record approach) or 538 mode of access note (separate record approach):

530 Also available via the World Wide Web.
538 Mode of access: …

  1. 856 field: Add PURL in the first subfield $u of the corresponding 856 in bib record. Retain the original URL in second subfield $u. E.g.:
    856 41 $u $u
    NOTE: The information in the second subfield $u is intended solely for OCLC use in duplicate record detection and need not be maintained as the URL changes.
  1. Related bibliographic records: If there are multiple OCLC records representing variant versions (most commonly, paper & online), then add the PURL to the 856 field for each version. Example:
    OCLC# 25866594: Society & animals
    OCLC# 46775398: Society & animals (Online)
    Reasons: Might as well! Also: If the paper version ceases, that’s one less thing to think about when doing the title change.
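
The 856 pattern described above (PURL in the first subfield $u, original URL in the second) can be sketched as a small helper that formats the field text. The function name and both addresses are hypothetical, and the indicator values "41" are simply copied from the example above.

```python
def build_856(purl, original_url, indicators="41"):
    """Format an 856 field with the PURL first and the original URL second."""
    return f"856 {indicators} $u {purl} $u {original_url}"


# Hypothetical addresses, for illustration only.
field = build_856("http://purl.example.org/example/journal",
                  "http://www.example.org/journal/")
```

The same field text would then be added to each related record (for example, both the paper and the online version).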
  1. PURL Use

The current version of the PURL software does not have an option for automatic addition to the maintainer group. After someone creates a PURL & enters it in a bib record, there is no automatic way for subsequent users of the record to register with the PURL server.

Paralleling the validation option offered by the OCLC CORC system, participants may choose to add their holdings symbols to a PURL server record that has already been created. For the participant, the benefit would be that an institution would be notified to check the bib record for possible changes (if the URL change signaled more extensive changes to the resource). But the disadvantage would be that some PURLs in the mailed list might already be corrected by the time an institution on a slower maintenance cycle reviewed the problem. For OCLC, the benefit would be that, with more monitoring libraries, a record for a popular (commonly-selected) resource would have a better chance of being maintained.

Or, participants may choose to add holdings symbols only when they are the PURL creator. In this case, participants would commit their institutions to maintaining just the PURLs that they create. If the participant could not continue their commitment, they would arrange with another institution to maintain the PURLs for which they were solely responsible.

NOTE1: Any registered CONSER library will be able to edit any of the PURLs created as part of this pilot, regardless of whether the library is listed as a maintaining institution.

NOTE2: Other libraries may view information in the PURL server, even though they cannot edit the information.

  1. PURL Maintenance:

NOTE on OCLC-initiated reports:

  • For background information on OCLC PURL link validation, see:

(types of errors for HTTP 1.0)
(types of errors for HTTP 1.1)

  • Once a week, OCLC will run link validation and send email to participating institutions. The report will consist of either an affirmation that there were no error messages for that week, or an HTML-formatted message including the appropriate validation problems for that participant. The message would exclude URLs with no problems (200 = successful link).
  • The email message will have a subject header of: PURL VALIDATION REPORT and will consist of a text file (?is comma-delimited an option?) email attachment.
  • If your email reader cannot display attachments, then go to the OCLC page ( ) and fill out the form to search for previous validation results by explicit maintainer (institutional symbol).
  1. Review the list for errors. Some common situations are listed below.

a. Recheck codes that may be 'false drops':
3=Connection failed
4=Timed out

10=Too many re-directs (21 is the maximum number of re-directs)

500=Internal server error (server may be down)

  1. Check for URLs that may not be valid for PURL server

24=disallowed by web site (site has entry in /robots.txt )

  1. Find current URL if possible. Update PURL server

400=Bad request

403=Forbidden [this is a catch-all error message]
404=Not found
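
The three groups of status codes above amount to a small triage table. The sketch below encodes exactly the codes listed in this section; the function name and the wording of the returned actions are assumptions for illustration, and any code not listed falls through to manual review.

```python
# Codes that may be 'false drops' and should simply be rechecked.
RECHECK = {3: "Connection failed", 4: "Timed out",
           10: "Too many re-directs", 500: "Internal server error"}
# Codes suggesting the URL may not be valid for the PURL server.
NOT_VALID_FOR_PURL = {24: "Disallowed by web site"}
# Codes for which the current URL should be found and the PURL updated.
FIND_CURRENT_URL = {400: "Bad request", 403: "Forbidden", 404: "Not found"}


def triage(status):
    """Map a validation-report status code to a suggested action."""
    if status == 200:
        return "ok"
    if status in RECHECK:
        return "recheck"
    if status in NOT_VALID_FOR_PURL:
        return "report to pilot organizer"
    if status in FIND_CURRENT_URL:
        return "find current URL and update PURL"
    return "review manually"
```

Sorting a weekly report with a function like this separates the entries that only need a second look from those that require a change in the PURL server.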

  1. Modify URL in PURL server: Use form at:
    For explanation, see:
  2. Enter ID, password
  3. Enter PURL
  4. Either enter corrected URL, or for resources that have been withdrawn (as best you can tell), enter the PURL for the "withdrawn" page:
  5. If no valid URL can be found, enter a comment note, listing last known URL (for historical reasons)
  1. Changed resources: Edit OCLC record as appropriate to reflect changes in the description. (The 856 $x information is intended as descriptive information; it need not be maintained)
  1. Withdrawn: Edit OCLC record for resources that have been withdrawn from the Internet or that are permanently unavailable.

Question for CONSER: How should bib records be edited, in the case of e-resources that have been withdrawn? There are many possible conventions, three of which are:

(1) The 856 field could be removed entirely from the bib record & the PURL de-activated;

(2) The PURL could be removed from the 856 field (while leaving the second URL) & the PURL de-activated;

(3) The 856 field could be retained intact (with the PURL), with a note: subfield "$z Link no longer valid as of [date]"

  1. Single record approach: If the bib record describes a tangible resource, then the 856 field is only a note or pointer to the Internet resource. Delete as appropriate:

1) 007 field

2) 530 field

3) 740 field

4) 856 field: Use note "$z No longer valid for this resource as of [date]" if 856 field deletion is not desired at this time.

  1. Separate record approach (i.e., E-resource cataloged): In this case, the bib record serves as a historical record of the existence of an e-resource. Therefore, as much information as possible should be retained. But a warning is needed to alert OCLC users and catalogers to the fact that the resource is no longer available.

1) Fixed field:
Monograph: DtSt: m // Dates [beginning date], [end date or uuuu]
Serial: DtSt: d or u // Dates [beginning date, end date]

2) 500 note: No longer available online.

3)856:
If URL is no longer valid: 856 $z Link no longer valid as of …
If URL is valid but points to a different resource: 856 $z Link no longer valid for this resource as of ….

4) Modify the PURL to change it to the PURL for the HTML withdrawn page that will stay constant.

5)Notify project organizer for addition to monthly statistics.