TEI Header
Overview
Quality control assurance
Data sources
TEI header sections
Simple data map
Boilerplate statements
Other TEI elements
Overview
The OAAP project is a collaboration between librarians, scholars, and technologists. The team is composed of archivists, catalogers, language specialists, programmers, researchers, and subject experts. A unique challenge for this project is to bring together the expertise, experience, and knowledge of all these individuals in meaningful and effective ways in order to develop high-quality digital resources.
Another goal of the OAAP project is to provide federated searching of digital resources across technically diverse infrastructures. Rice’s collection is digitally housed in DSpace, an open source application. This institutional repository provides for the sharing of global metadata, permanent identifiers, and long-term preservation of digital resources. The federated search tool will gather metadata directly from the DSpace item-level records. Therefore, the primary location of metadata for digital resources is the DSpace repository.
Information in the TEI header will not be used in partnership-level discovery tools, and the TEI P5 guidelines allow minimal headers. Nevertheless, it is believed that a fuller TEI header will allow TEI documents to stand alone when accessed outside the repository and will align them more closely with the recommendations of Level 4 of the TEI in Libraries Guidelines.
For the OAAP project, primary metadata is created by catalogers and archivists in accordance with established standards and best practices for bibliographic description. One objective of this project is therefore to ensure that any bibliographic data provided in the TEI header is consistent with the data provided in the DSpace item-level record.
Given the above background, the role of the TEI Header in the Americas project is to:
- Identify the digital resource
- Describe the general transcription and encoding practices used in creating the TEI document
- Provide bibliographic information regarding the original archival document
Quality control assurance
Automating the TEI header will:
- Ensure consistency of the type of information populated in TEI headers across all documents in the project
- Ensure accuracy and consistency between the formal metadata record (DSpace) and the TEI header (XML file)
- Enrich TEI header information with taxonomy and other bibliographic information
- Reduce manual data entry
Feedback from scholars, researchers and transcribers
During transcription and/or translation work, project staff may suggest changes to the titles, names, and place terms initially assigned by librarians.
Data sources
The general premise for automating TEI headers is to generate them from the same information used to populate the DSpace item record, supplemented with information unique to the TEI process. The consolidated metadata used to populate TEI headers will come from four sources:
- Metadata spreadsheet – descriptive and administrative metadata created by librarians.
- Technical metadata – specifications generated through the scanning process regarding the page images
- Google tracking spreadsheets – provide responsible party information for manuscript and translation activities
- Boilerplate statements provided by the TEI team describing general encoding practices
TEI header sections
The TEI P5 guidelines allow TEI headers to be populated across a wide range of depth and detail. Outlined below are the basic premises used to populate the main sections of the TEI header for the OAAP project. These premises fall within the general TEI P5 guidelines. The main objective is to provide consistency within each section while also providing a level of detail per item that is readily available from existing metadata and current project tracking spreadsheets and methods.
- File description
· titleStmt, publicationStmt and notesStmt tags contain information regarding the digital resource
· sourceDesc contains information about the original source document using the <bibl> tag
- Encoding description
· boilerplate statements describing the project markup practices
· list of taxonomies used to describe each resource
- Profile Description - assigned controlled vocabulary terms per digital resource
- Revision Description
· Significant changes to previously published TEI documents will contain a description of the revision. Published TEI documents are considered to be those documents ingested into the production instance of DSpace, as these documents have been previously shared with the general public.
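As an illustration only, the four sections outlined above could nest as in the following skeleton. It follows the general element order of a TEI P5 header; the comments are placeholders, and this is a sketch rather than the project's actual template.

```xml
<teiHeader>
  <!-- File description: the digital resource and its original source -->
  <fileDesc>
    <titleStmt>
      <title><!-- title of the digital resource --></title>
    </titleStmt>
    <publicationStmt>
      <publisher><!-- publisher of the digital resource --></publisher>
    </publicationStmt>
    <notesStmt>
      <note><!-- notes about the digital resource --></note>
    </notesStmt>
    <sourceDesc>
      <bibl><!-- original archival document --></bibl>
    </sourceDesc>
  </fileDesc>
  <!-- Encoding description: boilerplate markup practices and taxonomies -->
  <encodingDesc>
    <projectDesc><p><!-- boilerplate --></p></projectDesc>
    <editorialDecl><p><!-- boilerplate --></p></editorialDecl>
    <classDecl>
      <taxonomy><bibl><!-- controlled vocabulary --></bibl></taxonomy>
    </classDecl>
  </encodingDesc>
  <!-- Profile description: controlled vocabulary terms for this resource -->
  <profileDesc>
    <textClass>
      <keywords><term><!-- assigned term --></term></keywords>
    </textClass>
  </profileDesc>
  <!-- Revision description: only for previously published documents -->
  <revisionDesc>
    <change><!-- description of the revision --></change>
  </revisionDesc>
</teiHeader>
```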
Simple data map
The table below shows a simplified map between the qualified Dublin Core elements used to populate the DSpace item record and the TEI tags used to populate the header section of the TEI document.
· Metadata elements shown in green represent data that is hard coded directly into the TEI/XML template and is deemed to be the same for every record in the project. All other data may vary per resource and is therefore pulled from a consolidated metadata spreadsheet.
· Data definitions for any elements that begin with a dc notation can be found in the Application Profile documentation on the project web site.
TEI section / TEI tag / Metadata element (spreadsheet column)
fileDesc
titleStmt / <title> / dc.title + tei.titleVersion
titleStmt / <funder> / dc.contributor.funder
respStmt / <resp>Creation of digital images:</resp> / tei.images
respStmt / <resp>Creation of transcription:</resp> / tei.Name
respStmt / <resp>Creation of translation:</resp> / dc.contributor.translator
respStmt / <resp>Conversion to TEI-conformant markup:</resp> / tei.Markup
respStmt / <resp>Parsing and proofing:</resp> / tei.Dept
respStmt / <resp>Subject analysis and assignment of taxonomy terms:</resp> / tei.Cataloger
publicationStmt / <publisher> / dc.publisher
publicationStmt / <pubPlace> / dc.pubPlace
publicationStmt / <date> / tei.date.digital
publicationStmt / <idno> / dc.identifier.digital
publicationStmt / <availability> / Boilerplate
notesStmt / <note> / Boilerplate
notesStmt / <note> / dc.description.translation
sourceDesc
<bibl> / <title> / dc.title
<bibl> / <title type="alt"> / dc.title.alt
<bibl> / <title type="sub"> / dc.relation.isPartof [series title]
<bibl> / <author> / dc.contributor.author
<bibl> / <editor> / dc.contributor.editor
<bibl> / <edition> / dc.relation.isVersionof.edition
<bibl> / <date when=""> (when attribute) / dc.date.issued
<bibl> / <date> (element content) / dc.date.original || dc.date.issued
<bibl> / <idno> / dc.source.collection
<bibl> / <note> / dc.source.provenance
<bibl> / <note> / dc.description
encodingDesc
encodingDesc / <projectDesc> / Boilerplate
editorialDecl / <interpretation> / Boilerplate
editorialDecl / <correction> / Boilerplate
editorialDecl / <quotation> / Boilerplate
editorialDecl / <normalization> / Boilerplate
encodingDesc / <classDecl> / hard coded
profileDesc
profileDesc / <language ident="spa">Spanish</language> / dc.language.iso
textClass / <keywords scheme="AAT"> / dc.format.medium
textClass / <keywords scheme="LCSH"> / dc.subject.lcsh
textClass / <keywords scheme="TGN"> / dc.coverage.spatial
revisionDesc
<change> / <name> / tei.revName
<change> / <date> / tei.revDate
<change> / <list><item> / tei.revItem1
<change> / <list><item> / tei.revItem2
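To make the map concrete, the fragment below sketches how the fileDesc rows might look once merged into a template. Bracketed names such as [dc.title] are placeholders for spreadsheet column values, not literal content. Wrapping each responsibility value in a <name> element is an assumption, and only two of the respStmt rows are shown.

```xml
<fileDesc>
  <titleStmt>
    <title>[dc.title] [tei.titleVersion]</title>
    <funder>[dc.contributor.funder]</funder>
    <respStmt>
      <resp>Creation of digital images:</resp>
      <name>[tei.images]</name> <!-- assumed wrapper element -->
    </respStmt>
    <respStmt>
      <resp>Conversion to TEI-conformant markup:</resp>
      <name>[tei.Markup]</name> <!-- assumed wrapper element -->
    </respStmt>
  </titleStmt>
  <publicationStmt>
    <publisher>[dc.publisher]</publisher>
    <pubPlace>[dc.pubPlace]</pubPlace>
    <date>[tei.date.digital]</date>
    <idno>[dc.identifier.digital]</idno>
    <availability>
      <p>[boilerplate availability statement]</p>
    </availability>
  </publicationStmt>
  <notesStmt>
    <note>[boilerplate note]</note>
    <note>[dc.description.translation]</note>
  </notesStmt>
  <sourceDesc>
    <bibl>
      <title>[dc.title]</title>
      <title type="alt">[dc.title.alt]</title>
      <author>[dc.contributor.author]</author>
      <!-- when attribute from dc.date.issued; content from dc.date.original || dc.date.issued -->
      <date when="[dc.date.issued]">[dc.date.original]</date>
      <idno>[dc.source.collection]</idno>
      <note>[dc.source.provenance]</note>
    </bibl>
  </sourceDesc>
</fileDesc>
```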
Boilerplate statements
File Description
<funder>
Funding for the creation of this digitized text is provided by a grant from the Institute of Museum and Library Services. (Applicable only to the Rice collection.)
<availability> (part of publication statement)
This digital text is publicly available via the Americas Digital Archive through the following Creative Commons attribution license: "You are free: to copy, distribute, display, and perform the work; to make derivative works; to make commercial use of the work. Under the following conditions: By Attribution. You must give the original author credit. For any reuse or distribution, you must make clear to others the license terms of this work. Any of these conditions can be waived if you get permission from the copyright holder. Your fair use and other rights are in no way affected by the above."
<note>
Page images of the original document are included for surrogate documents. Images exist as archived TIFF files, JPEG versions for general use, and thumbnail GIFs.
Encoding Description
<projectDesc>
This digitized text is part of the Our Americas Archive Partnership (OAAP) project.
Editorial Declarations:
<interpretation>
<p>This text has been encoded based on recommendations from Level 4 of the TEI in Libraries Guidelines.</p>
<p>Any comments on editorial decisions for this document are included in footnotes within the document with the author of the note indicated.</p>
</interpretation>
<correction>
<p>All digitized texts have been verified against the original document.</p>
</correction>
<quotation>
<p>Quotation marks have been retained.</p>
</quotation>
<normalization>
<p>For printed documents: Original grammar, punctuation, and spelling have been preserved. No corrections or normalizations have been made, except that hyphenated, non-compound words that appear at the end of lines have been closed up to facilitate searching and retrieval.</p>
<p>For manuscript documents: Original grammar, punctuation, and spelling have been preserved. We have recorded normalizations using the reg element to facilitate searchability, but these normalizations may not be visible in the reading version of this electronic text.</p>
</normalization>
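For illustration, a normalization recorded with the reg element in a manuscript transcription might look like the fragment below. The sentence and the spelling pair are hypothetical, and pairing <reg> with <orig> inside <choice> is the common TEI P5 pattern, assumed here rather than confirmed as the project's practice.

```xml
<!-- Hypothetical example: the original spelling is kept in <orig>, -->
<!-- and the normalized form is recorded in <reg> for searching. -->
<p>El capitan
  <choice>
    <orig>dixo</orig>
    <reg>dijo</reg>
  </choice>
  que no.</p>
```

A reading version that displays only <orig> would hide the normalized form, which matches the caveat in the boilerplate statement.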
Other TEI elements
· tei.titleVersion - the title for the TEI document is the formal bibliographic title plus the text “Digitized Version”, per TEI guidelines
· respStmt - Responsibility names are tracked for various activities throughout the digitization process, including scanning, transcription, XML encoding, proofing, and bibliographic control work.
o For in-house (manually processed) works, the transcriber or translator may be different from the XML encoder. For printed text, both the encoding and transcription work are assigned to the vendor.
o For QC practices, parsing and proofing responsibilities are assigned at the department level (rather than by individual name).
o tei.catalogers - library staff who performed subject analysis and assigned controlled vocabulary terms at the document level.
· classDecl - declared controlled vocabularies used in this project (e.g. LCSH, AAT, TGN)
· tei.edition - Edition is not an official separate qualified field in the Dublin Core schema. This information is typically provided either directly as part of the title or within a description field. Melissa Torres, metadata librarian, has graciously volunteered to manually extract this information and parse it out for TEI header purposes.
· revisionDesc - Required for all pilot documents as these have been previously published online for at least 2 years, so changes need to be documented within the TEI/XML coding.
Example:
<revisionDesc>
<change>
<name>Dr. Lisa Spiro, Director of the Digital Media Center, Rice University</name>
<date>2009</date>
<list><item>This electronic text has been converted from P4 to P5</item></list>
</change>
</revisionDesc>
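As a further illustration of the classDecl item above, the fragment below sketches how the hard-coded vocabulary declarations could be tied to the keywords assigned in profileDesc. The xml:id values and the "#" pointer form of the scheme attribute are assumptions based on general TEI P5 usage (the data map shows bare scheme names), and the bracketed terms are placeholders for spreadsheet values.

```xml
<!-- Header fragment: declarations in encodingDesc, assigned terms in profileDesc -->
<encodingDesc>
  <classDecl>
    <taxonomy xml:id="LCSH">
      <bibl>Library of Congress Subject Headings</bibl>
    </taxonomy>
    <taxonomy xml:id="AAT">
      <bibl>Art &amp; Architecture Thesaurus</bibl>
    </taxonomy>
    <taxonomy xml:id="TGN">
      <bibl>Getty Thesaurus of Geographic Names</bibl>
    </taxonomy>
  </classDecl>
</encodingDesc>
<profileDesc>
  <textClass>
    <keywords scheme="#AAT">
      <term>[dc.format.medium]</term>
    </keywords>
    <keywords scheme="#LCSH">
      <term>[dc.subject.lcsh]</term>
    </keywords>
    <keywords scheme="#TGN">
      <term>[dc.coverage.spatial]</term>
    </keywords>
  </textClass>
</profileDesc>
```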