Lecture Notes in Computer Science

Subject-Oriented Work: Lessons Learned from an Interdisciplinary Content Management Project

Joachim W. Schmidt, Hans-Werner Sehring, Michael Skusa, and Axel Wienberg

Technical University TUHH Hamburg, Software Systems Institute,
Harburger Schloßstraße 20, D-21073 Hamburg, Germany

{j.w.schmidt, hw.sehring, skusa, ax.wienberg}@tuhh.de

Abstract. The two broad cases, data- and content-based applications, differ substantially in that data case applications are abstracted first before they cross any system boundary, while for content cases it is the system itself which has to map application content into some data-based technology. Through application analysis and software design we are aware of the difficulties of such mappings. In an interdisciplinary project with our Art History colleagues who are working in the subject area of “Political Iconography” we are gaining substantial insight into their Subject-Oriented Working (SOWing) needs and into initial requirements for a SOWing environment. In this paper we outline the project, its basic models, their generalization as well as our initial experiences with prototypical SOWing implementations. We emphasize the conceptual and terminological aspects of our approach, sketch some of the technical requirements of a generic SOWing software platform and relate our work to various XML-based activities.

1 Introduction

As a result of advanced and extensible database technology now being available as off-the-shelf products, a substantial part of database research and development work has generalized into work on models and systems for multimedia content management. R&D in content management includes a range of models and systems concentrating on services for the following three lines of work:

-  content production and publication work using multimedia documents;

-  classification and retrieval work based on document content;

-  management and control of such work for communities of users differentiated by their roles and rights, interests and profiles, etc.

The work reported in this paper is based on an interdisciplinary project with a partner from the humanities with strong semantic and weak formal commitment. Our project partner specializes in work on icon- and text-based content from the subject area of “Political Iconography”. This content is organized as a paper- and drawer-based subject index (PI-“Bildindex”, BPI, see fig. 1) and is used for Art History research and education.

Iconographic work has a long tradition in Art History, dating back to 19th-century “Christian Iconography”, and is based on integrated experience from three sources:

-  Art: multimedia content;

-  Art History: process knowledge;

-  Library Sciences: subject-oriented content classification and retrieval.

In our context we use the notion of subject-orientation very much in the sense of library science, as, for example, stated by Elaine Svenonius: “In a subject language the extension of a term is the class of all documents about what the term denotes, such as all documents about butterflies.” This understanding differs substantially from natural language where “the extension, or extensional meaning, of a word is the class of entities denoted by that word, such as the class consisting of all butterflies” [Sve2000]. And both understandings are in clear contrast to the semantics of terms in programming languages and database models.
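The contrast between the two notions of extension can be illustrated by a minimal Python sketch. All names used here (Entity, Document, natural_extension, subject_extension) and the sample data are hypothetical and serve only to illustrate the definitions quoted above; they are not part of the WEL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A thing in the world (hypothetical illustration)."""
    kind: str  # what the thing is, e.g. "butterfly"
    name: str


@dataclass(frozen=True)
class Document:
    """A document that is about one or more subject terms."""
    title: str
    about: frozenset  # subject terms the document is about


def natural_extension(term, entities):
    """Natural language: the class of entities denoted by the term."""
    return {e for e in entities if e.kind == term}


def subject_extension(term, documents):
    """Subject language: the class of documents about what the term denotes."""
    return {d for d in documents if term in d.about}


# Invented sample data, for illustration only.
entities = [Entity("butterfly", "Monarch"), Entity("moth", "Luna moth")]
documents = [
    Document("Field Guide to Butterflies", frozenset({"butterfly"})),
    Document("Nocturnal Insects", frozenset({"moth"})),
]
```

With this data, the term "butterfly" has an entity as its natural-language extension but a document as its subject-language extension, which is the distinction the quotation draws.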


The interdisciplinary project “Warburg Electronic Library (WEL)” models and computerizes BPI content and services, and the WEL prototype allows interdisciplinary experiments and insights into multimedia content management and applications.

The overall goals of the WEL project are

-  the generalization of our subject-oriented working experience,

-  a work plan for R&D in subject-oriented content management, and

-  a generic Subject-Oriented Working environment (SOWing environment).

Currently, many contributions to such R&D are based on XML as a syntactic framework which provides a structural basis as well as some form of implementation platform. The main reasons for XML’s powerful position are its strong structural commitment and its semantic neutrality.

Successful content management requires that the three lines of work

-  content production and publication work using multimedia documents;

-  classification and retrieval work based on document content;

-  management and control of such work for communities of users differentiated by their roles and rights, profiles and interests, etc.

are not supported in isolation but in a coherent and cooperative working environment covering the entire space spanned by all three dimensions.

The main reason why XML-based work on content management often falls short can be stated as a corollary of XML’s strength (see above): its weak semantic and exclusively structural commitment. Much of the XML-based R&D contributes to the above three lines of work only individually. Examples include [ ], [ ], …[ ].

The paper is structured as follows. Section two introduces the two projects involved, the “Bildindex zur Politischen Ikonographie (BPI)” and the “Warburg Electronic Library (WEL)”. In section three the WEL model is generalized towards a generic “Work Explication Language”. A system’s view of the WEL prototype is described in section four and the first contours of a generic Subject-Oriented Working environment (SOWing platform) are outlined. Related work, in particular work in the XML context, is discussed in section six. The paper concludes with a short summary and a presentation of future work.

2 Warburg Electronic Library: An Interdisciplinary Content Management Project

The development of the currently predominant data management models was heavily influenced by application requirements from the business and banking world and their bookkeeping experience: the concepts of record, tabular view, transaction etc. are obvious examples. Data model development had to go through several generations – record-based file management, hierarchical databases and network models – until the relational data model reached a widely accepted level of abstraction for database structuring and content-based data operation.

For traditional relational data management we basically assume that content is “values of quantified variables” from business domains operated by transactions and laid out as tables with rows and columns. Two questions arise:

-  How can we generalize from data management to the area of content management where content domains are not “just dates and dollars”, content operation goes beyond “debit-credit transactions” and content layout means multimedia documents?

-  What are key application areas beyond bookkeeping which help us understand, conceptualize and finally implement the core set of requirements for multimedia content management in terms of domain modelling, content-oriented work support as well as content (re-) presentation?

Figure 1: Working with the Index for Political Iconography in the Warburg House, Hamburg

The work presented here is based on an interdisciplinary R&D-project between Computer Science and Art History, the “Warburg Electronic Library” project. The application area was chosen because of Art History’s long-term working experience with content of various media. The project itself is founded on extensive material and user experience from the area of “Political Iconography”.

2.1  Subject-Oriented Work in Political Iconography

Political iconography basically intends to capture the semantics of key concepts of the political realm under the assumption that political goals, roles, values, means etc. require mass communication which is implemented by the iconographic use of images. Our partner project in Art History, the “Bildindex zur Politischen Ikonographie (BPI)”, was initiated in 1982 by the art historian Martin Warnke [ ] and consists of roughly 1,500 named political concepts (subject terms, “Schlagworte”) and more than 300,000 records on iconographic works relevant to the BPI. In 1990 Warnke’s work was awarded the Leibniz-Preis, one of the most prestigious research grants in Germany.

Figure 2: BPI “Bildkarte” St. Moritz (media card) describing art work by attribute aggregation

Starting with this experience, BPI-work essentially relies on an art historian’s knowledge of (documents referring to) political acts in which images play an active role. Art historians interpret “acts” as encompassing aspects of

-  “projects” (who initiated and contributed to an act? the when and where of an act? etc.);

-  “products” (what piece of art did the project produce? on what medium? place of current residence etc.); and finally, the

-  “concepts” behind the act (what political goals, roles, institutions etc. are addressed? what iconographic means are used by the artist? etc.).

On this knowledge level, BPI work identifies political concepts and names them individually by subject terms – e.g., by “ruler”, “prince”, “pope”, “equestrian statue”.

Subject term semantics is methodologically captured and systematically represented in the BPI by the following steps:

  1. designing a conceptual (mostly mental), prototypical [see SOWA ref.] and representative model for each subject term, e.g., a prototypical equestrian statue;
  2. giving value to the relevant variables or facets of such prototypes by reference to the art historian’s knowledge of “good cases”, i.e., political acts with an iconographic dimension. Each such variable or facet is represented by a BPI-entry (“Bildkarte”, “Textkarte”, “Videokarte” – “media card” etc.) which holds a description of a “good case” for that facet; see, for example, St. Moritz in fig. 2;
  3. collecting all BPI entries on the same prototype into a single extent (“Bildkartenstapel”, …, “pile of media cards”, see fig. 3) thus defining the semantics of a subject term. Additional fine structure may be imposed on subject term extents (order, “neighborhood”, named subextents, general association/navigation etc.);
  4. maintaining a (“completion”) process aiming at the “best possible” definition of the subject area at hand by

-  “representative” subject terms covering the subject area at hand;

-  “qualifying” prototypes for each subject term;

-  “complete” sets of facets for prototype description;

-  “good” cases for facet substantiation.
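The four steps above can be sketched as a small data structure. This is a minimal illustration in Python under our own assumptions, not the WEL implementation: the class names MediaCard and SubjectTerm, the method names, and the facet names used in the example are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MediaCard:
    """A BPI entry describing one "good case" for a facet (step 2)."""
    title: str
    medium: str  # "image", "text", "video", ...
    facet: str   # the prototype facet this case substantiates


@dataclass
class SubjectTerm:
    """A subject term defined by its prototype facets and its pile of cards."""
    name: str
    facets: set = field(default_factory=set)    # facets of the prototype (step 1)
    extent: list = field(default_factory=list)  # ordered pile of media cards (step 3)

    def classify(self, card):
        """Add a media card to the extent of this subject term."""
        if card.facet not in self.facets:
            raise ValueError(f"unknown facet {card.facet!r} for {self.name!r}")
        self.extent.append(card)

    def uncovered_facets(self):
        """Facets still lacking a good case; one input to completion (step 4)."""
        return self.facets - {card.facet for card in self.extent}


# Hypothetical facets and card, for illustration only.
statue = SubjectTerm("equestrian statue", facets={"rider", "horse"})
statue.classify(MediaCard("example image card", "image", "rider"))
```

In this sketch, uncovered_facets only covers the mechanical part of the completion process. Whether subject terms are "representative" or cases are "good" remains a judgement of the BPI editor and is not modelled.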

This makes it quite clear that the BPI is by no means just an index for accessing an image repository. The BPI uses images only in their rather specific role as icons and for the specific purpose of contributing to the description of cases and thus to the semantics of subject terms. In this sense, images represent the iconographic vocabulary of BPI documents just as keywords contribute to the linguistic vocabulary of text documents.

Figure 3: BPI subject term semantics (e.g. equestrian statue) by media cards classification (image, text, video etc.)

The BPI has essentially two groups of users:

-  a few highly experienced BPI editors for content maintenance and

-  various user communities which access BPI content for research and education purposes.

Being implemented on paper technology, the traditional BPI shows severe conceptual and technical shortcomings:

-  conceptually: the above attributes “representative”, “good”, “complete”, etc. are highly subjective and, therefore, “completion semantics” is hard to meet even within a “single-person-owned” subject index;

-  technically: severe representational limitations are obvious and range from single subsumption of BPI entries to the lack of online and networked BPI access.

In the subsequent section we outline two contributions of the “Warburg Electronic Library” project which address such conceptual and technical shortcomings through an advanced Digital Library project which, as a prime application, is now hosting the “Index for Political Iconography”.

2.2  A Subject-Oriented Working Environment: Warburg Electronic Library

Viewed from our Computer Science perspective which shifted in recent years from basic research in “persistent database programming” towards R&D in “software systems for content management (online, multimedia, …)”, the WEL project addresses a range of highly relevant and interrelated content management issues:

-  content representation by multiple media: images, texts, data, …;

-  content structuring, navigation and querying, content presentation;

-  content work exploiting subjects and ontologies: classification, indexing, …;

-  utilization of different referencing mechanisms: icon, index, symbol;

-  cooperative projects on multimedia content in research and education.

The WEL is an interdisciplinary project between the Art History department of Hamburg University (Research Group on Political Iconography, Warburg Haus, Hamburg) and the Software Systems Institute of the Technical University, TUHH, Hamburg. It began in 1996 as a 5-year project and will be extended into an interdisciplinary R&D-framework involving several Hamburg-based institutions.

For a short WEL overview we will concentrate on two project contributions:

-  semantic modelling principles for WEL-design;

-  personalized digital WEL libraries based on project-specific prototypes and their use in Art History education.

2.2.1 WEL Semantic Modelling Principles

The WEL design is based – as is already the BPI design – on the classical semantic data modelling principles [ ], [ ]: aggregation, classification, generalization / specialization and association / navigation (see figs. 2 and 3).

Figure 4: Media card associated with (multiple) subject terms and with information on classification work

However, it is important to note that the semantics of subject classes and their entries originate from different semantic sources and, therefore, require concepts which go beyond classical data modelling (see sec. 4.2):

-  object semantics: seen from a data modelling point of view, subject class entries are also entities of some object classes in the sense of object-oriented modelling. However, a subject class extent may be heterogeneous because its entries may describe documents of different media – texts, images, videos etc. Therefore, subject class entries viewed as objects may belong to different object classes – text, image, video classes etc.

-  content semantics: furthermore, all content described by the entries of the same subject class shares some semantic key elements. All BPI documents referring, for example, to the subject class “ruler” make use of graphical key icons such as swords, crowns, scepters, horses etc.; similarly, the text documents associated with a certain subject class contain overlapping sets of subject-related textual keywords. Such sets of key icons capture essential parts of content semantics and, thus, of subject term definition. Note that specialization of subject classes goes along with extension and union of subject-related key icon sets while generalization relies on reduction and intersection.

-  completion semantics: in section 2.1 we referred to the soft constraint of achieving best possible subject definitions as “completion semantics”. Although this may be considered more an issue of class pragmatics than of class semantics, it implies formal constraints on subject class extents. Since users of subject definitions act under the assumption that the subject owner established a subject extent which represents all relevant aspects of the owner’s subject prototype, any change of that extent is primarily monotonic, i.e., extents of subject terms are only changed by adding or replacing their entries. Therefore, references to subject class entries should not become invalid.
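Two of these constraints lend themselves to a short illustration: the set operations on key icon sets (content semantics) and the monotonic change of extents (completion semantics). The following Python sketch is written under our own assumptions. The names SubjectExtent, specialize and generalize are hypothetical, as are the icons "tiara" and "coronet"; only the key icons of "ruler" are taken from the text above.

```python
class SubjectExtent:
    """Extent whose entries may be added or replaced, but never removed."""

    def __init__(self):
        self._entries = {}  # stable reference -> current entry
        self._next_ref = 0

    def add(self, entry):
        """Add an entry and return a reference that stays valid."""
        ref = self._next_ref
        self._entries[ref] = entry
        self._next_ref += 1
        return ref

    def replace(self, ref, entry):
        """Replace the entry behind an existing reference."""
        if ref not in self._entries:
            raise KeyError(ref)
        self._entries[ref] = entry

    def resolve(self, ref):
        """Follow a reference to the current entry."""
        return self._entries[ref]

    def __len__(self):
        return len(self._entries)


def specialize(parent_icons, extra_icons):
    """Specialization extends the key icon set (union)."""
    return frozenset(parent_icons) | frozenset(extra_icons)


def generalize(*icon_sets):
    """Generalization reduces to the shared key icons (intersection)."""
    return frozenset.intersection(*map(frozenset, icon_sets))


# Key icons of "ruler" as named in the text; the additions are hypothetical.
ruler = frozenset({"sword", "crown", "scepter", "horse"})
pope = specialize(ruler, {"tiara"})
prince = specialize(ruler, {"coronet"})
```

Because SubjectExtent offers no removal operation, a reference handed out by add can always be resolved, which reflects the requirement that references to subject class entries should not become invalid. Entries of any medium can be stored, so the heterogeneity noted under object semantics is not restricted by this sketch.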

Figure 4 shows a media card for St. Moritz together with the (multiple) subject terms to which St. Moritz contributes. The example also links St. Moritz to details of the classification process by which this card entered the WEL. This information is essential for the realization of project-oriented views on subject terms, or reference libraries: