IMAGE RETRIEVAL (1997)

Theoretical analysis and empirical user studies on accessing information in images.

Susanne Ornager

Royal School of Librarianship, Birketinget 6, 2300 Copenhagen S., Denmark.

Abstract

The paper touches upon indexing and retrieval for effective searches of digitized images. Different conceptions of what subject indexing means are described as a basis for defining an operational subject indexing strategy for images. The methodology is based on the art historian Erwin Panofsky and his work on Renaissance paintings. On the basis of works of art he develops a theory about ways in which one analyses representational images. Panofsky describes three levels of meaning in a work of art which indicate a difference in presupposed knowledge, i.e. nothing (or only practical experience), special knowledge about image codes, and special knowledge about the history of ideas. The semiologist Roland Barthes has established a semiology for pictorial expressions based on advertising photos. Barthes uses the concepts denotation/connotation, where denotation can be explained as the sober expression of signs and connotation as meanings relating to feelings or associations. A joint methodology is suggested between the two researchers and the methodology is implemented in analyzing press photos. Fields of application discussed include the messages in an image and the linking between information running from text, image to object. An empirical study, based on 17 newspaper archives, demonstrates user group requirements including archivists (creators), journalists (immediate users), and newspaper readers (end-users). A word association test is completed and the terms are used to build a user interface. The empirical analysis demonstrates how the results can be applied as the foundation for a semantic model.

INTRODUCTION

The increasing number of image databases and the Internet access to images have emphasized the need for research in areas other than the technical. We know how to capture, store, and transmit images but we are only at the exploratory stage when it comes to indexing and retrieval (Crehange, 1989; Lesk, 1990; Leung, 1991; Enser, 1995). This paper will touch upon theoretical image analysis and report on an empirical study about criteria for analysis and indexing of digitized images, and the different types of user queries made in newspaper image archives in Denmark. As the subject of this paper is the image, we attempt to define the concept. We use the definition: A two-dimensional visual representation accessible to the naked eye and generally on an opaque backing. Used when a more specific term, e.g. art original, photograph, study print etc., is not appropriate (AACR, 1978; Friis-Hansen, 1996). As the definition implies, photograph is a more specific term than image. To distinguish the photograph from the press photograph we agree with the premise that a press photo is never without a written commentary (Barthes, 1961). It should be noted that the words image and picture are treated as synonyms in this paper.

INDEXING

Different people have different conceptions of what subject indexing means. As a basis for defining an operational subject indexing strategy for images we say that the main purpose of indexing is to construct representations of published items. Indexing is used as a matter of convenience to refer to all activities of subject classification (Lancaster, 1991, p. 16), i.e. the process of deciding what some item is about and of giving it a label to represent this decision. When we talk about subject information we talk about two types:

·  The information which is explicit i.e. information which is expressed in the terminology applied by the author of the document [1].

·  The information which is implicit i.e. information which is not directly expressed by the author, but which is readily understood by a (human) reader of a document.

The methodology generally recommended as good practice for subject analysis and indexing is a two-step exercise:

·  One analyses a subject matter and expresses the conceived information (the aspect(s) of the topic) in a concrete statement, for instance in the form of an index term.

·  One translates the indexing term to a controlled vocabulary [2] of indexing terms. The preferred term acts as a surrogate for the concept.

While this distinction might look trivial, some explanation is necessary for ensuing discussions. In theory, the first step requires only the identification of the topics contained in the documents. This should not require any value judgment. Conversely, the second step involves a choice, that is a decision which needs criteria to be validated. In practice, however, the first step already implies a choice because of what was said before: an intuitive judgment is exerted to choose the most important aspects. Scanning a document to decide what it is about is the key operation in subject analysis relying on both explicit and implicit information in documents for determining adequate representations. Although the main purpose of indexing is defined as a representation of an item the main topic of indexing is the analysis of linguistic data for a specific purpose: document retrieval.
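The two-step exercise described above can be illustrated with a minimal sketch. It assumes that the first step (subject analysis) has already been carried out by a human indexer, and it shows only the second step, the translation of free index terms to preferred terms. The vocabulary entries and the function name translate_terms are hypothetical and serve illustration only.

```python
# Hypothetical controlled vocabulary: free-text variants -> preferred term.
CONTROLLED_VOCABULARY = {
    "car": "automobiles",
    "cars": "automobiles",
    "automobile": "automobiles",
    "kid": "children",
    "child": "children",
}


def translate_terms(free_terms, vocabulary=CONTROLLED_VOCABULARY):
    """Step 2: map the indexer's free terms to preferred terms.

    Terms without a vocabulary entry are returned separately so that
    an indexer can decide how to handle them.
    """
    preferred, unmapped = [], []
    for term in free_terms:
        key = term.strip().lower()
        if key in vocabulary:
            # Avoid repeating a preferred term reached via two variants.
            if vocabulary[key] not in preferred:
                preferred.append(vocabulary[key])
        else:
            unmapped.append(term)
    return preferred, unmapped


# Step 1 (deciding what the item is about) is a human judgment;
# its outcome is simply given here as a list of free terms.
free_terms = ["Car", "kid", "harbour"]
preferred, unmapped = translate_terms(free_terms)
```

Keeping the unmapped terms visible, rather than discarding them, reflects the point made above that even the translation step involves a choice which needs criteria.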

When analyzing images the same procedure is used, i.e. the subject information given in an accompanying text (explicit) and the information in the image (implicit). One can ask, why do we try to verbalize the wordless? How can access to one medium be provided through another? How can we use the medium of language for subject analysis and indexing of images? The answer is that maybe we need the language as a tool allowing us to describe and thereby retrieve the photos - at least for the time being. It has been brought up by Svenonius (1994) that difficulties arise when an attempt is made to extend the scientific model of aboutness to domains that use a non-verbal symbolism. Svenonius limits the definition of scientific aboutness to “the subject of the document, what the document is about”, thereby dissociating herself from Hutchins’ assertion that “we should never talk of the subject of a document. As we have seen, the judgements of subject content (by authors, readers and indexers) are influenced by so many factors that any particular statement of a document’s content should never be regarded as anything other than just one of many possible such statements. In other contexts and from other perspectives the same document may have other, quite different “subjects”” (Hutchins, 1975). This observation is pinned down by Maron, who distinguishes between “subjective aboutness, objective aboutness and retrieval aboutness” (Maron, 1977). If we speak of visual aboutness we need to define the aboutness we are communicating about. We talk about a subject of a photo as that which a picture depicts - in doing so we use “subject” as the objective aboutness. We claim that the photo is a copy or the pure and simple denotation of reality. In most photos what is depicted is a subject in the sense that it can be named and indexed.
The retrieval aboutness is seen as the information searching behavior of a class of individuals combining both the objective and the subjective aboutness or searching them as individuals. Of the different understandings of aboutness, we regard visual aboutness as closest to Maron’s definition, allowing us to index “what the photo is about” and to consider indexing as an answer to requests.

IMAGE METHODOLOGY

Among the several studies concerned with image research we will discuss the theories brought forward by the art historian Erwin Panofsky and the semiologist Roland Barthes. Panofsky describes three levels of meaning in a work of art (Panofsky, 1962). He uses the terms pre-iconography for the primary level, iconography for the secondary level, and the third level he calls iconology. Panofsky assumes that for the “reader” to be equipped to explain the first level it is necessary for him/her to be able to describe the motifs on the basis of his/her practical experience; (s)he needs to be familiar with objects and events. The iconographical analysis presupposes much more than the familiarity with objects and events which we acquire by practical experience. It presumes a familiarity with specific themes or concepts as transmitted through literary sources. The third level involves the elucidation of intrinsic meaning or content, i.e. the symbolic values or familiarity with the essential tendencies of the human mind. Panofsky’s theory about analysis of ways in which one perceives and interprets experience has been adapted first by Markey (1983) in her analysis of representational images and later by Shatford (1986). The latter amplifies the first two modes by distinguishing between what a picture is of and what it is about. The pre-iconographic level consists of an objective (factual) description of the picture and a subjective (expressional) description. A factual description of a picture of a woman and child, for example, would concentrate on the ordinary, recognizable, everyday elements in the picture, such as "Woman and child". The expressional element can be the way the picture presents itself, what it conveys, e.g. "Defeatist attitude". The iconographic level describes the cultural background of the picture. In order to identify that a man who raises his hat is greeting, one has to know of western tradition and culture.
Panofsky's two first levels can according to Shatford be further subdivided corresponding to the following criteria: What the picture represents (Of) and What it expresses/is about (About). On the pre-iconographic level, Of covers the factual, whereas About refers to the expressional, i.e. an objective and subjective angle. On the iconographic level, Of may cover an objective view often expressed as a specific angle (for instance a person mentioned by name), whereas About represents mythical or abstract contents. The above statements may be summarized by saying that each of Panofsky's two first levels contains two aspects expressed by Of and About.
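The subdivision of Panofsky's first two levels into Of and About can be pictured as a small record structure. The following is a minimal sketch, not a model proposed by Shatford or by this paper: the class and field names are hypothetical, the pre-iconographic example terms are taken from the text above, and the iconographic terms are invented placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class LevelDescription:
    """Index terms for one Panofsky level, split into Of and About."""
    of: list = field(default_factory=list)      # objective aspect
    about: list = field(default_factory=list)   # subjective/abstract aspect


@dataclass
class ImageIndex:
    """Hypothetical index record covering Panofsky's first two levels."""
    pre_iconographic: LevelDescription = field(default_factory=LevelDescription)
    iconographic: LevelDescription = field(default_factory=LevelDescription)

    def all_terms(self):
        """Return (level, aspect, term) tuples for every index term."""
        rows = []
        levels = (
            ("pre-iconographic", self.pre_iconographic),
            ("iconographic", self.iconographic),
        )
        for level_name, level in levels:
            rows.extend((level_name, "of", term) for term in level.of)
            rows.extend((level_name, "about", term) for term in level.about)
        return rows


index = ImageIndex(
    # Factual and expressional examples from the text.
    pre_iconographic=LevelDescription(
        of=["Woman and child"], about=["Defeatist attitude"]
    ),
    # Placeholder values: a named person (Of) and an abstract theme (About).
    iconographic=LevelDescription(
        of=["Jane Doe (hypothetical name)"], about=["Motherhood (hypothetical)"]
    ),
)
```

Storing the level and aspect with each term would let a retrieval system search the factual and the expressional terms separately or together.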

Roland Barthes has established a semiology for all forms of expression including the pictorial. In his terminology, Barthes uses the set of concepts denotation/connotation (Barthes, 1964). In brief, denotation can be explained as the sober expression of the signs, which are, however, a product of the meaning assigned to them by a given system of language within a given culture group. Connotations can briefly be defined as meanings relating to feelings, associations, and aesthetic overtones. An example of a denotation would be "a girl", "a fire", and the connotations for these would be “freedom”, “heroine” and "Joan of Arc".

On the basis of, for example, advertising photographs, Barthes concludes that the photograph conveys three messages: a linguistic, a literal, and a symbolic. The linguistic message lies in the text, if any, annexed to the photograph. The purpose of this is to emphasize the meaning of the photograph. As a photograph often holds many meanings, the linguistic message will anchor the contents, that is, pin down one single meaning. On the denotative level, the anchoring acts as an answer to the question "what is this?" The literal message has a purely descriptive function as it identifies the objects photographed. The symbolic message belongs to the connotation level. It reflects the subjective element of the photograph, and the individual or cultural experience and knowledge. We claim that in a photo the denotative level can be subdivided into a linguistic and a literal level, while the connotative level has only one subgroup, the symbolic; see Figure 1 below.

Figure 1. Different messages in a photograph

Photograph
|
+-- Denotative level
|   +-- Linguistic level
|   +-- Literal level
|
+-- Connotative level
    +-- Symbolic level
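The division of messages in Figure 1 can also be written as a simple record. This is a minimal sketch under stated assumptions: the class name PhotoMessages and its methods are hypothetical, the caption is a placeholder, and the literal and symbolic terms reuse the denotation/connotation example given earlier.

```python
from dataclasses import dataclass, field


@dataclass
class PhotoMessages:
    """Hypothetical record of the three messages in a photograph."""
    linguistic: str = ""                           # accompanying text, if any
    literal: list = field(default_factory=list)    # objects photographed
    symbolic: list = field(default_factory=list)   # associations, feelings

    def denotative(self):
        """Denotative level: linguistic message plus literal message."""
        terms = list(self.literal)
        if self.linguistic:
            terms.insert(0, self.linguistic)
        return terms

    def connotative(self):
        """Connotative level: the symbolic message only."""
        return list(self.symbolic)


photo = PhotoMessages(
    linguistic="Hypothetical caption text",        # placeholder caption
    literal=["a girl", "a fire"],
    symbolic=["freedom", "heroine", "Joan of Arc"],
)
```

A photograph without a caption simply leaves the linguistic field empty, in which case the denotative level reduces to the literal message.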

Barthes distinguishes the photograph from the press photograph by claiming that a press photo is never without a written commentary (Barthes, 1961), i.e. the structure of the press photograph is not an isolated structure; it is in communication with at least one other structure, namely the text. In the text the substance of the message is made up of words, while in the photo it is made up of lines, surfaces, shades etc. It is only when the study of each structure has been exhausted that it will be possible to understand the manner in which they complement one another. Of the two structures, one is already familiar, that of language, while almost nothing is known about the other, that of the photograph. What does the photo transmit? The scene itself, Barthes claims, although he has already stated that the information in the press photo is carried by two different structures. To the naked eye the photo is (not the reality but) an analogical reproduction of reality; however, a second meaning or a supplementary message is usually developed when we look at the photo. Barthes refers to this as the “culture” of the society receiving the message. Its signs are gestures, attitudes, expressions, colors, or effects, endowed with certain meanings by virtue of the practice of this “culture”. The photographic “copy” is taken as the pure and simple denotation of reality. To find the code of connotation could be to isolate elements of the photograph from the reader’s cultural situation, but this task will take us a long way indeed. Another way is to let the reading depend on the reader’s culture and knowledge of the world. Actually a good press photo makes ready play with the supposed knowledge of its readers.

Panofsky’s three levels indicate a difference in presupposed knowledge, i.e. nothing (or only practical experience), special knowledge about image codes and literary sources, and special knowledge about the history of ideas, i.e. the culture. Barthes also employs levels in the reading of a picture. The first level does not demand any specific learning about image codes or other codes. To grasp this first level in the image we only need the knowledge attached to our perception or, to use Panofsky’s words, our familiarity with objects and events. To read the content of the image it is necessary to learn the semantic picture codes or, as Panofsky puts it, to have knowledge about image codes. According to Barthes that is closely attached to the culture from where the image is seen, and according to Panofsky one needs to know about tradition and the history of ideas to be able to read the image. Although Panofsky bases his methodology on a study of Renaissance paintings and Barthes uses advertising photographs, we claim that the two methodologies can be connected to a certain extent.