Modality Based Working Memory

James Sulzen

School of Education
Stanford University

April 1, 2001

Abstract

This study tested the hypothesis that working memory is primarily organized by modality. A free recall task was performed by presenting randomized stimulus sequences in seven presentation modalities: visual (V), auditory (A), haptic (H), kinesthetic (K), linguistic-auditory (LA), linguistic-visual (LV), and spatial-auditory (SA). The same number of stimuli was presented in each modality on any given trial run. Results showed that recall was linearly dependent upon the number of items presented in each modality up to a limit of about three items per modality, and leveled off thereafter. Recency and primacy effects indicated that at least several of the modality recall sequences operated with differing underlying processes, lending further support to the independent-modalities working memory hypothesis.

Introduction

"The study of models of memory often seems like a backwater in the overall study of memory. Models do not have a prominent place in experimental studies of memory and they are not used or examined by most researchers in the field... Recent development of models of long-term memory has proceeded relatively independently of other areas of memory research." (Ratcliff & McKoon, 2000, p. 571)

Studies of human short-term and working memory have a long and rich history (Ebbinghaus, 1885; James, 1890; Miller, 1956; and for surveys: Crowder, 1993; Bower, 2000; Baddeley, 2000). A number of models of human memory and working memory have been proposed and tested over time, especially those involving verbal or visual elements. There have also been a number of studies demonstrating various modal forms of short-term memory (STM), such as haptic and olfactory capacities (Schurman, 1973; White, 1997). Baddeley and Hitch's (1974) classic model of working memory, combining a visuo-spatial, phonological, and executive control system, was an initial attempt to articulate perceived modality-related sub-components of working memory. Since then, it has come to seem reasonable to suppose that working memory is in fact fractionated among a number of modular systems, as evidence accumulates for the existence of more and more distinct components (Weiskrantz, 1987; Baddeley, 2000). Recently, fMRI evidence has started to accumulate for a neurological basis for the phonological loop (Paulesu, Frith, & Frackowiak, 1993; Awh et al., 1996) and even for a modal basis of representing categories of objects such as living things (Thompson-Schill et al., 1999).

In addition to the mounting evidence that both working memory and perhaps long-term memory (LTM) are organized along modal lines, there is strong evidence that the modal systems interact strongly with each other. In the Thompson-Schill et al. (1999) study, visual centers appear to be activated whenever a subject is asked to think about any aspect of a living thing (even the parts or the food of living things, e.g., "are snails edible?"). This is taken to indicate that the category of living things has a primary visual element which seems principally responsible for triggering other modalities, and brain damage to a modal visual area might therefore well impair retrieval of the associated memories in the other modalities. Cross-modal priming is a fairly clear example of such interaction. McKone and Dennis (2000) found that auditory or visual stimuli acted to prime stimuli in the other modality. Perhaps of more interest for the present purposes, they found that same-modality priming has a greater effect than cross-modality priming, and that visual and auditory priming of non-words differ (auditory performs better). McKone interprets these results as indicating a perceptual locus for priming, with some form of weak re-encoding occurring to effect the cross-modal priming.

There is also evidence for non-sensory-based, but still modal, storage. Penney (1989) reviewed the literature on auditory and visual modality effects and concluded that auditorily and visually presented words are re-encoded in a phonological store accessible from either, and that the auditory and visual channels represent two separate processing streams. Her argument rests on five points:

1)  Improved ability to perform two concurrent verbal tasks when different input modalities are employed relative to the single-mode situation;

2)  Improved memory when different items are presented to two sensory modalities rather than one;

3)  Selective interference effects within, but less so across, modalities;

4)  Subjects' preference for, and greater efficiency of, recall organized by modality rather than by time of presentation; and

5)  The presence of short-term memory deficits that appear to be specific to the auditory or visual modalities.

Additionally, Penney showed that bilingual speakers prefer to organize recall tasks by modality of presentation, as opposed to organizing recall by language of presentation, time of presentation, or category of item.

Another bilingual study (Dehaene et al., 1999) showed that precise arithmetic calculations are carried out in one’s native language (i.e., the language in which arithmetic was presumably learned), whereas approximate arithmetic calculations are carried out via visual and spatial means. This finding, in conjunction with the concept of the independent phonological store, implies that language, or rather a linguistic capability, exists independently of any of the standard modalities.

On an informal but perhaps intuitively satisfying basis, as far back as 1890 William James provided an elegant example of cross-modal encoding of knowledge. Holding the lips open prior to thinking of any word with labials or dentals, such as "bubble" or "toddle", distinctly affects most people's recall process. (“Is your image under these conditions distinct? To most people the image is at first ‘thick’ as the sound would be if they tried to pronounce it with lips parted.” James, 1890, p. 63). This would seem to be an example of interference with a cross-modal retrieval spanning at least the haptic (touch), kinesthetic (sensorimotor), visual, and verbal systems.

Given the evidence for some sort of modularized sub-specialization of working memory, much of which certainly seems to organize along modal lines, it seems reasonable to suppose that each modal sensory system may have its own working memory component. Goldstone and Barsalou (1998) have argued that there are many reasons to believe that much of cognition is perceptually based and proceeds via perceptual representation processes. They argue along the following lines:

1)  Many, if not all, of the properties associated with amodal symbol systems (such as productivity) can be achieved with perceptually based systems;

2)  Raw perceptual processing is often much more powerful for certain tasks than an equivalent amodal system;

3)  Perception naturally supports similarity;

4)  Perception can be readily tuned to conceptual demands;

5)  Perceptual simulation occurs even in conceptual tasks that have no explicit perceptual demands (for example, Maxwell’s imagining microscopic spinning spheres in dielectrics when developing his electrodynamics equations (Nersessian, unpublished), or Einstein utilizing his visualizations of space-time when developing relativity).

Countering these claims and conjectures are theories of episodic, semantic, and other memory organization (Baddeley, 2000). There is also strong evidence that people can organize working memory around categories; that is, structuring items by category seems in effect to create something of a “separate” short-term memory for each category, leading to a twofold to threefold improvement in working memory capacity (Watkins & Peynircioglu, 1983; Bower et al., 1969). These category effects even show recency and primacy effects. We will address these issues of categorization and non-modal organization in the discussion section.

Modality-Organized Cognition

The evidence for multiple, modality-related working memory components leads to a supposition that perhaps each modality has its own working memory and some level of cognitive processing capability. If each modality has its own working memory and processing capability, then why not its own long term memory and its own deeper cognitive processing capability? Following these conjectures to some sort of logical conclusion leads to a possible memory and cognitive functional organization as illustrated by Figure 1.

Figure 1 illustrates that a certain number of modal units interact with each other to create the experience of cognition. Some of these are “first-level” modal processing loci, each directly connected to its own sensory system via the sensory registers. There are also a number of “second-level” modal loci, each with its own specialization. In this model, every modal locus (hereafter referred to as a modality) is connected to, and capable of stimulating or receiving stimuli from, every other modality. This interaction probably operates through or in conjunction with the type of centralized switching network referred to as a “central executive” (Baddeley & Hitch, 1974). The second-level loci have no direct connection to external sensory registers and so must receive their sensory inputs only by first-level restimulation.

The set of modalities represented here was selected because experimental evidence indicated a functional nexus for each and because they seem to represent a minimal set that spans many cognitive phenomena. There may also be “tertiary” or other modalities serving to organize social cognition, personality, or other functions, but the above model does not address such possibilities. The model provides an organizing framework for representing relatively low levels of cognition involving perception and knowledge representation.

Figure 1 – Modality-organized cognition

The rest of this writing will use the single-letter abbreviations listed in Figure 1 to identify each of the modal systems. When it is necessary or useful to distinguish which first level modality is interacting with a given second level one, the two letters are combined, so “LV” means a visually presented linguistic item, while “SA” means an auditorily presented spatial stimulus.

Figure 1 should be interpreted in light of the following:

-  Representational Systems: Each modality should be thought of as a “representational system” that represents processes, knowledge, perceptions, and sensory experience in its own particular way. V represents knowledge in pictures and images, A in sounds, and so on. K is the kinesthetic sensorimotor system. L is a pure linguistic system that represents knowledge and does its processing in terms of sequenced and syntactically ordered symbols. S is a system that represents spatial knowledge and performs spatial processing. E controls our affective memories and processing. The other modalities should be self-explanatory.

-  Completeness of each modality: In this model, each modality is a complete cognitive processing system with its own working memory, long-term memory, and processing capabilities. The type and manner of internal organization is probably very specific to the given modality (e.g., S is probably organized very differently from E or from V). This helps explain some of the modality differences observed in the literature, such as the slight superiority of recall for auditorily presented words over visually presented ones in free recall tasks.

-  Cross-stimulation and multi-modal representations: Each modal system constantly stimulates every other system with its outputs, including stimulating itself with its own outputs (i.e., feedback). This cross-stimulation probably provides a capacity for feedback loops and re-encoding of stimuli, as well as for higher-level organizations of cognition.
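The architecture described above can be caricatured in a few lines of code: independent modal loci, each with its own limited-capacity working memory, cross-stimulating one another through a central switching network. This is only an illustrative sketch; the class names, the fixed three-item capacity, and the "re-encoded" traces are assumptions of the sketch, not part of the model's specification.

```python
# Illustrative sketch of the modality-organized model (names and
# capacities are assumptions of this sketch, not the paper's claims).

class Modality:
    """A modal locus with its own working memory and simple processing."""
    def __init__(self, name, capacity=3):      # ~3-item limit per modality
        self.name = name
        self.capacity = capacity
        self.working_memory = []               # most recent items first

    def perceive(self, item):
        """Store an item, evicting the oldest once capacity is exceeded."""
        self.working_memory.insert(0, item)
        del self.working_memory[self.capacity:]

class CentralExecutive:
    """Switching network: routes each modality's output to every other."""
    def __init__(self, modalities):
        self.modalities = {m.name: m for m in modalities}

    def present(self, modality_name, item):
        source = self.modalities[modality_name]
        source.perceive(item)
        # Cross-stimulation: every other modality receives a
        # weakly re-encoded trace of the same stimulus.
        for m in self.modalities.values():
            if m is not source:
                m.perceive(("re-encoded", modality_name, item))

# Build first-level and second-level loci and present four auditory items.
executive = CentralExecutive([Modality(n) for n in ("V", "A", "H", "K", "L", "S")])
for word in ("bubble", "toddle", "snail", "sphere"):
    executive.present("A", word)

# The auditory store holds only the three most recent items, while the
# visual store holds cross-modal re-encodings of those same items.
print(executive.modalities["A"].working_memory)
print(executive.modalities["V"].working_memory[0])
```

The point of the sketch is structural: no single monolithic store exists; recall capacity is bounded per modality (matching the roughly three-item limit reported in the abstract), and every presentation leaves traces in other modalities via the executive's routing.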

The question arises as to how these separate systems combine or interact, and why it is not more obvious that such separate systems exist. Following evidence from Thompson-Schill et al. (1999), it may often require several cross-stimulating modal systems to meaningfully represent concepts and various sorts of knowledge. Consider the category of 'living things', which, according to their data, appears to have a necessary visual component, but which also has elements in other modalities to define its representation. If the visual portion of the ‘living things’ representation were impaired, via a lesion for example, then the other elements that make up the representation would still be intact but incapable of being stimulated. The person would therefore lose knowledge of what a 'living thing' is, even though most of the knowledge remains available (and indeed may be accessible via other cue paths). The concept of ‘living things’ cannot be kicked into gear because the necessary visual element is missing from the stimulus chain. In a similar vein, James’ (1890) example with ‘bubble’ and ‘toddle’ could be understood as indicating that the meaning or knowledge of these words is encoded across the L, H, K, and V modalities, and that interfering with one modality (K, when the lips are parted) interferes with the retrieval process so that the associated retrieved V image is changed.

As for why it is not more obvious that these hypothesized internal systems have a distinct existence, the explanation might be that the extent of their interactivity makes the whole seem like a monolithic entity, making it tremendously difficult to discern the individual elements. Consider, as an analogy, observers with no knowledge of machinery trying to discern the internal structure of an automobile by examining only its external appearance, and perhaps driving it only in very limited and controlled circumstances. With neither the concepts nor useful tools for investigating internal combustion engines, they would have little chance of deducing the internal electrical, carburetion, fuel, cooling, exhaust, and other systems (although they might be able to deduce the existence of some systems, such as steering and brakes, that have relatively easily observed external correlates). Similarly with the human cognitive system: there is a tendency to regard memory as one large undifferentiated system with perhaps some salient subsystems such as visual, auditory, or spatial processing.

The rest of this paper can be found at:

http://ldt.stanford.edu/~jsulzen/psy264/Modality-expt-paper2.doc

References

Baddeley, Alan D., & Hitch, Graham (1993). The recency effect: Implicit learning with explicit retrieval? Memory and Cognition, 21(2), 146-155.

Baddeley, Alan (2000). Short-term and working memory. In E. Tulving & F. I. M. Craik (Eds.), The Oxford Handbook of Memory. New York: Oxford University Press.

Bower, Gordon H., Clark, Michael C., Lesgold, Alan M., & Winzenz, David (1969). Hierarchical retrieval schemes in recall of categorized word lists. Journal of Verbal Learning & Verbal Behavior, 8(3), 323-343.

Bower, Gordon H. (2000). A brief history of memory research. In E. Tulving & F. I. M. Craik (Eds.), The Oxford Handbook of Memory. New York: Oxford University Press.

Cofer, C. N., Bruce, D. R., & Reicher, G. M. (1966). Clustering in free recall as a function of certain methodological variables. Journal of Experimental Psychology, 71, 858-866.