Laborious Object Recognition

Dallenbach, K. M. (1951). A puzzle-picture with a new principle of concealment. American Journal of Psychology, 64, 181-191.

Gray, C. M., Koenig, P., Engel, A. K., & Singer, W. (1989). Oscillatory responses in cat visual cortex exhibit inter-columnar synchronization which reflects global stimulus properties. Nature, 338, 334-337.

Hebb, D. O. (1949). The organization of behavior. Wiley.

Hubel, D. H., & Wiesel, T. N. (1968). Receptive fields and functional architecture of monkey striate cortex. Journal of Physiology, 195, 215-243.

Hummel, J. E., & Biederman, I. (1992). Dynamic binding in a neural network for shape recognition. Psychological Review, 99, 480-517.

Kolers, P. A., & Roediger, H. L. (1984). Procedures of mind. Journal of Verbal Learning and Verbal Behavior, 23, 425-449.

Kreiter, A. K., & Singer, W. (1996). Stimulus-dependent synchronization of neuronal responses in the visual cortex of awake macaque monkey. Journal of Neuroscience, 16, 2381-2396.

Logothetis, N. K., Pauls, J., & Poggio, T. (1995). Shape representation in the inferior temporal cortex of monkeys. Current Biology, 5, 552-563.

McClelland, J. L., & Rumelhart, D. E. (1981). An interactive activation model of context effects in letter perception: Part 1. An account of basic findings. Psychological Review, 88, 375-407.

Moran, J., & Desimone, R. (1985). Selective attention gates visual processing in the extrastriate cortex. Science, 229, 782-784.

Perrett, D. I., Smith, P. A. J., Potter, D. D., Mistlin, A. J., Head, A. D., & Jeeves, M. A. (1984). Neurones responsive to faces in the temporal cortex: Studies of functional organization, sensitivity to identity and relation to perception. Human Neurobiology, 3, 197-208.

Rodriguez, E., George, N., Lachaux, J.-P., Martinerie, J., Renault, B., & Varela, F. J. (1999). Perception's shadow: Long-distance synchronization of human brain activity. Nature, 397, 430-433.

Rzempoluck, E. J. (1998). EEG changes index camouflaged object identification: A pilot study. Biological Psychology, 47, 181-191.

Singer, W. (1995). Development and plasticity of cortical processing architectures. Science, 270, 758-764.

Tovee, M. J., Rolls, E. T., & Ramachandran, V. S. (1996). Rapid visual learning in neurones of the primate temporal visual cortex. Neuroreport, 7, 2757-2760.

Yu, K., & Blake, R. (1992). Do recognizable figures enjoy an advantage in binocular rivalry? Journal of Experimental Psychology: Human Perception and Performance, 18, 1158-1173.

Dolan, R. J., Fink, G. R., Rolls, E., Booth, M., Holmes, A., Frackowiak, R. S. J., & Friston, K. J. (1997). How the brain learns to see objects and faces in an impoverished context. Nature, 389, 596-599.

Perception is commonly divided into "bottom-up" and "top-down" processes. Bottom-up processes begin with low-level perceptual features derived from a stimulus and combine them into larger and larger units until a coherent perceptual interpretation of an entire scene is constructed. Via top-down processes, an observer's expectations, knowledge, and experience influence how the individual elements of a scene are interpreted. These two types of processes are not mutually exclusive, and there are formal models that account for how top-down and bottom-up processing can simultaneously influence each other (McClelland & Rumelhart, 1981). Typically, expectations and stimulus information jointly determine the perceptual interpretation given to an object. Still, one striking phenomenon that demonstrates a contribution of experience-driven expectations to object perception is the subjective difference between perceiving a degraded image of an object before and after the true interpretation of the object has been revealed.

As originally described by Dallenbach (1951), when observers are shown degraded images such as Figure X, they frequently cannot determine the object being represented, even though the object comprises the major part of the image and is depicted in a canonical perspective. When the object is pointed out to an observer, the observer frequently has an "Aha" reaction in which the degraded image is readily interpreted. Once interpreted, it is difficult for the observer to return to their naive state of seeing the image as a set of unorganized blotches. This phenomenon suggests a powerful role of experience-driven expectations because the physical information contained in a degraded image is the same before and after its interpretation has been revealed (pre- and post-revelation). The subjective difference in perception of the degraded image comes from perceptual learning that requires only a single presentation of the original, undegraded image.

The subjectively different perceptual experiences associated with pre- and post-revelation degraded images are reliably associated with differences in brain activity. Magnetic resonance imaging (MRI) has revealed that post-revelation images produce higher activity in parietal and inferior temporal regions than do pre-revelation images (Dolan et al., 1997). The inferior temporal area is known to be associated with object recognition (Moran & Desimone, 1985), particularly the recognition of familiar objects (Logothetis, Pauls, & Poggio, 1995). This MRI evidence is consistent with single-cell recordings of neurons in the inferior temporal region of macaque monkeys. Degraded face images produced higher neuron firing rates when they were presented after the original, undegraded face images had been revealed than when they were presented before revelation (Tovee, Rolls, & Ramachandran, 1996). Finally, there is evidence from electroencephalogram (EEG) recordings in humans that one difference in brain activity between images that are coherently interpreted and those that are not is that the former produce more synchronized neural activity at 34-40 Hz (in the gamma frequency range). Rodriguez et al. (1999) showed their participants degraded images of faces, and separately analyzed those trials where participants did and did not perceive faces. Participants who interpreted upright degraded faces as depicting faces showed greater synchronized neural activity between left parieto-occipital and frontotemporal regions at 250 milliseconds after the onset of the stimulus than did participants who did not interpret inverted degraded faces as depicting faces.

These explorations of neural activity suggest two accounts of what occurs when an image is given a meaningful interpretation. First, detectors in a particular region may signal the interpretation of an object. Such an account is consistent with work suggesting the existence of neurons that are selectively activated not only by simple stimulus features such as lines moving at a particular orientation, but also by complex stimulus configurations such as hands or faces (Hubel & Wiesel, 1968). The specificity of cells in the inferotemporal regions is at least partially learned, given that it is especially pronounced for familiar faces (Perrett et al., 1984). Second, a coherent interpretation of an image may be the result of binding together neural activity caused by parts of the image that come from the same object. By this account, coherent objects are represented by dynamically forming assemblies of neurons (Hebb, 1949). One of the main candidate mechanisms for "labeling" neural activity that is to be bound together is the synchronization of electrical discharges between neurons within an assembly (Gray et al., 1989; Singer, 1995). The strength of response synchronization between neurons reflects perceptual constraints such as the Gestalt laws of organization, including continuity, proximity, similarity, collinearity, and common fate (Kreiter & Singer, 1996; Singer, 1995). However, given the results of Rodriguez et al. (1999), top-down interpretability, as well as bottom-up stimulus properties, may determine the synchronization of neural activity. One advantage of representing objects by synchronized neural activity rather than by the firing rate of individual neurons is that a complex scene can be decomposed into several objects, with the neurons responding to different parts firing with different phases (Hummel & Biederman, 1992).
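To make the notion of synchronization concrete, the following is a minimal sketch of one common index of phase synchrony, the phase-locking value: the length of the average unit vector of the phase differences between two sites, which is near 1 when the phase relation is stable and near 0 when it is not. It is offered only as an illustration and is not claimed to be the analysis used in any of the studies cited above. The function names, the simulated phases, the trial count, the lag, and the jitter are all hypothetical.

```python
import cmath
import math
import random


def phase_locking_value(phases_a, phases_b):
    """Length (0 to 1) of the mean unit vector of the phase differences."""
    if len(phases_a) != len(phases_b) or not phases_a:
        raise ValueError("phase series must be non-empty and equally long")
    total = sum(cmath.exp(1j * (a - b)) for a, b in zip(phases_a, phases_b))
    return abs(total) / len(phases_a)


def simulate_locked_phases(n_trials, lag, jitter_sd, rng):
    """Hypothetical per-trial phases (radians) at two sites with a stable lag."""
    site_a = [rng.uniform(-math.pi, math.pi) for _ in range(n_trials)]
    # Site B follows site A at a fixed lag, plus a little Gaussian jitter.
    site_b = [a - lag + rng.gauss(0.0, jitter_sd) for a in site_a]
    return site_a, site_b


rng = random.Random(0)  # fixed seed so the illustration is repeatable

# Assumed "coherent percept" case: a stable phase relation across trials.
locked_a, locked_b = simulate_locked_phases(200, math.pi / 4, 0.2, rng)

# Assumed "no percept" case: the two sites have unrelated phases.
free_a = [rng.uniform(-math.pi, math.pi) for _ in range(200)]
free_b = [rng.uniform(-math.pi, math.pi) for _ in range(200)]

plv_locked = phase_locking_value(locked_a, locked_b)
plv_free = phase_locking_value(free_a, free_b)
print(f"stable phase relation: {plv_locked:.2f}")
print(f"unrelated phases:      {plv_free:.2f}")
```

Note that the index depends only on the consistency of the phase difference, not on its size, so two assemblies could in principle be internally synchronized while firing at different phases from one another.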

Ascending from neural-level considerations to a behavioral analysis, there are four revealing properties associated with changing the subjective perception of a degraded image by previously revealing its original form. First, once revealed, the correct interpretation of a degraded image persists even if the image is presented only after a delay (see our Experiment X). Pilot work in our laboratory suggests that even after delays of two months, there is a strong influence of revelation on degraded image interpretation. Second, once the degraded image has been revealed, it is hard to look at the degraded image and not interpret it, or to give it an alternative interpretation. Although alternative interpretations are frequent pre-revelation, they seem to be inhibited by the proper interpretation. Third, providing an interpretation of an image by verbally presenting its category (e.g., saying "look for a cow") is much less effective in changing subjective organization than is showing either the original version of the image or a simplified drawing of it (Dallenbach, 1951). Fourth, presenting the original version of an image facilitates interpretation of the degraded image much more if the two are presented at roughly the same time (see our Experiment X). The original picture is much more effective if it is actively used to interpret the degraded picture, rather than simply being a passive prime. Together, these properties suggest that exposure to an original picture acts to prime the procedure of segmenting an image into objects and background. A hallmark of a strong "aha" effect (i.e., a large difference between pre- and post-revelation subjective experience) is that figure-ground segmentation cues in the image conflict with the actual segmentations required to correctly interpret the object. For example, in Figure X....
Simultaneous exposure to an original and degraded image allows people to tune their figure-ground segmentation processes to create the correct segmentation of the degraded picture. By stressing procedural priming, rather than semantic, strategic, or episodic priming, we are claiming that the large difference between pre- and post-revelation perception of degraded images stems from altering the segmentation routines that take relatively unprocessed inputs and produce structured figure/ground organizations.

Open questions: when/how early? Is the pathway "greased" just as it is when an object becomes familiar? That is, is a post-revelation object just like an object that has been presented many times before?

Yu and Blake (1992): some observers are shown what the Dalmatian picture really depicts (revelation). Result: the dog picture predominates more even if it is not revealed.

Information about structural configuration is registered early. Object superiority effect: Pomerantz et al.; Weisstein, N., & Harris, C. S. (1974). Visual detection of line segments: An object-superiority effect. Science, 186, 752-755.

Degraded object perception as a model of agnosia. Like agnosics, people can see the degraded object fine, and could reconstruct it quite well. They just can't combine the parts to create a coherent interpretation, which is exactly what apperceptive agnosics complain of.

ACCESSION NUMBER: 1996-09110-003
DOCUMENT TYPE: Journal Article
TITLE: Identification of fragmented pictures under ascending versus fixed presentation in young and elderly adults: Evidence for the inhibition-deficit hypothesis.
AUTHOR: Lindfield, Kimberly C.; Wingfield, Arthur; Bowles, Nancy L.
SOURCE: Aging and Cognition. 1994 Dec; Vol 1(4): 282-291.
ISSN: 0928-9917
PUBLICATION YEAR: 1994
ABSTRACT: Hypothesized that (1) older adults have deficient inhibitory processes and (2) poorer performance in ascending than in fixed presentations of fragmented stimuli is due to residual activation interference. 24 60-86 yr old volunteers and 24 17-22 yr old college students were tested for the ability to identify degraded pictures that were presented using either an ascending (AC) or a fixed (FC) condition. In the AC, Ss identified the pictures at each level of increasing completeness until correct identification was achieved. In the FC, Ss identified degraded pictures that were presented once at an intermediate level of visual completeness. An ANOVA confirmed that accuracy was higher in the FC than the AC, and that the main effect of age was not significant. When Ss were equated on a pretest for performance on the AC, a marginal trend was found for the elderly Ss only. Additional evidence for reduced inhibitory processes in older Ss was seen in the Ss' correct response latencies. Results are

TITLE: Perceptual/sensory information versus performance level as indicators of competitive activation in an object identification task: Evidence from aging.
AUTHOR: Lindfield, Kimberly C.; Wingfield, Arthur
SOURCE: Brain and Cognition. 1998 Jun; Vol 37(1): 24-27.
ISSN: 0278-2626
PUBLICATION YEAR: 1998
ABSTRACT: Young and elderly adults were tested for the ability to identify degraded pictures presented either in a series of incremental steps with each step increasing the completeness of the visual information (ascending condition) or in one single exposure (fixed condition). The probability of correct identification in the fixed condition was better than the ascending condition once the amount of visual information shown reached a certain level of completeness. This was the case for both age groups tested even though the performance of older adults was lower than young adults. Findings are consistent with the competitive

TITLE: Cortical dynamics of three-dimensional figure-ground perception of two-dimensional pictures.
AUTHOR: Grossberg, Stephen
SOURCE: Psychological Review. 1997 Jul; Vol 104(3): 618-658.
ISSN: 0033-295X
PUBLICATION YEAR: 1997
ABSTRACT: Develops the FACADE theory of 3-dimensional (3-D) vision and figure-ground separation to explain data concerning how 2-dimensional pictures give rise to 3-D percepts of occluding and occluded objects, and how geometrical and contrastive properties of a picture cooperate or compete when forming the boundaries and surface representations that subserve conscious percepts. Spatially long-range cooperation and spatially short-range competition work together to separate the boundaries of occluding figures from their occluded neighbors, and this process is sensitive to image T junctions at which occluded figures contact occluding figures. These boundaries control the filling-in of color within multiple depth-sensitive surface representations. Feedback between surface and boundary representations strengthens consistent boundaries while inhibiting inconsistent ones. Both the boundary and the surface representations of occluded objects may be amodally completed, while the surface representations of unoccluded objects become visible through modal completion. Functional roles for conscious modal and amodal representations in object recognition, spatial attention, and reaching behaviors are discussed. Model interactions are

TITLE: Do recognizable figures enjoy an advantage in binocular rivalry?
AUTHOR: Yu, Karen; Blake, Randolph
SOURCE: Journal of Experimental Psychology: Human Perception and Performance. 1992 Nov; Vol 18(4): 1158-1173.
ISSN: 0096-1523
PUBLICATION YEAR: 1992
ABSTRACT: Five experiments examined whether recognizable stimuli predominate in binocular rivalry. It was found that a face predominated more than did a pattern equated for spatial frequency, luminance, and contrast; an objective reaction time (RT) procedure confirmed predominance of the face. The face was still liable to fragmentation as stimulus size increased. Observers tracked exclusive dominance of a picture of a camouflaged figure (a Dalmatian dog) prior to and then following discovery of the figure's presence; control observers received the same protocol with a scrambled version of the dog stimulus. Compared with control results, predominance of the dog picture was higher even before observers knew of the camouflaged figure. Inversion of the dog figure reduced its predominance. Binocular rivalry is sensitive to

TITLE: Recognition of computer-generated pictures on monochrome monitors.
AUTHOR: Baker, Patti R.; Belland, John C.; Cambre, Marjorie A.
SOURCE: Journal of Computer-Based Instruction. 1985 Fal; Vol 12(4): 104-107.
ISSN: 0098-597X
PUBLICATION YEAR: 1985
ABSTRACT: Examined whether 64 2nd-4th graders could recognize computer-generated pictures on monochrome monitors. Ss were randomly assigned to 1 of 2 conditions. Ss in the 1st treatment were asked to identify on a monochrome monitor a figure that was initially presented in its original form and then as a redesigned, more distinguishable figure. The redesigned figure had greater figure-ground contrast because color substitutions were made that used pixel patterns to provide contrast in the monochromatic display. The order of picture presentation was reversed for Ss in the 2nd treatment. Ss also completed the Children's Embedded Figures Test to assess their field independence-dependence. Results indicate that regardless of grade or field independence-dependence characteristics, Ss were unable to discern critical features of a color graphic displayed on a monochromatic monitor unless it was designed to enhance figure-ground separation. Implications for

TITLE: Evoked potential correlates of figure and ground.
AUTHOR: Landis, Theodor; Lehmann, D.; Mita, T.; Skrandies, W.
SOURCE: International Journal of Psychophysiology. 1984 Jun; Vol 1(4): 345-348.
ISSN: 0167-8760
PUBLICATION YEAR: 1984
ABSTRACT: Brain potentials averaged during the viewing of an alternating positive and negative "hidden man" puzzle picture were averaged from 8 Ss before and after they learned to recognize the figure. After vs before recognition, there was significantly more evoked positivity at 64/96 msec latency and more negativity at 224/256 msec and 352-480 msec latency over parietal areas during the viewing of the positive picture (recognizable as a face). It is hypothesized that separate physiological changes might reflect learned meaningfulness of the figure (which entails increased attention) and figure

TITLE: Is visual image segmentation a bottom-up or an interactive process?
AUTHOR: Vecera, Shaun P.; Farah, Martha J.
SOURCE: Perception and Psychophysics. 1997 Nov; Vol 59(8): 1280-1296.
ISSN: 0031-5117
PUBLICATION YEAR: 1997
ABSTRACT: Visual image segmentation is the process by which the visual system groups features that are part of a single shape. In Exps 1 and 2, Ss were presented with two overlapping shapes and were asked to determine whether two probed locations were on the same shape or on different shapes. The availability of top-down support was manipulated by presenting either upright or rotated letters. Ss were fastest to respond when the shapes corresponded to familiar shapes--the upright letters. In Exp 3, a variant of this segmentation task was used to rule out the possibility that Ss performed same/different judgments after segmentation and recognition of both letters. Exp 4 ruled out the possibility that the advantage for upright letters was merely due to faster recognition of upright letters relative to rotated letters. Results suggest that the previous effects were not due to faster recognition of upright letters; stimulus familiarity influenced segmentation. The results are discussed in terms of an