Laborious Object Recognition
Dallenbach, K. M. (1951). A puzzle-picture with a new principle of concealment. American Journal of Psychology, 64, 181-191.
Gray, C. M., Koenig, P., Engel, A. K., & Singer, W. (1989). Oscillatory responses in cat visual cortex exhibit inter-columnar synchronization which reflects global stimulus properties. Nature, 338, 334-337.
Hebb, D. O. (1949). The organization of behavior. Wiley.
Hubel, D. H., & Wiesel, T. N. (1968). Receptive fields and functional architecture of monkey striate cortex. Journal of Physiology, 195, 215-243.
Hummel, J. E., & Biederman, I. (1992). Dynamic binding in a neural network for shape recognition. Psychological Review, 99, 480-517.
Kolers, P. A., & Roediger, H. L. (1984). Procedures of Mind. Journal of Verbal Learning and Verbal Behavior, 23, 425-449.
Kreiter, A. K., & Singer, W. (1996). Stimulus-dependent synchronization of neuronal responses in the visual cortex of awake macaque monkey. Journal of Neuroscience, 16, 2381-2396.
Logothetis, N. K., Pauls, J., & Poggio, T. (1995). Shape representation in the inferior temporal cortex of monkeys. Current Biology, 5, 552-563.
McClelland, J. L., & Rumelhart, D.E. (1981). An interactive activation model of context effects in letter perception: Part 1. An account of basic findings. Psychological Review, 88, 375-407.
Moran, J., & Desimone, R. (1985). Selective attention gates visual processing in the extrastriate cortex. Science, 229, 782-784.
Perrett, D. I., Smith, P. A. J., Potter, D. D., Mistlin, A. J., Head, A. D., & Jeeves, M. A. (1984). Neurones responsive to faces in the temporal cortex: Studies of functional organization, sensitivity to identity and relation to perception. Human Neurobiology, 3, 197-208.
Rodriguez, E., George, N., Lachaux, J.-P., Martinerie, J., Renault, B., & Varela, F. J. (1999). Perception's shadow: Long-distance synchronization of human brain activity. Nature, 397, 430-433.
Rzempoluck, E. J. (1998). EEG changes index camouflaged object identification: A pilot study. Biological Psychology, 47, 181-191.
Singer, W. (1995). Development and plasticity of cortical processing architectures. Science, 270, 758-764.
Tovee, M. J., Rolls, E. T., & Ramachandran, V. S. (1996). Rapid visual learning in neurones of the primate temporal visual cortex. Neuroreport, 7, 2757-2760.
Yu, K., & Blake, R. (1992). Do recognizable figures enjoy an advantage in binocular rivalry? Journal of Experimental Psychology: Human Perception and Performance, 18, 1158-1173.
Dolan, R. J., Fink, G. R., Rolls, E., Booth, M., Holmes, A., Frackowiak, R. S. J., & Friston, K. J. (1997). How the brain learns to see objects and faces in an impoverished context. Nature, 389, 596-599.
Perception is commonly divided into "bottom-up" and "top-down" processes. Bottom-up processes begin with low-level perceptual features derived from a stimulus and compose them into larger and larger units until a coherent perceptual interpretation of an entire scene is constructed. Via top-down processes, an observer's expectations, knowledge, and experience influence how the individual elements of a scene are interpreted. These two types of processes are not mutually exclusive, and there are formal models that account for how top-down and bottom-up processing can simultaneously influence each other (McClelland & Rumelhart, 1981). Typically, expectations and stimulus information will mutually determine the perceptual interpretation given to an object. Still, one striking phenomenon that demonstrates a contribution of experience-driven expectations to object perception is the subjective difference between perceiving a degraded image of an object before and after the true interpretation of the object has been revealed.
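The mutual influence of stimulus information and expectation described above can be illustrated with a toy network. The sketch below is only in the spirit of interactive activation; it is not the McClelland and Rumelhart (1981) model. The function names, the update rule, the weights, and all parameter values are hypothetical choices made for illustration. Feature units receive stimulus evidence plus feedback from object units, and object units receive support from their features, inhibition from rivals, and an optional expectation term standing in for prior exposure to the original image.

```python
def clamp(value):
    """Keep an activation inside [0, 1]."""
    return max(0.0, min(1.0, value))


def settle(evidence, weights, expectation, steps=100, rate=0.2):
    """Iterate a toy two-layer network (hypothetical, for illustration only).

    evidence[i]    : bottom-up stimulus support for feature i
    weights[j][i]  : how strongly feature i and object j support each other
    expectation[j] : top-down bias toward object j (e.g., after revelation)
    """
    n_features, n_objects = len(evidence), len(weights)
    features = [0.0] * n_features
    objects = [0.0] * n_objects
    for _ in range(steps):
        new_features = []
        for i in range(n_features):
            # Top-down: active object units feed back onto their own features.
            feedback = sum(weights[j][i] * objects[j] for j in range(n_objects))
            target = evidence[i] + feedback
            new_features.append(clamp(features[i] + rate * (target - features[i])))
        new_objects = []
        for j in range(n_objects):
            # Bottom-up: features support objects; rival objects inhibit each other.
            support = sum(weights[j][i] * features[i] for i in range(n_features))
            rivals = sum(objects[k] for k in range(n_objects) if k != j)
            target = support + expectation[j] - rivals
            new_objects.append(clamp(objects[j] + rate * (target - objects[j])))
        features, objects = new_features, new_objects
    return features, objects


# Assumed stimulus: weak evidence that supports two interpretations equally.
evidence = [0.3, 0.3, 0.3, 0.3]
weights = [
    [0.5, 0.5, 0.0, 0.0],  # interpretation A uses features 0 and 1
    [0.0, 0.0, 0.5, 0.5],  # interpretation B uses features 2 and 3
]
pre_features, pre_objects = settle(evidence, weights, expectation=[0.0, 0.0])
post_features, post_objects = settle(evidence, weights, expectation=[0.3, 0.0])
print("pre-revelation objects: ", pre_objects)
print("post-revelation objects:", post_objects)
```

With identical stimulus evidence in both runs, the only difference is the expectation term. In this toy setting the unbiased run leaves the two interpretations tied, whereas the biased run lets one interpretation suppress its rival and strengthen its own features through feedback.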
As originally described by Dallenbach (1951), when observers are shown degraded images such as Figure X, they frequently cannot determine the object being represented, even though the object comprises the major part of the image and is depicted in a canonical perspective. When the object is pointed out to an observer, the observer frequently has an "Aha" reaction in which the degraded image is readily interpreted. Once interpreted, it is difficult for the observer to return to their naive state of seeing the image as a set of unorganized blotches. This phenomenon suggests a powerful role of experience-driven expectations because the physical information contained in a degraded image is the same before and after its interpretation has been revealed (pre- and post-revelation). The subjective difference in perception of the degraded image comes from perceptual learning that requires only a single presentation of the original, undegraded image.
The subjectively different perceptual experiences associated with pre- and post-revelation degraded images are reliably associated with differences in brain activity. Magnetic resonance imaging (MRI) has revealed that post-revelation images produce higher activity in parietal and inferior temporal regions than do pre-revelation images (Dolan et al., 1997). The inferior temporal area is known to be associated with object recognition (Moran & Desimone, 1985), particularly the recognition of familiar objects (Logothetis, Pauls, & Poggio, 1995). This MRI evidence is consistent with single-cell recordings of neurons in the inferior temporal region of macaque monkeys. Degraded face images produced higher neuron firing rates when they were presented after the original, undegraded face images were revealed than before revelation (Tovee, Rolls, & Ramachandran, 1996). Finally, there is evidence from electroencephalogram (EEG) recordings in humans that one difference in brain activity between images that are coherently interpreted and those that are not is that the former cause more synchronized neural activity at 34-40 Hz (in the gamma frequency range). Rodriguez et al. (1999) showed their participants degraded images of faces, and separately analyzed those trials where participants did and did not perceive faces. Participants who interpreted upright degraded faces as depicting faces showed greater synchronized neural activity between left parieto-occipital and frontotemporal regions at 250 milliseconds after the onset of the stimulus than did participants who did not interpret inverted degraded faces as depicting faces.
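To make the notion of synchronized activity in a frequency band concrete, the following is a minimal sketch of a phase-locking value (PLV) computed across trials for two channels at a single frequency. It is not the analysis pipeline of Rodriguez et al. (1999). The data are simulated rather than recorded, and the 37 Hz test frequency, 500 Hz sampling rate, noise level, and all function names are assumptions made for illustration.

```python
import cmath
import math
import random


def phase_at(signal, freq_hz, sample_rate_hz):
    """Estimate the phase of `signal` at `freq_hz` by projecting it onto a complex sinusoid."""
    projection = sum(
        x * cmath.exp(-2j * math.pi * freq_hz * n / sample_rate_hz)
        for n, x in enumerate(signal)
    )
    return cmath.phase(projection)


def phase_locking_value(trials_a, trials_b, freq_hz, sample_rate_hz):
    """Length of the mean unit vector of per-trial phase differences (0 = none, 1 = perfect)."""
    total = 0j
    for a, b in zip(trials_a, trials_b):
        diff = phase_at(a, freq_hz, sample_rate_hz) - phase_at(b, freq_hz, sample_rate_hz)
        total += cmath.exp(1j * diff)
    return abs(total) / len(trials_a)


def simulate_trials(n_trials, locked, rng, freq_hz=37.0, sample_rate_hz=500.0, n_samples=100):
    """Simulated (not recorded) two-channel data: noisy sinusoids in the gamma range."""
    trials_a, trials_b = [], []
    for _ in range(n_trials):
        phase_a = rng.uniform(0.0, 2.0 * math.pi)
        # Locked: channel B trails A by a fixed lag. Unlocked: B's phase is independent of A's.
        phase_b = phase_a - 0.5 if locked else rng.uniform(0.0, 2.0 * math.pi)
        times = [n / sample_rate_hz for n in range(n_samples)]
        trials_a.append(
            [math.cos(2 * math.pi * freq_hz * t + phase_a) + rng.gauss(0, 0.5) for t in times]
        )
        trials_b.append(
            [math.cos(2 * math.pi * freq_hz * t + phase_b) + rng.gauss(0, 0.5) for t in times]
        )
    return trials_a, trials_b


rng = random.Random(0)
locked_a, locked_b = simulate_trials(200, True, rng)
free_a, free_b = simulate_trials(200, False, rng)
plv_locked = phase_locking_value(locked_a, locked_b, 37.0, 500.0)
plv_free = phase_locking_value(free_a, free_b, 37.0, 500.0)
print(f"PLV, phase-locked channels: {plv_locked:.2f}")
print(f"PLV, independent channels:  {plv_free:.2f}")
```

The measure ignores amplitude: two channels count as synchronized when their phase difference is consistent from trial to trial, regardless of the lag itself. In this simulation the phase-locked pair should yield a value near 1 and the independent pair a value near 0.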
These explorations of neural activity suggest two accounts of what occurs when an image is given a meaningful interpretation. First, detectors in a particular region may signal the interpretation of an object. Such an account is consistent with work suggesting the existence of neurons that are selectively activated not only by simple stimulus features such as lines moving at a particular orientation, but also by complex stimulus configurations such as hands or faces (Hubel & Wiesel, 1968). The specificity of cells in the inferotemporal regions is at least partially learned, given that it is especially pronounced for familiar faces (Perrett et al., 1984). Second, a coherent interpretation of an image may be the result of binding together neural activity caused by parts of the image that come from the same object. By this account, coherent objects are represented by dynamically forming assemblies of neurons (Hebb, 1949). One of the main candidates for "labeling" neural activity that is to be bound together is the synchronization of electrical discharges between neurons within an assembly (Gray et al., 1989; Singer, 1995). The strength of response synchronization between neurons reflects perceptual constraints such as the Gestalt laws of organization, including continuity, proximity, similarity, collinearity, and common fate (Kreiter & Singer, 1996; Singer, 1995). However, given the results of Rodriguez et al. (1999), top-down interpretability, as well as bottom-up stimulus properties, may determine the synchronization of neural activity. One advantage of representing objects by synchronized neural activity rather than by the firing rates of individual neurons is that a complex scene can be decomposed into several objects, with the neurons responding to different parts firing at different phases (Hummel & Biederman, 1992).
Ascending from neural-level considerations to a behavioral analysis, there are four revealing properties associated with changing the subjective perception of a degraded image by previously revealing its original form. First, once revealed, the correct interpretation of a degraded image persists even if the image is presented only after a delay (see our Experiment X). Pilot work in our laboratory suggests that even after delays of two months, there is a strong influence of revelation on degraded image interpretation. Second, once the degraded image has been revealed, it is hard to look at the degraded image and not interpret it, or to give it an alternative interpretation. Although alternative interpretations are frequent pre-revelation, they seem to be inhibited by the proper interpretation. Third, providing an interpretation of an image by verbally presenting its category (e.g., saying "look for a cow") is much less effective in changing subjective organization than is showing either the original version of the image or a simplified drawing of it (Dallenbach, 1951). Fourth, presenting the original version of an image facilitates interpretation of the degraded image much more if the two are presented at roughly the same time (see our Experiment X). The original picture is much more effective if it is actively used to interpret the degraded picture, rather than simply being a passive prime. Together, these properties suggest that exposure to an original picture acts to prime the procedure of segmenting an image into objects and background. A hallmark of a strong "aha" effect (i.e., a large difference between pre- and post-revelation subjective experience) is that figure-ground segmentation cues in the image conflict with the actual segmentations required to correctly interpret the object. For example, in Figure X....
Simultaneous exposure to an original and degraded image allows people to tune their figure-ground segmentation processes to create the correct segmentation of the degraded picture. By stressing procedural priming, rather than semantic, strategic, or episodic priming, we are claiming that the large difference between pre- and post-revelation perception of degraded images stems from altering the segmentation routines that take relatively unprocessed inputs and produce structured figure/ground organizations.
Open questions: when/how early? Is pathway "greased" just like it is when an object becomes familiar? That is, is a post-revealed object just like an object that has been presented many times before?
Yu and Blake - show some people what the dalmatian really is a picture of - revelation. Results: dog predominates more even if it is not revealed.
Information about structural configuration is registered early. Object superiority effect - Pomerantz et al. Weisstein, N., & Harris, C. S. (1974). Visual detection of line segments: An object-superiority effect. Science, 186, 752-755.
Degraded object perception as a model of agnosia. Like agnosics, people can see the degraded object fine, and could reconstruct it quite well. They just can't combine the parts together to create a coherent interpretation, which is exactly what apperceptive agnosics complain of.
ACCESSION NUMBER
1996-09110-003
DOCUMENT TYPE
Journal-Article
TITLE
Identification of fragmented pictures under ascending versus fixed presentation in young and elderly adults: Evidence for the
inhibition-deficit hypothesis.
AUTHOR
Lindfield,-Kimberly-C.; Wingfield,-Arthur; Bowles,-Nancy-L.
SOURCE
Aging-and-Cognition.1994 Dec; Vol 1(4): 282-291.
ISSN
0928-9917
PUBLICATION YEAR
1994
ABSTRACT
Hypothesized that (1) older adults have deficient inhibitory processes and (2) poorer performance in ascending than in fixed
presentations of fragmented stimuli is due to residual activation interference. 24 60-86 yr old volunteers and 24 17-22 yr old
college students were tested for the ability to identify degraded pictures that were presented using either an ascending (AC) or
a fixed (FC) condition. In the AC, Ss identified the pictures at each level of increasing completeness until correct identification
was achieved. In the FC, Ss identified degraded pictures that were presented once at an intermediate level of visual
completeness. An ANOVA confirmed that accuracy was higher in the FC than the AC, and that the main effect of age was not
significant. When Ss were equated on a pretest for performance on the AC, a marginal trend was found for the elderly Ss only.
Additional evidence for reduced inhibitory processes in older Ss was seen in the Ss' correct response latencies. Results are
interpreted as support for the inhibition-deficit hypothesis. ((c) 1997 APA/PsycINFO, all rights reserved) .
TITLE
Perceptual/sensory information versus performance level as indicators of competitive activation in an object identification task:
Evidence from aging.
AUTHOR
Lindfield,-Kimberly-C.; Wingfield,-Arthur
SOURCE
Brain-and-Cognition.1998 Jun; Vol 37(1): 24-27.
ISSN
0278-2626
PUBLICATION YEAR
1998
ABSTRACT
Young and elderly adults were tested for the ability to identify degraded pictures presented either in a series of incremental
steps with each step increasing the completeness of the visual information (ascending condition) or in one single exposure
(fixed condition). The probability of correct identification in the fixed condition was better than the ascending condition once
the amount of visual information shown reached a certain level of completeness. This was the case for both age groups tested
even though the performance of older adults was lower than young adults. Findings are consistent with the competitive
activation model of perceptual interference in picture identification (C. R. Luo and J. G. Snodgrass, 1994). ((c) 1998
APA/PsycINFO, all rights reserved) .
TITLE
Cortical dynamics of three-dimensional figure-ground perception of two-dimensional pictures.
AUTHOR
Grossberg,-Stephen
SOURCE
Psychological-Review.1997 Jul; Vol 104(3): 618-658.
ISSN
0033-295X
PUBLICATION YEAR
1997
ABSTRACT
Develops the FACADE theory of 3-dimensional (3-D) vision and figure-ground separation to explain data concerning how
2-dimensional pictures give rise to 3-D percepts of occluding and occluded objects, and how geometrical and contrastive
properties of a picture cooperate or compete when forming the boundaries and surface representations that subserve conscious
percepts. Spatially long-range cooperation and spatially short-range competition work together to separate the boundaries of
occluding figures from their occluded neighbors, and this process is sensitive to image T junctions at which occluded figures
contact occluding figures. These boundaries control the filling-in of color within multiple depth-sensitive surface
representations. Feedback between surface and boundary representations strengthens consistent boundaries while inhibiting
inconsistent ones. Both the boundary and the surface representations of occluded objects may be amodally completed, while the
surface representations of unoccluded objects become visible through modal completion. Functional roles for conscious modal
and amodal representations in object recognition, spatial attention, and reaching behaviors are discussed. Model interactions are
interpreted in terms of visual, temporal, and parietal cortices. ((c) 1997 APA/PsycINFO, all rights reserved) .
TITLE
Do recognizable figures enjoy an advantage in binocular rivalry?
AUTHOR
Yu,-Karen; Blake,-Randolph
SOURCE
Journal-of-Experimental-Psychology:-Human-Perception-and-Performance.1992 Nov; Vol 18(4): 1158-1173.
ISSN
0096-1523
PUBLICATION YEAR
1992
ABSTRACT
Five experiments examined whether recognizable stimuli predominate in binocular rivalry. It was found that a face
predominated more than did a pattern equated for spatial frequency, luminance, and contrast; an objective reaction time
(RT) procedure confirmed predominance of the face. The face was still liable to fragmentation as stimulus size increased.
Observers tracked exclusive dominance of a picture of a camouflaged figure (a Dalmatian dog) prior to and then
following discovery of the figure's presence; control observers received the same protocol with a scrambled version of the
dog stimulus. Compared with control results, predominance of the dog picture was higher even before observers knew of
the camouflaged figure. Inversion of the dog figure reduced its predominance. Binocular rivalry is sensitive to
object-related, configural properties of a stimulus. ((c) 1997 APA/PsycINFO, all rights reserved) .
TITLE
Recognition of computer-generated pictures on monochrome monitors.
AUTHOR
Baker,-Patti-R.; Belland,-John-C.; Cambre,-Marjorie-A.
SOURCE
Journal-of-Computer-Based-Instruction.1985 Fal; Vol 12(4): 104-107.
ISSN
0098-597X
PUBLICATION YEAR
1985
ABSTRACT
Examined whether 64 2nd-4th graders could recognize computer-generated pictures on monochrome monitors. Ss were
randomly assigned to 1 of 2 conditions. Ss in the 1st treatment were asked to identify on a monochrome monitor a figure
that was initially presented in its original form and then as a redesigned, more distinguishable figure. The redesigned
figure had greater figure-ground contrast because color substitutions were made that used pixel patterns to provide contrast
in the monochromatic display. The order of picture presentation was reversed for Ss in the 2nd treatment. Ss also
completed the Children's Embedded Figures Test to assess their field independence-dependence. Results indicate that
regardless of grade or field independence-dependence characteristics, Ss were unable to discern critical features of a color
graphic displayed on a monochromatic monitor unless it was designed to enhance figure-ground separation. Implications for
the design of instructional software that incorporates microcomputer-generated graphics are discussed. (14 ref) ((c) 1997
APA/PsycINFO, all rights reserved) .
TITLE
Evoked potential correlates of figure and ground.
AUTHOR
Landis,-Theodor; Lehmann,-D.; Mita,-T.; Skrandies,-W.
SOURCE
International-Journal-of-Psychophysiology.1984 Jun; Vol 1(4): 345-348.
ISSN
0167-8760
PUBLICATION YEAR
1984
ABSTRACT
Brain potentials averaged during the viewing of an alternating positive and negative "hidden man" puzzle picture were
averaged from 8 Ss before and after they learned to recognize the figure. After vs before recognition, there was
significantly more evoked positivity at 64/96 msec latency and more negativity at 224/256 msec and 352-480 msec latency
over parietal areas during the viewing of the positive picture (recognizable as a face). It is hypothesized that separate
physiological changes might reflect learned meaningfulness of the figure (which entails increased attention) and figure
extraction from ground. (10 ref) ((c) 1997 APA/PsycINFO, all rights reserved) .
TITLE
Is visual image segmentation a bottom-up or an interactive process?
AUTHOR
Vecera,-Shaun-P.; Farah,-Martha-J.
SOURCE
Perception-and-Psychophysics.1997 Nov; Vol 59(8): 1280-1296.
ISSN
0031-5117
PUBLICATION YEAR
1997
ABSTRACT
Visual image segmentation is the process by which the visual system groups features that are part of a single shape. In Exps 1
and 2, Ss were presented with two overlapping shapes and were asked to determine whether two probed locations were on the
same shape or on different shapes. The availability of top-down support was manipulated by presenting either upright or
rotated letters. Ss were fastest to respond when the shapes corresponded to familiar shapes--the upright letters. In Exp 3, a
variant of this segmentation task was used to rule out the possibility that Ss performed same/different judgments after
segmentation and recognition of both letters. Exp 4 ruled out the possibility that the advantage for upright letters was merely
due to faster recognition of upright letters relative to rotated letters. Results suggest that the previous effects were not due to
faster recognition of upright letters; stimulus familiarity influenced segmentation. The results are discussed in terms of an