Chapter 10 – Perceiving Depth and Size

How does the visual system locate objects in three dimensions?

Perceiving position in two dimensions – vertical and horizontal – is easy.

Assume all objects are at the same distance from the observer, so that the brain only has to determine horizontal and vertical locations.

Position of objects in space nicely corresponds to location of activity on the retina and to location of activity in the visual cortex. So differences in location of activity in the brain nicely signal differences in object location in the vertical and horizontal dimensions.

This means that location of activity in the cortex could serve as an indicator of the spatial location of objects if all objects were the same distance from us.

Perception of location in 3 dimensions – a problem.

Now assume that objects can vary in distance from the observer – x and y in the figure below.

Points x and y project to exactly the same place on the retina. So they’ll also project to the same neurons in area V1. So how are we able to know that they are in different places in space?

So the issue is: How does the brain identify the location of objects in 3 dimensions when the location of activity on the two-dimensional retina gives an ambiguous indication of location?

Cue Theory – A General Theory of the perception of depth / distance. G9 p 228

Depth is synthesized, or figured out, through the combination by brain processes of many imperfect cues.

People with this perspective believe that the visual world contains many indicators of distance.

However, each single indicator by itself ambiguously represents distance of an object.

So, each of the many cues is an imperfect indicator, giving a little bit of information about distance, but not the whole picture.

People holding this perspective believe that the many ambiguous or imperfect indicators must be synthesized into a perception of distance by higher-order cortical processes.

So the key aspects of this view are

1) Individual cues for distance are ambiguous. No one cue gives a perfect indication of distance.

2) The ambiguous cues are integrated/synthesized by higher order cortical processes.

3) The perception of distance is an inference based on the integration/synthesis of many cues.

A byproduct of this view is the realization that our brain is continually performing computations of which we are completely unaware – distance computations in this example.

Table of the 15 generally recognized Cues . . .

Monocular (available to 1 eye) / Binocular (requires 2 eyes)

Oculomotor
    Accommodation (monocular)
    Convergence (binocular)
Static – Size/Position based (all monocular)
    Occlusion
    Relative height
    Familiar size
    Relative size
    Texture gradients
    Linear perspective
Static – Lighting based (all monocular)
    Atmospheric perspective
    Shading
    Cast shadows
Dynamic (all monocular)
    Motion parallax
    Optic flow
    Deletion/accretion
Neural (binocular)
    Binocular disparity

Oculomotor (Eye Muscle) Cues

Accommodation – Changing the shape of the lens to keep attended-to object in focus

A cue available to either eye alone, so it’s a monocular cue

Convergence – Changing the directions at which the two eyes point to keep the image of the attended-to object at corresponding points in the two retinas.

Issue: What exactly is the signal? Two possibilities . . .

1) Signals from the muscles of the eye after they’ve contracted or expanded are the cues, or

2) copies of signals sent to the eye muscles are the cues. (Called corollary discharges – recall the Corollary Discharge Theory of movement perception)

The signals actually used are probably corollary discharge signals.

Both convergence and accommodation give depth information for objects up to about 6 feet from us.
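The distance limit of convergence makes sense geometrically: the angle between the two lines of sight changes a lot at near distances but hardly at all beyond a couple of meters. A minimal sketch of that geometry, assuming a typical interocular distance of about 6.3 cm (an illustrative number, not from the notes):

```python
import math

# Assumed interocular distance in cm (illustrative value).
IOD_CM = 6.3

def convergence_angle_deg(distance_cm):
    """Angle between the two lines of sight when fixating an object at distance_cm."""
    return math.degrees(2 * math.atan((IOD_CM / 2) / distance_cm))

# The angle shrinks rapidly with distance, then flattens out:
for d in [30, 60, 180, 600]:  # roughly 1 ft, 2 ft, 6 ft, 20 ft
    print(f"{d:4d} cm -> {convergence_angle_deg(d):5.2f} degrees")
```

Beyond roughly 6 feet, the convergence angle changes by only fractions of a degree, so the signal carries little distance information – consistent with the limited range of this cue.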

G9 p233 Table 10.1.
Static Size/Position-Based Cues (All monocular)

1. Static Cues . . . Available in a painting or regular photograph.

Partial Occlusion – If the image of one object occludes (prevents us from seeing) part of another object, the occluding object is perceived as being closer.

Relative height - Below the horizon: Objects lower in the visual field (1,2,3 below) are perceived as closer.

Above the horizon, those higher (4,5) in the visual field are perceived as farther away.

Familiar size - Comparison of the size of the retinal image of familiar objects at typical distances allows us to judge their distance in unfamiliar situations.

Relative size – Comparison of retinal sizes of images of objects known to be equal in physical size allows us to judge their relative distances from us.

Texture gradients – The larger/coarser the texture elements, the closer the surface.

Linear perspective – The apparent convergence of parallel lines with distance.
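The logic behind relative size can be sketched in a few lines: for two objects of equal physical size, retinal image size is inversely proportional to distance, so the ratio of image sizes gives the ratio of distances. The numbers below are illustrative only:

```python
# Sketch of the geometry behind the relative-size cue (illustrative numbers).
def relative_distance(image_size_a, image_size_b):
    """For objects A and B of equal physical size, return distance_a / distance_b,
    given the sizes of their retinal images (in any common unit)."""
    return image_size_b / image_size_a

# An image half as large implies an object twice as far away:
print(relative_distance(1.0, 2.0))  # -> 2.0
```

The same inverse relation underlies familiar size: if we know an object's physical size, one retinal image size implies one distance.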


Static – Lighting Based Cues (All monocular)

Atmospheric perspective – Distant objects appear hazier and less distinct than close-up objects.

Shading – Darker objects appear to be farther away

Cast shadows – The position of an object’s cast shadow indicates its location relative to the surface beneath it. Y1 p 199.

A major piece of information to take away from this is that these stimulus characteristics are automatically integrated in the formation of the experience of distance. So there is brain circuitry doing this all the time.

Have you thanked your brain recently?

Some videos related to depth perception

*Monocular depth cues:

Student project filmed on a field; the cues are illustrated using the students’ placement on the grassy field

*Pitting cues for depth perception against each other:

Simple, short; Cute

Dynamic Cues

1. Motion / movement parallax

The differences in the speeds with which images of objects move across the retina, even though the objects (or the observer) move at the same physical speed.

Objects which appear to move more rapidly across the field of vision are judged to be closer.

A monocular cue. Requires only one eye.

Requires that the observer be able to perceive motion. This means that some people may not be able to perceive depth generated through motion parallax because they can’t perceive motion.
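The relation between distance and image speed can be sketched with a small approximation: for an observer translating sideways while fixating far away, a stationary object's image sweeps across the retina at roughly v / d radians per second, where v is the observer's speed and d the object's distance. The walking speed below is an assumed, illustrative value:

```python
# Illustrative sketch of motion parallax geometry (small-angle approximation).
def image_angular_speed(observer_speed_m_s, distance_m):
    """Approximate angular speed (rad/s) of a stationary object's retinal image
    for an observer translating sideways at observer_speed_m_s."""
    return observer_speed_m_s / distance_m

v = 1.5  # assumed walking speed in m/s
for d in [2, 10, 50]:
    print(f"object at {d:2d} m -> {image_angular_speed(v, d):.3f} rad/s")
```

Nearer objects produce faster image motion, which is exactly the regularity the visual system exploits.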

Demonstration


Neural Correlates of Motion Parallax

There is mounting evidence that there are neurons or neuron clusters that respond to motion parallax.


“ . . . many neurons in the middle temporal area (area MT) signal the sign of depth (near versus far) from motion parallax in the absence of other depth cues. To achieve this, neurons must combine visual motion with extra-retinal (non-visual) signals related to the animal's movement. Our findings suggest a new neural substrate for depth perception and demonstrate a robust interaction of visual and non-visual cues in area MT. Combined with previous studies that implicate area MT in depth perception based on binocular disparities, our results suggest that area MT contains a more general representation of three-dimensional space that makes use of multiple cues . . . .”
2. Optic Flow – the relative motions of objects and surfaces as you move forward or backward through a scene.

Play m4h01018 Optic flow demo.mp4 in

C:\Users\Michael\Desktop\Desktop folders\Class Videos\M4H01017 Optic flow demo.MP4

Note that objects in the center of the visual scene move very little.

Objects in the periphery move a lot.

Automatic neural processing integrates the different rates and directions of movement of pieces of the visual scene into a continuous experience of movement through the 3rd dimension.

3. Deletion and Accretion

Deletion – the gradual occlusion of an object as it moves behind a closer object

Accretion – the gradual uncovering of an object as it comes from behind a closer object.

Play C:\Users\Michael\Desktop\Desktop folders\Class Videos\Deletion Accretion.m2ts

G9 p 233 Table 10.1 – Ranges of Effectiveness of Selected Depth Cues

Binocular Disparity

Disparity: Differences between the images of the visual scene on the left and the right retina.

The left eye “sees” a slightly different scene than the right eye.

Stereoscopic Image – A picture with two views of the same scene. On the left is the view that would be seen by one eye. On the right is the view that would be seen by the other eye.

Stereopsis: the perception of depth and 3-dimensional structure obtained on the basis of visual information deriving from two eyes by individuals with normally developed binocular vision.

Example of a Stereoscopic image

The left image is the right-eye view and the right image is the left-eye view.

They’re reversed to make it easier to fuse them by simply crossing your eyes.

Stereoscopic Image

Right Eye View / Left Eye View

Note that the differences between the images are not huge. You must inspect the two figures to discover the small differences.

Yet the visual system integrates the two views into a single experience of the scene in which the differences between the two views have been translated into differences in distance. Ask an auto engineer designing automobile “vision” systems how difficult it is to do this.

(A simple demonstration of the different images seen by each eye can be obtained by holding a finger up in front of your eyes and viewing a scene alternatively with the left eye and then the right eye. )

Optical Details of Binocular Disparity G9 p 236

Horopter: An arc surrounding us with the following characteristic: The images of all objects on the horopter are projected to corresponding points on the retina.

All images on the horopter project to corresponding points on the two retinas.

So A and A’, B and B’, and C and C’ are three pairs of corresponding images.

Images of objects beyond the fixation point project to disparate points.

Images of objects closer than the fixation point also project to disparate points.

So A and ~A, and C and ~C, are pairs of disparate points on the retinas.
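The amount of disparity depends on both the object's depth offset from fixation and the fixation distance. A minimal sketch of the standard small-angle approximation (not from the notes): angular disparity ≈ IOD × dz / d², where IOD is the interocular distance, d the fixation distance, and dz the object's offset beyond fixation. The interocular distance below is an assumed, typical value:

```python
# Sketch of stereo geometry (small-angle approximation, illustrative numbers).
IOD_M = 0.063  # assumed interocular distance in meters

def disparity_rad(fixation_m, offset_m):
    """Approximate angular disparity (radians) for an object offset_m
    beyond the fixation point at distance fixation_m."""
    return IOD_M * offset_m / fixation_m ** 2

# The same 10 cm depth offset produces far more disparity at near fixation:
print(disparity_rad(0.5, 0.1))   # fixating at 50 cm
print(disparity_rad(5.0, 0.1))   # fixating at 5 m
```

This falloff with the square of fixation distance is why binocular disparity is most informative at near and intermediate distances.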

How are the features of binocular images matched? G9 p 241

Binocular neurons: Thousands of neurons respond to simultaneous stimulation of both eyes and do not respond if only one eye is stimulated.

Many such neurons respond only when corresponding images (zero disparity) are displayed.

Others, however, respond when disparate images are displayed.

Ways of creating binocular disparity: Presenting different images to each eye. How Avatar could have been presented. G9 p 239

There are several ways of presenting separate, different images to each eye. The goal is to create a situation in which the left eye receives only the left-eye image and the right eye receives only the right-eye image.

1. The natural way. View any scene with the two eyes open. The left eye image is slightly different from the right eye scene by virtue of the fact that the eyes are separated. Each eye sees only its own scene.

2. Use a stereoscope to view a stereoscopic image. The stereoscope is a device which holds two images (usually photos) and forces the left eye to see one image and the right eye to see the other.

Demonstrate this.

3. Use polarized lenses to view two superimposed images. (The IMAX/ Avatar method.)

The left eye view is projected using vertically polarized light. The right eye view is projected using horizontally polarized light. Both views are projected at the same place in the visual field.

The left eye is covered with a vertically polarized filter. This allows only the vertically polarized light to strike the left eye. Conversely, the right eye is covered with a horizontally polarized filter, allowing only horizontally polarized light to strike the right eye.

This process results in the left eye “seeing” only the left-eye view and the right eye “seeing” only the right eye view.

4. Use colored glasses to view an anaglyph (anaglyph 3D).

The left-eye view is created using a predominantly blue image. The right eye view is created using a predominantly red image.

The left eye is covered with a red lens. The red lens transmits red light, so the predominantly red image washes out and disappears, while the predominantly blue image appears dark and remains visible – the left eye “sees” the blue image. Vice versa for the right eye, covered by a blue lens.

5. Use special surfaces that allow one image to be viewed by the left eye and another image to be viewed by the right eye.

View from the top

6. View a “backwards” stereoscopic image with crossed eyes so that the left-eye and right-eye double images converge to one. See image on page 9 of these notes.

7. View a regular stereoscopic image with diverged eyes, with the left-eye view to the left and the right-eye view to the right. Diverge the eyes so that the left-eye and right-eye double images are experienced as one image. I can’t do this.

8. Use specially created glasses electronically synchronized with the display, as in 3-D TVs now for sale. The left-eye glass becomes clear and the right-eye glass is made opaque for 1/30 sec while the left-eye image is displayed. Then the left-eye glass becomes opaque and the right-eye glass is made clear for the next 1/30 sec as the right-eye image is displayed.

Creating anaglyph 3D

You need colored glasses for these demonstrations. Red lens over the left eye. Blue lens over the right.

The red lens transmits red light, so the red image washes out while the blue image stays visible – sending the blue image to the left eye.

The blue lens transmits blue light, so the blue image washes out while the red image stays visible – sending the red image to the right eye.
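The construction of an anaglyph can be sketched with tiny synthetic “images,” following the convention in these notes (left-eye view drawn in blue, right-eye view in red, for viewing with the red lens over the left eye). The grids and brightness values below are stand-ins for real photographs:

```python
# Minimal anaglyph-construction sketch on 2x3 synthetic images
# (illustrative values standing in for left-eye and right-eye photographs).
left_view = [[200, 200, 200], [200, 200, 200]]   # stand-in left-eye image
right_view = [[100, 100, 100], [100, 100, 100]]  # stand-in right-eye image

# Build (R, G, B) pixels: the red channel carries the right-eye view,
# the blue channel carries the left-eye view.
anaglyph = [
    [(right_view[r][c], 0, left_view[r][c]) for c in range(3)]
    for r in range(2)
]
print(anaglyph[0][0])  # -> (100, 0, 200)
```

With filters over the eyes, each eye then recovers only the view intended for it, recreating binocular disparity from a single combined image.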

1. Two squares. Lefteye ~ Red lensRight eye ~ Blue lens

You should see the square in front of the page.

2. Now reverse the lenses:Left eye: Blue lensRight eye: Red lens

You should now see the square behind the page.

3. Text: Left eye ~Red lensRight eye ~Blue lens

The center rectangle of text should appear in front of the page.

Perception of Depth – 10/1/2018

Gorilla at large: (1954 Movie starring Cameron Mitchell, Lee J Cobb, Anne Bancroft, Raymond Burr, and Lee Marvin)

Left eye: Red lensRight eye: Blue lens


Where is the object in Stereopsis?

Question: Does either eye alone have to be able to recognize the objects that differ in depth for stereopsis to occur?

Two possibilities . . .

1. Object Recognition First. Object recognition occurs for the image in the left eye. Object recognition occurs for the image in the right eye. The two sets of objects are then matched up somewhere in the brain.

2. Pieces First. Selected features – edges, colors – of the two views of the object, but not the actual objects, are matched, and then object recognition occurs once for the matched features.

The answer was provided in the 1960s by Bela Julesz who created . . .

Random Dot Stereograms – stereoscopic images in which the objects that differ in depth are not visible to either eye alone.

Consider the following square . . .

It can be seen by either eye. It can be identified. But now consider the same square embedded in a larger square

Where is the original square? It’s there, because I cut it out of the larger square and pasted it onto this page. But neither eye alone, nor both together, can identify the object in question – the square.


Now consider the following. The square object above is in both sides of the following figure.

But the above square is offset slightly in the left view as opposed to the right view – by about ½ mm.

When viewing the two larger squares normally, most people cannot see the smaller square within each.

But if the left half of the above figure is presented to the left eye and the right half to the right eye, most people with normal binocular vision can identify the smaller square within the larger squares and they will see it at a slightly different distance than the larger squares.

Since 1) the object in the top figure is not seen with either eye alone, and 2) the object IS seen when both eyes are used, the conclusion is that little pieces of the images in each eye are matched up and then object recognition occurs on the matched pieces. So possibility 2 above is correct.
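Julesz's construction is simple enough to sketch: start from one random-dot array, copy it, and shift a central patch sideways in the copy, filling the uncovered strip with fresh random dots. The sizes and shift below are illustrative, not taken from Julesz's originals:

```python
import random

# Sketch of a random-dot stereogram (illustrative sizes and shift).
random.seed(0)
SIZE, SHIFT = 20, 2
left = [[random.randint(0, 1) for _ in range(SIZE)] for _ in range(SIZE)]
right = [row[:] for row in left]  # right-eye view starts as an exact copy

for r in range(6, 14):             # rows containing the hidden central square
    patch = right[r][6:14]         # the patch that will carry the disparity
    right[r][6 + SHIFT:14 + SHIFT] = patch  # shift the patch sideways
    for c in range(6, 6 + SHIFT):  # refill the uncovered strip with new dots
        right[r][c] = random.randint(0, 1)

# Outside the square the two arrays are identical (zero disparity):
print(left[0] == right[0])  # -> True
```

Neither array alone contains a visible square; only when the shifted patch is matched dot-by-dot across the two eyes does the square emerge in depth.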

The above display requires a stereo viewer. The anaglyph below, which is based on the same principle can be viewed with colored lenses. Note, however, that in the anaglyph, you can identify the object that will be displaced with either eye. That’s because the elements of the object (the digits) are so large.

Perception of Size G9 p 243

Most obvious visual scene cues for object size is retinal imageof the object.

A large object has a large retinal image; a small object has a small retinal image.

The problem with retinal image size as an indicator of the size of the external object is that retinal image size also depends on the distance of the object.
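This ambiguity follows directly from the visual-angle formula: the retinal image is set by the angle an object subtends, which depends on both its size and its distance. A minimal sketch (illustrative numbers, not from the notes):

```python
import math

# Sketch of the size/distance ambiguity via visual angle.
def visual_angle_deg(object_size_m, distance_m):
    """Visual angle subtended by an object of a given size at a given distance."""
    return math.degrees(2 * math.atan(object_size_m / (2 * distance_m)))

# A 1 m object at 10 m subtends the same angle as a 2 m object at 20 m,
# so the retinal image alone cannot tell them apart:
print(round(visual_angle_deg(1, 10), 3))
print(round(visual_angle_deg(2, 20), 3))
```

Because infinitely many size/distance pairs produce one retinal image size, the visual system must bring in distance information to recover an object's true size.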