Visual Cues for Depth Perception: Shading, Shadows, Perspective

 38 min video

 3 min read

YouTube video ID: gaZTffZYyrg

Source: YouTube video by MIT OpenCourseWare


The visual system must extract three‑dimensional structure from the two‑dimensional images that the eyes receive. Humans rely on a variety of visual cues, whereas some animals use sonar instead. Depth perception depends both on stereopsis from the two eyes and on internalized assumptions about the physics and geometry of the world.

Shape from Shading

Variations in luminance are interpreted as cues to surface shape. Most surfaces are approximately Lambertian: reflected light intensity depends on the angle between the illumination direction and the surface normal. The visual system has a strong prior that illumination comes from above, which allows it to infer curvature from shading. Bas‑relief sculptures exploit this cue to create convincing depth illusions.
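
Lambert's cosine law can be sketched in a few lines of Python; the function and variable names here are illustrative, and the light direction (straight down from above) is chosen to match the "light from above" prior described in the text.

```python
import math

def lambertian_intensity(normal, light_dir, albedo=1.0):
    """Lambert's cosine law: I = albedo * max(0, n . l),
    where `normal` and `light_dir` are unit 3-vectors (tuples)."""
    dot = sum(n * l for n, l in zip(normal, light_dir))
    return albedo * max(0.0, dot)

# Light arriving from directly above (+y): surfaces tilting away get darker.
light = (0.0, 1.0, 0.0)
facing_up = lambertian_intensity((0.0, 1.0, 0.0), light)    # brightest
tilted_45 = lambertian_intensity(
    (math.sin(math.pi / 4), math.cos(math.pi / 4), 0.0), light)
facing_away = lambertian_intensity((0.0, -1.0, 0.0), light)  # unlit -> 0
```

Under the light‑from‑above assumption, a luminance gradient from bright to dark is consistent with a surface curving away from the viewer, which is exactly the inference the visual system appears to make.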

Shadows as Depth Cues

Drop shadows convey powerful information about the spatial relationship between objects and supporting surfaces. Shadows must be dark; light‑colored regions are not perceived as shadows. The visual system requires local consistency of shadows to infer depth, without needing a full global model of the scene.

Geometric Regularities

Objects resting on the ground plane appear higher in the visual field the farther away they are, and they project smaller retinal images. Texture gradients (systematic changes in the size and spacing of repeated elements) signal receding depth. The visual system assumes textures are uniform; when a texture is warped on a flat surface, the brain may perceive a curved surface instead.

Emmert's Law and Size Perception

Emmert's Law describes the relationship between perceived size, perceived distance, and visual angle: perceived size is proportional to perceived distance when retinal size is fixed. The visual system often assumes objects have a constant, familiar size and uses that assumption to infer distance, maintaining size constancy across varying depths.
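
The proportionality in Emmert's Law can be written out as a small‑angle sketch (the function name and the specific numbers are illustrative, not from the lecture):

```python
def perceived_size(visual_angle_rad, perceived_distance):
    """Emmert's Law: with retinal (visual) angle held fixed, perceived
    size scales linearly with perceived distance.
    Small-angle approximation: size ~= distance * angle."""
    return perceived_distance * visual_angle_rad

# The same afterimage (fixed visual angle of 0.01 rad) appears twice
# as large when projected onto a surface twice as far away.
near = perceived_size(0.01, 2.0)  # 0.02 units
far = perceived_size(0.01, 4.0)   # 0.04 units
```

This is the computation behind size constancy: if the brain believes the distance has doubled while the retinal image is unchanged, it doubles the perceived size.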

Aerial and Linear Perspective

Aerial perspective causes distant objects to appear blurrier, lower in contrast, and more bluish because of light scattering in the atmosphere. Linear perspective makes parallel lines in the world converge in the image. Humans frequently construct environments with parallel lines—such as rectangles—and the visual system uses this regularity as a depth cue.
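
Why parallel lines converge in the image follows directly from pinhole projection; this minimal sketch uses an assumed focal length of 1 and illustrative coordinates (two "road edges" at x = -1 and x = +1):

```python
def project(point3d, focal=1.0):
    """Pinhole projection onto the image plane at z = focal:
    (x, y, z) -> (f*x/z, f*y/z)."""
    x, y, z = point3d
    return (focal * x / z, focal * y / z)

# Two parallel lines in the world, sampled near and far:
left_near, left_far = project((-1, 0, 2)), project((-1, 0, 100))
right_near, right_far = project((1, 0, 2)), project((1, 0, 100))

# The projected gap shrinks with distance, so the lines converge
# toward a vanishing point.
gap_near = right_near[0] - left_near[0]  # 1.0
gap_far = right_far[0] - left_far[0]     # 0.02
```

Because built environments are full of parallel edges, the rate of convergence is a reliable signal of how a surface recedes in depth.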

Ambiguity and Bistability

Depth interpretation is an ill‑posed problem: many different 3‑D structures can generate the same 2‑D image. Bistable figures such as the Necker cube or the duck‑rabbit demonstrate this ambiguity; the visual system may sample from the posterior probability over possible interpretations, or switch between them because of neural adaptation. Bistable percepts often flip every five to ten seconds.

Mechanisms Behind the Cues

When lighting direction is fixed, reflected light intensity varies with surface orientation, enabling the visual system to infer curvature from shading. Emmert's Law, expressed as Size ∝ Distance × Visual Angle, lets the brain scale perceived size to compensate for changes in perceived distance. Bistability arises when the posterior probability distribution over world states is multimodal, prompting the brain to alternate between equally likely interpretations.

“The extraction of three dimensional structure from the two‑dimensional images that the eyes receive is important for lots of things.”
“The visual system seems to have a prior that favors illumination from above.”
“Perception is kind of encapsulated and sometimes cognitively impenetrable.”
“The visual system has internalized the regularities of the world and uses those to solve this ill‑posed problem of depth perception.”
“We see in 3D… what you perceive is your inference of the three‑dimensional structure of the world.”

  Takeaways

  • Depth perception relies on stereopsis and internalized assumptions about world physics to infer three‑dimensional structure from two‑dimensional retinal images.
  • The visual system assumes illumination comes from above, using shape‑from‑shading cues to interpret surface curvature and create depth illusions.
  • Drop shadows, texture gradients, aerial perspective, and linear perspective each provide consistent local information that the brain uses to gauge distance and shape.
  • Emmert's Law links perceived size, distance, and visual angle, allowing the brain to maintain size constancy despite changes in retinal image size.
  • Bistable images reveal that depth interpretation is ambiguous; the brain samples from multiple plausible interpretations, often switching every few seconds.

Frequently Asked Questions

Why does the visual system assume illumination comes from above?

The visual system has a strong prior that light typically originates from overhead, likely because natural lighting—sunlight and indoor ceiling lights—generally falls from above. This assumption simplifies shape‑from‑shading calculations, letting the brain infer surface orientation from luminance gradients.

How does Emmert's Law explain size constancy across different distances?

Emmert's Law states that perceived size is proportional to perceived distance when retinal size stays constant. When an object appears farther away, the brain scales up its perceived size to compensate for the smaller visual angle, preserving the impression that familiar objects retain a constant physical size.
