SLIDE 1
COMP 546
Lecture 15
Cue combinations, Bayesian models
SLIDE 2
Visual Cues: image properties that can tell us about scene properties
Image: texture, shading, binocular disparities, motion (from moving observer), defocus blur
Scene: depth gradient, surface curvature, depth
SLIDE 3
Last lecture: Likelihood
$p(I = i \mid S = s)$
- Probability of measuring image $I = i$ when the scene is $S = s$
(called the "likelihood" of scene $S = s$, given the image $I = i$).
- Maximum likelihood method:
Choose the $S = s$ that maximizes $p(I = i \mid S = s)$.
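The maximum likelihood rule above can be sketched in a few lines. This is only an illustration: the Gaussian-shaped likelihood, the candidate slant grid, and the numbers (peak at 30, width 5) are assumptions made for the example and do not come from the slides.

```python
import math


def gaussian_likelihood(s, s_peak, sigma):
    """Gaussian-shaped likelihood p(I = i | S = s), viewed as a function of s.

    s_peak and sigma are hypothetical parameters chosen for this example.
    """
    return math.exp(-((s - s_peak) ** 2) / (2.0 * sigma ** 2))


def maximum_likelihood(candidates, likelihood):
    """Return the candidate scene value s with the largest likelihood."""
    return max(candidates, key=likelihood)


# Hypothetical candidate scenes: slants from 0 to 90 degrees in 1 degree steps.
slants = [float(d) for d in range(0, 91)]

# Hypothetical likelihood: peaked at 30 degrees with width 5 degrees.
s_ml = maximum_likelihood(slants, lambda s: gaussian_likelihood(s, 30.0, 5.0))
print(s_ml)
```

The grid search stands in for whatever optimization one prefers; with a single-peaked likelihood the estimate is simply the location of the peak.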
SLIDE 4
This lecture: How to combine cues?
$p(I_1, I_2 \mid S)$
SLIDE 5
Example [Hillis 2004]:
- texture only (monocular)
- stereo only
- texture and stereo
SLIDE 6
Assume the cues are "conditionally independent" given the scene, so the likelihood function factors:
$p(I_1, I_2 \mid S) = p(I_1 \mid S)\, p(I_2 \mid S)$
e.g. $I_1$ is texture, $I_2$ is binocular disparity.
SLIDE 7
[Figure: likelihoods $p(I_1 \mid S)$ and $p(I_2 \mid S)$ plotted as functions of $S = s$]
Assume $p(I_1 = i_1 \mid S = s)$ and $p(I_2 = i_2 \mid S = s)$ are Gaussian shaped.
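SLIDE 8
[Figure: likelihoods $p(I_1 \mid S)$ and $p(I_2 \mid S)$ plotted as functions of $S = s$, with peaks at $s_1$ and $s_2$]
Assume $p(I_1 = i_1 \mid S = s)$ and $p(I_2 = i_2 \mid S = s)$ are Gaussian shaped. Their maxima might occur at different values of $s$ (here $s_1$ and $s_2$). Why?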
SLIDE 9
We want to find the $s$ that maximizes:
$p(I_1 \mid S = s)\, p(I_2 \mid S = s) = e^{-\frac{(s - s_1)^2}{2\sigma_1^2}}\, e^{-\frac{(s - s_2)^2}{2\sigma_2^2}}$
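SLIDE 10
We want to find the $s$ that maximizes:
$p(I_1 \mid S = s)\, p(I_2 \mid S = s) = e^{-\frac{(s - s_1)^2}{2\sigma_1^2}}\, e^{-\frac{(s - s_2)^2}{2\sigma_2^2}}$
So, we want to find the $s$ that minimizes:
$\frac{(s - s_1)^2}{2\sigma_1^2} + \frac{(s - s_2)^2}{2\sigma_2^2}$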
SLIDE 11
"Linear Cue Combination"
The lecture notes show that the solution $S = s$ is where
$s = w_1 s_1 + w_2 s_2$, with $w_1 + w_2 = 1$ and $0 < w_i < 1$.
SLIDE 12
The lecture notes show that the solution $S = s$ is where
$s = w_1 s_1 + w_2 s_2$, with $w_1 + w_2 = 1$ and $0 < w_i < 1$, and
$w_1 = \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2}$, $w_2 = \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2}$
Thus, the less reliable cue (larger $\sigma$) gets less weight.
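The step from the product of two Gaussian-shaped likelihoods to these weights can be sketched as follows. This is a short reconstruction under the assumptions already stated on the slides (peaks $s_1$, $s_2$ and widths $\sigma_1$, $\sigma_2$); the name $E(s)$ for the quantity being minimized is introduced here for convenience and may not match the lecture notes.

```latex
\begin{align*}
E(s) &= \frac{(s - s_1)^2}{2\sigma_1^2} + \frac{(s - s_2)^2}{2\sigma_2^2} \\
\frac{dE}{ds} &= \frac{s - s_1}{\sigma_1^2} + \frac{s - s_2}{\sigma_2^2} = 0 \\
s \left( \frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2} \right) &= \frac{s_1}{\sigma_1^2} + \frac{s_2}{\sigma_2^2} \\
s &= \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2}\, s_1 + \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2}\, s_2
\end{align*}
```

Since $E(s)$ is a sum of two upward-opening parabolas, the stationary point is the minimum, and the two coefficients are positive and sum to 1.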
SLIDE 13
Example [Hillis 2004]:
- texture only (monocular)
- stereo only
Measure slant discrimination thresholds for the cues in isolation. Estimate the likelihood function parameters ($s_1$, $\sigma_1$, $s_2$, $\sigma_2$).
SLIDE 14
texture and stereo
... then
- present cues together
- measure thresholds for $S$
- convert thresholds to likelihood parameters ($s$, $\sigma$)
SLIDE 15
texture and stereo
... then
- present cues together
- measure thresholds for $S$
- convert thresholds to likelihood parameters ($s$, $\sigma$)
- examine if these values are consistent with the model*
$s = w_1 s_1 + w_2 s_2$
*The model also makes a prediction about $\sigma$ in the combined case.
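SLIDE 16
texture and stereo
[Figure: likelihoods $p(I_1 \mid S)$ and $p(I_2 \mid S)$ plotted as functions of $S = s$, with peaks at $s_1$ and $s_2$]
The experimenter can manipulate $s_1$, $s_2$, $\sigma_1$, $\sigma_2$ and predict the effect on the perception of slant.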
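A prediction of this kind could be computed as below. The sketch assumes Gaussian-shaped, conditionally independent likelihoods as on the earlier slides. The combined width uses the standard result for a product of two Gaussians, $\sigma^2 = \sigma_1^2 \sigma_2^2 / (\sigma_1^2 + \sigma_2^2)$, which the slides refer to but do not write out. The function name `combine_cues` and the stimulus values are hypothetical.

```python
import math


def combine_cues(s1, sigma1, s2, sigma2):
    """Linear cue combination for two Gaussian-shaped likelihoods.

    Returns (s_hat, sigma_hat, w1, w2), where s_hat maximizes the product
    of the two likelihoods and sigma_hat is the width of that product.
    """
    var1, var2 = sigma1 ** 2, sigma2 ** 2
    w1 = var2 / (var1 + var2)  # weight on cue 1 shrinks as sigma1 grows
    w2 = var1 / (var1 + var2)  # weight on cue 2 shrinks as sigma2 grows
    s_hat = w1 * s1 + w2 * s2
    sigma_hat = math.sqrt(var1 * var2 / (var1 + var2))
    return s_hat, sigma_hat, w1, w2


# Hypothetical cue-conflict stimulus: texture signals a 30 degree slant
# (sigma 6), stereo signals a 40 degree slant (sigma 3).
s_hat, sigma_hat, w1, w2 = combine_cues(s1=30.0, sigma1=6.0, s2=40.0, sigma2=3.0)
print(s_hat, sigma_hat, w1, w2)
```

In this example the combined estimate lies closer to the more reliable cue (stereo), and the combined width is smaller than either single-cue width, which is the part of the model that the combined-cue thresholds can test.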
SLIDE 17
COMP 546
Lecture 15
Cue combinations, Bayesian models
SLIDE 18
$p(I = i \mid S = s) \neq p(S = s \mid I = i)$
Left side: likelihood of scene $s$, given image $i$.
Right side: probability of scene $s$, given image $i$.
What is the crucial difference?
SLIDE 19
SLIDE 20
[Figure: three candidate scenes: wire frame with independently chosen depths; regular solid cube; flat drawing]
All scenes above have the same likelihood $p(I = i \mid S = s)$. Why do we prefer the regular solid cube?
[Kersten & Yuille 2003]
SLIDE 21
Some scenes may have a larger probability $p(S = s)$. The marginal probability $p(S = s)$ is called the "prior".
SLIDE 22
$p(I \mid S) \equiv \frac{p(I, S)}{p(S)}$
$p(S \mid I) \equiv \frac{p(I, S)}{p(I)}$
Thus,
$p(I \mid S)\, p(S) = p(S \mid I)\, p(I)$
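SLIDE 23
$p(I \mid S) \equiv \frac{p(I, S)}{p(S)}$
$p(S \mid I) \equiv \frac{p(I, S)}{p(I)}$
Thus,
$p(S \mid I) = \frac{p(I \mid S)\, p(S)}{p(I)}$
Bayes Theorem
Terms: posterior $p(S \mid I)$, likelihood $p(I \mid S)$, scene prior $p(S)$, image prior $p(I)$.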
SLIDE 24
Maximum "a Posteriori" (MAP)
Given an image, $I = i$, find the scene $S = s$ that maximizes $p(S = s \mid I = i)$.
$p(S \mid I) = \frac{p(I \mid S)\, p(S)}{p(I)}$
Terms: posterior $p(S \mid I)$, likelihood $p(I \mid S)$, scene prior $p(S)$, image prior $p(I)$.
SLIDE 25
Maximum "a Posteriori" (MAP)
Given an image, $I = i$, find the scene $S = s$ that maximizes $p(S = s \mid I = i)$.
$p(S \mid I) = \frac{p(I \mid S)\, p(S)}{p(I)}$
Terms: posterior $p(S \mid I)$, likelihood $p(I \mid S)$, scene prior $p(S)$, image prior $p(I)$.
We don't care about $p(I = i)$. Why not?
SLIDE 26
If the prior $p(S)$ is uniform, then maximum likelihood gives the same solution as maximum a posteriori (MAP). Interesting cases arise when the prior is non-uniform.
$p(S \mid I) = \frac{p(I \mid S)\, p(S)}{p(I)}$
Terms: posterior $p(S \mid I)$, likelihood $p(I \mid S)$, scene prior $p(S)$, image prior $p(I)$ (constant).
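The relationship between maximum likelihood and MAP can be checked numerically. The following is a minimal sketch, assuming a one-dimensional scene variable on a grid and Gaussian-shaped likelihood and prior; all function names and numbers are hypothetical. The image prior $p(I = i)$ is left out because it does not depend on $s$ and so cannot change which $s$ gives the maximum.

```python
import math


def gaussian(x, mu, sigma):
    """Unnormalized Gaussian-shaped function of x."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


def likelihood(s):
    """Hypothetical likelihood p(I = i | S = s), peaked at s = 40."""
    return gaussian(s, 40.0, 10.0)


def uniform_prior(s):
    """Uniform prior p(S = s): the same value for every s."""
    return 1.0


def peaked_prior(s):
    """Hypothetical non-uniform prior p(S = s), peaked at s = 0."""
    return gaussian(s, 0.0, 10.0)


def argmax_on_grid(grid, score):
    """Return the grid value with the largest score."""
    return max(grid, key=score)


grid = [0.5 * k for k in range(0, 181)]  # s from 0 to 90 in steps of 0.5

s_ml = argmax_on_grid(grid, likelihood)
s_map_uniform = argmax_on_grid(grid, lambda s: likelihood(s) * uniform_prior(s))
s_map_peaked = argmax_on_grid(grid, lambda s: likelihood(s) * peaked_prior(s))
print(s_ml, s_map_uniform, s_map_peaked)
```

With the uniform prior the MAP estimate coincides with the maximum likelihood estimate. With the peaked prior the estimate is pulled from the likelihood peak toward the prior peak, here halfway because the two widths are equal.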
SLIDE 27
likelihood prior
SLIDE 28
Ames Room
http://www.youtube.com/watch?v=Ttd0YjXF0no
https://www.youtube.com/watch?v=gJhyu6nlGt8
SLIDE 29
Priors ("Natural Scene Statistics")
- intensity
- orientation of image lines, edges
- disparity
- motion
- surface slant, tilt
SLIDE 30
- orientation $\theta$ of lines, edges
[Girshick 2011]
$p(\Theta = \theta)$
People are indeed better at discriminating vertical and horizontal orientations than oblique orientations. Why? Because they use a prior?
SLIDE 31
surface slant $\sigma$ and tilt $\tau$
Here we represent (slant, tilt) using a concave hemisphere. See next slide.
floor ceiling
SLIDE 32
$p(S = (\sigma, \tau))$
[Adams & Elder 2016]
Slants and tilts are represented using a concave hemisphere.
Each disk shows $p(\sigma, \tau)$ for surfaces visible over a range of viewing direction elevations, relative to the line of sight.
SLIDE 33
$p(S = (\sigma, \tau))$
SLIDE 34
$p(S = (\sigma, \tau))$
SLIDE 35
Maximum a Posteriori (MAP)
Choose the $S$ = (slant, tilt) that maximizes the posterior.
$p(S \mid I = i) \propto p(I = i \mid S)\, p(S)$
posterior $\propto$ likelihood $\times$ prior
SLIDE 36
Likelihood functions can have more than one maximum, i.e. convex or concave?
[Figure: likelihood $p(I = i \mid S)$ over (slant, tilt)]
SLIDE 37
Depth Reversal Ambiguity and Shading
(see Exercise) A valley illuminated from the right produces the same shading as a hill illuminated from the left.
[Figure: likelihood $p(I = i \mid S)$ over (slant, tilt)]
SLIDE 38
What "priors" does the visual system use to resolve such twofold ambiguities? Let's look at a few related examples.
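Before the examples, here is a toy sketch of how any prior settles a twofold ambiguity. It assumes just two candidate interpretations, labeled "convex" and "concave", that explain the image equally well. The likelihood and prior values are made-up placeholders, not measured quantities.

```python
# Two interpretations with equal likelihood: the image alone cannot decide.
likelihood = {"convex": 0.5, "concave": 0.5}  # hypothetical p(I = i | S)

# A non-uniform prior over the two interpretations (placeholder values).
prior = {"convex": 0.7, "concave": 0.3}  # hypothetical p(S)

# Bayes: posterior = likelihood * prior / p(I = i), where p(I = i) is the
# sum of likelihood * prior over all candidate scenes.
unnormalized = {s: likelihood[s] * prior[s] for s in likelihood}
p_image = sum(unnormalized.values())
posterior = {s: value / p_image for s, value in unnormalized.items()}

# MAP estimate: the interpretation with the largest posterior.
s_map = max(posterior, key=posterior.get)
print(posterior, s_map)
```

Because the two likelihoods are equal, the posterior simply reproduces the prior, so the prior alone determines which interpretation is chosen.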
SLIDE 39
You can perceive the center point as a hill or a valley. When you see it as a hill, you perceive the tilt as 180 deg (leftward). But when you see it as a valley, the tilt is 0 deg (rightward).
SLIDE 40
We tend to see the center as a hill. Why ?
SLIDE 41
We tend to see the center as a valley. Why ?
SLIDE 42
The visual system uses three priors to resolve the depth reversal ambiguity:
- surface orientation: p(floor) > p(ceiling)
- light source direction: p(above) > p(below)
- "global" surface curvature: p(convex) > p(concave)
SLIDE 43
Example in which all three prior assumptions are met
light from above; viewpoint from above (floor); shape is convex
SLIDE 44
Example in which all three prior assumptions fail
shape is concave; viewpoint from below (ceiling); light from below
SLIDE 45
floor ceiling
Convex shape, illuminated from above the line of sight
SLIDE 46
floor ceiling
Concave shape, illuminated from below the line of sight
SLIDE 47
We showed how people combined the three different "priors":
Percent correct in judging local "hill" or "valley"
= 50
+/- 10 (floor vs. ceiling)
+/- 10 (light from above vs. below)
+/- 10 (globally convex/concave)
[Langer and Buelthoff, 2001]
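The additive summary above can be tabulated for all eight stimulus conditions. This sketch takes the slide's round numbers (baseline 50, plus or minus 10 per prior) at face value. The function and variable names are hypothetical, and the real data are of course noisier than this idealized rule.

```python
from itertools import product

BASELINE = 50  # percent correct at chance
EFFECT = 10    # percentage points gained or lost per prior (slide's round number)


def percent_correct(floor, light_above, convex):
    """Predicted percent correct under the additive summary.

    Each argument is True when the stimulus agrees with the corresponding
    prior (floor rather than ceiling, light from above, globally convex).
    """
    signs = [1 if agrees else -1 for agrees in (floor, light_above, convex)]
    return BASELINE + EFFECT * sum(signs)


# Enumerate all 2 x 2 x 2 conditions as (floor, light_above, convex).
table = {cond: percent_correct(*cond) for cond in product([True, False], repeat=3)}
for cond, pct in table.items():
    print(cond, pct)

best = max(table.values())
worst = min(table.values())
```

When all three priors are satisfied the rule gives 80 percent, when all three are violated it gives 20 percent, and the mixed conditions fall at 60 or 40 percent.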
SLIDE 48 Best (80%) Worst (20%)
SLIDE 49
These look weird, but in different ways. How ?
SLIDE 50 Reminder
- A2 is due tonight
- Midterm (optional) is first class after Study Break