SLIDE 1

Lecture 18: Depth estimation

SLIDE 2

Announcements

  • PS9 out tonight: panorama stitching
  • New grading policies from UMich (details TBA)
  • Final presentation will take place over video chat.
  • We’ll send a sign-up sheet next week

SLIDE 3

Today

  • Stereo matching
  • Probabilistic graphical models
  • Belief propagation
  • Learning-based depth estimation

SLIDE 4

Basic stereo algorithm

For each epipolar line:
  For each pixel in the left image:
  • compare with every pixel on the same epipolar line in the right image
  • pick the pixel with minimum match cost

Improvement: match windows

Source: N. Snavely
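A minimal sketch of this loop in numpy; the function name `ssd_stereo`, the rectified-image assumption, and the convention that left pixel x matches right pixel x - d are mine, not from the slides:

```python
import numpy as np

def ssd_stereo(left, right, max_disp, w=1):
    """For each left-image pixel, compare its (2w+1)x(2w+1) window with
    windows at every candidate disparity d on the same row (epipolar
    line) of the right image, and keep the d with minimum SSD cost."""
    left = left.astype(float)
    right = right.astype(float)
    H, W = left.shape
    disp = np.zeros((H, W), dtype=int)
    for y in range(w, H - w):
        for x in range(w, W - w):
            win_l = left[y - w:y + w + 1, x - w:x + w + 1]
            best, best_d = np.inf, 0
            for d in range(max_disp + 1):
                xr = x - d                 # right-image column for disparity d
                if xr - w < 0:             # window would fall off the image
                    break
                win_r = right[y - w:y + w + 1, xr - w:xr + w + 1]
                ssd = np.sum((win_l - win_r) ** 2)
                if ssd < best:
                    best, best_d = ssd, d
            disp[y, x] = best_d
    return disp
```

With w = 0 this is the per-pixel matcher; w > 0 is the "match windows" improvement.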


SLIDE 5

Stereo matching based on SSD

[Figure: SSD match cost as a function of disparity d; the minimum at d_min gives the best matching disparity.]

Source: N. Snavely

SLIDE 6

Window size

[Figure: disparity maps computed with window sizes W = 3 and W = 20.]

Source: N. Snavely

SLIDE 7

Stereo as energy minimization

  • What defines a good stereo correspondence?

1. Match quality

  • Want each pixel to find a good match in the other image

2. Smoothness

  • If two pixels are adjacent, they should (usually) move about the same amount

Source: N. Snavely

SLIDE 8

Stereo as energy minimization

  • Find the disparity map d that minimizes an energy function E(d)
  • Simple pixel / window matching: E_d(d) = Σ_{(x,y)} C(x, y, d(x, y))

where C(x, y, d) is the squared distance between windows centered at I(x, y) and J(x + d(x, y), y).

Source: N. Snavely

SLIDE 9

Stereo as energy minimization

[Figure: left image I(x, y) and right image J(x, y); for the scanline y = 141, the match cost C(x, y, d) over all positions x and disparities d forms the disparity space image (DSI).]

Source: N. Snavely

SLIDE 10

Stereo as energy minimization

[Figure: the DSI for scanline y = 141, with position x and disparity d as axes.]

Simple pixel / window matching: choose the minimum of each column in the DSI independently.

Source: N. Snavely
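A toy sketch of a DSI for one scanline and the independent per-column minimum, assuming numpy; `dsi_scanline` is a made-up helper name, and the second row is the first shifted by a true disparity of 2:

```python
import numpy as np

def dsi_scanline(left_row, right_row, max_disp):
    """C[x, d]: squared difference between left pixel x and right pixel
    x - d along one epipolar line; each pixel x owns a column of the DSI."""
    W = len(left_row)
    C = np.full((W, max_disp + 1), np.inf)   # inf where x - d is off-image
    for d in range(max_disp + 1):
        C[d:, d] = (left_row[d:] - right_row[:W - d]) ** 2
    return C

# Winner-take-all: minimize each DSI column independently.
left = np.array([1., 2., 3., 4., 5., 6.])
right = np.array([3., 4., 5., 6., 7., 8.])   # left[x] == right[x - 2]
C = dsi_scanline(left, right, max_disp=3)
disp = np.argmin(C, axis=1)
```

Each pixel picks its best disparity with no regard for its neighbors, which is exactly what the smoothness term on the next slides fixes.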


SLIDE 11

Greedy selection of best match

Source: N. Snavely

SLIDE 12

Stereo as energy minimization

  • Better objective function: E(d) = E_d(d) + λ E_s(d)

  • Match cost E_d: want each pixel to find a good match in the other image
  • Smoothness cost E_s: adjacent pixels should (usually) move about the same amount

Source: N. Snavely

SLIDE 13

Stereo as energy minimization

Match cost: E_d(d) = Σ_{(x,y)} C(x, y, d(x, y))

Smoothness cost: E_s(d) = Σ_{(p,q) ∈ ε} V(d_p, d_q), where ε is the set of neighboring pixel pairs (e.g. a 4-connected or 8-connected neighborhood)

Source: N. Snavely

SLIDE 14

Smoothness cost

How do we choose V?

  • “Potts model”: V(d_p, d_q) = 0 if d_p = d_q, and a constant otherwise
  • L1 distance: V(d_p, d_q) = |d_p − d_q|

Source: N. Snavely
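The two choices above can be sketched as follows (hypothetical helper names; `tau` is an assumed Potts penalty constant):

```python
def potts(dp, dq, tau=1.0):
    """Potts model: a constant penalty whenever neighboring disparities
    differ, no matter by how much."""
    return 0.0 if dp == dq else tau

def l1(dp, dq):
    """L1 penalty: grows with the size of the disparity jump."""
    return abs(dp - dq)
```

The Potts model tolerates sharp depth discontinuities (a big jump costs the same as a small one), while L1 prefers smoothly varying, slanted surfaces.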


SLIDE 15

Probabilistic interpretation

Exponentiate: exp(−E(d)) = exp(−(E_d(d) + λ E_s(d)))

Normalize (make it sum to 1): P(d | I) = (1/Z) exp(−E(d)), where Z = Σ_{d′} exp(−E(d′))

Rewriting the energy this way turns minimization of E into maximization of the posterior P(d | I).

Example adapted from Freeman, Torralba, Isola
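A toy numeric check of the exponentiate-and-normalize step (the three candidate energies are made up):

```python
import numpy as np

# Toy example: three candidate disparity maps with energies E(d).
E = np.array([2.0, 1.0, 4.0])

# Exponentiate the negative energy, then divide by Z so the values
# form a probability distribution P(d | I).
unnorm = np.exp(-E)
Z = unnorm.sum()
P = unnorm / Z

# Lower energy -> higher probability, and P sums to 1.
```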


SLIDE 16

Probabilistic interpretation

P(d | I) ∝ Π_i φ_i(d_i) · Π_{(i,j)} ψ_{ij}(d_i, d_j)

  • “Local evidence” φ_i: how good are the matches?
  • “Pairwise compatibility” ψ_{ij}: is the depth smooth?

Example adapted from Freeman, Torralba, Isola

SLIDE 17

Probabilistic interpretation

  • Local evidence: φ_i(d_i) ∝ exp(−C(x_i, y_i, d_i))
  • Pairwise compatibility: ψ_{ij}(d_i, d_j) ∝ exp(−λ V(d_i, d_j))

Example adapted from Freeman, Torralba, Isola

SLIDE 18

Probabilistic graphical models

Graph structure:

  • Open circles for latent variables x_i (d_i in our problem)
  • Filled circles for observations y_i (pixels in our problem)
  • Edges between interacting variables
  • In general, graph cliques for 3+ variable interactions

Example adapted from Freeman, Torralba, Isola

SLIDE 19

Probabilistic graphical models

Why formulate it this way?

  • Exploit sparse graph structure for fast inference, usually using dynamic programming
  • Can use probabilistic inference methods
  • Provides a framework for learning parameters

SLIDE 20

Probabilistic graphical models

  • Directed graphical model: also known as a Bayesian network (not covered in this course)
  • Undirected graphical model: also known as a Markov Random Field (MRF)

SLIDE 21

Marginalization

What’s the marginal distribution for x_1? I.e., what’s the probability of x_1 being in a particular state?

p(x_1 | y) = Σ_{x_2, …, x_N} p(x_1, …, x_N | y)

  • But this is expensive: O(|L|^N) for N variables with |L| states each
  • Exploit graph structure!
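A brute-force illustration of that cost on a tiny toy joint (a dense probability table with made-up sizes): the inner loop visits all |L|^(N-1) settings of the other variables.

```python
import itertools
import numpy as np

# Tiny example: N = 3 variables, each with |L| = 4 states.
rng = np.random.default_rng(0)
L, N = 4, 3

# Toy stand-in for p(x1, x2, x3): a dense normalized table.
joint = rng.random((L, L, L))
joint /= joint.sum()

# Marginal for x1: sum the joint over every setting of x2, x3.
marg = np.zeros(L)
for x1 in range(L):
    for rest in itertools.product(range(L), repeat=N - 1):
        marg[x1] += joint[(x1,) + rest]
```

With |L| = 256 disparity levels and N = one variable per pixel, this enumeration is hopeless, which is why the graph structure matters.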

SLIDE 22

Marginalization

SLIDE 23

Message passing

  • m_32(x_2): the message that node x_3 sends to node x_2
  • m_21(x_1): the message that x_2 sends to x_1
  • Can think of the “local evidence” terms as messages, too

SLIDE 24

Message passing

  • Message m_ij is the sum over all states of all nodes in the subtree leaving node i at node j
  • It summarizes what this node “believes”. E.g., given label x_2, what’s the probability of my subgraph?
  • Shared computation! E.g., we could reuse m_32 to help estimate p(x_2 | y).

SLIDE 25

Belief propagation [Pearl 1982]

  • Estimate all marginals p(x_i | y) at once!
  • Given a tree-structured graph, send messages in topological order
  • Sending a message from j to i:
    1. Multiply all incoming messages (except for the one from i)
    2. Multiply the pairwise compatibility
    3. Marginalize over x_j
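The three steps above, sketched for a 3-node chain x_1 - x_2 - x_3 with random toy potentials (all names are mine, not the lecture's), and checked against brute-force marginalization:

```python
import numpy as np

rng = np.random.default_rng(1)
L = 3                        # number of labels per node
phi = rng.random((3, L))     # local evidence phi_i(x_i), nodes 1..3
psi = rng.random((L, L))     # pairwise compatibility psi(x_i, x_j)

# Message from x3 to x2: multiply local evidence and the pairwise
# compatibility, then marginalize (sum) over x3.
m32 = psi @ phi[2]
# Message from x2 to x1: also multiply the incoming message m32.
m21 = psi @ (phi[1] * m32)

# Belief = local evidence times incoming messages, normalized.
b1 = phi[0] * m21
b1 /= b1.sum()

# Sanity check: brute-force marginal from the full joint.
joint = np.einsum('i,j,k,ij,jk->ijk', phi[0], phi[1], phi[2], psi, psi)
p1 = joint.sum(axis=(1, 2))
p1 = p1 / p1.sum()
assert np.allclose(b1, p1)
```

Note how m32 could be reused to compute p(x_2 | y) as well, the shared computation from the previous slide.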


SLIDE 26

General graphs

  • Vision problems are often on grid graphs
  • Pretend the graph is tree-structured and do belief propagation iteratively!
  • Can also have compatibilities among N > 2 variables, but complexity is exponential in N!

Loopy belief propagation:

  1. Initialize all messages to 1
  2. Walk through the edges in an arbitrary order (e.g. random)
  3. Apply the message updates
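A sketch of loopy BP on the smallest loopy graph, a 3-cycle, with random toy potentials (all names are mine; a real stereo MRF would be a pixel grid):

```python
import numpy as np

rng = np.random.default_rng(2)
L = 3
nodes = [0, 1, 2]
edges = [(0, 1), (1, 2), (2, 0)]                 # a cycle: not a tree!
phi = rng.random((3, L))                         # local evidence
psi = rng.random((L, L))
psi = psi + psi.T                                # symmetric compatibility

# Step 1: initialize all (directed) messages to 1.
m = {(i, j): np.ones(L) for i, j in edges + [(j, i) for i, j in edges]}

# Steps 2-3: repeatedly apply the tree message update on every edge.
for _ in range(50):
    new = {}
    for (j, i) in m:
        inc = np.ones(L)                         # incoming messages to j,
        for (k, jj) in m:                        # except the one from i
            if jj == j and k != i:
                inc = inc * m[(k, j)]
        msg = psi @ (phi[j] * inc)
        new[(j, i)] = msg / msg.sum()            # normalize for stability
    m = new

# Approximate beliefs: local evidence times incoming messages.
beliefs = []
for i in nodes:
    b = phi[i].copy()
    for (k, jj) in m:
        if jj == i:
            b = b * m[(k, i)]
    beliefs.append(b / b.sum())
```

Because the graph has a loop, these beliefs are only approximations of the true marginals, but in practice they are often good ones.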


SLIDE 27

Finding best labels

Marginal: p(x_1 | y) = Σ_{x_2, x_3} p(x_1, x_2, x_3 | y)

“Max marginal” instead: b(x_1 | y) = max_{x_2, x_3} p(x_1, x_2, x_3 | y)

Often want to find the labels that jointly maximize probability:

x* = argmax_{x_1, x_2, x_3} p(x_1, x_2, x_3 | y)

In message passing, replace the marginalization (sum over x_j) with a max over x_j.

This is called maximum a posteriori estimation (MAP estimation).
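Max-product on the same kind of toy chain as before, with the sum replaced by a max (a sketch with made-up potentials, checked against exhaustive MAP):

```python
import numpy as np

rng = np.random.default_rng(1)
L = 3
phi = rng.random((3, L))     # local evidence for chain x1 - x2 - x3
psi = rng.random((L, L))     # pairwise compatibility

# Same recursion as sum-product, but marginalization (sum over x_j)
# becomes a max over x_j.
m32 = (psi * phi[2]).max(axis=1)            # max_x3 psi(x2, x3) phi3(x3)
m21 = (psi * (phi[1] * m32)).max(axis=1)    # max over x2
x1_star = int(np.argmax(phi[0] * m21))      # argmax of the max-marginal

# Check against exhaustive search over all label triples.
joint = np.einsum('i,j,k,ij,jk->ijk', phi[0], phi[1], phi[2], psi, psi)
assert x1_star == np.unravel_index(joint.argmax(), joint.shape)[0]
```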

SLIDE 28

Application to stereo

[Felzenszwalb & Huttenlocher, “Efficient Belief Propagation for Early Vision”, 2006]

SLIDE 29

Deep learning + MRF refinement

[Zbontar & LeCun, 2015]

[Figure: a query patch compared against positive and negative patches from the left/right pair; CNN-based matching + MRF refinement.]

SLIDE 30

Learning to estimate depth without ground truth

[Mildenhall*, Srinivasan*, Tancik*, et al., Neural radiance fields, 2020]

[Figure: a 3D scene observed from multiple viewpoints; learn a “volume”: color + occupancy.]

SLIDE 31

Learning to estimate depth without ground truth

A good volume should reconstruct the input views

[Mildenhall*, Srinivasan*, Tancik*, et al. 2020]

SLIDE 32

Learning to estimate depth without ground truth

[Mildenhall*, Srinivasan*, Tancik*, et al. 2020]

SLIDE 33

Learning to estimate depth without ground truth

[Mildenhall*, Srinivasan*, Tancik*, et al. 2020]

[Figure: applications: view synthesis; inserting virtual objects.]

SLIDE 34

Next class: motion