slide-1
SLIDE 1

CS4501: Introduction to Computer Vision

Video Recognition / Optical Flow

Various slides from previous courses by: D.A. Forsyth (Berkeley / UIUC), I. Kokkinos (Ecole Centrale / UCL), S. Lazebnik (UNC / UIUC), S. Seitz (MSR / Facebook), J. Hays (Brown / Georgia Tech), A. Berg (Stony Brook / UNC), D. Samaras (Stony Brook), J. M. Frahm (UNC), V. Ordonez (UVA), Steve Seitz (UW).

slide-2
SLIDE 2
Today’s Class

  • Optical Flow / Video Recognition

slide-3
SLIDE 3

Optical Flow

Most slides by Juan Carlos Niebles and Ranjay Krishna, Stanford’s Vision Class

slide-5
SLIDE 5

From images to videos

  • A video is a sequence of frames captured over time
  • Now our image data is a function of space (x, y) and time (t)
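
As a minimal sketch of this indexing, assume a grayscale video stored as nested Python lists `video[t][y][x]` (in practice a NumPy array of shape `(T, H, W)` would be the usual choice). The helper names below are hypothetical; they build a tiny synthetic clip in which one bright pixel moves one column to the right per frame.

```python
def make_moving_dot_video(num_frames=4, height=3, width=6, row=1):
    """Return video[t][y][x]: a single bright pixel moving right over time."""
    video = []
    for t in range(num_frames):
        frame = [[0.0] * width for _ in range(height)]
        frame[row][t % width] = 1.0  # the dot sits at column x = t in frame t
        video.append(frame)
    return video


def intensity(video, x, y, t):
    """Sample I(x, y, t); note that the storage order is [t][y][x]."""
    return video[t][y][x]


video = make_moving_dot_video()
# Following the dot through space and time: (x=0, t=0), (x=1, t=1), ...
trajectory = [intensity(video, x=t, y=1, t=t) for t in range(4)]
```

Following the moving point gives constant brightness along `trajectory`, while a fixed pixel location sees the brightness change from frame to frame.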
slide-6
SLIDE 6

Why is motion useful?

slide-8
SLIDE 8

Optical flow

  • Definition: optical flow is the apparent motion of brightness patterns in the image
  • Note: apparent motion can be caused by lighting changes without any actual motion
    – Think of a uniform rotating sphere under fixed lighting
    – vs. a stationary sphere under moving illumination

GOAL: Recover image motion at each pixel from optical flow

Source: Silvio Savarese

slide-9
SLIDE 9

Optical flow

Vector field function of the spatio-temporal image brightness variations

Picture courtesy of Selim Temizer - Learning and Intelligent Systems (LIS) Group, MIT

slide-10
SLIDE 10

Estimating optical flow

  • Given two subsequent frames, estimate the apparent motion field u(x,y), v(x,y) between them
  • Key assumptions
    – Brightness constancy: projection of the same point looks the same in every frame
    – Small motion: points do not move very far
    – Spatial coherence: points move like their neighbors

[Figure: two consecutive frames, I(x,y,t–1) and I(x,y,t)]

Source: Silvio Savarese

slide-11
SLIDE 11

Key Assumptions: small motions

slide-12
SLIDE 12

Key Assumptions: spatial coherence

* Slide from Michael Black, CS143 2003

slide-13
SLIDE 13

Key Assumptions: brightness constancy

* Slide from Michael Black, CS143 2003

slide-14
SLIDE 14

Taylor Series Expansion

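$$f(x) = f(a) + f'(a)\,(x - a) + \frac{f''(a)}{2!}\,(x - a)^2 + \cdots = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!}\,(x - a)^n$$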

slide-15
SLIDE 15

The brightness constancy constraint

  • Brightness Constancy Equation:

$$I(x, y, t-1) = I(x + u(x, y),\; y + v(x, y),\; t)$$

Linearizing the right side using Taylor expansion:

$$I(x + u, y + v, t) \approx I(x, y, t-1) + I_x \cdot u(x, y) + I_y \cdot v(x, y) + I_t$$

$$I(x + u, y + v, t) - I(x, y, t-1) = I_x \cdot u(x, y) + I_y \cdot v(x, y) + I_t$$

Hence,

$$I_x \cdot u + I_y \cdot v + I_t \approx 0 \quad\rightarrow\quad \nabla I \cdot [u \;\; v]^T + I_t = 0$$

where $I_x$ is the image derivative along x (likewise $I_y$ along y and $I_t$ along t).

Source: Silvio Savarese
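
As a quick numerical check of this constraint, here is a minimal sketch on a synthetic pattern. The `scene` function, the flow values `U` and `V`, and the central-difference derivatives are all assumptions made for illustration (they are not the filters from the slides). With a small motion, the residual of $I_x u + I_y v + I_t$ at the true flow should be close to zero.

```python
import math

U, V = 0.2, -0.1  # assumed ground-truth flow, in pixels per frame


def scene(x, y):
    """A smooth synthetic brightness pattern (an assumption for this demo)."""
    return math.sin(0.3 * x) + math.cos(0.2 * y)


def frame_prev(x, y):
    """I(x, y, t-1)."""
    return scene(x, y)


def frame_curr(x, y):
    """I(x, y, t): the same pattern shifted by (U, V)."""
    return scene(x - U, y - V)


def constraint_residual(x, y):
    """Evaluate Ix*u + Iy*v + It at pixel (x, y) for the true flow."""
    ix = (frame_curr(x + 1, y) - frame_curr(x - 1, y)) / 2.0  # central difference in x
    iy = (frame_curr(x, y + 1) - frame_curr(x, y - 1)) / 2.0  # central difference in y
    it = frame_curr(x, y) - frame_prev(x, y)                  # temporal difference
    return ix * U + iy * V + it


residuals = [constraint_residual(x, y) for x in range(1, 9) for y in range(1, 9)]
max_residual = max(abs(r) for r in residuals)
```

The residual is not exactly zero because the linearization drops higher-order terms and the derivatives are only approximated; it grows as the motion gets larger.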

slide-16
SLIDE 16

Filters used to find the derivatives

[Figure: filters for $I_x$, $I_y$, and $I_t$]

slide-17
SLIDE 17
The brightness constancy constraint

Can we use this equation to recover image motion (u,v) at each pixel?

$$\nabla I \cdot [u \;\; v]^T + I_t = 0$$

  • How many equations and unknowns per pixel?
  • One equation (this is a scalar equation!), two unknowns (u,v)

The component of the flow perpendicular to the gradient (i.e., parallel to the edge) cannot be measured.

If (u, v) satisfies the equation, so does (u+u’, v+v’) if

$$\nabla I \cdot [u' \;\; v']^T = 0$$

[Figure: an edge with its gradient, the flow (u,v), a vector (u’,v’) along the edge, and the sum (u+u’,v+v’)]

Source: Silvio Savarese
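
To make the ambiguity concrete, the sketch below computes the one component a single pixel does determine (the flow along the gradient, often called normal flow) and then shows that adding any vector along the edge leaves the constraint satisfied. The derivative values and the helper names are made up for illustration.

```python
def normal_flow(ix, iy, it):
    """Flow component along the gradient: the only part one pixel determines."""
    grad_sq = ix * ix + iy * iy
    if grad_sq == 0.0:
        return None  # no gradient: the constraint says nothing at this pixel
    scale = -it / grad_sq
    return (scale * ix, scale * iy)


def constraint(ix, iy, it, u, v):
    """Left-hand side of grad(I) . [u v]^T + It = 0."""
    return ix * u + iy * v + it


ix, iy, it = 2.0, 1.0, -3.0       # example derivative values (made up)
un, vn = normal_flow(ix, iy, it)  # approximately (1.2, 0.6)
# (-iy, ix) is perpendicular to the gradient, i.e. it points along the edge.
candidates = [(un - s * iy, vn + s * ix) for s in (-2.0, 0.0, 0.5, 3.0)]
residuals = [constraint(ix, iy, it, u, v) for u, v in candidates]
```

All four candidate flows give a residual of (numerically) zero, so one equation alone cannot tell them apart.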

slide-18
SLIDE 18

The aperture problem

Actual motion

Source: Silvio Savarese

slide-19
SLIDE 19

The aperture problem

Perceived motion

Source: Silvio Savarese

slide-20
SLIDE 20

The barber pole illusion

http://en.wikipedia.org/wiki/Barberpole_illusion

Source: Silvio Savarese

slide-22
SLIDE 22
  • Optical flow
  • Lucas-Kanade method

Reading: [Szeliski] Chapters: 8.4, 8.5

[Fleet & Weiss, 2005] http://www.cs.toronto.edu/pub/jepson/teaching/vision/2503/opticalFlow.pdf

  • B. Lucas and T. Kanade. An iterative image registration technique with an application to stereo vision. In Proceedings of the International Joint Conference on Artificial Intelligence, pp. 674–679, 1981.

slide-23
SLIDE 23

Solving the ambiguity…

  • How to get more equations for a pixel?
  • Spatial coherence constraint:
    – Assume the pixel’s neighbors have the same (u,v)
    – If we use a 5x5 window, that gives us 25 equations per pixel

B. Lucas and T. Kanade. An iterative image registration technique with an application to stereo vision. In Proceedings of the International Joint Conference on Artificial Intelligence, pp. 674–679, 1981.

Source: Silvio Savarese

slide-24
SLIDE 24
Lucas-Kanade flow

  • Overconstrained linear system:
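
Written out for a 5x5 window, stacking one brightness constancy constraint per pixel gives the standard form of this system. The labels $p_1, \dots, p_{25}$ for the window pixels and the name $b$ for the right-hand side are introduced here for illustration.

```latex
A\,d = b, \qquad
A = \begin{bmatrix}
I_x(p_1) & I_y(p_1) \\
I_x(p_2) & I_y(p_2) \\
\vdots & \vdots \\
I_x(p_{25}) & I_y(p_{25})
\end{bmatrix}, \quad
d = \begin{bmatrix} u \\ v \end{bmatrix}, \quad
b = -\begin{bmatrix}
I_t(p_1) \\ I_t(p_2) \\ \vdots \\ I_t(p_{25})
\end{bmatrix}
```

Here $A$ is 25 x 2, $d$ is 2 x 1, and $b$ is 25 x 1, so there are more equations than unknowns.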

Source: Silvio Savarese

slide-25
SLIDE 25
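Lucas-Kanade flow

  • Overconstrained linear system: $A\,d = b$, with one row $[I_x \;\; I_y]$ of $A$ and one entry $-I_t$ of $b$ per window pixel, and $d = [u \;\; v]^T$

Least squares solution for d given by

$$(A^T A)\, d = A^T b$$

$$\begin{bmatrix} \sum I_x I_x & \sum I_x I_y \\ \sum I_x I_y & \sum I_y I_y \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = - \begin{bmatrix} \sum I_x I_t \\ \sum I_y I_t \end{bmatrix}$$

The summations are over all pixels in the K x K window

Source: Silvio Savarese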
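
The least squares solve for one window can be sketched as follows. This is a minimal sketch in plain Python (nested lists rather than NumPy arrays, to stay dependency-free). The function name `lucas_kanade_window`, the central-difference derivatives, and the synthetic `scene` are assumptions for illustration, not the implementation behind the slides.

```python
import math


def lucas_kanade_window(prev, curr, cx, cy, half=2):
    """Estimate (u, v) for the (2*half+1)^2 window centred at (cx, cy).

    prev and curr are 2D lists indexed [y][x], holding I(x, y, t-1) and I(x, y, t).
    """
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(cy - half, cy + half + 1):
        for x in range(cx - half, cx + half + 1):
            ix = (curr[y][x + 1] - curr[y][x - 1]) / 2.0  # central differences
            iy = (curr[y + 1][x] - curr[y - 1][x]) / 2.0
            it = curr[y][x] - prev[y][x]
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        return None  # A^T A is (numerically) singular: the flow is ambiguous here
    # Solve [[sxx, sxy], [sxy, syy]] [u, v]^T = -[sxt, syt]^T by Cramer's rule.
    u = (-sxt * syy + sxy * syt) / det
    v = (-sxx * syt + sxy * sxt) / det
    return (u, v)


TRUE_U, TRUE_V = 0.2, -0.1  # assumed ground-truth flow for the synthetic test


def scene(x, y):
    """Smooth synthetic brightness pattern (an assumption for this demo)."""
    return math.sin(0.3 * x) + math.cos(0.2 * y)


SIZE = 12
prev = [[scene(x, y) for x in range(SIZE)] for y in range(SIZE)]
curr = [[scene(x - TRUE_U, y - TRUE_V) for x in range(SIZE)] for y in range(SIZE)]
flow = lucas_kanade_window(prev, curr, cx=5, cy=8)
flat = [[1.0] * SIZE for _ in range(SIZE)]
flat_flow = lucas_kanade_window(flat, flat, cx=5, cy=5)
```

On this smooth pattern `flow` should land near the true `(0.2, -0.1)`, while the textureless `flat` image yields `None` because the 2x2 matrix cannot be inverted.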

slide-26
SLIDE 26

Conditions for solvability

  • Optimal (u, v) satisfies Lucas-Kanade equation

When is This Solvable?

  • $A^T A$ should be invertible
  • $A^T A$ should not be too small due to noise
    – eigenvalues $\lambda_1$ and $\lambda_2$ of $A^T A$ should not be too small
  • $A^T A$ should be well-conditioned
    – $\lambda_1 / \lambda_2$ should not be too large ($\lambda_1$ = larger eigenvalue)

Does this remind you of anything?

$M = A^T A$ is the second moment matrix! (Harris corner detector…)

Source: Silvio Savarese
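
The eigenvalue conditions can be checked directly for a 2x2 second moment matrix. The sketch below is illustrative only: the function names and the thresholds `min_eig` and `max_ratio` are assumptions, not values from the slides.

```python
import math


def eigenvalues_2x2_symmetric(a, b, c):
    """Eigenvalues (largest first) of the symmetric matrix [[a, b], [b, c]]."""
    mean = (a + c) / 2.0
    radius = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    return (mean + radius, mean - radius)


def classify_window(gradients, min_eig=1e-2, max_ratio=10.0):
    """Label a window 'flat', 'edge', or 'corner' from its (Ix, Iy) samples.

    The thresholds are illustrative assumptions.
    """
    sxx = sum(ix * ix for ix, _ in gradients)
    sxy = sum(ix * iy for ix, iy in gradients)
    syy = sum(iy * iy for _, iy in gradients)
    lam1, lam2 = eigenvalues_2x2_symmetric(sxx, sxy, syy)
    if lam1 < min_eig:
        return "flat"    # both eigenvalues tiny: no texture, nothing to track
    if lam2 < min_eig or lam1 / lam2 > max_ratio:
        return "edge"    # one dominant direction: the aperture problem
    return "corner"      # both eigenvalues large and comparable: solvable


flat = [(0.0, 0.0)] * 25
edge = [(1.0, 0.0)] * 25                        # every gradient points along x
corner = [(1.0, 0.0)] * 12 + [(0.0, 1.0)] * 13  # gradients in two directions
labels = [classify_window(g) for g in (flat, edge, corner)]
```

Only the window with gradients in two directions passes both tests, which is the same criterion the Harris corner detector uses to pick good features.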

slide-27
SLIDE 27

Errors in Lucas-Kanade

  • When our assumptions are violated
    – Brightness constancy is not satisfied
    – The motion is not small
    – A point does not move like its neighbors
      • window size is too large
      • what is the ideal window size?

* From Khurram Hassan-Shafique CAP5415 Computer Vision 2003

slide-28
SLIDE 28

Improving accuracy

  • Recall our small motion assumption
  • This is not exact
    – To do better, we need to add higher order terms back in
  • This is a polynomial root finding problem
    – Can solve using Newton’s method (out of scope for this class)
    – Lucas-Kanade method does one iteration of Newton’s method
  • Better results are obtained via more iterations

* From Khurram Hassan-Shafique CAP5415 Computer Vision 2003
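
The effect of extra iterations can be illustrated in one dimension. In this minimal sketch the "frames" are continuous functions, so warping is just evaluation at shifted coordinates (a real implementation would need interpolation). The signal, the 3-pixel shift, and the function names are assumptions chosen so that a single step is visibly inaccurate.

```python
import math

TRUE_SHIFT = 3.0  # assumed displacement, deliberately not "small"


def signal(x):
    """1D brightness profile (an assumption for this demo)."""
    return math.sin(0.3 * x)


def prev(x):
    """I(x, t-1)."""
    return signal(x)


def curr(x):
    """I(x, t): the profile moved right by TRUE_SHIFT."""
    return signal(x - TRUE_SHIFT)


def iterative_lk_1d(xs, num_iters):
    """Refine the shift estimate with repeated Lucas-Kanade steps."""
    d = 0.0
    history = []
    for _ in range(num_iters):
        num = den = 0.0
        for x in xs:
            # Warp the current frame back by the running estimate d.
            ix = (curr(x + d + 1) - curr(x + d - 1)) / 2.0
            it = curr(x + d) - prev(x)
            num += ix * it
            den += ix * ix
        d -= num / den  # one Lucas-Kanade update on the warped frame
        history.append(d)
    return history


history = iterative_lk_1d(xs=range(5, 30), num_iters=5)
errors = [abs(d - TRUE_SHIFT) for d in history]
```

The first entry of `errors` is what a single Lucas-Kanade step achieves; later entries shrink quickly as each step re-linearizes around the improved estimate.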

slide-29
SLIDE 29

When do the optical flow assumptions fail?

In other words, in what situations does the displacement of pixel patches not represent physical movement of points in space?

  • 1. Well, TV is based on illusory motion
    – the set is stationary, yet things seem to move
  • 2. A uniform rotating sphere
    – nothing seems to move, yet it is rotating
  • 3. Changing directions or intensities of lighting can make things seem to move
    – for example, if the specular highlight on a rotating sphere moves
  • 4. Muscle movement can make some spots on a cheetah move opposite to the direction of motion

And there are many more ways in which optical flow breaks down.

slide-30
SLIDE 30

Action Classification from Video

Recommended Paper to Read:

slide-31
SLIDE 31

Action Classification from Video

CNN + LSTM over sequence of frames

Figure from Carreira & Zisserman, 2018

slide-32
SLIDE 32

Recurrent Neural Network Cell

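[Diagram: an RNN cell takes the input $x_1$ and the previous hidden state $h_0$ and produces the new hidden state $h_1$]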

slide-33
SLIDE 33

Recurrent Neural Network Cell

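[Diagram: an RNN cell takes the input $x_1$ and the previous hidden state $h_0$ and produces the new hidden state $h_1$]

$$h_1 = \tanh(W_{hh} h_0 + W_{hx} x_1)$$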

slide-34
SLIDE 34

Recurrent Neural Network Cell

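[Diagram: an RNN cell takes the input $x_1$ and the previous hidden state $h_0$ and produces the new hidden state $h_1$ and the output $y_1$]

$$h_1 = \tanh(W_{hh} h_0 + W_{hx} x_1)$$

$$y_1 = \mathrm{softmax}(W_{hy} h_1)$$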
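
The two update equations of the cell can be sketched in a few lines of plain Python. The weight values below are made up and untrained, and the helper names are hypothetical; the point is only the order of operations in one step.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rnn_cell(x_t, h_prev, w_hh, w_hx, w_hy):
    """One step: h_t = tanh(W_hh h_prev + W_hx x_t), y_t = softmax(W_hy h_t)."""
    pre = [a + b for a, b in zip(matvec(w_hh, h_prev), matvec(w_hx, x_t))]
    h_t = [math.tanh(p) for p in pre]
    y_t = softmax(matvec(w_hy, h_t))
    return h_t, y_t


# Toy sizes: 3-symbol vocabulary (a, b, c), 2 hidden units. Weights are made up.
w_hh = [[0.5, 0.0], [0.0, 0.5]]
w_hx = [[1.0, 0.0, -1.0], [0.0, 1.0, 0.0]]
w_hy = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
x_1 = [0.0, 0.0, 1.0]  # one-hot encoding of "c"
h_0 = [0.0, 0.0]
h_1, y_1 = rnn_cell(x_1, h_0, w_hh, w_hx, w_hy)
```

`y_1` is a probability distribution over the vocabulary, and `h_1` is what gets passed to the next time step.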

slide-35
SLIDE 35

Recurrent Neural Network Cell

[Diagram: RNN cell with input $x_1$, previous hidden state $h_0$, new hidden state $h_1$, and output $y_1$]

slide-36
SLIDE 36

Recurrent Neural Network Cell

[Diagram: RNN cell applied to one character from the vocabulary a b c d e]

$x_1$ = [0 0 1 0 0] (one-hot input: c)

$h_0$ = [0 0 0 0 0 0 0]

$h_1$ = [0.1 0.2 0 −0.3 −0.1]

$y_1$ = [0.1, 0.05, 0.05, 0.1, 0.7] (predicted output: e, with probability 0.7)

slide-37
SLIDE 37

Recurrent Neural Network Cell

[Diagram: RNN cell with input $x_1$, previous hidden state $h_0$, new hidden state $h_1$, and output $y_1$]

slide-38
SLIDE 38

Recurrent Neural Network Cell

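[Diagram: an RNN cell takes the input $x_1$ and the previous hidden state $h_0$ and produces the new hidden state $h_1$]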

slide-39
SLIDE 39

(Unrolled) Recurrent Neural Network

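[Diagram: the same RNN cell unrolled over three time steps, with hidden states $h_0 \to h_1 \to h_2 \to h_3$]

Inputs $x_1, x_2, x_3$: c, a, t

Outputs $y_1, y_2, y_3$: a, t, <<space>>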

slide-40
SLIDE 40

(Unrolled) Recurrent Neural Network

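[Diagram: the same RNN cell unrolled over three time steps, with hidden states $h_0 \to h_1 \to h_2 \to h_3$]

Inputs $x_1, x_2, x_3$: the, cat, likes

Outputs $y_1, y_2, y_3$: cat, likes, eating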

slide-41
SLIDE 41

(Unrolled) Recurrent Neural Network

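[Diagram: the same RNN cell unrolled over three time steps, with hidden states $h_0 \to h_1 \to h_2 \to h_3$; only the final hidden state $h_3$ feeds the output]

Inputs $x_1, x_2, x_3$: the, cat, likes

Output $y$: positive / negative sentiment rating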
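
This many-to-one pattern, which is also how a sequence of video frames can be mapped to a single label, can be sketched as a loop that reuses one cell and scores only the last hidden state. The weights are made up and untrained, the sigmoid output layer is an assumption, and the function names are hypothetical.

```python
import math


def rnn_step(x_t, h_prev, w_hh, w_hx):
    """h_t = tanh(W_hh h_prev + W_hx x_t) for list-based vectors and matrices."""
    return [
        math.tanh(sum(a * h for a, h in zip(row_h, h_prev))
                  + sum(b * x for b, x in zip(row_x, x_t)))
        for row_h, row_x in zip(w_hh, w_hx)
    ]


def classify_sequence(inputs, w_hh, w_hx, w_out):
    """Run the same cell over every input and score only the final hidden state."""
    h = [0.0] * len(w_hh)  # h_0
    for x_t in inputs:
        h = rnn_step(x_t, h, w_hh, w_hx)
    logit = sum(w * v for w, v in zip(w_out, h))
    return 1.0 / (1.0 + math.exp(-logit))  # probability of "positive"


# One-hot encodings for a 3-word vocabulary; all weights are made up.
the, cat, likes = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
w_hh = [[0.5, -0.3], [0.2, 0.4]]
w_hx = [[1.0, -1.0, 0.5], [0.0, 0.7, -0.6]]
w_out = [1.5, -2.0]
p_forward = classify_sequence([the, cat, likes], w_hh, w_hx, w_out)
p_reversed = classify_sequence([likes, cat, the], w_hh, w_hx, w_out)
```

The two probabilities differ because the hidden state depends on the order of the inputs, not just on which inputs were seen.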

slide-42
SLIDE 42

Action Classification from Video

3D CNN of consecutive frames across time

Figure from Carreira & Zisserman, 2018
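
The core operation of a 3D CNN is a filter that slides over time as well as space. The sketch below is a naive, dependency-free illustration of one such filter on `video[t][y][x]`; like most deep learning libraries it computes a cross-correlation. The function name and the temporal-difference kernel are assumptions for illustration, not a layer from the cited paper.

```python
def conv3d_valid(video, kernel):
    """Naive 'valid' 3D cross-correlation of video[t][y][x] with kernel[dt][dy][dx]."""
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out_t = len(video) - kt + 1
    out_h = len(video[0]) - kh + 1
    out_w = len(video[0][0]) - kw + 1
    out = [[[0.0] * out_w for _ in range(out_h)] for _ in range(out_t)]
    for t in range(out_t):
        for y in range(out_h):
            for x in range(out_w):
                acc = 0.0
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += kernel[dt][dy][dx] * video[t + dt][y + dy][x + dx]
                out[t][y][x] = acc
    return out


# Three 2x2 frames with uniform brightness 0, 1, and 3.
video = [[[float(v)] * 2 for _ in range(2)] for v in (0, 1, 3)]
# A 2x1x1 kernel that subtracts the earlier frame from the later one.
kernel = [[[-1.0]], [[1.0]]]
response = conv3d_valid(video, kernel)
```

Because the kernel spans two frames, its response measures change over time, which a purely 2D filter applied frame by frame cannot do.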

slide-43
SLIDE 43

Action Classification from Video

Two Stream CNN: Images + Flow Map

Figure from Carreira & Zisserman, 2018

slide-44
SLIDE 44

Action Classification from Video

Two Stream 3D CNN: Images + Flow Map

Figure from Carreira & Zisserman, 2018

slide-45
SLIDE 45

Action Classification from Video

Results on UCF101 actions

Figure from Carreira & Zisserman, 2018

slide-46
SLIDE 46

Questions?
