SLIDE 1

Mean-Shift Tracker

16-385 Computer Vision (Kris Kitani)

Carnegie Mellon University

SLIDE 2

SLIDE 3

Mean Shift Algorithm

Fukunaga & Hostetler (1975)

A ‘mode seeking’ algorithm

SLIDE 4

Mean Shift Algorithm

Fukunaga & Hostetler (1975)

A ‘mode seeking’ algorithm

Find the region of highest density

SLIDES 5–22 (animation)

Mean Shift Algorithm

Fukunaga & Hostetler (1975)

A ‘mode seeking’ algorithm

Pick a point. Draw a window. Compute the mean. Shift the window. Then repeat “compute the mean, shift the window” until the window converges on a mode.
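Written out, the loop animated on these slides is only a few lines of NumPy. This is a minimal sketch, not the course code; the Gaussian kernel, the bandwidth, and the stopping threshold ε are illustrative assumptions.

```python
import numpy as np

def mean_shift(x, samples, bandwidth=1.0, eps=1e-3, max_iter=100):
    """Shift a window starting at x until it converges on a density mode."""
    for _ in range(max_iter):
        # Kernel weight of every sample relative to the window center
        w = np.exp(-0.5 * np.sum((samples - x) ** 2, axis=1) / bandwidth**2)
        m = (w[:, None] * samples).sum(axis=0) / w.sum()  # compute the mean
        v = m - x                                         # mean-shift vector
        x = m                                             # shift the window
        if np.linalg.norm(v) < eps:                       # window stopped moving
            break
    return x

# Toy example: two clusters; starting near one, the window climbs to its mode.
rng = np.random.default_rng(0)
pts = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
print(mean_shift(np.array([1.0, 1.0]), pts))  # converges near (0, 0)
```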

SLIDE 23

Kernel Density Estimation

16-385 Computer Vision (Kris Kitani)

Carnegie Mellon University

SLIDE 24

Kernel Density Estimation

To understand the mean shift algorithm, we first need kernel density estimation: approximate the underlying PDF from samples by putting a ‘bump’ on every sample.

SLIDES 25–33 (figures)

[Figures: a probability density function and its cumulative density function p(x); samples drawn at random from it; Gaussian bumps placed on the samples and summed.]

The kernel density estimate approximates the original PDF.
SLIDE 34

Kernel Density Estimation

Approximate the underlying PDF from samples drawn from it: put a ‘bump’ on every sample.

$$p(x) = \sum_i c_i \, e^{-\frac{(x - x_i)^2}{2\sigma^2}}$$

Each term is a Gaussian ‘bump’, aka ‘kernel’.
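As a concrete illustration, here is a small NumPy sketch of this estimate: a Gaussian bump on every sample, summed. The normalizing constant and σ are assumed conventions, not values from the slides.

```python
import numpy as np

def kde(x, samples, sigma=0.5):
    """Evaluate the Gaussian-bump KDE p(x) at the query points x."""
    # Normalizer so each bump integrates to 1/N (an assumed convention)
    c = 1.0 / (len(samples) * sigma * np.sqrt(2.0 * np.pi))
    # One Gaussian bump per sample, summed over samples
    bumps = np.exp(-(x[:, None] - samples[None, :]) ** 2 / (2.0 * sigma**2))
    return c * bumps.sum(axis=1)

samples = np.array([1.0, 2.0, 2.5, 6.0, 7.0])
xs = np.linspace(0.0, 10.0, 101)
density = kde(xs, samples)       # approximates the original PDF
print(xs[np.argmax(density)])    # mode of the estimate, near the 1-2.5 cluster
```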

SLIDE 35

Kernel Function

$K(x, x')$ — a ‘distance’ between two points.

SLIDE 36

Radially symmetric kernels

Normal kernel:
$$K(x, x') = c \exp\left(-\tfrac{1}{2}\|x - x'\|^2\right)$$

Uniform kernel:
$$K(x, x') = \begin{cases} c & \|x - x'\|^2 \le 1 \\ 0 & \text{otherwise} \end{cases}$$

Epanechnikov kernel:
$$K(x, x') = \begin{cases} c\,(1 - \|x - x'\|^2) & \|x - x'\|^2 \le 1 \\ 0 & \text{otherwise} \end{cases}$$
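In code, writing each kernel in terms of the squared distance r = ‖x − x′‖² (the profile argument introduced on the next slide) keeps them to one line each. A sketch; c = 1 is an arbitrary choice.

```python
import numpy as np

def normal_kernel(r, c=1.0):
    """Gaussian kernel on the squared distance r."""
    return c * np.exp(-0.5 * r)

def uniform_kernel(r, c=1.0):
    """Flat within the unit ball, zero outside."""
    return np.where(r <= 1.0, c, 0.0)

def epanechnikov_kernel(r, c=1.0):
    """Linearly decaying within the unit ball, zero outside."""
    return np.where(r <= 1.0, c * (1.0 - r), 0.0)

r = np.linspace(0.0, 2.0, 5) ** 2
print(epanechnikov_kernel(r))  # [1.0, 0.75, 0.0, 0.0, 0.0]
```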

SLIDE 37

Radially symmetric kernels

…can be written in terms of a profile $k$:

$$K(x, x') = c \cdot k(\|x - x'\|^2)$$

SLIDE 38

Connecting KDE and the Mean Shift Algorithm

SLIDE 39

Mean-Shift Tracker

16-385 Computer Vision (Kris Kitani)

Carnegie Mellon University

SLIDE 40

Mean-Shift Tracking

Given a set of points $\{x_s\}_{s=1}^S$, $x_s \in \mathbb{R}^d$, and a kernel $K(x, x')$, find the mean sample point $x$.

SLIDE 41

Mean-Shift Algorithm

Initialize $x$. While $\|v(x)\| > \epsilon$:

  • 1. Compute the mean shift:
$$m(x) = \frac{\sum_s K(x, x_s)\, x_s}{\sum_s K(x, x_s)}, \qquad v(x) = m(x) - x$$

  • 2. Update: $x \leftarrow x + v(x)$

Where does this algorithm come from?

SLIDE 42

Where does this come from? (The same algorithm, with the question pointed at the expression for $m(x)$.)

SLIDE 43

Kernel density estimate (radially symmetric kernels)

Recall:
$$P(x) = \frac{1}{N}\, c \sum_n k(\|x - x_n\|^2)$$

How is the KDE related to the mean shift algorithm? We can show that the gradient of the PDF is related to the mean shift vector:
$$\nabla P(x) \propto m(x)$$

The mean shift is a ‘step’ in the direction of the gradient of the KDE.

SLIDE 44

In mean-shift tracking, we are trying to find this, which means we are trying to…

SLIDE 45

We are trying to optimize this:
$$x = \arg\max_x P(x) = \arg\max_x \frac{1}{N}\, c \sum_n k(\|x - x_n\|^2)$$

The objective is usually non-linear and non-parametric. How do we optimize this non-linear function?

SLIDE 46

Compute partial derivatives and follow the gradient (gradient ascent, since we are maximizing).

SLIDE 47

Compute the gradient of
$$P(x) = \frac{1}{N}\, c \sum_n k(\|x - x_n\|^2)$$

SLIDE 48

Gradient:
$$\nabla P(x) = \frac{1}{N}\, c \sum_n \nabla k(\|x - x_n\|^2)$$

SLIDE 49

Expand the gradient (algebra, chain rule):
$$\nabla P(x) = \frac{1}{N}\, 2c \sum_n (x - x_n)\, k'(\|x - x_n\|^2)$$

SLIDE 50

Call the gradient of the kernel function $g$: $k'(\cdot) = -g(\cdot)$ (kernel–shadow pairs).

SLIDE 51

Change of notation; keep this in memory:
$$\nabla P(x) = \frac{1}{N}\, 2c \sum_n (x_n - x)\, g(\|x - x_n\|^2)$$

SLIDE 52

Multiply it out. The full expression is too long, so use the shorthand $g_n \equiv g(\|x - x_n\|^2)$:
$$\nabla P(x) = \frac{1}{N}\, 2c \sum_n x_n g_n - \frac{1}{N}\, 2c \sum_n x\, g_n$$

SLIDE 53

Multiply by one ($\frac{\sum_n g_n}{\sum_n g_n}$) and collect like terms:
$$\nabla P(x) = \frac{1}{N}\, 2c \left[ \sum_n g_n \right] \left( \frac{\sum_n x_n g_n}{\sum_n g_n} - x \right)$$

Does this look familiar?

SLIDE 54

The term in parentheses is the mean shift:
$$v(x) = \frac{\sum_n x_n g_n}{\sum_n g_n} - x = \frac{\nabla P(x)}{\frac{1}{N}\, 2c \sum_n g_n}$$

The mean shift is a ‘step’ in the direction of the gradient of the KDE: gradient ascent with an adaptive step size.

SLIDE 55

Mean-Shift Algorithm

Initialize $x$. While $\|v(x)\| > \epsilon$:

  • 1. Compute the mean shift:
$$m(x) = \frac{\sum_s K(x, x_s)\, x_s}{\sum_s K(x, x_s)}, \qquad v(x) = m(x) - x = \frac{\nabla P(x)}{\frac{1}{N}\, 2c \sum_n g_n}$$

  • 2. Update: $x \leftarrow x + v(x)$

This is gradient ascent with an adaptive step size.

SLIDE 56

Everything up to now has been about distributions over samples…

SLIDE 57

Dealing with images

Pixels form a lattice, so the spatial density is the same everywhere! What can we do?

SLIDE 58

Consider a set of points $\{x_s\}_{s=1}^S$, $x_s \in \mathbb{R}^d$, with associated weights $w(x_s)$.

Sample mean:
$$m(x) = \frac{\sum_s K(x, x_s)\, w(x_s)\, x_s}{\sum_s K(x, x_s)\, w(x_s)}$$

Mean shift: $m(x) - x$

SLIDE 59

Mean-Shift Algorithm

Initialize $x$. While $\|v(x)\| > \epsilon$:

  • 1. Compute the mean shift:
$$m(x) = \frac{\sum_s K(x, x_s)\, w(x_s)\, x_s}{\sum_s K(x, x_s)\, w(x_s)}, \qquad v(x) = m(x) - x$$

  • 2. Update: $x \leftarrow x + v(x)$
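A sketch of the weighted step, again assuming a Gaussian kernel (any radially symmetric kernel would do):

```python
import numpy as np

def weighted_mean_shift_step(x, samples, weights, bandwidth=1.0):
    """One weighted mean-shift step: returns the vector v(x) = m(x) - x."""
    k = np.exp(-0.5 * np.sum((samples - x) ** 2, axis=1) / bandwidth**2)
    kw = k * weights                                    # K(x, x_s) * w(x_s)
    m = (kw[:, None] * samples).sum(axis=0) / kw.sum()  # weighted mean m(x)
    return m - x                                        # mean shift v(x)
```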

SLIDES 60–73 (animation)

For images, each pixel is a point with a weight.

SLIDE 74

Finally… mean shift tracking in video

SLIDE 75

Frame 1 contains the ‘target’, with the center coordinate $(x, y)$ of the target region; frame 2 contains a ‘candidate’, with the center coordinate of the candidate region.

Goal: find the best candidate location in frame 2. There are many ‘candidates’, but only one ‘target’. Use the mean shift algorithm to find the best candidate location.

SLIDE 76

Non-rigid object tracking

SLIDE 77

Target: compute a descriptor for the target.

SLIDE 78

Target → Candidate: search for a similar descriptor in a neighborhood in the next frame.

SLIDE 79

Target: compute a descriptor for the new target.

SLIDE 80

Target → Candidate: search for a similar descriptor in a neighborhood in the next frame.

SLIDE 81

How do we model the target and candidate regions?

SLIDE 82

Modeling the target

The M-dimensional target descriptor $q = \{q_1, \ldots, q_M\}$ (centered at the target center) is a normalized color histogram, weighted by distance from the center:

$$q_m = C \sum_n k(\|x_n\|^2)\, \delta[b(x_n) - m]$$

A ‘fancy’ (confusing) way to write a weighted histogram: the sum runs over all pixels, $k(\|x_n\|^2)$ is a function of inverse distance (the weight), $b(x_n)$ is the quantization function returning a pixel’s bin ID, $\delta[\cdot]$ is the Kronecker delta selecting bin $m$, and $C$ is a normalization factor.

SLIDE 83

Modeling the candidate

The M-dimensional candidate descriptor $p(y) = \{p_1(y), \ldots, p_M(y)\}$ (centered at location $y$) is a weighted histogram at $y$, with bandwidth $h$:

$$p_m(y) = C_h \sum_n k\!\left( \left\| \frac{y - x_n}{h} \right\|^2 \right) \delta[b(x_n) - m]$$
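Both descriptors are the same computation (for the target, the kernel is centered on the target itself). Here is a sketch assuming pixels already quantized to bin IDs and an Epanechnikov profile; the helper name and bin count are illustrative assumptions.

```python
import numpy as np

def weighted_histogram(bin_ids, coords, center, h, n_bins=16):
    """Kernel-weighted, normalized color histogram p(y) or q.

    bin_ids: quantized bin ID b(x_n) per pixel; coords: pixel positions x_n.
    """
    r = np.sum(((coords - center) / h) ** 2, axis=1)  # ||(y - x_n)/h||^2
    k = np.maximum(1.0 - r, 0.0)                      # Epanechnikov profile
    hist = np.bincount(bin_ids, weights=k, minlength=n_bins)
    return hist / hist.sum()                          # normalization (C_h)
```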

SLIDE 84

Similarity between the target and candidate

Bhattacharyya coefficient:
$$\rho(y) \equiv \rho[p(y), q] = \sum_m \sqrt{p_m(y)\, q_m}$$

This is just the cosine distance between the two unit vectors $\sqrt{p(y)}$ and $\sqrt{q}$:
$$\rho(y) = \cos\theta_y = \frac{\sqrt{p(y)}^\top \sqrt{q}}{\|\sqrt{p}\|\,\|\sqrt{q}\|} = \sum_m \sqrt{p_m(y)\, q_m}$$

Distance function:
$$d(y) = \sqrt{1 - \rho[p(y), q]}$$
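Both quantities are one-liners; a minimal sketch:

```python
import numpy as np

def bhattacharyya(p, q):
    """rho = sum_m sqrt(p_m q_m): cosine between unit vectors sqrt(p), sqrt(q)."""
    return np.sum(np.sqrt(p * q))

def bhattacharyya_distance(p, q):
    """d = sqrt(1 - rho); zero when the histograms match exactly."""
    # max() guards against tiny negative values from floating-point error
    return np.sqrt(max(0.0, 1.0 - bhattacharyya(p, q)))
```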

SLIDE 85

Now we can compute the similarity between a target and multiple candidate regions

SLIDES 86–87 (figures)

[Figures: the target descriptor $q$, candidate descriptors $p(y)$, and the similarity $\rho[p(y), q]$ evaluated over the whole image; we want to find the peak of this similarity surface.]

SLIDE 88

Objective function

Maximizing $\rho[p(y), q]$ over $y$ is the same as minimizing $d(y)$.

Assuming a good initial guess $y_0$, consider $\rho[p(y_0 + \Delta y), q]$ and linearize around the initial guess (Taylor series expansion; the derivative is evaluated at the specified value):

$$\rho[p(y), q] \approx \frac{1}{2} \sum_m \sqrt{p_m(y_0)\, q_m} + \frac{1}{2} \sum_m p_m(y) \sqrt{\frac{q_m}{p_m(y_0)}}$$

SLIDE 89

Remember the definition of the candidate histogram?
$$p_m(y) = C_h \sum_n k\!\left( \left\| \frac{y - x_n}{h} \right\|^2 \right) \delta[b(x_n) - m]$$

Substitute it into the linearized objective to get the fully expanded form:
$$\rho[p(y), q] \approx \frac{1}{2} \sum_m \sqrt{p_m(y_0)\, q_m} + \frac{1}{2} \sum_m \left\{ C_h \sum_n k\!\left( \left\| \frac{y - x_n}{h} \right\|^2 \right) \delta[b(x_n) - m] \right\} \sqrt{\frac{q_m}{p_m(y_0)}}$$

SLIDE 90

Moving terms around gives the fully expanded linearized objective:
$$\rho[p(y), q] \approx \frac{1}{2} \sum_m \sqrt{p_m(y_0)\, q_m} + \frac{C_h}{2} \sum_n w_n\, k\!\left( \left\| \frac{y - x_n}{h} \right\|^2 \right)$$

where
$$w_n = \sum_m \sqrt{\frac{q_m}{p_m(y_0)}}\, \delta[b(x_n) - m]$$

The first term does not depend on the unknown $y$; the second is a weighted kernel density estimate. The weight $w_n$ is bigger when $q_m > p_m(y_0)$.

SLIDE 91

OK, why are we doing all this math?

SLIDE 92

We want to maximize this:
$$\max_y \rho[p(y), q]$$

SLIDES 93–96

We want to maximize $\rho[p(y), q]$ via the fully expanded linearized objective:
$$\rho[p(y), q] \approx \frac{1}{2} \sum_m \sqrt{p_m(y_0)\, q_m} + \frac{C_h}{2} \sum_n w_n\, k\!\left( \left\| \frac{y - x_n}{h} \right\|^2 \right), \qquad w_n = \sum_m \sqrt{\frac{q_m}{p_m(y_0)}}\, \delta[b(x_n) - m]$$

The first term doesn’t depend on the unknown $y$, so we only need to maximize the second term, which is a weighted kernel density estimate. What can we use to solve this weighted KDE? The Mean Shift Algorithm!

SLIDE 97

Applying the mean shift (derived earlier) to the weighted KDE
$$\frac{C_h}{2} \sum_n w_n\, k\!\left( \left\| \frac{y - x_n}{h} \right\|^2 \right)$$

gives the new sample mean, i.e. the new candidate location:
$$y_1 = \frac{\sum_n x_n w_n\, g\!\left( \left\| \frac{y_0 - x_n}{h} \right\|^2 \right)}{\sum_n w_n\, g\!\left( \left\| \frac{y_0 - x_n}{h} \right\|^2 \right)}$$
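A sketch of this update, reusing the hypothetical weighted_histogram output from the earlier sketch and assuming the Epanechnikov profile, whose $g = -k'$ is uniform on the unit ball:

```python
import numpy as np

def mean_shift_update(y0, coords, bin_ids, p0, q, h):
    """One tracking step: weights w_n from the histogram ratio, then shift to y1."""
    # w_n = sqrt(q_m / p_m(y0)) for each pixel's bin; epsilon guards empty bins
    w = np.sqrt(q[bin_ids] / np.maximum(p0[bin_ids], 1e-12))
    r = np.sum(((coords - y0) / h) ** 2, axis=1)
    g = (r <= 1.0).astype(float)   # g = -k' is uniform for the Epanechnikov kernel
    wg = w * g
    return (wg[:, None] * coords).sum(axis=0) / wg.sum()  # new location y1
```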

SLIDE 98

Mean-Shift Object Tracking

For each frame:

  • 1. Initialize the location $y_0$; compute $q$ and $p(y_0)$.
  • 2. Derive the weights $w_n$.
  • 3. Shift to the new candidate location (mean shift): $y_1$.
  • 4. Compute $p(y_1)$.
  • 5. If $\|y_0 - y_1\| < \epsilon$, return $y_1$; otherwise set $y_0 \leftarrow y_1$ and go back to 2.
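Putting the five steps together per frame, reusing the hypothetical helpers sketched above (weighted_histogram, mean_shift_update); ε and the iteration cap are illustrative assumptions:

```python
import numpy as np

def track_frame(bin_ids, coords, y0, q, h, eps=0.5, max_iter=20):
    """Run steps 2-5 on one frame; returns the converged candidate location."""
    for _ in range(max_iter):
        p0 = weighted_histogram(bin_ids, coords, y0, h, n_bins=len(q))  # p(y0)
        y1 = mean_shift_update(y0, coords, bin_ids, p0, q, h)           # steps 2-3
        if np.linalg.norm(y1 - y0) < eps:  # step 5: converged
            return y1
        y0 = y1                            # otherwise go back to step 2
    return y0
```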

SLIDE 99

Target: compute a descriptor $q$ for the target.

SLIDE 100

Target → Candidate: search for a similar descriptor in a neighborhood in the next frame, $\max_y \rho[p(y), q]$.

SLIDE 101

Target: compute a descriptor $q$ for the new target.

SLIDE 102

Target → Candidate: search for a similar descriptor in a neighborhood in the next frame, $\max_y \rho[p(y), q]$.

SLIDE 103