Multi-object tracking (MOT): visual and audio-visual


slide-1
SLIDE 1

Multi-object tracking (MOT): visual and audio-visual

Daniel Gatica-Perez (joint work with Kevin Smith, Guillaume Lathoud, Iain McCowan, Jean-Marc Odobez) IDIAP Research Institute Martigny, Switzerland

slide-2
SLIDE 2

Outline

  • MOT using Particle Filters
  • Our work
      • Visual MOT with Distributed Partitioned Sampling [Smith et al, BMVC’04]
      • Audio-Visual MOT [Gatica et al, in preparation]
  • Conclusion

slide-3
SLIDE 3

MOT as Bayesian inference

  • the problem: given image observations $y_{1:t}$ and a state-space MO representation
  • object state: a continuous part $x_t^i$ (real-valued: geometric transformations) and a discrete part $k_t^i$ (integer-valued indices: head pose, speaking status)
  • joint state: $x_t = (x_t^1, \ldots, x_t^M, k_t^1, \ldots, k_t^K) = (x_t^{1:M}, k_t^{1:K})$
  • compute the posterior $p(x_{0:t} \mid y_{1:t})$ or the filtering distribution $p(x_t \mid y_{1:t})$

slide-4
SLIDE 4

Joint state space representation

  • M objects: a joint state $\{x_t^j\}$, $j = 1, \ldots, M$
  • formal MO joint configuration: $X_t = (M, x_t^{1:M}) = (M, x_t^1, x_t^2, \ldots, x_t^M)$
  • object state vector $x^j$ (the figure shows $x^1$, $x^2$, $x^3$): four components, namely a spk/no-spk index, translation $(u^j, v^j)$, and scaling
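
As a minimal sketch of the joint configuration described on this slide, the structure below stores one state per object and derives the object count M from the list. The class and field names (`ObjectState`, `JointState`, `speaking`, `scale`) are hypothetical, chosen for illustration, and are not taken from the slides.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ObjectState:
    """Single-object state; field names are illustrative assumptions."""
    speaking: bool   # discrete spk/no-spk index
    u: float         # horizontal translation
    v: float         # vertical translation
    scale: float     # scaling factor


@dataclass
class JointState:
    """Joint multi-object configuration: the object count plus all object states."""
    objects: List[ObjectState]

    @property
    def num_objects(self) -> int:
        # M is implied by how many object states are stored
        return len(self.objects)

    def dimension(self) -> int:
        # Four components per object, so the joint state has 4 * M entries
        return 4 * self.num_objects


joint = JointState([
    ObjectState(True, 120.0, 80.0, 1.0),
    ObjectState(False, 200.0, 85.0, 0.9),
    ObjectState(False, 40.0, 90.0, 1.1),
])
print(joint.num_objects, joint.dimension())
```

Keeping M implicit in the list length avoids the two going out of sync when objects enter or leave the scene.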

slide-5
SLIDE 5

The basic MOT joint tracker

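Graphical model (figure): two state chains, $x_{t-1}^1 \to x_t^1 \to x_{t+1}^1$ and $x_{t-1}^2 \to x_t^2 \to x_{t+1}^2$, with one shared observation per time step: $y_{t-1}$, $y_t$, $y_{t+1}$.

$$p(x_{0:t}, y_{1:t}) = p(x_0^1)\, p(x_0^2) \prod_{n=1}^{t} p(x_n^1 \mid x_{n-1}^1)\, p(x_n^2 \mid x_{n-1}^2)\, p(y_n \mid x_n^{1:2})$$

  • assumptions:
      • each object has its own dynamics
      • objects are marginally independent, but conditionally dependent given observations (explaining away)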
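
The factorization on this slide (independent per-object dynamics, one shared observation per time step) can be sketched as a log-probability computation for two one-dimensional objects. The Gaussian priors, the random-walk dynamics, and the observation whose mean is the sum of the two states are all assumptions made for this toy example; the slides do not specify these densities.

```python
import math


def log_gauss(x, mean, std):
    """Log-density of a 1-D Gaussian."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std * math.sqrt(2.0 * math.pi))


def log_joint(x1, x2, y, dyn_std=1.0, obs_std=1.0):
    """Log of the factorized joint for two 1-D objects over times 0..t.

    x1, x2: state sequences of length t + 1; y: observations of length t.
    Assumed toy model: random-walk dynamics per object and a shared
    observation whose mean is the sum of the two states.
    """
    total = log_gauss(x1[0], 0.0, 1.0) + log_gauss(x2[0], 0.0, 1.0)  # priors
    for n in range(1, len(x1)):
        total += log_gauss(x1[n], x1[n - 1], dyn_std)          # object 1 dynamics
        total += log_gauss(x2[n], x2[n - 1], dyn_std)          # object 2 dynamics
        total += log_gauss(y[n - 1], x1[n] + x2[n], obs_std)   # shared observation
    return total


good = log_joint([0.0, 1.0, 2.0], [0.0, 0.0, 0.0], [1.0, 2.0])
bad = log_joint([0.0, 1.0, 2.0], [0.0, 5.0, 5.0], [1.0, 2.0])
print(good > bad)
```

Because both states enter the same observation term, evidence explained by one object lowers the need for the other to explain it, which is the coupling the slide calls explaining away.
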
slide-6
SLIDE 6

Particle Filters for MOT

Filtering distribution approximated with a weighted particle set $\{(x_t^{(i)}, w_t^{(i)}),\ i = 1, \ldots, N\}$ by

$$\hat{p}_N(x_t \mid y_{1:t}) = \sum_{i=1}^{N} w_t^{(i)}\, \delta(x_t - x_t^{(i)})$$

Recursion:

$$p(x_t \mid y_{1:t}) \propto p(y_t \mid x_t) \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid y_{1:t-1})\, dx_{t-1}$$

From $\hat{p}_N(x_t \mid y_{1:t})$ to time $t+1$:

  • 1. resample
  • 2. prediction, with $p(x_t \mid x_{t-1}) = \prod_{z=1}^{M} p(x_t^z \mid x_{t-1}^z)$
  • 3. likelihood, with $p(y_t \mid x_t) = \prod_{z=1}^{M} p(y_t^z \mid x_t^z)$
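
The resample, prediction, and likelihood steps can be sketched for a single one-dimensional state as follows. This is a generic bootstrap particle filter under assumed random-walk dynamics and a Gaussian observation model, not the tracker from the talk; `particle_filter_step`, `dyn_std`, and `obs_std` are names introduced here for illustration.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian, used here as a toy observation likelihood."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def particle_filter_step(particles, weights, observation, rng,
                         dyn_std=0.5, obs_std=1.0):
    """One resample / predict / reweight cycle for a 1-D state."""
    n = len(particles)
    # 1. resample: draw N particles with probability proportional to weight
    resampled = rng.choices(particles, weights=weights, k=n)
    # 2. prediction: push each particle through the assumed random-walk dynamics
    predicted = [x + rng.gauss(0.0, dyn_std) for x in resampled]
    # 3. likelihood: weight each particle by p(y_t | x_t), then normalize
    new_weights = [gaussian_pdf(observation, x, obs_std) for x in predicted]
    total = sum(new_weights)
    return predicted, [w / total for w in new_weights]


rng = random.Random(0)
particles = [rng.uniform(-10.0, 10.0) for _ in range(500)]
weights = [1.0 / len(particles)] * len(particles)
for y in [2.0, 2.1, 1.9, 2.0, 2.2]:
    particles, weights = particle_filter_step(particles, weights, y, rng)
estimate = sum(w * x for w, x in zip(weights, particles))
print(round(estimate, 2))
```

The weighted mean of the particles serves as the state estimate; with observations clustered around 2 it should settle near that value.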

slide-7
SLIDE 7

Complexity for Joint State Space

  • More objects: cost increases exponentially, with the number of particles growing as $N^1$, $N^2$, $N^3$, ..., $N^M$ for 1, 2, 3, ..., M objects
  • Solution: sample more efficiently
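
A short arithmetic sketch makes the growth concrete. It compares the particle count for a joint search with a rough cost for searching each object's subspace one at a time; the N times M figure for the sequential case is a simplifying assumption for illustration, not a number from the slides.

```python
def joint_particles(per_object: int, num_objects: int) -> int:
    """Particles needed to keep the same sampling density in the joint space."""
    return per_object ** num_objects


def sequential_evaluations(per_object: int, num_objects: int) -> int:
    """Rough cost when each object's subspace is searched one at a time."""
    return per_object * num_objects


for m in (1, 2, 3, 5):
    print(m, joint_particles(50, m), sequential_evaluations(50, m))
```

With 50 particles per object and 3 objects, the joint search needs 125000 particles, against 150 evaluations for the sequential search.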

slide-8
SLIDE 8

Distributed Partitioned Sampling (DPS) for visual MOT

slide-9
SLIDE 9

Partitioned Sampling (PS)

  • Reduces size of search space
  • Searches each object's state sequentially
  • Samples moved to areas of high likelihood
  • Example: configuration space of 2 one-dimensional objects, with axes $x^A$ and $x^B$ (figure)

[MacCormick, Isard, Blake, ECCV 2000]

slide-10
SLIDE 10

Partitioned Sampling (PS)

Block diagram:

prior $p(X_{t-1} \mid Y_{1:t-1})$ → resampling (~) → dynamics $p(x'_t \mid x_{t-1})$ → weighted resampling (~g) → … → likelihood $p(Y_t \mid X_t)$ → posterior $p(X_t \mid Y_{1:t})$

The dynamics and weighted resampling block repeats for M objects.

  • Divide the space into M subspace partitions; search each sequentially
  • Weighted resampling
      • “IS” (importance sampling) using the observation likelihood
  • Adverse effects
      • impoverishment
      • bias

Figure labels: distribution, importance function g, particle representation.
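
Weighted resampling can be sketched as below: particles are selected in proportion to an importance function g, and each selected particle's weight is divided by g so that the represented distribution is unchanged while samples concentrate where g is large. The function `g_peak` is a hypothetical importance function, and the 1-D uniform particle set is an assumption for the demonstration.

```python
import random


def weighted_resample(particles, weights, g, rng):
    """Resample in proportion to g(x), then correct weights by w / g(x)."""
    importance = [g(x) for x in particles]
    n = len(particles)
    # Draw indices with probability proportional to the importance function
    idx = rng.choices(range(n), weights=importance, k=n)
    new_particles = [particles[i] for i in idx]
    # Dividing by g compensates for the biased selection
    new_weights = [weights[i] / importance[i] for i in idx]
    total = sum(new_weights)
    return new_particles, [w / total for w in new_weights]


def g_peak(x):
    """Hypothetical importance function, strictly positive, peaked at x = 0.8."""
    return 0.05 + max(0.0, 1.0 - abs(x - 0.8) / 0.2)


rng = random.Random(1)
particles = [rng.uniform(0.0, 1.0) for _ in range(2000)]
weights = [1.0 / len(particles)] * len(particles)
moved, new_w = weighted_resample(particles, weights, g_peak, rng)

near_before = sum(1 for x in particles if abs(x - 0.8) < 0.2) / len(particles)
near_after = sum(1 for x in moved if abs(x - 0.8) < 0.2) / len(moved)
mean_after = sum(w * x for w, x in zip(new_w, moved))
print(round(near_before, 2), round(near_after, 2), round(mean_after, 2))
```

More particles land near the peak of g after the step, while the weighted mean stays close to that of the original uniform set. The cost is duplicated particles, which is the impoverishment listed among the adverse effects.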

slide-11
SLIDE 11

PS: Ordering and Impoverishment

  • Weighted resampling effects
  • Ordering
  • Impoverishment
  • Loss of multi-modality
  • Bias
  • Poor tracking quality
  • In general, ordering of objects is arbitrary
  • More objects, greater effect

Figure: impoverishment and bias versus object # (1 to 7).

slide-12
SLIDE 12

Distributed Partitioned Sampling (DPS)

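Block diagram:

prior $p(X_{t-1} \mid Y_{1:t-1})$ → resampling (~) → mixture components → assemble → likelihood $p(Y_t \mid X_t)$ → posterior $p(X_t \mid Y_{1:t})$

Within each mixture component, the dynamics and weighted resampling block repeats for M objects: dynamics $p(x'^{1}_t \mid x^{1}_{t-1})$ with weighted resampling (~g1), …, dynamics $p(x'^{C}_t \mid x^{C}_{t-1})$ with weighted resampling (~gC).

  • Each subset: PS in a different ordering, obtained by a circular shift of the object order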
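
The idea of running PS with a different object ordering in each particle subset can be sketched as follows. The helpers `circular_orderings` and `split_particles` are hypothetical names, and the choice of M equally sized subsets (one per circular shift) is an assumption for illustration rather than a detail confirmed by the slides.

```python
def circular_orderings(num_objects):
    """All circular shifts of the object order 1..M."""
    base = list(range(1, num_objects + 1))
    return [base[s:] + base[:s] for s in range(num_objects)]


def split_particles(num_particles, num_subsets):
    """Assign particle indices to subsets of (nearly) equal size."""
    return [list(range(k, num_particles, num_subsets))
            for k in range(num_subsets)]


orderings = circular_orderings(4)
subsets = split_particles(200, len(orderings))
for order, subset in zip(orderings, subsets):
    # Each subset would run partitioned sampling with its own object order
    print(order, len(subset))
```

With 4 objects and 200 particles this yields the orderings [1, 2, 3, 4], [2, 3, 4, 1], [3, 4, 1, 2], [4, 1, 2, 3], each applied to 50 particles, so every object appears in every position of the ordering equally often.
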
slide-13
SLIDE 13

Results

*200 particles, examples taken from 50 runs per scenario

Figure panels: Joint PF, PS, DPS.

slide-14
SLIDE 14

audio-visual MOT

slide-15
SLIDE 15

Audio-visual observation model

  • Visual 1: contour-based (wire on clutter), edges on normal lines
  • Visual 2: skin-blob-based
      • precision/recall between configuration and skin blobs
      • GMM on features
  • Audio: switching distribution around 2-D audio estimates. The likelihood $p(y_t^{audio} \mid x_t^{(i)})$ takes one of two constant values, $K_1$ or $K_2$. Each case is conditioned on the squared distance $(u_t^{(i)} - u_t^{est})^2 + (v_t^{(i)} - v_t^{est})^2$ relative to $R^2$ and on the speaking status of the particle (spk for $K_1$, no_spk for $K_2$).

slide-16
SLIDE 16

Sampling using MCMC

  • Metropolis-Hastings (MH) sampler
  • Posterior as target distribution
  • Better candidates are almost always accepted
  • Particles where all objects have good guesses
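
A minimal random-walk Metropolis-Hastings sampler illustrates the acceptance rule. It is a generic textbook sketch, not the sampler used in the talk: the 1-D Gaussian `log_target` is a hypothetical stand-in for the tracking posterior, and the proposal `step` is an assumed tuning parameter.

```python
import math
import random


def metropolis_hastings(log_target, x0, num_samples, rng, step=0.5):
    """Random-walk MH with symmetric Gaussian proposals."""
    samples = []
    x, log_px = x0, log_target(x0)
    for _ in range(num_samples):
        candidate = x + rng.gauss(0.0, step)
        log_pc = log_target(candidate)
        # Accept with probability min(1, p(candidate) / p(current));
        # a candidate at least as probable as the current state always passes
        if math.log(1.0 - rng.random()) < log_pc - log_px:
            x, log_px = candidate, log_pc
        samples.append(x)
    return samples


def log_target(x):
    # Hypothetical stand-in for the posterior: a unit Gaussian centred at 3
    return -0.5 * (x - 3.0) ** 2


rng = random.Random(2)
chain = metropolis_hastings(log_target, x0=0.0, num_samples=5000, rng=rng)
burned = chain[1000:]  # discard the initial transient
chain_mean = sum(burned) / len(burned)
print(round(chain_mean, 2))
```

Because the proposal is symmetric, only the ratio of target densities enters the test, so the chain drifts toward high-posterior regions and then explores around them.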

slide-17
SLIDE 17

Results (1)

  • Joint PF, contour-only likelihood, 2000 particles
  • Joint PF, contour-blob likelihood, 1000 particles

slide-18
SLIDE 18

Results (2)

  • Joint PF-MCMC, contour-blob likelihood, 500 particles, visual clutter
  • Joint PF-MCMC, contour-blob likelihood, 500 particles

slide-19
SLIDE 19

Conclusion

visual tracking

  + DPS improves MOT because ordering matters
  + fairly distributes ordering effects
  + retains computational benefits of PS
  - not so good for low number of particles (e.g. <100)

audio-visual tracking

  + blob likelihood improves robustness
  + joint a-v likelihood allows for fast spk/non-spk switching
  + MCMC reduces complexity
  + currently: (re)-initialization
  + later: extension to more complex models