Multi-object tracking (MOT): visual and audio-visual
Daniel Gatica-Perez (joint work with Kevin Smith, Guillaume Lathoud, Iain McCowan, Jean-Marc Odobez)
IDIAP Research Institute, Martigny, Switzerland
Outline
- MOT using Particle Filters
- Our work
  - Visual MOT with Distributed Partitioned Sampling [Smith et al, BMVC’04]
  - Audio-Visual MOT [Gatica et al, in preparation]
- Conclusion
MOT as Bayesian inference
the problem: given image observations $\mathbf{y}_{1:t}$ and a state-space multi-object (MO) representation, compute the posterior $p(\mathbf{x}_{0:t} \mid \mathbf{y}_{1:t})$ or the filtering distribution $p(\mathbf{x}_t \mid \mathbf{y}_{1:t})$
- object state: continuous part $x_t^i$, real-valued (geometric transformations), and discrete part $k_t^i$ (indices: head pose, speaking status)
- joint state: $\mathbf{x}_t = (k_t^1, \ldots, k_t^K, x_t^1, \ldots, x_t^M) = (k_t^{1:K}, x_t^{1:M})$
Joint state space representation
M objects: a joint state $\{x_t^j\}$
- object state vector: $x^j$ has four components: a spk/no-spk index, translation $(u^j, v^j)$, and scaling
- example with three objects: $x^1$, $x^2$, $x^3$, with translations $(u^1, v^1)$, $(u^2, v^2)$, $(u^3, v^3)$
- formal MO joint configuration: $X_t = (M, x_t^{1:M}) = (M, x_t^1, x_t^2, \ldots, x_t^M)$
The basic MOT joint tracker
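[Figure: graphical model for two objects, with state nodes $x^1_{t-1}, x^1_t, x^1_{t+1}$ and $x^2_{t-1}, x^2_t, x^2_{t+1}$, and observation nodes $y_{t-1}, y_t, y_{t+1}$]
$p(x^{1:2}_{0:t}, y_{1:t}) = p(x^1_0)\, p(x^2_0) \prod_{n=1}^{t} p(x^1_n \mid x^1_{n-1})\, p(x^2_n \mid x^2_{n-1})\, p(y_n \mid x^{1:2}_n)$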
assumptions:
- each object has its own dynamics
- objects are marginally independent, but conditionally dependent given observations (explaining away)
Particle Filters for MOT
Filtering distribution approximated with a particle set $\{(x_t^{(i)}, w_t^{(i)}),\ i = 1, \ldots, N\}$ by
$\hat{p}_N(x_t \mid y_{1:t}) = \sum_{i=1}^{N} w_t^{(i)}\, \delta(x_t - x_t^{(i)})$
Recursion:
$p(x_t \mid y_{1:t}) \propto p(y_t \mid x_t) \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid y_{1:t-1})\, dx_{t-1}$
[Figure: one iteration from $t$ to $t+1$, showing $p(x_t \mid y_{1:t})$ and its particle approximation $\hat{p}_N(x_t \mid y_{1:t})$]
- 1. resample
- 2. prediction: $p(x_t \mid x_{t-1}) = \prod_{z=1}^{M} p(x_t^z \mid x_{t-1}^z)$
- 3. likelihood: $p(y_t \mid x_t) = \prod_{z=1}^{M} p(y_t^z \mid x_t^z)$
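The resample, prediction, and likelihood steps can be sketched for a toy problem as follows. This is a minimal illustration, not the tracker from the talk: it assumes one-dimensional object positions, random-walk dynamics, Gaussian per-object likelihoods, and one noisy measurement per object. The names `pf_step`, `estimate`, `dyn_std`, and `obs_std` are hypothetical.

```python
import math
import random

def gaussian(x, mean, std):
    """Unnormalised Gaussian density; sufficient because weights are normalised."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2)

def pf_step(particles, weights, observation, rng, dyn_std=0.5, obs_std=1.0):
    """One resample / prediction / likelihood cycle on a joint state.

    particles: list of N joint states, each a list of M object positions.
    observation: list of M noisy positions, one per object (toy assumption).
    """
    n = len(particles)
    # 1. resample: draw N particles with probability proportional to weight
    resampled = rng.choices(particles, weights=weights, k=n)
    # 2. prediction: every object follows its own random-walk dynamics
    predicted = [[x + rng.gauss(0.0, dyn_std) for x in p] for p in resampled]
    # 3. likelihood: product over objects of per-object terms
    new_weights = [
        math.prod(gaussian(y, x, obs_std) for x, y in zip(p, observation))
        for p in predicted
    ]
    total = sum(new_weights)
    return predicted, [w / total for w in new_weights]

def estimate(particles, weights):
    """Weighted mean position of each object."""
    m = len(particles[0])
    return [sum(w * p[j] for p, w in zip(particles, weights)) for j in range(m)]

rng = random.Random(0)
N, M = 500, 2
particles = [[rng.uniform(-5, 5) for _ in range(M)] for _ in range(N)]
weights = [1.0 / N] * N
truth = [1.0, -2.0]  # static toy targets
for _ in range(10):
    obs = [x + rng.gauss(0.0, 0.3) for x in truth]
    particles, weights = pf_step(particles, weights, obs, rng)
est = estimate(particles, weights)
```

After a few iterations `est` should lie close to `truth`; the exact values depend on the seed and on the assumed noise levels.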
Complexity for Joint State Space
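More objects: cost increases exponentially. If one object needs $N$ particles, $M$ objects need on the order of $N^M$ ($N$, $N^2$, $N^3$ for one, two, three objects)
Solution: sample more efficiently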
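A quick arithmetic illustration of the exponential growth, under the assumption that covering one object's state space at a given density takes `n_per_object` particles and that the joint space is covered at the same density. The function name is hypothetical.

```python
def joint_particles(n_per_object, n_objects):
    """Particles needed to cover the joint space at the same per-object density."""
    return n_per_object ** n_objects

for m in (1, 2, 3, 5):
    print(m, joint_particles(100, m))
```

With 100 particles per object, three objects already call for a million joint particles, which is why the sampling strategy matters more than raw particle count.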
Distributed Partitioned Sampling (DPS) for visual MOT
Partitioned Sampling (PS)
- Reduces size of search space
- Searches each object's state sequentially
- Samples moved to areas of high likelihood
- Example: configuration space of 2 one-dimensional objects
[Figure: configuration space with axes $x_A$ and $x_B$]
[MacCormick, Isard, Blake, ECCV 2000]
Partitioned Sampling (PS)
prior $p(X_{t-1} \mid Y_{1:t-1})$ → resampling (~) → dynamics $p(x'_t \mid x_{t-1})$ → weighted resampling (~g) → … → likelihood $p(Y_t \mid X_t)$ → posterior $p(X_t \mid Y_{1:t})$
Block (dynamics, weighted resampling) repeats for M objects
Legend: distribution, importance function g, particle representation
- Divide the space into M subspace partitions; search each sequentially
- Weighted resampling
  - importance sampling ("IS") using the observation likelihood
- Adverse effects
  - impoverishment
  - bias
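Weighted resampling can be sketched as below: particles are drawn in proportion to an importance function g, and each drawn particle's weight is divided by its selection probability so that the weighted set still represents the same distribution. This is a toy one-dimensional sketch; the Gaussian-shaped `g` and all names are assumptions made for illustration.

```python
import math
import random

def weighted_resample(particles, weights, g, rng):
    """Resample in proportion to g, then correct the weights by 1/g."""
    rho = [g(x) for x in particles]
    total = sum(rho)
    rho = [r / total for r in rho]  # normalised importance weights
    n = len(particles)
    idx = rng.choices(range(n), weights=rho, k=n)
    new_particles = [particles[i] for i in idx]
    # dividing by rho keeps the represented distribution unchanged
    new_weights = [weights[i] / rho[i] for i in idx]
    norm = sum(new_weights)
    return new_particles, [w / norm for w in new_weights]

def weighted_mean(particles, weights):
    return sum(w * x for x, w in zip(particles, weights))

def g(x):
    """Hypothetical importance function peaked at 0.8."""
    return math.exp(-0.5 * ((x - 0.8) / 0.5) ** 2)

rng = random.Random(1)
n = 5000
particles = [rng.random() for _ in range(n)]  # uniform on [0, 1)
weights = [1.0 / n] * n
moved, corrected = weighted_resample(particles, weights, g, rng)
mean_before = weighted_mean(particles, weights)
mean_after = weighted_mean(moved, corrected)  # close to mean_before
plain_mean_after = sum(moved) / n             # shifted towards the peak of g
```

The particle locations concentrate where g is large, while the weighted mean stays approximately the same. The repeated selection of the same particles is also where impoverishment comes from.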
PS: Ordering and Impoverishment
- Weighted resampling effects
- Ordering
- Impoverishment
- Loss of multi-modality
- Bias
- Poor tracking quality
- In general, ordering of objects is arbitrary
- More objects, greater effect
[Figure: impoverishment and bias versus object # (1 to 7)]
Distributed Partitioned Sampling (DPS)
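prior $p(X_{t-1} \mid Y_{1:t-1})$ → resampling (~) → mixture components → likelihood $p(Y_t \mid X_t)$ → posterior $p(X_t \mid Y_{1:t})$
Within each mixture component: dynamics $p(x'^{1}_t \mid x_{t-1})$ → weighted resampling (~$g_1$) → … → dynamics $p(x'^{C}_t \mid x_{t-1})$ → weighted resampling (~$g_C$); block repeats for M objects
- Mixture components: particle set divided into subsets
- Each subset: PS in a different ordering (circular shift of the object order)
- Assemble: subsets are recombined into one particle set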
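The circular-shift orderings and the split into subsets can be sketched as follows. This assumes one subset per object ordering and a contiguous, nearly equal split; both are illustrative choices, and the function names are hypothetical.

```python
def circular_orderings(n_objects):
    """All circular shifts of the object order 1..n_objects."""
    base = list(range(1, n_objects + 1))
    return [base[s:] + base[:s] for s in range(n_objects)]

def split_particles(particles, n_subsets):
    """Split a particle list into n_subsets contiguous, nearly equal parts."""
    q, r = divmod(len(particles), n_subsets)
    subsets, start = [], 0
    for s in range(n_subsets):
        end = start + q + (1 if s < r else 0)
        subsets.append(particles[start:end])
        start = end
    return subsets

def assemble(subsets):
    """Concatenate the subsets back into one particle list."""
    return [p for subset in subsets for p in subset]

orderings = circular_orderings(3)
subsets = split_particles(list(range(10)), 3)
# Each subset would run partitioned sampling with its own ordering.
plan = list(zip(orderings, subsets))
```

Because every object appears in every position of the ordering exactly once across the subsets, no single object systematically bears the effects of being processed first or last.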
Results
[Figure: two result panels, each comparing Joint PF, PS, and DPS]
* 200 particles, examples taken from 50 runs per scenario
audio-visual MOT
Audio-visual observation model
Visual 1: contour-based (wire on clutter), edges on normal lines
Visual 2: skin-blob-based
- precision/recall between configuration and skin blobs
- GMM on features
Audio: switching distribution around 2-D audio estimates
The audio likelihood $p(y_t^{audio} \mid x_t^{(i)})$ switches between two constants, $K_1$ and $K_2$, according to the squared distance $(u_t^{(i)} - u_t^{est})^2 + (v_t^{(i)} - v_t^{est})^2$ relative to $R^2$ and to the speaking status of the object (spk or no_spk).
Sampling using MCMC
- Metropolis-Hastings (MH) sampler
- Posterior as target distribution
- Better candidates are almost always accepted
- Particles where all objects have good guesses
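A minimal Metropolis-Hastings sketch with the posterior as target is given below. It is a toy stand-in, not the sampler used in the talk: the target is an assumed product of one-dimensional Gaussians (one per object), the proposal is a symmetric random walk that perturbs one object at a time, and all names are hypothetical. With a symmetric proposal, a candidate with higher posterior than the current state is accepted with probability 1.

```python
import math
import random

def log_target(state, centers=(1.0, -2.0), std=0.5):
    """Toy log-posterior: independent Gaussians around assumed object positions."""
    return sum(-0.5 * ((x - c) / std) ** 2 for x, c in zip(state, centers))

def mh_chain(log_p, start, n_steps, rng, step=0.3):
    """Random-walk Metropolis-Hastings; perturbs one object per iteration."""
    state, logp = list(start), log_p(start)
    samples = []
    for _ in range(n_steps):
        j = rng.randrange(len(state))       # pick one object to move
        cand = list(state)
        cand[j] += rng.gauss(0.0, step)     # symmetric proposal
        cand_logp = log_p(cand)
        # acceptance probability min(1, p(cand) / p(state))
        if rng.random() < math.exp(min(0.0, cand_logp - logp)):
            state, logp = cand, cand_logp
        samples.append(list(state))
    return samples

rng = random.Random(2)
samples = mh_chain(log_target, [0.0, 0.0], 5000, rng)
kept = samples[1000:]                        # discard burn-in
mean = [sum(s[j] for s in kept) / len(kept) for j in range(2)]
```

Moving one object at a time lets a particle keep the objects that already have good guesses while it improves the others, which is how the chain reaches particles where all objects fit well.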
Results (1)
- Joint PF, contour-only likelihood, 2000 particles
- Joint PF, contour-blob likelihood, 1000 particles
Results (2)
- Joint PF-MCMC, contour-blob likelihood, 500 particles, visual clutter
- Joint PF-MCMC, contour-blob likelihood, 500 particles
Conclusion
visual tracking
+ DPS improves MOT because ordering matters
+ fairly distributes ordering effects
+ retains computational benefits of PS
- not so good for low number of particles (e.g. <100)
audio-visual tracking