Moving Object Tracking
Harpreet S. Sawhney
hsawhney@sarnoff.com
Princeton University COS 429 Lecture, Dec. 6, 2007
Recapitulation: Last Lecture
– Moving object detection as robust regression with outlier detection
– Simultaneous …
– State: position, velocity, shape, color, appearance, template, …
– Prediction: having seen the measurements so far, what state do they predict for the next time instant i? Need a representation for the dynamic model.
– Association: which of the measurements at the i-th instant correspond to the predicted state at that instant? Use the prediction to establish the correspondence.
– Update: with the corresponding measurement established for instant i, compute an estimate of the optimal new state.
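As an illustration of this predict/associate/update cycle, here is a minimal 1-D sketch; the constant-velocity prediction, the gate size, and the fixed blending gain are illustrative choices, not from the lecture:

```python
# Hedged sketch of the predict -> associate -> update cycle,
# with a constant-velocity prediction and nearest-neighbor gating.
def step(state, measurements, gate=2.0):
    """state = (pos, vel); measurements = list of observed positions."""
    pred = state[0] + state[1]                 # predict next position
    # Associate: nearest measurement within the gate
    cands = [z for z in measurements if abs(z - pred) < gate]
    if not cands:
        return (pred, state[1])                # coast on the prediction
    z = min(cands, key=lambda z: abs(z - pred))
    # Update: blend prediction and measurement (fixed gain for brevity)
    g = 0.5
    pos = pred + g * (z - pred)
    vel = state[1] + g * (z - pred)
    return (pos, vel)

s = (0.0, 1.0)                                  # at 0, moving +1/frame
s = step(s, [0.9, 5.0])                         # clutter at 5 is gated out
```

Measurements outside the gate (the clutter at 5.0) are ignored; with no candidate inside the gate the tracker coasts on its prediction.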
Given the measurements $\{y_1, y_2, \ldots, y_i\}$ up to instant $i$, track the posterior distribution

$$P(X \mid Y_1 = y_1, \ldots, Y_i = y_i)$$

– the posterior over the state is the representation to be tracked
– it serves as a reference distribution for incorporating new measurements
– it is updated with now well-known techniques studied in this class
– Approach: align frames and detect changes, designated as new objects.
– Problem: this approach uses no appearance information, so tracks drift over time.
What we want instead:
– State prediction and optimal state update.
– Not just a change blob or a fixed template.
– An optimal method for updating the state.
Rudolf Emil Kalman
Acknowledgment: much of the following material is based on the SIGGRAPH 2001 course by Greg Welch and Gary Bishop (UNC)
Fusing two measurements $x_1$ and $x_2$ of the same quantity, with variances $\sigma_1^2$ and $\sigma_2^2$:

$$\hat{x} = \frac{\sigma_2^2\, x_1 + \sigma_1^2\, x_2}{\sigma_1^2 + \sigma_2^2}, \qquad \frac{1}{\hat{\sigma}^2} = \frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2}$$

Each measurement is weighted by the inverse of its variance, and the fused variance is smaller than either individual variance.
We have essentially computed the Least Squares OR Minimum Variance OR Maximum Likelihood estimate of X, given a number of noisy measurements.
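The two-measurement fusion above can be checked numerically; a minimal sketch (the function name is mine):

```python
# Minimum-variance fusion of two noisy measurements x1, x2 of the same
# scalar, with variances s1, s2 (the formula on the slide).
def fuse(x1, s1, x2, s2):
    """Return the fused estimate and its variance."""
    xhat = (s2 * x1 + s1 * x2) / (s1 + s2)   # inverse-variance weighting
    shat = 1.0 / (1.0 / s1 + 1.0 / s2)       # fused variance
    return xhat, shat

x, s = fuse(2.0, 1.0, 4.0, 1.0)  # equal variances -> simple average
```

With equal variances the result is the plain average; an unequal-variance measurement pulls the estimate toward the more certain one.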
– State X: in general, any vector
– Dynamic model: (in this example, "nothing changes")
– Measurements: (possibly incomplete, possibly noisy)
– We assume there are moving objects, which have an underlying state X; there are measurements Z, some of which are functions of this state; and there is a clock that drives both.
– Examples: the object is a ball, the state is 3D position + velocity, the measurements are stereo pairs; or the object is a person, the state is the body configuration, the measurements are frames, and the clock is in the camera (30 fps).
The state evolves as a hidden chain, with a measurement observed at each tick:

$$X_{k-1} \rightarrow X_k \rightarrow X_{k+1} \qquad \text{(dynamic model)}$$
$$Z_k,\; Z_{k+1} \qquad \text{(measurement model)}$$

– State variables: tell us about objects and their states, but are hidden and cannot be directly observed.
– Measurements: can be directly observed, but are noisy and uncertain.
Recursive Bayesian estimation:

$$P(X_k \mid Z_1,\ldots,Z_k) = \frac{1}{c}\, P(Z_k \mid X_k)\, P(X_k \mid Z_1,\ldots,Z_{k-1})$$

– left-hand side: posterior probability after the latest measurement
– $P(Z_k \mid X_k)$: likelihood of the current measurement
– $P(X_k \mid Z_1,\ldots,Z_{k-1})$: temporal prior from the dynamic model
– $c$: normalizing constant

The temporal prior propagates the posterior after the previous measurement through the dynamic model:

$$P(X_k \mid Z_1,\ldots,Z_{k-1}) = \int P(X_k \mid X_{k-1})\, P(X_{k-1} \mid Z_1,\ldots,Z_{k-1})\, dX_{k-1}$$
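This recursion can be run directly on a discretized state space (a histogram filter); a sketch with an illustrative 5-cell grid and made-up motion and sensor models:

```python
import numpy as np

# Hedged sketch: the Bayes recursion on a discrete 1-D state grid.
# The function names and the toy models are mine, not the slides'.
def predict(belief, kernel):
    """Temporal prior: convolve previous posterior with dynamic model."""
    prior = np.convolve(belief, kernel, mode="same")
    return prior / prior.sum()

def update(prior, likelihood):
    """Multiply by the measurement likelihood and renormalize (1/c)."""
    post = prior * likelihood
    return post / post.sum()

belief = np.zeros(5); belief[2] = 1.0          # start certain at cell 2
kernel = np.array([0.25, 0.5, 0.25])           # small random motion
lik = np.array([0.1, 0.1, 0.1, 0.6, 0.1])      # sensor favors cell 3
belief = update(predict(belief, kernel), lik)
```

The predict step spreads probability mass according to the dynamics; the update step concentrates it where the measurement is likely.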
Position-velocity state with a constant-velocity dynamic model; only position is directly measured:

$$X_k = \begin{bmatrix} x_k \\ dx_k/dt \end{bmatrix}, \qquad X_k = A X_{k-1} + w_{k-1}, \qquad A = \begin{bmatrix} 1 & dt \\ 0 & 1 \end{bmatrix}$$

$$Z_k = H X_k + v_k, \qquad H = \begin{bmatrix} 1 & 0 \end{bmatrix}$$
The Kalman filter cycle:

Predict:
$$\hat{X}_k^- = A\,\hat{X}_{k-1}, \qquad P_k^- = A P_{k-1} A^{T} + Q$$

Compute the gain:
$$K_k = P_k^- H^{T}\,(H P_k^- H^{T} + R)^{-1}$$

Update with the measurement:
$$\hat{X}_k = \hat{X}_k^- + K_k\,(Z_k - H \hat{X}_k^-), \qquad P_k = (I - K_k H)\, P_k^-$$
The update combines the predicted state and the measurement into an optimal linear estimate; under Gaussian assumptions, the linear estimate is the optimal estimate.

Estimation error: $e_k = X_k - \hat{X}_k$. For an unbiased estimate, $E[e_k] = 0$. The gain $K_k$ is obtained by minimizing the variance of the state estimate.
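A minimal numerical sketch of these predict/update equations for the position-velocity state (the dt, Q, and R values are illustrative):

```python
import numpy as np

# Hedged sketch of the Kalman predict/update cycle for the
# constant-velocity model; noise levels are made-up for illustration.
dt = 1.0
A = np.array([[1.0, dt], [0.0, 1.0]])   # constant-velocity dynamics
H = np.array([[1.0, 0.0]])              # only position is measured
Q = 0.01 * np.eye(2)                    # process noise covariance
R = np.array([[0.25]])                  # measurement noise covariance

x = np.array([[0.0], [0.0]])            # state estimate [pos, vel]
P = np.eye(2)                           # state covariance

def kalman_step(x, P, z):
    # Predict
    x = A @ x
    P = A @ P @ A.T + Q
    # Gain
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
    # Update
    x = x + K @ (z - H @ x)
    P = (np.eye(2) - K @ H) @ P
    return x, P

for z in [1.0, 2.0, 3.0, 4.0]:          # object moving ~1 unit/frame
    x, P = kalman_step(x, P, np.array([[z]]))
```

After a few frames the hidden velocity is recovered from position-only measurements, which is the point of tracking a richer state than what is observed.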
– Appearance, shape, and specific object models: people, vehicles, etc.
– Track the background as well as the foreground.
– Compute the likelihood of the presence of a head-and-shoulders person model at a given location in the image.
– Measurements need to be optimally associated with a set of models, rather than a single model as in the previous examples.
Results Video Stream
Tracking System
A tracking system must:
– detect new moving objects;
– maintain the identity of objects, handling multiple objects and interactions between them, e.g. passing and stopping;
– provide information regarding the objects, e.g. shape, appearance, and motion.
It must estimate:
– the motion of objects and the background;
– the shape of objects and their support;
– the appearance of objects.
Prior models used in the literature (local constraints, global constraints, multi-frame consistency):

Motion
– Smooth dense flow: Weiss 97
– 2D affine: Darrell 91, Wang 93, Hsu 94, Sawhney 96, Weiss 96, Vasconcelos 97
– 3D planar: Torr 99
– 2D rotation and translation with constant velocity: this paper

Segmentation
– MRF segmentation prior: Weiss 96, Vasconcelos 97
– Background + Gaussian segmentation prior: this paper (Section 2.1)
– Constant segmentation prior: this paper (elliptical shape prior)

Appearance
– Constant appearance: this paper
[Figure: layered representation at time t — foreground layers j with mixing weights β composited over a background layer]
Layer model assumptions:
– Motion: translation + rotation, with a constant-velocity model
– Shape: planar surface
The layer parameters at time t are $\Lambda_t = [\Theta_t, \Phi_t, A_t]$ (motion, shape prior, appearance). They are estimated by MAP, factoring the posterior into a likelihood and a prior:

$$\Lambda_t^* = \arg\max_{\Lambda_t} P(\Lambda_t \mid I_t, \Lambda_{t-1}) = \arg\max_{\Lambda_t} P(I_t \mid \Lambda_t)\; P(\Lambda_t \mid \Lambda_{t-1})$$

where $P(I_t \mid \Lambda_t)$ is the likelihood and $P(\Lambda_t \mid \Lambda_{t-1})$ is the motion, appearance, and shape prior.
Generalized EM over observations and parameters:
– objective function: the conditional expectation of the log posterior, improved iteratively
– layer motion estimation based on the current segmentation and appearance ⇒ weighted correlation or a direct method
– layer segmentation estimation ⇒ competition between the motion layers
– layer appearance estimation ⇒ Kalman filtering of the appearance
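The "competition between motion layers" step can be sketched as a soft ownership computation: each pixel's residual under each layer's motion yields a Gaussian likelihood, and the ownerships are the normalized likelihoods (an E-step; all values below are illustrative):

```python
import math

# Hedged sketch of layer competition: per-pixel soft ownerships from
# per-layer residuals (my formulation of the E-step, not the paper's).
def ownerships(residuals, sigmas):
    """residuals[j][i]: residual of pixel i under layer j's motion."""
    n = len(residuals[0])
    own = []
    for i in range(n):
        liks = [math.exp(-0.5 * (r[i] / s) ** 2) / s
                for r, s in zip(residuals, sigmas)]
        tot = sum(liks)
        own.append([l / tot for l in liks])
    return own

# Pixel 0 fits layer 0 well, pixel 1 fits layer 1 well
own = ownerships([[0.1, 3.0], [3.0, 0.1]], [1.0, 1.0])
```

Pixels whose residual is small under one layer's motion are assigned almost entirely to that layer, which is what drives the segmentation update.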
Each EM iteration cycles over the layers j, interleaving parameter estimates with ownership updates $h_{i,j}$:
1. estimate the motion $\Theta_t$, then update the ownerships
2. estimate the shape prior $\Phi_t$, then update the ownerships
3. estimate the appearance $A_t$, then update the ownerships

Inputs at frame t: the appearance, shape prior, and motion from t-1, plus frame t. Outputs: the updated motion, shape prior, and appearance at t.
The appearance update is a per-pixel Kalman filter blending the warped previous appearance with the current image, weighted by the inverse variances (equation reconstructed from the garbled slide):

$$A_{t,j}(T_j(x)) = \frac{A_{t-1,j}(T_j(x))/\sigma_{A}^2 + I_t(x)/\sigma_{I}^2}{1/\sigma_{A}^2 + 1/\sigma_{I}^2}$$

The shape-prior location $l$ and scale $s$ are estimated by setting the derivatives $\partial f/\partial l$ and $\partial f/\partial s$ of the objective to zero; each derivative is a sum over pixels of ownership-weighted residuals between the data term $D(x_i)$ and the elliptical prior $L(x_i)$.
Conditions:
– NS = normal SSD score
– OB = out of scope
– LT = NM for a long time
– ZM = zero motion estimate
– NB = new blob (no object covering the blob)
– NM = no motion blob covering the object
– SI = significant increase of SSD

Transitions among the states {new, moving, stopped, disappeared} are boolean combinations of these predicates, e.g. NB (new object), !NM & NS (moving), NM & !SI & ZM (stopped), OB | LT (disappeared).
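A minimal sketch of such a state machine; the transition table below is my reading of the slide, not a verbatim copy:

```python
# Hedged sketch of a tracker state machine over the predicates
# NB, NM, SI, ZM, NS, OB, LT defined above.
def next_state(state, p):
    """p: dict of boolean predicates, e.g. p['NM'], p['OB'], ..."""
    if state == "new":
        return "moving" if not p["NM"] and p["NS"] else "new"
    if state == "moving":
        if p["OB"] or p["LT"]:
            return "disappeared"
        if p["NM"] and not p["SI"] and p["ZM"]:
            return "stopped"
        return "moving"
    if state == "stopped":
        return "moving" if not p["NM"] and p["NS"] else "stopped"
    return state  # "disappeared" is absorbing in this sketch

p = {"NM": False, "NS": True, "SI": False, "ZM": False,
     "OB": False, "LT": False}
s = next_state("new", p)   # a well-matched motion blob -> "moving"
```

Encoding the transitions as predicate combinations keeps the tracker's event logic (passing, stopping, leaving the scene) separate from the low-level matching.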
– Originally developed on a PC, then ported to an SGI Octane; runs at 20-25 Hz for one object on a single processor.
Sarnoff VFE 200 SGI Octane Video Stream
– 95% of the computation is motion estimation. Currently, weighted SSD correlation is used, searching a 13x13 window at half resolution over 3 different angles; the object is around 40x40 pixels.
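The weighted-SSD correlation search can be sketched as follows (translation-only for brevity; as noted above, the real system also searches 3 rotation angles):

```python
import numpy as np

# Hedged sketch (not the Sarnoff implementation): weighted-SSD
# template search over a translation window.
def weighted_ssd_search(image, template, weights, center, radius):
    """Return the (dy, dx) shift minimizing the weighted SSD score."""
    th, tw = template.shape
    cy, cx = center
    best, best_shift = np.inf, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = cy + dy, cx + dx
            patch = image[y:y + th, x:x + tw]
            score = np.sum(weights * (patch - template) ** 2)
            if score < best:
                best, best_shift = score, (dy, dx)
    return best_shift

img = np.zeros((20, 20)); img[8:12, 9:13] = 1.0   # 4x4 blob at (8, 9)
tmpl = np.ones((4, 4)); w = np.ones((4, 4))
shift = weighted_ssd_search(img, tmpl, w, (6, 6), 3)  # true shift (2, 3)
```

The per-pixel weights are where the layer ownerships enter: pixels unlikely to belong to the object contribute little to the score.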
– The change image is integrated into the formulation to further improve robustness.
– An appearance model for the background is not computed; instead, the previous image is used.
– Jepson et al.’s WSL model – A mixture model of appearance – Estimated incrementally using online EM
The WSL model is a three-component mixture over a pixel observation $d_t$:

$$p(d_t \mid q_t, m_t) = m_w\, p_w(d_t \mid d_{t-1}) + m_s\, p_s(d_t \mid \mu_{s,t}, \sigma_{s,t}^2) + m_l\, p_l(d_t)$$

– stable parameters: $q_t = (\mu_{s,t}, \sigma_{s,t}^2)$
– mixing probabilities: $m = (m_s, m_w, m_l)$
– Wandering process: Gaussian about the previous observation with constant variance $\sigma_w^2$; Stable process: Gaussian with adapting variance $\sigma_{s,t}^2$; Lost process: a broad outlier distribution.

Online EM with exponential forgetting (factor $\alpha$): the ownership of component j is $o_{j,t}(d_t) \propto m_{j,t-1}\, p_j(d_t)$, and the moments are smoothed as

$$M_{j,t}^{(i)} = \alpha\, d_t^{\,i}\, o_{j,t}(d_t) + (1-\alpha)\, M_{j,t-1}^{(i)}$$

Updated mixing probabilities (0th-order moments): $m_{j,t} = M_{j,t}^{(0)}$. Updated mean and variance of the stable process: $\mu_{s,t} = M_{s,t}^{(1)}/m_{s,t}$ and $\sigma_{s,t}^2 = M_{s,t}^{(2)}/m_{s,t} - \mu_{s,t}^2$.
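One online-EM update of this mixture for a single pixel might look like the following (the value of alpha, the lost-process density, and the variance floor are illustrative choices):

```python
import math

# Hedged sketch of one online-EM update of a WSL-style mixture for a
# single pixel observation d in [0, 1].
def wsl_update(d, d_prev, m, mu_s, var_s, var_w=0.05, alpha=0.05):
    """m = (m_s, m_w, m_l) mixing probs; returns updated parameters."""
    def gauss(x, mu, var):
        return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)
    # Likelihood of d under the Stable, Wandering, and Lost processes
    ps = gauss(d, mu_s, var_s)
    pw = gauss(d, d_prev, var_w)
    pl = 1.0                      # uniform "lost" density on [0, 1]
    # Ownership probabilities (E-step)
    tot = m[0] * ps + m[1] * pw + m[2] * pl
    o = (m[0] * ps / tot, m[1] * pw / tot, m[2] * pl / tot)
    # Exponentially smoothed moments (M-step)
    m_new = tuple(alpha * oi + (1 - alpha) * mi for oi, mi in zip(o, m))
    M1 = alpha * d * o[0] + (1 - alpha) * mu_s * m[0]
    M2 = alpha * d * d * o[0] + (1 - alpha) * (var_s + mu_s ** 2) * m[0]
    mu_new = M1 / m_new[0]
    var_new = max(M2 / m_new[0] - mu_new ** 2, 1e-4)
    return m_new, mu_new, var_new
```

Observations that keep matching the stable mean increase the stable mixing weight, which is how the model accumulates a persistent appearance while remaining robust to occlusion.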
Tracking is posed as recursive MAP estimation (equations reconstructed from the garbled slide):

$$P(x_t \mid I_1,\ldots,I_t) \propto P(I_t \mid x_t)\; P(x_t \mid x_{t-1}, x_{t-2}), \qquad x_t \in \mathbb{R}^2$$

where $P(I_t \mid x_t)$ is the likelihood and the prior prefers slow motion (small $\|x_t - x_{t-1}\|^2$) and smooth motion (small $\|x_t - 2x_{t-1} + x_{t-2}\|^2$).
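The slow-and-smooth prior can be scored directly over candidate positions; a 1-D sketch with illustrative weights:

```python
# Hedged sketch of the slow-and-smooth motion prior: penalize both the
# displacement and the change in velocity of a candidate position.
def prior_cost(x_t, x_tm1, x_tm2, lam_slow=1.0, lam_smooth=1.0):
    slow = (x_t - x_tm1) ** 2                      # prefer small motion
    smooth = (x_t - 2 * x_tm1 + x_tm2) ** 2        # prefer constant velocity
    return lam_slow * slow + lam_smooth * smooth

# With history 0, 1 the constant-velocity continuation x_t = 2 minimizes
# the smoothness term; the slowness term pulls the optimum back toward 1.
costs = {x: prior_cost(x, 1.0, 0.0) for x in [1.0, 1.5, 2.0, 3.0]}
best = min(costs, key=costs.get)
```

In a full tracker this cost would be added to the (negative log) image likelihood before picking the best candidate.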
Major Components of a Tracker: adaptive background modeling, foreground detection, frame-to-frame tracking, and a state machine. Intermediate products: a filtered representation, a filtered background model, and the detected change.
System state at time t At time (t+1)
System state at time (t+1)
Object (green box) as seen at time t (latest model of appearance). Object appearance as learnt from the recent past (learnt model of appearance). Probabilistic visibility mask: the brighter the pixel, the more likely it belongs to the object.
motion estimation appearance estimation visibility estimation
Latest appearance model Updated learnt model Updated visibility mask
Tracker Block Diagram
Occlusion handling
System state at time t=1, t=8, t=16, and t=27. Occlusion is detected at t=8. Note that the learnt model is much more immune to occlusions than the latest model. The appearance models and visibility mask remain frozen at t=8 while the occlusion lasts. When the object reappears after the occlusion, the appearance models and visibility mask are updated again.
Sample Progress of the Tracker
– Resilient to environmental effects like wind-induced motion, heat-induced scintillation, etc.
– Tunable for different scenarios: outdoors, indoors.
– Automatically adapts to smooth and sudden changes of appearance. – Automatically weights appearance and shape matching. – Precise motion estimation based on optical flow.
– Handles occlusions, and confusing events with multiple objects.
3D Tracking with Presence of Clutter and Multi-Camera Handoff
Video of Camera 1 and Camera 2 Camera 2 Handing-off from camera 1 to camera 2 Camera 1
3D Tracking in Outdoor Scenarios
Original video; video with the entire mob being tracked simultaneously. Each color represents a different person in the image. Depth is used to distinguish between people and their shadows. (Depth map video.)
3D Tracking in Outdoor Scenarios
Original video Video with people and vehicles being tracked simultaneously Each color represents a different person/vehicle in the image Depth Map Video