 
Model-based Visual Tracking: the OpenTL framework
Dr.-Ing. Giorgio Panin
Technische Universität München · Institut für Informatik · Lehrstuhl für Echtzeitsysteme und Robotik (Prof. Alois Knoll)

Contents
• Object tracking: theory
• Building applications with OpenTL
Object tracking G. Panin 27.04.2010
Goal: multi-target / multi-sensor / multi-modal localization
[Figure: objects O_1, O_2 and cameras C_1, ..., C_i in the world frame W]

Model-based tracking
[Diagram labels: Visual processing, Model, Localization (tracking); applications: Video surveillance, Control/navigation, Face tracking, ...]

Ground shape
[Figure: objects O_1 and O_2 in the world frame W]
Pose parameters – single-body transforms (2D and 3D)
Invariant properties of each transform:
• Euclidean: distances, angles, parallel lines, straight lines
• Similarity: angles, parallel lines, straight lines
• Affine: parallel lines, straight lines
• Homography: straight lines
[Figure: a base shape under each transform]
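Pose parameters – articulated body
[Figure: kinematic chain from the world frame W through links l_0 ... l_3, with transforms $T_{W,l_0}$, $T_{l_0,l_1}$, $T_{l_1,l_2}$, $T_{l_2,l_3}$ and parameter $\rho$]
$x_W = T_{W,l_3}\, x_{l_3}$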
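A minimal sketch of how the chain of link transforms on this slide could be composed, assuming a planar arm with revolute joints. The helper names, the joint angles and the unit link lengths are hypothetical and not taken from OpenTL.

```python
import math

def link_transform(angle, length):
    """Homogeneous 2D transform from a child link frame to its parent frame.

    Assumed convention: rotate by the joint angle, then move `length`
    along the rotated x axis to reach the child frame origin.
    """
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, c * length],
            [s, c, s * length],
            [0.0, 0.0, 1.0]]

def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def chain_to_world(transforms):
    """Compose T_{W,l0} T_{l0,l1} ... into the single transform T_{W,ln}."""
    total = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    for t in transforms:
        total = matmul(total, t)
    return total

def apply(t, point):
    """Map a 2D point through a homogeneous transform."""
    x, y = point
    return (t[0][0] * x + t[0][1] * y + t[0][2],
            t[1][0] * x + t[1][1] * y + t[1][2])

# Hypothetical pose: base frame l0 at the world origin, three unit links.
T_W_l0 = link_transform(0.0, 0.0)
joint_angles = [math.pi / 2, -math.pi / 2, 0.0]
chain = [T_W_l0] + [link_transform(a, 1.0) for a in joint_angles]

T_W_l3 = chain_to_world(chain)
x_W = apply(T_W_l3, (0.0, 0.0))  # origin of link l3 expressed in W
```

Here the pose parameters are the joint angles; changing one angle moves every link further down the chain.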
Pose parameters – deformable shapes
[Figure: Link 0 shown with axes x, y, z at the reference pose $T_{W,0}(0)$ and at the pose $T_{W,0}(p)$ in the world frame W]

Active Shape Model (2D) – face tracking
Learning deformation modes from examples (PCA)

Object appearance
• Key-frames
• Texture map
• Active Appearance Model
Object dynamics
2nd-order auto-regressive model (damped-spring motion) with process noise:
$x_t - \bar{x} = F_1 (x_{t-1} - \bar{x}) + F_2 (x_{t-2} - \bar{x}) + W_0 w_t$
Motion type: specified by three main parameters
• Average state: $\bar{x}$
• Oscillation frequency: $f$
• Damping rate: $\beta$
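The recursion on this slide can be simulated directly. The sketch below assumes a scalar state and unit-variance Gaussian noise w_t; the function name, the seed and the average state 5.0 are hypothetical choices for illustration.

```python
import random

def simulate_ar2(f1, f2, w0, mean, steps, seed=0):
    """Simulate x_t - mean = f1 (x_{t-1} - mean) + f2 (x_{t-2} - mean) + w0 w_t.

    The two initial states are set to the average state `mean`.
    """
    rng = random.Random(seed)
    x_prev1 = x_prev2 = mean
    path = []
    for _ in range(steps):
        noise = rng.gauss(0.0, 1.0)  # assumed unit-variance process noise
        x = mean + f1 * (x_prev1 - mean) + f2 * (x_prev2 - mean) + w0 * noise
        path.append(x)
        x_prev2, x_prev1 = x_prev1, x
    return path

# Coefficients of the form (F_1, F_2, W_0); the average state is made up.
trajectory = simulate_ar2(1.98, -0.99, 0.88, mean=5.0, steps=500)
```

With W_0 = 0 and both initial states at the average state, the trajectory stays at the average state, which is a quick sanity check of the sign conventions.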
Object dynamics - examples
• Brownian motion ($f$, $\beta$ undefined): $F_1 = 1$, $F_2 = 0$, $W_0 = 0.1$
• White-noise acceleration ($f = 0$, $\beta = 0$): $F_1 = 2$, $F_2 = -1$, $W_0 = 0.1$
• Periodic, undamped ($f = 0.1$, $\beta = 0$): $F_1 = 1.99$, $F_2 = -1$, $W_0 = 0.1$
• Aperiodic, critically damped ($f = 0$, $\beta = 0.5$): $F_1 = 1.9$, $F_2 = -0.9$, $W_0 = 2.12$
• Periodic, damped ($f = 0.1$, $\beta = 0.05$): $F_1 = 1.98$, $F_2 = -0.99$, $W_0 = 0.88$
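Object dynamics – multi-dimensional
$s_t = F s_{t-1} + (I - F)\,\bar{s} + W w_t$
Constrained: $F = \begin{pmatrix} 1.9 I & -0.9 I \\ I & 0 \end{pmatrix}$, $W = \begin{pmatrix} 2.12 I \\ 0 \end{pmatrix}$
Unconstrained: $F = \begin{pmatrix} I & 0 \\ I & 0 \end{pmatrix}$, $W = \begin{pmatrix} 2.12 I \\ 0 \end{pmatrix}$
[Figure: sample trajectories at t=50, t=1000 and t=10000]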
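As a small check of the stacked form, the sketch below assumes the state s_t = (x_t, x_{t-1}) with a one-dimensional x, so each block I becomes the scalar 1. It compares one step of the stacked update with one step of the scalar 2nd-order recursion; all function names and numeric inputs are hypothetical.

```python
def scalar_ar2_step(x_prev1, x_prev2, mean, f1, f2, w0, noise):
    """One step of x_t - mean = f1 (x_{t-1} - mean) + f2 (x_{t-2} - mean) + w0 w_t."""
    return mean + f1 * (x_prev1 - mean) + f2 * (x_prev2 - mean) + w0 * noise

def stacked_step(s_prev, s_mean, F, W, noise):
    """One step of s_t = F s_{t-1} + (I - F) s_mean + W w_t for list-based F, W."""
    n = len(s_prev)
    s_next = []
    for i in range(n):
        f_s = sum(F[i][j] * s_prev[j] for j in range(n))
        # Row i of (I - F) applied to s_mean.
        offset = s_mean[i] - sum(F[i][j] * s_mean[j] for j in range(n))
        s_next.append(f_s + offset + W[i] * noise)
    return s_next

# "Constrained" matrices from the slide, with I replaced by the scalar 1.
F = [[1.9, -0.9],
     [1.0, 0.0]]
W = [2.12, 0.0]

mean = 5.0                # hypothetical average state
s_prev = [6.0, 5.5]       # (x_{t-1}, x_{t-2}), made-up values
noise = 0.3               # one fixed noise sample, for comparison only

s_next = stacked_step(s_prev, [mean, mean], F, W, noise)
x_next = scalar_ar2_step(s_prev[0], s_prev[1], mean, 1.9, -0.9, 2.12, noise)
```

The first component of `s_next` matches `x_next`, and the second component simply carries x_{t-1} forward.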
Object model
[Diagram labels: Shape, Appearance, Pose, Dynamics; indices t_0, t_1, t_2, t_3, ..., t_i, ..., t_k]
Camera model: intrinsic parameters
• Pin-hole model
• Radial distortion
[Figure: pin-hole camera with center C, camera axes x_c, y_c, z_c (optical axis), focal length f, and the retinal plane with image coordinates x, y]
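A minimal sketch of a pin-hole projection with radial distortion. The slide does not specify a distortion model, so a single-coefficient polynomial model is assumed here; the function name, the coefficient `k1` and the intrinsic values are hypothetical.

```python
def project_pinhole(point_cam, f, cx, cy, k1=0.0):
    """Project a 3D point in camera coordinates to pixel coordinates.

    f       focal length in pixels
    cx, cy  principal point
    k1      assumed single radial distortion coefficient (0 = ideal pin-hole)
    """
    x, y, z = point_cam
    if z <= 0.0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    xn, yn = x / z, y / z                # normalized image coordinates
    r2 = xn * xn + yn * yn               # squared distance from the optical axis
    scale = 1.0 + k1 * r2                # radial distortion factor
    return (f * scale * xn + cx, f * scale * yn + cy)

# Hypothetical intrinsics for a 640x480 image.
ideal = project_pinhole((1.0, 0.0, 2.0), f=500.0, cx=320.0, cy=240.0)
distorted = project_pinhole((1.0, 0.0, 2.0), f=500.0, cx=320.0, cy=240.0, k1=0.1)
```

A point on the optical axis always maps to the principal point, whatever the distortion coefficient.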
Camera model: extrinsic parameters
[Figure: cameras c_1 and c_2 related to the world frame w by the transforms $T_{c_1,w}$ and $T_{c_2,w}$; world point $x_w$]
Camera model: object-to-image projection (and back-projection)
[Figure: object O, world frame W and camera C linked by $T_{w,o}$ and $T_{c,w}$; image points y_1, y_2; depth map]
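Projection and back-projection can be sketched as a round trip, assuming an ideal pin-hole camera, a rigid world-to-camera transform (rotation plus translation), and a known depth for the pixel (as a depth map would provide). All names and numbers below are hypothetical.

```python
def mat_vec(m, v):
    """Multiply a 3x3 matrix (nested lists) by a 3-vector."""
    return tuple(sum(m[i][j] * v[j] for j in range(3)) for i in range(3))

def transpose(m):
    return [[m[j][i] for j in range(3)] for i in range(3)]

def world_to_camera(rot, trans, p_world):
    """Apply the extrinsic transform: p_cam = R p_world + t."""
    r = mat_vec(rot, p_world)
    return tuple(r[i] + trans[i] for i in range(3))

def camera_to_world(rot, trans, p_cam):
    """Invert the extrinsic transform: p_world = R^T (p_cam - t)."""
    shifted = tuple(p_cam[i] - trans[i] for i in range(3))
    return mat_vec(transpose(rot), shifted)

def project(p_cam, f, cx, cy):
    """Pin-hole projection; also returns the depth needed to invert it."""
    x, y, z = p_cam
    if z <= 0.0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return (f * x / z + cx, f * y / z + cy, z)

def back_project(u, v, depth, f, cx, cy):
    """Recover the 3D camera-frame point from a pixel and its depth."""
    return ((u - cx) * depth / f, (v - cy) * depth / f, depth)

# Hypothetical extrinsics: a 90 degree rotation about y plus a shift along z.
ROT = [[0.0, 0.0, 1.0],
       [0.0, 1.0, 0.0],
       [-1.0, 0.0, 0.0]]
TRANS = (0.0, 0.0, 5.0)

p_world = (1.0, 2.0, 3.0)
u, v, depth = project(world_to_camera(ROT, TRANS, p_world), 500.0, 320.0, 240.0)
p_back = camera_to_world(ROT, TRANS, back_project(u, v, depth, 500.0, 320.0, 240.0))
```

Without the depth value, a pixel only determines a viewing ray, which is why the back-projection needs the depth map.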
Visual modalities
• Shape moments
• Intensity gradients
• Contour lines
• Color statistics
• Texture template
• Optical flow
• Local keypoints
(and others: background subtraction, CCD, Harris keypoints, histogram of oriented gradients, SIFT)
The "tracking pipeline"
[Diagram blocks: Data acquisition, Pre-processing, Targets prediction, Off-line features sampling, Matching, Data fusion, Targets update, On-line features update, Output]
[Figure labels: predicted state $s_t^-$ (Prediction), Rendered view, Sampling model features, Re-projection, Image features, Matching, updated state $s_t$, Back-projection, New features, Update model features]
Abstraction: visual modality processing
Rule: any modality class must implement
• Model-free pre-processing
• Off-line and on-line feature sampling and back-projection
• Multi-level data association (pixel-, feature-, state-space)
• Computation of residuals, covariances and Jacobians
Additional classes:
• Multi-modal, multi-sensor data fusion (cascade, parallel)
• Likelihood computation

Features sampling – GPU assisted
Multi-level visual processing
Residual: $e = z - h$
• Pixel-level measurement: $z$ compared with $h(s)$
• Feature-level measurement: $z$ compared with $h(s)$
• Object-level measurement: $z = s^*$ (LSE estimate), $h = s^-$ (predicted pose)
Measurement: pixel- vs. feature-level
Analogy with fluid mechanics:
• Eulerian = pixel-level: dense optical flow (HS, Horn-Schunck)
• Lagrangian = feature-level: sparse optical flow (LK, Lucas-Kanade)

Feature-level: re-projection vs. tracking
• Model feature re-projection
• Feature tracking (= "flow")
Feature-level: validation gates (local search)
• Prior density (state space): $(s^-, P^-)$
• Innovation densities (measurement space): $(y_i^-, S_i)$, shown for $(y_1^-, S_1)$, $(y_2^-, S_2)$, $(y_3^-, S_3)$
[Figure: model points x_1, x_2, x_3 with their validation gates in the image]
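One common way to implement a validation gate is a Mahalanobis-distance test against the innovation density. The sketch below assumes 2D measurements and a chi-square threshold for 2 degrees of freedom at 99%; the slide does not state which gating rule OpenTL uses, and the names and numbers are hypothetical.

```python
CHI2_2DOF_99 = 9.21  # 99% quantile of the chi-square distribution, 2 dof

def mahalanobis_sq_2d(z, y_pred, S):
    """Squared Mahalanobis distance of measurement z from prediction y_pred.

    S is the 2x2 innovation covariance, given as nested lists.
    """
    dx, dy = z[0] - y_pred[0], z[1] - y_pred[1]
    a, b = S[0]
    c, d = S[1]
    det = a * d - b * c
    if det <= 0.0:
        raise ValueError("innovation covariance must be positive definite")
    # Uses the closed-form inverse of a 2x2 matrix: [[d, -b], [-c, a]] / det.
    return (dx * (d * dx - b * dy) + dy * (-c * dx + a * dy)) / det

def in_gate(z, y_pred, S, threshold=CHI2_2DOF_99):
    """True if the candidate measurement falls inside the validation gate."""
    return mahalanobis_sq_2d(z, y_pred, S) <= threshold

# Hypothetical predicted feature position and innovation covariance.
y1_pred = (10.0, 10.0)
S1 = [[4.0, 0.0],
      [0.0, 1.0]]

candidates = [(14.0, 10.0), (10.0, 14.0)]
accepted = [z for z in candidates if in_gate(z, y1_pred, S1)]
```

The gate is elongated along the direction of larger predicted uncertainty, so the same pixel offset is accepted along x and rejected along y in this example.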
Example: color histograms
• Model: reference color histogram
• Pixel-level matching: color segmentation
• Feature-level matching = histogram distance
• Object-level matching = mean-shift optimization
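The slide does not name the histogram distance. As one possible choice, the sketch below uses the Bhattacharyya-based distance on normalized one-dimensional histograms; the bin count, value range and sample values are hypothetical.

```python
import math

def normalized_histogram(values, bins, lo=0.0, hi=256.0):
    """Histogram of scalar values (e.g. hue or gray level), normalized to sum 1."""
    if not values:
        raise ValueError("need at least one sample")
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = int((v - lo) / width)
        idx = min(max(idx, 0), bins - 1)   # clamp out-of-range samples
        counts[idx] += 1
    return [c / len(values) for c in counts]

def bhattacharyya_distance(p, q):
    """sqrt(1 - sum_i sqrt(p_i q_i)); 0 for identical, 1 for disjoint histograms."""
    coefficient = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return math.sqrt(max(0.0, 1.0 - coefficient))

# Made-up samples: a model histogram and two candidate image regions.
model = normalized_histogram([10, 20, 30, 200], bins=8)
similar = normalized_histogram([12, 18, 25, 205], bins=8)
different = normalized_histogram([100, 110, 120, 130], bins=8)

d_similar = bhattacharyya_distance(model, similar)
d_different = bhattacharyya_distance(model, different)
```

A tracker would prefer the candidate region with the smaller distance to the model histogram.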
Example: intensity edges
• Pixel-level: draw the silhouette; match the silhouette to the distance transform
• Feature-level: sample contour points; re-project and search in the image
Building in OpenTL: multi-camera, multi-level data fusion
[Diagram labels: View 1, View 2; modalities: Color segmentation, Motion, Blobs, Edges, Keypoints; processing levels: Pix, Feat, Obj; fusion blocks: Weighted average, Joint MLE; input: predicted state $s^-$]
Joint likelihood: $P(Z \mid s^-) = P(Z_{blobs} \mid s^-) \cdot P(Z_{obj} \mid s^-)$
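The joint likelihood is a product of per-modality likelihoods, which is usually evaluated as a sum of log-likelihoods to avoid numerical underflow. The sketch below assumes scalar residuals with Gaussian noise for each modality; the residuals and standard deviations are made-up numbers.

```python
import math

def gaussian_log_likelihood(residual, sigma):
    """Log of a zero-mean 1D Gaussian density evaluated at `residual`."""
    return -0.5 * (residual / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))

def joint_log_likelihood(log_likelihoods):
    """Independent modalities: log P(Z | s) = sum of log P(Z_k | s)."""
    return sum(log_likelihoods)

# Hypothetical residuals of two modalities for one state hypothesis s^-.
log_p_blobs = gaussian_log_likelihood(residual=1.5, sigma=2.0)
log_p_obj = gaussian_log_likelihood(residual=0.4, sigma=0.5)

log_p_joint = joint_log_likelihood([log_p_blobs, log_p_obj])
p_joint = math.exp(log_p_joint)
```

The product form relies on the modalities being conditionally independent given the state.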
Building processing trees in OpenTL
Generalization of the tree
[Diagram labels: images I_1, I_2; modalities Mod 1 ... Mod 5; processing levels: Pix, Feat, Obj; Static fusion, Dynamic fusion; input $s^-$; output $P(Z \mid s^-)$]
Data fusion – multi-modal
Example: background model and color model, fused pixel-wise (AND)
Benefits:
• Combine independent information sources
• Increase robustness (tracking fails only if ALL modalities fail)
Drawbacks:
• Need to define a proper fusion scheme and its parameters
• Higher computational effort, hence a slower frame rate
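A pixel-wise AND of two binary masks can be sketched as follows, assuming each modality has already produced a boolean foreground mask of the same size. The masks and the function name are hypothetical.

```python
def fuse_and(mask_a, mask_b):
    """Pixel-wise AND of two boolean masks given as lists of rows."""
    if len(mask_a) != len(mask_b) or any(
            len(ra) != len(rb) for ra, rb in zip(mask_a, mask_b)):
        raise ValueError("masks must have the same shape")
    return [[a and b for a, b in zip(ra, rb)]
            for ra, rb in zip(mask_a, mask_b)]

# Made-up 2x3 masks: background subtraction and color segmentation.
background_mask = [[True, True, False],
                   [False, True, True]]
color_mask = [[True, False, False],
              [False, True, False]]

fused = fuse_and(background_mask, color_mask)
```

A pixel survives only if both modalities agree, which suppresses false positives of either one at the cost of also dropping pixels that one modality misses.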
Data fusion – multi-camera
• Complementary setup: indoor people tracking
• Redundant setup: 3D hand tracking
Multi-target – occlusion handling
• Pixel-level (multi-class segmentation)
• Feature-level
Bayesian tracking: prediction - correction
Gaussian filters
• (Extended) Kalman filter
• Information filter
• Unscented Kalman/Information filter
Monte-Carlo filters
• S-I-R particle filter
• MCMC particle filter
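The prediction-correction cycle is easiest to see in a scalar Kalman filter. The sketch below assumes a one-dimensional state with random-walk dynamics and a direct measurement of the state; the noise variances and measurements are made-up values, and this is not the OpenTL filter interface.

```python
def kf_predict(mean, var, process_var):
    """Prediction: random-walk dynamics keep the mean and inflate the variance."""
    return mean, var + process_var

def kf_correct(mean, var, z, meas_var):
    """Correction: blend the prediction with measurement z via the Kalman gain."""
    gain = var / (var + meas_var)
    new_mean = mean + gain * (z - mean)
    new_var = (1.0 - gain) * var
    return new_mean, new_var

def kf_track(measurements, mean0, var0, process_var, meas_var):
    """Run predict/correct over a sequence and return the posterior means."""
    mean, var = mean0, var0
    estimates = []
    for z in measurements:
        mean, var = kf_predict(mean, var, process_var)
        mean, var = kf_correct(mean, var, z, meas_var)
        estimates.append(mean)
    return estimates

# One correction step with equal prior and measurement uncertainty.
posterior_mean, posterior_var = kf_correct(mean=0.0, var=1.0, z=2.0, meas_var=1.0)

# Hypothetical noisy measurements of a roughly constant position.
estimates = kf_track([1.1, 0.9, 1.2, 1.0], mean0=0.0, var0=10.0,
                     process_var=0.01, meas_var=0.5)
```

With equal variances the gain is 0.5, so the posterior mean lies halfway between the prediction and the measurement.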
Representing the target distribution
• Prior density: $P(s_t \mid z_{t-1}, \ldots, z_0)$
• Posterior density: $P(s_t \mid z_t, z_{t-1}, \ldots, z_0)$
[Figure: the true density and its representations: ML / MAP, Kalman, mixture of Gaussians, unscented Kalman, kernel, particle, histogram (Eulerian)]
Flow diagram of OpenTL-based applications
[Flow diagram labels: Track initiation: image $I_t$, Detection/Recognition, $Obj_{t-1}$; Track maintenance: image $I_t$, Local processing, $Meas_t$, Bayesian tracking, $Obj_t^-$, $Obj_t$, Post-processing; time step $t + \Delta t$]
Models: degrees of freedom, dynamics, shape, appearance, sensors, environment
Object detection (examples)
• General-purpose Monte-Carlo sampling in state-space
• People detection based on foreground blobs clustering
• Invariant keypoints matching (for textured objects)
• Marker detection based on intensity edges
• Hand detection based on color and edge lines
• Face detection based on a trained classifier (with Haar features)
Summary: model-based object tracking
[Diagram labels: Models (Sensors, Objects, Environment); Object tracking: Target prediction, Measurement, Target update; Visual processing: Pre-processing, Features sampling, Data association, Data fusion, Occlusion handling; Detection/Recognition]
Building applications with OpenTL
Features of OpenTL
• Modularized, object-oriented software architecture
• Common abstractions for layers
• Real-time performance
• Different Bayesian filters
• A large variety of visual modalities, with multiple processing levels
• Robustness improvements (multi-hypotheses, data fusion, …)
• Generic sensor abstraction
• Support for multi-camera, multi-target and multi-modal applications
• Support for GPU acceleration