Multi-object tracking (MOT): visual and audio-visual

Daniel Gatica-Perez (joint work with Kevin Smith, Guillaume Lathoud, Iain McCowan, Jean-Marc Odobez), IDIAP Research Institute, Martigny, Switzerland


  1. Multi-object tracking (MOT): visual and audio-visual. Daniel Gatica-Perez (joint work with Kevin Smith, Guillaume Lathoud, Iain McCowan, Jean-Marc Odobez), IDIAP Research Institute, Martigny, Switzerland

  2. Outline
     - MOT using Particle Filters
     - Our work
       - Visual MOT with Distributed Partitioned Sampling [Smith et al., BMVC’04]
       - Audio-Visual MOT [Gatica et al., in preparation]
     - Conclusion

  3. MOT as Bayesian inference
     - The problem: given
       - image observations $y_{1:t}$
       - a state-space multi-object (MO) representation $x_t$ with continuous components $x_t^i \in \mathbb{R}^{N_x}$ and discrete components $k_t^i \in \mathbb{I}^{N_k}$
       - object state: geometric transformations; discrete indices: head pose, speaking status
     - compute the posterior $p(x_{0:t} \mid y_{1:t})$ or the filtering distribution $p(x_t \mid y_{1:t})$

  4. Joint state space representation
     - M objects: a joint state
     - Object state vector $x^j$ with four components: translation $(u^j, v^j)$, a scaling parameter, and a discrete spk/no-spk index
     - MO joint configuration: $X_t = (M, x_t^{1:M}) = (M, x_t^1, \ldots, x_t^M)$
     - Figure: three tracked objects labelled with their state vectors $x^1$, $x^2$, $x^3$

  5. The basic MOT joint tracker
     - Assumptions:
       - each object has its own dynamics
       - objects are marginally independent, but conditionally dependent given the observations (explaining away)
     - Graphical model (two objects): state chains $x_{t-1}^1, x_t^1, x_{t+1}^1$ and $x_{t-1}^2, x_t^2, x_{t+1}^2$, with observations $y_{t-1}, y_t, y_{t+1}$
     - Joint distribution: $p(x_{0:t}, y_{1:t}) = p(x_0^1)\, p(x_0^2) \prod_{n=1}^{t} p(x_n^1 \mid x_{n-1}^1)\, p(x_n^2 \mid x_{n-1}^2)\, p(y_n \mid x_n^{1:2})$

  6. Particle Filters for MOT
     - Filtering distribution: $p(x_t \mid y_{1:t}) \propto p(y_t \mid x_t) \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid y_{1:t-1})\, dx_{t-1}$
     - Approximated with the particle set $\{(x_t^{(i)}, w_t^{(i)}),\ i = 1, \ldots, N\}$ by $\hat{p}_N(x_t \mid y_{1:t}) = \sum_{i=1}^{N} w_t^{(i)}\, \delta(x_t - x_t^{(i)})$
     - Steps:
       1. resample
       2. prediction (predictive distribution $p(x_{t+1} \mid y_{1:t})$): $p(x_t \mid x_{t-1}) = \prod_{z=1}^{M} p(x_t^z \mid x_{t-1}^z)$
       3. likelihood (updated estimate $\hat{p}_N(x_t \mid y_{1:t})$): $p(y_t \mid x_t) = \prod_{z=1}^{M} p(y_t^z \mid x_t^z)$
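
The three steps on this slide can be sketched as a bootstrap particle filter. This is a minimal illustration, not the authors' tracker: it assumes M independent one-dimensional objects, Gaussian random-walk dynamics, and a Gaussian per-object likelihood. The function name `bootstrap_pf_step` and the parameters `dyn_std` and `obs_std` are hypothetical.

```python
import numpy as np

def bootstrap_pf_step(particles, weights, y, rng, dyn_std=0.5, obs_std=1.0):
    """One resample / predict / reweight cycle (toy model, assumed here).

    particles: (N, M) array, one row per joint particle of M 1-D objects.
    weights:   (N,) normalized particle weights.
    y:         (M,) observation, one noisy position per object.
    """
    n, m = particles.shape
    # 1. Resample: draw N indices in proportion to the current weights.
    idx = rng.choice(n, size=n, p=weights)
    particles = particles[idx]
    # 2. Prediction: each object follows its own random-walk dynamics,
    #    so the joint transition is a product over objects.
    particles = particles + rng.normal(0.0, dyn_std, size=(n, m))
    # 3. Likelihood: product over objects of Gaussian terms (log space).
    log_w = -0.5 * np.sum(((y - particles) / obs_std) ** 2, axis=1)
    w = np.exp(log_w - log_w.max())  # subtract max for numerical stability
    return particles, w / w.sum()

rng = np.random.default_rng(0)
particles = rng.normal(0.0, 5.0, size=(2000, 2))   # broad initial guess
weights = np.full(2000, 1.0 / 2000)
observation = np.array([2.0, -3.0])
for _ in range(5):
    particles, weights = bootstrap_pf_step(particles, weights, observation, rng)
estimate = weights @ particles                     # weighted posterior mean
print(estimate)
```

Because every particle carries all M objects at once, the number of particles needed grows quickly with M, which is the issue the next slide raises.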

  7. Complexity for Joint State Space
     - More objects: cost increases exponentially, $N = N_1^M$ (figure labels: $N_1$, $N_1^2$, $N_1^3$)
     - Solution: sample more efficiently
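
As a small worked example of the exponential growth, assume one object needs `n1` particles and that the joint space needs `n1 ** m` for `m` objects. Both the growth law as read from the slide and the value 10 are assumptions for illustration, and `joint_particles_needed` is a hypothetical helper.

```python
def joint_particles_needed(n1: int, m: int) -> int:
    """Particles for m objects if one object needs n1 (assumes N = N_1^M)."""
    return n1 ** m

# Growth for 1 to 4 objects with an assumed 10 particles per object.
for m in range(1, 5):
    print(m, joint_particles_needed(10, m))
```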

  8. Distributed Partitioned Sampling (DPS) for visual MOT

  9. Partitioned Sampling (PS) [MacCormick, Isard, Blake, ECCV 2000]
     - Reduces the size of the search space
     - Searches each object's state sequentially
     - Samples are moved to areas of high likelihood
     - Example: configuration space of two one-dimensional objects (figure axes: $x^A$, $x^B$)

  10. Partitioned Sampling (PS)
      - Divide the space into M subspace partitions; search each sequentially
      - Pipeline: prior $p(X_{t-1} \mid Y_{1:t-1})$ → resampling → dynamics $\sim p(x'_t \mid x_{t-1})$ → weighted resampling $\sim g$ → likelihood $p(Y_t \mid X_t)$ → posterior $p(X_t \mid Y_{1:t})$; the dynamics and weighted-resampling block repeats for M objects
      - Weighted resampling
        - importance function g
        - importance sampling ("IS") using the observation likelihood
        - adverse effects: impoverishment, bias
      - Figure legend: distribution, particle representation
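
A minimal sketch of the weighted resampling operation, under the usual definition: particles are drawn in proportion to an importance function g and reweighted by w / g, so the weighted set still represents the same distribution while the particles concentrate where g is large. The function name `weighted_resample` and the toy choice of g are assumptions for illustration.

```python
import numpy as np

def weighted_resample(particles, weights, g_values, rng):
    """Resample in proportion to g, then correct the weights by w / g.

    g_values must be strictly positive, one value per particle.
    """
    n = len(particles)
    rho = g_values / g_values.sum()        # sampling probabilities from g
    idx = rng.choice(n, size=n, p=rho)     # draw N indices with prob. rho
    new_w = weights[idx] / rho[idx]        # undo the bias introduced by g
    return particles[idx], new_w / new_w.sum()

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 5000)            # toy 1-D particle set
w = np.full(5000, 1.0 / 5000)
g = 0.5 + x                                # assumed g, favours large x
new_x, new_w = weighted_resample(x, w, g, rng)
print(new_x.mean(), new_w @ new_x)         # particles shift; weighted mean does not
```

After the call, more particles sit at large x, but the weighted mean stays close to the original one. With a sharply peaked g, a few surviving low-g particles receive very large weights, which relates to the impoverishment and bias listed on the slide.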

  11. PS: Ordering and Impoverishment
      - Weighted resampling effects:
        - Impoverishment: loss of multi-modality
        - Bias: poor tracking quality
      - In general, the ordering of objects is arbitrary
      - More objects, greater effect
      - Figure: impoverishment and bias by position in the ordering (object # 1 to 7)

  12. Distributed Partitioned Sampling (DPS)
      - The particle set is divided into subsets (mixture components $1, \ldots, C$)
      - Each subset: PS in a different ordering (circular shift of the object order)
      - Pipeline: prior $p(X_{t-1} \mid Y_{1:t-1})$ → resampling → per component $c$: dynamics $\sim p(x'_t \mid x_{t-1})$ and weighted resampling $\sim g_c$ (block repeats for M objects) → assemble → likelihood $p(Y_t \mid X_t)$ → posterior $p(X_t \mid Y_{1:t})$
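
The idea of running PS with a different circular shift per subset can be sketched as follows. This is a toy illustration under stated assumptions, not the method of [Smith et al., BMVC’04]: objects are one-dimensional, dynamics and likelihoods are Gaussian, g is taken to be the per-object likelihood, and each subset receives mixture mass proportional to its size. All function and parameter names are hypothetical.

```python
import numpy as np

def circular_orderings(m, c):
    """Return c object orderings, each a circular shift of 0..m-1."""
    shifts = [(k * m) // c for k in range(c)]
    return [[(s + j) % m for j in range(m)] for s in shifts]

def ps_pass(particles, weights, order, y, rng, dyn_std=0.5, obs_std=1.0):
    """Partitioned-sampling pass over the objects in the given order."""
    n = len(particles)
    particles = particles.copy()
    for j in order:
        # Dynamics for object j only.
        particles[:, j] += rng.normal(0.0, dyn_std, size=n)
        # Weighted resampling with g = likelihood of object j alone.
        g = np.exp(-0.5 * ((y[j] - particles[:, j]) / obs_std) ** 2) + 1e-300
        rho = g / g.sum()
        idx = rng.choice(n, size=n, p=rho)
        particles = particles[idx]
        weights = weights[idx] / rho[idx]
        weights = weights / weights.sum()
    # Final reweighting by the joint likelihood of all objects.
    lik = np.exp(-0.5 * np.sum(((y - particles) / obs_std) ** 2, axis=1)) + 1e-300
    return particles, weights * lik        # weights returned unnormalized

def dps_step(particles, weights, y, rng, c=2):
    """Split particles into c subsets, run PS with a shifted order on each."""
    n, m = particles.shape
    # Resample the prior so each subset starts from equally weighted particles.
    particles = particles[rng.choice(n, size=n, p=weights)]
    out_p, out_w = [], []
    subsets = np.array_split(np.arange(n), c)
    for idx, order in zip(subsets, circular_orderings(m, c)):
        sub_w = np.full(len(idx), 1.0 / len(idx))
        p, w = ps_pass(particles[idx], sub_w, order, y, rng)
        out_p.append(p)
        out_w.append(w / w.sum() * len(idx) / n)   # assumed mass: n_c / n
    # Assemble the mixture components into one particle set.
    return np.concatenate(out_p), np.concatenate(out_w)

print(circular_orderings(4, 2))
```

With `c` equal to the number of objects, every object appears once in each position of the ordering, which is one way to spread the ordering effects evenly across objects.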

  13. Results (200 particles; examples taken from 50 runs per scenario). Figure panels, shown in two groups: Joint PF, PS, DPS

  14. Audio-visual MOT

  15. Audio-visual observation model
      - Visual 1: contour-based (wire on clutter), edges on normal lines
      - Visual 2: skin-blob-based
        - precision/recall between configuration and skin blobs
        - GMM on features
      - Audio: switching distribution around 2-D audio estimates $(u_t^{est}, v_t^{est})$. The likelihood $p(y_t^{audio} \mid x_t^{(i)})$ is set by constants $K_1$ (spk case) and $K_2$ (no_spk case), selected by comparing the squared distance $(u_t^{(i)} - u_t^{est})^2 + (v_t^{(i)} - v_t^{est})^2$ with $R^2$ and by the speaking status of particle $i$

  16. Sampling using MCMC
      - Metropolis-Hastings (MH) sampler
      - Posterior as target distribution
      - Better candidates are almost always accepted
      - Yields particles where all objects have good guesses
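
For reference, a generic random-walk Metropolis-Hastings sampler looks like the sketch below. It is not the PF-MCMC tracker from the slides: the target is an assumed two-dimensional Gaussian standing in for the posterior, the proposal is a symmetric Gaussian step, and the names `metropolis_hastings`, `log_target`, and `step` are hypothetical.

```python
import numpy as np

def metropolis_hastings(log_target, x0, n_samples, rng, step=1.0):
    """Random-walk MH over a 1-D state vector x0; returns samples and acceptance rate."""
    x = np.asarray(x0, dtype=float)
    log_p = log_target(x)
    samples = np.empty((n_samples, x.size))
    accepted = 0
    for i in range(n_samples):
        # Symmetric random-walk proposal around the current state.
        candidate = x + rng.normal(0.0, step, size=x.shape)
        log_p_cand = log_target(candidate)
        # Accept with probability min(1, p(candidate) / p(x)).
        if np.log(rng.uniform()) < log_p_cand - log_p:
            x, log_p = candidate, log_p_cand
            accepted += 1
        samples[i] = x                     # a rejected move repeats the old state
    return samples, accepted / n_samples

rng = np.random.default_rng(0)
mode = np.array([2.0, -3.0])               # assumed posterior mode (toy target)
log_post = lambda x: -0.5 * np.sum((x - mode) ** 2)
samples, rate = metropolis_hastings(log_post, [0.0, 0.0], 20000, rng)
print(samples[2000:].mean(axis=0), rate)   # discard burn-in before averaging
```

With a symmetric proposal as assumed here, a candidate with higher posterior density than the current state is always accepted, while worse candidates are accepted with probability equal to the density ratio.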

  17. Results (1)
      - Joint PF, contour-only likelihood, 2000 particles
      - Joint PF, contour-blob likelihood, 1000 particles

  18. Results (2)
      - Joint PF-MCMC, contour-blob likelihood, 500 particles
      - Joint PF-MCMC, contour-blob likelihood, 500 particles, visual clutter

  19. Conclusion
      - Visual tracking
        + DPS improves MOT because ordering matters
        + fairly distributes ordering effects
        + retains the computational benefits of PS
        - not so good for low numbers of particles (e.g. <100)
      - Audio-visual tracking
        + blob likelihood improves robustness
        + joint audio-visual likelihood allows for fast spk/non-spk switching
        + MCMC reduces complexity
        + currently: (re)-initialization
        + later: extension to more complex models
