Multi-object tracking (MOT): visual and audio-visual Daniel - PowerPoint PPT Presentation

Multi-object tracking (MOT): visual and audio-visual Daniel Gatica-Perez (joint work with Kevin Smith, Guillaume Lathoud, Iain McCowan, Jean-Marc Odobez) IDIAP Research Institute Martigny, Switzerland

Outline
• MOT using Particle Filters
• Our work
  • Visual MOT with Distributed Partitioned Sampling [Smith et al., BMVC'04]
  • Audio-Visual MOT [Gatica et al., in preparation]
• Conclusion

MOT as Bayesian inference
• the problem: given
  • image observations $y_{1:t}$
  • a state-space multi-object (MO) representation $x_{1:t}$, where each object state $x_t^k$ combines continuous components (in $\mathbb{R}^{N}$) and discrete components (in an index set $I$)
• object state:
  • geometric transformations
  • discrete indices: head pose, speaking
• compute the posterior $p(x_{0:t} \mid y_{1:t})$ or the filtering distribution $p(x_t \mid y_{1:t})$

Joint state space representation
• M objects: a joint state
• object state vector $x^j$: four components per object, namely translation $(u^j, v^j)$, scaling, and a discrete spk/no-spk index
• MO joint configuration: $X_t = (M, x_t^{1:M}) = (M, x_t^1, x_t^2, \ldots, x_t^M)$
[Figure: example with three objects, with state vectors $x^1$, $x^2$, $x^3$]

The basic MOT joint tracker
• assumptions:
  • each object has its own dynamics
  • objects are marginally independent, but conditionally dependent given observations (explaining away)
• graphical model: two chains $x_{t-1}^1 \to x_t^1 \to x_{t+1}^1$ and $x_{t-1}^2 \to x_t^2 \to x_{t+1}^2$, with each observation $y_t$ depending on both $x_t^1$ and $x_t^2$
• joint distribution:
  $p(x_{0:t}, y_{1:t}) = p(x_0^1)\, p(x_0^2) \prod_{n=1}^{t} p(x_n^1 \mid x_{n-1}^1)\, p(x_n^2 \mid x_{n-1}^2)\, p(y_n \mid x_n^{1:2})$

Particle Filters for MOT
• Filtering distribution:
  $p(x_t \mid y_{1:t}) \propto p(y_t \mid x_t) \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid y_{1:t-1})\, dx_{t-1}$
• approximated with the particle set $\{(x_t^{(i)}, w_t^{(i)}),\ i = 1, \ldots, N\}$ by
  $\hat{p}_N(x_t \mid y_{1:t}) = \sum_{i=1}^{N} w_t^{(i)}\, \delta(x_t - x_t^{(i)})$
• Steps, starting from $\hat{p}_N(x_t \mid y_{1:t})$:
  1. resample
  2. prediction, giving $p(x_{t+1} \mid y_{1:t})$, with dynamics $p(x_t \mid x_{t-1}) = \prod_{z=1}^{M} p(x_t^z \mid x_{t-1}^z)$
  3. likelihood: $p(y_t \mid x_t) = \prod_{z=1}^{M} p(y_t^z \mid x_t^z)$
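
The three steps on this slide can be sketched for a single one-dimensional object. This is a minimal illustration, not the authors' tracker: the random-walk dynamics, the Gaussian likelihood, and all function names (`resample`, `predict`, `likelihood`, `pf_step`, `estimate`) are assumptions made for the example.

```python
import math
import random


def resample(particles, weights, rng):
    """Step 1: draw N particles with probability proportional to their weights."""
    return rng.choices(particles, weights=weights, k=len(particles))


def predict(particles, rng, sigma_dyn=1.0):
    """Step 2: sample from the assumed random-walk dynamics p(x_t | x_{t-1})."""
    return [x + rng.gauss(0.0, sigma_dyn) for x in particles]


def likelihood(y, x, sigma_obs=1.0):
    """Step 3: assumed Gaussian likelihood p(y_t | x_t), unnormalised."""
    return math.exp(-0.5 * ((y - x) / sigma_obs) ** 2)


def pf_step(particles, weights, y, rng):
    """One resample / predict / weight cycle; returns particles and weights."""
    particles = predict(resample(particles, weights, rng), rng)
    raw = [likelihood(y, x) for x in particles]
    total = sum(raw)
    if total == 0.0:
        # Every likelihood underflowed: fall back to uniform weights.
        return particles, [1.0 / len(particles)] * len(particles)
    return particles, [w / total for w in raw]


def estimate(particles, weights):
    """Posterior mean under the weighted-delta approximation."""
    return sum(w * x for x, w in zip(particles, weights))


rng = random.Random(0)
particles = [rng.uniform(-10.0, 10.0) for _ in range(500)]
weights = [1.0 / len(particles)] * len(particles)
for _ in range(10):
    particles, weights = pf_step(particles, weights, y=5.0, rng=rng)
print(estimate(particles, weights))
```

With a constant observation of 5.0, the printed estimate should settle close to 5. For M objects, each particle would hold one component per object and the dynamics and likelihood would be the products given on the slide.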

Complexity for Joint State Space
• More objects: cost increases exponentially
• Solution: sample more efficiently
[Figure: number of particles needed as objects are added, $N_1$, $N_1^2$, $N_1^3$, ..., $N_M = N_1^M$]
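
To make the exponential growth concrete, the following sketch counts particles under the assumption that covering one object takes `n_single` particles and that the joint space needs the same coverage in every dimension. The function name and the value 100 are illustrative only.

```python
def joint_particle_count(n_single, num_objects):
    """Particles needed in the joint space if one object alone needs n_single."""
    return n_single ** num_objects


for m in range(1, 5):
    # Each extra object multiplies the required count by n_single.
    print(m, joint_particle_count(100, m))
```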

Distributed Partitioned Sampling (DPS) for visual MOT

Partitioned Sampling (PS)
• Reduces size of search space
• Searches each object's state sequentially
• Samples moved to areas of high likelihood
[Figure: example configuration space of two one-dimensional objects, with axes $x^A$ and $x^B$]
[MacCormick, Isard, Blake, ECCV 2000]

Partitioned Sampling (PS)
• Divide the space into M subspace partitions; search each sequentially
• Pipeline: prior $p(X_{t-1} \mid Y_{1:t-1})$ → resampling → [dynamics $p(x'_t \mid x_{t-1})$ → weighted resampling $\sim g$] → likelihood $p(Y_t \mid X_t)$ → posterior $p(X_t \mid Y_{1:t})$; the bracketed block repeats for M objects
• Weighted resampling
  • importance function $g$
  • "IS" distribution using the observation likelihood
  • adverse effects on the particle representation: impoverishment, bias
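
The block structure above can be sketched as follows for objects with one-dimensional states. This is an illustration under stated assumptions, not the implementation from the cited papers: the random-walk dynamics, the Gaussian per-object likelihood used as importance function g, and the names `weighted_resample` and `ps_step` are all hypothetical. Weighted resampling draws particles in proportion to g and divides the weights by g, so the represented distribution is unchanged while samples concentrate where g is high.

```python
import math
import random


def gauss_like(y, x, sigma=1.0):
    """Assumed per-object likelihood, also used as importance function g."""
    return math.exp(-0.5 * ((y - x) / sigma) ** 2)


def weighted_resample(particles, weights, g_values, rng):
    """Resample in proportion to g, then reweight by w / g to compensate."""
    total_g = sum(g_values)
    probs = [g / total_g for g in g_values]
    idx = rng.choices(range(len(particles)), weights=probs, k=len(particles))
    new_particles = [particles[i] for i in idx]
    new_weights = [weights[i] / probs[i] for i in idx]
    norm = sum(new_weights)
    return new_particles, [w / norm for w in new_weights]


def ps_step(particles, weights, y, rng, sigma_dyn=0.5):
    """One PS time step; particles are tuples with one component per object."""
    num_objects = len(y)
    for j in range(num_objects):
        # Dynamics for partition j only (assumed random walk).
        particles = [
            p[:j] + (p[j] + rng.gauss(0.0, sigma_dyn),) + p[j + 1:]
            for p in particles
        ]
        g = [gauss_like(y[j], p[j]) for p in particles]
        particles, weights = weighted_resample(particles, weights, g, rng)
    # Final weighting by the joint likelihood (product over objects).
    new_weights = []
    for p, w in zip(particles, weights):
        like = 1.0
        for j in range(num_objects):
            like *= gauss_like(y[j], p[j])
        new_weights.append(w * like)
    norm = sum(new_weights)
    return particles, [w / norm for w in new_weights]


rng = random.Random(0)
particles = [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(300)]
weights = [1.0 / 300] * 300
for _ in range(10):
    particles, weights = ps_step(particles, weights, (2.0, -3.0), rng)
print(sum(w * p[0] for p, w in zip(particles, weights)),
      sum(w * p[1] for p, w in zip(particles, weights)))
```

The sketch carries weights from one time step to the next instead of starting with the plain resampling stage shown in the pipeline; that simplification is a choice made here to keep the example short.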

PS: Ordering and Impoverishment
• Weighted resampling effects:
  • Impoverishment
  • Loss of multi-modality
  • Bias
  • Poor tracking quality
• Ordering:
  • In general, ordering of objects is arbitrary
  • More objects, greater effect
[Figure: impoverishment and bias by object number, 1 to 7]

Distributed Partitioned Sampling (DPS)
• Pipeline: prior $p(X_{t-1} \mid Y_{1:t-1})$ → resampling → $C$ mixture components → assemble → likelihood $p(Y_t \mid X_t)$ → posterior $p(X_t \mid Y_{1:t})$
• Within mixture component $c = 1, \ldots, C$: dynamics $p(x'_t \mid x_{t-1})$ followed by weighted resampling $\sim g_c$; block repeats for M objects
• Each subset: PS in a different ordering
• Orderings by circular shift: {1, ..., M}, ..., {M, 1, ..., (M-1)}
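
The bookkeeping behind "each subset: PS in a different ordering" can be sketched as below. It only generates the circular-shift orderings and splits a particle set into subsets; the PS pass itself is omitted. Both function names are hypothetical, and the interleaved split is just one possible way to form the subsets.

```python
def circular_orderings(num_objects):
    """All circular shifts of the object ordering 1..M."""
    base = list(range(1, num_objects + 1))
    return [base[k:] + base[:k] for k in range(num_objects)]


def split_into_subsets(particles, num_subsets):
    """Interleaved split so every subset gets a near-equal share of particles."""
    return [particles[c::num_subsets] for c in range(num_subsets)]


orderings = circular_orderings(3)
subsets = split_into_subsets(list(range(10)), len(orderings))
for ordering, subset in zip(orderings, subsets):
    # Each subset would be processed by PS using its own object ordering,
    # and the subsets would then be reassembled before the likelihood step.
    print(ordering, subset)
```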

Results
[Video panels: Joint PF, PS, DPS]
* 200 particles, examples taken from 50 runs per scenario

Audio-visual MOT

Audio-visual observation model
• Visual 1: contour-based (wire on clutter), edges on normal lines
• Visual 2: skin-blob-based
  • precision/recall between configuration and skin blobs
  • GMM on features
• Audio: switching distribution around 2-D audio estimates
  • $p(y_t^{audio} \mid x_t^{(i)})$ is piecewise constant, taking the value $K_1$ or $K_2$
  • the value depends on the squared distance $(u_t^{(i)} - u_t^{est})^2 + (v_t^{(i)} - v_t^{est})^2$ relative to $R^2$
  • $K_1$ is associated with the spk state of particle $i$, $K_2$ with the no_spk state
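
One way to read "precision/recall between configuration and skin blobs" is as area overlap between the region predicted by a particle and a detected skin blob. The sketch below uses axis-aligned boxes `(x0, y0, x1, y1)`. The box representation, the definitions of precision and recall used here, and the function names are assumptions for illustration; the slide does not spell them out.

```python
def box_area(box):
    """Area of an axis-aligned box (x0, y0, x1, y1); zero if degenerate."""
    x0, y0, x1, y1 = box
    return max(0.0, x1 - x0) * max(0.0, y1 - y0)


def intersection_area(a, b):
    """Area shared by two axis-aligned boxes."""
    overlap = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    return box_area(overlap)


def precision_recall(config_box, blob_box):
    """Assumed definitions: precision = overlap / configuration area,
    recall = overlap / blob area."""
    inter = intersection_area(config_box, blob_box)
    config_area = box_area(config_box)
    blob_area = box_area(blob_box)
    precision = inter / config_area if config_area > 0 else 0.0
    recall = inter / blob_area if blob_area > 0 else 0.0
    return precision, recall


print(precision_recall((0, 0, 2, 2), (1, 0, 3, 2)))
```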

Sampling using MCMC
• Metropolis-Hastings (MH) sampler
• Posterior as target distribution
• Better candidates are almost always accepted
• Particles where all objects have good guesses
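
A minimal Metropolis-Hastings sketch for a one-dimensional target follows. The Gaussian random-walk proposal and the toy `log_posterior` are assumptions made for the example, not the proposal or posterior used in the talk. With a symmetric proposal like this one, a candidate with higher posterior density is always accepted, and a worse one is accepted with probability equal to the density ratio.

```python
import math
import random


def mh_sample(log_target, x0, num_samples, rng, step=1.0):
    """Random-walk Metropolis-Hastings; returns the chain of states."""
    x, log_p = x0, log_target(x0)
    samples = []
    for _ in range(num_samples):
        candidate = x + rng.gauss(0.0, step)  # symmetric proposal (assumption)
        log_p_cand = log_target(candidate)
        # Acceptance probability min(1, p(candidate) / p(x)), in log space.
        accept_prob = math.exp(min(0.0, log_p_cand - log_p))
        if rng.random() < accept_prob:
            x, log_p = candidate, log_p_cand
        samples.append(x)
    return samples


def log_posterior(x):
    """Toy target: unnormalised log density of a Gaussian centred at 3."""
    return -0.5 * (x - 3.0) ** 2


chain = mh_sample(log_posterior, x0=0.0, num_samples=5000, rng=random.Random(1))
kept = chain[1000:]  # discard burn-in
print(sum(kept) / len(kept))
```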

Results (1)
• Joint PF, contour-only likelihood, 2000 particles
• Joint PF, contour-blob likelihood, 1000 particles

Results (2)
• Joint PF-MCMC, contour-blob likelihood, 500 particles
• Joint PF-MCMC, contour-blob likelihood, 500 particles, visual clutter

Conclusion
• visual tracking
  + DPS improves MOT because ordering matters
  + fairly distributes ordering effects
  + retains computational benefits of PS
  - not so good for a low number of particles (e.g. <100)
• audio-visual tracking
  + blob likelihood improves robustness
  + joint a-v likelihood allows for fast spk/non-spk switching
  + MCMC reduces complexity
  + currently: (re)-initialization
  + later: extension to more complex models


