 
GPU-Supported Object Tracking Using Adaptive Appearance Models
Bogusław Rymut, Bogdan Kwolek
Rzeszów University of Technology
ICCVG 2010

This paper describes how a Graphics Processing Unit (GPU) can be used effectively to speed up a tracking algorithm based on adaptive appearance models. Object tracking is performed by a particle swarm optimization (PSO) algorithm. Experimental results show that the GPU implementation of the algorithm achieves a more than 40-fold speed-up over the CPU implementation.
Agenda
- The problem
- CUDA programming model
- Particle Swarm Optimization
- Problem decomposition
- Experiments
The problem
- Appearance-based object tracking is time-consuming
- The tracking algorithm must run in real time
- GPU implementation of the PSO algorithm
- Real-time tracking using PSO and the GPU
- How to decompose the algorithm on the GPU
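Object appearance
[Figure: object appearance in frames t-1 and t]

Fitness function f: defined over the appearance components k = 1, ..., K (K = 3) and the index i = 1, ..., M, with terms in m_{i,k} and the intensity I. [Equation garbled in extraction]

Appearance components:
1. initial intensity
2. previous intensity
3. slow changes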
CPU vs. GPU
SIMD architecture
[Figure source: www.nvidia.com]
CUDA programming model
- Highly multithreaded coprocessor
- Small set of extensions to the C language
- Low-level programming
- Focus on parallel algorithms
CUDA programming model
- Highly scalable heterogeneous system
- CPU and GPU are separate devices with separate DRAMs
- The GPU executes thousands of extremely lightweight threads to achieve high performance
[Figure: GPU device and CPU device]
Particle Swarm Optimization
- Stochastic optimization algorithm
- The optimization is achieved via a set of particles
- Particles collaborate with each other in the optimization process
Particle Swarm Optimization
[Figure]
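Particle Swarm Optimization
Velocity and position update for particle i in dimension j:

$$v_j^{(i)} \leftarrow v_j^{(i)} + c_1 r_{1,j} \left( \mathrm{pbest}_j^{(i)} - x_j^{(i)} \right) + c_2 r_{2,j} \left( \mathrm{gbest}_j - x_j^{(i)} \right)$$

$$x_j^{(i)} \leftarrow x_j^{(i)} + v_j^{(i)}$$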
Particle Swarm Optimization
1. Assign each particle a random position in the problem hyperspace
2. Evaluate the fitness function and find the local best value for each particle
3. Find the particle that has the best fitness value
4. Update the velocities and positions of all particles
5. Repeat steps 2-4 until the maximum number of iterations is reached
6. Update the appearance model
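Steps 1-5 can be sketched on the CPU as follows. This is a minimal illustration, not the authors' implementation: the function name `pso_maximize`, the coefficient values, the search bounds, the fixed seed, and the convention that a larger fitness is better are all assumptions. Step 6 (the appearance model update) is specific to the tracker and is left out.

```python
import random


def pso_maximize(fitness, dim, n_particles=32, n_iters=5,
                 c1=2.0, c2=2.0, bounds=(-1.0, 1.0), seed=0):
    """Basic PSO sketch. All default values are illustrative assumptions."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Step 1: random initial positions, zero initial velocities.
    x = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in x]
    pbest_f = [float("-inf")] * n_particles
    gbest, gbest_f = x[0][:], float("-inf")

    for _ in range(n_iters):  # Step 5: fixed number of iterations.
        # Step 2: evaluate the fitness and update each particle's local best.
        for i in range(n_particles):
            f = fitness(x[i])
            if f > pbest_f[i]:
                pbest_f[i], pbest[i] = f, x[i][:]
        # Step 3: find the particle with the best fitness so far.
        g = max(range(n_particles), key=pbest_f.__getitem__)
        if pbest_f[g] > gbest_f:
            gbest_f, gbest = pbest_f[g], pbest[g][:]
        # Step 4: update velocities and positions, dimension by dimension.
        for i in range(n_particles):
            for j in range(dim):
                r1, r2 = rng.random(), rng.random()
                v[i][j] += (c1 * r1 * (pbest[i][j] - x[i][j])
                            + c2 * r2 * (gbest[j] - x[i][j]))
                x[i][j] += v[i][j]
    return gbest, gbest_f
```

For example, `pso_maximize(lambda p: -sum(t * t for t in p), dim=2)` searches for the maximum of a simple concave test function. In the tracker, the fitness would instead score how well a candidate object state matches the appearance model.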
Particle Swarm Optimization
[Figure]
Approach to algorithm decomposition
- Each part of the algorithm has been implemented as a kernel function
- Every particle has been implemented as a thread block
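The block-per-particle mapping can be illustrated with a CPU emulation in Python. The slides state only that each stage is a kernel and each particle a thread block. The strided assignment of pixels to threads, the shared-memory style tree reduction, and the names `emulate_fitness_kernel` and `pixel_cost` are hypothetical choices made for this sketch.

```python
def emulate_fitness_kernel(particles, pixel_cost, n_pixels, threads_per_block=64):
    """Emulate a grid with one thread block per particle (hypothetical layout).

    threads_per_block must be a power of two for the tree reduction below.
    """
    results = []
    for particle in particles:                      # one block per particle
        partial = [0.0] * threads_per_block         # stands in for shared memory
        for thread_idx in range(threads_per_block): # threads within the block
            # Each thread handles pixels thread_idx, thread_idx + T, ...
            for pix in range(thread_idx, n_pixels, threads_per_block):
                partial[thread_idx] += pixel_cost(particle, pix)
        # Tree reduction of the per-thread partial sums.
        stride = threads_per_block // 2
        while stride > 0:
            for t in range(stride):
                partial[t] += partial[t + stride]
            stride //= 2
        results.append(partial[0])                  # one fitness value per block
    return results
```

On a real device the two outer loops would run in parallel across blocks and threads, with a barrier between reduction steps. The sequential emulation only shows which work item each block and thread would own.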
Approach to algorithm decomposition
[Figure]
Data decomposition
[Figure]
Optimization of data access
- Access to GPU global memory is the bottleneck
- Correct data alignment is essential to overall performance
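One common way to obtain aligned accesses is to pad each row of a 2D buffer so that every row starts on an alignment boundary. The sketch below shows only the arithmetic. The helper names, the 4-byte element size, and the 64-byte boundary are illustrative assumptions, not values taken from the slides.

```python
def padded_pitch(width_elems, elem_size=4, align_bytes=64):
    """Row length in bytes, rounded up to a multiple of align_bytes (assumed value)."""
    row_bytes = width_elems * elem_size
    return (row_bytes + align_bytes - 1) // align_bytes * align_bytes


def element_offset(row, col, pitch_bytes, elem_size=4):
    """Byte offset of element (row, col) in a buffer whose rows are padded to pitch_bytes."""
    return row * pitch_bytes + col * elem_size
```

With these assumptions, a row of 30 four-byte elements occupies 120 bytes and is padded to 128, so the first element of every row stays on a 64-byte boundary.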
Experiments
- CPU: PC with Intel Core 2 Quad 2.66 GHz, 1 GB RAM
- GPU: PC with nVidia GeForce 9800 GT (14 multiprocessors, 1.5 GHz, 1024 MB RAM)
Face tracking
[Videos: real time, slow motion]
Experimental results
Computation time [ms]

Particles, iterations   CPU [ms]   9800 GT [ms]   Speedup
#32, 5 it                   30.6            1.4     x22.4
#64, 5 it                   60.0            1.9     x31.5
#128, 5 it                 117.9            3.4     x38.8
#256, 5 it                 234.2            5.6     x41.5
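The speedup is the ratio of CPU time to GPU time. The short check below recomputes it from the rounded times in the table. The recomputed ratios can differ from the reported column, which was presumably derived from unrounded measurements. The names `ROWS` and `speedup` are introduced here for illustration.

```python
# (particles, CPU time in ms, GPU time in ms), copied from the results table.
ROWS = [(32, 30.6, 1.4), (64, 60.0, 1.9), (128, 117.9, 3.4), (256, 234.2, 5.6)]


def speedup(cpu_ms, gpu_ms):
    """Ratio of CPU to GPU computation time."""
    return cpu_ms / gpu_ms


for n_particles, cpu_ms, gpu_ms in ROWS:
    print(f"{n_particles:4d} particles: x{speedup(cpu_ms, gpu_ms):.1f}")
```

The ratio grows with the number of particles, which is consistent with a larger swarm keeping more of the GPU's multiprocessors busy.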
Conclusions
- A GPU implementation of the PSO algorithm has been prepared
- Our GPU-based implementation is roughly 40 times faster than the CPU implementation (x41.5 with 256 particles)