Boosting Visual Object Tracking Using Deep Features and GPU - - PowerPoint PPT Presentation

SLIDE 1

Boosting Visual Object Tracking Using Deep Features and GPU Implementations

Michael Felsberg Computer Vision Laboratory Department of Electrical Engineering Linköping University michael.felsberg@liu.se

Martin Danelljan Fahad Khan

SLIDE 2

Definition of Visual Tasks

  • Classification task: Is there a dog in the image?
  • Detection task: Where is a dog in the image?
  • Tracking task: Where is the dog from the first frame in all subsequent frames of the image sequence?

SLIDE 3

Visual Object Tracking (VOT)

Use-cases: autonomous driving, surveillance, sports, …

Problems:

  • occlusion
  • clutter
  • changes in viewpoint, scale, and illumination
  • motion
  • articulation

SLIDE 4

Why Visual Object Tracking? Why Generic?

  • cue for behavior
  • human-centered sensing
  • adaptation to environment
  • interaction
  • visualization
  • high inter-class variability
  • constantly new classes

SLIDE 5

VOT: Problem Definition

  • Input:
    • image sequence (video)
    • object bounding box in frame #1
  • Output:
    • bounding boxes for frames t > 1
    • each box determined only from frames ≤ t (causality)
  • Assumptions:
    • object is visible in all frames (at least partially)
    • camera might be moving (no background model)
  • Challenge:
    • build a model from one annotated training sample of an unknown object class
    • update the model from previous estimates (bootstrapping process)
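The input/output contract above can be illustrated with a causal tracking loop. This is a minimal sketch under our own assumptions: a hypothetical toy tracker (`TemplateTracker`) that matches a grayscale template by sum of squared differences in a small search window and then updates the template with a running average, which is the bootstrapping step. It is not one of the trackers discussed in these slides; it only shows the one-sample initialization, the causality, and the self-update.

```python
import numpy as np


class TemplateTracker:
    """Hypothetical toy tracker: SSD template matching plus running-average update."""

    def __init__(self, frame, box, learning_rate=0.1, search_radius=4):
        x, y, w, h = box  # box = (column, row, width, height)
        self.box = box
        # The only training sample: the annotated patch in frame #1.
        self.template = frame[y:y + h, x:x + w].astype(float)
        self.lr = learning_rate
        self.radius = search_radius

    def update(self, frame):
        x0, y0, w, h = self.box
        best, best_cost = (x0, y0), np.inf
        # Exhaustive search over integer shifts around the previous position.
        for dy in range(-self.radius, self.radius + 1):
            for dx in range(-self.radius, self.radius + 1):
                x, y = x0 + dx, y0 + dy
                if x < 0 or y < 0 or y + h > frame.shape[0] or x + w > frame.shape[1]:
                    continue
                patch = frame[y:y + h, x:x + w].astype(float)
                cost = np.sum((patch - self.template) ** 2)
                if cost < best_cost:
                    best, best_cost = (x, y), cost
        x, y = best
        self.box = (x, y, w, h)
        # Bootstrapping: update the model from the tracker's own estimate.
        patch = frame[y:y + h, x:x + w].astype(float)
        self.template = (1 - self.lr) * self.template + self.lr * patch
        return self.box


def track(frames, first_box):
    """Causal loop: the box for frame t uses only frames 1..t."""
    tracker = TemplateTracker(frames[0], first_box)
    boxes = [first_box]
    for frame in frames[1:]:
        boxes.append(tracker.update(frame))
    return boxes


# Synthetic sequence: an 8x8 bright square moving 2 pixels right and down per frame.
frames = []
for t in range(5):
    img = np.zeros((40, 40))
    img[10 + 2 * t:18 + 2 * t, 5 + 2 * t:13 + 2 * t] = 1.0
    frames.append(img)
boxes = track(frames, (5, 10, 8, 8))
```

On this clean synthetic sequence the estimated boxes follow the square exactly; on real video, the self-update is exactly where errors can accumulate, which is why model updating is listed as a challenge.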
SLIDE 6

[Jianbo Shi & Carlo Tomasi, CVPR 1994]

C-COT

“Tracking is a solved problem”

  • 23 years on: > 40 tracking papers per year
    • major conferences: CVPR, ICCV, ECCV
    • major journals: PAMI, IJCV, TIP
  • Several benchmarks in the last four years:
    • OTB [Wu et al., CVPR 2013, PAMI 2015]
    • VOT [Kristan et al., VOT workshops at ICCV 2013, 2015, 2017 and ECCV 2014, 2016]
    • ALOV++ [Smeulders et al., PAMI 2014]
    • UAV123 [Mueller et al., ECCV 2016]
  • Example: SRDCF (2015) vs. SOTA 2014
SLIDE 7

VOT Results: DCF-Based Approaches

[Figure: results of DCF-based approaches in VOT2014, VOT2015, and VOT2016: ACT (CVPR 2014), DSST (PAMI 2016), SRDCF (ICCV 2015), DeepSRDCF (VOT 2015), C-COT (ECCV 2016)]

SLIDE 8

DCF: Discriminative Correlation Filters

[Figure: feature map, learned filters, output scores]
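The pipeline in this figure (feature map, learned filters, output scores) can be sketched in a few lines of NumPy. This is a minimal illustration under our own assumptions: D feature channels of size H × W, one filter per channel, circular correlation defined as score[n] = Σ_m f[m] x[m + n] (indices taken modulo the patch size), and random arrays standing in for real features and trained filters. The function names are hypothetical. The FFT version is checked against a slow direct evaluation, and the score peak recovers a known shift of the target.

```python
import numpy as np


def dcf_scores(features, filters):
    """Score map sum_l (x^l correlated with f^l), computed with the 2-D FFT.

    features, filters: arrays of shape (D, H, W), one filter per feature channel.
    """
    X = np.fft.fft2(features, axes=(-2, -1))
    F = np.fft.fft2(filters, axes=(-2, -1))
    # Circular correlation in the spatial domain is conj(F) * X in the Fourier
    # domain; summing over channels yields a single score map.
    return np.real(np.fft.ifft2(np.sum(np.conj(F) * X, axis=0)))


def dcf_scores_direct(features, filters):
    """Same quantity via explicit circular shifts (slow, for checking only)."""
    _, H, W = features.shape
    out = np.zeros((H, W))
    for dy in range(H):
        for dx in range(W):
            # shifted[l, m] = features[l, m + (dy, dx)] with wrap-around
            shifted = np.roll(features, shift=(-dy, -dx), axis=(1, 2))
            out[dy, dx] = np.sum(filters * shifted)
    return out


rng = np.random.default_rng(0)
f = rng.standard_normal((3, 16, 16))          # stand-in "learned" filters
x = np.roll(f, shift=(5, 9), axis=(1, 2))     # target moved by (5, 9)
s = dcf_scores(x, f)
peak = np.unravel_index(np.argmax(s), s.shape)
```

Computing all H × W scores with FFTs costs O(D·HW log HW) instead of O(D·(HW)²) for the direct loop, which is the main reason DCF trackers are fast.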

SLIDE 9

Standard DCF formulation
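The formula on this slide did not survive extraction. As a hedged reference (the notation is our assumption, not taken from the slide), one common way to write the standard multi-channel DCF objective uses training samples x_j with D feature channels x_j^l, desired Gaussian-shaped score maps y_j, sample weights α_j, a regularization parameter λ, and circular correlation (x ⋆ f)[n] = Σ_m f[m] x[(m + n) mod N]. For a single channel and a single sample, Parseval's theorem turns the objective into an independent scalar ridge regression per frequency, with the closed-form solution in the second line (hats denote the DFT, the bar complex conjugation; products and division are elementwise):

```latex
\varepsilon(f) = \sum_{j=1}^{m} \alpha_j \Big\| \sum_{l=1}^{D} x_j^l \star f^l - y_j \Big\|^2
               + \lambda \sum_{l=1}^{D} \big\| f^l \big\|^2

\hat{f} = \frac{\hat{x} \, \overline{\hat{y}}}{\overline{\hat{x}} \, \hat{x} + \lambda}
```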

SLIDE 10

Continuous approach: Overview

[Figure: multi-resolution features, continuous filters, continuous output]

SLIDE 11

Convolution Operator
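The operator itself is not preserved in this extraction. As we recall it from the C-COT paper [Danelljan et al., ECCV 2016] (treat the exact notation as an assumption), each feature channel x^d with N_d samples is mapped to the continuous interval [0, T) by an interpolation operator J_d with a T-periodic kernel b_d, and the detection scores S_f{x} are a sum of per-channel convolutions with continuous filters f^d:

```latex
J_d\{x^d\}(t) = \sum_{n=0}^{N_d - 1} x^d[n] \, b_d\!\left(t - \frac{T}{N_d}\, n\right),
\qquad
S_f\{x\}(t) = \sum_{d=1}^{D} \left( f^d * J_d\{x^d\} \right)(t)
```

Because every channel is interpolated to the same continuous domain, feature maps of different resolutions (N_d differs between channels) can be combined in one operator, and the scores can be evaluated at sub-pixel positions.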

SLIDE 12

Training Loss

[Danelljan et al., ICCV 2015] [Danelljan et al., CVPR 2016]
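The loss on this slide is not preserved. A hedged reconstruction in the form used by the continuous formulation (our assumption, combining the two cited works): S_f{x_j} denotes the detection scores for training sample x_j, y_j the desired score map, α_j ≥ 0 the weight of sample j (learned in the decontamination work, CVPR 2016), and w a spatial penalty function applied to each of the D filter channels f^d (the spatial regularization of SRDCF, ICCV 2015):

```latex
E(f) = \sum_{j=1}^{M} \alpha_j \left\| S_f\{x_j\} - y_j \right\|_{L^2}^{2}
     + \sum_{d=1}^{D} \left\| w \, f^d \right\|_{L^2}^{2}
```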

SLIDE 13

Spatially Regularized DCF

Circular Convolution ⟺ Periodic Extension

[Danelljan et al., ICCV 2015]
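The equivalence on this slide is why Fourier-domain DCFs implicitly treat the training sample as periodic: multiplying DFTs gives a circular convolution, which equals an ordinary (linear) convolution of the periodically extended signal. A minimal 1-D NumPy check of that identity, with hypothetical function names and random signals as stand-ins:

```python
import numpy as np


def circular_conv_fft(x, f):
    """Circular convolution via elementwise multiplication of DFTs."""
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(f)))


def circular_conv_direct(x, f):
    """Circular convolution by definition, with indices taken modulo N."""
    N = len(x)
    return np.array([sum(f[m] * x[(n - m) % N] for m in range(N)) for n in range(N)])


def conv_of_periodic_extension(x, f):
    """Linear convolution of the periodically extended signal, one period kept."""
    N = len(x)
    ext = np.concatenate([x, x])  # two periods suffice for a length-N filter
    full = np.convolve(ext, f)    # full[k] = sum_m f[m] * ext[k - m]
    # For k in [N, 2N), ext[k - m] = x[(k - N - m) mod N], i.e. the circular result.
    return full[N:2 * N]


rng = np.random.default_rng(1)
x = rng.standard_normal(32)
f = rng.standard_normal(32)
```

The wrap-around means the filter is effectively trained on cyclically shifted, partly unrealistic patches near the borders; SRDCF counters this by penalizing filter coefficients outside the target region with a spatial weight function.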

SLIDE 14

Decontamination of Training Set

[Danelljan et al., CVPR 2016]
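The idea behind decontamination is to learn a weight for every training sample jointly with the filter, so that corrupted samples (occlusions, tracking failures) are down-weighted instead of being discarded by a fixed rule. The following is a minimal sketch of the weight-update step only, assuming the filter is held fixed, each sample k has a current loss L_k and a prior weight ρ_k, and the weights solve min Σ α_k L_k + (1/μ) Σ α_k²/ρ_k subject to α_k ≥ 0 and Σ α_k = 1. This objective and the function name are our assumptions in the spirit of the cited work, not a transcription of it.

```python
import numpy as np


def update_sample_weights(losses, prior, mu, iters=100):
    """Hypothetical weight update: minimize sum_k a_k L_k + (1/mu) sum_k a_k^2 / rho_k
    over the probability simplex.

    The KKT conditions give a_k = max(0, mu * rho_k * (lam - L_k) / 2); the
    multiplier lam is found by bisection so that the weights sum to one.
    """
    losses = np.asarray(losses, dtype=float)
    prior = np.asarray(prior, dtype=float)

    def weights(lam):
        return np.maximum(0.0, mu * prior * (lam - losses) / 2.0)

    lo = losses.min()                              # weights sum to 0 here
    hi = losses.max() + 2.0 / (mu * prior.sum())   # weights sum to >= 1 here
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if weights(mid).sum() < 1.0:
            lo = mid
        else:
            hi = mid
    a = weights(hi)
    return a / a.sum()  # remove the small bisection residual


# Three consistent samples and one with a large loss (e.g. an occluded frame).
a = update_sample_weights([0.1, 0.1, 0.1, 5.0], [0.25] * 4, mu=2.0)
```

In this sketch the high-loss sample receives weight zero. A smaller μ pulls the weights towards the prior ρ, while a larger μ concentrates them on the samples with the lowest loss.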

SLIDE 15

Deep Learning Revolution (SOTA 2012): Learning on ImageNet

  • ImageNet Large Scale Visual Recognition Challenge [Deng et al. 2009]
  • Today:
    • more than 14 million images
    • more than 10 million images annotated
    • more than 1 million images with bounding boxes
  • Classification error rate, 2012 vs 2011 (method: top-5 error, description):
    • SuperVision (Toronto): 0.16422, CNN
    • ISI (Tokyo): 0.26172, hand-crafted features (SIFT, HOG, and LBP)
    • OXFORD: 0.26979, DPM + hand-crafted features
    • XRCE/INRIA: 0.27058, hand-crafted features
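The error rates above are top-5 errors: an image counts as correctly classified if its true class is among the five highest-scoring classes. A minimal NumPy sketch of the metric (the function name is hypothetical; the tiny score matrix is made up purely for illustration):

```python
import numpy as np


def top5_error(scores, labels):
    """Fraction of images whose true label is not among the 5 highest-scoring classes.

    scores: (num_images, num_classes) array; labels: true class index per image.
    """
    top5 = np.argsort(scores, axis=1)[:, -5:]  # indices of the 5 largest scores
    hit = np.any(top5 == np.asarray(labels)[:, None], axis=1)
    return 1.0 - hit.mean()


# Two images, ten classes, class k scored k: the top 5 are classes 5..9.
scores = np.tile(np.arange(10.0), (2, 1))
err = top5_error(scores, [9, 0])  # first image is a hit, second a miss
```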

SLIDE 16

Deep Features for Object Tracking

  • deep features from the imagenet-vgg-2048 network (five layers): shallow layers are most relevant
  • imagenet-vgg-very-deep-16 network (layers 4+13)
  • deep motion features (layer 5)

[Gladh et al., ICPR 2016, best paper] [Danelljan & Häger et al., VOT 2015]

SLIDE 17

C-COT Results

[Danelljan et al., ECCV 2016]

SLIDE 18

EAO-EFO trade-off? (accuracy as expected average overlap, EAO, versus speed in equivalent filter operations, EFO)

SLIDE 19

ECO: Efficient Convolution Operators

  • Over-fitting and complexity: model size
  • C-COT:
    • high-dimensional features
    • number of parameters in online learning: 800,000
    • scarcity of training data in tracking
    • number of parameters exceeds the dimensionality of the input
  • ECO:
    • discriminatively learns a lower-dimensional feature space by jointly minimizing the classification error
    • 80% reduction in the number of model parameters

[Danelljan et al., CVPR 2017]
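ECO's factorized convolution first projects the D feature channels to C < D channels with a matrix P and then learns only C filters. Because both steps are linear, this equals a full filter bank whose D channels are linear combinations of the C filters, with far fewer free parameters. A sketch with random arrays standing in for features, projection, and filters, and hypothetical sizes (D = 96, C = 16, 50 × 50 filters) chosen only for illustration; in ECO, P is learned jointly with the filters, which is not shown here:

```python
import numpy as np


def correlate_fft(features, filters):
    """Sum over channels of circular correlations, computed with the 2-D FFT."""
    X = np.fft.fft2(features, axes=(-2, -1))
    F = np.fft.fft2(filters, axes=(-2, -1))
    return np.real(np.fft.ifft2(np.sum(np.conj(F) * X, axis=0)))


def project(features, P):
    """Map (D, H, W) features to (C, H, W) with a D x C projection matrix P."""
    return np.einsum('dhw,dc->chw', features, P)


D, C, H, W = 96, 16, 50, 50                # hypothetical sizes
rng = np.random.default_rng(0)
x = rng.standard_normal((D, H, W))         # stand-in feature map
P = rng.standard_normal((D, C))            # stand-in projection matrix
f_small = rng.standard_normal((C, H, W))   # C filters in the projected space

# Factorized scores: project the features, then correlate with C filters.
scores_factorized = correlate_fft(project(x, P), f_small)
# Equivalent full filter bank: D filters, each a linear mix of the C small filters.
f_full = np.einsum('dc,chw->dhw', P, f_small)
scores_full = correlate_fft(x, f_full)

params_full = D * H * W                    # 240000
params_factorized = C * H * W + D * C      # 41536
```

With these illustrative sizes the factorization stores about 17% of the parameters of the full filter bank; the actual reduction in ECO depends on its feature dimensions.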

SLIDE 20

ECO: Efficient Convolution Operators

  • Over-fitting and complexity: training set size
  • C-COT:
    • large training sample set
    • significant computational burden
    • memory size is limited due to the large feature set
    • discarding old samples leads to over-fitting to recent appearance
  • ECO:
    • models the training data as a mixture of Gaussian components
    • compact and diverse representation of the training data

[Danelljan et al., CVPR 2017]
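The mixture-of-Gaussians training set can be sketched as follows. This is a hypothetical simplified version under our own assumptions: each component keeps only a mean and a weight, a new sample enters as its own component with weight `learning_rate` while older weights decay, and when more than `max_components` exist the two closest means are merged into their weighted average. ECO's actual update has further details (for example, discarding low-weight components) that are omitted here.

```python
import numpy as np


class SampleSpaceModel:
    """Hypothetical compact training set: Gaussian components as (mean, weight) pairs."""

    def __init__(self, max_components, learning_rate):
        self.L = max_components
        self.gamma = learning_rate
        self.means = []
        self.weights = []

    def add(self, x):
        x = np.asarray(x, dtype=float)
        if not self.means:
            self.means, self.weights = [x], [1.0]
            return
        # Decay old weights so the new component gets weight gamma; the sum stays 1.
        self.weights = [(1.0 - self.gamma) * w for w in self.weights]
        self.means.append(x)
        self.weights.append(self.gamma)
        if len(self.means) > self.L:
            self._merge_closest()

    def _merge_closest(self):
        n = len(self.means)
        best, pair = np.inf, None
        for k in range(n):
            for l in range(k + 1, n):
                d = np.sum((self.means[k] - self.means[l]) ** 2)
                if d < best:
                    best, pair = d, (k, l)
        k, l = pair
        wk, wl = self.weights[k], self.weights[l]
        # Weighted average of the two means; the weights add up.
        self.means[k] = (wk * self.means[k] + wl * self.means[l]) / (wk + wl)
        self.weights[k] = wk + wl
        # l > k, so deleting index l leaves index k unchanged.
        del self.means[l]
        del self.weights[l]


model = SampleSpaceModel(max_components=3, learning_rate=0.2)
for v in [0.0, 10.0, 10.1, 20.0]:
    model.add([v])
```

After the fourth sample, the two near-duplicate samples (10.0 and 10.1) are merged, so the model keeps three diverse components instead of dropping the oldest sample.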

SLIDE 21

ECO: Efficient Convolution Operators

  • UAV123 dataset: 123 aerial videos with 110K frames
  • Hand-crafted (HC) features:
    • C-COT: accuracy 50.8, speed < 10 FPS
    • ECO: accuracy 52.9, speed 60 FPS
  • Examples: partial occlusion (the guitar), deformations, out-of-plane rotations

[Danelljan et al., CVPR 2017]

SLIDE 22

ECO: Efficient Convolution Operators

  • VOT2016 dataset: 13.3% relative gain in performance

[Danelljan et al., CVPR 2017]

SLIDE 23

Computational Considerations

  • MATLAB implementation using MatConvNet
  • CPU measurements (ECO-HC) on a 4-core i7-6700 @ 3.4 GHz
  • GPU measurements (ECO) on a Tesla K40 GPU (donated by NVidia)
  • Fine-tuning of networks on Kebnekaise (Umeå):
    • 32 nodes with 2 K80 cards (4992 cores each)
    • 4 nodes with 4 K80 cards
    • Intel Xeon E5-2690v4 (14 cores), 128 GB
  • Each batch:
    • 80 videos
    • 16 frames each
    • processed on 4 GPUs

SLIDE 24

System Implementation

SLIDE 25

Acknowledgements

  • Wallenberg Autonomous Systems and Software Program (WASP)
  • Swedish Research Council (EMC2, ELLIIT)
  • SSF (CUAS, SymbiCloud)
  • NVidia
  • CVL (Martin, Fahad, Gustav, Andreas, Susanna, Goutam)
SLIDE 26

References (selection)

  • “Adaptive Color Attributes for Real-Time Visual Tracking” (CVPR 2014)
  • “Learning Spatially Regularized Correlation Filters for Visual Tracking” (ICCV 2015; 1st rank UAV123)
  • “Convolutional Features for Correlation Filter Based Visual Tracking” (ICCV Workshops, VOT2015; 2nd rank VOT2015)
  • “Adaptive Decontamination of the Training Set: A Unified Formulation for Discriminative Visual Tracking” (CVPR 2016)
  • “Beyond Correlation Filters: Learning Continuous Convolution Operators for Visual Tracking” (ECCV 2016; 1st rank VOT2016)
  • “Deep Motion Features for Visual Tracking” (ICPR 2016; best paper award)
  • “Discriminative Scale Space Tracking” (IEEE TPAMI 2016; 1st rank VOT2014, 1st rank OpenCV challenge)
  • “ECO: Efficient Convolution Operators for Tracking” (CVPR 2017, accepted)

SLIDE 27

CAIP 2017

17th International Conference on Computer Analysis of Images and Patterns, Ystad, Sweden, 22-24 Aug 2017

  • Paper submission: 3 Apr, 2017
  • Author notification: 26 May, 2017
  • Camera-ready paper: 31 May, 2017
  • Early registration: 16 Jun, 2017
  • Main conference: 22-24 Aug, 2017
  • Invited speakers: Alan Bovik, Markus Vincze, Christian Igel
  • REACTS Workshop; Pose estimation tutorial (George Azzopardi, Anders G. Buch)

Topics: 2D-to-3D, 3D Vision, Biomedical image and pattern analysis, Biometrics, Brain-inspired methods, Document analysis, Face and gestures, Feature extraction, Graph-based methods, High-dimensional topology methods, Human pose estimation, Image/video indexing & retrieval, Image restoration, Keypoint detection, Machine learning for image and pattern analysis, Mobile multimedia, Model-based vision, Motion and tracking, Object recognition, Segmentation, Shape representation and analysis, Static and dynamic scene analysis, Statistical models, Surveillance, Vision for robotics

General Chair: Michael Felsberg; Program Chairs: Anders Heyden, Norbert Krüger; Industrial Liaison: Zhibo Pang

The conference invites novel contributions to the automatic analysis of images and patterns, encompassing both new challenging application areas and substantial new theoretical developments in the field.

SLIDE 28

Questions?

  • michael.felsberg@liu.se
  • http://users.isy.liu.se/cvl/mfe/
  • https://liu.se/en/employee/micfe03
  • http://www.cvl.isy.liu.se/
  • https://liu.se/en/organisation/liu/isy/cvl