Fully-Convolutional Siamese Networks for Object Tracking
Luca Bertinetto*, Jack Valmadre*, João Henriques, Andrea Vedaldi and Philip Torr
www.robots.ox.ac.uk/~luca
luca.bertinetto@eng.ox.ac.uk

Tracking of single, arbitrary objects
Problem. Track an arbitrary object with the sole supervision of a single bounding box in the first frame of the video.
Challenges.
● We need to be class-agnostic.
● Stability-plasticity dilemma [Grossberg87]: “How can a learning system remain plastic in response to significant new events, yet also remain stable in response to irrelevant events?”

Recent history of object tracking [2010 - today]
Tracking-by-detection paradigm
● Learn a binary classifier online (+ is object, - is background).
● Re-detect the object at every frame and update the classifier.

Recent history of object tracking [2014 - today]
Correlation filters become the most popular choice
● Sampling space is loosely a circulant matrix → diagonalized with the Discrete Fourier Transform. (Figure from [Henriques15].)
● Fast training and evaluation of a linear classifier in the Fourier domain.
● Mostly used with HOG features.
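To make the circulant/Fourier point concrete, here is a minimal 1-D sketch, not taken from the slides. It assumes a ridge-regression filter trained on all cyclic shifts of a signal `x`; the function names, the regularizer `lam` and the Gaussian target `y` are illustrative choices. The placement of complex conjugates in the closed form depends on the shift and DFT conventions, so the sketch checks the Fourier solution against an explicit linear solve rather than relying on a quoted formula.

```python
import numpy as np

def circulant_rows(x):
    # Row i is x cyclically shifted by i samples: X[i, j] = x[(j - i) mod n].
    return np.stack([np.roll(x, i) for i in range(len(x))])

def ridge_direct(x, y, lam):
    # Explicit ridge regression over every cyclic shift: O(n^3).
    X = circulant_rows(x)
    return np.linalg.solve(X.T @ X + lam * np.eye(len(x)), X.T @ y)

def ridge_fourier(x, y, lam):
    # Same solution computed element-wise in the Fourier domain: O(n log n).
    x_hat = np.fft.fft(x)
    y_hat = np.fft.fft(y)
    w_hat = x_hat * y_hat / (np.abs(x_hat) ** 2 + lam)
    return np.real(np.fft.ifft(w_hat))

rng = np.random.default_rng(0)
n = 64
x = rng.standard_normal(n)
# Illustrative regression target: a Gaussian peaked at zero shift (with wrap-around).
idx = np.arange(n)
y = np.exp(-0.5 * np.minimum(idx, n - idx) ** 2 / 4.0)

w_direct = ridge_direct(x, y, lam=0.1)
w_fft = ridge_fourier(x, y, lam=0.1)
print(np.max(np.abs(w_direct - w_fft)))
```

The two solutions agree up to floating-point error, which is the practical reason correlation filters train and evaluate so quickly.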

Recent history of object tracking [2015 - today]
What about the deep learning frenzy?
● In tracking, deep nets took more time to become mainstream.
  ○ CVPR’15: not a single tracker used deep nets at its core, or even deep features.
  ○ CVPR’16: 50% did.
● No clear advantage.
  ○ Slow.
  ○ Similar performance to methods based on legacy features.
● Training on benchmarks → controversial.
  ○ Benchmarks propose very similar scenarios. Risk of overfitting and lack of generalization.

MDNet [CVPR16, winner of VOT15]
● Best results so far.
● Rationale: separate domain-independent information (e.g. the concept of “objectness”) from domain-dependent (video-specific) information.
● Training. Fixed common part (3 conv + 2 fc) and several “one-hot” fc branches.
● Tracking. Fine-tuning of several layers, hard-negative mining, bbox regression.
● Trained on benchmark videos.
● Very slow: 1 fps.
Learning Multi-Domain Convolutional Neural Networks for Visual Tracking - Hyeonseob Nam and Bohyung Han - CVPR 2016.

Our work
● We wanted to use conv-nets for arbitrary object tracking.
● Three constraints:
  ○ Nothing below real time (at least 20-25 frames per second).
  ○ No benchmark videos for training.
  ○ Simplicity.

Vanilla siamese conv-net for similarity learning
● A siamese conv-net is trained to address a similarity learning problem in an offline phase.
● The conv-net learns a function that compares an exemplar z to a candidate x’ of the same size.
● The score tells us how similar the two image patches are.

Fully-Convolutional Siamese Networks for Object Tracking
● Our network is fully convolutional.
  ○ No padding.
  ○ No fully-connected layers.
● Two inputs of different sizes: the smaller is the exemplar (target object during tracking), the bigger is the search area.
● Output of the embedding function has spatial support.
● Cross-correlation layer: computes the similarity at all translated sub-windows on a dense grid in a single evaluation.
● Output is a score map.
● Forward pass: >100 Hz.
CODE AVAILABLE! www.robots.ox.ac.uk/~luca/siamese-fc.html
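The cross-correlation layer can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the released code: `cross_correlate` is a hypothetical name, the embeddings are random arrays standing in for conv-net outputs, and the shapes (6x6 and 22x22 with 8 channels) are chosen only to keep the example small. A real implementation would use a batched convolution rather than Python loops.

```python
import numpy as np

def cross_correlate(phi_z, phi_x, bias=0.0):
    """Slide the exemplar embedding over the search embedding (no padding).

    phi_z: (hz, wz, c) exemplar embedding; phi_x: (hx, wx, c) search embedding.
    Returns a score map of shape (hx - hz + 1, wx - wz + 1).
    """
    hz, wz, c = phi_z.shape
    hx, wx, cx = phi_x.shape
    assert c == cx, "both branches must produce the same number of channels"
    out_h, out_w = hx - hz + 1, wx - wz + 1
    scores = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = phi_x[i:i + hz, j:j + wz, :]
            # Inner product between the exemplar and one translated sub-window.
            scores[i, j] = np.sum(window * phi_z) + bias
    return scores

rng = np.random.default_rng(0)
phi_z = rng.standard_normal((6, 6, 8))   # assumed exemplar embedding
phi_x = np.zeros((22, 22, 8))            # assumed search-area embedding
phi_x[5:11, 9:15, :] = phi_z             # plant the target at offset (5, 9)

score_map = cross_correlate(phi_z, phi_x)
peak = np.unravel_index(np.argmax(score_map), score_map.shape)
print(score_map.shape, peak)
```

The peak of the score map recovers the offset at which the exemplar was planted, which is exactly the information the tracker needs.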

Training
● Dataset built by extracting two patches with +/- context for every labelled object, then resized to 127x127 and 255x255.
● Pick a random video and a random pair of frames within the video (max N frames apart).
  ○ N controls the “difficulty” of the problem.
● Loss: mean of the logistic loss at every position of the score map.
CODE AVAILABLE! www.robots.ox.ac.uk/~luca/siamese-fc.html
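A rough sketch of the two training ingredients above, pair sampling and the mean logistic loss, might look as follows. Everything beyond what the slide states is an assumption made for illustration: the function names, the +1/-1 label convention, the rule that positions within `radius` cells of the centre are positives, and the 17x17 map size.

```python
import numpy as np

def sample_pair(video_lengths, max_gap, rng):
    # Pick a random video, then two frame indices at most max_gap apart.
    # (The two indices may coincide; a real pipeline might forbid that.)
    v = int(rng.integers(len(video_lengths)))
    n = video_lengths[v]
    i = int(rng.integers(n))
    lo, hi = max(0, i - max_gap), min(n - 1, i + max_gap)
    j = int(rng.integers(lo, hi + 1))
    return v, i, j

def make_labels(size, radius):
    # Assumed labelling: +1 within `radius` cells of the map centre, -1 elsewhere.
    c = (size - 1) / 2.0
    yy, xx = np.mgrid[0:size, 0:size]
    dist = np.sqrt((yy - c) ** 2 + (xx - c) ** 2)
    return np.where(dist <= radius, 1.0, -1.0)

def mean_logistic_loss(scores, labels):
    # log(1 + exp(-y * v)) averaged over all positions, computed stably.
    return float(np.mean(np.logaddexp(0.0, -labels * scores)))

rng = np.random.default_rng(0)
video, frame_a, frame_b = sample_pair([120, 300, 45], max_gap=50, rng=rng)

labels = make_labels(size=17, radius=2.0)
loss_uninformed = mean_logistic_loss(np.zeros((17, 17)), labels)
loss_confident = mean_logistic_loss(10.0 * labels, labels)
print(video, frame_a, frame_b, loss_uninformed, loss_confident)
```

An all-zero score map gives a loss of log 2 at every position, while confidently correct scores drive the loss towards zero.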

ILSVRC15-VID (ImageNet Video)
● So far, the tracking community could not rely on a large labelled dataset.
  ○ ALOV+OTB+VOT have fewer than 600 videos in total, with some overlap.
  ○ They should be reserved for testing.
● ImageNet Video
  ○ Official task is object detection and classification from video.
  ○ Almost 4,500 videos and 1,200,000 bounding boxes!
  ○ 30 classes: mostly animals (~75%) and some vehicles (~25%).
  ○ Step-by-step guide to prepare the data to train our net: https://github.com/bertinetto/siamese-fc/tree/master/ILSVRC15-curation

Tracking pipeline
● Activations for the exemplar z are computed only for the first frame.
● The sub-window of x with maximum similarity sets the new location.
● That’s (almost) it!
  ○ No update of target representation.
  ○ No re-detection.
  ○ No bbox regression.
  ○ No fine-tuning → fast! 50-100 fps.
● Only three little tricks:
  ○ Pyramid of 3 scales.
  ○ Response upsampled with bicubic interpolation.
  ○ Cosine window to penalize large displacements.
CODE AVAILABLE! www.robots.ox.ac.uk/~luca/siamese-fc.html
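One tracking step can be sketched as below, assuming the score maps for the three scales have already been computed. This is a hypothetical simplification, not the authors' implementation: the name `locate_target`, the blending weight `window_influence` and the way the cosine (Hann) window is mixed into the response are illustrative choices, and bicubic upsampling is left out to keep the example dependency-free. The returned displacement is in score-map cells; converting it to pixels depends on the network stride, which the slides do not state.

```python
import numpy as np

def locate_target(responses, window_influence=0.3):
    """responses: array of shape (num_scales, h, w), one score map per scale."""
    num_scales, h, w = responses.shape
    # Cosine window: favours small displacements from the previous position.
    hann = np.outer(np.hanning(h), np.hanning(w))
    hann = hann / hann.sum()
    best_scale, best_peak, best_value = 0, (0, 0), -np.inf
    for k in range(num_scales):
        r = responses[k] - responses[k].min()
        r = r / (r.sum() + 1e-12)  # normalise so maps are comparable
        r = (1.0 - window_influence) * r + window_influence * hann
        peak = np.unravel_index(np.argmax(r), r.shape)
        if r[peak] > best_value:
            best_scale, best_peak, best_value = k, peak, r[peak]
    centre = ((h - 1) / 2.0, (w - 1) / 2.0)
    displacement = (float(best_peak[0] - centre[0]), float(best_peak[1] - centre[1]))
    return best_scale, displacement

# Toy input: three 17x17 maps, with a single strong response at scale index 1.
responses = np.zeros((3, 17, 17))
responses[1, 10, 6] = 5.0
scale_index, displacement = locate_target(responses)
print(scale_index, displacement)
```

Because the exemplar is never updated, each frame costs only one forward pass on the search area plus this inexpensive peak search.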

New state of the art for real-time trackers (OTB-13)

State of the art for general trackers (VOT-15)
● At 1 fps, the best tracker is almost 2 orders of magnitude slower than our method, which runs at 86 frames per second.
● None of the top-15 trackers operates above 20 frames per second.
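The "almost 2 orders of magnitude" figure follows directly from the two frame rates quoted on the slide; a quick check (variable names are illustrative):

```python
import math

ours_fps = 86.0   # frame rate quoted for the proposed method
best_fps = 1.0    # frame rate quoted for the best VOT-15 tracker

speedup = ours_fps / best_fps
orders_of_magnitude = math.log10(speedup)
print(f"{speedup:.0f}x faster, about {orders_of_magnitude:.2f} orders of magnitude")
```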

Concurrent work - GOTURN [ECCV ’16]
● Siamese architecture trained to solve a bounding-box regression problem.
● Unlike ours, the network is not fully convolutional.
● Trained on consecutive frames.
● They are not strictly learning a similarity function: the method also works (albeit worse) with a single branch.
● Fast (100 fps), but significantly lower results compared to our method.
Learning to Track at 100 FPS with Deep Regression Networks - David Held, Sebastian Thrun, Silvio Savarese - ECCV 2016.

Concurrent work - SINT [CVPR ’16]
● Siamese architecture trained to learn a generic similarity function.
● Unlike ours, their network is not fully convolutional; they resort instead to ROI pooling to sample candidates.
● Results reported only on OTB-13: ~2% better than our method.
● BBox regression to improve tracking performance.
● Much slower: only 2 fps vs 50-85 fps for our method.
Siamese Instance Search for Tracking - Ran Tao, Efstratios Gavves, Arnold W.M. Smeulders - CVPR 2016.

A few examples

Conclusions
● ImageNet Video: new standard for training tracking algorithms?
● Siamese networks allow simple trackers to achieve state-of-the-art results.
● Fully-convolutional siamese: allows very high frame rates while still achieving state-of-the-art performance.
● Fully-convolutional siamese: a simple and fast building block for future work, e.g. online update of the representation.
→ Code available: www.robots.ox.ac.uk/~luca/siamese-fc.html

Thank you.
