

SLIDE 1

From Rigid Templates to Grammars: Object Detection with Structured Models

Ross B. Girshick

Dissertation defense April 20, 2012

SLIDE 2

The question

What objects are where?

2

SLIDE 3

Why it matters

  • Intellectual curiosity
  • How do we extract this information from the signal?
  • Applications
  • Semantic image and video search
  • Human-computer interaction (e.g., Kinect)
  • Automotive safety
  • Camera focus-by-detection
  • Surveillance
  • Semantic image and video editing
  • Assistive technologies
  • Medical imaging
  • ...

3

SLIDE 4

Proxy task: PASCAL VOC Challenge

  • Localize & name (detect) 20 basic-level object categories
  • Airplane, bicycle, bus, cat, car, dog, person, sheep, sofa, monitor, etc.
  • 11k training images with 500 to 8000 instances / category
  • Evaluation: bounding-box overlap; average precision (AP)

4

[Figure: input image and desired output with “person” and “motorbike” boxes. Image credits: PASCAL VOC]

overlap(B, B′) = |B ∩ B′| / |B ∪ B′|
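The bounding-box overlap criterion can be written out directly. This is a minimal sketch; the function name `iou` and the `(x1, y1, x2, y2)` corner convention are illustrative, not from the slides:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))   # overlap width
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))   # overlap height
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

# PASCAL VOC counts a detection as correct when overlap >= 0.5.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # 50 / 150 = 1/3
```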

SLIDE 5

Challenges

  • Deformation
  • Viewpoint
  • Subcategory
  • Variable structure
  • Occlusion
  • Background clutter
  • Photometric

5

SLIDE 6

Challenges

  • Deformation

6

Image credit: http://i173.photobucket.com/albums/w78/yahoozy/MultipleExposures2.jpg

SLIDE 7

Challenges

  • Viewpoint

7

Image credits: PASCAL VOC

SLIDE 8

Challenges

  • Subcategory –– “airplane” images

8

Image credits: PASCAL VOC

SLIDE 9

Challenges

  • Variable structure

9

Image credits: PASCAL VOC

SLIDE 10

PASCAL VOC Challenges 2007-2011

  • 2007 Challenge
  • Winner: Deformable part models & Latent SVM [FMR’08]
  • 21% mAP
  • Baseline for dissertation
  • Winners of 2008 & 2009 Challenges
  • Fast forward to the 2011 Challenge
  • Our system (voc-release4): 34% mAP
  • Top system (NLPR): 41% mAP
  • NLPR method: voc-release4 + LBP image features + richer spatial model

(GMM) + more context rescoring

  • Second (MIT-UCLA) and third place (Oxford) also based on voc-release4

10


SLIDE 11

Contributions –– By area

  • Object representation*
  • Mixture models (in PAMI’10); Latent orientation; Person grammar model
  • Efficient detection algorithms*
  • Cascaded detection for DPM (oral at CVPR’10)
  • Learning*
  • Weak-label structural SVM (spotlight at NIPS’11)
  • Detection post-processing
  • Bounding box prediction & context rescoring
  • Image representation
  • Enhanced HOG features; features for boundary truncation & small objects
  • Software
  • voc-release{2,3,4} – currently the “go to” object detection system

11

SLIDE 12

Object representation

SLIDE 13

Model lineage – Dalal & Triggs

  • Histogram of Oriented Gradients (HOG) features
  • Scanning window detector (linear filter)
  • w learned by SVM

13

[Figure: image pyramid → HOG feature pyramid; a “root filter” w is scanned over all positions p. Dalal and Triggs ’05]

score(x, p) = w · ψ(x, p)
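The scanning-window score can be sketched as a dot product of the filter with every filter-sized subwindow of the feature map. This toy example uses an explicit loop and a fake 31-D map in place of the optimized cross-correlation used in practice:

```python
import numpy as np

def score_map(feat, w):
    """Dense scanning-window scores: dot product of filter w with
    every filter-sized subwindow of a HOG feature map."""
    H, W, D = feat.shape
    h, ww, _ = w.shape
    out = np.empty((H - h + 1, W - ww + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(feat[y:y + h, x:x + ww] * w)
    return out

feat = np.zeros((6, 8, 31)); feat[2, 3, 0] = 1.0   # toy 31-D feature map
w = np.zeros((3, 3, 31)); w[1, 1, 0] = 2.0          # toy 3x3-cell root filter
s = score_map(feat, w)
print(s.argmax())  # the best placement is the window covering the active cell
```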

SLIDE 14

Model lineage – Latent SVM DPM

  • Dalal & Triggs + Parts in a deformable configuration z
  • Scanning window detection: max over z at each p0
  • w learned by latent SVM

14

[Figure: image pyramid → HOG feature pyramid; root filter placed at p0, parts at latent locations z. FMR’08]

score(x, p0) = max over z ∈ Z(p0) of w · ψ(x, (p0, z)),  z = (p1, . . . , pn)
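The max over part placements can be sketched in 1-D: each part contributes its best trade-off between appearance score and a quadratic deformation penalty from its anchor. The `defcost` coefficient is an illustrative simplification; real DPMs compute this max for all root placements at once with generalized distance transforms:

```python
import numpy as np

def dpm_score(root_score, part_scores, anchors, defcost=0.1):
    """Score one root placement of a star-structured DPM (1-D sketch):
    root appearance plus, per part, the best appearance-minus-deformation
    value over all displacements from the part's anchor position."""
    total = root_score
    for scores, anchor in zip(part_scores, anchors):
        disp = np.arange(len(scores)) - anchor
        total += np.max(scores - defcost * disp ** 2)
    return total

part = np.array([0.0, 0.5, 2.0, 0.1])  # part appearance along 1-D positions
print(dpm_score(1.0, [part], [1]))     # part shifts off its anchor: 1 + (2.0 - 0.1)
```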

SLIDE 15

Superposition of views

15

SLIDE 16

Mixture of DPMs

  • Training (component labels are hidden)
  • Cluster training examples by bounding-box aspect ratio
  • Initialize root filters for each component (cluster) independently
  • Merge components into mixture model and train with latent SVM
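The initialization-by-clustering step can be sketched as follows; the equal-size split of sorted aspect ratios and the `(x, y, w, h)` box convention are assumptions for illustration:

```python
def split_by_aspect(boxes, n_components=3):
    """Group training boxes into mixture components by aspect ratio:
    sort by w/h, then cut into equal-size chunks (one per component)."""
    aspects = sorted(range(len(boxes)),
                     key=lambda i: boxes[i][2] / boxes[i][3])  # boxes are (x, y, w, h)
    size = -(-len(boxes) // n_components)  # ceiling division
    return [aspects[i:i + size] for i in range(0, len(aspects), size)]

boxes = [(0, 0, 40, 100), (0, 0, 100, 40), (0, 0, 50, 50),
         (0, 0, 30, 90), (0, 0, 90, 30), (0, 0, 60, 60)]
print(split_by_aspect(boxes))  # tall, square, and wide examples end up together
```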

16

[Figure: learned mixture components for “person” and “car”]

SLIDE 17

17

Mixtures with latent orientation [GFM voc-release4]

[Figure: learning without latent orientation yields a “pushmi-pullyu” model instead of a horse; learning with latent orientation yields a right-facing horse model]

SLIDE 18
Unsupervised orientation clustering

  • Online clustering with a hard constraint

18

[Figure: starting from a seed, the i-th example is assigned to the nearest of two clusters; its flipped copy must go to the other cluster]
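The constrained online clustering might look like this sketch. The running-mean centroids and Euclidean distance are illustrative choices, and `flip` stands in for horizontal mirroring of the features:

```python
import numpy as np

def orientation_cluster(examples, flip):
    """Online clustering into two orientation clusters with the hard
    constraint that an example and its mirror land in opposite clusters."""
    c = [examples[0].copy(), flip(examples[0])]   # seed the two clusters
    n = [1, 1]
    for x in examples[1:]:
        k = 0 if np.linalg.norm(x - c[0]) <= np.linalg.norm(x - c[1]) else 1
        for cl, v in ((k, x), (1 - k, flip(x))):  # mirror goes to the other cluster
            c[cl] = (n[cl] * c[cl] + v) / (n[cl] + 1)   # running mean
            n[cl] += 1
    return c

flip = lambda v: v[::-1].copy()
data = [np.array([1.0, 0.0]), np.array([0.9, 0.1]), np.array([0.0, 1.0])]
c = orientation_cluster(data, flip)  # c[0] leans "right-facing", c[1] its mirror
```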

SLIDE 19

Latent orientation improves performance

19

Horse model AP (PASCAL 2007 evaluation):

  • Single component: 42.1
  • Mixture model (3 components): 47.3
  • Latent orientation (2×3 components): 56.8

SLIDE 20

Results – Mixture models and latent orientation

  • Mixture models boost mAP by 3.7 points
  • Latent orientation boosts mAP by 2.6 points
  • 12 AP point improvement (>50% relative) over the baseline

20

AP scores using the PASCAL 2007 evaluation

SLIDE 21

Efficient detection

SLIDE 22

Cascaded detection for DPM

  • Add in parts one-by-one and prune partial scores
  • Sparse dynamic programming tables (reuse computation!)
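The prune-as-you-go idea can be sketched per window; the thresholds and scores are toy values:

```python
def cascade_score(part_scores, thresholds):
    """Cascaded DPM scoring for one window: add part contributions one
    at a time and prune as soon as the running partial score drops
    below that stage's threshold (remaining parts are never evaluated)."""
    total = 0.0
    for score, t in zip(part_scores, thresholds):
        total += score
        if total < t:
            return None  # pruned
    return total

ts = [-0.5, 0.0, 0.5]                       # per-stage pruning thresholds
print(cascade_score([0.4, 0.3, 0.2], ts))   # survives all stages
print(cascade_score([-0.6, 0.9, 0.9], ts))  # pruned at the first stage
```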

22

SLIDE 23

Threshold selection & PCA filters

  • Data-driven threshold selection
  • Based on statistics of partial scores on training data
  • Provably safe (“probably approximately admissible” thresholds)
  • Empirically effective
  • 2-stage cascade with simplified appearance models
  • Use PCA of HOG features (or model filters)
  • Stage 1: place low-dimensional filters; Stage 2: place original filters
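The PCA step can be sketched as follows. Random data stands in for real HOG cells; in the actual system the projection is learned from training features, where a few components carry most of the energy:

```python
import numpy as np

rng = np.random.default_rng(0)
cells = rng.normal(size=(1000, 31))            # stand-in for 31-D HOG cells

# Top-k principal directions of the (centered) cell features.
_, _, Vt = np.linalg.svd(cells - cells.mean(0), full_matrices=False)
P = Vt[:5].T                                    # 31 -> 5 projection matrix

filt = rng.normal(size=(3, 3, 31))              # a model filter
feat = rng.normal(size=(3, 3, 31))              # a feature subwindow
full = np.sum(filt * feat)                      # stage 2: exact filter response
fast = np.sum((filt @ P) * (feat @ P))          # stage 1: cheap low-dim response
```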

23

SLIDE 24

Results –– 15x speedup (no loss in mAP)

24

[Precision–recall curves on PASCAL 2007 comp3, class: motorbike]

  • High-recall setting: 23.2x faster (618 ms per image); baseline AP 48.7, cascade AP 48.9
  • Lower recall ⇒ faster: 31.6x faster (454 ms per image); baseline AP 48.7, cascade AP 41.8

SLIDE 25

Towards richer grammar models

SLIDE 26

People are complicated

26

[Figure: people with a helmet and an occluded left side; a ski cap, no face, truncated; a pirate hat, dresses, long hair; truncation, holding a glass, heavy occlusion]

Objects from visually rich categories have diverse structural variation

SLIDE 27

Compositional models

27

More mixture components? No! There are too many combinations. Instead... compositional models defined by grammars.

AP so far: [DT’05] 0.12 → [FMR’08] 0.27 → [FGMR’10] 0.36 → [GFM voc-release4] 0.42

SLIDE 28

Object detection grammars

  • A modeling language for building object detectors [FM’10]
  • Terminals (model image appearance)
  • Nonterminals (objects, parts, ...)
  • Weighted production rules (define composition, variable structure)
  • Composition
  • Objects are recursively composed of other objects (parts)
  • Variable structure
  • Expanding different rules produces different structures
  • Person → Head, Torso, Arms, Legs
  • Head → Eye, Eye, Mouth
  • Mouth → Smile OR Mouth → Frown
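The smiling/frowning example can be written as a tiny grammar. This is a sketch; the dictionary encoding and the `expansions` helper are illustrative, not the thesis implementation:

```python
# Productions map a nonterminal to the alternative ways it can expand.
grammar = {
    "Person": [["Head", "Torso", "Arms", "Legs"]],
    "Head":   [["Eye", "Eye", "Mouth"]],
    "Mouth":  [["Smile"], ["Frown"]],   # OR-rules give variable structure
}

def expansions(symbol):
    """Enumerate all terminal-level structures derivable from a symbol."""
    if symbol not in grammar:           # terminal symbol
        return [[symbol]]
    out = []
    for rule in grammar[symbol]:
        partials = [[]]
        for child in rule:              # AND: concatenate child expansions
            partials = [p + e for p in partials for e in expansions(child)]
        out.extend(partials)            # OR: collect alternatives
    return out

print(len(expansions("Person")))  # two derivations: smiling vs. frowning person
```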

28

SLIDE 29
  • Object hypothesis = derivation tree T
  • Linear score function Detection with DP

29

Object detection grammars

[Figure: derivation tree T with Person(x, y, l) expanding to Root(x, y, l) and Part1(x1, y1, l1) . . . PartN(xN, yN, lN); each placement p = (x, y, l)]

score(x, T) = w · ψ(x, T)

T*(x) = argmax over T of w · ψ(x, T)
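The dynamic-programming detection can be sketched on a toy grammar: each symbol keeps a table of best scores per placement, summing over AND-rule children and maxing over OR-rules. The 1-D placements and score values are illustrative:

```python
# Terminal appearance scores, one per 1-D placement.
terminal_scores = {
    "Smile": [0.2, 1.0, 0.1],
    "Frown": [0.5, 0.3, 0.4],
}
rules = {"Mouth": [["Smile"], ["Frown"]]}   # OR-rules for each nonterminal

def best_scores(symbol):
    """Bottom-up DP: best derivation score of `symbol` at each placement."""
    if symbol in terminal_scores:
        return terminal_scores[symbol]
    tables = []
    for rule in rules[symbol]:
        kids = [best_scores(c) for c in rule]
        tables.append([sum(col) for col in zip(*kids)])   # AND: sum children
    return [max(col) for col in zip(*tables)]             # OR: max over rules

print(best_scores("Mouth"))  # best of smile/frown at each placement
```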

SLIDE 30

Build on what works

30

Can we build a better person detector?

SLIDE 31

Case study: a person detection grammar

  • Fine-grained occlusion
  • Sharing across all derivations
  • Model of the stuff that causes occlusion
  • Part subtypes and multiple resolutions
  • Parts have subparts (not pictured)

31

[Figure: grammar states — parts 1–6 (no occlusion); parts 1–4 & occluder; parts 1–2 & occluder; part subtypes 1 and 2; example detections and derived filters]

SLIDE 32

Training models

  • PASCAL data: bounding-box labels
  • No derivation trees given! (weakly-supervised learning)
  • Learn the parameters w

32

[Figure: bounding-box-labeled training data → training → grammar model with subtypes 1–2, parts 1–6, and occluder]

SLIDE 33

Defining examples

  • Each bounding box is a foreground example
  • All locations in background images are background examples
  • From these examples, learn the prediction rule

33

[Figure: input example → feature map → possible outputs (derivation trees) → predicted output]

fw(x) = argmax over s ∈ S(x) of w · ψ(x, s)

SLIDE 34

Parameter learning

  • Richer models, richer problems
  • Which learning framework should we use?

34

One good output... and many bad ones!

SLIDE 35

Classification training

35

[Figure: Training — two different derivations, each with the LSVM objective “score +1 here”. Testing — who wins? Both derivations were trained to score +1.]

E(w) = ½‖w‖² + C Σi max(0, 1 − yi · fw(xi)),  fw(x) = max over s ∈ S(x) of w · ψ(x, s)
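The LSVM objective can be sketched as follows. This is a toy setting with binary labels where each example carries its candidate features ψ(x, z) precomputed, which sidesteps the real system's feature-pyramid machinery:

```python
import numpy as np

def lsvm_objective(w, examples, C=1.0):
    """Latent SVM objective: hinge loss on the max-scoring latent
    placement of each example; labels y are in {-1, +1}."""
    obj = 0.5 * np.dot(w, w)
    for feats, y in examples:                  # feats: one psi(x, z) per z
        fw = max(np.dot(w, f) for f in feats)  # best latent placement
        obj += C * max(0.0, 1.0 - y * fw)
    return obj

w = np.array([1.0, 0.0])
pos = ([np.array([2.0, 0.0]), np.array([0.0, 2.0])], +1)  # fw = 2 -> no loss
neg = ([np.array([0.5, 0.0])], -1)                        # fw = 0.5 -> loss 1.5
print(lsvm_objective(w, [pos, neg]))  # 0.5 + 0.0 + 1.5 = 2.0
```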

SLIDE 36

Structured output training

36

[Figure: Training — the good output should “outscore all other outputs by a margin”; a bad output should “score lower by a margin”. Testing — a “good” output should win.]

SLIDE 37

Latent structural SVM

  • Objective and task loss (Lmargin) might be inconsistent
  • Many outputs with zero loss –– LSSVM “requires” the training label

37

E(w) = ½‖w‖² + C Σi L(w, xi, yi)

L(w, x, y) = max over (ŷ, ẑ) ∈ Y × Z of [w · ψ(x, ŷ, ẑ) + Lmargin(y, ŷ)] − max over ẑ ∈ Z of w · ψ(x, y, ẑ)

[Yu and Joachims]
SLIDE 38

LSSVM requires label space = output space

  • A simple example where label space != output space
  • Label space is all pixel-accurate bounding boxes
  • Outputs are bounding boxes on a low-res. grid at some scales
  • Does not naturally fit the LSSVM framework

38

Image pyramid HOG feature pyramid

root filter

SLIDE 39

Structured learning desiderata

  • Model can make any low-loss prediction
  • Many outputs might be compatible with one label
  • The model is free to choose between them
  • Label space and output space can be different
  • E.g., bounding boxes labels and derivation tree outputs
  • Generalize frameworks that work well
  • Structural SVM
  • Latent structural SVM
  • Latent SVM

39

[Figure: several different derivation trees, all compatible with the same bounding-box label — “All ok!”]

SLIDE 40

Label space != output space

  • Allowing different label spaces and output spaces
  • Connect the spaces with loss functions of the form

40

[Figure: a “person” label connected to a derivation tree with lower-part, face, eyes, nose, mouth, trunk, arms, legs, pants, shoes]

label y ∈ Y,  output s ∈ S,  loss L : Y × S → R≥0

SLIDE 41

Weak-label structural SVM

  • Allows different label spaces and output spaces
  • Not “required” to predict the training label
  • Many outputs may be compatible with a label –– labels are “weak”
  • The model can pick any output with low Loutput
  • Generalizes many frameworks
  • SSVM, LSSVM, LSVM, structural ramp loss

41

E(w) = ½‖w‖² + C Σi L(w, xi, yi)

L(w, x, y) = max over s ∈ S(x) of [w · ψ(x, s) + Lmargin(y, s)] − max over s ∈ S(x) of [w · ψ(x, s) − Loutput(y, s)]
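The WL-SSVM surrogate loss can be sketched over an explicit list of candidate outputs. This is a toy setting; the real S(x) is exponentially large and both maxes are computed by dynamic programming:

```python
import numpy as np

def wlssvm_loss(w, feats, l_margin, l_output):
    """WL-SSVM surrogate loss for one example: the margin-augmented best
    output minus the best output discounted by L_output. Labels are weak:
    any output with low L_output may serve as the 'good' one."""
    scores = np.array([np.dot(w, f) for f in feats])
    return np.max(scores + l_margin) - np.max(scores - l_output)

w = np.array([1.0, 0.0])
feats = [np.array([1.0, 0.0]), np.array([0.8, 0.0])]  # two candidate outputs
l_margin = np.array([0.0, 1.0])   # second output is inconsistent with the label
l_output = np.array([0.0, 1.0])
print(wlssvm_loss(w, feats, l_margin, l_output))  # (0.8 + 1) - 1.0 = 0.8
```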

SLIDE 42

Person grammar results

  • WL-SSVM vs. LSVM

42

[Tables: AP scores on PASCAL 2010; AP scores on 5 PASCAL 2011 train+val splits]

SLIDE 43

Example detections

43

SLIDE 44

Summary of contributions

  • Richer models + post-processing + features + learning
  • > 50% relative improvement in the state-of-the-art
  • Cascaded detection for DPM
  • 15x speedup of detection with no loss in performance
  • Person detection grammar and WL-SSVM
  • Highest-performing person detector
  • More general & natural learning framework for many problems
  • Improved image features
  • Detection post-processing
  • Software
  • voc-release5 will be available soon!

44

SLIDE 45

Open directions

  • Grammar structure learning
  • Perhaps from more detailed annotations
  • Score compatibility and linear grammars
  • Nonlinearities to normalize score ranges –– neural grammars?
  • Rethink our low-level features
  • HOG features are too coarse to model fine detail
  • We are likely saturating the performance of HOG features
  • Optimizing nonconvex objectives with latent variables
  • How can we free ourselves from careful (often model specific) initialization?

45

SLIDE 46
SLIDE 47

Challenges

  • Subcategory –– “car” images

47

Image credits: PASCAL VOC

SLIDE 48

Organizing principles

  • Gradually build richer models
  • Central research methodology
  • Compositional models
  • Object Detection Grammars [FM’10]
  • Deformation, viewpoint, subcategory, composition – in a unified framework
  • Efficient computation
  • Tree-structured models
  • Cascaded detection
  • Train models from weakly-labeled data
  • New models, old annotations

48

SLIDE 49
Preliminaries –– Object detection grammars

  • Linear score function
  • Detection = find high scoring derivations
  • Efficient dynamic programming algorithm

49

score(x, T) = Σ over (r, p) ∈ int(T) of βr(p) + Σ over (A, ω) ∈ leaf(T) of score(x, A, ω)
            = Σ over (r, p) ∈ int(T) of wr · φr(p) + Σ over (A, ω) ∈ leaf(T) of wA · φ(x, ω)
            = w · [Σ over (r, p) ∈ int(T) of φr(p) + Σ over (A, ω) ∈ leaf(T) of φ(x, ω)]
            = w · ψ(x, T)

SLIDE 50

Image representation & features

  • Enhanced HOG features
  • 36D → 31D with more information
  • Contrast sensitive and insensitive
  • Boundary truncation
  • Nonzero scores outside the image

  • Small objects
  • Scale-dependent score bias

50

[Figure: sorted eigenvalue spectrum from PCA of HOG features — a few components dominate]
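The 31-D layout can be sketched by assembling one cell: 18 contrast-sensitive orientation channels, 9 contrast-insensitive ones (opposite orientations folded together), and 4 gradient-energy features. The function name and its inputs are illustrative:

```python
import numpy as np

def enhanced_hog_cell(hist18, energies):
    """Assemble one 31-D enhanced HOG cell: 18 contrast-sensitive
    orientations, 9 contrast-insensitive ones (opposite directions
    summed), and 4 gradient-energy features: 18 + 9 + 4 = 31."""
    insensitive = hist18[:9] + hist18[9:]   # fold opposite directions
    return np.concatenate([hist18, insensitive, energies])

cell = enhanced_hog_cell(np.arange(18, dtype=float), np.zeros(4))
print(cell.shape)  # (31,)
```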
SLIDE 51

Detection post-processing

  • Bounding box prediction
  • Contextual information

51

SLIDE 52

Results – Mixture model and latent orientation

  • Mixture models boost mAP by 3.7 points
  • Latent orientation boosts mAP by 2.6 points
  • 12 AP point improvement (>50% relative) over the baseline

52

AP scores using the PASCAL 2007 evaluation

[Bar chart: per-class AP for the 20 PASCAL classes (aero, bike, bird, boat, bottle, bus, car, cat, chair, cow, table, dog, horse, motorbike, person, plant, sheep, sofa, train, tvmonitor)]

mAP progression: LSVM 22.3 → CVPR star model 26.0 → MIX 29.7 → …+ORIENT 32.3 → …+CONTEXT 34.1

SLIDE 53

Structure learning

53

What’s the model class? What are the grammar productions?

  • Number of components? Root filter sizes and shapes? Number of parts? Anchor positions? Part shapes and sizes?

Currently answered by heuristics, cross validation, and (human) insight.

SLIDE 54

Object representation –– Summary

54

[Figure: model evolution — root filter; parts 1–6 with subtypes 1–2 and occluder; example detections and derived filters]

AP progression: [DT’05] 0.12 → [FMR’08] 0.27 → [FGMR’10] 0.36 → [GFM voc-release4] 0.43 → [GFM’11] 0.47

Prior work → this work: mixture models, latent orientation, person detection grammar, WL-SSVM

SLIDE 55

Structural SVM

  • No latent variables
  • Objective and task loss (Lmargin) might be inconsistent
  • Two outputs with zero loss –– SSVM “requires” the training label

55

E(w) = ½‖w‖² + C Σi L(w, xi, yi)

L(w, x, y) = max over ŷ ∈ Y of [w · ψ(x, ŷ) + Lmargin(y, ŷ)] − w · ψ(x, y)

[Tsochantaridis et al., Taskar et al.]

SLIDE 56

Optimizing WL-SSVM

  • Use the convex-concave procedure
  • Sequence of convex slave problems

56

E(w) = Econvex(w) + Econcave(w)

Econvex(w) = ½‖w‖² + C Σi max over s ∈ S(xi) of [w · ψ(xi, s) + Lmargin(yi, s)]

Econcave(w) = −C Σi max over s ∈ S(xi) of [w · ψ(xi, s) − Loutput(yi, s)]

wt+1 = argmin over w of ½‖w‖² + C Σi [max over s ∈ S(xi) of [w · ψ(xi, s) + Lmargin(yi, s)] − w · ψ(xi, si(wt))]
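One CCCP iteration can be sketched with a subgradient step on the convex slave problem. The learning rate, regularization, and toy candidates are illustrative; using stochastic subgradient steps to solve the slave problem is an assumption here, not stated on the slide:

```python
import numpy as np

def cccp_step(w, feats, l_margin, l_output, lr=0.1, reg=1e-2):
    """One convex-concave procedure step for the WL-SSVM loss (sketch):
    fix the 'good' output selected by the concave part at the current w,
    then take a subgradient step on the resulting convex slave problem."""
    scores = np.array([np.dot(w, f) for f in feats])
    good = int(np.argmax(scores - l_output))   # fixed for this slave problem
    bad = int(np.argmax(scores + l_margin))    # margin-augmented violator
    grad = reg * w + feats[bad] - feats[good]  # subgradient of the slave objective
    return w - lr * grad

w = np.zeros(2)
feats = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
l_margin = np.array([0.0, 1.0])   # output 1 is the bad one
l_output = np.array([0.0, 1.0])
for _ in range(20):
    w = cccp_step(w, feats, l_margin, l_output)
# w drifts toward the good output's features and away from the bad one's.
```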

SLIDE 57

Object detection grammars

  • A modeling language for building object detectors [FM’10]
  • Terminals (model image appearance)
  • Nonterminals (objects, parts, ...)
  • Weighted production rules (define compositions, subtypes)
  • Composition
  • Subtypes (choice yields variable structure)
  • Symbols are placed

57

X is a “smiling face” or a “frowning face”

[Figure: weighted productions of the form X →β { Y1, . . . , Yn }, with alternative rules for X yielding a smile or a frown]

SLIDE 58

Object detection grammars

  • Symbols are placed
  • Terminals model appearance (HOG filters)

58

[Figure: image pyramid → HOG feature pyramid; symbols Person(x, y, l), Root(x, y, l), Parti(xi, yi, li) are placed at positions and scales in the pyramid. FMR’08]

SLIDE 59

Definition of S(x) – foreground examples

59

[Figure: the labeled box B and a derivation’s detection box B′; S(x) contains the derivations whose box B′ sufficiently overlaps B]

SLIDE 60

Definition of S(x) – background examples

60

For a background position ω in the HOG feature pyramid, S(x) = Tω ∪ {⊥}: the derivations placed at ω together with the “no object” output ⊥.