Towards Bridging Bottom-Up & Top-Down Vision with Hierarchical Compositional Models

UC Irvine. Iasonas Kokkinos, Center for Image and Vision Sciences, UCLA. Joint work with Alan Yuille.

High-Level Vision Goals • Given an image – Decide if it contains a car – Find its location – Find its extent – Find its structures

Two Main Approaches to Vision • Bottom-up – Data Driven – Feature Extraction – Pattern Recognition • Top-down – Model Driven – Parameter Estimation – Analysis-by-Synthesis

Motivation – Vision problems have both low- and high-level aspects (D. Mumford, Pattern Theory, 1995) – Synergy: joint treatment improves performance – Combined bottom-up and top-down processing

Talk Outline • Motivation • Deformations and Contours • Object Parsing • Appearance Information • Conclusions

Top-Down: Object Models • Deformable Models [Figure labels: S(X), X] • Active Appearance Models

Joint Segmentation and Recognition • EM formulation – E-step: segmentation – M-step: deformable model fitting • AAM-based segmentation • Segmentation-based detection Kokkinos & Maragos, PAMI 2008

Learning Deformation Models • AAM learning – E: Deform – M: Update [Figure: input images, edges & ridges, AAM fit, training set, deformation modes] Kokkinos and Yuille, ICCV 2007

Bottom-Up: Contour-Based Image Description • Primal Sketch Contours: Edges and Ridges [Figure panels: Sketch Contours, Edge Tokens, Ridge Tokens] – Geometry & semantics

Talk Outline • Motivation • Contours, Deformations and Hierarchy • Object Parsing • Appearance Information • Conclusions

Hierarchical Compositional Models [Figure: hierarchy with levels Object, Parts, Contours, Tokens] • Top-down view: object generates tokens • Bottom-up view: object is composed from tokens

Inference for Structured Models • Graphical Models (Bayesian Networks / MRFs) – Encode random variable dependencies with a graph. – High-Level Vision • Random variables: part poses (e.g. location, orientation, scale) • Dependencies: kinematic constraints • Belief Propagation – Graph nodes ‘inform’ each other by sending messages. – On tree-structured graphs, converges after 2 passes through the graph.
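The two-pass behaviour of belief propagation can be illustrated on the simplest tree, a chain. The sketch below is not the talk's implementation; it is a minimal min-sum variant written for illustration, and the function name `chain_min_marginals` and the toy costs are assumptions. One forward and one backward sweep are enough to obtain, for every node and state, the cost of the best full assignment passing through that state.

```python
def chain_min_marginals(unary, pairwise):
    """Min-sum belief propagation on a chain (hypothetical helper).

    unary[i][s]       : cost of node i taking state s
    pairwise[i][s][t] : cost of node i in state s and node i+1 in state t
    Returns m[i][s], the minimum total cost over all assignments
    in which node i takes state s.
    """
    n, k = len(unary), len(unary[0])

    # Pass 1 (forward): message arriving at node i from its left neighbour.
    fwd = [[0.0] * k for _ in range(n)]
    for i in range(1, n):
        for t in range(k):
            fwd[i][t] = min(
                fwd[i - 1][s] + unary[i - 1][s] + pairwise[i - 1][s][t]
                for s in range(k)
            )

    # Pass 2 (backward): message arriving at node i from its right neighbour.
    bwd = [[0.0] * k for _ in range(n)]
    for i in range(n - 2, -1, -1):
        for s in range(k):
            bwd[i][s] = min(
                bwd[i + 1][t] + unary[i + 1][t] + pairwise[i][s][t]
                for t in range(k)
            )

    # Combine both incoming messages with the local cost.
    return [[fwd[i][s] + unary[i][s] + bwd[i][s] for s in range(k)]
            for i in range(n)]


# Toy example: three binary nodes, unit penalty when neighbours disagree.
unary = [[0.0, 2.0], [1.0, 0.5], [3.0, 0.0]]
pairwise = [[[0.0, 1.0], [1.0, 0.0]], [[0.0, 1.0], [1.0, 0.0]]]
min_marginals = chain_min_marginals(unary, pairwise)
```

Each node is visited once per sweep, so the cost grows linearly with the number of nodes and quadratically with the number of states per node. That quadratic term is what becomes expensive when the states are part poses.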

Exploiting the Particular Setting – Sparse Image Representation • Bottom-up cues guide the search for objects. • No need to consider all node states as in BP – Hierarchical Object Representation • Quickly rule out unpromising solutions • Coarse-to-Fine detection

Compositional Detection • View production rules as composition rules • Build a parse tree for the object • Requires – Composition rules – Prioritized search

Composition of the ‘Back’ Structure

Composing Structures • How can we compose complex structures? – Gestalt rules (parallelism, similarity, ...) • How will we compose this? • How will we compose learned structures?

Canonical Rule Formulation • Combine structure with one constituent at a time. • Mechanical construction of composition rules • At most binary rules • Derivation cost: minus log-likelihood of observations
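The last bullet can be made concrete with a small sketch. It assumes, purely for illustration, that the observations used in a derivation are independent, so that likelihoods multiply and costs add; the function name `derivation_cost` and the numbers are hypothetical.

```python
import math


def derivation_cost(likelihoods):
    """Cost of a derivation as the summed minus log-likelihood.

    likelihoods: one likelihood in (0, 1] per observation used by the
    derivation, assumed independent for this sketch.
    """
    return sum(-math.log(p) for p in likelihoods)


# Adding one constituent at a time adds one term to the cost.
partial = derivation_cost([0.5])
extended = derivation_cost([0.5, 0.25])
```

Under this assumption the lightest derivation is the most likely one, and since each added constituent contributes a non-negative term, extending a structure can never lower its cost.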

Composition as Climbing a Lattice • Introduce vector indicating instantiated substructures – partial ordering among structures • Hasse Diagram for 3-partite structure [Figure: Hasse diagram over binary indicator vectors, from 0 0 0 at the bottom to 1 1 1 at the top] – By acquiring a substructure, the structure climbs upwards
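As a rough illustration of the lattice, the sketch below enumerates the indicator vectors of a structure with three parts and the Hasse diagram edges between them. The names `leq` and `hasse_diagram` are hypothetical, and the code only captures the ordering, not the geometry or the costs of the actual compositions.

```python
from itertools import product


def leq(a, b):
    """Partial order: a <= b if every part instantiated in a is also in b."""
    return all(x <= y for x, y in zip(a, b))


def hasse_diagram(n_parts):
    """Return (nodes, edges) of the Hasse diagram over indicator vectors.

    An edge (lo, hi) means hi is obtained from lo by acquiring exactly
    one additional substructure, i.e. one step upwards in the lattice.
    """
    nodes = list(product((0, 1), repeat=n_parts))
    edges = []
    for lo in nodes:
        for i in range(n_parts):
            if lo[i] == 0:
                hi = lo[:i] + (1,) + lo[i + 1:]
                edges.append((lo, hi))
    return nodes, edges


nodes, edges = hasse_diagram(3)
```

For three parts this gives 8 nodes and 12 edges. A derivation of the full object is a path from (0, 0, 0) to (1, 1, 1), and each edge along it corresponds to one application of a binary composition rule.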

Composition of the ‘Back’ Structure • Problem: Too many options! (Combinatorial explosion)

Analogy: Building a puzzle • Bottom-Up solution: Combine pieces until you build the car – Does not exploit the box's cover • Top-Down solution: Try fitting each piece to the box's cover. – Most pieces are uniform/irrelevant • Bottom-Up/Top-Down solution: – Form car-like structures, but use the cover to suggest combinations.

Best First Search • Dijkstra's Algorithm – Prioritize based on ‘cost so far’ – For parsing: Knuth's Lightest Derivation • A* Search – Consider ‘cost to go’ – Approximate with heuristic cost [Figure: path from Entry to Exit, annotated with cost so far, cost to go and heuristic cost]
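The difference between the two prioritization schemes can be seen on a small graph. The following is a generic sketch on a hand-made graph, not the parsing algorithm of the talk; the node names, edge weights and heuristic values are all made up for illustration. With no heuristic the function behaves like Dijkstra's algorithm, and with a lower bound on the cost to go it behaves like A*.

```python
import heapq


def best_first(graph, start, goal, heuristic=None):
    """Best-first search; returns (cost, list of expanded nodes).

    graph: dict mapping node -> list of (neighbour, non-negative weight)
    heuristic: lower bound on the cost to go; None gives Dijkstra.
    """
    h = heuristic or (lambda node: 0.0)
    best = {start: 0.0}                    # best known cost so far
    frontier = [(h(start), 0.0, start)]    # (priority, cost so far, node)
    expanded = []
    while frontier:
        _, cost, node = heapq.heappop(frontier)
        if cost > best.get(node, float("inf")):
            continue                       # stale queue entry
        expanded.append(node)
        if node == goal:
            return cost, expanded
        for nxt, weight in graph.get(node, []):
            new_cost = cost + weight
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(frontier, (new_cost + h(nxt), new_cost, nxt))
    return float("inf"), expanded


# Hypothetical graph: one cheap route, one expensive route, one dead end.
graph = {
    "entry": [("a", 1), ("b", 1), ("d", 1)],
    "a": [("exit", 5)],
    "b": [("c", 1)],
    "c": [("exit", 1)],
    "d": [("e", 1)],
}
# Hypothetical lower bounds on the cost to go from each node.
cost_to_go = {"entry": 3, "a": 5, "b": 2, "c": 1, "d": 4, "e": 4, "exit": 0}

dijkstra_cost, dijkstra_expanded = best_first(graph, "entry", "exit")
astar_cost, astar_expanded = best_first(graph, "entry", "exit",
                                        heuristic=cost_to_go.__getitem__)
```

Both runs return the same optimal cost, but the heuristic keeps A* away from the expensive route and the dead end, so it expands fewer nodes.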

‘Cost to go’ for Parsing • The Generalized A* Architecture, Felzenszwalb & McAllester • Context: complement needed to get to the goal. • Recursive derivation of contexts.

Heuristics for Parsing: Context Abstractions • A* requires lower bound of derivation cost • Derive context in coarser domain (abstraction) – Lower bound cost on fine domain • Use it to prioritize search [Equations labelled ‘KLD:’ and ‘A*:’ not recoverable]

Abstractions via Structure Coarsening • Coarsening: identify nodes of Hasse diagram [Figure: Hasse diagram before and after coarsening; in the coarse structure, 1 part suffices] • Lower bound composition cost
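Why solving a coarsened problem yields a valid A* heuristic can be sketched on a generic weighted graph. This is an illustration under stated assumptions, not the talk's construction: the fine nodes stand in for Hasse diagram nodes, the `abstraction` mapping that merges them is hand-picked, and a coarse edge simply takes the cheapest of the fine edges mapped onto it. With non-negative weights, every fine path maps to a coarse path that costs no more, so the coarse cost to go is a lower bound.

```python
import heapq


def coarse_cost_to_go(edges, abstraction, goal):
    """Lower-bound the fine cost to go by solving a coarsened problem.

    edges: list of (u, v, weight) in the fine domain, weights >= 0
    abstraction: dict mapping each fine node to its coarse node
    Returns a dict: fine node -> lower bound on its cost to reach goal.
    """
    # Coarse edge weight: cheapest fine edge mapped onto it.
    # Edges inside one coarse node are dropped (cost 0 in the coarse domain).
    incoming = {}
    for u, v, weight in edges:
        cu, cv = abstraction[u], abstraction[v]
        if cu != cv:
            best = incoming.setdefault(cv, {})
            best[cu] = min(weight, best.get(cu, float("inf")))

    # Dijkstra backwards from the coarse goal.
    coarse_goal = abstraction[goal]
    dist = {coarse_goal: 0.0}
    heap = [(0.0, coarse_goal)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue                       # stale queue entry
        for prev, weight in incoming.get(node, {}).items():
            if d + weight < dist.get(prev, float("inf")):
                dist[prev] = d + weight
                heapq.heappush(heap, (d + weight, prev))

    return {n: dist.get(c, float("inf")) for n, c in abstraction.items()}


# Hypothetical fine problem: the true costs to go are s=5, a=5, b=1, g=0.
edges = [("s", "a", 2), ("s", "b", 4), ("a", "g", 5), ("b", "g", 1)]
abstraction = {"s": "S", "a": "M", "b": "M", "g": "G"}   # merge a and b
bounds = coarse_cost_to_go(edges, abstraction, "g")
```

The bounds are cheap to compute because the coarse graph is smaller, and they can be plugged into an A* search on the fine graph as its heuristic.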

Coarse Level Parsing • Bottom-up: KLD in the coarse domain • Top-down: contexts passed to the fine level

Fine Level Parsing • Top-down guidance: heuristic from the coarse level • Bottom-up composition at the fine level

A* versus Best First Parsing • A* Parsing [Figure: Front Part, Middle Part, Back Part, Object and Goal at the Coarse and Fine Levels] • KLD Parsing

Parsing & Localization Results - I

Parsing & Localization Results - II

Parsing & Localization Results - III [Plots for Apples and Bottles: detection rate vs. false positives per image; curves: Contour Segment Networks, Our method (Berkeley Edges), Our method (Lindeberg Edges)]

UIUC Benchmark Results • 170 images, heavy clutter – KLD: typically ~10 seconds – A* Search: ~1-2 seconds [Plot: comparison with prior work, recall vs. precision; curves: Our method, Leibe et al., Fergus et al., Agarwal and Roth]

Talk Outline • Motivation • Contours, Deformations and Hierarchy • Object Parsing • Appearance Information • Conclusions

Are we missing something? • Appearance information • Main challenge: scale invariance for edges – Edges are intrinsically 1-D features

Scale Invariance without Scale Selection • Log-polar sampling & spatially varying filtering [Figure: scale space] – Turns scalings/rotations into translations. • Fourier Transform Modulus: translation invariance Kokkinos and Yuille, CVPR 2008
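The second bullet rests on a standard property of the Fourier transform: shifting a signal changes only the phase of its transform, not the modulus. The sketch below checks this on a 1-D signal with a naive DFT. It is only an illustration of that property, not the descriptor from the paper; the helper names and the sample values are assumptions. On a log-polar grid a rotation corresponds to a shift along the angle axis and a scaling to a shift along the log-radius axis, which is the case the 1-D signal stands in for here, idealised as a circular shift.

```python
import cmath


def dft_modulus(signal):
    """Modulus of the discrete Fourier transform (naive O(n^2) version)."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(signal)))
        for k in range(n)
    ]


def circular_shift(signal, shift):
    """Shift a list circularly to the right by `shift` samples."""
    shift %= len(signal)
    return signal[-shift:] + signal[:-shift]


# Hypothetical samples along one axis of a log-polar grid.
samples = [0.0, 1.0, 3.0, 2.0, 0.0, 0.0, 0.0, 0.0]
shifted = circular_shift(samples, 3)

original_modulus = dft_modulus(samples)
shifted_modulus = dft_modulus(shifted)
max_difference = max(abs(a - b)
                     for a, b in zip(original_modulus, shifted_modulus))
```

`max_difference` is zero up to floating-point error, even though `samples` and `shifted` differ. Comparing moduli therefore avoids having to estimate the scale or orientation first.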

Descriptor Performance

Talk Outline • Motivation • Contours, Deformations and Hierarchy • Object Parsing • Appearance Information • Conclusions

Contributions • A* Search framework for Object Parsing – Bottom-Up information: production cost – Top-Down information: heuristic function • Composition Rules – Canonical Rule Formulation / Hasse Diagrams – Integral Angles (not covered) • Heuristics for Parsing – Structure Coarsening

Future Research – Compositional Approach • Learning Structures and Hierarchies • Parsing and Learning with Alternative Structures (ORs) • Reusable Parts, Multiple Class Recognition – Revisit Low- and Mid-level vision problems • Segmentation • Boundary detection • Perceptual grouping – Scene parsing
