
Towards Bridging Bottom-Up & Top-Down Vision with Hierarchical Compositional Models - PowerPoint PPT Presentation



  1. Towards Bridging Bottom-Up & Top-Down Vision with Hierarchical Compositional Models. UC Irvine. Iasonas Kokkinos, Center for Image and Vision Sciences, UCLA. Joint work with Alan Yuille

  2. High-Level Vision Goals • Given an image – Decide if it contains a car – Find its location – Find its extent – Find its structures

  3. Two Main Approaches to Vision • Bottom-up: data-driven, feature extraction, pattern recognition • Top-down: model-driven, parameter estimation, analysis-by-synthesis

  4. Motivation – Vision problems have both low- and high-level aspects (D. Mumford, Pattern Theory, 1995) – Synergy: joint treatment improves performance – Combined bottom-up and top-down processing

  5. Talk Outline • Motivation • Deformations and Contours • Object Parsing • Appearance Information • Conclusions

  6. Top-Down: Object Models • Deformable models: S(X) • Active Appearance Models

  7. Joint Segmentation and Recognition • EM formulation – E-step: segmentation – M-step: deformable model fitting • AAM-based segmentation • Segmentation-based detection. Kokkinos & Maragos, PAMI 2008
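The E-step/M-step alternation on this slide can be illustrated with a minimal, self-contained sketch: a 1-D two-component mixture stands in for the real system, with the soft assignment playing the role of segmentation and the mean update playing the role of model fitting. All names here are illustrative, not taken from the original method:

```python
import math

def em_two_gaussians(xs, mu0, mu1, iters=50, sigma=1.0):
    """Minimal EM loop: the E-step assigns soft labels to the data
    (cf. segmentation), the M-step re-fits the model parameters
    (cf. deformable model fitting)."""
    for _ in range(iters):
        # E-step: posterior responsibility of component 1 for each point
        resp = []
        for x in xs:
            p0 = math.exp(-(x - mu0) ** 2 / (2 * sigma ** 2))
            p1 = math.exp(-(x - mu1) ** 2 / (2 * sigma ** 2))
            resp.append(p1 / (p0 + p1))
        # M-step: responsibility-weighted mean updates
        w1 = sum(resp)
        w0 = len(xs) - w1
        mu1 = sum(r * x for r, x in zip(resp, xs)) / w1
        mu0 = sum((1 - r) * x for r, x in zip(resp, xs)) / w0
    return mu0, mu1
```

Running it on points clustered near 0 and near 5 recovers means close to those two clusters, even from poor initial guesses.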

  8. Learning Deformation Models • AAM learning alternates an E-step (deform) and an M-step (update) • From the input images, edges & ridges are extracted, the AAM is fit to the training set, and deformation modes are learned. Kokkinos and Yuille, ICCV 2007

  9. Bottom-Up: Contour-Based Image Description • Primal sketch contours: edges and ridges – Edge tokens and ridge tokens – Geometry & semantics

  10. Talk Outline • Motivation • Contours, Deformations and Hierarchy • Object Parsing • Appearance Information • Conclusions

  11. Hierarchical Compositional Models • Hierarchy: Object → Parts → Contours → Tokens • Top-down view: object generates tokens • Bottom-up view: object is composed from tokens

  12. Inference for Structured Models • Graphical Models (Bayesian Networks / MRFs) – Encode random variable dependencies with a graph – High-Level Vision • Random variables: part poses (e.g. location, orientation, scale) • Dependencies: kinematic constraints • Belief Propagation – Graph nodes 'inform' each other by sending messages – Converges after 2 passes through the graph (when the graph is tree-structured)
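For a concrete picture of message passing on a tree-structured model, here is a minimal max-product (Viterbi-style) sketch on a chain of parts: one forward pass of messages and one backward pass recover the MAP states, matching the two-pass convergence noted on the slide. Costs are negative log-probabilities; all names are illustrative:

```python
def chain_map(unary, pairwise):
    """MAP assignment on a chain MRF by max-product message passing.
    unary[i][s]: cost of node i in state s.
    pairwise[i][s][t]: cost of edge (i, i+1) between states s and t.
    One forward and one backward pass suffice on a chain."""
    n, k = len(unary), len(unary[0])
    # Forward pass: msg[i][t] = cheapest cost of nodes 0..i-1
    # compatible with node i being in state t.
    msg = [[0.0] * k for _ in range(n)]
    for i in range(1, n):
        for t in range(k):
            msg[i][t] = min(msg[i - 1][s] + unary[i - 1][s] + pairwise[i - 1][s][t]
                            for s in range(k))
    # Backward pass: read off the optimal states.
    states = [0] * n
    states[-1] = min(range(k), key=lambda t: msg[-1][t] + unary[-1][t])
    for i in range(n - 2, -1, -1):
        t = states[i + 1]
        states[i] = min(range(k),
                        key=lambda s: msg[i][s] + unary[i][s] + pairwise[i][s][t])
    return states
```

With a strong smoothness penalty on the edges, the chain prefers a constant labeling even when individual unary terms disagree.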

  13. Exploiting the Particular Setting – Sparse Image Representation • Bottom-up cues guide the search for objects. • No need to consider all node states as in BP – Hierarchical Object Representation • Quickly rule out unpromising solutions • Coarse-to-Fine detection

  14. Compositional Detection • View production rules as composition rules • Build a parse tree for the object • Requires – Composition rules – Prioritized search

  15. Composition of the 'Back' Structure

  16. Composing Structures • How can we compose complex structures? – Gestalt rules (parallelism, similarity, ...) • How will we compose this? • How will we compose learned structures?

  17. Canonical Rule Formulation • Combine a structure with one constituent at a time • Mechanical construction of composition rules • At most binary rules • Derivation cost: minus log-likelihood of the observations

  18. Composition as Climbing a Lattice • Introduce a vector indicating instantiated substructures – partial ordering among structures • Hasse diagram for a 3-partite structure: states range from 0 0 0 at the bottom to 1 1 1 at the top – By acquiring a substructure, the structure climbs upwards
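The instantiation vectors on this slide map naturally onto bitmasks: each bit records whether a substructure has been acquired, and the Hasse diagram's covering relation is "exactly one extra bit set". A small sketch with illustrative names:

```python
def covers(a, b):
    """True when state `a` covers state `b` in the Hasse diagram:
    `a` has all of `b`'s substructures plus exactly one more."""
    extra = a & ~b
    return (b & ~a) == 0 and extra != 0 and (extra & (extra - 1)) == 0

def climb(state, part):
    """Acquire substructure `part` (a bit index): move one level up."""
    return state | (1 << part)
```

For the 3-partite structure, the eight states 0b000 through 0b111 form the cube-shaped lattice of the slide; for example, 0b110 covers 0b100 but is incomparable with 0b001.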

  19. Composition of the 'Back' Structure • Problem: too many options! (Combinatorial explosion)

  20. Analogy: Building a Puzzle • Bottom-up solution: combine pieces until you build the car – Does not exploit the box's cover • Top-down solution: try fitting each piece to the box's cover – Most pieces are uniform/irrelevant • Bottom-up/top-down solution: form car-like structures, but use the cover to suggest combinations

  21. Best-First Search • Dijkstra's Algorithm – Prioritize based on 'cost so far' – For parsing: Knuth's Lightest Derivation • A* Search – Also consider the 'cost to go' from the current state to the exit – Approximate it with a heuristic cost
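The distinction on this slide fits in a few lines of code: Dijkstra orders the frontier by cost-so-far alone, while A* adds a heuristic estimate of the cost-to-go. A generic sketch (not the parser itself; names are illustrative):

```python
import heapq

def astar(start, goal, neighbors, h):
    """A* search: the frontier is prioritized by
    f = g (cost so far) + h (heuristic cost to go).
    With h == 0 this reduces to Dijkstra's algorithm."""
    frontier = [(h(start), 0, start)]
    best = {start: 0}
    while frontier:
        f, g, node = heapq.heappop(frontier)
        if node == goal:
            return g
        if g > best.get(node, float("inf")):
            continue  # stale frontier entry, already improved
        for nxt, cost in neighbors(node):
            g2 = g + cost
            if g2 < best.get(nxt, float("inf")):
                best[nxt] = g2
                heapq.heappush(frontier, (g2 + h(nxt), g2, nxt))
    return None  # goal unreachable
```

An admissible heuristic (one that never overestimates the remaining cost, like the coarse-level bound of the later slides) preserves optimality while focusing the expansion toward the goal.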

  22. 'Cost to go' for Parsing • The Generalized A* Architecture, Felzenszwalb & McAllester • Context: the complement needed to get to the goal • Recursive derivation of contexts

  23. Heuristics for Parsing: Context Abstractions • A* requires a lower bound of the derivation cost • Derive the context in a coarser domain (abstraction) – Lower-bounds the cost on the fine domain • Use it to prioritize the search (KLD vs. A*)

  24. Abstractions via Structure Coarsening • Coarsening: identify nodes of the Hasse diagram, e.g. collapse a group of parts so that one instantiated part suffices • Lower-bounds the composition cost
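Coarsening can also be phrased on the bitmask encoding of the lattice: a coarse bit turns on as soon as any fine part in its group is present ("1 part suffices"), so the coarse lattice is a many-to-one projection of the fine one, and completion costs computed there can only be cheaper than fine-level ones. A sketch with illustrative names:

```python
def coarsen(state, groups):
    """Project a fine bitmask onto the coarse lattice: coarse bit i is
    set when any fine part covered by the mask groups[i] is
    instantiated, i.e. one part of the group suffices."""
    return sum(1 << i for i, g in enumerate(groups) if state & g)
```

For example, with six fine parts split into two groups of three, a single instantiated part already lights up its coarse group bit, which is what makes the coarse-level context an admissible heuristic for the fine level.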

  25. Coarse Level Parsing • Bottom-up: KLD in the coarse domain • Top-down: contexts passed to the fine level

  26. Fine Level Parsing • Top-down guidance: heuristic from the coarse level • Bottom-up: composition at the fine level

  27. A* versus Best-First Parsing • A* parsing: front, middle and back parts compose into the object goal, across coarse and fine levels • KLD parsing

  28. Parsing & Localization Results - I

  29. Parsing & Localization Results - II

  30. Parsing & Localization Results - III • Detection rate vs. false positives per image on the Apples and Bottles classes • Compared: Contour Segment Networks; our method with Berkeley edges; our method with Lindeberg edges

  31. UIUC Benchmark Results • 170 images, heavy clutter – KLD: typically ~10 seconds – A* search: ~1-2 seconds • Comparison with prior work (recall vs. precision): our method, Leibe et al., Fergus et al., Agarwal and Roth

  32. Talk Outline • Motivation • Contours, Deformations and Hierarchy • Object Parsing • Appearance Information • Conclusions

  33. Are we missing something? • Appearance information • Main challenge: scale invariance for edges – Edges are intrinsically 1-D features

  34. Scale Invariance without Scale Selection • Log-polar sampling & spatially varying filtering (scale space) – Turns scalings/rotations into translations • Fourier transform modulus: translation invariance. Kokkinos and Yuille, CVPR 2008
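The translation-invariance step can be checked directly: circularly shifting a signal changes only the Fourier phase, so the FFT modulus is unchanged. Since log-polar resampling turns image scalings and rotations into exactly such shifts, the modulus becomes a scale/rotation-invariant descriptor. A 1-D NumPy check (illustrative, not the paper's code):

```python
import numpy as np

# Circularly shifting a signal multiplies its spectrum by a pure
# phase factor, leaving the modulus untouched.  After log-polar
# resampling, scalings and rotations become exactly such shifts.
rng = np.random.default_rng(0)
signal = rng.standard_normal(64)
shifted = np.roll(signal, 17)

mod_a = np.abs(np.fft.fft(signal))
mod_b = np.abs(np.fft.fft(shifted))
assert np.allclose(mod_a, mod_b)
```

The raw spectra differ (in phase), which is why the modulus, not the full transform, serves as the invariant descriptor.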

  35. Descriptor Performance

  36. Talk Outline • Motivation • Contours, Deformations and Hierarchy • Object Parsing • Appearance Information • Conclusions

  37. Contributions • A* search framework for object parsing – Bottom-up information: production cost – Top-down information: heuristic function • Composition rules – Canonical rule formulation / Hasse diagrams – Integral angles (not covered) • Heuristics for parsing – Structure coarsening

  38. Future Research – Compositional approach • Learning structures and hierarchies • Parsing and learning with alternative structures (ORs) • Reusable parts, multiple-class recognition – Revisit low- and mid-level vision problems • Segmentation • Boundary detection • Perceptual grouping – Scene parsing

