Towards Bridging Bottom-Up & Top-Down Vision with Hierarchical Compositional Models
UC Irvine
Iasonas Kokkinos
Center for Image and Vision Sciences, UCLA
Joint work with Alan Yuille
High-Level Vision Goals
- Given an image
– Decide if it contains a car
– Find its location
– Find its extent
– Find its structures
Two Main Approaches to Vision
- Bottom-up
– Data Driven
– Feature Extraction
– Pattern recognition
- Top-down
– Model Driven
– Parameter Estimation
– Analysis-by-Synthesis
Motivation
– Vision problems have both low- and high-level aspects
– Synergy: joint treatment improves performance
– Combined bottom-up and top-down processing
- D. Mumford, Pattern Theory, 1995
Talk Outline
- Motivation
- Deformations and Contours
- Object Parsing
- Appearance Information
- Conclusions
Top-Down: Object Models
- Deformable Models
[Figure labels: X, S(X)]
- Active Appearance Models
Joint Segmentation and Recognition
- EM formulation
– E-step: segmentation
– M-step: deformable model fitting
- AAM-based segmentation
[Figure labels: M-step, E-step]
- Segmentation-based detection
Kokkinos & Maragos, PAMI 2008
Learning Deformation Models
AAM Learning:
[Figure: learning loop on the training set. Labels: Input Images, Edges & Ridges, E: Deform, M: Update, AAM Fit, S, T]
Kokkinos and Yuille, ICCV 2007
Deformation modes
Bottom-Up: Contour-Based Image Description
- Primal Sketch Contours: Edges and Ridges
[Figure panels: Sketch Contours, Edge Tokens, Ridge Tokens]
– Geometry & semantics
Talk Outline
- Motivation
- Contours, Deformations and Hierarchy
- Object Parsing
- Appearance Information
- Conclusions
Hierarchical Compositional Models
[Figure: hierarchy with levels Object, Parts, Contours, Tokens]
- Top-down view: object generates tokens
- Bottom-up view: object is composed from tokens
Inference for Structured Models
- Graphical Models (Bayesian Networks/MRFs)
– Encode random variable dependencies with a graph.
– High-Level Vision
- Random variables: part poses (e.g. location, orientation, scale)
- Dependencies: kinematic constraints
- Belief Propagation
– Graph nodes ‘inform’ each other by sending messages.
– Converges after 2 passes through the graph.
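The two-pass message passing above can be illustrated on the simplest tree, a chain of parts. The following is a minimal sketch, assuming min-sum (cost-based) messages, three hypothetical parts with three candidate poses each, and a made-up kinematic cost; none of these numbers come from the talk.

```python
from itertools import product

def min_sum_chain(unary, pairwise):
    """Min-sum belief propagation on a chain: one forward and one backward pass.

    unary[i][s] is the cost of giving node i the state s; pairwise(a, b) is the
    cost of neighbouring nodes taking states a and b. Returns the min-marginals:
    for every node and state, the cost of the best full assignment using it.
    """
    n = len(unary)
    fwd = [[0.0] * len(u) for u in unary]  # messages arriving from the left
    bwd = [[0.0] * len(u) for u in unary]  # messages arriving from the right
    for i in range(1, n):                  # pass 1: left to right
        for b in range(len(unary[i])):
            fwd[i][b] = min(fwd[i - 1][a] + unary[i - 1][a] + pairwise(a, b)
                            for a in range(len(unary[i - 1])))
    for i in range(n - 2, -1, -1):         # pass 2: right to left
        for a in range(len(unary[i])):
            bwd[i][a] = min(bwd[i + 1][b] + unary[i + 1][b] + pairwise(a, b)
                            for b in range(len(unary[i + 1])))
    return [[unary[i][s] + fwd[i][s] + bwd[i][s] for s in range(len(unary[i]))]
            for i in range(n)]

def brute_force_min_marginals(unary, pairwise):
    """Reference answer obtained by enumerating every joint assignment."""
    n = len(unary)
    result = [[float("inf")] * len(u) for u in unary]
    for states in product(*(range(len(u)) for u in unary)):
        cost = sum(unary[i][s] for i, s in enumerate(states))
        cost += sum(pairwise(states[i], states[i + 1]) for i in range(n - 1))
        for i, s in enumerate(states):
            result[i][s] = min(result[i][s], cost)
    return result

# Hypothetical data: 3 parts, 3 candidate poses each (illustrative costs only).
UNARY = [[0.0, 2.0, 1.0], [1.0, 0.5, 3.0], [2.0, 0.1, 0.3]]

def kinematic_cost(a, b):
    """Assumed constraint: neighbouring parts prefer nearby pose indices."""
    return float(abs(a - b))

min_marginals = min_sum_chain(UNARY, kinematic_cost)
reference = brute_force_min_marginals(UNARY, kinematic_cost)
```

On a chain (or any tree) the two passes give exact min-marginals, which is what the brute-force comparison checks; on graphs with loops this guarantee does not hold.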
Exploiting the Particular Setting
– Sparse Image Representation
- Bottom-up cues guide the search for objects.
- No need to consider all node states as in BP
– Hierarchical Object Representation
- Quickly rule out unpromising solutions
- Coarse-to-Fine detection
Compositional Detection
- View production rules as composition rules
- Build a parse tree for the object
- Requires
– Composition rules
– Prioritized search
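As an illustration of composition rules driven by a prioritized search, here is a minimal sketch of a lightest-derivation search over a toy grammar. The rule set, the token costs, and the names (`RULES`, `TOKENS`, `lightest_derivations`) are hypothetical; the sketch assumes non-negative additive costs and ignores part poses entirely.

```python
import heapq

# Hypothetical composition rules: (composed structure, constituents, rule cost).
RULES = [
    ("car", ("front", "back"), 1.0),
    ("front", ("wheel", "hood"), 0.5),
    ("back", ("wheel", "trunk"), 0.5),
    ("back", ("wheel",), 3.0),  # costlier rule: back with a missing trunk
]

# Hypothetical bottom-up tokens with their observation costs.
TOKENS = {"wheel": 0.2, "hood": 0.4, "trunk": 0.7}

def lightest_derivations(rules, tokens):
    """Compose structures in order of increasing derivation cost.

    A structure is finalized the first time it is popped from the queue; with
    non-negative additive costs, that first pop carries its lightest cost.
    """
    best = {}
    queue = [(cost, name) for name, cost in tokens.items()]
    heapq.heapify(queue)
    while queue:
        cost, item = heapq.heappop(queue)
        if item in best:
            continue  # already finalized with a cost that is no larger
        best[item] = cost
        # Fire every rule that uses the new item and has all constituents ready.
        for head, body, rule_cost in rules:
            if item in body and head not in best and all(b in best for b in body):
                total = rule_cost + sum(best[b] for b in body)
                heapq.heappush(queue, (total, head))
    return best

costs = lightest_derivations(RULES, TOKENS)
```

Recording which rule produced each finalized structure would recover the parse tree; that bookkeeping is omitted to keep the sketch short.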
Composition of the ‘Back’ Structure
Composing Structures
- How can we compose complex structures?
– Gestalt rules (parallelism, similarity, ...)
- How will we compose this?
- How will we compose learned structures?
Canonical Rule Formulation
- Combine structure with one constituent at a time.
- Mechanical construction of composition rules
- At most binary rules
- Derivation cost: minus log-likelihood of observations
Composition as Climbing a Lattice
- Introduce vector indicating instantiated substructures
– partial ordering among structures
- Hasse Diagram for 3-partite structure
[Figure: Hasse diagram whose nodes are binary indicator vectors over the three parts]
– By acquiring a substructure, the structure climbs upwards
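The lattice described above can be written down mechanically. The sketch below assumes that each structure is represented by a binary indicator vector over its parts and that each upward edge corresponds to one binary rule (structure plus one constituent); the function names are illustrative.

```python
from itertools import product

def hasse_diagram(n_parts=3):
    """Nodes: indicator vectors of instantiated parts.

    Edges: (lower, acquired_part_index, upper), one per way of acquiring a
    single missing part. Each edge plays the role of one binary rule.
    """
    nodes = list(product((0, 1), repeat=n_parts))
    edges = []
    for v in nodes:
        for i in range(n_parts):
            if v[i] == 0:
                w = v[:i] + (1,) + v[i + 1:]
                edges.append((v, i, w))
    return nodes, edges

def precedes(v, w):
    """Partial order: v is below w if w contains every part that v contains."""
    return all(a <= b for a, b in zip(v, w))

nodes, edges = hasse_diagram(3)
```

For three parts this gives 8 nodes and 12 edges. Vectors such as (1, 0, 0) and (0, 1, 1) are not comparable, which is why the ordering is only partial.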
Composition of the ‘Back’ Structure
Problem: Too many options! (Combinatorial explosion)
Analogy: Building a puzzle
- Bottom-Up solution: Combine pieces until you build the car
– Does not exploit the box cover
- Top-Down solution: Try fitting each piece to the box cover.
– Most pieces are uniform/irrelevant
- Bottom-Up/Top-Down solution:
– Form car-like structures, but use cover to suggest combinations.
Best First Search
- Dijkstra’s Algorithm
– Prioritize based on ‘cost so far’
– For parsing: Knuth’s Lightest Derivation
- A* Search
– Consider ‘cost to go’
– Approximate with heuristic cost
[Figure labels: Entry, Exit, Cost so far, Cost to go, Heuristic cost]
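The difference between the two priorities can be seen on a toy problem. This is a minimal sketch on a hypothetical 11x11 grid with unit step costs, not the parsing setting of the talk: with a zero heuristic the search orders nodes by ‘cost so far’ only (Dijkstra), and with a Manhattan-distance lower bound on the ‘cost to go’ it becomes A*.

```python
import heapq

def best_first(start, goal, neighbors, heuristic=lambda node: 0.0):
    """Best-first search with priority = cost so far + heuristic cost to go.

    Returns (cost of the path found, number of expanded nodes). The closed set
    used here assumes a consistent heuristic (the zero heuristic qualifies).
    """
    queue = [(heuristic(start), 0.0, start)]
    done = set()
    while queue:
        _, cost, node = heapq.heappop(queue)
        if node in done:
            continue
        done.add(node)
        if node == goal:
            return cost, len(done)
        for nxt, step in neighbors(node):
            if nxt not in done:
                new_cost = cost + step
                heapq.heappush(queue, (new_cost + heuristic(nxt), new_cost, nxt))
    return float("inf"), len(done)

def grid_neighbors(size):
    """4-connected moves with unit cost on a size x size grid."""
    def neighbors(node):
        x, y = node
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if 0 <= nx < size and 0 <= ny < size:
                yield (nx, ny), 1.0
    return neighbors

START, GOAL = (5, 5), (9, 5)

def manhattan(node):
    """Lower bound on the remaining cost when every step costs 1."""
    return abs(node[0] - GOAL[0]) + abs(node[1] - GOAL[1])

dijkstra_cost, dijkstra_expanded = best_first(START, GOAL, grid_neighbors(11))
astar_cost, astar_expanded = best_first(START, GOAL, grid_neighbors(11), manhattan)
```

Both runs return the same path cost, but the heuristic lets A* expand far fewer nodes, because nodes that move away from the goal receive a worse priority.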
‘Cost to go’ for Parsing
- The Generalized A* Architecture, Felzenszwalb & McAllester
- Context: complement needed to get to the goal.
- Recursive derivation of contexts.
Heuristics for Parsing: Context Abstractions
- A* requires lower bound of derivation cost
- Derive context in coarser domain (abstraction)
– Lower bound cost on fine domain
- Use it to prioritize search
[Figure panels: KLD, A*]
Abstractions via Structure Coarsening
- Coarsening: identify nodes of Hasse diagram
[Figure: nodes of the Hasse diagram are identified by the ‘Coarsen’ step; in the coarse structure, 1 part suffices]
- Lower bound composition cost
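Why identifying nodes yields a lower bound can be checked on a small case. The sketch below is an assumption-laden illustration, not the coarsening used in the talk: the part costs are hypothetical, the abstraction merges all fine states that hold the same number of parts, and each coarse edge is given the minimum cost over the fine edges it represents.

```python
from itertools import combinations

# Hypothetical cost of acquiring each part (illustrative numbers only).
PART_COST = {"front": 2.0, "middle": 0.5, "back": 1.0}
PARTS = tuple(PART_COST)
GOAL = frozenset(PARTS)

def fine_states():
    """All nodes of the fine Hasse diagram, as sets of instantiated parts."""
    return [frozenset(c) for k in range(len(PARTS) + 1)
            for c in combinations(PARTS, k)]

def fine_edges():
    """Every way of acquiring one missing part, with its cost."""
    for state in fine_states():
        for part in PARTS:
            if part not in state:
                yield state, state | {part}, PART_COST[part]

def fine_cost_to_go(state):
    """Exact cost of acquiring the parts still missing from `state`."""
    return sum(PART_COST[p] for p in GOAL - state)

def coarsen(state):
    """Assumed abstraction: identify fine states with equally many parts."""
    return len(state)

def coarse_edge_costs():
    """Each coarse edge costs the minimum of the fine edges mapped onto it."""
    costs = {}
    for src, dst, cost in fine_edges():
        key = (coarsen(src), coarsen(dst))
        costs[key] = min(cost, costs.get(key, float("inf")))
    return costs

def coarse_cost_to_go(level):
    """Cost of climbing the coarse chain from `level` to the full structure."""
    costs = coarse_edge_costs()
    return sum(costs[(k, k + 1)] for k in range(level, len(PARTS)))

heuristic = {state: coarse_cost_to_go(coarsen(state)) for state in fine_states()}
```

Because every fine path maps onto a coarse path that is no more expensive, `heuristic[state]` never exceeds `fine_cost_to_go(state)`, which is the property A* needs from a heuristic.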
Coarse Level Parsing
KLD: Coarse Domain
Bottom-Up
Contexts to Fine Level
Top-Down
Fine Level Parsing
- Top-Down Guidance: Heuristic, Coarse Level
- Bottom-Up Composition, Fine Level
A* versus Best First Parsing
- A* Parsing
[Figure labels: Front Part, Middle Part, Back Part, Object, Goal; Coarse Level, Fine Level]
- KLD Parsing
Parsing & Localization Results - I
Parsing & Localization Results - II
Parsing & Localization Results - III
[Figure: detection rate versus false positives per image, panels Apples and Bottles. Curves: Contour Segment Networks; Our method (Berkeley Edges); Our method (Lindeberg Edges)]
UIUC Benchmark Results
- 170 images, heavy clutter
– KLD: typically ~10 seconds
– A* Search: ~1-2 seconds
[Figure: comparison with prior work, recall versus precision. Curves: Our method; Leibe et al.; Fergus et al.; Agarwal and Roth]
Talk Outline
- Motivation
- Contours, Deformations and Hierarchy
- Object Parsing
- Appearance Information
- Conclusions
Are we missing something?
- Appearance information
- Main challenge: scale invariance for edges
– Edges are intrinsically 1-D features
Scale Invariance without Scale Selection
- Log-Polar sampling & spatially varying filtering
– Turns scalings/rotations into translations.
[Figure label: Scale Space]
- Fourier Transform Modulus: translation invariance
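The two steps above can be checked numerically in one dimension. This is a minimal sketch along the radial (log-scale) axis only, with a hypothetical bump-shaped profile and pure-Python DFT; it is not the descriptor from the cited paper. Rotations behave the same way along the angular axis, which is not shown here.

```python
import cmath
import math

def log_samples(f, n=64, steps_per_octave=8):
    """Sample f at radii r_i = 2 ** (i / steps_per_octave), i = 0..n-1."""
    return [f(2.0 ** (i / steps_per_octave)) for i in range(n)]

def dft_modulus(x):
    """Modulus of the discrete Fourier transform (direct O(n^2) evaluation)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n)]

def bump(r):
    """Hypothetical radial profile: a narrow bump centred at r = 8."""
    return math.exp(-((math.log2(r) - 3.0) ** 2) / (2 * 0.3 ** 2))

def scaled_bump(r):
    """The same profile after scaling the pattern by a factor of 2."""
    return bump(r / 2.0)

original = log_samples(bump)
scaled = log_samples(scaled_bump)

# Scaling by 2 is one octave, i.e. a shift of 8 samples on the log axis.
shift = 8
shifted = original[-shift:] + original[:-shift]
max_shift_error = max(abs(a - b) for a, b in zip(shifted, scaled))

# The Fourier modulus discards the shift, so both descriptors agree.
max_modulus_error = max(abs(a - b)
                        for a, b in zip(dft_modulus(original), dft_modulus(scaled)))
```

The match is exact only up to boundary effects: the profile is chosen to be negligible near both ends of the sampled range, so the shift behaves like a circular one.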
Kokkinos and Yuille, CVPR 2008
Descriptor Performance
Talk Outline
- Motivation
- Contours, Deformations and Hierarchy
- Object Parsing
- Appearance Information
- Conclusions
Contributions
- A* Search framework for Object Parsing
– Bottom-Up information: production cost
– Top-Down information: heuristic function
- Composition Rules
– Canonical Rule Formulation / Hasse Diagrams
– Integral Angles (not covered)
- Heuristics for Parsing
– Structure Coarsening
Future Research
– Compositional Approach
- Learning Structures and Hierarchies
- Parsing and Learning with Alternative Structures (ORs)
- Reusable Parts, Multiple Class Recognition
– Revisit Low- and Mid- level vision problems
- Segmentation
- Boundary detection
- Perceptual grouping