Towards Bridging Bottom-Up & Top-Down Vision with Hierarchical Compositional Models




slide-1
SLIDE 1

Towards Bridging Bottom-Up & Top-Down Vision with Hierarchical Compositional Models

UC Irvine

Iasonas Kokkinos
Center for Image and Vision Sciences, UCLA

Joint work with Alan Yuille

slide-2
SLIDE 2

High-Level Vision Goals

  • Given an image

– Decide if it contains a car
– Find its location
– Find its extent
– Find its structures

slide-3
SLIDE 3

Two Main Approaches to Vision

  • Bottom-up

– Data Driven
– Feature Extraction
– Pattern Recognition

  • Top-down

– Model Driven
– Parameter Estimation
– Analysis-by-Synthesis

slide-4
SLIDE 4

Motivation

– Vision problems have both low- and high-level aspects
– Synergy: joint treatment improves performance
– Combined bottom-up and top-down processing

  • D. Mumford, Pattern Theory, 1995

slide-5
SLIDE 5

Talk Outline

  • Motivation
  • Deformations and Contours
  • Object Parsing
  • Appearance Information
  • Conclusions
slide-6
SLIDE 6

Top-Down: Object Models

  • Deformable Models

[Figure labels: X, S(X)]

  • Active Appearance Models

slide-7
SLIDE 7

Joint Segmentation and Recognition

  • EM formulation

– E-step: segmentation
– M-step: deformable model fitting

  • AAM-based segmentation

[Figure: alternation between M-step and E-step]

  • Segmentation-based detection

Kokkinos & Maragos, PAMI 2008
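
The E-step/M-step alternation on this slide can be illustrated with a toy example. The sketch below is not the AAM-based method of the cited paper: it assumes 1-D pixel intensities and replaces the deformable model with two Gaussian means (`mu_fg`, `mu_bg`, both hypothetical names), so that the E-step is a soft segmentation and the M-step is a model refit.

```python
import math

def em_segment(pixels, mu_fg, mu_bg, sigma=1.0, iters=20):
    """Toy EM: alternate soft segmentation (E) and model refitting (M)."""
    resp = []
    for _ in range(iters):
        # E-step: posterior probability that each pixel is foreground.
        resp = []
        for x in pixels:
            p_fg = math.exp(-0.5 * ((x - mu_fg) / sigma) ** 2)
            p_bg = math.exp(-0.5 * ((x - mu_bg) / sigma) ** 2)
            resp.append(p_fg / (p_fg + p_bg))
        # M-step: refit both "models" (here just means) to the soft labels.
        w_fg = sum(resp)
        w_bg = len(pixels) - w_fg
        mu_fg = sum(r * x for r, x in zip(resp, pixels)) / w_fg
        mu_bg = sum((1 - r) * x for r, x in zip(resp, pixels)) / w_bg
    return resp, mu_fg, mu_bg

# Four dark background pixels followed by four bright foreground pixels.
pixels = [0.1, 0.3, -0.2, 0.0, 4.9, 5.2, 5.0, 4.8]
resp, mu_fg, mu_bg = em_segment(pixels, mu_fg=3.0, mu_bg=1.0)
```

Starting from a poor initial guess, the segmentation and the model parameters improve each other over the iterations, which is the synergy the talk argues for.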

slide-8
SLIDE 8

Learning Deformation Models

AAM Learning:

[Figure: EM loop over the training set. Labels: Input Images; Edges & Ridges; E: Deform; M: Update; AAM Fit; Deformation modes]

Kokkinos and Yuille, ICCV 2007

slide-9
SLIDE 9

Bottom-Up: Contour-Based Image Description

  • Primal Sketch Contours: Edges and Ridges

[Figure panels: Sketch Contours; Edge Tokens; Ridge Tokens]

– Geometry & semantics

slide-10
SLIDE 10

Talk Outline

  • Motivation
  • Contours, Deformations and Hierarchy
  • Object Parsing
  • Appearance Information
  • Conclusions
slide-11
SLIDE 11

Hierarchical Compositional Models

[Figure: hierarchy with levels Object, Parts, Contours, Tokens]

  • Top-down view: object generates tokens
  • Bottom-up view: object is composed from tokens
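
The two views of the hierarchy can be sketched on a small data structure. This is only an illustration under assumed names: the `HIERARCHY` table, the part names, and the token labels `c1`..`c5` are hypothetical, and the levels are collapsed to three for brevity.

```python
# Hypothetical hierarchy: each structure lists its constituents.
HIERARCHY = {
    "object": ["front", "middle", "back"],
    "front": ["c1", "c2"],
    "middle": ["c3"],
    "back": ["c4", "c5"],
}

def generate(node):
    """Top-down view: expand a node into the tokens it generates."""
    children = HIERARCHY.get(node)
    if children is None:          # a leaf is a token
        return [node]
    return [leaf for child in children for leaf in generate(child)]

def compose(tokens):
    """Bottom-up view: form every structure whose constituents are present."""
    found = set(tokens)
    changed = True
    while changed:
        changed = False
        for parent, children in HIERARCHY.items():
            if parent not in found and all(c in found for c in children):
                found.add(parent)
                changed = True
    return found

all_tokens = generate("object")
partial = compose(["c1", "c2", "c3"])
full = compose(all_tokens)
```

With only `c1`, `c2`, `c3` observed, the front and middle parts can be composed but the object cannot; with all tokens present the object is recovered.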

slide-12
SLIDE 12

Inference for Structured Models

  • Graphical Models (Bayesian Networks / MRFs)

– Encode random variable dependencies with a graph.
– High-Level Vision

  • Random variables: part poses (e.g. location, orientation, scale)
  • Dependencies: kinematic constraints
  • Belief Propagation

– Graph nodes 'inform' each other by sending messages.
– Converges after 2 passes through the graph (for tree-structured graphs).
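
Two passes are enough when the graph is a tree; a chain is the simplest such case. The sketch below is a minimal sum-product implementation on a chain, with made-up potentials (`unary`, `pairwise` are illustrative values, not from the talk), checked against brute-force enumeration.

```python
from itertools import product

def chain_marginals(unary, pairwise):
    """Sum-product on a chain: one forward pass and one backward pass."""
    n, k = len(unary), len(unary[0])
    fwd = [[1.0] * k for _ in range(n)]  # message arriving from the left
    bwd = [[1.0] * k for _ in range(n)]  # message arriving from the right
    for i in range(1, n):
        for s in range(k):
            fwd[i][s] = sum(fwd[i - 1][t] * unary[i - 1][t] * pairwise[t][s]
                            for t in range(k))
    for i in range(n - 2, -1, -1):
        for s in range(k):
            bwd[i][s] = sum(bwd[i + 1][t] * unary[i + 1][t] * pairwise[s][t]
                            for t in range(k))
    marginals = []
    for i in range(n):
        belief = [fwd[i][s] * unary[i][s] * bwd[i][s] for s in range(k)]
        z = sum(belief)
        marginals.append([b / z for b in belief])
    return marginals

def brute_force_marginals(unary, pairwise):
    """Reference answer: enumerate every joint state."""
    n, k = len(unary), len(unary[0])
    totals = [[0.0] * k for _ in range(n)]
    for states in product(range(k), repeat=n):
        w = 1.0
        for i, s in enumerate(states):
            w *= unary[i][s]
        for a, b in zip(states, states[1:]):
            w *= pairwise[a][b]
        for i, s in enumerate(states):
            totals[i][s] += w
    return [[v / sum(row) for v in row] for row in totals]

unary = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]
pairwise = [[0.8, 0.2], [0.2, 0.8]]
bp = chain_marginals(unary, pairwise)
exact = brute_force_marginals(unary, pairwise)
```

The message-passing cost grows linearly with the number of nodes, while the enumeration grows exponentially; both still visit every state of every node, which is the cost the next slide proposes to avoid.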

slide-13
SLIDE 13

Exploiting the Particular Setting

– Sparse Image Representation

  • Bottom-up cues guide the search for objects.
  • No need to consider all node states as in BP

– Hierarchical Object Representation

  • Quickly rule out unpromising solutions
  • Coarse-to-Fine detection
slide-14
SLIDE 14

Compositional Detection

  • View production rules as composition rules
  • Build a parse tree for the object
  • Requires

– Composition rules
– Prioritized search

slide-15
SLIDE 15

Composition of the 'Back' Structure

slide-16
SLIDE 16

Composing Structures

  • How can we compose complex structures?

– Gestalt rules (parallelism, similarity, ...)

  • How will we compose this?
  • How will we compose learned structures?
slide-17
SLIDE 17

Canonical Rule Formulation

  • Combine structure with one constituent at a time.
  • Mechanical construction of composition rules
  • At most binary rules
  • Derivation cost: minus log-likelihood of observations
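
A small sketch of what "one constituent at a time" with a minus-log-likelihood cost could look like. The part names and likelihood values are hypothetical, and the parts are assumed independent so that costs simply add; the talk's actual rules may differ.

```python
import math

def derive(order, likelihood):
    """Apply binary rules (partial structure + one part) and track the cost."""
    have, cost, steps = frozenset(), 0.0, []
    for part in order:
        have = have | {part}                 # binary rule: structure + part
        cost += -math.log(likelihood[part])  # cost = minus log-likelihood
        steps.append((have, cost))
    return steps

# Hypothetical per-part observation likelihoods.
likelihood = {"front": 0.5, "middle": 0.25, "back": 0.125}
steps_a = derive(["front", "middle", "back"], likelihood)
steps_b = derive(["back", "front", "middle"], likelihood)
```

Under this assumption the final cost does not depend on the order in which parts are acquired, only the intermediate costs do, which is what a prioritized search can exploit.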
slide-18
SLIDE 18

Composition as Climbing a Lattice

  • Introduce vector indicating instantiated substructures

– partial ordering among structures

  • Hasse Diagram for 3-partite structure

[Figure: Hasse diagram of indicator vectors for a 3-part structure]

– By acquiring a substructure, the structure climbs upwards
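
The lattice can be sketched in a few lines. The code assumes binary indicator vectors (1 if a substructure is instantiated, 0 otherwise); the slide's figure may use richer entries, so treat this as a simplified illustration.

```python
from itertools import product

def precedes(a, b):
    """Partial order: a is below b if b has every substructure a has."""
    return all(x <= y for x, y in zip(a, b))

def hasse_edges(n):
    """Edges of the Hasse diagram: acquire exactly one missing substructure."""
    edges = []
    for v in product((0, 1), repeat=n):
        for i in range(n):
            if v[i] == 0:
                edges.append((v, v[:i] + (1,) + v[i + 1:]))
    return edges

edges = hasse_edges(3)
```

Each edge is one upward step in the lattice, so a full derivation is a path from the all-zeros vector to the all-ones vector. Vectors such as (1, 0, 0) and (0, 1, 1) are not comparable, which is why the ordering is only partial.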

slide-19
SLIDE 19

Composition of the 'Back' Structure

Problem: Too many options! (Combinatorial explosion)

slide-20
SLIDE 20

Analogy: Building a puzzle

  • Bottom-Up solution: Combine pieces until you build the car

– Does not exploit the box cover

  • Top-Down solution: Try fitting each piece to the box cover.

– Most pieces are uniform/irrelevant

  • Bottom-Up/Top-Down solution:

– Form car-like structures, but use the cover to suggest combinations.

slide-21
SLIDE 21

Best First Search

  • Dijkstra's Algorithm

– Prioritize based on 'cost so far'
– For parsing: Knuth's Lightest Derivation

  • A* Search

– Consider 'cost to go'
– Approximate with heuristic cost

[Figure: path from Entry to Exit, annotated with cost so far, cost to go, and heuristic cost]
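
The difference between the two priorities can be seen on a toy graph. The sketch below is a generic best-first search, not the parsing algorithm of the talk: with a zero heuristic it behaves like Dijkstra's algorithm, and with a consistent lower bound on the cost to go it behaves like A*. The line graph and the heuristic are illustrative assumptions.

```python
import heapq

def best_first(graph, start, goal, heuristic=lambda n: 0.0):
    """Return (path cost, number of expanded nodes).

    Priority = cost so far + heuristic estimate of the cost to go.
    """
    frontier = [(heuristic(start), 0.0, start)]
    best = {start: 0.0}
    done = set()
    expanded = 0
    while frontier:
        _, g, node = heapq.heappop(frontier)
        if node in done:
            continue
        done.add(node)
        expanded += 1
        if node == goal:
            return g, expanded
        for nxt, w in graph.get(node, []):
            ng = g + w
            if ng < best.get(nxt, float("inf")):
                best[nxt] = ng
                heapq.heappush(frontier, (ng + heuristic(nxt), ng, nxt))
    return float("inf"), expanded

def line_graph(n):
    """Nodes 0..n-1 in a row, unit-cost edges between neighbours."""
    return {i: [(j, 1.0) for j in (i - 1, i + 1) if 0 <= j < n]
            for i in range(n)}

graph = line_graph(11)
d_cost, d_expanded = best_first(graph, 5, 10)                  # Dijkstra
a_cost, a_expanded = best_first(graph, 5, 10,
                                heuristic=lambda n: abs(10 - n))  # A*
```

Both searches return the same cost, but the zero-heuristic search also explores the nodes to the left of the start, while the heuristic steers the search directly towards the goal.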

slide-22
SLIDE 22

'Cost to go' for Parsing

  • The Generalized A* Architecture, Felzenszwalb & McAllester
  • Context: complement needed to get to the goal.
  • Recursive derivation of contexts.
slide-23
SLIDE 23

Heuristics for Parsing: Context Abstractions

  • A* requires a lower bound on the derivation cost
  • Derive context in a coarser domain (abstraction)

– Lower-bounds the cost on the fine domain

  • Use it to prioritize search

[Figure panels: KLD; A*]

slide-24
SLIDE 24

Abstractions via Structure Coarsening

  • Coarsening: identify (merge) nodes of Hasse diagram

[Figure: Hasse diagram before and after coarsening; annotation: 1 part suffices]

  • Lower bound composition cost
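
Why merging nodes yields a lower bound can be checked on a small example. The sketch below is an assumption-laden illustration, not the talk's construction: fine nodes are binary indicator vectors, acquiring part i has a made-up cost `part_cost[i]`, and the abstraction merges all vectors with the same number of acquired parts. Each coarse edge keeps the cheapest fine edge mapped onto it, so coarse costs to go can never exceed fine ones.

```python
import heapq
from itertools import product

def cost_to_go(goal, edges):
    """Dijkstra on reversed edges: cheapest cost from each node to goal."""
    rev = {}
    for (u, v), w in edges.items():
        rev.setdefault(v, []).append((u, w))
    dist = {goal: 0.0}
    heap = [(0.0, goal)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist.get(v, float("inf")):
            continue
        for u, w in rev.get(v, []):
            if d + w < dist.get(u, float("inf")):
                dist[u] = d + w
                heapq.heappush(heap, (d + w, u))
    return dist

def coarsen(edges, abstraction):
    """Merge fine nodes; a coarse edge keeps the cheapest fine edge."""
    coarse = {}
    for (u, v), w in edges.items():
        key = (abstraction(u), abstraction(v))
        if key[0] != key[1]:
            coarse[key] = min(w, coarse.get(key, float("inf")))
    return coarse

part_cost = [2.0, 3.0, 1.0]  # hypothetical cost of acquiring each part
fine_edges = {}
for v in product((0, 1), repeat=3):
    for i, c in enumerate(part_cost):
        if v[i] == 0:
            fine_edges[(v, v[:i] + (1,) + v[i + 1:])] = c

abstraction = sum  # coarse node = number of acquired parts
coarse_edges = coarsen(fine_edges, abstraction)
exact = cost_to_go((1, 1, 1), fine_edges)
bound = cost_to_go(3, coarse_edges)
```

Here `bound[abstraction(v)]` can serve as an A* heuristic for fine node `v`: it is cheap to compute on the small coarse graph and never overestimates the true remaining cost.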
slide-25
SLIDE 25

Coarse Level Parsing

KLD: Coarse Domain

Bottom-Up

Contexts to Fine Level

Top-Down

slide-26
SLIDE 26

Fine Level Parsing

Top-Down Guidance: Heuristic, Coarse Level
Bottom-Up Composition, Fine Level

slide-27
SLIDE 27

A* versus Best First Parsing

  • A* Parsing

[Figure labels: Front Part, Middle Part, Back Part, Object, Goal; Coarse Level, Fine Level]

  • KLD Parsing
slide-28
SLIDE 28

Parsing & Localization Results - I

slide-29
SLIDE 29

Parsing & Localization Results - II

slide-30
SLIDE 30

Parsing & Localization Results - III

[Figure: detection rate vs. false positives per image, panels Apples and Bottles; curves: Contour Segment Networks, Our method (Berkeley Edges), Our method (Lindeberg Edges)]

slide-31
SLIDE 31

UIUC Benchmark Results

  • 170 Images, heavy clutter

– KLD: typically ~10 seconds
– A* Search: ~1-2 seconds

[Figure: Comparison with prior work, recall vs. precision; curves: Our method, Leibe et al., Fergus et al., Agarwal and Roth]

slide-32
SLIDE 32

Talk Outline

  • Motivation
  • Contours, Deformations and Hierarchy
  • Object Parsing
  • Appearance Information
  • Conclusions
slide-33
SLIDE 33

Are we missing something?

  • Appearance information
  • Main challenge: scale invariance for edges

– Edges are intrinsically 1-D features

slide-34
SLIDE 34

Scale Invariance without Scale Selection

  • Log-Polar sampling & spatially varying filtering

– Turns scalings/rotations into translations.

  • Fourier Transform Modulus: translation invariance

[Figure label: Scale Space]

Kokkinos and Yuille, CVPR 2008
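
Both ingredients can be checked numerically in one dimension. The sketch below is a simplified illustration, not the descriptor from the cited paper: it assumes a hypothetical radial profile `f`, samples it at geometrically spaced radii, and uses a plain DFT written with the standard library.

```python
import cmath

def log_radial_samples(f, n=8, r0=1.0, base=2.0):
    """Sample a radial function at radii r0 * base**k (log-spaced)."""
    return [f(r0 * base ** k) for k in range(n)]

def dft_modulus(x):
    """Magnitude of the discrete Fourier transform of a sequence."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n)))
            for k in range(n)]

def f(r):
    """Hypothetical radial intensity profile."""
    return 1.0 / (1.0 + r)

# Rescaling the pattern by the sampling base shifts the samples by one.
original = log_radial_samples(f)
scaled = log_radial_samples(lambda r: f(2.0 * r))

# A rotation is a circular shift of the angular samples.
angular = [0.0, 1.0, 3.0, 2.0, 0.5, 0.0, 0.0, 1.5]
rotated = angular[3:] + angular[:3]
```

The scaled samples equal the original ones shifted by one position, and the DFT modulus of the rotated angular signal matches that of the original. The angular case is exactly invariant because the shift is circular; along the radial axis the shift is not circular, so invariance there holds only approximately in practice.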

slide-35
SLIDE 35

Descriptor Performance

slide-36
SLIDE 36

Talk Outline

  • Motivation
  • Contours, Deformations and Hierarchy
  • Object Parsing
  • Appearance Information
  • Conclusions
slide-37
SLIDE 37

Contributions

  • A* Search framework for Object Parsing

– Bottom-Up information: production cost
– Top-Down information: heuristic function

  • Composition Rules

– Canonical Rule Formulation / Hasse Diagrams
– Integral Angles (not covered)

  • Heuristics for Parsing

– Structure Coarsening

slide-38
SLIDE 38

Future Research

– Compositional Approach

  • Learning Structures and Hierarchies
  • Parsing and Learning with Alternative Structures (ORs)
  • Reusable Parts, Multiple Class Recognition

– Revisit Low- and Mid-level vision problems

  • Segmentation
  • Boundary detection
  • Perceptual grouping

– Scene parsing