

slide-1
SLIDE 1

Markov Random Fields and its Applications

Huiwen Chang

slide-2
SLIDE 2

Introduction

  • Markov Random Fields (MRF)

– A kind of undirected graphical model

  • To model vision problems:

– Low level: image restoration, segmentation, texture analysis…
– High level: object recognition and matching (structure from motion, stereo matching)…

slide-3
SLIDE 3

Introduction

  • Markov Random Fields (MRF)

– A kind of undirected graphical model

  • To model vision problems:

– Low level: image restoration, segmentation, texture analysis…
– High level: object recognition and matching (structure from motion, stereo matching)…

Labeling Problem

slide-4
SLIDE 4

Graphical Models

  • Probabilistic graphical models

– Nodes: random variables
– Edges: statistical dependencies among the random variables

– Advantage

A compact and efficient way to represent a probability distribution and to visualize its conditional independence assumptions

slide-5
SLIDE 5

Conditional Independence

slide-6
SLIDE 6

Graphical Model

  • Bayesian Network

– Directed acyclic graph:

  • Factorization:
  • Conditional Independence:
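The factorization and conditional-independence formulas on this slide were images; the standard forms they refer to (a reconstruction, not copied from the slides) are:

p(x_1, …, x_K) = ∏_k p(x_k | pa(x_k))   (each node conditioned on its parents pa(x_k))

and conditional-independence statements of the form a ⊥ b | c, which can be read off the directed graph.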
slide-7
SLIDE 7

Graphical Model

  • Markov Network (MRF)

– Potential functions are defined over the maximal cliques of the graph
– Potential functions ψ(·) > 0

P(a, b, c, d) = (1/Z) ∏_C ψ_C(x_C),   the product running over the maximal cliques C

slide-8
SLIDE 8

Graphical Model

  • Markov Network (MRF)

The Hammersley-Clifford theorem says that

the set of distributions consistent with the conditional independence statements read from the graph and the set of distributions that can be expressed as a factorization over the maximal cliques of the graph are identical:

P(a, b, c, d) = (1/Z) ∏_C ψ_C(x_C)

slide-9
SLIDE 9

MAP inference

  • Posterior probability of the labelling y given observation x is:
slide-10
SLIDE 10

Markov Random Field

  • Posterior probability of the labelling y given observation x is:

P(y | x) = (1/Z(x)) ∏_c ψ_c(y_c; x)

where Z(x) = Σ_y ∏_c ψ_c(y_c; x) is called the partition function.

  • Since we define the potential functions to be strictly positive, we can

express them as exponentials of energies over the cliques:

ψ_c(y_c; x) = exp(−E_c(y_c; x))

slide-11
SLIDE 11

MAP inference

  • Posterior probability of the labelling y given observation x is:
  • The most probable labelling is the one that minimizes the energy
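The equation on this slide was an image; in the notation of the previous slides, the standard relation it refers to is:

P(y | x) = (1/Z(x)) exp(−E(y; x)),   where E(y; x) = Σ_c E_c(y_c; x)

y* = argmax_y P(y | x) = argmin_y E(y; x)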
slide-12
SLIDE 12

Pairwise MRF

  • The most common energy function for image labeling
  • Which of the energy terms acts as the prior, the pairwise term or the unary term?
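The energy itself was shown as an image; a standard pairwise form (a reconstruction, not copied from the slide) is:

E(y; x) = Σ_i U_i(y_i; x) + Σ_{(i,j)∈N} V_{ij}(y_i, y_j)

with U the unary (data) terms and V the pairwise terms. The pairwise smoothness term plays the role of the prior: it does not depend on the observations x.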

slide-13
SLIDE 13

Example MRF model: Image Denoising

  • How can we retrieve the original image given the

noisy one?

Original image Y Noisy image X(Input)

slide-14
SLIDE 14
  • Nodes

– For each pixel i,

  • yi : latent variable (value in original image)
  • xi : observed variable (value in noisy image)

Simple setting: x_i, y_i ∈ {−1, +1}

MRF formulation

(figure: latent nodes y_1, y_2, …, y_i, …, y_n and observed nodes x_1, x_2, …, x_i, …, x_n)

slide-15
SLIDE 15

MRF formulation

  • Edges

– x_i and y_i of each pixel i are correlated
– neighboring pixels have similar values (smoothness)

(figure: latent nodes y_1, y_2, …, y_i, …, y_n and observed nodes x_1, x_2, …, x_i, …, x_n)

slide-16
SLIDE 16

MRF formulation

  • Edges

– x_i and y_i of each pixel i are correlated:  ψ(x_i, y_i) = −γ x_i y_i
– neighboring pixels have similar values (smoothness):  ψ(y_i, y_j) = −β y_i y_j

(figure: latent nodes y_1, y_2, …, y_i, …, y_n and observed nodes x_1, x_2, …, x_i, …, x_n)

slide-17
SLIDE 17

MRF formulation

Energy function

E(y; x) = Σ_{(i,j)} ψ(y_i, y_j) + Σ_i ψ(x_i, y_i)
        = −β Σ_{(i,j)} y_i y_j − γ Σ_i x_i y_i

(figure: latent nodes y_1, y_2, …, y_i, …, y_n and observed nodes x_1, x_2, …, x_i, …, x_n)

slide-18
SLIDE 18

Optimization

Iterated Conditional Modes (ICM)

  • Initialize y_i = x_i for all i
  • Pick one y_i, keep the others fixed, and flip y_i if −y_i gives a lower energy
  • Repeat until convergence (a code sketch follows the energy function below)

Energy function

E(y; x) = Σ_{(i,j)} ψ(y_i, y_j) + Σ_i ψ(x_i, y_i)
        = −β Σ_{(i,j)} y_i y_j − γ Σ_i x_i y_i
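A minimal sketch of ICM for this energy, assuming NumPy, a 4-connected grid, and a noisy ±1 image x (names are illustrative, not the presenter's code):

import numpy as np

def icm_denoise(x, beta=1.0, gamma=2.0, max_iters=10):
    # ICM for E(y; x) = -beta * sum_{(i,j)} y_i y_j - gamma * sum_i x_i y_i
    # x: 2D array with values in {-1, +1}; returns the denoised labelling y.
    y = x.copy()                          # initialize y_i = x_i
    H, W = x.shape
    for _ in range(max_iters):
        changed = False
        for i in range(H):
            for j in range(W):
                s = 0.0                   # sum of the 4-connected neighbours of (i, j)
                if i > 0:     s += y[i - 1, j]
                if i < H - 1: s += y[i + 1, j]
                if j > 0:     s += y[i, j - 1]
                if j < W - 1: s += y[i, j + 1]
                # flipping y_ij changes the energy by 2 * y_ij * (beta*s + gamma*x_ij),
                # so the locally optimal label is the sign of beta*s + gamma*x_ij
                best = 1 if beta * s + gamma * x[i, j] >= 0 else -1
                if best != y[i, j]:
                    y[i, j] = best
                    changed = True
        if not changed:                   # converged: no single flip lowers the energy
            break
    return y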

slide-19
SLIDE 19
  • Original image, noisy image (10%), restored by ICM (4%)

slide-20
SLIDE 20
  • Original image, noisy image (10%), restored by ICM (4%), restored by Graph Cut (<1%)

slide-21
SLIDE 21

Optimization

  • Iterated Conditional Modes(ICM)
  • Graph Cuts (GC)
  • Message Passing

– Belief Propagation (BP)
– Tree-Reweighted message passing (not covered here)

  • LP-relaxation

– Cutting-plane (not covered here)

……

slide-22
SLIDE 22

Graph Cuts

  • To find the labeling f that minimizes the energy

E(f; X) = E_smooth(f) + E_data(f) = Σ_{(p,q)∈N} V_{p,q}(f_p, f_q) + Σ_p D_p(f_p; X_p)

where N is the set of neighboring pixel pairs.

slide-23
SLIDE 23

Graph Cuts

  • For labeling problem

– 2 Labels: Find global minimum

  • Max-flow/min-cut algorithm
  • Boykov and Kolmogorov 2001

– Worst-case complexity O(mn²|C|)
– Fast in practice

– For multi-labels: Computing the global minimum is NP-hard

  • Will discuss later
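A minimal sketch of the 2-label case solved globally with max-flow/min-cut, assuming the networkx library and a Potts smoothness weight w (an illustrative construction, not the presenter's implementation):

import networkx as nx

def binary_graph_cut(num_pixels, unary, edges, w=1.0):
    # Globally minimize E(f) = sum_p D_p(f_p) + w * sum_{(p,q) in N} [f_p != f_q],  f_p in {0, 1}.
    # unary[p] = (D_p(0), D_p(1));  edges = list of neighbouring pixel pairs (p, q).
    G = nx.DiGraph()
    S, T = "source", "sink"
    for p in range(num_pixels):
        d0, d1 = unary[p]
        G.add_edge(S, p, capacity=d1)   # cut iff p ends on the sink side (label 1): pays D_p(1)
        G.add_edge(p, T, capacity=d0)   # cut iff p ends on the source side (label 0): pays D_p(0)
    for p, q in edges:                  # Potts smoothness: pay w whenever the two labels differ
        G.add_edge(p, q, capacity=w)
        G.add_edge(q, p, capacity=w)
    cut_value, (source_side, sink_side) = nx.minimum_cut(G, S, T)
    labels = [0 if p in source_side else 1 for p in range(num_pixels)]
    return labels, cut_value

# two pixels that prefer different labels but are tied together by a strong smoothness weight
labels, energy = binary_graph_cut(2, unary=[(0.0, 5.0), (5.0, 0.0)], edges=[(0, 1)], w=10.0)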
slide-24
SLIDE 24

Two-Label Example: Lazy snapping

  • Goal: Separate foreground from background
  • 1st step: Use stroke to select
  • 2nd step: Use polygon with vertices to refine
slide-25
SLIDE 25

Model the problem

E_1(x_i): likelihood energy;  E_2(x_i, x_j): prior energy;  x_i: label of node i, in {0: background, 1: foreground}

slide-26
SLIDE 26

Likelihood Energy

  • Data term
  • Use color similarity with the user-marked (known) pixels to assign

an energy to each uncertain pixel

– d_i^F : minimum distance to the foreground (front) colors
– d_i^B : minimum distance to the background colors
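The formula on this slide was an image; one common form of this likelihood energy, consistent with the description above (stated here as an assumption, not copied from the slide), is

E_1(x_i = 1) = d_i^F / (d_i^F + d_i^B),   E_1(x_i = 0) = d_i^B / (d_i^F + d_i^B)

for uncertain pixels, with E_1 = 0 for the user-marked label and ∞ for the opposite label on stroked pixels.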

slide-27
SLIDE 27

Prior Energy

  • Penalty Term for boundaries

– Only nonzero across a segmentation boundary
– Larger if the adjacent pixels have similar colors

  • Smoothness Term
slide-28
SLIDE 28

Problem

  • Too slow for “real-time” requirement!
slide-29
SLIDE 29

Problem

  • Too slow for “real-time” requirement!
slide-30
SLIDE 30

Pre-segmentation

  • Run the graph cut at the segment level instead of the pixel level
  • C_ij: the mean color difference between the two

segments, weighted by the length of their shared boundary

  • Speed Comparison
slide-31
SLIDE 31

Graph Cuts

  • 2 Labels: Find global minimum

– Max-flow/min-cut algorithm – Fast

  • Multi-Labels: computing the global minimum

is NP-hard

– Approximation algorithms exist for some forms of the energy function: a smoothness energy term V that is a metric or a semi-metric

(a metric satisfies identity, symmetry & non-negativity, and the triangle inequality; a semi-metric drops the triangle inequality)

slide-32
SLIDE 32

Graph Cuts for multi labels

  • α-expansion

– V is a metric; the result is within a known factor of the global minimum (see the sketch after this list)

  • α-swap

– V is a semi-metric; the result is a local minimum
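A minimal sketch of the α-expansion outer loop, assuming a binary graph-cut subroutine such as the one sketched earlier; expansion_move is a hypothetical helper that solves the 2-label "keep the current label vs. switch to α" problem with a single min-cut:

def alpha_expansion(labels, label_set, energy, expansion_move, max_cycles=5):
    # Iterate expansion moves until no move lowers the energy.
    # labels: current labelling, energy(labels) -> float,
    # expansion_move(labels, alpha) -> best labelling reachable by one alpha-expansion.
    best = energy(labels)
    for _ in range(max_cycles):
        improved = False
        for alpha in label_set:
            candidate = expansion_move(labels, alpha)   # solved as a binary graph cut
            e = energy(candidate)
            if e < best:
                labels, best = candidate, e
                improved = True
        if not improved:                                # no label's expansion helps: done
            break
    return labels, best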

slide-33
SLIDE 33

Graph Cut for multi labels

  • α-swap algorithm

  • α-expansion algorithm
slide-34
SLIDE 34

Graph Cut for multi labels

  • α-swap algorithm

  • α-expansion algorithm

Min cut/ Max Flow

slide-35
SLIDE 35

α-swap algorithm


slide-36
SLIDE 36

α-expansion algorithm

slide-37
SLIDE 37

Multi-label Examples: Shift Map

User Constraints

(figure: image regions A B C D rearranged to A B D C)

No accurate segmentation required

slide-38
SLIDE 38

Multi-label Examples: Shift Map

User Constraints

(figure: image regions A B C D rearranged to A B D C)

slide-39
SLIDE 39

Multi-label Examples: Shift Map

Image completion/In-painting(object removing)

User’s mask Input Output

slide-40
SLIDE 40

Multi-label Examples: Shift Map

Input Output

Retargeting(content-aware image resizing)

slide-41
SLIDE 41

Multi-label Examples: Shift Map

  • Label set: relative mapping (shift) coordinates M

– Nodes: output pixels (u, v)
– Labels: shift-map values M(u, v) = (t_x, t_y)
– Output R(u, v) is taken from the input I(x, y) at the shifted location

slide-42
SLIDE 42
  • Energy function:
  • Data term

– varies between different applications
– Inpainting:

Specific input pixels can be forced not to be included in the output image by setting D(x,y)=∞

Multi-label Examples: Shift Map

E(M) = Σ_{p∈R} E_d(M(p)) + α Σ_{(p,q)∈N} E_s(M(p), M(q))

– Data term E_d: external editing requirements
– Smoothness term E_s: avoid stitching artifacts

(figure: output pixel (u, v) mapped by the shift-map to input pixel (x, y))

slide-43
SLIDE 43

Smoothness term

(figure: neighboring output pixels p and q map, through a discontinuity in the shift-map, to p′ and q′ in the input image, with neighborhoods n_p′ and n_q′)

E_s(M(p), M(q)) compares, for p and for q, the colors and color gradients produced by the two shifts M(p) and M(q); it is nonzero only where the shift-map is discontinuous.

slide-44
SLIDE 44

Multi-label Examples: Shift map

  • Graph Cuts : α-expansion
  • Why define the label as a relative shift (disparity) instead of

an absolute coordinate in the input image?

slide-45
SLIDE 45

Hierarchical Solution

(figure: Gaussian pyramid of the input; a shift-map is computed at a coarse level and then refined at finer levels to produce the output)

slide-46
SLIDE 46
  • Label set for each pixel:

– disparity d ∈ {0, 1, …, D}

(figure: left camera image, right camera image, dense stereo result)

Multi-label Examples: Dense Stereo

slide-47
SLIDE 47

Multi-label Examples: Dense Stereo

  • Data term: for pixel p, label d

(figure: left camera image, right camera image)
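The data term on this slide was an image; a common choice (an assumption, not necessarily the one used here) is a truncated difference between the left pixel and the right pixel shifted by the candidate disparity d:

D_p(d) = min( |I_L(x, y) − I_R(x − d, y)| , τ )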

slide-48
SLIDE 48

Multi-label Examples: Dense Stereo

  • Smoothness term for neighboring pixels p and q

– one choice of V, or an alternative robust form (the choices of V are compared on the next slide)

slide-49
SLIDE 49

Design smoothness term V

  • Choices of V: non-robust vs. robust (better!)

V(α, β) = min(|α − β|, k)   (truncated linear)
V(α, β) = 1 if |α − β| > k   (truncated constant)

slide-50
SLIDE 50

Results

Original Image Initial Solution

slide-51
SLIDE 51

Results

Original Image 1st Expansion

slide-52
SLIDE 52

Results

Original Image 2nd Expansion

slide-53
SLIDE 53

Results

Original Image 3rd Expansion

slide-54
SLIDE 54

Results

Original Image Final expansion

slide-55
SLIDE 55

Results

  • http://vision.middlebury.edu/stereo/eval/
slide-56
SLIDE 56

Comments on Graph Cuts

  • In practice, the graph-cut α-expansion algorithm

usually outperforms the α-swap method

  • Limitations of the GC algorithm:

– Constraints on the form of the energy terms
– Speed

slide-57
SLIDE 57

Belief Propagation(BP)

  • Belief Propagation allows the marginals and the

maximizer to be computed efficiently on graphical models.

  • Sum-product BP is a message-passing algorithm that

computes the marginal distributions on a graphical model

  • Max-product BP (or max-sum in the log domain) is used

to estimate the state configuration with maximum probability.

  • Exhaustive search: O(|states|^N)
slide-58
SLIDE 58

Sum-product BP

slide-59
SLIDE 59

Messages

  • Initialize all messages to uniform (any non-negative values)
  • Update: the message from i to j considers all messages flowing

into i (except for the message from j):

In one round: O(#edges × #states)
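Written out, the sum-product update described above is:

m_{i→j}(x_j) = Σ_{x_i} ψ_{ij}(x_i, x_j) φ_i(x_i) ∏_{k∈N(i)\{j}} m_{k→i}(x_i)

where φ_i is the local (unary) potential at node i and N(i) are its neighbors; max-product replaces the sum with a max.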

slide-60
SLIDE 60

Noisy neighbor rule

  • BP follows the “noisy neighbor rule” (from Brendan

Frey, U. Toronto).

  • Every node is a house in some neighborhood.
  • A message: The noisy neighbor says “given

everything I’ve heard, here’s what I think is going on inside your house.”

slide-61
SLIDE 61

Belief Propagation

  • Once the messages have converged, output the

normalized belief b as the marginal:

b_i(x_i) ∝ φ_i(x_i) ∏_{k∈N(i)} m_{k→i}(x_i)

  • Max-product: the x* that maximizes b_i(x_i), selected

individually for each node, is the maximal configuration (compute maxima during the update)
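A minimal sketch of sum-product BP on a chain (where it is exact), assuming NumPy; unary[i] and pairwise are potential tables with illustrative names, not the presenter's code:

import numpy as np

def chain_sum_product(unary, pairwise):
    # Exact marginals on a chain MRF via one forward and one backward message sweep.
    # unary: list of length-S arrays phi_i(x_i); pairwise: (S, S) array psi(x_i, x_{i+1}).
    n = len(unary)
    fwd = [None] * n                              # fwd[i] = message from node i-1 into node i
    bwd = [None] * n                              # bwd[i] = message from node i+1 into node i
    fwd[0] = np.ones_like(unary[0])
    for i in range(1, n):
        fwd[i] = pairwise.T @ (unary[i - 1] * fwd[i - 1])
        fwd[i] /= fwd[i].sum()                    # normalize for numerical stability
    bwd[n - 1] = np.ones_like(unary[-1])
    for i in range(n - 2, -1, -1):
        bwd[i] = pairwise @ (unary[i + 1] * bwd[i + 1])
        bwd[i] /= bwd[i].sum()
    beliefs = [unary[i] * fwd[i] * bwd[i] for i in range(n)]   # b_i ∝ phi_i * incoming messages
    return [b / b.sum() for b in beliefs]

# three binary nodes with a smoothness-favouring pairwise potential
marginals = chain_sum_product(
    [np.array([0.9, 0.1]), np.array([0.5, 0.5]), np.array([0.2, 0.8])],
    np.array([[0.7, 0.3], [0.3, 0.7]]))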

slide-62
SLIDE 62

Comments on Belief Propagation

  • BP gives exact solutions when the

graph is a tree (i.e. has no loops), but only approximates the truth in loopy graphs (no convergence guarantee)

  • Scheduling

– For a tree or chain, with proper scheduling of the message updates, it terminates after 2 sweeps.
– For a grid (e.g. stereo on a pixel lattice), people often sweep in an "up-down-left-right" fashion [Tappen & Freeman]

slide-63
SLIDE 63

Speed-ups

  • Binary variables: Use log ratios [Mackay]
  • Distance transform and multi-scale [Felzenszwalb

& Huttenlocher]

  • Sparse forward-backward [Pal et al]
  • Dynamic quantization of state space [Coughlan

& Shen]

  • Higher-order factors with linear interactions

[Potetz & Lee]

  • GPU [Brunton et al]
slide-64
SLIDE 64

Large-scale Structure from Motion

  • Recent work has built 3D models from large,

unstructured online image collections

– [Snavely06], [Li08], [Agarwal09], [Frahm10], Microsoft’s PhotoSynth, …

slide-65
SLIDE 65

From Fisher Yu’s lecture: two-view reconstruction → incremental bundle adjustment → surface reconstruction

slide-66
SLIDE 66

Incremental Bundle Adjustment

Incremental BA

– Works very well for many scenes
– Poor scalability, much use of bundle adjustment
– Poor results if a bad seed image set is chosen
– Drift and bad local minima for some scenes

slide-67
SLIDE 67

Reconstruction pipeline

– Feature detection
– Feature matching: find scene points seen by multiple cameras
– Initialization: robustly estimate camera poses and/or scene points
– Bundle adjustment: refine camera poses R, T and scene structure P

slide-68
SLIDE 68
Examples: Optimizing an MRF using BP (Structure from Motion)

  • View SfM as inference over a Markov Random

Field, solving for all camera poses at once

– Initializes all cameras at once
– Can avoid local minima
– Easily parallelized
– Up to 6x speedups over IBA for large problems
– Vertices are images (or points)
– Inference problem: label each image with a camera pose, such that the constraints are satisfied

slide-69
SLIDE 69

The MRF model

– Binary (pairwise) constraints: pairwise camera transformations
– Unary constraints: pose estimates (e.g., GPS, heading info)

  • Input: set of images with correspondence
slide-70
SLIDE 70

Constraints on camera pairs

R_ij: relative 3D rotation
t_ij: relative translation direction

  • Find absolute camera poses (R_i, t_i) and (R_j, t_j)

that agree with these pairwise estimates:

– rotation consistency
– translation direction consistency

  • Recall how to compute the relative pose between camera pairs using 2-frame SfM

slide-71
SLIDE 71

Constraints on camera pairs

  • Define robust error functions to use as pairwise

potentials:

– rotation consistency: ρ_R, a truncated quadratic
– translation direction consistency: ρ_T

R_ij: relative 3D rotation
t_ij: relative translation direction

slide-72
SLIDE 72

Data term: pose information

  • Noisy absolute pose info for some cameras

– 2D positions from geotags (GPS coordinates)
– Orientations (tilt & twist angles) from vanishing point detection [Sinha10]

slide-73
SLIDE 73

Prior pose information

  • Noisy absolute pose info for some cameras

– 2D positions from geotags (GPS coordinates)
– Orientations (tilt & twist angles) from vanishing point detection

Rotations unary potential: dθ, dφ, dϕ measure a robust error with respect to the prior pose estimate

Translations unary potential: g_i is a GPS coordinate, e_n and π project into a common coordinate system, and ρ_T is a robust distance function

slide-74
SLIDE 74

Overall optimization problem

  • Given pairwise and unary pose constraints,

solve for absolute camera poses simultaneously

– For n cameras, estimate R = (R_1, R_2, …, R_n) and t = (t_1, t_2, …, t_n) so as to minimize the total error over the entire graph:

pairwise rotation consistency + unary rotation consistency + pairwise translation consistency + unary translation consistency
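Schematically, with the ρ's denoting the robust errors from the previous slides (a reconstruction of the four terms named above, not the slide's exact formula):

E(R, t) = Σ_{(i,j)} ρ_R(R_i, R_j; R_ij) + Σ_i ρ_r(R_i) + Σ_{(i,j)} ρ_T(t_i, t_j; t_ij) + Σ_i ρ_t(t_i; g_i)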

slide-75
SLIDE 75

Solving the MRF

  • Use discrete loopy belief propagation

– Reduced by solving for rotations and translations separately
– Up to 1,000,000 nodes (cameras and points)
– Up to 5,000,000 edges (constraints between cameras and points)

slide-76
SLIDE 76

Discrete BP: Rotations

  • Parameterize viewing

directions as points on unit sphere

– Discretize into 10x10x10 = 1,000 possible labels
– Measure rotational errors as robust Euclidean distances on the sphere (to allow use of the distance transform)

  • Uniform Grid on the sphere:

http://vision.princeton.edu/ code.html#icosahedron

slide-77
SLIDE 77

Discrete BP: Translations

  • Parameterize positions as 2D points in plane

– Use an approximation to the error function (to allow use of distance transforms)
– Discretize into up to 300 x 300 = 90,000 labels

slide-78
SLIDE 78

Central Rome

Reconstructed images: 14,754
Edges in MRF: 2,258,416
Median camera pose difference w.r.t. IBA: 25.0 m

(figure: our result vs. Incremental Bundle Adjustment [Agarwal09])

slide-79
SLIDE 79

Markov Random Fields

  • In an MRF:

E(L | X) = E(X | L) + E(L)

(posterior energy = likelihood energy + prior energy)

  • The MRF assumes that the prior distribution is

independent of the measurements

– Specifically, the prior (smoothness) energy E(L) is independent of the observations X

  • Sometimes we want each variable to depend on

the full set of global observations

– i.e., there is no purely prior knowledge

slide-80
SLIDE 80

Why Conditional Random Fields?

  • Object segmentation/recognition:
  • Label set = {tree, people, water, sky, car}
slide-81
SLIDE 81

Conditional Random Field

  • Given observations X, (X, L) is said to

be a Conditional Random Field (CRF) if the random variables/labels L obey the Markov property with respect to the graph:

p(L_i | X, L_{j≠i}) = p(L_i | X, L_{N(i)})

  • In an MRF, we have the stricter

independence assumption:

p(L_i | L_{j≠i}) = p(L_i | L_{N(i)})

slide-82
SLIDE 82

CRF in supervised training problem

  • Define the conditional distribution P(y | x, θ)
  • Parameter learning:

θ* = arg max_θ Σ_i log P(y_i | x_i, θ)

Take the label y* = arg max_y P(y | x, θ*)

slide-83
SLIDE 83

Object Recognition

  • Possible image label set, i.e. Y = {background, car}
  • Given an image, define variables: label y ∈ Y,

"patches" x = {x_1, …, x_m}

  • Training set: n labeled images (x_i, y_i), where each

y_i ∈ Y and each x_i = {x_{i,1}, …, x_{i,m}}

  • φ(x_j) ∈ R^d : feature representation of a patch
  • Hidden variables: parts h = {h_1, …, h_m}
  • θ : parameters of the model
  • Take the label argmax_y p(y | x, θ*)

(figure: image label y, m parts h, m patches x)

slide-84
SLIDE 84

Object Recognition

  • We define

where the potential function is

(figure: image label y, part h, patches x)

slide-85
SLIDE 85

Object Recognition

  • The objective function for training the parameters:

where the first term is the likelihood term, and the second term is the log of a Gaussian prior with variance σ²;  θ* = argmax_θ L(θ)

  • Optimization: gradient ascent
slide-86
SLIDE 86

Using BP

  • BP is used to compute the label argmax_y p(y | x, θ*)
  • and the gradient ∂L(θ)/∂θ needed for training

slide-87
SLIDE 87

Results

slide-88
SLIDE 88

Summary

  • MRF
  • CRF
  • Graph Cut Algorithm

– Binary
– Multi-label

  • Belief Propagation Algorithm
  • Modeling vision problems is hard because

– the optimization algorithm may not find the global optimum
– the model itself may be inadequate

slide-89
SLIDE 89

Thank you!