Undirected Graphical Models: Markov Random Fields
Probabilistic Graphical Models
Sharif University of Technology, Soleymani, Spring 2018

Markov Network
Structure: an undirected graph.
Undirected edges show correlations (non-causal relationships) between variables.
e.g., spatial image analysis: intensities of neighboring pixels are correlated.
[Example: an undirected graph over the nodes A, B, C, D]

MRF: Joint distribution
A factor $\phi(X_1, \dots, X_k)$ is a function $\phi: \mathrm{Val}(X_1, \dots, X_k) \to \mathbb{R}$ with scope $\{X_1, \dots, X_k\}$.
The joint distribution is parametrized by a set of factors $\Phi = \{\phi_1(\mathbf{D}_1), \dots, \phi_K(\mathbf{D}_K)\}$:
$P(X_1, \dots, X_n) = \frac{1}{Z} \prod_k \phi_k(\mathbf{D}_k)$
$Z = \sum_{\mathbf{X}} \prod_k \phi_k(\mathbf{D}_k)$
$\mathbf{D}_k$: the set of variables in the $k$-th factor (its scope)
$Z$: normalization constant, called the partition function

Misconception example
Factors show "compatibilities" between different values of the variables in their scope.
A factor is only one contribution to the overall joint distribution. [Koller & Friedman]
[The factor tables and the resulting joint distribution table of the Misconception example are not reproduced here.]
Some inferences can then be read off the joint, e.g., the marginal $P(B, C)$ obtained by summing out the remaining variables.

MRF: Gibbs distribution
A Gibbs distribution with factors $\Phi = \{\phi_1(\mathbf{X}_{D_1}), \dots, \phi_K(\mathbf{X}_{D_K})\}$:
$P_\Phi(X_1, \dots, X_n) = \frac{1}{Z} \prod_{j=1}^{K} \phi_j(\mathbf{X}_{D_j})$
$Z = \sum_{\mathbf{X}} \prod_{j=1}^{K} \phi_j(\mathbf{X}_{D_j})$
$\phi_j(\mathbf{X}_{D_j})$: potential function on clique $D_j$ (a local "contingency" function)
$\mathbf{X}_{D_j}$: the set of variables in clique $D_j$
The potential functions and the cliques in the graph completely determine the joint distribution.

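To make the definition concrete, here is a minimal sketch (not taken from the slides) that computes a Gibbs distribution over three binary variables by brute force; the factor tables phi_AB and phi_BC are made-up values purely for illustration.

```python
# A minimal sketch: computing a Gibbs distribution
# P(X) = (1/Z) * prod_j phi_j(D_j) by enumerating all binary assignments.
import itertools

# Hypothetical factors over the chain A - B - C (values in {0, 1}).
phi_AB = {(0, 0): 30.0, (0, 1): 5.0, (1, 0): 1.0, (1, 1): 10.0}
phi_BC = {(0, 0): 100.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 100.0}
factors = [(("A", "B"), phi_AB), (("B", "C"), phi_BC)]
variables = ["A", "B", "C"]

def unnormalized(assignment):
    """Product of all factors evaluated on a full assignment dict."""
    p = 1.0
    for scope, table in factors:
        p *= table[tuple(assignment[v] for v in scope)]
    return p

# Partition function Z: sum of the factor product over all joint assignments.
all_assignments = [dict(zip(variables, vals))
                   for vals in itertools.product([0, 1], repeat=len(variables))]
Z = sum(unnormalized(a) for a in all_assignments)

# Normalized joint distribution P(A, B, C).
joint = {tuple(a.values()): unnormalized(a) / Z for a in all_assignments}
print("Z =", Z)
print("P(A=0,B=0,C=0) =", joint[(0, 0, 0)])
print("sum of probabilities =", sum(joint.values()))  # should be 1.0
```
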
MRF Factorization: cliques
Factors are functions of the variables in cliques.
To reduce the number of factors, we can allow factors only for maximal cliques.
Clique: a subset of nodes in the graph that is fully connected (a complete subgraph).
Maximal clique: a clique such that no superset of its nodes also forms a clique.
Example (graph over A, B, C, D):
Cliques: {A,B,C}, {B,C,D}, {A,B}, {A,C}, {B,C}, {B,D}, {C,D}, {A}, {B}, {C}, {D}
Maximal cliques: {A,B,C}, {B,C,D}

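A small sketch, assuming networkx is available, that enumerates the cliques of this example; the edge list A-B, A-C, B-C, B-D, C-D is inferred from the clique list above.

```python
# Enumerating cliques and maximal cliques of the slide's example graph.
import networkx as nx
from itertools import combinations

H = nx.Graph([("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D")])

# Maximal cliques (no superset is also a clique): expected {A,B,C} and {B,C,D}.
max_cliques = [set(c) for c in nx.find_cliques(H)]
print("maximal cliques:", max_cliques)

# All cliques (every fully connected subset), by brute force over node subsets.
def is_clique(nodes):
    return all(H.has_edge(u, v) for u, v in combinations(nodes, 2))

all_cliques = [set(s) for r in range(1, len(H) + 1)
               for s in combinations(H.nodes, r) if is_clique(s)]
print("number of cliques:", len(all_cliques))  # 11 for this graph
```
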
Relation between factorization and independencies
Theorem: Let $\mathbf{X}$, $\mathbf{Y}$, $\mathbf{Z}$ be three disjoint sets of variables. Then $P \models (\mathbf{X} \perp \mathbf{Y} \mid \mathbf{Z})$ iff $P(\mathbf{X}, \mathbf{Y}, \mathbf{Z}) = f(\mathbf{X}, \mathbf{Z}) \, g(\mathbf{Y}, \mathbf{Z})$ for some functions $f$ and $g$.

MRF Factorization and pairwise independencies
A distribution $P_\Phi$ with $\Phi = \{\phi_1(\mathbf{D}_1), \dots, \phi_K(\mathbf{D}_K)\}$ factorizes over an MRF $H$ if each $\mathbf{D}_k$ is a complete subgraph of $H$.
For the conditional independence property to hold, variables $X_i$ and $X_j$ that are not directly connected must not appear together in any factor of the distributions belonging to the graph.

MRFs: Global independencies
Global independencies for any disjoint sets $\mathbf{A}$, $\mathbf{B}$, $\mathbf{C}$: $\mathbf{A} \perp \mathbf{B} \mid \mathbf{C}$ holds if all paths that connect a node in $\mathbf{A}$ to a node in $\mathbf{B}$ pass through one or more nodes in the set $\mathbf{C}$.
A path is active given $\mathbf{C}$ if no node on it is in $\mathbf{C}$.
$\mathbf{X}$ and $\mathbf{Y}$ are separated given $\mathbf{C}$ if there is no active path between $\mathbf{X}$ and $\mathbf{Y}$ given $\mathbf{C}$.
Separation in the undirected graph is written $\mathrm{sep}_H(\mathbf{X}, \mathbf{Y} \mid \mathbf{C})$.

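A minimal sketch of this separation test (assuming networkx; the function name separated is made up here): remove the conditioning nodes and check whether any path between the two sets survives.

```python
# Checking sep_H(X, Y | C): every path from X to Y must pass through C.
import networkx as nx

def separated(H, X, Y, C):
    """True iff no active path connects a node in X to a node in Y given C."""
    G = H.copy()
    G.remove_nodes_from(C)          # conditioned nodes block all paths through them
    for x in X:
        reachable = nx.node_connected_component(G, x) if x in G else set()
        if reachable & set(Y):      # an active (unblocked) path exists
            return False
    return True

# Example: the 4-cycle with edges A-B, A-C, B-D, C-D.
H = nx.Graph([("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")])
print(separated(H, {"B"}, {"C"}, {"A", "D"}))  # True:  B is separated from C given {A, D}
print(separated(H, {"B"}, {"C"}, {"A"}))       # False: the path B - D - C is still active
```
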
MRF: independencies
Determining conditional independencies in undirected models is much easier than in directed ones.
Conditioning in undirected models can only eliminate dependencies, while in directed models observations can also create new dependencies (v-structures).

MRF: global independencies
Independencies encoded by $H$ (found using the graph separation discussed above):
$I(H) = \{(\mathbf{X} \perp \mathbf{Y} \mid \mathbf{C}) : \mathrm{sep}_H(\mathbf{X}, \mathbf{Y} \mid \mathbf{C})\}$
If $P$ satisfies $I(H)$, we say that $H$ is an I-map (independency map) of $P$:
$I(H) \subseteq I(P)$, where $I(P) = \{(\mathbf{X} \perp \mathbf{Y} \mid \mathbf{C}) : P \models (\mathbf{X} \perp \mathbf{Y} \mid \mathbf{C})\}$

Factorization & Independence
Factorization ⇒ Independence (soundness of the separation criterion)
Theorem: If $P$ factorizes over $H$ and $\mathrm{sep}_H(\mathbf{X}, \mathbf{Y} \mid \mathbf{C})$, then $P$ satisfies $\mathbf{X} \perp \mathbf{Y} \mid \mathbf{C}$ (i.e., $H$ is an I-map of $P$).
Independence ⇒ Factorization
Theorem (Hammersley-Clifford): For a positive distribution $P$, if $P$ satisfies $I(H) = \{(\mathbf{X} \perp \mathbf{Y} \mid \mathbf{C}) : \mathrm{sep}_H(\mathbf{X}, \mathbf{Y} \mid \mathbf{C})\}$, then $P$ factorizes over $H$.

Factorization & Independence
Theorem: Two equivalent views of graph structure for positive distributions:
If $P$ satisfies all independencies that hold in $H$, then it can be represented as a factorization over the cliques of $H$.
If $P$ factorizes over a graph $H$, we can read from the graph structure independencies that must hold in $P$.

Factorization on Markov networks
It is not as intuitive as the factorization of Bayesian networks:
The correspondence between the factors in a Gibbs distribution and the distribution $P$ is much more indirect.
Factors do not necessarily correspond either to probabilities or to conditional probabilities.
The parameters (of factors) may not be intuitively understandable, making them hard to elicit from people.
There are no constraints on the parameters in a factor, while both CPDs and joint distributions must satisfy normalization constraints.

Interpretation of clique potentials
Potentials cannot all be marginal or conditional distributions.
A positive clique potential can be viewed as a general compatibility or "goodness" measure over values of the variables in its scope.

Different factorizations
Maximal cliques:
$P_\Phi(X_1, X_2, X_3, X_4) = \frac{1}{Z} \phi_{123}(X_1, X_2, X_3) \, \phi_{234}(X_2, X_3, X_4)$
$Z = \sum_{X_1, X_2, X_3, X_4} \phi_{123}(X_1, X_2, X_3) \, \phi_{234}(X_2, X_3, X_4)$
Sub-cliques (pairwise):
$P_{\Phi'}(X_1, X_2, X_3, X_4) = \frac{1}{Z} \phi_{12}(X_1, X_2) \, \phi_{23}(X_2, X_3) \, \phi_{13}(X_1, X_3) \, \phi_{24}(X_2, X_4) \, \phi_{34}(X_3, X_4)$
$Z = \sum_{X_1, X_2, X_3, X_4} \phi_{12}(X_1, X_2) \, \phi_{23}(X_2, X_3) \, \phi_{13}(X_1, X_3) \, \phi_{24}(X_2, X_4) \, \phi_{34}(X_3, X_4)$
Canonical representation:
$P_{\Phi''}(X_1, X_2, X_3, X_4) = \frac{1}{Z} \phi_{123}(X_1, X_2, X_3) \, \phi_{234}(X_2, X_3, X_4) \, \phi_{12}(X_1, X_2) \, \phi_{23}(X_2, X_3) \, \phi_{13}(X_1, X_3) \, \phi_{24}(X_2, X_4) \, \phi_{34}(X_3, X_4) \, \phi_{1}(X_1) \, \phi_{2}(X_2) \, \phi_{3}(X_3) \, \phi_{4}(X_4)$
with $Z$ the corresponding sum over $X_1, X_2, X_3, X_4$.
[Graph: four nodes $X_1, X_2, X_3, X_4$ whose maximal cliques are $\{X_1, X_2, X_3\}$ and $\{X_2, X_3, X_4\}$]

Pairwise MRF
All factors are over single variables or pairs of variables $(X_i, X_j)$:
$P(\mathbf{X}) = \frac{1}{Z} \prod_{(X_i, X_j) \in H} \phi_{ij}(X_i, X_j) \prod_i \phi_i(X_i)$
Pairwise MRFs are popular (a simple special case of general MRFs): they consider pairwise interactions rather than interactions of larger subsets of variables.
Pairwise MRFs are attractive because of their simplicity, and because interactions on edges are an important special case that often arises in practice.
In general, they do not have enough parameters to encompass the whole space of joint distributions.

Factor graph
A Markov network structure does not itself fully specify the factorization of $P$; it does not generally reveal all the structure in a Gibbs parameterization.
Factor graph: a graph with two kinds of nodes, variable nodes and factor nodes.
The factor graph is a useful structure for inference and parametrization (as we will see).
Example: $P(X_1, X_2, X_3) \propto f_1(X_1, X_2, X_3) \, f_2(X_1, X_2) \, f_3(X_2, X_3) \, f_4(X_3)$
[Figure: variable nodes $X_1, X_2, X_3$ connected to factor nodes $f_1, f_2, f_3, f_4$]

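A small sketch (assuming networkx) of this example's factor graph as a bipartite structure; the node names X1..X3 and f1..f4 mirror the factorization above.

```python
# The factor graph for P(X1, X2, X3) ∝ f1(X1,X2,X3) f2(X1,X2) f3(X2,X3) f4(X3).
import networkx as nx

FG = nx.Graph()
FG.add_nodes_from(["X1", "X2", "X3"], bipartite="variable")
FG.add_nodes_from(["f1", "f2", "f3", "f4"], bipartite="factor")

# Connect each factor node to the variables in its scope.
scopes = {"f1": ["X1", "X2", "X3"], "f2": ["X1", "X2"],
          "f3": ["X2", "X3"], "f4": ["X3"]}
for f, scope in scopes.items():
    FG.add_edges_from((f, x) for x in scope)

# Unlike the Markov network (a single clique over X1, X2, X3), the factor graph
# makes the individual factors f1..f4 explicit.
print(sorted(FG.neighbors("X2")))  # ['f1', 'f2', 'f3']
```
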
Energy function
Constraining clique potentials to be positive could be inconvenient.
We can represent a clique potential in unconstrained form using a real-valued "energy" function.
If the potential functions are strictly positive, $\phi_D(\mathbf{X}_D) > 0$:
$\phi_D(\mathbf{X}_D) = \exp\{-E_D(\mathbf{X}_D)\}$
$P(\mathbf{X}) = \frac{1}{Z} \exp\{-\sum_D E_D(\mathbf{X}_D)\}$
$E_D(\mathbf{X}_D)$: energy function, $E_D(\mathbf{X}_D) = -\ln \phi_D(\mathbf{X}_D)$

Log-linear models
Define the energy function as a linear combination of features.
A set of $k$ features $\{f_1(\mathbf{D}_1), \dots, f_k(\mathbf{D}_k)\}$ on complete subgraphs, where $\mathbf{D}_i$ is the scope of the $i$-th feature:
The scope of a feature is a complete subgraph.
We can have several different features over the same subgraph.
$P(\mathbf{X}) = \frac{1}{Z} \exp\{-\sum_{i=1}^{k} w_i f_i(\mathbf{D}_i)\}$

Ising model
The most likely joint configurations usually correspond to a "low-energy" state.
$x_i \in \{-1, +1\}$, grid model (image processing, lattice physics, etc.); the states of adjacent nodes are related:
$P(\mathbf{x}) = \frac{1}{Z} \exp\{\sum_i u_i x_i + \sum_{(i,j) \in E} w_{ij} x_i x_j\}$
The Ising model uses the feature $f_{ij}(x_i, x_j) = x_i x_j$.

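A minimal sketch (not from the slides) of sampling from a small Ising grid with Gibbs sampling; the grid size and the shared parameters u and w are arbitrary toy values.

```python
# Gibbs sampling for an n x n Ising model with shared bias u and coupling w.
import numpy as np

rng = np.random.default_rng(0)
n, u, w = 8, 0.0, 0.5
x = rng.choice([-1, 1], size=(n, n))          # initial spin configuration

def neighbor_sum(x, i, j):
    """Sum of the 4-neighborhood of pixel (i, j), with boundary checks."""
    s = 0.0
    if i > 0:     s += x[i - 1, j]
    if i < n - 1: s += x[i + 1, j]
    if j > 0:     s += x[i, j - 1]
    if j < n - 1: s += x[i, j + 1]
    return s

for _ in range(100):                          # sweeps over the grid
    for i in range(n):
        for j in range(n):
            # Local conditional: P(x_ij = +1 | neighbors) ∝ exp{u + w * neighbor sum},
            # which simplifies to a sigmoid of twice the local field.
            a = u + w * neighbor_sum(x, i, j)
            p_plus = 1.0 / (1.0 + np.exp(-2.0 * a))
            x[i, j] = 1 if rng.random() < p_plus else -1

print("mean spin after sampling:", x.mean())
```
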
Shared features in log-linear models
In most practical models, the same feature and weight are used over many scopes:
$P(\mathbf{x}) = \frac{1}{Z} \exp\{\sum_i u_i x_i + \sum_{(i,j) \in E} w_{ij} x_i x_j\}$, with $f_{ij}(x_i, x_j) = f(x_i, x_j) = x_i x_j$
With fully shared parameters, $u_i = u$ and $w_{ij} = w$:
$P(\mathbf{x}) = \frac{1}{Z} \exp\{u \sum_i x_i + w \sum_{(i,j) \in E} x_i x_j\}$

Image denoising
$y_i \in \{-1, +1\}$, $i = 1, \dots, N$: array of observed noisy pixels
$x_i \in \{-1, +1\}$, $i = 1, \dots, N$: noise-free image
[Bishop]

Image denoising
Energy terms:
$E_1(x_i, x_j) = -\beta x_i x_j$ (neighboring pixels of the noise-free image)
$E_2(x_i) = -h x_i$ (bias)
$E_3(x_i, y_i) = -\eta x_i y_i$ (agreement with the observed pixel)
$P(\mathbf{x}, \mathbf{y}) = \frac{1}{Z} \prod_{(i,j) \in E} \exp\{-E_1(x_i, x_j)\} \prod_i \exp\{-E_2(x_i)\} \prod_i \exp\{-E_3(x_i, y_i)\}$
$= \frac{1}{Z} \exp\{-\sum_{(i,j) \in E} E_1(x_i, x_j) - \sum_i E_2(x_i) - \sum_i E_3(x_i, y_i)\}$
$\hat{\mathbf{x}} = \arg\max_{\mathbf{x}} P(\mathbf{x} \mid \mathbf{y})$
MPA: the most probable assignment of the $\mathbf{x}$ variables given the evidence $\mathbf{y}$

Image denoising
$E(\mathbf{x}, \mathbf{y}) = -h \sum_i x_i - \beta \sum_{(i,j) \in E} x_i x_j - \eta \sum_i x_i y_i$
$P(\mathbf{x}, \mathbf{y}) = \frac{1}{Z} \exp\{-E(\mathbf{x}, \mathbf{y})\}$
$\hat{\mathbf{x}} = \arg\max_{\mathbf{x}} P(\mathbf{x} \mid \mathbf{y})$
MPA: the most probable assignment of the $\mathbf{x}$ variables given the evidence $\mathbf{y}$

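A rough sketch (assuming numpy and a toy binary image; the parameter values h, beta, eta are made up) of iterated conditional modes (ICM), the coordinate-wise maximization Bishop applies to this energy. It is a local search, not an exact MPA solver.

```python
# ICM for the denoising energy E(x, y) = -h*sum_i x_i - beta*sum_(i,j) x_i x_j - eta*sum_i x_i y_i.
import numpy as np

rng = np.random.default_rng(0)
clean = np.ones((32, 32), dtype=int)
clean[8:24, 8:24] = -1                               # a toy noise-free image
noise = rng.random(clean.shape) < 0.1                # flip 10% of the pixels
y = np.where(noise, -clean, clean)                   # observed noisy image

h, beta, eta = 0.0, 1.0, 2.0
x = y.copy()                                         # initialize with the observation

def local_field(x, i, j):
    """Sum of the neighboring x values (4-neighborhood with boundary checks)."""
    s = 0
    if i > 0: s += x[i - 1, j]
    if i < x.shape[0] - 1: s += x[i + 1, j]
    if j > 0: s += x[i, j - 1]
    if j < x.shape[1] - 1: s += x[i, j + 1]
    return s

for _ in range(10):                                  # sweeps over all pixels
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            # Setting x_ij = +1 lowers the energy exactly when this "gain" is positive.
            gain = h + beta * local_field(x, i, j) + eta * y[i, j]
            x[i, j] = 1 if gain > 0 else -1

print("pixels still wrong:", int((x != clean).sum()))
```
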
Image denoising (gray-scale image)
Metric MRF with a truncated quadratic penalty:
$E(\mathbf{x}, \mathbf{y}) = \gamma \sum_{(i,j) \in E} \min((x_i - x_j)^2, c) + \theta \sum_i (x_i - y_i)^2$
$\hat{\mathbf{x}} = \arg\max_{\mathbf{x}} \frac{1}{Z} \exp\{-E(\mathbf{x}, \mathbf{y})\}$
MPE: the most probable explanation of the $\mathbf{x}$ variables given the evidence $\mathbf{y}$
Shared feature: $f_{ij}(x_i, x_j) = f(x_i, x_j) = \min((x_i - x_j)^2, c)$

MRF: Pairwise and local independencies
Pairwise independencies: $X_i \perp X_j \mid \mathbf{X} - \{X_i, X_j\}$ whenever there is no edge between $X_i$ and $X_j$.
Markov blanket (local independencies): a variable is conditionally independent of every other variable given only its neighboring nodes:
$X_i \perp \mathbf{X} - \{X_i\} - N_H(X_i) \mid N_H(X_i)$
$N_H(X_i) = \{X' \in \mathbf{X} : (X_i, X') \in \mathrm{Edges}(H)\}$

Relationship between local and global Markov properties
If $P \models I_l(H)$ then $P \models I_p(H)$.
If $P \models I(H)$ then $P \models I_l(H)$.
Theorem: For a positive distribution $P$, the following three statements are equivalent:
$P \models I_p(H)$, $P \models I_l(H)$, $P \models I(H)$

One way to construct an undirected minimal I-map of a distribution
$H$ is a minimal I-map for $P$ if:
$I(H) \subseteq I(P)$, and
removing any single edge from $H$ renders it no longer an I-map of $P$.
Construction: define $H$ to include an edge $X - Y$ for every pair $X, Y$ for which $P \nvDash (X \perp Y \mid \mathcal{X} - \{X, Y\})$.
Theorem: the resulting $H$ is the unique minimal I-map of the positive distribution $P$.

Perfect map of a distribution
Not every distribution has an MN perfect map.
Not every distribution has a BN perfect map.
[Venn diagram: the family of probabilistic models, with directed and undirected graphical models as partially overlapping subfamilies]

A loop of at least 4 nodes without a chord has no equivalent in BNs
Is there a BN that is a perfect map for this MN?
[Figure: a chordless 4-cycle over A, B, C, D (with B, D non-adjacent and A, C non-adjacent) encodes $B \perp D \mid \{A, C\}$ and $A \perp C \mid \{B, D\}$; each candidate DAG orientation shown fails to capture both independencies at once.]

A v-structure has no equivalent in MNs
Is there an MN that is a perfect I-map of this BN?
[Figure: the v-structure A → C ← B encodes $A \perp B$ but not $A \perp B \mid C$; none of the candidate undirected graphs over A, B, C captures exactly this pattern.]

Minimal I-map
Since we may not find a Markov network (MN) that is a perfect map of a BN, and vice versa, we study the minimal I-map property.
A graph $K$ is a minimal I-map for a set of independencies $I$ if:
$I(K) \subseteq I$, and
removing any single edge from $K$ renders it no longer an I-map.

Minimal I-maps: from DAGs to MNs
The moral graph $M(G)$ of a DAG $G$ is the undirected graph that contains an undirected edge between $X$ and $Y$ if:
there is a directed edge between them (in either direction), or
$X$ and $Y$ are parents of the same node.
Moralization turns a node and its parents into a fully connected subgraph.
[Figure: a v-structure A → C ← B and its moral graph, in which the edge A - B has been added]

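A minimal sketch (assuming networkx; the helper name moral_graph is my own) of moralization: marry the parents of every node, then drop edge directions.

```python
# Moralizing a DAG: connect co-parents, then forget directions.
import networkx as nx
from itertools import combinations

def moral_graph(G):
    M = nx.Graph(G.edges())                  # every directed edge, made undirected
    for node in G.nodes:
        parents = list(G.predecessors(node))
        M.add_edges_from(combinations(parents, 2))  # "marry" all pairs of parents
    M.add_nodes_from(G.nodes)                # keep isolated nodes, if any
    return M

# Example: the v-structure A -> C <- B gains the moral edge A - B.
G = nx.DiGraph([("A", "C"), ("B", "C")])
print(sorted(moral_graph(G).edges()))        # [('A', 'B'), ('A', 'C'), ('B', 'C')]
```
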
Minimal I-maps: from DAGs to MNs
Theorem: The moral graph $M(G)$ of a DAG $G$ is a minimal I-map for $G$.
The moral graph loses some independence information, but we cannot remove any edge from it without introducing independencies that do not hold in $G$: all independencies in the moral graph are also satisfied in $G$.
Theorem: If a DAG $G$ is moral, then its moralized graph $M(G)$ is a perfect I-map of $G$.

Minimal I-maps: from MNs to DAGs
Theorem: If $G$ is a BN that is a minimal I-map for an MN $H$, then $G$ can have no immoralities.
Corollary: If $G$ is a minimal I-map for an MN, then $G$ is chordal.
Any BN that is an I-map for an MN must add triangulating edges to the graph.
An undirected graph is chordal if any loop with more than three nodes has a chord.
[Figure: a 4-cycle MN over A, B, C, D and a chordal DAG that is a minimal I-map of it]

Perfect I-map
Theorem: Let $H$ be a non-chordal MN. Then there is no BN that is a perfect I-map for $H$.
⇒ If the independencies in an MN can be represented exactly by a BN, then the MN graph is chordal.
[Figure: the chordless 4-cycle over A, B, C, D]

Perfect I-map
Theorem: Let $H$ be a chordal MN. Then there exists a DAG $G$ that is a perfect I-map for $H$.
⇒ The independencies in a graph can be represented by both types of models if and only if the graph is chordal.
[Figure: a chordal graph over A, B, C, D, E and a DAG that is a perfect I-map of it]

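A small sketch (assuming networkx) of the chordality check that, by the two theorems above, decides whether an MN's independencies can also be represented by some BN; the example graphs are the 4-cycle and its triangulation.

```python
# Chordality decides whether a BN perfect I-map can exist for an MN.
import networkx as nx

square = nx.cycle_graph(["A", "B", "D", "C"])   # chordless 4-cycle
triangulated = square.copy()
triangulated.add_edge("B", "C")                 # add a chord (triangulating edge)

print(nx.is_chordal(square))        # False: no BN is a perfect I-map for it
print(nx.is_chordal(triangulated))  # True: an equivalent DAG exists
```
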
Relationship between BNs and MNs: summary
Directed and undirected models represent different families of independence assumptions.
Chordal graphs can be represented both as BNs and as MNs.
For inference, we can use a single representation for both types of models, which gives a simpler design and analysis of the inference algorithm.

Conditional Random Field (CRF)
Undirected graph $H$ with nodes $\mathbf{X} \cup \mathbf{Y}$:
$\mathbf{X}$: observed variables
$\mathbf{Y}$: target variables
Consider factors $\Phi = \{\phi_1(\mathbf{D}_1), \dots, \phi_K(\mathbf{D}_K)\}$ where each $\mathbf{D}_j \nsubseteq \mathbf{X}$:
$P(\mathbf{Y} \mid \mathbf{X}) = \frac{1}{Z(\mathbf{X})} \tilde{P}(\mathbf{Y}, \mathbf{X})$
$\tilde{P}(\mathbf{Y}, \mathbf{X}) = \prod_{j=1}^{K} \phi_j(\mathbf{D}_j)$
$Z(\mathbf{X}) = \sum_{\mathbf{Y}} \tilde{P}(\mathbf{Y}, \mathbf{X})$
Nodes are connected by an edge in $H$ whenever they appear together in the scope of some factor.

CRF: discriminative model
A CRF models the conditional probability $P(\mathbf{Y} \mid \mathbf{X})$ rather than the joint probability $P(\mathbf{Y}, \mathbf{X})$.
A CRF is based on the conditional probability of the label sequence given the observation sequence.
The probability of a transition between labels may depend on past and future observations.
It allows arbitrary dependencies between features of the observation sequence, and we do not need to model them (as opposed to the independence assumptions made in generative models).

Naïve Markov model
$X_i$: binary random variables; $Y$: binary random variable.
$\phi_i(X_i, Y) = \exp\{w_i \, \mathbb{1}\{X_i = 1, Y = 1\}\}$
$\phi_0(Y) = \exp\{w_0 \, \mathbb{1}\{Y = 1\}\}$
$P(Y = 1 \mid X_1, \dots, X_k) = \sigma(w_0 + \sum_{i=1}^{k} w_i X_i)$
[Figure: nodes $X_1, X_2, \dots, X_k$ each connected to $Y$]

CRF: logistic model (Naïve Markov model)
$\tilde{P}(Y, \mathbf{X}) = \exp\{w_0 \, \mathbb{1}\{Y = 1\} + \sum_{i=1}^{n} w_i \, \mathbb{1}\{X_i = 1, Y = 1\}\}$
$\tilde{P}(Y = 1, \mathbf{X}) = \exp\{w_0 + \sum_{i=1}^{n} w_i X_i\}$
$\tilde{P}(Y = 0, \mathbf{X}) = \exp\{0\} = 1$
$P(Y = 1 \mid \mathbf{X}) = \frac{1}{1 + \exp\{-(w_0 + \sum_{i=1}^{n} w_i X_i)\}} = \sigma(w_0 + \sum_{i=1}^{n} w_i X_i)$
The number of parameters is linear in the number of variables.

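A quick numerical check (not from the slides; the random weights are arbitrary) that normalizing the naïve Markov factors reproduces exactly the logistic (sigmoid) form above.

```python
# Normalizing the two unnormalized measures gives the sigmoid of w0 + w·X.
import numpy as np

rng = np.random.default_rng(0)
n = 5
w0 = rng.normal()
w = rng.normal(size=n)
X = rng.integers(0, 2, size=n)            # binary observations X_1..X_n

p_tilde_1 = np.exp(w0 + np.dot(w, X))     # unnormalized measure for Y = 1
p_tilde_0 = 1.0                           # unnormalized measure for Y = 0

p_cond = p_tilde_1 / (p_tilde_1 + p_tilde_0)
p_sigmoid = 1.0 / (1.0 + np.exp(-(w0 + np.dot(w, X))))
print(np.isclose(p_cond, p_sigmoid))      # True
```
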
Linear-chain CRF
$P(\mathbf{Y} \mid \mathbf{X}) = \frac{1}{Z(\mathbf{X})} \tilde{P}(\mathbf{Y}, \mathbf{X})$
$\tilde{P}(\mathbf{Y}, \mathbf{X}) = \prod_{t=1}^{T-1} \phi(Y_t, Y_{t+1}) \prod_{t=1}^{T} \phi(Y_t, X_t)$
$Z(\mathbf{X}) = \sum_{\mathbf{Y}} \tilde{P}(\mathbf{Y}, \mathbf{X})$
[Figure: the chain $Y_1 - Y_2 - \dots - Y_T$ with each label $Y_t$ connected to its observation $X_t$]

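A minimal sketch (not from the slides; the factor tables are random positive values) of computing $Z(\mathbf{X})$ for a linear-chain CRF with a forward (sum-product) recursion, checked against brute-force enumeration.

```python
# Z(X) for a linear-chain CRF via the forward recursion, vs. brute force.
import itertools
import numpy as np

rng = np.random.default_rng(0)
T, S = 4, 3                                   # chain length, number of label states
trans = rng.random((S, S)) + 0.1              # phi(Y_t, Y_{t+1}), positive tables
emit = rng.random((T, S)) + 0.1               # phi(Y_t, X_t) for the observed X

# Forward recursion: alpha[s] sums the factor product over label prefixes ending in s.
alpha = emit[0].copy()
for t in range(1, T):
    alpha = (alpha @ trans) * emit[t]
Z_forward = alpha.sum()

# Brute force: sum the factor product over all S**T label sequences.
Z_brute = 0.0
for ys in itertools.product(range(S), repeat=T):
    p = np.prod([emit[t, ys[t]] for t in range(T)])
    p *= np.prod([trans[ys[t], ys[t + 1]] for t in range(T - 1)])
    Z_brute += p

print(np.isclose(Z_forward, Z_brute))         # True
```
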
CRF for sequence labeling
[Figures: a linear-chain CRF in which every label $Y_t$ is connected to the whole observation sequence $\mathbf{X}_{1:T}$, and a variant in which each $Y_t$ is connected only to its own observation $X_t$]

CRF as a discriminative model
A discriminative approach to labeling: the CRF does not model the distribution over the observations.
Dependencies between observed variables may be quite complex or poorly understood, but we do not need to worry about modeling them.
When labeling $Y_t$, future observations are taken into account as well.
[Figures: a linear-chain CRF with per-position observations $X_1, \dots, X_T$, and one in which the whole observation sequence $X_1, \dots, X_T$ is connected to every label]

CRF: Named Entity Recognition
Features: word is capitalized, word appears in an atlas of locations, previous word is "Mrs", ...
Factors: $\phi(Y_t, Y_{t+1})$ and $\phi(Y_t, X_1, \dots, X_T)$
[Koller & Friedman]

CRF: Image segmentation example
A node $Y_i$ for the label of each superpixel, with $\mathrm{Val}(Y_i) = \{1, 2, \dots, L\}$ (e.g., grass, sky, water, ...).
An edge between $Y_i$ and $Y_j$ whenever the corresponding superpixels share a boundary.
A node $X_i$ for the features (e.g., color, texture, location) of each superpixel.

CRF: Image segmentation example
Simple edge potential: $\phi(Y_i, Y_j) = \exp\{-\mu \, \mathbb{1}(Y_i \neq Y_j)\}$
More complex potentials can encode label co-occurrence (e.g., a horse is more likely to be adjacent to vegetation than to water) or depend on relative pixel location (e.g., water below vegetation, sky above everything).

CRF: Image segmentation example
[Example segmentation results figure from Koller & Friedman]

References
D. Koller and N. Friedman, Probabilistic Graphical Models: Principles and Techniques, MIT Press, 2009.
C. M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.