Algorithms for reasoning with graphical models
Class 1 Rina Dechter
class1 276-2018
Dechter, Morgan & Claypool book (Dbook): Chapters 1-2
Outline
Graphical models: the constraint network, probabilistic networks, cost networks and mixed networks. Queries: consistency, counting, optimization and likelihood queries.
consistency, and the Davis-Putnam algorithms.) The induced-width
queries (mpe, map, marginal and probability of evidence)
hypertrees, join-trees.
junction-trees )
belief/constraint-propagation, constraint propagation, generalized belief propagation, variational methods)
backjumping and learning.
bound).
sampling, SampleSearch and AND/OR sampling, Stochastic Local Search.
networks and mixed networks. Graphical representations and queries: consistency, counting, optimization and likelihood queries.
(Adaptive-consistency, and the Davis-Putnam algorithms.) The induced-width.
marginal and probability of evidence)
algorithm, Cluster tree-elimination. )
propagation, generalized belief propagation)
backjumping and learning.
final grades.
models”, R. Dechter, Claypool, 2013 https://www.morganclaypool.com/doi/abs/10.2200/S00529ED1V01Y201308AIM023
Darwiche, MIT Press, 2009.
[Figure: graph over the variables A to M with clusters ABC, BDEF, DGF, EFH, FHK, HJ, KLM; a search tree over the variables E, C, F, D, B, A; and a context minimal AND/OR search graph over A, B, C, D, E, F]
– Queries
– Examples, applications, and tasks
– Algorithms overview
– Bucket elimination for trees
– Bucket elimination
– Jointree clustering
– Elimination orders
– Decomposition bounds
– Mini-bucket & weighted mini-bucket
– Belief propagation
– Large complex system
– Made of “smaller”, “local” interactions
– Complexity emerges through interdependence
– Maximization (MAP): compute the most probable configuration
[Yanover & Weiss 2002]
– Summation & marginalization
[Figure: image labeling example with region labels grass, plane, sky, cow; from an observation y, compute the marginals p(xi | y)]
and “partition function”
e.g., [Plath et al. 2009]
– Mixed inference (marginal MAP, MEU, …)
[Figure: influence diagram with nodes Test, Drill, Oil sale policy, Test result, Seismic structure, Oil underground, Oil produced, Test cost, Drill cost, Sales cost, Oil sales, Market information]
Influence diagrams &
(the “oil wildcatter” problem)
e.g., [Raiffa 1968; Shachter 1986]
Variables - countries (A, B, C, etc.)
Values - colors (red, green, blue)
Constraints: A ≠ B, A ≠ D, D ≠ E, etc.

Allowed value pairs for the constraint between A and B:
A       B
red     green
red     yellow
green   red
green   yellow
yellow  green
yellow  red

[Figure: map with regions A, B, C, D, E, F, G and the corresponding constraint graph]
Is it possible that Chris goes to the party but Becky does not?
[Figure: constraint graph over A, B, C]
P(S, C, B, X, D) = P(S) P(C|S) P(B|S) P(X|C,S) P(D|C,B)
[Figure: Bayesian network over Smoking (S), lung Cancer (C), Bronchitis (B), X-ray (X), Dyspnoea (D), with factors P(S), P(C|S), P(B|S), P(X|C,S), P(D|C,B)]
CPD:
C  B  P(D|C,B)
0  0  0.1  0.9
0  1  0.7  0.3
1  0  0.8  0.2
1  1  0.9  0.1
MAP(P) = max_{S,C,B,X} P(S) · P(C|S) · P(B|S) · P(X|C,S) · P(D|C,B)
Combination: product
Marginalization: sum/max
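The two marginalization operators can be compared on this network by brute-force enumeration. The sketch below is only illustrative: P(D|C,B) uses the CPD values from the slide, reading the two value columns as D=0 and D=1 (an assumption), while every other number is hypothetical and the names `joint`, `p_d1`, and `mpe` are made up for this example.

```python
from itertools import product

# P(D | C, B) comes from the CPD on the slide; the two value columns are
# read as D=0 and D=1 (an assumption). All other tables are hypothetical.
P_S = {0: 0.7, 1: 0.3}
P_C_given_S = {0: {0: 0.99, 1: 0.01}, 1: {0: 0.9, 1: 0.1}}        # [s][c]
P_B_given_S = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.5, 1: 0.5}}          # [s][b]
P_X_given_CS = {(0, 0): {0: 0.95, 1: 0.05}, (0, 1): {0: 0.9, 1: 0.1},
                (1, 0): {0: 0.2, 1: 0.8}, (1, 1): {0: 0.1, 1: 0.9}}   # [(c, s)][x]
P_D_given_CB = {(0, 0): {0: 0.1, 1: 0.9}, (0, 1): {0: 0.7, 1: 0.3},
                (1, 0): {0: 0.8, 1: 0.2}, (1, 1): {0: 0.9, 1: 0.1}}   # [(c, b)][d]


def joint(s, c, b, x, d):
    """Combination by product: P(S) P(C|S) P(B|S) P(X|C,S) P(D|C,B)."""
    return (P_S[s] * P_C_given_S[s][c] * P_B_given_S[s][b]
            * P_X_given_CS[(c, s)][x] * P_D_given_CB[(c, b)][d])


assignments = list(product([0, 1], repeat=5))   # tuples (s, c, b, x, d)

# Marginalization by sum: the joint sums to 1, and P(D=1) sums out S, C, B, X.
total = sum(joint(*a) for a in assignments)
p_d1 = sum(joint(*a) for a in assignments if a[4] == 1)

# Marginalization by max: the most probable full configuration.
mpe = max(assignments, key=lambda a: joint(*a))
print(total, p_d1, mpe)
```

Enumeration touches all 2^5 configurations, which is exactly the cost that the inference algorithms in the rest of the class try to avoid.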
Questions:
likely to show up at the party?
but Becky does not?
P(W,A,C,B) = P(B|W) · P(C|W) · P(A|W) · P(W)
P(A,C,B|W=bad) = 0.9 · 0.1 · 0.5

P(A|W=bad) = .9
P(C|W=bad) = .1
P(B|W=bad) = .5

W     A  P(A|W)
good  0  .01
good  1  .99
bad   0  .1
bad   1  .9

[Figure: Bayesian network with W as the parent of A, C, and B; factors P(W), P(A|W), P(C|W), P(B|W)]
[Figure: graph of the “alarm” network, with nodes such as PCWP, CO, HRBP, HR, SAO2, VENTLUNG, INTUBATION, HYPOVOLEMIA, CVP, and BP]
The “alarm” network: 37 variables, 509 parameters (rather than 2^37 ≈ 10^11!) [Beinlich et al., 1989]
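The saving from factorization can be checked with a little arithmetic. The sketch below counts free parameters for the five-variable network P(S) P(C|S) P(B|S) P(X|C,S) P(D|C,B) used earlier in these slides, assuming every variable is binary, and evaluates 2^37 for comparison. The alarm network's own structure is not given here, so its 509 parameters are not recomputed.

```python
# Parent sets of the five-variable network from the slides; all variables
# are assumed to be binary for this count.
parents = {"S": [], "C": ["S"], "B": ["S"], "X": ["C", "S"], "D": ["C", "B"]}
d = 2  # domain size

# Each CPT needs (d - 1) free numbers per configuration of the parents.
factored = sum((d - 1) * d ** len(ps) for ps in parents.values())

# A full joint table over n variables needs d^n - 1 free numbers.
full_joint = d ** len(parents) - 1

# Size of a full joint table over 37 binary variables (the alarm network).
alarm_joint_entries = 2 ** 37

print(factored, full_joint, alarm_joint_entries)
```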
Query: Is it likely that Chris goes to the party if Becky does not but the weather is bad?
P(C, ¬B | W=bad, A→B, C→A)
[Figure: mixed network combining the Bayesian network over W, A, B, C (factors P(W), P(A|W), P(B|W), P(C|W)) with the constraints A→B and C→A]
Alex is-likely-to-go in bad weather
Chris rarely-goes in bad weather
Becky is indifferent but unpredictable
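A mixed-network query of this kind can be sketched by enumeration. The example below uses the conditional probabilities given in the slides for W=bad (P(A)=.9, P(C)=.1, P(B)=.5) and reads the constraints as A→B and C→A. The helper names `weight` and `satisfies` are made up for this illustration, and the reading of the query as "Chris goes and Becky does not" is an assumption.

```python
from itertools import product

# P(var = 1 | W = bad), taken from the slides.
p_true = {"A": 0.9, "C": 0.1, "B": 0.5}


def weight(a, b, c):
    """P(A=a, B=b, C=c | W=bad): the three factors are independent given W."""
    w = 1.0
    for name, val in (("A", a), ("B", b), ("C", c)):
        w *= p_true[name] if val else 1.0 - p_true[name]
    return w


def satisfies(a, b, c):
    """Constraints of the mixed network: A -> B and C -> A."""
    return bool((not a or b) and (not c or a))


consistent = [t for t in product([0, 1], repeat=3) if satisfies(*t)]

# Probability mass of the assignments that satisfy both constraints.
z = sum(weight(*t) for t in consistent)

# P(C=1, B=0 | W=bad, constraints): renormalize over consistent assignments.
p_query = sum(weight(a, b, c) for a, b, c in consistent if c == 1 and b == 0) / z

# Same event in the plain Bayesian network, without the constraints.
p_no_constraints = sum(weight(a, 0, 1) for a in (0, 1))

print(z, p_query, p_no_constraints)
```

With the constraints in place, Chris going forces Alex to go, which forces Becky to go, so no consistent assignment has C=1 and B=0 and the query evaluates to zero, whereas the unconstrained network gives it a small positive probability.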
The combination operator defines an overall function from the individual factors, e.g., “+”.
Notation:
Discrete Xi values are called states
Tuple or configuration: states taken by a set of variables
Scope of f: set of variables that are arguments to a factor f
A graphical model consists of:
and a combination operator
(we’ll assume discrete)
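Example, with “+” as the combination operator: f(A,B,C) = f(A,B) + f(B,C)

A  B  f(A,B)
0  0  6
0  1  0
1  0  0
1  1  6

B  C  f(B,C)
0  0  6
0  1  0
1  0  0
1  1  6

A  B  C  f(A,B,C)
0  0  0  12
0  0  1  6
0  1  0  0
0  1  1  6
1  0  0  6
1  0  1  0
1  1  0  6
1  1  1  12

For instance, f(A=0,B=1,C=1) = f(A=0,B=1) + f(B=1,C=1) = 0 + 6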
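Combining two tabular factors can be sketched in a few lines. The example below stores each factor as a dictionary from value tuples to numbers and reads the two input tables as 6 when both arguments agree and 0 otherwise (an assumed reading of the slide's tables). The function name `combine` and the dictionary layout are illustrative choices, not a fixed API.

```python
from itertools import product


def combine(f, f_scope, g, g_scope, op):
    """Combine two binary-variable factors into one over the union of scopes."""
    scope = list(f_scope) + [v for v in g_scope if v not in f_scope]
    table = {}
    for values in product([0, 1], repeat=len(scope)):
        asg = dict(zip(scope, values))
        fv = f[tuple(asg[v] for v in f_scope)]   # look up f on its own scope
        gv = g[tuple(asg[v] for v in g_scope)]   # look up g on its own scope
        table[values] = op(fv, gv)
    return scope, table


# f(A,B) and f(B,C): 6 when the two arguments agree, 0 otherwise.
f_ab = {(0, 0): 6, (0, 1): 0, (1, 0): 0, (1, 1): 6}
f_bc = dict(f_ab)

# Combination by "+", as in the example.
scope, f_abc = combine(f_ab, ("A", "B"), f_bc, ("B", "C"), lambda x, y: x + y)
for values in sorted(f_abc):
    print(values, f_abc[values])
```

Passing a product instead of a sum as `op` gives the combination used for probabilistic networks.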
For discrete variables, think of functions as “tables” (though we might represent them more efficiently).
Primal graph: variables → nodes factors → cliques
[Figure: primal graph over the variables A, B, C, D, F, G]
Overall function is “and” of individual constraints:
for adjacent regions i,j
“Tabular” form:
X0  X1  f(X0,X1)
0   0   0
0   1   1
0   2   1
1   0   1
1   1   0
1   2   1
2   0   1
2   1   1
2   2   0
Tasks: “max”: is there a solution? “sum”: how many solutions?
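The two tasks differ only in the marginalization operator applied to the same overall function. The sketch below uses a hypothetical chain of three regions X0, X1, X2 with three colors each (the slide does not list the adjacencies) and the not-equal constraint f(Xi,Xj) from the table.

```python
from itertools import product

colors = [0, 1, 2]
variables = ["X0", "X1", "X2"]
adjacent = [("X0", "X1"), ("X1", "X2")]   # hypothetical adjacency list


def f(xi, xj):
    """Pairwise constraint from the table: 1 if the colors differ, else 0."""
    return 1 if xi != xj else 0


def overall(asg):
    """'And' of the individual constraints, written as a product of 0/1 values."""
    value = 1
    for i, j in adjacent:
        value *= f(asg[i], asg[j])
    return value


values = [overall(dict(zip(variables, t)))
          for t in product(colors, repeat=len(variables))]

has_solution = max(values) == 1    # "max": is there a solution?
num_solutions = sum(values)        # "sum": how many solutions?
print(has_solution, num_solutions)
```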
[Figure: ground Markov network over Cancer(A), Smokes(A), Friends(A,A), Friends(B,A), Smokes(B), Friends(A,B), Cancer(B), Friends(B,B)]
Two constants: Anna (A) and Bob (B)

SA  CA  f(SA,CA)
0   0   exp(1.5)
0   1   exp(1.5)
1   0   1.0
1   1   exp(1.5)

FAB  SA  SB  f(.)
0    0   0   exp(1.1)
0    0   1   exp(1.1)
0    1   0   exp(1.1)
0    1   1   exp(1.1)
1    0   0   exp(1.1)
1    0   1   1.0
1    1   0   1.0
1    1   1   exp(1.1)
[Richardson & Domingos 2005]
[Figure: primal graph over A, B, C, D, F, G and the dual graph of the factor scopes ABD, BCF, AC, DFG]
Dual graph: factor scopes → nodes, edges → intersections (separators)
“Factor” graph: explicitly indicates the scope of each factor; variables → circles, factors → squares
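Both graph views can be derived mechanically from the factor scopes. The sketch below uses the scopes ABD, BCF, AC, DFG shown in the figure; the variable names `primal` and `dual` and the edge representations are illustrative choices.

```python
from itertools import combinations

# Factor scopes from the figure.
scopes = ["ABD", "BCF", "AC", "DFG"]

# Primal graph: variables are nodes, and each factor scope forms a clique.
primal = set()
for scope in scopes:
    for u, v in combinations(sorted(scope), 2):
        primal.add((u, v))

# Dual graph: scopes are nodes; an edge joins two scopes that share
# variables and is labeled with their intersection (the separator).
dual = {}
for s, t in combinations(scopes, 2):
    separator = set(s) & set(t)
    if separator:
        dual[(s, t)] = separator

print(sorted(primal))
print(dual)
```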
[Figure: factor graphs over the variables A, B, C, D]
Useful for disambiguating factorization:
O(d^4) pairwise: O(d^2)
A graphical model consists of:
Operators:
combination operator (sum, product, join, …)
elimination operator (projection, sum, max, min, ...)
Types of queries:
Marginal:
MPE / MAP:
Marginal MAP:
Example functions f_i over the variables A, C, F:
[Figure: primal graph (interaction graph) over A, B, C, D, E, F]

Conditional Probability Table (CPT):
A  C  F  P(F|A,C)
0  0  0  0.14
0  0  1  0.96
0  1  0  0.40
0  1  1  0.60
1  0  0  0.35
1  0  1  0.65
1  1  0  0.72
1  1  1  0.68

Relation:
A      C      F
red    green  blue
blue   red    red
blue   blue   green
green  red    blue

Clause: (A ∨ C ∨ F)
Variables: countries (A, B, C, etc.)
Values: colors (red, green, blue)
Constraints: A ≠ B, A ≠ D, D ≠ E, ...
[Figure: map with regions A to G, its constraint graph, and the table of allowed color pairs for A and B]
Queries: Find one solution, all solutions, counting
Constraint networks: Combination = join, Marginalization = projection
Cost networks: Combination: sum, Marginalization: min/max
Probabilistic networks: Combination: product, Marginalization: sum or min/max
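For constraint networks, join and projection can be sketched on relations stored as sets of tuples. The example below joins the not-equal constraints on (A,B) and (A,D) from the map coloring example and projects the result onto (B,D). The function names `join` and `project` and the three-color domain are assumptions made for illustration.

```python
def join(r, r_scope, s, s_scope):
    """Natural join: keep pairs of tuples that agree on the shared variables."""
    shared = [v for v in r_scope if v in s_scope]
    scope = list(r_scope) + [v for v in s_scope if v not in r_scope]
    out = set()
    for t in r:
        a = dict(zip(r_scope, t))
        for u in s:
            b = dict(zip(s_scope, u))
            if all(a[v] == b[v] for v in shared):
                merged = {**a, **b}
                out.add(tuple(merged[v] for v in scope))
    return scope, out


def project(r, r_scope, keep):
    """Projection: drop every column that is not listed in `keep`."""
    idx = [r_scope.index(v) for v in keep]
    return {tuple(t[i] for i in idx) for t in r}


colors = ["red", "green", "blue"]
not_equal = {(x, y) for x in colors for y in colors if x != y}

# Combination: join the constraints A != B and A != D.
scope, abd = join(not_equal, ["A", "B"], not_equal, ["A", "D"])

# Marginalization: eliminate A by projecting onto B and D.
bd = project(abd, scope, ["B", "D"])
print(scope, len(abd), len(bd))
```

Here the projection keeps all nine (B,D) pairs, because with three colors there is always a value for A that differs from both.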
– Information extraction, semantic parsing, translation, topic models, …
– Object recognition, scene analysis, segmentation, tracking, …
– Pedigree analysis, protein folding and binding, sequence matching, …
– Webpage link analysis, social networks, communications, citations, ….
– Planning & decision making
[Figure: plot of f(n) against n (n = 1 to 10) comparing linear, polynomial, and exponential growth]
Complexity is measured in time and space (memory)
Sum-Inference Max-Inference Mixed-Inference
– Anytime: very fast & very approximate → slower & more accurate
Belief updating (sum-prod)
MPE (max-prod)
CSP – consistency (projection-join)
#CSP (sum-prod)

[Figure: tree-structured network with factors P(X), P(Y|X), P(Z|X), P(T|Y), P(R|Y), P(L|Z), P(M|Z). Messages are passed in both directions along each edge: m_ZX(X), m_XZ(X), m_YX(X), m_XY(X), m_ZM(Z), m_MZ(Z), m_ZL(Z), m_LZ(Z), m_TY(Y), m_YT(Y), m_RY(Y), m_YR(Y)]
Trees are processed in linear time and memory
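The upward half of sum-product message passing on this tree can be sketched as follows. The tree shape (X with children Y and Z; Y with children T and R; Z with children L and M) follows the factors P(X), P(Y|X), P(Z|X), P(T|Y), P(R|Y), P(L|Z), P(M|Z). All CPT numbers and the evidence T=1, M=0 are hypothetical, and the function name `message_up` is made up for this example.

```python
from itertools import product

children = {"X": ["Y", "Z"], "Y": ["T", "R"], "Z": ["L", "M"],
            "T": [], "R": [], "L": [], "M": []}
prior_x = [0.6, 0.4]   # hypothetical P(X)
# cpt[child][parent_value][child_value]; all numbers are hypothetical.
cpt = {"Y": [[0.7, 0.3], [0.2, 0.8]], "Z": [[0.9, 0.1], [0.4, 0.6]],
       "T": [[0.5, 0.5], [0.1, 0.9]], "R": [[0.3, 0.7], [0.6, 0.4]],
       "L": [[0.8, 0.2], [0.25, 0.75]], "M": [[0.35, 0.65], [0.9, 0.1]]}
evidence = {"T": 1, "M": 0}   # hypothetical observations


def indicator(var, val):
    """1.0 if val is compatible with the evidence on var, else 0.0."""
    return 1.0 if evidence.get(var, val) == val else 0.0


def message_up(child):
    """Message from child to its parent, as a function of the parent's value."""
    below = [message_up(g) for g in children[child]]   # each edge visited once
    msg = []
    for p in (0, 1):
        total = 0.0
        for c in (0, 1):
            term = cpt[child][p][c] * indicator(child, c)
            for m in below:
                term *= m[c]
            total += term
        msg.append(total)
    return msg


m_yx, m_zx = (message_up(c) for c in children["X"])
p_evidence = sum(prior_x[x] * m_yx[x] * m_zx[x] for x in (0, 1))

# Brute-force check over all 2^7 configurations.
parent = {c: p for p, cs in children.items() for c in cs}
order = ["X", "Y", "Z", "T", "R", "L", "M"]
brute = 0.0
for vals in product((0, 1), repeat=len(order)):
    a = dict(zip(order, vals))
    if any(a[v] != e for v, e in evidence.items()):
        continue
    prob = prior_x[a["X"]]
    for v in order[1:]:
        prob *= cpt[v][a[parent[v]]][a[v]]
    brute += prob
print(p_evidence, brute)
```

Each edge contributes one small table of work in the upward pass, which is where the linear time and memory bound for trees comes from; the brute-force loop is included only as a check.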
[Figure: graph over the variables A to M and its tree decomposition with clusters ABC, BDEF, DGF, EFH, FHK, HJ, KLM]
treewidth = (maximum cluster size) - 1
treewidth = 4 - 1 = 3
Inference algorithm:
Time: exp(treewidth)
Space: exp(treewidth)
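The width computation above is a one-liner once the clusters are known. The sketch below uses the clusters from the figure and a hypothetical domain size of 3 to show how large the biggest cluster table becomes. It computes the width of this particular decomposition, which the slide reports as the treewidth.

```python
# Clusters of the tree decomposition shown in the figure.
clusters = ["ABC", "BDEF", "DGF", "EFH", "FHK", "HJ", "KLM"]

# Width of the decomposition: (maximum cluster size) - 1.
width = max(len(set(c)) for c in clusters) - 1

# With domain size k, the largest cluster table has k^(width + 1) entries.
domain_size = 3   # hypothetical
largest_table = domain_size ** (width + 1)

print(width, largest_table)
```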
[Figure: a graph over A, B, C, D, E, F, G, H, J, K, L, M, N, O, P; conditioning on A, then B, then C removes them from the graph one at a time]
Cycle cutset = {A,B,C}
Graph Coloring problem
[Figure: search tree that branches on A (yellow, green) and then on B (red, blue, green, yellow); each branch leaves a subproblem over C, D, E, F, G, H, J, K, L, M]
exp(w*) time/space
[Figure: graph over A, B, C, D, E, F and a search tree over the variables E, C, F, D, B, A]
Exp(w*) time, O(w*) space
[Figure: tree decomposition with clusters ABC, BDEF, DGF, EFH, FHK, HJ, KLM, and the coloring search tree branching on A and B]
Search+inference: Space: exp(q), Time: exp(q+c(q)), q: user controlled
[Figure: context minimal AND/OR search graph with 18 AND nodes, over the variables A, B, C, D, E, F]
Search + inference: Sampling + bounded inference
[Figure: search tree, tree decomposition, and context minimal AND/OR search graph with 18 AND nodes]