COMP90051 Statistical Machine Learning
Semester 2, 2017
Lecturer: Trevor Cohn

22. PGM Probabilistic Inference

Probabilistic inference on PGMs: computing marginal and conditional distributions
Based on Andrew Moore’s tutorial slides & Ben Rubinstein’s slides
* Naïve Bayes: chooses the most likely class given the data

  Pr(Y | X1, …, Xd) = Pr(Y, X1, …, Xd) / Pr(X1, …, Xd)
                    = Pr(Y, X1, …, Xd) / Σ_y Pr(Y = y, X1, …, Xd)
* Bayesian inference: given observation X = x, update the posterior over the parameter θ (both patterns are sketched in code below the figures)

  Pr(θ | X) = Pr(θ, X) / Pr(X)
            = Pr(θ, X) / Σ_θ Pr(θ, X)
[Figures: the naïve Bayes network, class Y with children X1, …, Xd; and the Bayesian network, parameter θ with child X]
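Both computations are the same slice-and-normalise pattern. A minimal sketch, assuming a made-up 2x2 joint table (none of these numbers come from the slides):

```python
import numpy as np

# Bayes rule by enumeration: condition on X = x by slicing the joint table,
# then normalise. joint[y, x] = Pr(Y = y, X = x), binary Y and X (toy numbers).
joint = np.array([[0.30, 0.10],
                  [0.20, 0.40]])

x_observed = 1
unnorm = joint[:, x_observed]          # Pr(Y, X = x), a 1-d table over Y
posterior = unnorm / unnorm.sum()      # divide by sum_y Pr(Y = y, X = x)
print(posterior)                       # [0.2, 0.8]
```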
The alarm sounds (AS = t): what is the probability that the temperature really is high?

Pr(HT | AS = t) = Σ_{FA,HG,FG} Pr(AS = t, FA, HG, FG, HT) / Σ_{FA,HG,FG,HT'} Pr(AS = t, FA, HG, FG, HT')

Expanding out the sums, summing once over the full 2^5 joint table:

Pr(HT, AS = t) = Σ_{FG} Σ_{HG} Σ_{FA} Pr(HT) Pr(HG | HT, FG) Pr(FG) Pr(AS = t | FA, HG) Pr(FA)

Distributing the sums as far down as possible, summing over several smaller tables (the two evaluation orders are compared in code below the figure):

Pr(HT, AS = t) = Pr(HT) Σ_{FG} Pr(FG) Σ_{HG} Pr(HG | HT, FG) Σ_{FA} Pr(FA) Pr(AS = t | FA, HG)
[Figure: the alarm network. HT = high temperature, FG = faulty gauge, HG = high gauge, FA = faulty alarm, AS = alarm sounds]
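The two evaluation orders compute the same quantity, so they can be checked against each other. A sketch with invented CPTs (the probabilities below are illustrative, not the course's figures):

```python
import itertools

# Invented CPTs for the alarm network (binary variables).
p_ht = {True: 0.10, False: 0.90}                 # Pr(HT)
p_fg = {True: 0.05, False: 0.95}                 # Pr(FG)
p_fa = {True: 0.40, False: 0.60}                 # Pr(FA)

def p_hg(hg, ht, fg):                            # Pr(HG | HT, FG)
    p = 0.95 if ht and not fg else 0.20
    return p if hg else 1.0 - p

def p_as(alarm, fa, hg):                         # Pr(AS | FA, HG)
    p = 0.90 if hg and not fa else 0.10
    return p if alarm else 1.0 - p

B = (False, True)

def naive(ht):
    # One big sum over the joint with AS clamped to t: 2^3 terms, 5 factors each.
    return sum(p_ht[ht] * p_fg[fg] * p_fa[fa] * p_hg(hg, ht, fg) * p_as(True, fa, hg)
               for fg, hg, fa in itertools.product(B, B, B))

def distributed(ht):
    # Sums pushed inwards: Pr(HT) sum_FG Pr(FG) sum_HG Pr(HG|HT,FG) sum_FA ...
    return p_ht[ht] * sum(
        p_fg[fg] * sum(
            p_hg(hg, ht, fg) * sum(p_fa[fa] * p_as(True, fa, hg) for fa in B)
            for hg in B)
        for fg in B)

for ht in B:
    assert abs(naive(ht) - distributed(ht)) < 1e-12
print(distributed(True) / (distributed(True) + distributed(False)))  # Pr(HT=t | AS=t)
```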
Evaluating the distributed form innermost sum first, each sum over a variable produces a new, smaller table: a "message" m.

Pr(HT, AS = t)
= Pr(HT) Σ_{FG} Pr(FG) Σ_{HG} Pr(HG | HT, FG) Σ_{FA} Pr(FA) Pr(AS = t | FA, HG)
= Pr(HT) Σ_{FG} Pr(FG) Σ_{HG} Pr(HG | HT, FG) Σ_{FA} Pr(FA) m_AS(FA, HG)
= Pr(HT) Σ_{FG} Pr(FG) Σ_{HG} Pr(HG | HT, FG) m_FA(HG)
= Pr(HT) Σ_{FG} Pr(FG) m_HG(HT, FG)
= Pr(HT) m_FG(HT)
[Figure: the graph after each elimination: {AS, HG, FA, HT, FG} → {HG, FA, HT, FG} → {HG, HT, FG} → {HT, FG} → {HT}]

* Eliminate AS: since AS is observed, really a no-op
* Eliminate FA: multiplying a 1x2 table by a 2x2 table
* Eliminate HG: multiplying a 2x2x2 table by a 2x1 table
* Eliminate FG: multiplying a 1x2 table by a 2x2 table

Multiplication followed by summing out is actually matrix multiplication, e.g. eliminating FA:

  m_FA(HG) = Σ_{FA} Pr(FA) m_AS(FA, HG)

  Pr(FA) over (f, t): (0.6, 0.4)

  m_AS(FA, HG):   HG=f  HG=t
          FA=f     1.0    …
          FA=t     0.8   0.2
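A sketch of that elimination as a vector-matrix product; the m_AS entry left blank above is assumed to be 0.0 here just to make the example runnable:

```python
import numpy as np

p_fa = np.array([0.6, 0.4])            # Pr(FA) over (f, t): a 1x2 table
m_as = np.array([[1.0, 0.0],           # m_AS(FA, HG); the 0.0 entry is assumed
                 [0.8, 0.2]])          # rows: FA in (f, t), cols: HG in (f, t)

# Multiply the tables, then sum out FA: exactly a vector-matrix product.
m_fa = p_fa @ m_as                     # m_FA(HG) = sum_FA Pr(FA) m_AS(FA, HG)
print(m_fa)                            # [0.92, 0.08]
```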
Eliminate(Graph G, Evidence nodes E, Query nodes Q)
1. Choose a node ordering I such that Q appears last
2. Initialise an empty list, active
3. For each node X_i in G:                                    [initialise]
   a) Append Pr(X_i | parents(X_i)) to active
4. For each node X_i in E:                                    [evidence]
   a) Append the indicator δ(X_i, x_i) to active
5. For each i in I:                                           [marginalise]
   a) potentials = remove tables referencing X_i from active
   b) N_i = nodes other than X_i referenced by those tables
   c) Table φ_i(X_i, X_{N_i}) = product of the tables
   d) Table m_i(X_{N_i}) = Σ_{X_i} φ_i(X_i, X_{N_i})
   e) Append m_i(X_{N_i}) to active
6. Return Pr(X_Q | X_E = x_E) = φ_Q(X_Q) / Σ_{X_Q} φ_Q(X_Q)   [normalise]
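A rough Python sketch of Eliminate for networks of binary variables. The factor representation (a (variables, table) pair) and the helper names are my own, not the course's reference implementation:

```python
import itertools

def multiply(factors):
    """Pointwise product of factor tables, aligning shared variables."""
    vars_ = sorted(set(v for fvars, _ in factors for v in fvars))
    table = {}
    for assign in itertools.product([False, True], repeat=len(vars_)):
        env = dict(zip(vars_, assign))
        val = 1.0
        for fvars, ftab in factors:
            val *= ftab[tuple(env[v] for v in fvars)]
        table[assign] = val
    return tuple(vars_), table

def sum_out(var, factor):
    """Marginalise var out of a factor: m(rest) = sum over var of the table."""
    fvars, ftab = factor
    idx = fvars.index(var)
    out = {}
    for assign, val in ftab.items():
        key = assign[:idx] + assign[idx + 1:]
        out[key] = out.get(key, 0.0) + val
    return tuple(v for v in fvars if v != var), out

def eliminate(cpts, evidence, query, order):
    """cpts: list of (vars, table) pairs; evidence: {var: bool}; order ends with query."""
    active = list(cpts)
    for var, val in evidence.items():            # step 4: evidence indicators
        active.append(((var,), {(False,): 0.0 if val else 1.0,
                                (True,): 1.0 if val else 0.0}))
    for var in order:                            # step 5: eliminate in order
        if var == query:
            break
        hit = [f for f in active if var in f[0]]
        active = [f for f in active if var not in f[0]]
        active.append(sum_out(var, multiply(hit)))
    _, phi = multiply(active)                    # phi_Q(X_Q)
    z = sum(phi.values())
    return {k: v / z for k, v in phi.items()}    # step 6: normalise

# e.g. eliminate(cpts, {'AS': True}, 'HT', ['AS', 'FA', 'HG', 'FG', 'HT'])
```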
Green background = Slide just for fun!
Eliminating a node:
* Removes the node from the graph
* Connects the node's remaining neighbours → forms a clique in the "reconstructed" graph (the cliques are exactly the sets of r.v.'s involved in each sum)

* Treewidth: the minimum over orderings of the size of the largest clique formed
* The best possible time complexity of elimination is exponential in the treewidth (see the sketch after the figure below)
[Figure: the PGM after successive eliminations ({AS, HG, FA, HT, FG} → … → {HT}), alongside the "reconstructed" graph, which comes from a process called moralisation]
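A small sketch of the clique bookkeeping: simulate an elimination ordering on the moralised graph and record the largest clique (node plus remaining neighbours) it creates. The adjacency below is the moralised alarm network; the ordering is the one used earlier:

```python
def max_clique_size(adj, order):
    """Largest clique (node + neighbours) formed while eliminating in order."""
    adj = {v: set(ns) for v, ns in adj.items()}
    worst = 0
    for v in order:
        nbrs = adj.pop(v)
        worst = max(worst, len(nbrs) + 1)
        for u in nbrs:                      # connect the remaining neighbours
            adj[u].discard(v)
            adj[u] |= nbrs - {u}
    return worst

# Moralised alarm network: each child's parents are linked (HT-FG, HG-FA).
adj = {'HT': {'HG', 'FG'}, 'FG': {'HG', 'HT'},
       'HG': {'HT', 'FG', 'FA', 'AS'},
       'FA': {'HG', 'AS'}, 'AS': {'HG', 'FA'}}
print(max_clique_size(adj, ['AS', 'FA', 'HG', 'FG', 'HT']))  # 3
```

Minimising this quantity over all orderings gives the treewidth as defined above.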
Monte Carlo approximate inference:
* Cheaply sample from the desired distribution
* Approximate the distribution by a histogram of the samples
Sampling from the joint of a Bayes net:
1. Order the nodes so that parents come before children (topological order)
2. Repeat:
   a) For each node X_j in that order:
      i.  Index into Pr(X_j | parents(X_j)) with the parents' sampled values
      ii. Sample X_j from this distribution
   b) Together X = (X_1, …, X_d) is a sample from the joint
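A minimal ancestral-sampling sketch for the alarm network, with the same invented CPTs as earlier (not the course's numbers):

```python
import random

def sample_joint():
    """One ancestral sample: parents are always sampled before children."""
    s = {}
    s['HT'] = random.random() < 0.10                              # Pr(HT = t)
    s['FG'] = random.random() < 0.05                              # Pr(FG = t)
    s['FA'] = random.random() < 0.40                              # Pr(FA = t)
    # Each child indexes its CPT with the parents' already-sampled values.
    s['HG'] = random.random() < (0.95 if s['HT'] and not s['FG'] else 0.20)
    s['AS'] = random.random() < (0.90 if s['HG'] and not s['FA'] else 0.10)
    return s
```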
Sampling a conditional Pr(X_Q | X_E = x_E) by rejection:
1. Order the nodes so that parents come before children
2. Initialise set T empty; repeat:
   a) Sample X from the joint
   b) If X_E = x_E, then add X_Q to T
3. Return: the histogram of T, normalising counts by dividing by |T|
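The same idea in code, reusing sample_joint() from the sketch above; samples that contradict the evidence AS = t are simply thrown away:

```python
def rejection_ht_given_as(n=100_000):
    """Estimate Pr(HT = t | AS = t) by rejection sampling."""
    kept = [s['HT'] for s in (sample_joint() for _ in range(n)) if s['AS']]
    return sum(kept) / len(kept)       # histogram of T, normalised by |T|

print(rejection_ht_given_as())         # should approach the exact answer
```

Note the waste: every sample with AS = f is discarded, so rare evidence makes this estimator expensive.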
[Figure: a five-node Bayes net with the nodes numbered in topological order]
Message passing (sum-product):
* Roughly twice the cost of a single elimination run, but supplies all marginals
* Name: marginalisation is just a sum over a product of tables
* "Identical" variants: max-product, for MAP estimation
* Can generalise beyond trees (beyond scope): the junction tree algorithm, loopy belief propagation
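A sketch of the idea on the smallest interesting case: a three-node chain with evidence at one end (binary variables, invented CPTs). One forward and one backward sweep of messages yields every conditional marginal:

```python
import numpy as np

p1 = np.array([0.5, 0.5])                    # Pr(X1)
A = np.array([[0.7, 0.3], [0.2, 0.8]])       # A[i, j] = Pr(X2 = j | X1 = i)
B = np.array([[0.9, 0.1], [0.4, 0.6]])       # B[j, k] = Pr(X3 = k | X2 = j)

fwd2 = p1 @ A                                # forward message into X2
bwd2 = B[:, 0]                               # backward message: evidence X3 = 0
bwd1 = A @ bwd2                              # backward message into X1

norm = lambda v: v / v.sum()
print(norm(p1 * bwd1))                       # Pr(X1 | X3 = 0)
print(norm(fwd2 * bwd2))                     # Pr(X2 | X3 = 0)
```

Replacing each sum with a max (and tracking argmaxes) gives max-product for MAP estimation.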
Summary:
* Probabilistic inference: what is it and why do we care?
* The elimination algorithm; complexity via cliques
* Monte Carlo approaches as an alternative to exact integration