SLIDE 1

COMP90051 Statistical Machine Learning

  • 22. PGM Probabilistic Inference

Semester 2, 2017 Lecturer: Trevor Cohn

SLIDE 2


Probabilistic inference on PGMs

Computing marginal and conditional distributions from the joint of a PGM using Bayes rule and marginalisation. This deck: how to do it efficiently.

Based on Andrew Moore’s tutorial slides & Ben Rubinstein’s slides


SLIDE 3


Two familiar examples

  • Naïve Bayes (frequentist/Bayesian)

    * Chooses the most likely class given the data
    * $\Pr(Y \mid X_1, \dots, X_d) = \frac{\Pr(Y, X_1, \dots, X_d)}{\Pr(X_1, \dots, X_d)} = \frac{\Pr(Y, X_1, \dots, X_d)}{\sum_y \Pr(Y = y, X_1, \dots, X_d)}$

  • Data $X \mid \theta \sim \mathcal{N}(\theta, 1)$ with prior $\theta \sim \mathcal{N}(0, 1)$ (Bayesian)

    * Given observation $X = x$, update the posterior
    * $\Pr(\theta \mid X) = \frac{\Pr(\theta, X)}{\Pr(X)} = \frac{\Pr(\theta, X)}{\int \Pr(\theta, X) \, d\theta}$

  • Joint + Bayes rule + marginalisation → anything (a minimal code sketch follows the figure below)


[Figures: Naïve Bayes PGM, class node Y with children X1, ..., Xd; Gaussian model PGM, node θ with child X]
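The "joint + Bayes rule + marginalisation" recipe is mechanical enough to run directly on a small table. Below is a minimal sketch (mine, not the deck's), with invented numbers, that builds a tiny two-feature Naïve Bayes joint and then conditions on evidence purely by indexing and summing:

```python
import numpy as np

# Hypothetical Naive Bayes joint over (Y, X1, X2), all binary.
# joint[y, x1, x2] = Pr(Y=y) * Pr(X1=x1 | Y=y) * Pr(X2=x2 | Y=y)
prior = np.array([0.6, 0.4])                  # Pr(Y)
px1 = np.array([[0.9, 0.1], [0.3, 0.7]])      # Pr(X1 | Y), rows indexed by y
px2 = np.array([[0.8, 0.2], [0.5, 0.5]])      # Pr(X2 | Y)
joint = prior[:, None, None] * px1[:, :, None] * px2[:, None, :]

# Condition on evidence X1=1, X2=0 via Bayes rule + marginalisation:
# Pr(Y | X1=1, X2=0) = Pr(Y, X1=1, X2=0) / sum_y Pr(Y=y, X1=1, X2=0)
numerator = joint[:, 1, 0]
posterior = numerator / numerator.sum()
print(posterior)                              # a proper distribution over Y
```

For real problems the joint table is exponentially large, which is exactly why the rest of this deck is about doing the same computation efficiently.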

SLIDE 4


Nuclear power plant

  • Alarm sounds; meltdown?!
  • $\Pr(HT \mid AS = t) = \frac{\Pr(HT, AS = t)}{\Pr(AS = t)} = \frac{\sum_{FG, HG, FA} \Pr(HT, FG, HG, FA, AS = t)}{\sum_{FG, HG, FA, HT'} \Pr(HT', FG, HG, FA, AS = t)}$
  • Numerator (denominator similar)

    expanding out the sums, summing once over the full $2^5$ joint table
    $= \sum_{FG} \sum_{HG} \sum_{FA} \Pr(HT) \, \Pr(HG \mid HT, FG) \, \Pr(FG) \, \Pr(AS = t \mid FA, HG) \, \Pr(FA)$

    distributing the sums as far down as possible, summing over several smaller tables
    $= \Pr(HT) \sum_{FG} \Pr(FG) \sum_{HG} \Pr(HG \mid HT, FG) \sum_{FA} \Pr(FA) \, \Pr(AS = t \mid FA, HG)$

[Figure: Bayesian network with nodes HT (high temp), FG (faulty gauge), HG (high gauge), FA (faulty alarm), AS (alarm sounds)]

SLIDE 5


Nuclear power plant (cont.)

Continuing with the numerator $\Pr(HT, AS = t)$:

$= \Pr(HT) \sum_{FG} \Pr(FG) \sum_{HG} \Pr(HG \mid HT, FG) \sum_{FA} \Pr(FA) \, \Pr(AS = t \mid FA, HG)$

eliminate AS (since AS is observed, really a no-op):
$= \Pr(HT) \sum_{FG} \Pr(FG) \sum_{HG} \Pr(HG \mid HT, FG) \sum_{FA} \Pr(FA) \, m_{AS}(FA, HG)$

eliminate FA (multiplying a 1x2 table by a 2x2 table):
$= \Pr(HT) \sum_{FG} \Pr(FG) \sum_{HG} \Pr(HG \mid HT, FG) \, m_{FA}(HG)$

eliminate HG (multiplying a 2x2x2 table by a 2x1 table):
$= \Pr(HT) \sum_{FG} \Pr(FG) \, m_{HG}(HT, FG)$

eliminate FG (multiplying a 1x2 table by a 2x2 table):
$= \Pr(HT) \, m_{FG}(HT)$

[Figure: the PGM shrinking after each successive elimination, from {AS, HG, FA, HT, FG} down to {HT}]

Multiplication of tables, followed by summing out, is actually matrix multiplication: e.g. $m_{FA}(HG) = \sum_{FA} \Pr(FA) \, \Pr(AS = t \mid FA, HG)$ is a 1x2 table (on the slide, $\Pr(FA{=}f) = 0.6$, $\Pr(FA{=}t) = 0.4$) times a 2x2 table of $\Pr(AS = t \mid FA, HG)$ values.
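Here is a hedged sketch of this derivation in Python/numpy: the CPT numbers below are invented (the deck only shows a fragment of them), but the factorisation and elimination order match the slides. It checks that pushing the sums inward gives the same numerator as brute force over the full $2^5$ joint:

```python
import numpy as np

# Hypothetical CPTs for the nuclear power plant network (index 0 = false, 1 = true).
p_ht = np.array([0.95, 0.05])                       # Pr(HT)
p_fg = np.array([0.9, 0.1])                         # Pr(FG)
p_fa = np.array([0.85, 0.15])                       # Pr(FA)
p_hg = np.array([[[0.95, 0.05], [0.5, 0.5]],        # Pr(HG | HT, FG), axes (HT, FG, HG)
                 [[0.1, 0.9], [0.5, 0.5]]])
p_as = np.array([[[0.99, 0.01], [0.2, 0.8]],        # Pr(AS | FA, HG), axes (FA, HG, AS)
                 [[0.7, 0.3], [0.6, 0.4]]])

# Brute force: build the full 2^5 joint, fix AS = t, sum out FG, HG, FA.
joint = np.einsum('h,f,a,hfg,agb->hfgab', p_ht, p_fg, p_fa, p_hg, p_as)
numerator_brute = joint[:, :, :, :, 1].sum(axis=(1, 2, 3))   # Pr(HT, AS = t)

# Elimination order AS, FA, HG, FG: each message is a small table product + sum.
m_as = p_as[:, :, 1]                        # m_AS(FA, HG): AS observed, a no-op
m_fa = np.einsum('a,ag->g', p_fa, m_as)     # m_FA(HG) = sum_FA Pr(FA) m_AS(FA, HG)
m_hg = np.einsum('hfg,g->hf', p_hg, m_fa)   # m_HG(HT, FG)
m_fg = np.einsum('f,hf->h', p_fg, m_hg)     # m_FG(HT)
numerator_elim = p_ht * m_fg                # Pr(HT, AS = t)

assert np.allclose(numerator_brute, numerator_elim)
posterior = numerator_elim / numerator_elim.sum()            # Pr(HT | AS = t)
print(posterior)
```

Note that each `einsum` here is exactly one "multiply small tables, then sum out a variable" step from the slide.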

SLIDE 6


Elimination algorithm

Eliminate(Graph $G$, Evidence nodes $E$, Query nodes $Q$)

1. Choose a node ordering $I$ such that $Q$ appears last [initialise]
2. Initialise empty list active
3. For each node $X_i$ in $G$
   a) Append $\Pr(X_i \mid \mathrm{parents}(X_i))$ to active
4. For each node $X_i$ in $E$ [evidence]
   a) Append $\delta(X_i, x_i)$ to active
5. For each $i$ in $I$ [marginalise]
   a) potentials = remove tables referencing $X_i$ from active
   b) $N_i$ = nodes other than $X_i$ referenced by those tables
   c) Table $\phi_i(X_i, X_{N_i})$ = product of the tables
   d) Table $m_i(X_{N_i}) = \sum_{X_i} \phi_i(X_i, X_{N_i})$
   e) Append $m_i(X_{N_i})$ to active
6. Return $\Pr(X_Q \mid X_E = x_E) = \phi_Q(X_Q) / \sum_{X_Q} \phi_Q(X_Q)$ [normalise]

Green background = Slide just for fun!
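Below is a compact Python rendering of the Eliminate pseudocode above, reusing the hypothetical CPT arrays from the earlier sketch. It is a sketch under simplifying assumptions (binary variables, a single query node), and the factor representation and helper names (`multiply`, `sum_out`, `eliminate`) are mine, not the deck's:

```python
import numpy as np

def multiply(f1, f2):
    """Pointwise product of two factors; a factor is (tuple_of_vars, ndarray)."""
    v1, t1 = f1
    v2, t2 = f2
    out = tuple(dict.fromkeys(v1 + v2))           # ordered union of variables
    letters = {v: chr(ord('a') + i) for i, v in enumerate(out)}
    spec = (''.join(letters[v] for v in v1) + ',' +
            ''.join(letters[v] for v in v2) + '->' +
            ''.join(letters[v] for v in out))
    return out, np.einsum(spec, t1, t2)

def sum_out(factor, var):
    """Marginalise one variable out of a factor (step 5d)."""
    vs, t = factor
    return tuple(v for v in vs if v != var), t.sum(axis=vs.index(var))

def eliminate(factors, evidence, query, order):
    """Variable elimination; `order` lists all nodes with `query` last."""
    active = list(factors)                        # steps 2-3: the CPTs
    for var, val in evidence.items():             # step 4: evidence indicators
        delta = np.zeros(2)                       # assumes binary variables
        delta[val] = 1.0
        active.append(((var,), delta))
    for var in order:                             # step 5
        if var == query:
            break
        used = [f for f in active if var in f[0]]
        active = [f for f in active if var not in f[0]]
        phi = used[0]
        for f in used[1:]:
            phi = multiply(phi, f)                # step 5c: product of tables
        active.append(sum_out(phi, var))          # steps 5d-e: message
    phi = active[0]
    for f in active[1:]:                          # remaining tables mention query only
        phi = multiply(phi, f)
    vs, t = phi
    return vs, t / t.sum()                        # step 6: normalise

factors = [(('HT',), p_ht), (('FG',), p_fg), (('FA',), p_fa),
           (('HT', 'FG', 'HG'), p_hg), (('FA', 'HG', 'AS'), p_as)]
print(eliminate(factors, {'AS': 1}, 'HT', ['AS', 'FA', 'HG', 'FG', 'HT']))
```

Run on the plant network, this reproduces the same $\Pr(HT \mid AS = t)$ as the hand-rolled elimination in the previous sketch.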

SLIDE 7


Runtime of elimination algorithm

  • Each step of elimination

    * Removes a node
    * Connects the node's remaining neighbours → forms a clique in the "reconstructed" graph (cliques are exactly the r.v.'s involved in each sum)

  • Time complexity is exponential in the size of the largest clique
  • Different elimination orderings produce different cliques (a tiny illustration follows below)

    * Treewidth: the minimum over orderings of the largest clique
    * Best possible time complexity is exponential in the treewidth


[Figure: the PGM after successive eliminations, alongside the "reconstructed" graph, from a process called moralisation]
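A tiny illustration (mine, not the deck's) of why the ordering matters: in a star-shaped network, eliminating the leaves first keeps every intermediate table small, while eliminating the hub first couples all of its neighbours into one clique-sized table.

```python
import numpy as np

# Hypothetical star network: hub H with four leaves L1..L4.
p_h = np.array([0.5, 0.5])                 # Pr(H)
p_l = np.array([[0.9, 0.1], [0.2, 0.8]])   # Pr(L_i | H), same table for each leaf

# Ordering 1: eliminate leaves first; every message is over H alone.
msg = np.ones(2)
for _ in range(4):
    msg *= p_l.sum(axis=1)                 # sum_li Pr(li | H), a 2-entry table
print(msg.size)                            # largest intermediate table: 2

# Ordering 2: eliminate H first; the product couples all four leaves.
big = np.einsum('h,ha,hb,hc,hd->abcd', p_h, p_l, p_l, p_l, p_l)
print(big.size)                            # largest intermediate table: 2^4 = 16
```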

SLIDE 8


Probabilistic inference by simulation

  • Exact probabilistic inference can be expensive/impossible
  • Can we approximate numerically?
  • Idea: sampling methods

    * Cheaply sample from the desired distribution
    * Approximate the distribution by a histogram of the samples



SLIDE 9


Monte Carlo approx probabilistic inference

  • Algorithm: sample once from the joint

    1. Order nodes so parents come before children (topological order)
    2. Repeat
       a) For each node $X_i$
          i. Index into $\Pr(X_i \mid \mathrm{parents}(X_i))$ with the parents' sampled values
          ii. Sample $X_i$ from this distribution
       b) Together $\mathbf{X} = (X_1, \dots, X_d)$ is a sample from the joint

  • Algorithm: sampling from $\Pr(X_Q \mid X_E = x_E)$

    1. Order nodes so parents come before children
    2. Initialise set $S$ empty; repeat
       a) Sample $\mathbf{X}$ from the joint
       b) If $X_E = x_E$ then add $X_Q$ to $S$
    3. Return: histogram of $S$, normalising counts by dividing by $|S|$

  • Sampling++: importance weighting, Gibbs, Metropolis-Hastings (a code sketch follows the figure below)


[Figure: example PGM with nodes numbered in a topological (parents-first) order]
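Both algorithms take only a few lines given a network. This sketch, again reusing the hypothetical plant CPTs from the earlier sketches, draws ancestral samples in a topological order (HT, FG, FA, then HG, then AS) and estimates $\Pr(HT \mid AS = t)$ by rejection:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_joint():
    """Ancestral sampling: visit nodes parents-first, sample each CPT row."""
    ht = rng.random() < p_ht[1]
    fg = rng.random() < p_fg[1]
    fa = rng.random() < p_fa[1]
    hg = rng.random() < p_hg[int(ht), int(fg), 1]
    as_ = rng.random() < p_as[int(fa), int(hg), 1]
    return ht, fg, fa, hg, as_

def rejection_sample(n=100_000):
    """Estimate Pr(HT | AS = t) by discarding samples where AS = f."""
    kept = [ht for ht, fg, fa, hg, as_ in (sample_joint() for _ in range(n)) if as_]
    counts = np.bincount(np.array(kept, dtype=int), minlength=2)
    return counts / counts.sum()     # histogram normalised by |S|

print(rejection_sample())            # approaches the exact posterior as n grows
```

Rejection becomes wasteful when the evidence is unlikely, which is one motivation for the importance-weighted and MCMC variants named above.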

SLIDE 10


Alternate forms of probabilistic inference

  • Elimination algorithm produces a single marginal
  • Sum-product algorithm on trees

    * 2x the cost, supplies all marginals (sketched below)
    * Name: marginalisation is just a sum of products of tables
    * "Identical" variants: max-product, for MAP estimation

  • In general these are message-passing algorithms

    * Can generalise beyond trees (beyond scope): junction tree algorithm, loopy belief propagation

  • Variational Bayes: approximation via optimisation

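To see why sum-product gets all marginals for roughly twice the cost of one elimination run, here is a tiny hedged sketch (mine, with invented numbers) on a 3-node chain X1 → X2 → X3: one forward and one backward sweep of sum-of-product messages, after which each node's marginal is a local product of incoming messages.

```python
import numpy as np

# Hypothetical chain X1 -> X2 -> X3 over binary variables.
p1 = np.array([0.7, 0.3])                   # Pr(X1)
t12 = np.array([[0.9, 0.1], [0.4, 0.6]])    # Pr(X2 | X1), rows indexed by X1
t23 = np.array([[0.8, 0.2], [0.3, 0.7]])    # Pr(X3 | X2)

# Forward sweep: sum-of-product messages pushed left to right.
f2 = p1 @ t12                 # sum_x1 Pr(x1) Pr(X2 | x1)

# Backward sweep, folding evidence X3 = t into the far end.
b2 = t23[:, 1]                # message to X2: Pr(X3 = t | X2)
b1 = t12 @ b2                 # message to X1: sum_x2 Pr(x2 | X1) b2(x2)

# Each conditional marginal is a normalised product of incoming messages.
m1 = p1 * b1; m1 /= m1.sum()  # Pr(X1 | X3 = t)
m2 = f2 * b2; m2 /= m2.sum()  # Pr(X2 | X3 = t)
print(m1, m2)                 # all marginals from one forward + one backward pass
```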

SLIDE 11


Summary

  • Probabilistic inference on PGMs

    * What is it and why do we care?
    * Elimination algorithm; complexity via cliques
    * Monte Carlo approaches as an alternative to exact integration
