SLIDE 1

COMP90051 Statistical Machine Learning

  • 22. PGM Probabilistic Inference

Semester 2, 2017 Lecturer: Trevor Cohn

SLIDE 2


Probabilistic inference on PGMs

Computing marginal and conditional distributions from the joint of a PGM using Bayes rule and marginalisation. This deck: how to do it efficiently.

Based on Andrew Moore’s tutorial slides & Ben Rubinstein’s slides


SLIDE 3


Two familiar examples

  • Naïve Bayes (frequentist/Bayesian)

    * Chooses the most likely class given the data
    * $\Pr(Y \mid X_1, \dots, X_d) = \frac{\Pr(Y, X_1, \dots, X_d)}{\Pr(X_1, \dots, X_d)} = \frac{\Pr(Y, X_1, \dots, X_d)}{\sum_y \Pr(Y = y, X_1, \dots, X_d)}$

  • Data $X \mid \theta \sim \mathcal{N}(\theta, 1)$ with prior $\theta \sim \mathcal{N}(0, 1)$ (Bayesian)

    * Given observation $X = x$, update the posterior
    * $\Pr(\theta \mid X) = \frac{\Pr(\theta, X)}{\Pr(X)} = \frac{\Pr(\theta, X)}{\int \Pr(\theta, X) \, d\theta}$

  • Joint + Bayes rule + marginalisation → anything (a minimal code sketch follows the figure below)


[Figures: Naïve Bayes PGM, class node Y with children X1, ..., Xd; Gaussian model PGM, node θ with child X]
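The "joint + Bayes rule + marginalisation" recipe is mechanical enough to run directly on a small table. Below is a minimal sketch (mine, not the deck's), with invented numbers, that builds a tiny two-feature Naïve Bayes joint and then conditions on evidence purely by indexing and summing:

```python
import numpy as np

# Hypothetical Naive Bayes joint over (Y, X1, X2), all binary.
# joint[y, x1, x2] = Pr(Y=y) * Pr(X1=x1 | Y=y) * Pr(X2=x2 | Y=y)
prior = np.array([0.6, 0.4])                  # Pr(Y)
px1 = np.array([[0.9, 0.1], [0.3, 0.7]])      # Pr(X1 | Y), rows indexed by y
px2 = np.array([[0.8, 0.2], [0.5, 0.5]])      # Pr(X2 | Y)
joint = prior[:, None, None] * px1[:, :, None] * px2[:, None, :]

# Condition on evidence X1=1, X2=0 via Bayes rule + marginalisation:
# Pr(Y | X1=1, X2=0) = Pr(Y, X1=1, X2=0) / sum_y Pr(Y=y, X1=1, X2=0)
numerator = joint[:, 1, 0]
posterior = numerator / numerator.sum()
print(posterior)                              # a proper distribution over Y
```

For real problems the joint table is exponentially large, which is exactly why the rest of this deck is about doing the same computation efficiently.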

SLIDE 4


Nuclear power plant

  • Alarm sounds; meltdown?!
  • $\Pr(HT \mid AS = t) = \frac{\Pr(HT, AS = t)}{\Pr(AS = t)} = \frac{\sum_{FG, HG, FA} \Pr(HT, FG, HG, FA, AS = t)}{\sum_{FG, HG, FA, HT'} \Pr(HT', FG, HG, FA, AS = t)}$
  • Numerator (denominator similar)

    expanding out the sums, summing once over the full $2^5$ joint table
    $= \sum_{FG} \sum_{HG} \sum_{FA} \Pr(HT) \, \Pr(HG \mid HT, FG) \, \Pr(FG) \, \Pr(AS = t \mid FA, HG) \, \Pr(FA)$

    distributing the sums as far down as possible, summing over several smaller tables
    $= \Pr(HT) \sum_{FG} \Pr(FG) \sum_{HG} \Pr(HG \mid HT, FG) \sum_{FA} \Pr(FA) \, \Pr(AS = t \mid FA, HG)$

[Figure: Bayesian network with nodes HT (high temp), FG (faulty gauge), HG (high gauge), FA (faulty alarm), AS (alarm sounds)]

SLIDE 5


Nuclear power plant (cont.)

Continuing with the numerator $\Pr(HT, AS = t)$:

$= \Pr(HT) \sum_{FG} \Pr(FG) \sum_{HG} \Pr(HG \mid HT, FG) \sum_{FA} \Pr(FA) \, \Pr(AS = t \mid FA, HG)$

eliminate AS (since AS is observed, really a no-op):
$= \Pr(HT) \sum_{FG} \Pr(FG) \sum_{HG} \Pr(HG \mid HT, FG) \sum_{FA} \Pr(FA) \, m_{AS}(FA, HG)$

eliminate FA (multiplying a 1x2 table by a 2x2 table):
$= \Pr(HT) \sum_{FG} \Pr(FG) \sum_{HG} \Pr(HG \mid HT, FG) \, m_{FA}(HG)$

eliminate HG (multiplying a 2x2x2 table by a 2x1 table):
$= \Pr(HT) \sum_{FG} \Pr(FG) \, m_{HG}(HT, FG)$

eliminate FG (multiplying a 1x2 table by a 2x2 table):
$= \Pr(HT) \, m_{FG}(HT)$

[Figure: the PGM shrinking after each successive elimination, from {AS, HG, FA, HT, FG} down to {HT}]

Multiplication of tables, followed by summing out, is actually matrix multiplication: e.g. $m_{FA}(HG) = \sum_{FA} \Pr(FA) \, \Pr(AS = t \mid FA, HG)$ is a 1x2 table (on the slide, $\Pr(FA{=}f) = 0.6$, $\Pr(FA{=}t) = 0.4$) times a 2x2 table of $\Pr(AS = t \mid FA, HG)$ values.
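Here is a hedged sketch of this derivation in Python/numpy: the CPT numbers below are invented (the deck only shows a fragment of them), but the factorisation and elimination order match the slides. It checks that pushing the sums inward gives the same numerator as brute force over the full $2^5$ joint:

```python
import numpy as np

# Hypothetical CPTs for the nuclear power plant network (index 0 = false, 1 = true).
p_ht = np.array([0.95, 0.05])                       # Pr(HT)
p_fg = np.array([0.9, 0.1])                         # Pr(FG)
p_fa = np.array([0.85, 0.15])                       # Pr(FA)
p_hg = np.array([[[0.95, 0.05], [0.5, 0.5]],        # Pr(HG | HT, FG), axes (HT, FG, HG)
                 [[0.1, 0.9], [0.5, 0.5]]])
p_as = np.array([[[0.99, 0.01], [0.2, 0.8]],        # Pr(AS | FA, HG), axes (FA, HG, AS)
                 [[0.7, 0.3], [0.6, 0.4]]])

# Brute force: build the full 2^5 joint, fix AS = t, sum out FG, HG, FA.
joint = np.einsum('h,f,a,hfg,agb->hfgab', p_ht, p_fg, p_fa, p_hg, p_as)
numerator_brute = joint[:, :, :, :, 1].sum(axis=(1, 2, 3))   # Pr(HT, AS = t)

# Elimination order AS, FA, HG, FG: each message is a small table product + sum.
m_as = p_as[:, :, 1]                        # m_AS(FA, HG): AS observed, a no-op
m_fa = np.einsum('a,ag->g', p_fa, m_as)     # m_FA(HG) = sum_FA Pr(FA) m_AS(FA, HG)
m_hg = np.einsum('hfg,g->hf', p_hg, m_fa)   # m_HG(HT, FG)
m_fg = np.einsum('f,hf->h', p_fg, m_hg)     # m_FG(HT)
numerator_elim = p_ht * m_fg                # Pr(HT, AS = t)

assert np.allclose(numerator_brute, numerator_elim)
posterior = numerator_elim / numerator_elim.sum()            # Pr(HT | AS = t)
print(posterior)
```

Note that each `einsum` here is exactly one "multiply small tables, then sum out a variable" step from the slide.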

SLIDE 6


Elimination algorithm

Eliminate(Graph $G$, Evidence nodes $E$, Query nodes $Q$)

1. Choose a node ordering $I$ such that $Q$ appears last [initialise]
2. Initialise empty list active
3. For each node $X_i$ in $G$
   a) Append $\Pr(X_i \mid \mathrm{parents}(X_i))$ to active
4. For each node $X_i$ in $E$ [evidence]
   a) Append $\delta(X_i, x_i)$ to active
5. For each $i$ in $I$ [marginalise]
   a) potentials = remove tables referencing $X_i$ from active
   b) $N_i$ = nodes other than $X_i$ referenced by those tables
   c) Table $\phi_i(X_i, X_{N_i})$ = product of the tables
   d) Table $m_i(X_{N_i}) = \sum_{X_i} \phi_i(X_i, X_{N_i})$
   e) Append $m_i(X_{N_i})$ to active
6. Return $\Pr(X_Q \mid X_E = x_E) = \phi_Q(X_Q) / \sum_{X_Q} \phi_Q(X_Q)$ [normalise]

Green background = Slide just for fun!
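Below is a compact Python rendering of the Eliminate pseudocode above, reusing the hypothetical CPT arrays from the earlier sketch. It is a sketch under simplifying assumptions (binary variables, a single query node), and the factor representation and helper names (`multiply`, `sum_out`, `eliminate`) are mine, not the deck's:

```python
import numpy as np

def multiply(f1, f2):
    """Pointwise product of two factors; a factor is (tuple_of_vars, ndarray)."""
    v1, t1 = f1
    v2, t2 = f2
    out = tuple(dict.fromkeys(v1 + v2))           # ordered union of variables
    letters = {v: chr(ord('a') + i) for i, v in enumerate(out)}
    spec = (''.join(letters[v] for v in v1) + ',' +
            ''.join(letters[v] for v in v2) + '->' +
            ''.join(letters[v] for v in out))
    return out, np.einsum(spec, t1, t2)

def sum_out(factor, var):
    """Marginalise one variable out of a factor (step 5d)."""
    vs, t = factor
    return tuple(v for v in vs if v != var), t.sum(axis=vs.index(var))

def eliminate(factors, evidence, query, order):
    """Variable elimination; `order` lists all nodes with `query` last."""
    active = list(factors)                        # steps 2-3: the CPTs
    for var, val in evidence.items():             # step 4: evidence indicators
        delta = np.zeros(2)                       # assumes binary variables
        delta[val] = 1.0
        active.append(((var,), delta))
    for var in order:                             # step 5
        if var == query:
            break
        used = [f for f in active if var in f[0]]
        active = [f for f in active if var not in f[0]]
        phi = used[0]
        for f in used[1:]:
            phi = multiply(phi, f)                # step 5c: product of tables
        active.append(sum_out(phi, var))          # steps 5d-e: message
    phi = active[0]
    for f in active[1:]:                          # remaining tables mention query only
        phi = multiply(phi, f)
    vs, t = phi
    return vs, t / t.sum()                        # step 6: normalise

factors = [(('HT',), p_ht), (('FG',), p_fg), (('FA',), p_fa),
           (('HT', 'FG', 'HG'), p_hg), (('FA', 'HG', 'AS'), p_as)]
print(eliminate(factors, {'AS': 1}, 'HT', ['AS', 'FA', 'HG', 'FG', 'HT']))
```

Run on the plant network, this reproduces the same $\Pr(HT \mid AS = t)$ as the hand-rolled elimination in the previous sketch.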

SLIDE 7


Runtime of elimination algorithm

  • Each step of elimination

    * Removes a node
    * Connects the node's remaining neighbours → forms a clique in the "reconstructed" graph (cliques are exactly the r.v.'s involved in each sum)

  • Time complexity is exponential in the size of the largest clique
  • Different elimination orderings produce different cliques (a tiny illustration follows below)

    * Treewidth: the minimum over orderings of the largest clique
    * Best possible time complexity is exponential in the treewidth


[Figure: the PGM after successive eliminations, alongside the "reconstructed" graph, from a process called moralisation]
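A tiny illustration (mine, not the deck's) of why the ordering matters: in a star-shaped network, eliminating the leaves first keeps every intermediate table small, while eliminating the hub first couples all of its neighbours into one clique-sized table.

```python
import numpy as np

# Hypothetical star network: hub H with four leaves L1..L4.
p_h = np.array([0.5, 0.5])                 # Pr(H)
p_l = np.array([[0.9, 0.1], [0.2, 0.8]])   # Pr(L_i | H), same table for each leaf

# Ordering 1: eliminate leaves first; every message is over H alone.
msg = np.ones(2)
for _ in range(4):
    msg *= p_l.sum(axis=1)                 # sum_li Pr(li | H), a 2-entry table
print(msg.size)                            # largest intermediate table: 2

# Ordering 2: eliminate H first; the product couples all four leaves.
big = np.einsum('h,ha,hb,hc,hd->abcd', p_h, p_l, p_l, p_l, p_l)
print(big.size)                            # largest intermediate table: 2^4 = 16
```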

SLIDE 8


Probabilistic inference by simulation

  • Exact probabilistic inference can be expensive/impossible
  • Can we approximate numerically?
  • Idea: sampling methods

    * Cheaply sample from the desired distribution
    * Approximate the distribution by a histogram of the samples



SLIDE 9


Monte Carlo approx probabilistic inference

  • Algorithm: sample once from the joint

    1. Order nodes so parents come before children (topological order)
    2. Repeat
       a) For each node $X_i$
          i. Index into $\Pr(X_i \mid \mathrm{parents}(X_i))$ with the parents' sampled values
          ii. Sample $X_i$ from this distribution
       b) Together $\mathbf{X} = (X_1, \dots, X_d)$ is a sample from the joint

  • Algorithm: sampling from $\Pr(X_Q \mid X_E = x_E)$

    1. Order nodes so parents come before children
    2. Initialise set $S$ empty; repeat
       a) Sample $\mathbf{X}$ from the joint
       b) If $X_E = x_E$ then add $X_Q$ to $S$
    3. Return: histogram of $S$, normalising counts by dividing by $|S|$

  • Sampling++: importance weighting, Gibbs, Metropolis-Hastings (a code sketch follows the figure below)


[Figure: example PGM with nodes numbered in a topological (parents-first) order]
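Both algorithms take only a few lines given a network. This sketch, again reusing the hypothetical plant CPTs from the earlier sketches, draws ancestral samples in a topological order (HT, FG, FA, then HG, then AS) and estimates $\Pr(HT \mid AS = t)$ by rejection:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_joint():
    """Ancestral sampling: visit nodes parents-first, sample each CPT row."""
    ht = rng.random() < p_ht[1]
    fg = rng.random() < p_fg[1]
    fa = rng.random() < p_fa[1]
    hg = rng.random() < p_hg[int(ht), int(fg), 1]
    as_ = rng.random() < p_as[int(fa), int(hg), 1]
    return ht, fg, fa, hg, as_

def rejection_sample(n=100_000):
    """Estimate Pr(HT | AS = t) by discarding samples where AS = f."""
    kept = [ht for ht, fg, fa, hg, as_ in (sample_joint() for _ in range(n)) if as_]
    counts = np.bincount(np.array(kept, dtype=int), minlength=2)
    return counts / counts.sum()     # histogram normalised by |S|

print(rejection_sample())            # approaches the exact posterior as n grows
```

Rejection becomes wasteful when the evidence is unlikely, which is one motivation for the importance-weighted and MCMC variants named above.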

SLIDE 10


Alternate forms of probabilistic inference

  • Elimination algorithm produces a single marginal
  • Sum-product algorithm on trees

    * 2x the cost, supplies all marginals (sketched below)
    * Name: marginalisation is just a sum of products of tables
    * "Identical" variants: max-product, for MAP estimation

  • In general these are message-passing algorithms

    * Can generalise beyond trees (beyond scope): junction tree algorithm, loopy belief propagation

  • Variational Bayes: approximation via optimisation

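To see why sum-product gets all marginals for roughly twice the cost of one elimination run, here is a tiny hedged sketch (mine, with invented numbers) on a 3-node chain X1 → X2 → X3: one forward and one backward sweep of sum-of-product messages, after which each node's marginal is a local product of incoming messages.

```python
import numpy as np

# Hypothetical chain X1 -> X2 -> X3 over binary variables.
p1 = np.array([0.7, 0.3])                   # Pr(X1)
t12 = np.array([[0.9, 0.1], [0.4, 0.6]])    # Pr(X2 | X1), rows indexed by X1
t23 = np.array([[0.8, 0.2], [0.3, 0.7]])    # Pr(X3 | X2)

# Forward sweep: sum-of-product messages pushed left to right.
f2 = p1 @ t12                 # sum_x1 Pr(x1) Pr(X2 | x1)

# Backward sweep, folding evidence X3 = t into the far end.
b2 = t23[:, 1]                # message to X2: Pr(X3 = t | X2)
b1 = t12 @ b2                 # message to X1: sum_x2 Pr(x2 | X1) b2(x2)

# Each conditional marginal is a normalised product of incoming messages.
m1 = p1 * b1; m1 /= m1.sum()  # Pr(X1 | X3 = t)
m2 = f2 * b2; m2 /= m2.sum()  # Pr(X2 | X3 = t)
print(m1, m2)                 # all marginals from one forward + one backward pass
```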

SLIDE 11


Summary

  • Probabilistic inference on PGMs

    * What is it and why do we care?
    * Elimination algorithm; complexity via cliques
    * Monte Carlo approaches as an alternative to exact integration
