Expectation Maximization [KF Chapter 19]
CS 786, University of Waterloo
Lecture 17: June 28, 2012
CS786 Lecture Slides (c) 2012 P. Poupart

Incomplete data
• Complete data
  – Values of all attributes are known
  – Learning is relatively easy
• But many real-world problems have hidden variables (a.k.a. latent variables)
  – Incomplete data
  – Values of some attributes are missing

Unsupervised Learning
• Incomplete data → unsupervised learning
• Examples:
  – Categorisation of stars by astronomers
  – Categorisation of species by anthropologists
  – Market segmentation for marketing
  – Pattern identification for fraud detection
  – Research in general!

Maximum Likelihood Learning
• ML learning of Bayes net parameters:
  – For θ_{V=true, pa(V)=v} = Pr(V=true | pa(V)=v)
  – θ_{V=true, pa(V)=v} = #[V=true, pa(V)=v] / (#[V=true, pa(V)=v] + #[V=false, pa(V)=v])
  – Assumes all attributes have values… (a small code sketch of this estimate follows below)
• What if the values of some attributes are missing?
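A minimal sketch of this complete-data ML estimate, assuming binary variables and a dataset given as a list of dicts; the record format and helper name are illustrative, not from the slides:

```python
from collections import Counter

def ml_cpt_entry(records, child, parent_assignment):
    """ML estimate of Pr(child=True | parents = parent_assignment)
    from complete data: a relative frequency of counts."""
    counts = Counter()
    for r in records:
        # only records matching the parent assignment contribute
        if all(r[p] == v for p, v in parent_assignment.items()):
            counts[r[child]] += 1
    total = counts[True] + counts[False]
    return counts[True] / total if total > 0 else None

# toy complete-data example with two binary parents
data = [
    {"A": True, "B": False, "V": True},
    {"A": True, "B": False, "V": False},
    {"A": True, "B": False, "V": True},
]
print(ml_cpt_entry(data, "V", {"A": True, "B": False}))  # 2/3
```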

“Naive” solutions for incomplete data
• Solution #1: Ignore records with missing values
  – But what if all records are missing values? (i.e., when a variable is hidden, none of the records have a value for that variable)
• Solution #2: Ignore hidden variables
  – The model may become significantly more complex!

Heart disease example
[Figure: two Bayes nets over Smoking, Diet, Exercise and Symptoms 1–3.
 (a) With a hidden HeartDisease variable between the risk factors and the symptoms: CPT sizes 2, 2, 2, 54, 6, 6, 6 — 78 parameters in total.
 (b) With HeartDisease removed, the symptoms depend directly on Smoking, Diet and Exercise: CPT sizes 2, 2, 2, 54, 162, 486 — 708 parameters in total.]
• (a) simpler (i.e., fewer CPT parameters)
• (b) complex (i.e., lots of CPT parameters)

“Direct” maximum likelihood
• Solution #3: maximize the likelihood directly
  – Let Z be hidden and E observable
  – h_ML = argmax_h P(e|h)
         = argmax_h Σ_Z P(e,Z|h)
         = argmax_h Σ_Z Π_i CPT(V_i)
         = argmax_h log Σ_Z Π_i CPT(V_i)
  – Problem: we can’t push the log past the sum to linearize the product

Expectation-Maximization (EM)
• Solution #4: the EM algorithm
  – Intuition: if we knew the missing values, computing h_ML would be trivial
• Guess h_ML
• Iterate:
  – Expectation: based on h_ML, compute the expectation of the missing values
  – Maximization: based on the expected missing values, compute a new estimate of h_ML
  (a generic skeleton of this loop is sketched below)
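A generic sketch of that guess-and-iterate loop. The `e_step`/`m_step` callables and their return shapes are assumptions of mine, supplied by the caller for whatever model is being learned; the candy example later in the slides is one concrete instance:

```python
def em(data, init_params, e_step, m_step, n_iters=100, tol=1e-6):
    """Generic EM control loop: guess parameters, then alternate E- and M-steps.
    e_step(data, params) -> (expected_stats, log_likelihood): expectation of the
        missing values, summarized as expected sufficient statistics.
    m_step(expected_stats) -> params: ML re-estimate as if the expected
        statistics were observed counts."""
    params = init_params
    prev_ll = float("-inf")
    for _ in range(n_iters):
        expected_stats, log_lik = e_step(data, params)   # E-step
        params = m_step(expected_stats)                  # M-step
        if log_lik - prev_ll < tol:   # the likelihood improves monotonically
            break
        prev_ll = log_lik
    return params
```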

Expectation-Maximization (EM)
• More formally: approximate maximum likelihood
• Iteratively compute:
  h_{i+1} = argmax_h Σ_Z P(Z|h_i,e) log P(e,Z|h)
  – the sum over Z weighted by P(Z|h_i,e) is the Expectation step; the argmax over h is the Maximization step

Expectation-Maximization (EM)
• Derivation:
  – log P(e|h) = log [P(e,Z|h) / P(Z|e,h)]
               = log P(e,Z|h) − log P(Z|e,h)
               = Σ_Z P(Z|e,h) log P(e,Z|h) − Σ_Z P(Z|e,h) log P(Z|e,h)
                 (taking the expectation w.r.t. P(Z|e,h) leaves the left-hand side unchanged, since it does not depend on Z)
               ≥ Σ_Z P(Z|e,h) log P(e,Z|h)
                 (since −Σ_Z P(Z|e,h) log P(Z|e,h) is an entropy and hence non-negative)
• EM finds a local maximum of Σ_Z P(Z|e,h) log P(e,Z|h), which is a lower bound of log P(e|h)
  (a quick numeric check of this bound follows below)
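A quick numeric sanity check of that lower bound on an arbitrary two-valued example; the probabilities below are made up purely for illustration:

```python
import math

# Made-up joint P(e, Z=z | h) for one observed value e and a binary hidden Z.
p_e_z = {0: 0.12, 1: 0.28}                            # P(e, Z=z | h)
p_e = sum(p_e_z.values())                             # P(e | h) = 0.40
p_z_given_e = {z: p / p_e for z, p in p_e_z.items()}  # P(Z=z | e, h)

log_lik = math.log(p_e)
lower_bound = sum(p_z_given_e[z] * math.log(p_e_z[z]) for z in p_e_z)

print(log_lik, lower_bound)          # the bound sits below the log-likelihood
assert lower_bound <= log_lik + 1e-12
```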

Expectation-Maximization (EM)
• Objective: max_h Σ_Z P(Z|e,h) log P(e,Z|h)
• Iterative approach:
  h_{i+1} = argmax_h Σ_Z P(Z|e,h_i) log P(e,Z|h)
• Convergence guaranteed:
  h_∞ = argmax_h Σ_Z P(Z|e,h_∞) log P(e,Z|h)
• Monotonic improvement of the likelihood:
  P(e|h_{i+1}) ≥ P(e|h_i)

Optimization Step
• For one data point e:
  h_{i+1} = argmax_h Σ_Z P(Z|h_i,e) log P(e,Z|h)
• For multiple data points:
  h_{i+1} = argmax_h Σ_e n_e Σ_Z P(Z|h_i,e) log P(e,Z|h)
  where n_e is the frequency of e in the dataset (this weighted objective is sketched in code below)
• Compare to ML for complete data:
  h* = argmax_h Σ_d n_d log P(d|h)
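A small sketch of that weighted objective Q(h; h_i) = Σ_e n_e Σ_Z P(Z|h_i,e) log P(e,Z|h). The two callables and the assumption of a binary hidden variable are illustrative choices of mine, not an API from the slides:

```python
def expected_complete_loglik(counts, posterior, log_joint, h_i, h):
    """Q(h; h_i) = sum over observations e of n_e * sum_z P(z|e,h_i) * log P(e,z|h).
    counts:    dict mapping each distinct observation e to its frequency n_e.
    posterior: callable posterior(z, e, h) returning P(z | e, h).
    log_joint: callable log_joint(e, z, h) returning log P(e, z | h).
    The hidden variable is assumed binary here purely for illustration."""
    return sum(
        n_e * sum(posterior(z, e, h_i) * log_joint(e, z, h) for z in (0, 1))
        for e, n_e in counts.items()
    )
```

The E-step supplies the posterior weights P(z|e,h_i); the M-step then maximizes Q over h, which for Bayes nets has the closed form given next (relative expected frequencies).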

Optimization Solution
• Since a complete data point d corresponds to a pair <z,e>,
  let n_d = n_e P(z|h_i,e)  (an expected frequency)
• As in the complete-data case, the optimal parameters are obtained by setting the derivative to 0, which yields relative expected frequencies
• E.g., θ_{V,pa(V)} = P(V|pa(V)) = n_{V,pa(V)} / n_{pa(V)}
  (a small code sketch of this update follows below)

Candy Example
• Suppose you buy two bags of candies of unknown type (e.g., flavour ratios)
• You plan to eat sufficiently many candies from each bag to learn its type
• Ignoring your plan, your roommate mixes both bags…
• How can you learn the type of each bag despite the mixing?
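A minimal sketch of the relative-expected-frequency update for one CPT; the dictionary keying scheme is my own choice for illustration:

```python
def m_step_cpt(expected_counts):
    """M-step for one CPT: theta_{v, pa} = n_{v, pa} / n_{pa}.
    expected_counts: dict mapping (value, parent_assignment) -> expected count,
    where parent_assignment is a hashable tuple."""
    parent_totals = {}
    for (v, pa), n in expected_counts.items():
        parent_totals[pa] = parent_totals.get(pa, 0.0) + n
    return {(v, pa): n / parent_totals[pa] for (v, pa), n in expected_counts.items()}

# toy usage: expected counts for a binary V with a single binary parent
counts = {(True, (True,)): 3.2, (False, (True,)): 0.8,
          (True, (False,)): 1.0, (False, (False,)): 4.0}
print(m_step_cpt(counts))   # e.g. theta_{V=True | Pa=True} = 3.2 / 4.0 = 0.8
```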

Candy Example
• The “Bag” variable is hidden

Unsupervised Clustering
• The “Class” variable is hidden
• Naïve Bayes model (a small sketch of its factorization follows below)
[Figure: (a) the candy model — Bag is the parent of Flavor, Wrapper and Holes, with CPT stubs P(Bag=1) and P(F=cherry|B) for B=1,2; (b) a generic mixture model — a hidden class C is the parent of the observed variables X.]
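A minimal sketch of this naive Bayes factorization for the candy model, P(B,F,W,H) = P(B) P(F|B) P(W|B) P(H|B); the parameter values below are placeholders, not the true or learned ones:

```python
# Candy naive Bayes model: hidden Bag, observed Flavor, Wrapper, Holes.
# theta      = P(Bag=1)
# theta_F[i] = P(Flavor=cherry | Bag=i); similarly theta_W, theta_H.
params = {
    "theta": 0.5,
    "theta_F": {1: 0.7, 2: 0.4},   # placeholder values
    "theta_W": {1: 0.7, 2: 0.4},
    "theta_H": {1: 0.7, 2: 0.4},
}

def joint(bag, cherry, red, hole, p):
    """P(Bag=bag, F, W, H) under the naive Bayes factorization."""
    prior = p["theta"] if bag == 1 else 1 - p["theta"]
    def attr(theta, positive):          # P(attribute value | Bag=bag)
        return theta[bag] if positive else 1 - theta[bag]
    return (prior
            * attr(p["theta_F"], cherry)
            * attr(p["theta_W"], red)
            * attr(p["theta_H"], hole))

# e.g. probability of a cherry, red-wrapped candy with a hole coming from bag 1
print(joint(1, True, True, True, params))   # 0.5 * 0.7**3 = 0.1715
```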

Candy Example
• Unknown parameters:
  – θ_i = P(Bag=i)
  – θ_Fi = P(Flavour=cherry|Bag=i)
  – θ_Wi = P(Wrapper=red|Bag=i)
  – θ_Hi = P(Hole=yes|Bag=i)
• When eating a candy:
  – F, W and H are observable
  – B is hidden

Candy Example
• Let the true parameters be:
  – θ=0.5, θ_F1=θ_W1=θ_H1=0.8, θ_F2=θ_W2=θ_H2=0.3
• After eating 1000 candies:

              W=red           W=green
              H=1    H=0      H=1    H=0
  F=cherry    273     93      104     90
  F=lime       79    100       94    167

Candy Example
• EM algorithm
• Guess h_0:
  – θ=0.6, θ_F1=θ_W1=θ_H1=0.6, θ_F2=θ_W2=θ_H2=0.4
• Alternate:
  – Expectation: expected # of candies in each bag
  – Maximization: new parameter estimates

Candy Example
• Expectation: expected # of candies in each bag
  – #[Bag=i] = Σ_j P(B=i|f_j,w_j,h_j)
  – Compute P(B=i|f_j,w_j,h_j) by variable elimination (or any other inference algorithm)
• Example (a code sketch of this computation follows below):
  – #[Bag=1] = 612
  – #[Bag=2] = 388
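A minimal sketch of this E-step under h_0, using the count table above. The posterior is computed directly by Bayes’ rule, which for this small naive Bayes model gives the same answer variable elimination would; summing each candy type’s P(Bag=1 | f,w,h) weighted by its count reproduces the expected counts quoted above:

```python
# Observed counts after eating 1000 candies, keyed by (cherry?, red?, hole?).
counts = {
    (True,  True,  True):  273, (True,  True,  False): 93,
    (True,  False, True):  104, (True,  False, False): 90,
    (False, True,  True):   79, (False, True,  False): 100,
    (False, False, True):   94, (False, False, False): 167,
}

# Initial guess h_0: theta = 0.6, all bag-1 attribute params 0.6, bag-2 params 0.4.
theta, p1, p2 = 0.6, 0.6, 0.4

def posterior_bag1(cherry, red, hole):
    """P(Bag=1 | f, w, h) under h_0, by Bayes' rule on the naive Bayes model."""
    def lik(p, flag):
        return p if flag else 1 - p
    j1 = theta * lik(p1, cherry) * lik(p1, red) * lik(p1, hole)        # P(B=1, f, w, h)
    j2 = (1 - theta) * lik(p2, cherry) * lik(p2, red) * lik(p2, hole)  # P(B=2, f, w, h)
    return j1 / (j1 + j2)

expected_bag1 = sum(n * posterior_bag1(*obs) for obs, n in counts.items())
print(round(expected_bag1), round(1000 - expected_bag1))   # ~612, ~388
```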

Candy Example
• Maximization: relative frequency of each bag
  – θ_1 = 612/1000 = 0.612
  – θ_2 = 388/1000 = 0.388

Candy Example
• Expectation: expected # of cherry candies in each bag
  – #[B=i,F=cherry] = Σ_{j: f_j=cherry} P(B=i|f_j,w_j,h_j)
  – Compute P(B=i|f_j,w_j,h_j) by variable elimination (or any other inference algorithm)
• Maximization:
  – θ_F1 = #[B=1,F=cherry] / #[B=1] = 0.668
  – θ_F2 = #[B=2,F=cherry] / #[B=2] = 0.389
  (a sketch of a full EM iteration on the candy data follows below)
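A self-contained sketch of a full EM pass (E-step and M-step together) on the candy counts. The tuple layout for h is my own choice, but the first-iteration values it prints (θ ≈ 0.612, θ_F1 ≈ 0.668, θ_F2 ≈ 0.389) match the numbers on the slides:

```python
import math

# Candy counts keyed by (cherry?, red?, hole?); 1000 candies in total.
counts = {
    (True,  True,  True):  273, (True,  True,  False): 93,
    (True,  False, True):  104, (True,  False, False): 90,
    (False, True,  True):   79, (False, True,  False): 100,
    (False, False, True):   94, (False, False, False): 167,
}

def lik(p, flag):
    return p if flag else 1 - p

def em_iteration(h):
    """One EM iteration for the candy model.
    h = (theta, (tF1, tW1, tH1), (tF2, tW2, tH2)); returns (new_h, log-likelihood at h)."""
    theta, (tF1, tW1, tH1), (tF2, tW2, tH2) = h
    n1 = nF1 = nW1 = nH1 = 0.0          # expected counts for bag 1
    n2 = nF2 = nW2 = nH2 = 0.0          # expected counts for bag 2
    loglik = 0.0
    for (cherry, red, hole), n in counts.items():
        j1 = theta * lik(tF1, cherry) * lik(tW1, red) * lik(tH1, hole)
        j2 = (1 - theta) * lik(tF2, cherry) * lik(tW2, red) * lik(tH2, hole)
        loglik += n * math.log(j1 + j2)           # contribution to log P(e | h)
        w1 = j1 / (j1 + j2)                       # E-step: P(Bag=1 | f, w, h)
        n1 += n * w1;            n2 += n * (1 - w1)
        nF1 += n * w1 * cherry;  nF2 += n * (1 - w1) * cherry
        nW1 += n * w1 * red;     nW2 += n * (1 - w1) * red
        nH1 += n * w1 * hole;    nH2 += n * (1 - w1) * hole
    # M-step: relative expected frequencies
    new_h = (n1 / (n1 + n2),
             (nF1 / n1, nW1 / n1, nH1 / n1),
             (nF2 / n2, nW2 / n2, nH2 / n2))
    return new_h, loglik

h = (0.6, (0.6, 0.6, 0.6), (0.4, 0.4, 0.4))       # initial guess h_0
for i in range(10):
    h, ll = em_iteration(h)
    print(i, round(ll, 1), round(h[0], 3), round(h[1][0], 3), round(h[2][0], 3))
# First iteration gives theta ~ 0.612, theta_F1 ~ 0.668, theta_F2 ~ 0.389,
# and the printed log-likelihood never decreases from one iteration to the next.
```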

Candy Example
[Figure: log-likelihood of the data (y-axis, roughly −2025 to −1975) versus EM iteration number (x-axis, 0 to 120); the log-likelihood improves monotonically across iterations.]

Bayesian networks
• EM algorithm for general Bayes nets (a generic sketch of these updates follows below)
• Expectation:
  – #[V_i=v_ij, Pa(V_i)=pa_ik] = expected frequency
• Maximization:
  – θ_{vij,paik} = #[V_i=v_ij, Pa(V_i)=pa_ik] / #[Pa(V_i)=pa_ik]
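A sketch of these two updates for a general discrete Bayes net. The representation (families as (child, parents) pairs) and the `posterior_marginal` callable, which would be implemented with variable elimination or any other inference algorithm, are assumptions of mine rather than an API from the slides:

```python
from collections import defaultdict
from itertools import product

def em_step(families, domains, records, posterior_marginal, h):
    """One EM step for a discrete Bayes net.
    families: list of (child, parents) pairs defining the net structure.
    domains: dict mapping each variable to the list of its values.
    records: list of dicts of observed values (hidden variables simply absent).
    posterior_marginal(vars, evidence, h): assumed inference routine returning the
        joint posterior over `vars` given `evidence` under parameters h, as a dict
        mapping value tuples (child value first, then parent values) to probabilities.
    Returns new parameters as {(child, child_value, parent_values): theta}."""
    expected = defaultdict(float)
    for child, parents in families:
        for e in records:
            # E-step: accumulate expected counts of every (child, parents) configuration
            post = posterior_marginal((child,) + tuple(parents), e, h)
            for vals, p in post.items():
                expected[(child, vals[0], vals[1:])] += p
    # M-step: relative expected frequencies within each parent configuration
    new_h = {}
    for child, parents in families:
        for pa_vals in product(*(domains[p] for p in parents)):
            total = sum(expected[(child, v, pa_vals)] for v in domains[child])
            for v in domains[child]:
                new_h[(child, v, pa_vals)] = (expected[(child, v, pa_vals)] / total) if total else 0.0
    return new_h
```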
