Graphical Models, 10-715 Fall 2015, Alexander Smola (alex@smola.org). Office hours: after class, in my office at Marianas Labs.

Directed Graphical Models. Brain & Brawn example: two binary causes, brain and sports, with priors p(brain) = 0.1 and p(sports) = 0.2, and observed effects smart and strong connected via a conditional probability table (shown as a figure on the slide).
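A minimal sketch of how such a directed model factorizes. The priors come from the slide; the structure (smart depending only on brain) and the conditional table values are illustrative assumptions, since the slide's table is a figure:

% Brain & Brawn, hypothetical version: smart depends only on brain.
p_brain = [0.9; 0.1];               % p(brain = 0), p(brain = 1), from the slide
P_smart_given_brain = [0.9 0.2;     % p(smart = 0 | brain = 0, 1) -- assumed values
                       0.1 0.8];    % p(smart = 1 | brain = 0, 1) -- assumed values
p_smart = P_smart_given_brain * p_brain   % marginal p(smart) = [0.83; 0.17]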


  1-4. Example - PCA/ICA
 Latent factors $y$; observed effects $x$ (click behavior, queries, watched news, emails).
 $x \sim \mathcal{N}\!\left(\sum_{i=1}^d y_i v_i,\; \sigma^2 \mathbf{1}\right)$ and $p(y) = \prod_{i=1}^d p(y_i)$
 • p(y) is Gaussian for PCA, general for ICA.
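A minimal sketch of sampling from this latent factor model; the dimensions and the factor matrix are illustrative assumptions:

% PCA-style generative model: x ~ N(sum_i y_i v_i, sigma^2 * I).
d = 3; n = 10; sigma = 0.1;
V = randn(n, d);                   % columns are the factors v_1 ... v_d
y = randn(d, 1);                   % p(y) Gaussian for PCA; non-Gaussian for ICA
x = V * y + sigma * randn(n, 1);   % one observed effect vector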

  5. Cocktail party problem

  6-9. Recommender Systems
 • Users u
 • Movies m
 • Ratings r (but only for a subset of users)
 • The ratings sit on intersecting plates over users and movies (like nested FOR loops)
 • Other applications noted on the slide: news, SearchMonkey, answers, social ranking, OMG, personals
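The slide specifies only the plate structure. As one illustrative way to fill it in (an assumption, not stated in the deck), ratings can be drawn from inner products of latent user and movie factors, observed only for a subset of pairs:

% Hypothetical matrix-factorization instance of the plate model.
k = 5; U = 100; M = 200; sigma = 0.5;
u = randn(k, U);                     % latent factor per user
m = randn(k, M);                     % latent factor per movie
obs = rand(U, M) < 0.05;             % which (user, movie) pairs are rated
R = u' * m + sigma * randn(U, M);    % r_{um} ~ N(<u_u, m_m>, sigma^2)
R(~obs) = NaN;                       % ratings only for the observed subset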

  10-13. Challenges
 Engineering:
 • How to design models
 • Common (engineering) sense
 • Computational tractability
 • Dependency analysis
 Machine learning:
 • Inference
 • Easy for fully observed situations
 • Many algorithms if not fully observed
 • Dynamic programming / message passing

  14. Summary
 • Repeated structure: encode with a plate
 • Chains, bipartite graphs, etc. (more later)
 • Plates can intersect
 • Not all variables are observed
 $p(X, \theta) = p(\theta) \prod_i p(x_i \mid \theta)$ (plate diagram: parameter $\theta$ shared by $x_1, x_2, x_3, x_4, \ldots, x_i$)

  15. Markov Chains ($x_0\; x_1\; x_2\; x_3$)
 $p(x; \theta) = p(x_0; \theta) \prod_{i=0}^{n-1} p(x_{i+1} \mid x_i; \theta)$
 Transition matrices (binary states):
 $p(x_0) = (0.4,\, 0.6)$, $\Pi_{0\to 1} = \begin{pmatrix}0.2 & 0.1\\ 0.8 & 0.9\end{pmatrix}$, $\Pi_{1\to 2} = \begin{pmatrix}0.8 & 0.5\\ 0.2 & 0.5\end{pmatrix}$, $\Pi_{2\to 3} = \begin{pmatrix}0 & 1\\ 1 & 0\end{pmatrix}$
 Unraveling the chain:
 $p(x_1) = \sum_{x_0} p(x_1 \mid x_0)\, p(x_0) \iff \pi_1 = \Pi_{0\to 1}\, \pi_0$
 $p(x_2) = \sum_{x_1} p(x_2 \mid x_1)\, p(x_1) \iff \pi_2 = \Pi_{1\to 2}\, \pi_1 = \Pi_{1\to 2}\, \Pi_{0\to 1}\, \pi_0$

  16. Markov Chains
 $p(x; \theta) = p(x_0; \theta) \prod_{i=0}^{n-1} p(x_{i+1} \mid x_i; \theta)$
 • From the start - sum sequentially:
 $p(x_i \mid x_1) = \sum_{x_j : 1 < j < i}\; \prod_{l=2}^{i-1} p(x_{l+1} \mid x_l) \cdot l_2(x_2)$ where $l_2(x_2) := p(x_2 \mid x_1)$
 $= \sum_{x_j : 2 < j < i}\; \prod_{l=3}^{i-1} p(x_{l+1} \mid x_l) \cdot l_3(x_3)$ where $l_3(x_3) := \sum_{x_2} p(x_3 \mid x_2)\, l_2(x_2)$
 $= \sum_{x_j : 3 < j < i}\; \prod_{l=4}^{i-1} p(x_{l+1} \mid x_l) \cdot l_4(x_4)$ where $l_4(x_4) := \sum_{x_3} p(x_4 \mid x_3)\, l_3(x_3)$
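A minimal Octave sketch of the same l-message recursion written as a loop, reusing the transition matrices of the running example and starting from p(x_0) instead of conditioning on x_1:

% Forward recursion l_i = Pi_{i-1 -> i} * l_{i-1} (dynamic programming).
Pis = {[0.2 0.1; 0.8 0.9], [0.8 0.5; 0.2 0.5], [0 1; 1 0]};
l = [0.4; 0.6];                 % l_0 = p(x_0)
for i = 1:numel(Pis)
  l = Pis{i} * l;               % sum out x_{i-1}
end
l                               % marginal p(x_3) = [0.458; 0.542]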

  17. Markov Chains
 $p(x; \theta) = p(x_0; \theta) \prod_{i=0}^{n-1} p(x_{i+1} \mid x_i; \theta)$, with the transition matrices from slide 15. Unraveling the chain only needs matrix-vector products:
 x0 = [0.4; 0.6];
 Pi1 = [0.2 0.1; 0.8 0.9];
 Pi2 = [0.8 0.5; 0.2 0.5];
 Pi3 = [0 1; 1 0];
 x3 = Pi3 * Pi2 * Pi1 * x0      % = [0.45800; 0.54200]

  18. Markov Chains
 $p(x; \theta) = p(x_0; \theta) \prod_{i=0}^{n-1} p(x_{i+1} \mid x_i; \theta)$
 • From the end - sum sequentially (normalize at the end):
 $p(x_1 \mid x_n) \propto \sum_{x_j : 1 < j < n}\; \prod_{l=1}^{n-1} p(x_{l+1} \mid x_l) \cdot r_n(x_n)$ where $r_n(x_n) := 1$
 $= \sum_{x_j : 1 < j < n-1}\; \prod_{l=1}^{n-2} p(x_{l+1} \mid x_l) \cdot r_{n-1}(x_{n-1})$ where $r_{n-1}(x_{n-1}) := \sum_{x_n} p(x_n \mid x_{n-1})\, r_n(x_n)$
 $= \sum_{x_j : 1 < j < n-2}\; \prod_{l=1}^{n-3} p(x_{l+1} \mid x_l) \cdot r_{n-2}(x_{n-2})$ where $r_{n-2}(x_{n-2}) := \sum_{x_{n-1}} p(x_{n-1} \mid x_{n-2})\, r_{n-1}(x_{n-1})$

  19. Example - inferring lunch
 • Initial probability: p(x_0 = t) = p(x_0 = b) = 0.5
 • Stationary transition matrix $\Pi = \begin{pmatrix}0.9 & 0.2\\ 0.1 & 0.8\end{pmatrix}$
 • On the fifth day observed at Tazza d'oro: p(x_5 = t) = 1
 • Want the distribution on day 3: left messages to 3, right messages to 3, renormalize

  20. Example - inferring lunch
 > Pi = [0.9 0.2; 0.1 0.8]
 Pi =
    0.90000   0.20000
    0.10000   0.80000
 > l1 = [0.5; 0.5];
 > l3 = Pi * Pi * l1
 l3 =
    0.58500
    0.41500
 > r5 = [1; 0];
 > r3 = Pi' * Pi' * r5
 r3 =
    0.83000
    0.34000
 > (l3 .* r3) / sum(l3 .* r3)
 ans =
    0.77483
    0.22517

  21. Message Passing
 On the chain $x_0\, x_1\, x_2\, x_3\, x_4\, x_5$: $l_i = \Pi_i\, l_{i-1}$ and $r_i = \Pi_i^\top r_{i+1}$.
 • Send forward messages starting from the left node:
 $m_{i-1 \to i}(x_i) = \sum_{x_{i-1}} m_{i-2 \to i-1}(x_{i-1})\, f(x_{i-1}, x_i)$
 • Send backward messages starting from the right node:
 $m_{i+1 \to i}(x_i) = \sum_{x_{i+1}} m_{i+2 \to i+1}(x_{i+1})\, f(x_i, x_{i+1})$
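A minimal sketch of both passes written as loops, reusing the lunch example numbers; the node indexing (1 through 5) is chosen for illustration:

% Forward/backward message passing on a homogeneous two-state chain.
Pi = [0.9 0.2; 0.1 0.8];              % stationary transition matrix
n = 5; k = 3;                         % chain length and query node
l = [0.5; 0.5];                       % l_1 = initial distribution
r = [1; 0];                           % r_5 = indicator of the observation at node 5
for i = 2:k,      l = Pi  * l; end    % forward messages up to node k
for i = n-1:-1:k, r = Pi' * r; end    % backward messages down to node k
(l .* r) / sum(l .* r)                % p(x_3 | x_5 = t) = [0.77483; 0.22517]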

  22-23. Higher Order Markov Chains
 • First order chain ($x_0\, x_1\, x_2\, x_3$): $p(X) = p(x_0) \prod_i p(x_{i+1} \mid x_i)$
 • Second order: $p(X) = p(x_0, x_1) \prod_i p(x_{i+1} \mid x_i, x_{i-1})$
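A small sketch of evaluating the second-order factorization for a concrete binary sequence; the probability tables here are illustrative assumptions:

% Second-order chain: p(X) = p(x_0, x_1) * prod_i p(x_{i+1} | x_i, x_{i-1}).
p01 = [0.25 0.25; 0.25 0.25];        % p(x_0, x_1), uniform for illustration
P2 = zeros(2, 2, 2);                 % P2(a+1, b+1, c+1) = p(x_{i+1}=c | x_{i-1}=a, x_i=b)
P2(:, :, 1) = [0.9 0.6; 0.3 0.5];
P2(:, :, 2) = 1 - P2(:, :, 1);
x = [0 1 1 0];                       % a sequence x_0 ... x_3
p = p01(x(1) + 1, x(2) + 1);
for i = 3:numel(x)
  p = p * P2(x(i-2) + 1, x(i-1) + 1, x(i) + 1);
end
p                                    % joint probability of the sequence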

  24. Trees (a chain $x_0\, x_1\, x_2\, x_3\, x_4\, x_5$ with a branch $x_6, x_7, x_8$ hanging off $x_2$)
 • Forward/backward messages as normal for a chain
 • When we have more edges for a vertex, use the junction rule below ...

  25-31. Trees
 $l_1(x_1) = \sum_{x_0} p(x_0)\, p(x_1 \mid x_0)$    $r_7(x_7) = \sum_{x_8} p(x_8 \mid x_7)$
 $l_2(x_2) = \sum_{x_1} l_1(x_1)\, p(x_2 \mid x_1)$    $r_6(x_6) = \sum_{x_7} r_7(x_7)\, p(x_7 \mid x_6)$
 $r_2(x_2) = \sum_{x_6} r_6(x_6)\, p(x_6 \mid x_2)$
 $l_3(x_3) = \sum_{x_2} l_2(x_2)\, p(x_3 \mid x_2)\, r_2(x_2)$
 ...
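A minimal numeric sketch of combining messages at the branch point x_2, assuming (purely for illustration) binary states, the same transition matrix on every edge, a uniform p(x_0), and observed leaves x_5 = 0 and x_8 = 0:

% Node marginal at x_2 given evidence at both leaves of the tree.
Pi = [0.9 0.2; 0.1 0.8];             % p(child | parent), columns indexed by parent
p0 = [0.5; 0.5];                     % p(x_0)
e5 = [1; 0]; e8 = [1; 0];            % indicators for the observed leaves
l2       = Pi * Pi * p0;             % message from the x_0 -- x_1 side
r_chain  = Pi' * Pi' * Pi' * e5;     % message from x_5 back through x_4, x_3
r_branch = Pi' * Pi' * Pi' * e8;     % message from x_8 back through x_7, x_6
post = l2 .* r_chain .* r_branch;
post = post / sum(post)              % p(x_2 | x_5 = 0, x_8 = 0), about [0.818; 0.182]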


  32. Junction Template
 • Order of computation
 • Dependence does not matter (only matters for parametrization)
 $m_{2 \to 3}(x_3) = \sum_{x_2} m_{1 \to 2}(x_2)\, m_{4 \to 2}(x_2)\, f(x_2, x_3)$
 (junction at node 2: messages come in from nodes 1 and 4, the message goes out to node 3)
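A short Octave sketch of this junction message for binary variables; the incoming messages and the potential f are illustrative assumptions:

% Junction message m_{2->3}(x_3) = sum_{x_2} m_{1->2}(x_2) m_{4->2}(x_2) f(x_2, x_3).
m12 = [0.6; 0.4];                 % m_{1->2}(x_2), assumed values
m42 = [0.7; 0.3];                 % m_{4->2}(x_2), assumed values
f23 = [0.9 0.2; 0.1 0.8];         % f(x_2, x_3), rows = x_2, columns = x_3
m23 = f23' * (m12 .* m42)         % the outgoing message to node 3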

  33-44. Trees
 • Forward/backward messages as normal for a chain
 • When we have more edges for a vertex, use:
 $m_{2 \to 3}(x_3) = \sum_{x_2} m_{1 \to 2}(x_2)\, m_{6 \to 2}(x_2)\, f(x_2, x_3)$
 $m_{2 \to 6}(x_6) = \sum_{x_2} m_{1 \to 2}(x_2)\, m_{3 \to 2}(x_2)\, f(x_2, x_6)$
 $m_{2 \to 1}(x_1) = \sum_{x_2} m_{3 \to 2}(x_2)\, m_{6 \to 2}(x_2)\, f(x_1, x_2)$


  45. Summary
 • Markov chains: the present only depends on the recent past; higher order means a longer history.
 • Dynamic programming: exponential if brute force, linear in the chain length if we iterate.
 • For junctions, treat like chains but integrate signals from all sources.
 • Exponential in the history size.

  46. Hidden Markov Models

  47. Clustering and Hidden Markov Models
 (diagrams: latent states $x_1, \ldots, x_m$ each emitting an observation $y_1, \ldots, y_m$; in the HMM the $x_i$ form a chain)
 • Clustering - no dependence between observations
 • Hidden Markov Model - dependence between states
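A minimal sketch of the HMM forward (filtering) recursion, combining the chain over latent states with per-step observation likelihoods; all numbers here are illustrative assumptions:

% HMM filtering: alpha_i(x) = p(y_i | x) * sum_{x'} p(x | x') alpha_{i-1}(x').
Pi = [0.9 0.2; 0.1 0.8];           % p(x_{i+1} | x_i), columns indexed by x_i
B  = [0.7 0.1; 0.3 0.9];           % p(y | x), rows indexed by y, columns by x
p0 = [0.5; 0.5];                   % p(x_1)
y  = [1 1 2 1];                    % observed symbols y_1 ... y_4
alpha = p0 .* B(y(1), :)';         % alpha_1
for i = 2:numel(y)
  alpha = (Pi * alpha) .* B(y(i), :)';
end
alpha / sum(alpha)                 % filtered distribution p(x_4 | y_1, ..., y_4)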

  48. Applications
 • Speech recognition (sound | text)
 • Optical character recognition (writing | text)
 • Gene finding (DNA sequence | genes)
 • Activity recognition (accelerometer | activity)
