Graphical Models, 10-715 Fall 2015, Alexander Smola (alex@smola.org). Office hours: after class, in my office at Marianas Labs.

Directed Graphical Models. Brain & Brawn example: two binary causes, brain and sports, with priors p(brain) = 0.1 and p(sports) = 0.2, and observed effects smart and strong connected via a conditional probability table (shown as a figure on the slide).
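A minimal sketch of how such a directed model factorizes. The priors come from the slide; the structure (smart depending only on brain) and the conditional table values are illustrative assumptions, since the slide's table is a figure:

% Brain & Brawn, hypothetical version: smart depends only on brain.
p_brain = [0.9; 0.1];               % p(brain = 0), p(brain = 1), from the slide
P_smart_given_brain = [0.9 0.2;     % p(smart = 0 | brain = 0, 1) -- assumed values
                       0.1 0.8];    % p(smart = 1 | brain = 0, 1) -- assumed values
p_smart = P_smart_given_brain * p_brain   % marginal p(smart) = [0.83; 0.17]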


  1-4. Example - PCA/ICA
 Latent factors $y$; observed effects $x$ (click behavior, queries, watched news, emails).
 $x \sim \mathcal{N}\!\left(\sum_{i=1}^d y_i v_i,\; \sigma^2 \mathbf{1}\right)$ and $p(y) = \prod_{i=1}^d p(y_i)$
 • p(y) is Gaussian for PCA, general for ICA.
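A minimal sketch of sampling from this latent factor model; the dimensions and the factor matrix are illustrative assumptions:

% PCA-style generative model: x ~ N(sum_i y_i v_i, sigma^2 * I).
d = 3; n = 10; sigma = 0.1;
V = randn(n, d);                   % columns are the factors v_1 ... v_d
y = randn(d, 1);                   % p(y) Gaussian for PCA; non-Gaussian for ICA
x = V * y + sigma * randn(n, 1);   % one observed effect vector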

  5. Cocktail party problem

  6-9. Recommender Systems
 • Users u
 • Movies m
 • Ratings r (but only for a subset of users)
 • The ratings sit on intersecting plates over users and movies (like nested FOR loops)
 • Other applications noted on the slide: news, SearchMonkey, answers, social ranking, OMG, personals
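The slide specifies only the plate structure. As one illustrative way to fill it in (an assumption, not stated in the deck), ratings can be drawn from inner products of latent user and movie factors, observed only for a subset of pairs:

% Hypothetical matrix-factorization instance of the plate model.
k = 5; U = 100; M = 200; sigma = 0.5;
u = randn(k, U);                     % latent factor per user
m = randn(k, M);                     % latent factor per movie
obs = rand(U, M) < 0.05;             % which (user, movie) pairs are rated
R = u' * m + sigma * randn(U, M);    % r_{um} ~ N(<u_u, m_m>, sigma^2)
R(~obs) = NaN;                       % ratings only for the observed subset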

  10-13. Challenges
 Engineering:
 • How to design models
 • Common (engineering) sense
 • Computational tractability
 • Dependency analysis
 Machine learning:
 • Inference
 • Easy for fully observed situations
 • Many algorithms if not fully observed
 • Dynamic programming / message passing

  14. Summary
 • Repeated structure: encode with a plate
 • Chains, bipartite graphs, etc. (more later)
 • Plates can intersect
 • Not all variables are observed
 $p(X, \theta) = p(\theta) \prod_i p(x_i \mid \theta)$ (plate diagram: parameter $\theta$ shared by $x_1, x_2, x_3, x_4, \ldots, x_i$)

  15. Markov Chains ($x_0\; x_1\; x_2\; x_3$)
 $p(x; \theta) = p(x_0; \theta) \prod_{i=0}^{n-1} p(x_{i+1} \mid x_i; \theta)$
 Transition matrices (binary states):
 $p(x_0) = (0.4,\, 0.6)$, $\Pi_{0\to 1} = \begin{pmatrix}0.2 & 0.1\\ 0.8 & 0.9\end{pmatrix}$, $\Pi_{1\to 2} = \begin{pmatrix}0.8 & 0.5\\ 0.2 & 0.5\end{pmatrix}$, $\Pi_{2\to 3} = \begin{pmatrix}0 & 1\\ 1 & 0\end{pmatrix}$
 Unraveling the chain:
 $p(x_1) = \sum_{x_0} p(x_1 \mid x_0)\, p(x_0) \iff \pi_1 = \Pi_{0\to 1}\, \pi_0$
 $p(x_2) = \sum_{x_1} p(x_2 \mid x_1)\, p(x_1) \iff \pi_2 = \Pi_{1\to 2}\, \pi_1 = \Pi_{1\to 2}\, \Pi_{0\to 1}\, \pi_0$

  16. Markov Chains
 $p(x; \theta) = p(x_0; \theta) \prod_{i=0}^{n-1} p(x_{i+1} \mid x_i; \theta)$
 • From the start - sum sequentially:
 $p(x_i \mid x_1) = \sum_{x_j : 1 < j < i}\; \prod_{l=2}^{i-1} p(x_{l+1} \mid x_l) \cdot l_2(x_2)$ where $l_2(x_2) := p(x_2 \mid x_1)$
 $= \sum_{x_j : 2 < j < i}\; \prod_{l=3}^{i-1} p(x_{l+1} \mid x_l) \cdot l_3(x_3)$ where $l_3(x_3) := \sum_{x_2} p(x_3 \mid x_2)\, l_2(x_2)$
 $= \sum_{x_j : 3 < j < i}\; \prod_{l=4}^{i-1} p(x_{l+1} \mid x_l) \cdot l_4(x_4)$ where $l_4(x_4) := \sum_{x_3} p(x_4 \mid x_3)\, l_3(x_3)$
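A minimal Octave sketch of the same l-message recursion written as a loop, reusing the transition matrices of the running example and starting from p(x_0) instead of conditioning on x_1:

% Forward recursion l_i = Pi_{i-1 -> i} * l_{i-1} (dynamic programming).
Pis = {[0.2 0.1; 0.8 0.9], [0.8 0.5; 0.2 0.5], [0 1; 1 0]};
l = [0.4; 0.6];                 % l_0 = p(x_0)
for i = 1:numel(Pis)
  l = Pis{i} * l;               % sum out x_{i-1}
end
l                               % marginal p(x_3) = [0.458; 0.542]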

  17. Markov Chains
 $p(x; \theta) = p(x_0; \theta) \prod_{i=0}^{n-1} p(x_{i+1} \mid x_i; \theta)$, with the transition matrices from slide 15. Unraveling the chain only needs matrix-vector products:
 x0 = [0.4; 0.6];
 Pi1 = [0.2 0.1; 0.8 0.9];
 Pi2 = [0.8 0.5; 0.2 0.5];
 Pi3 = [0 1; 1 0];
 x3 = Pi3 * Pi2 * Pi1 * x0      % = [0.45800; 0.54200]

  18. Markov Chains
 $p(x; \theta) = p(x_0; \theta) \prod_{i=0}^{n-1} p(x_{i+1} \mid x_i; \theta)$
 • From the end - sum sequentially (normalize at the end):
 $p(x_1 \mid x_n) \propto \sum_{x_j : 1 < j < n}\; \prod_{l=1}^{n-1} p(x_{l+1} \mid x_l) \cdot r_n(x_n)$ where $r_n(x_n) := 1$
 $= \sum_{x_j : 1 < j < n-1}\; \prod_{l=1}^{n-2} p(x_{l+1} \mid x_l) \cdot r_{n-1}(x_{n-1})$ where $r_{n-1}(x_{n-1}) := \sum_{x_n} p(x_n \mid x_{n-1})\, r_n(x_n)$
 $= \sum_{x_j : 1 < j < n-2}\; \prod_{l=1}^{n-3} p(x_{l+1} \mid x_l) \cdot r_{n-2}(x_{n-2})$ where $r_{n-2}(x_{n-2}) := \sum_{x_{n-1}} p(x_{n-1} \mid x_{n-2})\, r_{n-1}(x_{n-1})$

  19. Example - inferring lunch
 • Initial probability: p(x_0 = t) = p(x_0 = b) = 0.5
 • Stationary transition matrix $\Pi = \begin{pmatrix}0.9 & 0.2\\ 0.1 & 0.8\end{pmatrix}$
 • On the fifth day observed at Tazza d'oro: p(x_5 = t) = 1
 • Want the distribution on day 3: left messages to 3, right messages to 3, renormalize

  20. Example - inferring lunch
 > Pi = [0.9 0.2; 0.1 0.8]
 Pi =
    0.90000   0.20000
    0.10000   0.80000
 > l1 = [0.5; 0.5];
 > l3 = Pi * Pi * l1
 l3 =
    0.58500
    0.41500
 > r5 = [1; 0];
 > r3 = Pi' * Pi' * r5
 r3 =
    0.83000
    0.34000
 > (l3 .* r3) / sum(l3 .* r3)
 ans =
    0.77483
    0.22517

  21. Message Passing
 On the chain $x_0\, x_1\, x_2\, x_3\, x_4\, x_5$: $l_i = \Pi_i\, l_{i-1}$ and $r_i = \Pi_i^\top r_{i+1}$.
 • Send forward messages starting from the left node:
 $m_{i-1 \to i}(x_i) = \sum_{x_{i-1}} m_{i-2 \to i-1}(x_{i-1})\, f(x_{i-1}, x_i)$
 • Send backward messages starting from the right node:
 $m_{i+1 \to i}(x_i) = \sum_{x_{i+1}} m_{i+2 \to i+1}(x_{i+1})\, f(x_i, x_{i+1})$
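A minimal sketch of both passes written as loops, reusing the lunch example numbers; the node indexing (1 through 5) is chosen for illustration:

% Forward/backward message passing on a homogeneous two-state chain.
Pi = [0.9 0.2; 0.1 0.8];              % stationary transition matrix
n = 5; k = 3;                         % chain length and query node
l = [0.5; 0.5];                       % l_1 = initial distribution
r = [1; 0];                           % r_5 = indicator of the observation at node 5
for i = 2:k,      l = Pi  * l; end    % forward messages up to node k
for i = n-1:-1:k, r = Pi' * r; end    % backward messages down to node k
(l .* r) / sum(l .* r)                % p(x_3 | x_5 = t) = [0.77483; 0.22517]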

  22-23. Higher Order Markov Chains
 • First order chain ($x_0\, x_1\, x_2\, x_3$): $p(X) = p(x_0) \prod_i p(x_{i+1} \mid x_i)$
 • Second order: $p(X) = p(x_0, x_1) \prod_i p(x_{i+1} \mid x_i, x_{i-1})$
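A small sketch of evaluating the second-order factorization for a concrete binary sequence; the probability tables here are illustrative assumptions:

% Second-order chain: p(X) = p(x_0, x_1) * prod_i p(x_{i+1} | x_i, x_{i-1}).
p01 = [0.25 0.25; 0.25 0.25];        % p(x_0, x_1), uniform for illustration
P2 = zeros(2, 2, 2);                 % P2(a+1, b+1, c+1) = p(x_{i+1}=c | x_{i-1}=a, x_i=b)
P2(:, :, 1) = [0.9 0.6; 0.3 0.5];
P2(:, :, 2) = 1 - P2(:, :, 1);
x = [0 1 1 0];                       % a sequence x_0 ... x_3
p = p01(x(1) + 1, x(2) + 1);
for i = 3:numel(x)
  p = p * P2(x(i-2) + 1, x(i-1) + 1, x(i) + 1);
end
p                                    % joint probability of the sequence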

  24. Trees (a chain $x_0\, x_1\, x_2\, x_3\, x_4\, x_5$ with a branch $x_6, x_7, x_8$ hanging off $x_2$)
 • Forward/backward messages as normal for a chain
 • When we have more edges for a vertex, use the junction rule below ...

  25-31. Trees
 $l_1(x_1) = \sum_{x_0} p(x_0)\, p(x_1 \mid x_0)$    $r_7(x_7) = \sum_{x_8} p(x_8 \mid x_7)$
 $l_2(x_2) = \sum_{x_1} l_1(x_1)\, p(x_2 \mid x_1)$    $r_6(x_6) = \sum_{x_7} r_7(x_7)\, p(x_7 \mid x_6)$
 $r_2(x_2) = \sum_{x_6} r_6(x_6)\, p(x_6 \mid x_2)$
 $l_3(x_3) = \sum_{x_2} l_2(x_2)\, p(x_3 \mid x_2)\, r_2(x_2)$
 ...
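A minimal numeric sketch of combining messages at the branch point x_2, assuming (purely for illustration) binary states, the same transition matrix on every edge, a uniform p(x_0), and observed leaves x_5 = 0 and x_8 = 0:

% Node marginal at x_2 given evidence at both leaves of the tree.
Pi = [0.9 0.2; 0.1 0.8];             % p(child | parent), columns indexed by parent
p0 = [0.5; 0.5];                     % p(x_0)
e5 = [1; 0]; e8 = [1; 0];            % indicators for the observed leaves
l2       = Pi * Pi * p0;             % message from the x_0 -- x_1 side
r_chain  = Pi' * Pi' * Pi' * e5;     % message from x_5 back through x_4, x_3
r_branch = Pi' * Pi' * Pi' * e8;     % message from x_8 back through x_7, x_6
post = l2 .* r_chain .* r_branch;
post = post / sum(post)              % p(x_2 | x_5 = 0, x_8 = 0), about [0.818; 0.182]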


  32. Junction Template
 • Order of computation
 • Dependence does not matter (only matters for parametrization)
 $m_{2 \to 3}(x_3) = \sum_{x_2} m_{1 \to 2}(x_2)\, m_{4 \to 2}(x_2)\, f(x_2, x_3)$
 (junction at node 2: messages come in from nodes 1 and 4, the message goes out to node 3)
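A short Octave sketch of this junction message for binary variables; the incoming messages and the potential f are illustrative assumptions:

% Junction message m_{2->3}(x_3) = sum_{x_2} m_{1->2}(x_2) m_{4->2}(x_2) f(x_2, x_3).
m12 = [0.6; 0.4];                 % m_{1->2}(x_2), assumed values
m42 = [0.7; 0.3];                 % m_{4->2}(x_2), assumed values
f23 = [0.9 0.2; 0.1 0.8];         % f(x_2, x_3), rows = x_2, columns = x_3
m23 = f23' * (m12 .* m42)         % the outgoing message to node 3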

  33-44. Trees
 • Forward/backward messages as normal for a chain
 • When we have more edges for a vertex, use:
 $m_{2 \to 3}(x_3) = \sum_{x_2} m_{1 \to 2}(x_2)\, m_{6 \to 2}(x_2)\, f(x_2, x_3)$
 $m_{2 \to 6}(x_6) = \sum_{x_2} m_{1 \to 2}(x_2)\, m_{3 \to 2}(x_2)\, f(x_2, x_6)$
 $m_{2 \to 1}(x_1) = \sum_{x_2} m_{3 \to 2}(x_2)\, m_{6 \to 2}(x_2)\, f(x_1, x_2)$


  45. Summary
 • Markov chains: the present only depends on the recent past; higher order means a longer history.
 • Dynamic programming: exponential if brute force, linear in the chain length if we iterate.
 • For junctions, treat like chains but integrate signals from all sources.
 • Exponential in the history size.

  46. Hidden Markov Models

  47. Clustering and Hidden Markov Models
 (diagrams: latent states $x_1, \ldots, x_m$ each emitting an observation $y_1, \ldots, y_m$; in the HMM the $x_i$ form a chain)
 • Clustering - no dependence between observations
 • Hidden Markov Model - dependence between states
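A minimal sketch of the HMM forward (filtering) recursion, combining the chain over latent states with per-step observation likelihoods; all numbers here are illustrative assumptions:

% HMM filtering: alpha_i(x) = p(y_i | x) * sum_{x'} p(x | x') alpha_{i-1}(x').
Pi = [0.9 0.2; 0.1 0.8];           % p(x_{i+1} | x_i), columns indexed by x_i
B  = [0.7 0.1; 0.3 0.9];           % p(y | x), rows indexed by y, columns by x
p0 = [0.5; 0.5];                   % p(x_1)
y  = [1 1 2 1];                    % observed symbols y_1 ... y_4
alpha = p0 .* B(y(1), :)';         % alpha_1
for i = 2:numel(y)
  alpha = (Pi * alpha) .* B(y(i), :)';
end
alpha / sum(alpha)                 % filtered distribution p(x_4 | y_1, ..., y_4)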

  48. Applications
 • Speech recognition (sound | text)
 • Optical character recognition (writing | text)
 • Gene finding (DNA sequence | genes)
 • Activity recognition (accelerometer | activity)
