Probabilistic Graphical Models
Lecture 10: Undirected Models
CS/CNS/EE 155
Andreas Krause
2
Announcements
Homework 2 due this Wednesday (Nov 4) in class
Project milestones due next Monday (Nov 9)
About half the work should be done
4 pages of writeup, NIPS format: http://nips.cc/PaperInformation/StyleFiles
3
Markov Networks
(a.k.a. Markov Random Field, Gibbs distribution, …)
A Markov Network consists of:
An undirected graph, where each node represents a random variable
A collection of factors defined over cliques in the graph
Joint probability: a distribution P factorizes over an undirected graph G if
P(x1, …, xn) = (1/Z) ∏_i φ_i(C_i),  where Z = Σ_{x1,…,xn} ∏_i φ_i(C_i)
[Figure: Markov network over X1, …, X9]
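To make the factorization concrete, here is a minimal Python sketch that evaluates P(x) = (1/Z) ∏_i φ_i(C_i) by brute force on a toy 3-node chain; the binary domains and potential values are made up for illustration and are not from the lecture.

```python
# Minimal sketch: the Markov-network factorization P(x) = (1/Z) * prod_i phi_i(C_i),
# computed by brute force on a toy chain X1 - X2 - X3 with binary variables.
# The potential values below are made up for illustration.
from itertools import product

# Pairwise potentials phi(xi, xj); configurations that "agree" get higher values.
phi_12 = {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}
phi_23 = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0}

def unnormalized(x1, x2, x3):
    """Product of the clique factors for one full assignment."""
    return phi_12[(x1, x2)] * phi_23[(x2, x3)]

# Partition function Z = sum over all assignments of the factor product.
Z = sum(unnormalized(*x) for x in product([0, 1], repeat=3))

def joint(x1, x2, x3):
    """P(x1, x2, x3) = (1/Z) * product of factors."""
    return unnormalized(x1, x2, x3) / Z

print(Z)                 # normalizer (24.0 for these toy potentials)
print(joint(0, 0, 0))    # probability of one joint assignment (0.25)
print(sum(joint(*x) for x in product([0, 1], repeat=3)))  # sums to 1
```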
4
Computing Joint Probabilities
Computing joint probabilities in BNs: P(x1, …, xn) = ∏_i P(xi | Pa_Xi) (just multiply the CPTs)
Computing joint probabilities in Markov Nets: P(x1, …, xn) = (1/Z) ∏_i φ_i(C_i) (multiply the factors, then normalize by the partition function Z)
5
Local Markov Assumption for MN
The Markov blanket MB(X) of a node X is the set of neighbors of X
Local Markov Assumption: X ⊥ EverythingElse | MB(X)
Iloc(G) = set of all local independences
G is called an I-map of distribution P if Iloc(G) ⊆ I(P)
[Figure: Markov network over X1, …, X9]
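As a small illustration, the sketch below simply reads off MB(X) as the neighbor set of X in an adjacency-list graph; the 3×3 grid layout used here is an assumption about the slide's figure.

```python
# Sketch: in a Markov network the Markov blanket MB(X) is just X's set of neighbors.
# The adjacency list below encodes an assumed 3x3 grid over X1..X9.
grid = {
    "X1": ["X2", "X4"], "X2": ["X1", "X3", "X5"], "X3": ["X2", "X6"],
    "X4": ["X1", "X5", "X7"], "X5": ["X2", "X4", "X6", "X8"],
    "X6": ["X3", "X5", "X9"],
    "X7": ["X4", "X8"], "X8": ["X5", "X7", "X9"], "X9": ["X6", "X8"],
}

def markov_blanket(graph, node):
    """MB(X) = neighbors of X in the undirected graph."""
    return set(graph[node])

print(markov_blanket(grid, "X5"))  # neighbors of X5: X2, X4, X6, X8 (set order may vary)
# Local Markov assumption: X5 is independent of everything else given MB(X5).
```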
6
Factorization Theorem for Markov Nets
True distribution P can be represented exactly as a Markov net (G, P)
⇒ Iloc(G) ⊆ I(P), i.e., G is an I-map of P (independence map)
7
Factorization Theorem for Markov Nets: Hammersley-Clifford Theorem
Iloc(G) ⊆ I(P), i.e., G is an I-map of P (independence map), and P > 0
⇒ True distribution P can be represented exactly as a Markov net (G, P)
8
Global independencies
A trail X—X1—…—Xm—Y is called active for evidence E if none of X1, …, Xm ∈ E
Variables X and Y are called separated by E if there is no active trail for E connecting X and Y; write sep(X, Y | E)
I(G) = {X ⊥ Y | E : sep(X, Y | E)}
[Figure: Markov network over X1, …, X9]
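Separation can be checked mechanically: block the evidence nodes and search for any remaining path. The sketch below does this with a BFS over the same assumed grid; the function name `separated` and the example queries are illustrative.

```python
# Sketch: sep(X, Y | E) in an undirected graph. X and Y are separated by E iff
# no path between them avoids E, so search the graph with evidence nodes blocked.
from collections import deque

grid = {
    "X1": ["X2", "X4"], "X2": ["X1", "X3", "X5"], "X3": ["X2", "X6"],
    "X4": ["X1", "X5", "X7"], "X5": ["X2", "X4", "X6", "X8"],
    "X6": ["X3", "X5", "X9"],
    "X7": ["X4", "X8"], "X8": ["X5", "X7", "X9"], "X9": ["X6", "X8"],
}

def separated(graph, x, y, evidence):
    """True iff sep(x, y | evidence): every trail from x to y hits the evidence."""
    blocked = set(evidence)
    seen, queue = {x}, deque([x])
    while queue:
        node = queue.popleft()
        if node == y:
            return False              # found an active (unblocked) trail
        for nbr in graph[node]:
            if nbr not in blocked and nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return True

print(separated(grid, "X1", "X9", {"X3", "X5", "X7"}))  # True: all trails blocked
print(separated(grid, "X1", "X9", {"X5"}))              # False: e.g. X1-X2-X3-X6-X9
```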
9
Soundness of separation
Know: for positive distributions P > 0, Iloc(G) ⊆ I(P) ⇔ P factorizes over G
Theorem (Soundness of separation): for positive distributions P > 0,
Iloc(G) ⊆ I(P) ⇒ I(G) ⊆ I(P)
Hence, separation captures only true independences. How about I(G) = I(P)?
10
Completeness of separation
Theorem: Completeness of separation
I(G) = I(P) for "almost all" distributions P that factorize over G
"Almost all": except for a set of potential parameterizations of measure 0 (assuming no finite set of parameterizations has positive measure)
11
Minimal I-maps
For BNs: the minimal I-map is not unique
For MNs: for positive P, the minimal I-map is unique!
[Figure: minimal I-map Bayes nets over E, B, A, J, M]
12
P-maps
Do P-maps always exist?
For BNs: no
How about Markov Nets?
13
Exact inference in MNs
Variable elimination and junction tree inference work exactly the same way!
Need to construct junction trees by obtaining a chordal graph through triangulation
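As a rough illustration of the first point, here is a compact sum-product variable-elimination sketch over table factors; the factor representation (a scope list plus a dict keyed by value tuples), the binary domains, and the toy chain are all assumptions made for the example.

```python
# Sketch: variable elimination works on Markov-net factors exactly as on CPTs,
# since both are just tables over small scopes. Factors are (scope, table) pairs;
# everything here is assumed/illustrative.
from itertools import product

DOMAIN = [0, 1]  # assume binary variables for simplicity

def multiply(f, g):
    (sf, tf), (sg, tg) = f, g
    scope = list(dict.fromkeys(sf + sg))          # union of scopes, order-preserving
    table = {}
    for vals in product(DOMAIN, repeat=len(scope)):
        a = dict(zip(scope, vals))
        table[vals] = (tf[tuple(a[v] for v in sf)] *
                       tg[tuple(a[v] for v in sg)])
    return scope, table

def sum_out(f, var):
    scope, table = f
    new_scope = [v for v in scope if v != var]
    out = {}
    for vals, p in table.items():
        a = dict(zip(scope, vals))
        key = tuple(a[v] for v in new_scope)
        out[key] = out.get(key, 0.0) + p
    return new_scope, out

def eliminate(factors, order):
    """Sum-product variable elimination; returns the remaining (unnormalized) factor."""
    for var in order:
        touching = [f for f in factors if var in f[0]]
        factors = [f for f in factors if var not in f[0]]
        prod = touching[0]
        for f in touching[1:]:
            prod = multiply(prod, f)
        factors.append(sum_out(prod, var))
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    return result

# Toy chain A - B - C with made-up pairwise potentials; unnormalized marginal over C.
phi_ab = (["A", "B"], {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0})
phi_bc = (["B", "C"], {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0})
scope, table = eliminate([phi_ab, phi_bc], order=["A", "B"])
print(scope, table)  # ['C'] with values proportional to P(C)
```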
14
Pairwise MNs
A pairwise MN is a MN where all factors are defined over single variables or pairs of variables
Can reduce any MN to a pairwise MN!
[Figure: Markov network over X1, …, X5]
15
Logarithmic representation
Can represent any positive distribution in the log domain:
P(x1, …, xn) = (1/Z) exp( Σ_i log φ_i(C_i) )
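A small sketch of the idea, using made-up numbers: a strictly positive potential φ can be stored as an energy -log φ and recovered by exponentiation.

```python
# Sketch: any positive factor has an equivalent log-domain ("energy") form,
# phi(c) = exp(-E(c)) with E(c) = -log phi(c). Toy numbers for illustration.
import math

phi = {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}   # positive potential
energy = {c: -math.log(v) for c, v in phi.items()}            # log-domain version

c = (1, 1)
assert abs(phi[c] - math.exp(-energy[c])) < 1e-12             # same factor, two forms
```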
16
Log-linear models
Feature functions φ_i(D_i) defined over cliques
Log-linear model over undirected graph G:
P(x1, …, xn) = (1/Z(w)) exp( Σ_i w_i φ_i(D_i) )
Feature functions φ_1(D_1), …, φ_k(D_k); domains D_i can overlap
Set of weights w_i learnt from data
17
Converting BNs to MNs
Theorem: The moralized Bayes net is a minimal Markov I-map
[Figure: Bayes net over C, D, I, G, S, L, J, H]
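A sketch of moralization: marry all co-parents of each node and drop edge directions. The parent lists below are an assumed reconstruction of the student-network figure (C, D, I, G, S, L, J, H) and are only illustrative.

```python
# Sketch: moralizing a Bayes net -- connect ("marry") all parents of each node
# and drop edge directions. The DAG below is an assumed stand-in for the figure.
from itertools import combinations

parents = {            # child: list of parents (illustrative structure)
    "C": [], "D": ["C"], "I": [],
    "G": ["D", "I"], "S": ["I"], "L": ["G"],
    "J": ["L", "S"], "H": ["G", "J"],
}

def moralize(parents):
    """Return the undirected adjacency sets of the moralized graph."""
    adj = {v: set() for v in parents}
    for child, pas in parents.items():
        for p in pas:                      # keep child-parent edges, undirected
            adj[child].add(p)
            adj[p].add(child)
        for p, q in combinations(pas, 2):  # marry co-parents
            adj[p].add(q)
            adj[q].add(p)
    return adj

moral = moralize(parents)
print(sorted(moral["S"]))   # ['I', 'J', 'L']: parent I, child J, and married co-parent L
```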
18
Converting MNs to BNs
Theorem: A minimal Bayes net I-map for an MN must be chordal
[Figure: Markov network over X1, …, X9]
19
So far
Markov Network Representation
Local/Global Markov assumptions; separation
Soundness and completeness of separation
Markov Network Inference
Variable elimination and Junction Tree inference work exactly as in Bayes Nets
How about Learning Markov Nets?
20
Parameter Learning for Bayes nets
21
Algorithm for BN MLE
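Since the slide's details are not reproduced here, the following is a hedged sketch of the standard counting algorithm: for each node, the MLE of θ_{x|pa} is the empirical fraction Count(x, pa) / Count(pa). The tiny two-node net and the data are made up.

```python
# Sketch: MLE for a Bayes net decomposes into independent counting problems,
# one per (node, parent configuration): theta_{x|pa} = Count(x, pa) / Count(pa).
from collections import Counter

data = [  # samples over (A, B) for a tiny net A -> B, binary values (illustrative)
    {"A": 0, "B": 0}, {"A": 0, "B": 1}, {"A": 1, "B": 1},
    {"A": 1, "B": 1}, {"A": 0, "B": 0}, {"A": 1, "B": 0},
]
parents = {"A": [], "B": ["A"]}

def mle_cpts(data, parents, values=(0, 1)):
    cpts = {}
    for node, pas in parents.items():
        joint = Counter((tuple(d[p] for p in pas), d[node]) for d in data)
        pa_tot = Counter(tuple(d[p] for p in pas) for d in data)
        cpts[node] = {
            (pa, v): joint[(pa, v)] / pa_tot[pa]
            for pa in pa_tot for v in values
        }
    return cpts

cpts = mle_cpts(data, parents)
print(cpts["B"][((0,), 1)])   # P(B=1 | A=0) = 1/3
```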
22
MLE for Markov Nets
Log-likelihood of the data D = {x^(1), …, x^(m)}:
l(D | θ) = Σ_j Σ_i log φ_i(c_i^(j)) - m log Z(θ), where c_i^(j) is the assignment to clique C_i in sample x^(j)
23
Log-likelihood doesn’t decompose
The log-likelihood l(D | θ) is a concave function!
The log partition function log Z(θ) doesn't decompose
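The sketch below computes l(D | θ) for the toy chain by brute-force enumeration of Z(θ); it is only meant to show that every data term shares the same global log Z(θ), so changing any factor affects the whole objective. Potentials and data are illustrative.

```python
# Sketch: l(D | theta) = sum_j sum_i log phi_i(c_i^(j)) - m * log Z(theta),
# where the global log partition function couples all the factors.
import math
from itertools import product

phi_12 = {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}
phi_23 = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0}

def log_partition():
    return math.log(sum(phi_12[(x1, x2)] * phi_23[(x2, x3)]
                        for x1, x2, x3 in product([0, 1], repeat=3)))

def log_likelihood(data):
    log_z = log_partition()
    return sum(math.log(phi_12[(x1, x2)]) + math.log(phi_23[(x2, x3)]) - log_z
               for x1, x2, x3 in data)

data = [(0, 0, 0), (1, 1, 1), (0, 0, 1)]
print(log_likelihood(data))   # every data term carries the same global log Z
```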
24
Derivative of log-likelihood
25
Derivative of log-likelihood
26
Computing the derivative
Derivative of the log-likelihood involves the model's clique marginals P(C_i | θ)
Computing P(C_i | θ) requires inference!
Can optimize using conjugate gradient etc.
[Figure: graphical model over C, D, I, G, S, L, J, H]
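A minimal sketch of the gradient computation under the assumption that the parameters are the log-potentials θ_{i,c} = log φ_i(c): each partial derivative is an empirical clique count minus an expected clique count, and the expectation is obtained here by brute-force inference on a toy chain. The resulting gradient could then be fed to conjugate gradient or plain gradient ascent.

```python
# Sketch: with theta_{i,c} = log phi_i(c), the derivative of the log-likelihood is
#   d l / d theta_{i,c} = Count_D(C_i = c) - m * P_theta(C_i = c),
# so every gradient step needs the clique marginals P_theta(C_i), i.e. inference.
import math
from itertools import product

theta_12 = {c: 0.0 for c in product([0, 1], repeat=2)}   # log-potentials over (X1, X2)
theta_23 = {c: 0.0 for c in product([0, 1], repeat=2)}   # log-potentials over (X2, X3)

def joint_table():
    un = {x: math.exp(theta_12[x[:2]] + theta_23[x[1:]])
          for x in product([0, 1], repeat=3)}
    z = sum(un.values())
    return {x: p / z for x, p in un.items()}

def gradient(data):
    p = joint_table()                                     # <-- the inference step
    m = len(data)
    g12 = {c: sum(1 for x in data if x[:2] == c)
              - m * sum(px for x, px in p.items() if x[:2] == c)
           for c in theta_12}
    g23 = {c: sum(1 for x in data if x[1:] == c)
              - m * sum(px for x, px in p.items() if x[1:] == c)
           for c in theta_23}
    return g12, g23

data = [(0, 0, 0), (1, 1, 1), (0, 0, 1)]
g12, g23 = gradient(data)
print(g12[(0, 0)])   # 1.25 = empirical count 2 minus expected count 0.75 (uniform model)
```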
27
Alternative approach: Iterative Proportional Fitting (IPF)
At the optimum, it must hold that the model's clique marginals match the empirical marginals: P(C_i = c | θ) = P̂(C_i = c)
Solve the fixed-point equation: φ_i(c) ← φ_i(c) · P̂(C_i = c) / P(C_i = c | θ)
Must recompute parameters every iteration
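A sketch of one IPF sweep on the same toy chain, using brute-force marginals; the factor names, data, and update schedule are illustrative. Each factor is rescaled so that its clique marginal matches the empirical one, which is exactly the fixed-point condition above.

```python
# Sketch: one IPF sweep. Each factor is rescaled so its clique marginal matches
# the empirical marginal:  phi_i(c) <- phi_i(c) * P_hat(c) / P_theta(c).
from itertools import product

phi = {
    "12": {c: 1.0 for c in product([0, 1], repeat=2)},   # factor over (X1, X2)
    "23": {c: 1.0 for c in product([0, 1], repeat=2)},   # factor over (X2, X3)
}
slices = {"12": lambda x: x[:2], "23": lambda x: x[1:]}  # clique of each factor

def model_marginal(name):
    un = {x: phi["12"][x[:2]] * phi["23"][x[1:]] for x in product([0, 1], repeat=3)}
    z = sum(un.values())
    return {c: sum(p for x, p in un.items() if slices[name](x) == c) / z
            for c in phi[name]}

def empirical_marginal(name, data):
    return {c: sum(1 for x in data if slices[name](x) == c) / len(data)
            for c in phi[name]}

def ipf_sweep(data):
    for name in phi:                       # update one factor at a time
        p_hat = empirical_marginal(name, data)
        p_mod = model_marginal(name)       # requires inference, every iteration
        for c in phi[name]:
            if p_mod[c] > 0:
                phi[name][c] *= p_hat[c] / p_mod[c]

data = [(0, 0, 0), (1, 1, 1), (0, 0, 1), (1, 1, 0)]
for _ in range(20):
    ipf_sweep(data)
print(model_marginal("12"))   # converges toward the empirical marginal of (X1, X2)
```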
28
Parameter learning for log-linear models
Feature functions φ_i(C_i) defined over cliques
Log-linear model over undirected graph G:
Feature functions φ_1(C_1), …, φ_k(C_k); domains C_i can overlap
Joint distribution: P(x1, …, xn) = (1/Z(w)) exp( Σ_i w_i φ_i(C_i) )
How do we get the weights w_i?
29
Derivative of Log-likelihood 1
30
Derivative of Log-likelihood 2
31
Optimizing parameters
Gradient of the log-likelihood: ∂l/∂w_i = Σ_j φ_i(x^(j)) - m · E_w[φ_i]
Thus, w is the MLE ⇔ the gradient is zero, i.e., empirical feature counts equal expected feature counts
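Putting the pieces together, a hedged sketch of batch gradient ascent for log-linear weights; the two feature functions, the data, the learning rate, and the brute-force expectation step are all assumptions made for this toy example (in practice the same gradient would be handed to conjugate gradient or a similar optimizer, with approximate inference for the expectations).

```python
# Sketch: batch gradient ascent for log-linear weights, using
#   d l / d w_i = sum_j f_i(x_j) - m * E_w[ f_i(X) ],
# with the expectation computed by brute-force enumeration (toy sized).
import math
from itertools import product

# Two made-up binary features over x = (x1, x2, x3).
features = [
    lambda x: float(x[0] == x[1]),   # f_1: X1 and X2 agree
    lambda x: float(x[1] == x[2]),   # f_2: X2 and X3 agree
]

def expectations(w):
    un = {x: math.exp(sum(wi * f(x) for wi, f in zip(w, features)))
          for x in product([0, 1], repeat=3)}
    z = sum(un.values())
    return [sum(p / z * f(x) for x, p in un.items()) for f in features]

def fit(data, lr=0.1, steps=200):
    w = [0.0] * len(features)
    m = len(data)
    emp = [sum(f(x) for x in data) for f in features]      # empirical feature counts
    for _ in range(steps):
        exp_f = expectations(w)                             # inference step
        grad = [emp[i] - m * exp_f[i] for i in range(len(w))]
        w = [wi + lr * g for wi, g in zip(w, grad)]
    return w

data = [(0, 0, 0), (1, 1, 1), (0, 1, 1), (1, 1, 0)]
print(fit(data))   # roughly [1.10, 1.10] (= log 3): empirical and expected counts match
```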
32
Regularization of parameters
Put a prior on the parameters w
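Assuming the prior is a zero-mean Gaussian (equivalently, L2 regularization), the only change to the gradient from the previous sketch is an extra -λ·w_i term; the helper below is a hypothetical modification of that ascent step, not the lecture's own formulation.

```python
# Sketch: with a zero-mean Gaussian prior on w (L2 regularization), the objective
# becomes l(D; w) - (lam/2) * ||w||^2, so each gradient component picks up -lam * w_i.
def regularized_gradient(emp, exp_f, m, w, lam=1.0):
    """emp/exp_f: empirical and expected feature counts; m: number of samples."""
    return [emp[i] - m * exp_f[i] - lam * w[i] for i in range(len(w))]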
33
Summary: Parameter learning in MN
MLE in BN is easy (score decomposes)
MLE in MN requires inference (score doesn't decompose)
Can optimize using gradient ascent or IPF
34