Probabilistic Graphical Models

  1. Probabilistic Graphical Models. David Sontag, New York University. Lecture 2, February 2, 2012.

  2. Bayesian networks: reminder of last lecture. A Bayesian network is specified by a directed acyclic graph G = (V, E) with:
     (1) one node i ∈ V for each random variable X_i, and
     (2) one conditional probability distribution (CPD) per node, p(x_i | x_Pa(i)), specifying the variable's probability conditioned on its parents' values.
     This corresponds 1-1 with a particular factorization of the joint distribution:
         p(x_1, ..., x_n) = ∏_{i ∈ V} p(x_i | x_Pa(i))
     Bayesian networks are a powerful framework for designing algorithms to perform probability computations.

  3. Example. Consider the following Bayesian network over Difficulty (D), Intelligence (I), Grade (G), SAT (S), and Letter (L), with edges D → G, I → G, I → S, and G → L, and CPDs:
         p(D):  d^0 0.6,  d^1 0.4          p(I):  i^0 0.7,  i^1 0.3
         p(G | I, D):        g^1     g^2     g^3
             i^0, d^0        0.3     0.4     0.3
             i^0, d^1        0.05    0.25    0.7
             i^1, d^0        0.9     0.08    0.02
             i^1, d^1        0.5     0.3     0.2
         p(S | I):  i^0: s^0 0.95, s^1 0.05      i^1: s^0 0.2, s^1 0.8
         p(L | G):  g^1: l^0 0.1, l^1 0.9      g^2: l^0 0.4, l^1 0.6      g^3: l^0 0.99, l^1 0.01
     What is its joint distribution?
         p(x_1, ..., x_n) = ∏_{i ∈ V} p(x_i | x_Pa(i))
         p(d, i, g, s, l) = p(d) p(i) p(g | i, d) p(s | i) p(l | g)
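
A minimal Python sketch of this factorization, assuming the standard student-network CPTs as tabulated above (the variable encoding and names are my own, not from the slides):

    # CPTs of the student network (0/1-valued except Grade, which takes values 1..3).
    p_D = {0: 0.6, 1: 0.4}                      # p(Difficulty)
    p_I = {0: 0.7, 1: 0.3}                      # p(Intelligence)
    p_G = {                                     # p(Grade | Intelligence, Difficulty)
        (0, 0): [0.30, 0.40, 0.30],
        (0, 1): [0.05, 0.25, 0.70],
        (1, 0): [0.90, 0.08, 0.02],
        (1, 1): [0.50, 0.30, 0.20],
    }
    p_S = {0: [0.95, 0.05], 1: [0.2, 0.8]}      # p(SAT | Intelligence)
    p_L = {1: [0.10, 0.90], 2: [0.40, 0.60], 3: [0.99, 0.01]}  # p(Letter | Grade)

    def joint(d, i, g, s, l):
        """p(d, i, g, s, l) = p(d) p(i) p(g | i, d) p(s | i) p(l | g)."""
        return p_D[d] * p_I[i] * p_G[(i, d)][g - 1] * p_S[i][s] * p_L[g][l]

    print(joint(d=0, i=1, g=1, s=1, l=1))  # probability of one full assignment

Summing joint(...) over all 2 · 2 · 3 · 2 · 2 = 48 assignments returns 1, which is a quick sanity check on the tables.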

  4. D-separation ("directed separation") in Bayesian networks. An algorithm to decide whether X ⊥ Z | Y by looking at graph separation: check whether there is an active path between X and Z when the variables Y are observed. (Figure: two small example networks, (a) and (b).) If there is no such path, then X and Z are d-separated with respect to Y. D-separation reduces questions of statistical independence (hard) to questions of connectivity in graphs (easy). It is important because it allows us to quickly prune the Bayesian network, keeping just the variables relevant for answering a query.
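
One way to make this test concrete is the ancestral-moral-graph construction: X ⊥ Z | Y holds in G exactly when X and Z are disconnected after restricting to the ancestors of X ∪ Y ∪ Z, moralizing (marrying parents and dropping edge directions), and deleting the observed nodes Y. Below is a small sketch of that test; it is my own illustration, not code from the lecture.

    from collections import deque

    def ancestors(parents, nodes):
        """All ancestors of `nodes`, including the nodes themselves."""
        seen, stack = set(nodes), list(nodes)
        while stack:
            for p in parents.get(stack.pop(), []):
                if p not in seen:
                    seen.add(p)
                    stack.append(p)
        return seen

    def d_separated(parents, X, Z, Y):
        """True iff X and Z are d-separated given Y in the DAG described by
        `parents` (a dict mapping each node to the list of its parents)."""
        keep = ancestors(parents, set(X) | set(Z) | set(Y))
        # Moralize the ancestral subgraph: connect each node to its parents
        # and "marry" every pair of parents, ignoring edge directions.
        adj = {v: set() for v in keep}
        for v in keep:
            ps = [p for p in parents.get(v, []) if p in keep]
            for p in ps:
                adj[v].add(p)
                adj[p].add(v)
            for a in ps:
                for b in ps:
                    if a != b:
                        adj[a].add(b)
        # Delete the observed nodes, then look for any remaining path from X to Z.
        blocked = set(Y)
        frontier = deque(x for x in X if x not in blocked)
        reached = set(frontier)
        while frontier:
            v = frontier.popleft()
            if v in Z:
                return False  # an active path exists
            for u in adj[v] - blocked - reached:
                reached.add(u)
                frontier.append(u)
        return True

    # Student network: D -> G <- I, I -> S, G -> L.
    parents = {"G": ["D", "I"], "S": ["I"], "L": ["G"]}
    print(d_separated(parents, {"D"}, {"I"}, set()))   # True: D and I are marginally independent
    print(d_separated(parents, {"D"}, {"I"}, {"G"}))   # False: observing G activates the v-structure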

  5. Independence maps. Let I(G) be the set of all conditional independencies implied by the directed acyclic graph (DAG) G, and let I(p) denote the set of all conditional independencies that hold for the joint distribution p. A DAG G is an I-map (independence map) of a distribution p if I(G) ⊆ I(p). A fully connected DAG G is an I-map for any distribution, since I(G) = ∅ ⊆ I(p) for all p. G is a minimal I-map for p if the removal of even a single edge makes it no longer an I-map; a distribution may have several minimal I-maps, each corresponding to a specific node ordering. G is a perfect map (P-map) for p if I(G) = I(p).

  6. Equivalent structures. Different Bayesian network structures can be equivalent in that they encode precisely the same conditional independence assertions (and thus the same set of distributions). Which of these are equivalent? (Figure: four three-node structures over X, Y, Z, labeled (a)-(d).)

  7. Equivalent structures (continued). Which of these are equivalent? (Figure: two five-node structures over V, W, X, Y, Z.) A causal network is a Bayesian network with an explicit requirement that the relationships be causal. Bayesian networks are not the same as causal networks.

  8. What are some frequently used graphical models?

  9. Quick Medical Reference, decision theoretic (Miller et al. '86, Shwe et al. '91). A bipartite network with diseases d_1, ..., d_n as parents of findings f_1, ..., f_m. The joint distribution factors as
         p(f, d) = ∏_j p(d_j) ∏_i p(f_i | d)
     where p(d_j = 1) is the prior probability of having disease j. The model assumes the following independencies: d_i ⊥ d_j and f_i ⊥ f_j | d. Common findings can be caused by hundreds of diseases, so too many parameters would be required to specify the CPD p(f_i | d) as a table.

  10. Quick Medical Reference, decision theoretic (Miller et al. '86, Shwe et al. '91). Instead, we use a noisy-or parameterization:
         p(f_i = 0 | d) = (1 - q_i0) ∏_{j ∈ Pa(i)} (1 - q_ij)^{d_j}
     Here q_ij = p(f_i = 1 | d_j = 1) is the probability that disease j, if present, could alone cause the finding to have a positive outcome, and q_i0 is the "leak" probability: the probability that the finding is caused by something other than the diseases included in the model.
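
A tiny sketch of this parameterization (the numbers are my own toy values, not from the QMR model):

    def noisy_or_prob_positive(d, q, q0):
        """p(f_i = 1 | d) = 1 - (1 - q_i0) * prod_j (1 - q_ij)^{d_j},
        where d is the 0/1 vector over the parent diseases of finding i."""
        p_off = 1.0 - q0
        for dj, qij in zip(d, q):
            p_off *= (1.0 - qij) ** dj
        return 1.0 - p_off

    # A finding with three parent diseases, two of which are present.
    print(noisy_or_prob_positive(d=[1, 1, 0], q=[0.8, 0.3, 0.5], q0=0.01))

Note that the noisy-or CPD needs only |Pa(i)| + 1 numbers per finding instead of the 2^|Pa(i)| entries of a full table, which is exactly the point of the parameterization.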

  11. Hidden Markov models. (Figure: a chain Y_1 → Y_2 → ... → Y_6 of hidden states, each Y_t emitting an observation X_t.) Frequently used for speech recognition and part-of-speech tagging. The joint distribution factors as
         p(y, x) = p(y_1) p(x_1 | y_1) ∏_{t=2}^{T} p(y_t | y_{t-1}) p(x_t | y_t)
     where p(y_1) is the distribution for the starting state, p(y_t | y_{t-1}) is the transition probability between any two states, and p(x_t | y_t) is the emission probability. What are the conditional independencies here? For example, Y_1 ⊥ {Y_3, ..., Y_6} | Y_2.

  12. Hidden Markov models (continued). The joint distribution factors as
         p(y, x) = p(y_1) p(x_1 | y_1) ∏_{t=2}^{T} p(y_t | y_{t-1}) p(x_t | y_t)
     A homogeneous HMM uses the same parameters (β and α below) for every transition and emission distribution (parameter sharing):
         p(y, x) = p(y_1) α_{x_1, y_1} ∏_{t=2}^{T} β_{y_t, y_{t-1}} α_{x_t, y_t}
     How many parameters need to be learned? (See the sketch below.)
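
As a concrete (hypothetical) instance with K hidden states and V observation symbols, the joint above can be evaluated directly from the shared tables; counting free parameters gives K - 1 for the initial distribution, K(K - 1) for the transitions, and K(V - 1) for the emissions. A small sketch, with toy numbers of my own:

    import numpy as np

    def hmm_joint(pi, beta, alpha, y, x):
        """p(y, x) = p(y_1) * alpha[x_1, y_1] * prod_{t>=2} beta[y_t, y_{t-1}] * alpha[x_t, y_t],
        with pi[k] = p(y_1 = k), beta[k, k'] = p(y_t = k | y_{t-1} = k'),
        and alpha[v, k] = p(x_t = v | y_t = k)."""
        p = pi[y[0]] * alpha[x[0], y[0]]
        for t in range(1, len(y)):
            p *= beta[y[t], y[t - 1]] * alpha[x[t], y[t]]
        return p

    pi = np.array([0.6, 0.4])                 # K = 2 hidden states
    beta = np.array([[0.7, 0.4],              # transition table; each column sums to 1
                     [0.3, 0.6]])
    alpha = np.array([[0.5, 0.1],             # emission table, V = 3 symbols; each column sums to 1
                      [0.3, 0.2],
                      [0.2, 0.7]])
    print(hmm_joint(pi, beta, alpha, y=[0, 1, 1], x=[0, 2, 2]))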

  13. Mixture of Gaussians. The N-dimensional multivariate normal distribution N(µ, Σ) has density:
         p(x) = (2π)^{-N/2} |Σ|^{-1/2} exp( -(1/2) (x - µ)^T Σ^{-1} (x - µ) )
     Suppose we have k Gaussians given by µ_k and Σ_k, and a distribution θ over the numbers 1, ..., k. The mixture-of-Gaussians distribution p(y, x) is given by:
     (1) sample y ∼ θ (specifies which Gaussian to use), then
     (2) sample x ∼ N(µ_y, Σ_y).
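
A short sketch of this two-step sampling procedure (the means and covariances are toy values of my own choosing):

    import numpy as np

    rng = np.random.default_rng(0)

    theta = np.array([0.5, 0.3, 0.2])                       # mixing distribution over k = 3 components
    mus = [np.array([0.0, 0.0]), np.array([3.0, 3.0]), np.array([-3.0, 2.0])]
    Sigmas = [np.eye(2), 0.5 * np.eye(2), np.diag([1.0, 0.2])]

    def sample_mixture(n):
        ys = rng.choice(len(theta), size=n, p=theta)        # step 1: which Gaussian to use
        xs = np.stack([rng.multivariate_normal(mus[y], Sigmas[y]) for y in ys])  # step 2: x ~ N(mu_y, Sigma_y)
        return ys, xs

    ys, xs = sample_mixture(5)
    print(ys)
    print(xs)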

  14. Mixture of Gaussians. The marginal distribution over x looks like: (Figure: plot of the resulting marginal density of x.)

  15. Latent Dirichlet allocation (LDA). Topic models are powerful tools for exploring large data sets and for making inferences about the content of documents. (Figure: documents linked to topics such as religion, sports, and politics, each topic being a list of related words.) There are many applications in information retrieval, document summarization, and classification. (Figure: a new document is described by its words w_1, ..., w_N and a distribution θ over topics answering "What is this document about?", e.g. weather .50, finance .49, sports .01.) LDA is one of the simplest and most widely used topic models.

  16. Generative model for a document in LDA.
     (1) Sample the document's topic distribution θ (aka topic vector): θ ∼ Dirichlet(α_1:T), where the {α_t}, t = 1, ..., T, are fixed hyperparameters. Thus θ is a distribution over T topics with mean θ_t = α_t / Σ_{t'} α_{t'}.
     (2) For i = 1 to N, sample the topic z_i of the i'th word: z_i | θ ∼ θ.
     (3) ... and then sample the actual word w_i from the z_i'th topic: w_i | z_i ∼ β_{z_i}, where the {β_t}, t = 1, ..., T, are the topics (a fixed collection of distributions on words).
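
The three steps can be written directly as a sampler. The hyperparameters and topic matrix below are toy values of my own, included only to make the sketch runnable:

    import numpy as np

    rng = np.random.default_rng(0)

    alpha = np.array([0.5, 0.5, 0.5])     # Dirichlet hyperparameters for T = 3 topics
    beta = np.array([                     # beta[t, v] = p(word v | topic t), V = 4 words
        [0.60, 0.30, 0.05, 0.05],
        [0.05, 0.05, 0.60, 0.30],
        [0.25, 0.25, 0.25, 0.25],
    ])

    def generate_document(N):
        theta = rng.dirichlet(alpha)                              # step 1: topic distribution
        zs = rng.choice(len(alpha), size=N, p=theta)              # step 2: topic of each word
        ws = [rng.choice(beta.shape[1], p=beta[z]) for z in zs]   # step 3: the words themselves
        return theta, zs, ws

    print(generate_document(N=10))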

  17. Generative model for a document in LDA (continued).
     (1) Sample the document's topic distribution θ (aka topic vector): θ ∼ Dirichlet(α_1:T), where the {α_t}, t = 1, ..., T, are hyperparameters. The Dirichlet density is:
         p(θ_1, ..., θ_T) ∝ ∏_{t=1}^{T} θ_t^(α_t - 1)
     (Figure: two surface plots of log Pr(θ) over (θ_1, θ_2), for two different settings of α_1 = α_2.)
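
To see how the hyperparameters shape the density, it is enough to compare the unnormalized log-density Σ_t (α_t - 1) log θ_t at the center of the simplex and near a corner. The values below are my own toy choices:

    import numpy as np

    def dirichlet_log_density_unnormalized(theta, alpha):
        return np.sum((alpha - 1.0) * np.log(theta))

    center = np.array([1/3, 1/3, 1/3])
    corner = np.array([0.98, 0.01, 0.01])
    for a in (np.full(3, 5.0), np.full(3, 0.5)):
        print(a[0],
              dirichlet_log_density_unnormalized(center, a),
              dirichlet_log_density_unnormalized(corner, a))
    # With alpha > 1 the center scores higher (mass concentrates on even topic mixtures);
    # with alpha < 1 the corner scores higher (mass concentrates on sparse topic mixtures).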

  18. Generative model for a document in LDA (continued).
     (3) ... and then sample the actual word w_i from the z_i'th topic: w_i | z_i ∼ β_{z_i}, where the {β_t}, t = 1, ..., T, are the topics (a fixed collection of distributions on words), i.e. β_t = p(w | z = t).
     (Figure: example topics, each a distribution over words:
         politics: politics .0100, president .0095, obama .0090, washington .0085, religion .0060, ...
         religion: religion .0500, hindu .0092, judaism .0080, ethics .0075, buddhism .0016, ...
         sports: sports .0105, baseball .0100, soccer .0055, basketball .0050, football .0045, ...)

  19. Example of using LDA. (Figure from Blei, Introduction to Probabilistic Topic Models, 2011: topics β_1, ..., β_T are word distributions such as "gene .04, dna .02, genetic .01", "life .02, evolve .01, organism .01", "brain .04, neuron .02, nerve .01", and "data .02, number .02, computer .01"; each word of the example document carries a topic assignment z_1,d, ..., z_Nd,d, and θ_d gives the document's topic proportions.)
