

  1. Graphical Models
Léon Bottou, COS 424 – 4/15/2010

  2. Introduction
People like drawings better than equations.
– A graphical model is a diagram representing certain aspects of the algebraic structure of a probabilistic model.
Purposes
– Visualize the structure of a model.
– Investigate conditional independence properties.
– Some computations are more easily expressed on a graph than written as equations with complicated subscripts.

  3. Summary
I. Directed graphical models
II. Undirected graphical models
III. Inference in graphical models
More
– David Blei runs a complete course on graphical models.

  4. I. Directed graphical models
"Bayesian Networks" (Pearl 1988)

  5. A pattern for independence assumptions
Probability distribution: $P(x_1, x_2, x_3, x_4)$.
Bayesian chain theorem:
$$P(x_1, x_2, x_3, x_4) = P(x_1)\,P(x_2 \mid x_1)\,P(x_3 \mid x_1, x_2)\,P(x_4 \mid x_1, x_2, x_3)$$
Independence assumptions (dropping some conditioning variables):
$$P(x_1, x_2, x_3, x_4) = P(x_1)\,P(x_2 \mid x_1)\,P(x_3 \mid x_1, x_2)\,P(x_4 \mid x_1, x_2, x_3) = P(x_1)\,P(x_2 \mid x_1)\,P(x_3 \mid x_1)\,P(x_4 \mid x_1, x_2)$$
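To make the factorization concrete, here is a minimal sketch (not from the slides) that evaluates the reduced factorization over four binary variables using hypothetical conditional probability tables. The full chain rule would need $2^4 - 1 = 15$ free parameters; this factorization needs only $1 + 2 + 2 + 4 = 9$.

```python
import numpy as np

# Hypothetical conditional probability tables for the factorization
# P(x1,x2,x3,x4) = P(x1) P(x2|x1) P(x3|x1) P(x4|x1,x2), binary variables.
rng = np.random.default_rng(0)

p1 = np.array([0.6, 0.4])                          # P(x1)
p2_given_1 = rng.dirichlet([1, 1], size=2)         # P(x2|x1), rows sum to 1
p3_given_1 = rng.dirichlet([1, 1], size=2)         # P(x3|x1)
p4_given_12 = rng.dirichlet([1, 1], size=(2, 2))   # P(x4|x1,x2)

def joint(x1, x2, x3, x4):
    """Joint probability under the reduced factorization."""
    return (p1[x1] * p2_given_1[x1, x2]
            * p3_given_1[x1, x3] * p4_given_12[x1, x2, x4])

# The factored joint is a proper distribution: it sums to 1.
total = sum(joint(a, b, c, d)
            for a in (0, 1) for b in (0, 1) for c in (0, 1) for d in (0, 1))
print(f"sum over all configurations: {total:.6f}")  # ~1.0
```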

  6. Graphical representation
Bayesian chain theorem:
$$P(x_1, x_2, x_3, x_4) = P(x_1)\,P(x_2 \mid x_1)\,P(x_3 \mid x_1, x_2)\,P(x_4 \mid x_1, x_2, x_3)$$
[Figure: the fully connected directed acyclic graph on $x_1, x_2, x_3, x_4$, one arrow per conditioning variable.]
Arrows do not represent causality!

  7. Graphical representation
Independence assumptions:
$$P(x_1, x_2, x_3, x_4) = P(x_1)\,P(x_2 \mid x_1)\,P(x_3 \mid x_1)\,P(x_4 \mid x_1, x_2)$$
[Figure: the same graph with the arrows corresponding to dropped conditioning variables removed.]
Missing links represent independence assumptions.

  8. A more complicated example
$$P(x_1)\,P(x_2)\,P(x_3)\,P(x_4 \mid x_1, x_2)\,P(x_5 \mid x_1, x_2, x_3)\,P(x_6 \mid x_4)\,P(x_7 \mid x_4, x_5)$$
[Figure: the corresponding directed graph on $x_1, \dots, x_7$.]
Parametrization
The graph says nothing about the parametric form of the probabilities:
– Discrete distributions
– Continuous distributions

  9. Discrete distributions
Input $x = (x_1, x_2, \dots, x_d) \in \{0, 1\}^d$. Class $y \in \{A_1, \dots, A_k\}$.
General generative model: $P(x, y) = P(y)\,P(x \mid y)$
– $k$ parameters for $P(y)$
– $k\,2^d$ parameters for $P(x \mid y)$
Naïve Bayes model: $P(x, y) = P(y)\,P(x_1 \mid y) \cdots P(x_d \mid y)$
– $k$ parameters for $P(y)$
– $k d$ parameters for $P(x \mid y)$
[Figures: the generative graph $y \to x$, and the naïve Bayes graph with $y$ pointing to each $x_i$.]
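As a quick check on these counts, the sketch below (not from the slides) compares the two models as $d$ grows: the general table is exponential in $d$, naïve Bayes stays linear.

```python
# Parameter counts as stated on the slide (binary features, not adjusted
# for sum-to-one constraints).
def general_generative_params(k, d):
    return k + k * 2**d     # P(y) plus a full table P(x|y) per class

def naive_bayes_params(k, d):
    return k + k * d        # P(y) plus one Bernoulli per feature per class

for d in (10, 20, 30):
    print(d, general_generative_params(3, d), naive_bayes_params(3, d))
```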

  10. Discrete distributions
Naïve Bayes model: $P(x, y) = P(y)\,P(x_1 \mid y) \cdots P(x_d \mid y)$
– $\hat{y}(x) = \arg\max_y P(x, y)$
– $k$ parameters for $P(y)$.
– $k d$ parameters for $P(x \mid y)$.
– Fails when the $x_i$ are correlated!
Linear discriminant model: $P(x, y) = P(x)\,P(y \mid x)$
– $\hat{y}(x) = \arg\max_y P(x, y) = \arg\max_y P(y \mid x)$
– $k(d + 1)$ parameters for $P(y \mid x)$.
– $2^d$ unused parameters for $P(x)$.
– Works when the $x_i$ are correlated!
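A minimal sketch of the two decision rules, assuming hypothetical fitted quantities: tables `prior` and `bern` for naïve Bayes, and weights `W`, `b` for a linear discriminant in softmax form (the slide does not specify this parametrization).

```python
import numpy as np

# Hypothetical fitted quantities for k classes, d binary features.
k, d = 3, 5
rng = np.random.default_rng(1)
prior = rng.dirichlet(np.ones(k))        # P(y)
bern = rng.uniform(0.1, 0.9, (k, d))     # P(x_j = 1 | y) for naive Bayes
W = rng.normal(size=(k, d))              # linear discriminant weights
b = rng.normal(size=k)

def naive_bayes_predict(x):
    # argmax_y  log P(y) + sum_j log P(x_j | y)
    log_lik = x @ np.log(bern).T + (1 - x) @ np.log(1 - bern).T
    return np.argmax(np.log(prior) + log_lik)

def linear_discriminant_predict(x):
    # argmax_y P(y|x) with P(y|x) = softmax(Wx + b); argmax needs no softmax
    return np.argmax(W @ x + b)

x = rng.integers(0, 2, d).astype(float)
print(naive_bayes_predict(x), linear_discriminant_predict(x))
```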

  11. Continuous distributions
Linear regression
– Input $x = (x_1, x_2, \dots, x_d) \in \mathbb{R}^d$.
– Output $y \in \mathbb{R}$.
$$P(x, y) = P(y \mid x)\,P(x) \qquad P(y \mid x) \propto \exp\left( \frac{-1}{2\sigma^2} \left( y - w^\top x \right)^2 \right)$$
No need to model $P(x)$.
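Since $P(y \mid x)$ is Gaussian around $w^\top x$, maximizing the conditional log-likelihood over $w$ is exactly least squares, and $P(x)$ never enters. A short sketch on synthetic data (all numbers hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 100, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=n)   # Gaussian noise, sigma = 0.1

# Least-squares solution = conditional maximum-likelihood estimate of w.
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w_hat)  # close to w_true
```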

  12. Bayesian regression
Consider a dataset $\mathcal{D} = \{(x_1, y_1), \dots, (x_n, y_n)\}$.
$$P(\mathcal{D}, w) = P(w)\,P(\mathcal{D} \mid w) = P(w) \prod_{i=1}^{n} P(y_i \mid x_i, w)\,P(x_i)$$
[Figure: plate diagram with the node $w$ outside a plate that repeats $(x_i, y_i)$ for $i = 1 \dots n$.]
Plates represent repeated subgraphs. Although the parameter $w$ is explicit, other details about the distributions are not.
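A sketch of the corresponding log-joint, assuming a Gaussian prior $P(w) = \mathcal{N}(0, \tau^2 I)$ and Gaussian noise (neither is specified on the slide). The $\sum_i \log P(x_i)$ term is constant in $w$, so it can be dropped when optimizing or sampling over $w$; maximizing what remains gives the MAP estimate, which with these choices is ridge regression.

```python
import numpy as np

def log_joint(w, X, y, sigma=0.1, tau=1.0):
    """log P(D, w) up to the w-independent term sum_i log P(x_i)."""
    log_prior = -0.5 * np.sum(w**2) / tau**2      # Gaussian prior P(w)
    resid = y - X @ w
    log_lik = -0.5 * np.sum(resid**2) / sigma**2  # prod_i P(y_i | x_i, w)
    return log_prior + log_lik
```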

  13. Hidden Markov Models
$$P(x_1 \dots x_T, s_1 \dots s_T) = P(s_1)\,P(x_1 \mid s_1)\,P(s_2 \mid s_1)\,P(x_2 \mid s_2) \cdots P(s_T \mid s_{T-1})\,P(x_T \mid s_T)$$
[Figures: the unrolled chain $s_1 \to s_2 \to \dots \to s_T$ with emissions $s_t \to x_t$, and a second graph for comparison.]
What is the relation between this graph and that graph?
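The factorization translates line by line into code. A sketch with hypothetical transition and emission tables for two hidden states and binary observations:

```python
import numpy as np

pi = np.array([0.7, 0.3])               # P(s_1)
A = np.array([[0.9, 0.1], [0.2, 0.8]])  # A[i, j] = P(s_{t+1}=j | s_t=i)
B = np.array([[0.8, 0.2], [0.3, 0.7]])  # B[i, o] = P(x_t=o | s_t=i)

def log_joint(states, obs):
    """log P(x_1..x_T, s_1..s_T), straight from the factorization."""
    lp = np.log(pi[states[0]]) + np.log(B[states[0], obs[0]])
    for t in range(1, len(states)):
        lp += np.log(A[states[t-1], states[t]]) + np.log(B[states[t], obs[t]])
    return lp

print(log_joint([0, 0, 1], [0, 1, 1]))
```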

  14. Conditional independence patterns (1)
Tail-to-tail: $P(a, b, c) = P(a \mid c)\,P(b \mid c)\,P(c)$
[Figure: $a \leftarrow c \rightarrow b$.]
Marginally:
$$P(a, b) = \sum_c P(a \mid c)\,P(b \mid c)\,P(c) \neq P(a)\,P(b) \text{ in general} \quad\Rightarrow\quad a \not\perp\!\!\!\perp b \mid \emptyset$$
Conditionally:
$$P(a, b \mid c) = P(a, b, c) / P(c) = P(a \mid c)\,P(b \mid c) \quad\Rightarrow\quad a \perp\!\!\!\perp b \mid c$$
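The independence statements on this slide and the next two can be checked numerically. Below is a small helper (a sketch, not part of the lecture) that tests $a \perp\!\!\!\perp b \mid \emptyset$ and $a \perp\!\!\!\perp b \mid c$ on any $2 \times 2 \times 2$ joint table, applied here to a tail-to-tail joint built from hypothetical tables:

```python
import numpy as np

def indep_given_c(P, tol=1e-9):
    """Check a independent of b given c, for a 2x2x2 joint array P[a, b, c]."""
    for c in (0, 1):
        Pc = P[:, :, c] / P[:, :, c].sum()                 # P(a, b | c)
        outer = Pc.sum(1)[:, None] * Pc.sum(0)[None, :]    # P(a|c) P(b|c)
        if not np.allclose(Pc, outer, atol=tol):
            return False
    return True

def indep_marginal(P, tol=1e-9):
    """Check a independent of b marginally."""
    Pab = P.sum(2)                                         # P(a, b)
    outer = Pab.sum(1)[:, None] * Pab.sum(0)[None, :]      # P(a) P(b)
    return np.allclose(Pab, outer, atol=tol)

# Tail-to-tail joint: P(a,b,c) = P(a|c) P(b|c) P(c), hypothetical numbers.
pc = np.array([0.4, 0.6])
pa_c = np.array([[0.9, 0.1], [0.2, 0.8]])   # pa_c[c, a] = P(a|c)
pb_c = np.array([[0.7, 0.3], [0.1, 0.9]])   # pb_c[c, b] = P(b|c)
P = np.einsum('ca,cb,c->abc', pa_c, pb_c, pc)
print(indep_marginal(P), indep_given_c(P))   # False, True
```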

  15. Conditional independence patterns (2)
Head-to-tail: $P(a, b, c) = P(a)\,P(c \mid a)\,P(b \mid c)$
[Figure: $a \rightarrow c \rightarrow b$.]
Marginally:
$$P(a, b) = \sum_c P(a)\,P(c \mid a)\,P(b \mid c) = P(a) \sum_c P(b, c \mid a) = P(a)\,P(b \mid a) \neq P(a)\,P(b) \text{ in general} \quad\Rightarrow\quad a \not\perp\!\!\!\perp b \mid \emptyset$$
Conditionally:
$$P(a, b \mid c) = P(a, b, c) / P(c) = P(a, c)\,P(b \mid c) / P(c) = P(a \mid c)\,P(b \mid c) \quad\Rightarrow\quad a \perp\!\!\!\perp b \mid c$$

  16. Conditional independence patterns (3)
Head-to-head: $P(a, b, c) = P(a)\,P(b)\,P(c \mid a, b)$
[Figure: $a \rightarrow c \leftarrow b$.]
Marginally:
$$P(a, b) = \sum_c P(a)\,P(b)\,P(c \mid a, b) = P(a)\,P(b) \quad\Rightarrow\quad a \perp\!\!\!\perp b \mid \emptyset$$
Conditionally:
$$P(a, b \mid c) \neq P(a \mid c)\,P(b \mid c) \text{ in general} \quad\Rightarrow\quad a \not\perp\!\!\!\perp b \mid c$$
Example:
– c = "the house is shaking"
– a = "there is an earthquake"
– b = "a truck hits the house"
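The earthquake/truck example can be made quantitative. With hypothetical numbers for $P(a)$, $P(b)$, and $P(c = 1 \mid a, b)$, the sketch below shows "explaining away": learning that a truck hit the house sharply lowers the probability of an earthquake, even though $a$ and $b$ are marginally independent.

```python
import numpy as np

# Hypothetical numbers: a = earthquake, b = truck, c = house shaking.
pa, pb = 0.01, 0.1
pc = np.array([[0.001, 0.9],   # P(c=1 | a=0, b=0), P(c=1 | a=0, b=1)
               [0.95,  0.99]]) # P(c=1 | a=1, b=0), P(c=1 | a=1, b=1)

# Joint P(a, b, c=1), then the conditional P(a, b | c=1).
joint = np.array([[(1-pa)*(1-pb), (1-pa)*pb],
                  [pa*(1-pb),     pa*pb]]) * pc
cond = joint / joint.sum()

p_a1_given_c = cond[1].sum()                      # P(a=1 | c=1)
p_a1_given_cb = cond[1, 1] / cond[:, 1].sum()     # P(a=1 | c=1, b=1)
print(f"P(a=1|c=1)     = {p_a1_given_c:.3f}")     # ~0.096
print(f"P(a=1|c=1,b=1) = {p_a1_given_cb:.3f}")    # ~0.011: truck explains it away
```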

  17. D-separation
Problem
– Consider three disjoint sets of nodes: $A$, $B$, $C$.
– When do we have $A \perp\!\!\!\perp B \mid C$?
Definition
$A$ and $B$ are d-separated by $C$ if all paths from $a \in A$ to $b \in B$
– contain a head-to-tail or tail-to-tail node $c \in C$, or
– contain a head-to-head node $c$ such that neither $c$ nor any of its descendants belongs to $C$.
Theorem
$A$ and $B$ are d-separated by $C \iff A \perp\!\!\!\perp B \mid C$
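The definition can be turned into an algorithm by searching for an active trail. The sketch below follows the reachability procedure popularized by Koller and Friedman (this particular implementation is mine, not the lecture's), and reproduces the head-to-head pattern of slide 16:

```python
def d_separated(parents, A, B, C):
    """parents: dict node -> list of parents. True iff A and B are d-separated by C."""
    children = {n: [] for n in parents}
    for n, ps in parents.items():
        for p in ps:
            children[p].append(n)
    # C together with its ancestors (needed for the head-to-head rule).
    anc, stack = set(), list(C)
    while stack:
        n = stack.pop()
        if n not in anc:
            anc.add(n)
            stack.extend(parents[n])
    # Search over (node, direction): 'up' = trail arrives from a child,
    # 'down' = trail arrives from a parent.
    visited, frontier = set(), [(a, 'up') for a in A]
    while frontier:
        node, direction = frontier.pop()
        if (node, direction) in visited:
            continue
        visited.add((node, direction))
        if node not in C and node in B:
            return False                      # found an active trail into B
        if direction == 'up' and node not in C:
            frontier += [(p, 'up') for p in parents[node]]
            frontier += [(c, 'down') for c in children[node]]
        elif direction == 'down':
            if node not in C:                 # head-to-tail: trail stays active
                frontier += [(c, 'down') for c in children[node]]
            if node in anc:                   # head-to-head: active when node
                frontier += [(p, 'up') for p in parents[node]]  # or a descendant is in C
    return True

# Head-to-head graph a -> c <- b:
g = {'a': [], 'b': [], 'c': ['a', 'b']}
print(d_separated(g, {'a'}, {'b'}, set()))    # True:  a indep. of b marginally
print(d_separated(g, {'a'}, {'b'}, {'c'}))    # False: conditioning on c couples them
```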

  18. II. Undirected graphical models
"Markov Random Fields"

  19. Another independence assumption pattern
Boltzmann distribution
$$P(x) = \frac{1}{Z} \exp(-E(x)) \qquad \text{with} \qquad Z = \sum_x \exp(-E(x))$$
– The function $E(x)$ is called the energy function.
– The quantity $Z$ is called the partition function.
Markov Random Field
– Let $\{x_C\}$ be a family of subsets of the variables $x$.
– The distribution $P(x)$ is a Markov Random Field with cliques $\{x_C\}$ if there are functions $E_C(x_C)$ such that $E(x) = \sum_C E_C(x_C)$.
Equivalently, $P(x) = \frac{1}{Z} \prod_C \Psi_C(x_C)$ with $\Psi_C(x_C) = \exp(-E_C(x_C)) > 0$.
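A tiny concrete MRF, with pairwise cliques and an Ising-style energy chosen purely for illustration (the slide does not fix any particular $E_C$). The partition function is computed by brute-force enumeration, which is only viable for small models:

```python
import itertools
import numpy as np

def energy(x, pairs, J=1.0):
    # E(x) = sum_C E_C(x_C), with E_C(x_i, x_j) = -J if x_i == x_j else +J
    return sum(-J if x[i] == x[j] else J for i, j in pairs)

pairs = [(0, 1), (1, 2), (2, 3), (3, 0)]     # a 4-cycle of pairwise cliques
configs = list(itertools.product([0, 1], repeat=4))
Z = sum(np.exp(-energy(x, pairs)) for x in configs)   # partition function

def prob(x):
    return np.exp(-energy(x, pairs)) / Z

print(prob((0, 0, 0, 0)), prob((0, 1, 0, 1)))  # aligned beats alternating
```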

  20. Graphical representation
$$P(x_1, x_2, x_3, x_4, x_5) = \frac{1}{Z}\,\Psi_1(x_1, x_2)\,\Psi_2(x_2, x_3)\,\Psi_3(x_3, x_4, x_5)$$
[Figure: the undirected graph with edges $x_1 x_2$, $x_2 x_3$, and the triangle $x_3 x_4 x_5$.]
– Completely connect the nodes belonging to each $x_C$.
– Each subset $x_C$ forms a clique of the graph.

  21. Markov blanket
Definition
– The Markov blanket of a variable $x_i$ is the minimal subset $B_i$ of the variables $x$ such that $P(x_i \mid x \setminus x_i) = P(x_i \mid B_i)$.
Example
$$P(x_3 \mid x_1, x_2, x_4, x_5) = \frac{\Psi_1(x_1, x_2)\,\Psi_2(x_2, x_3)\,\Psi_3(x_3, x_4, x_5)}{\sum_{x_3'} \Psi_1(x_1, x_2)\,\Psi_2(x_2, x_3')\,\Psi_3(x_3', x_4, x_5)} = \frac{\Psi_2(x_2, x_3)\,\Psi_3(x_3, x_4, x_5)}{\sum_{x_3'} \Psi_2(x_2, x_3')\,\Psi_3(x_3', x_4, x_5)} = P(x_3 \mid x_2, x_4, x_5)$$
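The example can be verified numerically: with random positive potentials on the cliques $\{x_1, x_2\}$, $\{x_2, x_3\}$, $\{x_3, x_4, x_5\}$, the conditional of $x_3$ does not change when $x_1$ is flipped, because $\Psi_1$ cancels in the normalization. A sketch with hypothetical potentials:

```python
import numpy as np

rng = np.random.default_rng(3)
psi1 = rng.uniform(0.5, 2.0, (2, 2))        # Psi1(x1, x2)
psi2 = rng.uniform(0.5, 2.0, (2, 2))        # Psi2(x2, x3)
psi3 = rng.uniform(0.5, 2.0, (2, 2, 2))     # Psi3(x3, x4, x5)

# Unnormalized joint p[x1, x2, x3, x4, x5]; Z cancels in conditionals.
p = np.einsum('ab,bc,cde->abcde', psi1, psi2, psi3)

def cond_x3(x1, x2, x4, x5):
    slice_ = p[x1, x2, :, x4, x5]
    return slice_ / slice_.sum()

# x1 is outside the blanket of x3: flipping it leaves the conditional unchanged.
print(cond_x3(0, 1, 0, 1), cond_x3(1, 1, 0, 1))   # identical vectors
```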

  22. Graph and Markov blanket
The Markov blanket of an MRF variable is the set of its neighbors:
$$P(x_3 \mid x_1, x_2, x_4, x_5) = P(x_3 \mid x_2, x_4, x_5)$$
[Figure: the undirected graph, with the neighbors of $x_3$ highlighted.]
Consequence
– Consider three disjoint sets of nodes: $A$, $B$, $C$.
$$A \perp\!\!\!\perp B \mid C \iff \text{any path between } a \in A \text{ and } b \in B \text{ passes through a node } c \in C.$$
Conversely (Hammersley–Clifford theorem)
– Any distribution that satisfies such properties with respect to an undirected graph is a Markov Random Field.

  23. Directed vs. undirected graphs
Consider a directed graph:
$$P(x) = \underbrace{P(x_1)}_{\Psi_1(x_1)} \, \underbrace{P(x_2)}_{\Psi_2(x_2)} \, \underbrace{P(x_3 \mid x_1, x_2)}_{\Psi_3(x_1, x_2, x_3)} \, \underbrace{P(x_4 \mid x_2)}_{\Psi_4(x_2, x_4)} \qquad (Z = 1)$$
[Figure: the directed graph and its moralized undirected counterpart.]
The opposite inclusion does not hold, because the undirected graph marries the parents of $x_3$ with a moralization link.
Directed and undirected graphs represent different sets of distributions. Neither set is included in the other one.
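Moralization is mechanical: keep every parent-child edge and connect ("marry") the co-parents of each node. A sketch on the slide's example graph (node names assumed):

```python
def moralize(parents):
    """Return the edge set of the moral graph of a DAG given as parent lists."""
    edges = set()
    for n, ps in parents.items():
        for p in ps:
            edges.add(frozenset((n, p)))      # keep parent-child links
        for i, p in enumerate(ps):            # marry co-parents
            for q in ps[i+1:]:
                edges.add(frozenset((p, q)))
    return edges

# The slide's example: x3 has parents x1 and x2, x4 has parent x2.
g = {'x1': [], 'x2': [], 'x3': ['x1', 'x2'], 'x4': ['x2']}
print(sorted(tuple(sorted(e)) for e in moralize(g)))
# [('x1', 'x2'), ('x1', 'x3'), ('x2', 'x3'), ('x2', 'x4')]
```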

  24. Example: image denoising
Noise model: randomly flipping a small proportion of the pixels.
Image model: pixel distribution given its four neighbors.
[Figure: a noisy binary image and the grid-shaped MRF connecting observed and reconstructed pixels.]
Inference problem
– Given the observed noisy pixels, reconstruct the true pixel distributions.
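One classic way to attack this inference problem is iterated conditional modes (ICM) on an Ising-style energy; the slide states the problem, not this particular algorithm, and the coupling strengths below are arbitrary. A sketch with ±1 pixels:

```python
import numpy as np

def icm_denoise(y, beta=2.0, eta=1.5, sweeps=5):
    """Greedy minimization of E(x) = -beta sum x_i x_j - eta sum x_i y_i."""
    x = y.copy()
    h, w = x.shape
    for _ in range(sweeps):
        for i in range(h):
            for j in range(w):
                # Local field from the four neighbors (image model)
                # plus the observed pixel (noise model).
                nb = sum(x[a, b] for a, b in ((i-1, j), (i+1, j), (i, j-1), (i, j+1))
                         if 0 <= a < h and 0 <= b < w)
                x[i, j] = 1 if beta * nb + eta * y[i, j] > 0 else -1
    return x

rng = np.random.default_rng(4)
clean = np.ones((32, 32), dtype=int)
clean[8:24, 8:24] = -1                                  # a dark square
noisy = clean * rng.choice([1, -1], clean.shape, p=[0.9, 0.1])  # flip ~10%
restored = icm_denoise(noisy)
print((restored != clean).mean(), "error rate after ICM")
```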
