Natural Language Processing (CSE 517): Graphical Models, Noah Smith


  1. Natural Language Processing (CSE 517): Graphical Models
     Noah Smith, © 2016, University of Washington, nasmith@cs.washington.edu
     February 8–10, 2016 (slide 1 / 77)

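  2. Notation
     Let $\boldsymbol{V} = \langle V_1, V_2, \ldots, V_\ell \rangle$ be a collection of random variables (not necessarily a sequence). $\mathrm{Val}(V)$ will denote the values of a r.v. $V$. $\boldsymbol{V}_I$ denotes the subset of the r.v.s $\boldsymbol{V}$ with indices $i \in I$, and $\boldsymbol{V}_{\neg I} = \boldsymbol{V} \setminus \boldsymbol{V}_I$.
     Recall:
     ◮ $p(\boldsymbol{V}) = \prod_{i=1}^{\ell} p(V_i \mid V_1, \ldots, V_{i-1})$ (always true, for any ordering)
     ◮ $p(\boldsymbol{V}_I, \boldsymbol{V}_J \mid \boldsymbol{V}_K) = p(\boldsymbol{V}_I \mid \boldsymbol{V}_K) \cdot p(\boldsymbol{V}_J \mid \boldsymbol{V}_K)$ if and only if $\boldsymbol{V}_I \perp \boldsymbol{V}_J \mid \boldsymbol{V}_K$ (conditional independence)
     ◮ $p(\boldsymbol{V}_I = \boldsymbol{v}_I) = \sum_{\boldsymbol{v}_{\neg I} \in \mathrm{Val}(\boldsymbol{V}_{\neg I})} p(\boldsymbol{V}_I = \boldsymbol{v}_I, \boldsymbol{V}_{\neg I} = \boldsymbol{v}_{\neg I})$ (marginalization)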

  3. Factor Graphs
     Two kinds of vertices:
     ◮ Random variables (denoted by circles, “$V_i$”)
     ◮ Factors (denoted by squares, “$f_j$”)
     The graph is bipartite; every edge connects some variable to some factor. Let $I_j \subseteq \{1, \ldots, \ell\}$ be the set of indices of the variables that $f_j$ is connected to. Factor $f_j$ defines a map $\mathrm{Val}(\boldsymbol{V}_{I_j}) \to \mathbb{R}_{\ge 0}$. The graph and factors define a probability distribution:
     $p(\boldsymbol{V} = \boldsymbol{v}) \propto \prod_j f_j(\boldsymbol{v}_{I_j})$
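
The definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three binary variables, the two factor tables, and the names `score` and `joint` are hypothetical and do not come from the slides. Each factor is stored as a pair of its scope (the variables indexed by $I_j$) and a table of nonnegative values; the distribution is the normalized product of the factors.

```python
from itertools import product
from math import prod

# Hypothetical toy graph: three binary variables and two factors.
variables = ["V1", "V2", "V3"]
factors = [
    # (scope, table): the table maps an assignment of the scope to a value >= 0.
    (("V1", "V2"), {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}),
    (("V2", "V3"), {(0, 0): 1.0, (0, 1): 4.0, (1, 0): 2.0, (1, 1): 1.0}),
]

def score(assignment):
    """Unnormalized score: the product over factors of f_j(v_{I_j})."""
    return prod(table[tuple(assignment[v] for v in scope)]
                for scope, table in factors)

def joint():
    """Normalize the scores over all assignments to obtain p(V = v)."""
    assignments = [dict(zip(variables, values))
                   for values in product([0, 1], repeat=len(variables))]
    z = sum(score(a) for a in assignments)
    return {tuple(a.values()): score(a) / z for a in assignments}

distribution = joint()
```

Enumerating every assignment is exponential in the number of variables, so this form is useful only for checking small examples by hand.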

  4. Factor Graphs We’ve Seen Before
     Hidden Markov model: [figure: factor graph over labels $y_0, \ldots, y_5$ and observations $x_1, \ldots, x_4$]
     General first-order sequence model: [figure: factor graph over labels $y_0, \ldots, y_5$ and a single input node $x$]

  5. Two Kinds of Factors
     Conditional probability tables. E.g., if $I_j = \{1, 2, 3\}$:
     $f_j(v_1, v_2, v_3) = p(V_3 = v_3 \mid V_1 = v_1, V_2 = v_2)$
     Lead to Bayesian networks (with some constraints).
     Potential functions (arbitrary nonnegative values). Lead to Markov random fields (a.k.a. Markov networks).

  6. Yucky Bayesian Network
     [figure: nodes Influenza, Allergies, Sinus Inflamm., Runny Nose, Headache]
     Sinus inflammation is caused by flu, but also by allergies. Runny nose and headache are both caused by sinus inflammation.

  7. Yucky Factor Graph
     [figure: nodes Influenza, Allergies, Sinus Inflamm., Runny Nose, Headache]
     Sinus inflammation is caused by flu, but also by allergies. Runny nose and headache are both caused by sinus inflammation.

  8. Yucky Factor Graph
     [figure: nodes Influenza, Allergies, Sinus Inflamm., Runny Nose, Headache, with one table per factor: $f_{S,I,A}$ (8 rows over $S, I, A$), $f_{R,S}$ and $f_{H,S}$ (4 rows each), $f_I$ and $f_A$ (2 rows each); the tables list the binary assignments only, and the value columns are blank]

  9. Yucky Factor Graph
     [figure: the Bayesian network and the factor graph side by side, over Influenza, Allergies, Sinus Inflamm., Runny Nose, Headache; factor tables as on slide 8]
     $p(i, a, s, r, h) = f_I(i) \cdot f_A(a) \cdot f_{S,I,A}(s, i, a) \cdot f_{R,S}(r, s) \cdot f_{H,S}(h, s)$
     $\phantom{p(i, a, s, r, h)} = p(i) \cdot p(a) \cdot p(s \mid i, a) \cdot p(r \mid s) \cdot p(h \mid s)$
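
One consequence of the factorization above is that when every factor is a conditional probability table, the product already sums to one and needs no normalization. The sketch below checks this numerically. The probabilities are hypothetical (the slides leave the table entries blank), and `bernoulli` and `joint` are illustrative names.

```python
from itertools import product

# Hypothetical CPT entries; each stores the probability that the variable is 1.
p_i1 = 0.1                                    # p(I = 1)
p_a1 = 0.2                                    # p(A = 1)
p_s1 = {(0, 0): 0.05, (0, 1): 0.6,
        (1, 0): 0.7, (1, 1): 0.9}             # p(S = 1 | i, a)
p_r1 = {0: 0.1, 1: 0.8}                       # p(R = 1 | s)
p_h1 = {0: 0.05, 1: 0.6}                      # p(H = 1 | s)

def bernoulli(p_one, x):
    """Probability of outcome x in {0, 1} when p(x = 1) = p_one."""
    return p_one if x == 1 else 1.0 - p_one

def joint(i, a, s, r, h):
    """p(i) * p(a) * p(s | i, a) * p(r | s) * p(h | s)."""
    return (bernoulli(p_i1, i) * bernoulli(p_a1, a)
            * bernoulli(p_s1[i, a], s)
            * bernoulli(p_r1[s], r) * bernoulli(p_h1[s], h))

# Sum over all 32 assignments: equals 1 up to floating-point error.
total = sum(joint(*values) for values in product([0, 1], repeat=5))
```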

  10. Naughty Markov Random Field
      [figure: nodes Adrian, Dana, Brook, Chris]
      Independencies: $A \perp C \mid B, D$; $B \perp D \mid A, C$; $\neg(A \perp C)$; $\neg(B \perp D)$

  11. Naughty Factor Graph
      [figure: nodes Adrian, Dana, Brook, Chris, with one table per factor: $f_{A,B}$, $f_{B,C}$, $f_{C,D}$, $f_{D,A}$ (4 rows each; value columns blank)]
      $p(a, b, c, d) = \dfrac{f_{A,B}(a, b) \cdot f_{B,C}(b, c) \cdot f_{C,D}(c, d) \cdot f_{D,A}(d, a)}{\sum_{a' \in \mathrm{Val}(A)} \sum_{b' \in \mathrm{Val}(B)} \sum_{c' \in \mathrm{Val}(C)} \sum_{d' \in \mathrm{Val}(D)} f_{A,B}(a', b') \cdot f_{B,C}(b', c') \cdot f_{C,D}(c', d') \cdot f_{D,A}(d', a')}$

  12. Assignment Probabilities: Examples
      [figure: nodes Adrian, Dana, Brook, Chris]
      A B  f_{A,B}     B C  f_{B,C}     C D  f_{C,D}     D A  f_{D,A}
      0 0     30       0 0    100       0 0      1       0 0    100
      0 1      5       0 1      1       0 1    100       0 1      1
      1 0      1       1 0      1       1 0    100       1 0      1
      1 1     10       1 1    100       1 1      1       1 1    100

  13. Assignment Probabilities: Examples
      [figure and factor tables as on slide 12]
      $\sum_{a' \in \mathrm{Val}(A)} \sum_{b' \in \mathrm{Val}(B)} \sum_{c' \in \mathrm{Val}(C)} \sum_{d' \in \mathrm{Val}(D)} f_{A,B}(a', b') \cdot f_{B,C}(b', c') \cdot f_{C,D}(c', d') \cdot f_{D,A}(d', a') = 7{,}201{,}840$

  14. Assignment Probabilities: Examples
      [figure and factor tables as on slide 12]
      $p(A = 0, B = 1, C = 1, D = 0) = \dfrac{5{,}000{,}000}{7{,}201{,}840} \approx 0.69$

  15. Assignment Probabilities: Examples
      [figure and factor tables as on slide 12]
      $p(A = 1, B = 1, C = 0, D = 0) = \dfrac{10}{7{,}201{,}840} \approx 0.0000014$
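
The three numbers in these examples can be reproduced by brute-force enumeration. The sketch below copies the factor tables from slide 12; the function names `score` and `prob` are illustrative, and the dictionary keys are assumed to follow each factor's argument order (so `f_da` is indexed by $(d, a)$).

```python
from itertools import product

# Factor tables from slide 12; keys follow each factor's argument order.
f_ab = {(0, 0): 30, (0, 1): 5, (1, 0): 1, (1, 1): 10}
f_bc = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}
f_cd = {(0, 0): 1, (0, 1): 100, (1, 0): 100, (1, 1): 1}
f_da = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}

def score(a, b, c, d):
    """Unnormalized score: the product of the four pairwise factors."""
    return f_ab[a, b] * f_bc[b, c] * f_cd[c, d] * f_da[d, a]

# Normalization constant: the sum of scores over all 16 assignments.
z = sum(score(*values) for values in product([0, 1], repeat=4))

def prob(a, b, c, d):
    """Probability of one full assignment."""
    return score(a, b, c, d) / z

print(z)                           # 7201840
print(round(prob(0, 1, 1, 0), 2))  # 0.69
print(f"{prob(1, 1, 0, 0):.7f}")   # 0.0000014
```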

  16. Structure and Independence
      Bayesian networks:
      ◮ A variable is conditionally independent of its non-descendants given its parents.
      Markov networks:
      ◮ Conditional independence derived from “Markov blanket” and separation properties.
      Local configurations can be used to check all conditional independence questions; almost no need to look at the values in the factors!
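
Although the graph structure alone settles these questions, a numerical check can be reassuring. The sketch below uses the factor values from slide 12 to confirm that $A \perp C \mid B, D$ holds while $A$ and $C$ are not marginally independent, as stated on slide 10. The helper `prob` and the cross-multiplied form of the test are choices made for this illustration, not taken from the slides.

```python
from itertools import product
from math import isclose

# Factor tables from slide 12; keys follow each factor's argument order.
f_ab = {(0, 0): 30, (0, 1): 5, (1, 0): 1, (1, 1): 10}
f_bc = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}
f_cd = {(0, 0): 1, (0, 1): 100, (1, 0): 100, (1, 1): 1}
f_da = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}

states = list(product([0, 1], repeat=4))
scores = {(a, b, c, d): f_ab[a, b] * f_bc[b, c] * f_cd[c, d] * f_da[d, a]
          for a, b, c, d in states}
z = sum(scores.values())
names = "abcd"

def prob(**fixed):
    """Marginal probability that the named variables take the given values."""
    return sum(s for state, s in scores.items()
               if all(state[names.index(k)] == v for k, v in fixed.items())) / z

# A and C independent given B, D, written without division:
# p(a, b, c, d) * p(b, d) == p(a, b, d) * p(c, b, d) for every assignment.
cond_indep = all(
    isclose(prob(a=a, b=b, c=c, d=d) * prob(b=b, d=d),
            prob(a=a, b=b, d=d) * prob(c=c, b=b, d=d), rel_tol=1e-9)
    for a, b, c, d in states)

# Marginal independence would require p(a, c) == p(a) * p(c) for every a, c.
marg_indep = all(
    isclose(prob(a=a, c=c), prob(a=a) * prob(c=c), rel_tol=1e-9)
    for a, c in product([0, 1], repeat=2))

print(cond_indep, marg_indep)  # True False
```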

  17. Independence “Spectrum”
      One end, $\prod_{i=1}^{\ell} f_{V_i}(V_i)$: everything is independent; minimal expressive power; fewer parameters.
      Other end, $f_{\boldsymbol{V}}(\boldsymbol{V})$: everything can be interdependent; arbitrary expressive power; more parameters.

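  18. Operations on Factors: Multiplication
      Given two factors $f_{\boldsymbol{U}}$ and $f_{\boldsymbol{V}}$, we can create a new “product” factor such that:
      $f_{\boldsymbol{U} \cup \boldsymbol{V}}(\boldsymbol{u} \cup \boldsymbol{v}) = f_{\boldsymbol{U}}(\boldsymbol{u}) \cdot f_{\boldsymbol{V}}(\boldsymbol{v})$
      for all $\boldsymbol{u} \in \mathrm{Val}(\boldsymbol{U})$ and all $\boldsymbol{v} \in \mathrm{Val}(\boldsymbol{V})$.
      Example, $f_{A,B} \cdot f_{B,C} = f_{A,B,C}$:
      A B  f_{A,B}     B C  f_{B,C}     A B C  f_{A,B,C}
      0 0     30       0 0    100       0 0 0    3,000
      0 1      5       0 1      1       0 0 1       30
      1 0      1       1 0      1       0 1 0        5
      1 1     10       1 1    100       0 1 1      500
                                        1 0 0      100
                                        1 0 1        1
                                        1 1 0       10
                                        1 1 1    1,000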

  19. Operations on Factors: Multiplication
      [definition and example $f_{A,B} \cdot f_{B,C} = f_{A,B,C}$ as on slide 18]
      This might remind you of a join operation on a database.

  20. Operations on Factors: Multiplication
      [definition and example $f_{A,B} \cdot f_{B,C} = f_{A,B,C}$ as on slide 18]
      What happens if you multiply out all the factors in a factor graph?
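
The product operation can be sketched as follows, assuming binary variables and a factor represented as a `(scope, table)` pair; that representation and the name `multiply` are assumptions of this example. Rows of the two inputs are matched on their shared variables, much like the database join mentioned on slide 19.

```python
from itertools import product

def multiply(f, g):
    """Product factor over the union of two scopes (binary variables assumed)."""
    (f_scope, f_table), (g_scope, g_table) = f, g
    # Union of the scopes, keeping f's variables first.
    scope = f_scope + tuple(v for v in g_scope if v not in f_scope)
    table = {}
    for values in product([0, 1], repeat=len(scope)):
        assignment = dict(zip(scope, values))
        f_key = tuple(assignment[v] for v in f_scope)
        g_key = tuple(assignment[v] for v in g_scope)
        table[values] = f_table[f_key] * g_table[g_key]
    return scope, table

# Input factors from the slide.
f_ab = (("A", "B"), {(0, 0): 30, (0, 1): 5, (1, 0): 1, (1, 1): 10})
f_bc = (("B", "C"), {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100})

scope, f_abc = multiply(f_ab, f_bc)
print(scope)             # ('A', 'B', 'C')
print(f_abc[(0, 0, 0)])  # 3000
print(f_abc[(0, 1, 1)])  # 500
```

Applied repeatedly to every factor in a graph, the same function yields a single table over all variables, which is the unnormalized joint distribution.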

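  21. Operations on Factors: Maximization
      Given a factor $f_{\boldsymbol{U}}$ and a variable $V \notin \boldsymbol{U}$, we can transform $f_{\boldsymbol{U},V}$ into $f_{\boldsymbol{U}}$ by:
      $f_{\boldsymbol{U}}(\boldsymbol{u}) = \max_{v \in \mathrm{Val}(V)} f_{\boldsymbol{U},V}(\boldsymbol{u}, v)$
      for all $\boldsymbol{u} \in \mathrm{Val}(\boldsymbol{U})$.
      Example, $\max_B f_{A,B,C} = f_{A,C}$:
      A B C  f_{A,B,C}     A C  f_{A,C}
      0 0 0    3,000       0 0    3,000  (B = 0)
      0 0 1       30       0 1      500  (B = 1)
      0 1 0        5       1 0      100  (B = 0)
      0 1 1      500       1 1    1,000  (B = 1)
      1 0 0      100
      1 0 1        1
      1 1 0       10
      1 1 1    1,000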
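
A minimal sketch of maximizing out a variable, which also records the maximizing value (the "B = 0" and "B = 1" annotations in the example). The function name `max_out` and the scope-plus-dictionary representation are assumptions of this illustration.

```python
# The factor f_{A,B,C} from the slide, keyed by (a, b, c).
f_abc_scope = ("A", "B", "C")
f_abc = {(0, 0, 0): 3000, (0, 0, 1): 30, (0, 1, 0): 5, (0, 1, 1): 500,
         (1, 0, 0): 100, (1, 0, 1): 1, (1, 1, 0): 10, (1, 1, 1): 1000}

def max_out(scope, table, var):
    """Eliminate `var` by maximization; keep (max value, maximizing value)."""
    idx = scope.index(var)
    new_scope = scope[:idx] + scope[idx + 1:]
    best = {}
    for values, score in table.items():
        key = values[:idx] + values[idx + 1:]  # assignment without `var`
        if key not in best or score > best[key][0]:
            best[key] = (score, values[idx])
    return new_scope, best

scope_ac, f_ac = max_out(f_abc_scope, f_abc, "B")
print(scope_ac)      # ('A', 'C')
print(f_ac[(0, 0)])  # (3000, 0)
print(f_ac[(0, 1)])  # (500, 1)
```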

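  22. Operations on Factors: Marginalization
      Given a factor $f_{\boldsymbol{U}}$ and a variable $V \notin \boldsymbol{U}$, we can transform $f_{\boldsymbol{U},V}$ into $f_{\boldsymbol{U}}$ by:
      $f_{\boldsymbol{U}}(\boldsymbol{u}) = \sum_{v \in \mathrm{Val}(V)} f_{\boldsymbol{U},V}(\boldsymbol{u}, v)$
      for all $\boldsymbol{u} \in \mathrm{Val}(\boldsymbol{U})$.
      Example, $\sum_B f_{A,B,C} = f_{A,C}$:
      A B C  f_{A,B,C}     A C  f_{A,C}
      0 0 0    3,000       0 0  3,000 + 5
      0 0 1       30       0 1  30 + 500
      0 1 0        5       1 0  100 + 10
      0 1 1      500       1 1  1 + 1,000
      1 0 0      100
      1 0 1        1
      1 1 0       10
      1 1 1    1,000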

  23. Operations on Factors: Marginalization
      [definition and example $\sum_B f_{A,B,C} = f_{A,C}$ as on slide 22]
      If you multiply out all the factors in a factor graph, then sum out each variable, one by one, until none are left, what do you get?
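
One way to explore the question above is to run the procedure on the four factors from slide 12. In the sketch below, `multiply` and `sum_out` are illustrative helpers over `(scope, table)` pairs with binary variables (an assumed representation). Multiplying everything and then summing out every variable leaves a factor with an empty scope, whose single entry is the normalization constant from slide 13.

```python
from functools import reduce
from itertools import product

def multiply(f, g):
    """Product factor over the union of two scopes (binary variables assumed)."""
    (f_scope, f_table), (g_scope, g_table) = f, g
    scope = f_scope + tuple(v for v in g_scope if v not in f_scope)
    table = {}
    for values in product([0, 1], repeat=len(scope)):
        assignment = dict(zip(scope, values))
        table[values] = (f_table[tuple(assignment[v] for v in f_scope)]
                         * g_table[tuple(assignment[v] for v in g_scope)])
    return scope, table

def sum_out(f, var):
    """Eliminate `var` from factor f by summing over its values."""
    scope, table = f
    idx = scope.index(var)
    new_table = {}
    for values, score in table.items():
        key = values[:idx] + values[idx + 1:]
        new_table[key] = new_table.get(key, 0) + score
    return scope[:idx] + scope[idx + 1:], new_table

f_ab = (("A", "B"), {(0, 0): 30, (0, 1): 5, (1, 0): 1, (1, 1): 10})
f_bc = (("B", "C"), {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100})
f_cd = (("C", "D"), {(0, 0): 1, (0, 1): 100, (1, 0): 100, (1, 1): 1})
f_da = (("D", "A"), {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100})

# The marginalization example: sum B out of f_{A,B} * f_{B,C}.
f_ac = sum_out(multiply(f_ab, f_bc), "B")

# Multiply all factors, then sum out every variable one by one.
remaining = reduce(multiply, [f_ab, f_bc, f_cd, f_da])
for var in ("A", "B", "C", "D"):
    remaining = sum_out(remaining, var)
z = remaining[1][()]
print(z)  # 7201840
```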

  24. Factors are like numbers.
      ◮ Products are commutative: $f_1 \cdot f_2 = f_2 \cdot f_1$

  25. Factors are like numbers.
      ◮ Products are commutative: $f_1 \cdot f_2 = f_2 \cdot f_1$
      ◮ Products are associative: $(f_1 \cdot f_2) \cdot f_3 = f_1 \cdot (f_2 \cdot f_3)$

  26. Factors are like numbers.
      ◮ Products are commutative: $f_1 \cdot f_2 = f_2 \cdot f_1$
      ◮ Products are associative: $(f_1 \cdot f_2) \cdot f_3 = f_1 \cdot (f_2 \cdot f_3)$
      ◮ Sums are commutative: $\sum_X \sum_Y f = \sum_Y \sum_X f$

  27. Factors are like numbers.
      ◮ Products are commutative: $f_1 \cdot f_2 = f_2 \cdot f_1$
      ◮ Products are associative: $(f_1 \cdot f_2) \cdot f_3 = f_1 \cdot (f_2 \cdot f_3)$
      ◮ Sums are commutative: $\sum_X \sum_Y f = \sum_Y \sum_X f$
      ◮ Maximizations are commutative: $\max_X \max_Y f = \max_Y \max_X f$
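
The last two properties can be spot-checked on the factor $f_{A,B,C}$ from slide 18. This is a sketch only: `eliminate` is a hypothetical helper that removes one variable with a chosen reduction (`sum` or `max`), and the check covers elimination order, not the product properties.

```python
# The factor f_{A,B,C} from slide 18 as a (scope, table) pair.
f_abc = (("A", "B", "C"),
         {(0, 0, 0): 3000, (0, 0, 1): 30, (0, 1, 0): 5, (0, 1, 1): 500,
          (1, 0, 0): 100, (1, 0, 1): 1, (1, 1, 0): 10, (1, 1, 1): 1000})

def eliminate(f, var, op):
    """Remove `var` from factor f, combining its values with op (sum or max)."""
    scope, table = f
    idx = scope.index(var)
    groups = {}
    for values, score in table.items():
        groups.setdefault(values[:idx] + values[idx + 1:], []).append(score)
    return scope[:idx] + scope[idx + 1:], {k: op(v) for k, v in groups.items()}

# Eliminating A then B gives the same factor over C as B then A.
sum_ab = eliminate(eliminate(f_abc, "A", sum), "B", sum)
sum_ba = eliminate(eliminate(f_abc, "B", sum), "A", sum)
max_ab = eliminate(eliminate(f_abc, "A", max), "B", max)
max_ba = eliminate(eliminate(f_abc, "B", max), "A", max)

print(sum_ab == sum_ba, max_ab == max_ba)  # True True
```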
