
slide-1
SLIDE 1

Natural Language Processing (CSE 517): Graphical Models

Noah Smith

© 2016 University of Washington, nasmith@cs.washington.edu

February 8–10, 2016

1 / 77

slide-2
SLIDE 2

Notation

Let V = {V1, V2, . . . , Vℓ} be a collection of random variables (not necessarily a sequence). Val(V) denotes the set of values a r.v. V can take. VI denotes the subset of the r.v.s V with indices i ∈ I, and V¬I = V \ VI. Recall:

◮ p(V) = ∏_{i=1}^{ℓ} p(Vi | V1, . . . , Vi−1) (always true, for any ordering)

◮ p(VI, VJ | VK) = p(VI | VK) · p(VJ | VK) if and only if VI ⊥ VJ | VK (conditional independence)

◮ p(VI = vI) = ∑_{v¬I ∈ Val(V¬I)} p(VI = vI, V¬I = v¬I) (marginalization)

2 / 77

slide-3
SLIDE 3

Factor Graphs

Two kinds of vertices:

◮ Random variables (denoted by circles, “Vi”)
◮ Factors (denoted by squares, “fj”)

The graph is bipartite; every edge connects some variable to some factor. Let Ij ⊆ {1, . . . , ℓ} be the set of indices of the variables fj is connected to. Factor fj defines a map Val(VIj) → R≥0. The graph and factors define a probability distribution:

p(V = v) ∝ ∏_j fj(vIj)

3 / 77

slide-4
SLIDE 4

Factor Graphs We’ve Seen Before

Hidden Markov model:

[diagram: chain-structured factor graph over states y0, y1, . . . , y5 and observations x1, . . . , x4]

General first-order sequence model:

[diagram: factor graph over y0, . . . , y5 in which the entire observation x connects to every factor]

4 / 77

slide-5
SLIDE 5

Two Kinds of Factors

◮ Conditional probability tables. E.g., if Ij = {1, 2, 3}: fj(v1, v2, v3) = p(V3 = v3 | V1 = v1, V2 = v2). These lead to Bayesian networks (with some constraints).

◮ Potential functions (arbitrary nonnegative values). These lead to Markov random fields (a.k.a. Markov networks).

5 / 77

slide-6
SLIDE 6

Yucky Bayesian Network

[diagram: Bayesian network over Influenza, Allergies, Sinus Inflamm., Runny Nose, Headache]

Sinus inflammation is caused by flu, but also by allergies. Runny nose and headache are both caused by sinus inflammation.

6 / 77

slide-7
SLIDE 7

Yucky Factor Graph

[diagram: factor graph over Influenza, Allergies, Sinus Inflamm., Runny Nose, Headache]

Sinus inflammation is caused by flu, but also by allergies. Runny nose and headache are both caused by sinus inflammation.

7 / 77

slide-8
SLIDE 8

Yucky Factor Graph

[diagram: factor graph over Influenza, Allergies, Sinus Inflamm., Runny Nose, Headache, with factor tables fI(I), fA(A), fS,I,A(S, I, A), fR,S(R, S), fH,S(H, S)]

8 / 77

slide-9
SLIDE 9

Yucky Factor Graph

[diagrams: the Bayesian network and the factor graph over Influenza, Allergies, Sinus Inflamm., Runny Nose, Headache, side by side, with factor tables fI, fA, fS,I,A, fR,S, fH,S]

p(i, a, s, r, h) = fI(i) · fA(a) · fS,I,A(s, i, a) · fR,S(r, s) · fH,S(h, s)
                 = p(i) · p(a) · p(s | i, a) · p(r | s) · p(h | s)

9 / 77

slide-10
SLIDE 10

Naughty Markov Random Field

[diagram: Markov network forming a cycle over Adrian, Brook, Chris, Dana]

Independencies: A⊥C | {B, D} and B⊥D | {A, C}, but marginally A ̸⊥ C and B ̸⊥ D.

10 / 77

slide-11
SLIDE 11

Naughty Factor Graph

[diagram: cycle over Adrian, Brook, Chris, Dana with pairwise factors fA,B, fB,C, fC,D, fD,A]

p(a, b, c, d) = fA,B(a, b) · fB,C(b, c) · fC,D(c, d) · fD,A(d, a) / Z, where

Z = ∑_{a′∈Val(A)} ∑_{b′∈Val(B)} ∑_{c′∈Val(C)} ∑_{d′∈Val(D)} fA,B(a′, b′) · fB,C(b′, c′) · fC,D(c′, d′) · fD,A(d′, a′)

11 / 77

slide-12
SLIDE 12

Assignment Probabilities: Examples

[diagram: cycle over Adrian, Brook, Chris, Dana]

A B | fA,B    B C | fB,C    C D | fC,D    D A | fD,A
0 0 |   30    0 0 |  100    0 0 |    1    0 0 |  100
0 1 |    5    0 1 |    1    0 1 |  100    0 1 |    1
1 0 |    1    1 0 |    1    1 0 |  100    1 0 |    1
1 1 |   10    1 1 |  100    1 1 |    1    1 1 |  100

12 / 77

slide-13
SLIDE 13

Assignment Probabilities: Examples

[factor graph and tables as on slide 12]

Z = ∑_{a′∈Val(A)} ∑_{b′∈Val(B)} ∑_{c′∈Val(C)} ∑_{d′∈Val(D)} fA,B(a′, b′) · fB,C(b′, c′) · fC,D(c′, d′) · fD,A(d′, a′) = 7,201,840

13 / 77

slide-14
SLIDE 14

Assignment Probabilities: Examples

[factor graph and tables as on slide 12]

p(A = 0, B = 1, C = 1, D = 0) = 5,000,000 / 7,201,840 ≈ 0.69

14 / 77

slide-15
SLIDE 15

Assignment Probabilities: Examples

[factor graph and tables as on slide 12]

p(A = 1, B = 1, C = 0, D = 0) = 10 / 7,201,840 ≈ 0.0000014

15 / 77
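The normalization constant and the two probabilities above can be checked by brute-force enumeration. A minimal sketch, assuming binary variables encoded as 0/1 and the factor tables as read off slides 12–15 (the function and variable names are mine):

```python
from itertools import product

# Pairwise factor tables as read off the slides (binary variables, values 0/1).
f_AB = {(0, 0): 30, (0, 1): 5, (1, 0): 1, (1, 1): 10}
f_BC = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}
f_CD = {(0, 0): 1, (0, 1): 100, (1, 0): 100, (1, 1): 1}
f_DA = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}

def score(a, b, c, d):
    """Unnormalized probability: the product of the four factors."""
    return f_AB[a, b] * f_BC[b, c] * f_CD[c, d] * f_DA[d, a]

# Z sums the factor product over all 2^4 assignments.
Z = sum(score(*v) for v in product((0, 1), repeat=4))
print(Z)                      # 7201840
print(score(0, 1, 1, 0) / Z)  # ~0.69
print(score(1, 1, 0, 0) / Z)  # ~0.0000014
```

Sixteen assignments is nothing; the point of the rest of the lecture is that this enumeration is exponential in the number of variables.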

slide-16
SLIDE 16

Structure and Independence

Bayesian networks:

◮ A variable is conditionally independent of its non-descendants given its parents.

Markov networks:

◮ Conditional independence is derived from “Markov blanket” and separation properties.

Local configurations can be used to answer all conditional independence questions; there is almost no need to look at the values in the factors!

16 / 77

slide-17
SLIDE 17

Independence “Spectrum”

∏_{i=1}^{ℓ} fVi(Vi)  ←——————————→  fV (V)

everything is independent ......... everything can be interdependent
minimal expressive power .......... arbitrary expressive power
fewer parameters .................. more parameters

17 / 77

slide-18
SLIDE 18

Operations on Factors: Multiplication

Given two factors fU and fV , we can create a new “product” factor such that: fU∪V (u ∪ v) = fU(u) · fV (v) for all u ∈ Val(U) and all v ∈ Val(V ).

A B | fA,B        B C | fB,C        A B C | fA,B,C
0 0 |   30        0 0 |  100        0 0 0 |  3,000
0 1 |    5   ·    0 1 |    1   =    0 0 1 |     30
1 0 |    1        1 0 |    1        0 1 0 |      5
1 1 |   10        1 1 |  100        0 1 1 |    500
                                    1 0 0 |    100
                                    1 0 1 |      1
                                    1 1 0 |     10
                                    1 1 1 |  1,000

18 / 77

slide-19
SLIDE 19

Operations on Factors: Multiplication

Given two factors fU and fV , we can create a new “product” factor such that: fU∪V (u ∪ v) = fU(u) · fV (v) for all u ∈ Val(U) and all v ∈ Val(V ).

[factor tables and product as on slide 18]

This might remind you of a join operation on a database.

19 / 77

slide-20
SLIDE 20

Operations on Factors: Multiplication

Given two factors fU and fV , we can create a new “product” factor such that: fU∪V (u ∪ v) = fU(u) · fV (v) for all u ∈ Val(U) and all v ∈ Val(V ).

[factor tables and product as on slide 18]

What happens if you multiply out all the factors in a factor graph?

20 / 77
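The product operation can be sketched as a small dictionary join, much like the database join mentioned above. The `factor_product` helper and the 0/1 value encoding are my assumptions, not from the slides:

```python
from itertools import product

def factor_product(f1, scope1, f2, scope2):
    """Multiply two factors (dicts keyed by value tuples) over the union scope."""
    scope = scope1 + [v for v in scope2 if v not in scope1]
    out = {}
    for vals in product((0, 1), repeat=len(scope)):
        assign = dict(zip(scope, vals))
        k1 = tuple(assign[v] for v in scope1)
        k2 = tuple(assign[v] for v in scope2)
        out[vals] = f1[k1] * f2[k2]  # shared variables must agree, like a join key
    return out, scope

f_AB = {(0, 0): 30, (0, 1): 5, (1, 0): 1, (1, 1): 10}
f_BC = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}
f_ABC, scope = factor_product(f_AB, ["A", "B"], f_BC, ["B", "C"])
print(scope)           # ['A', 'B', 'C']
print(f_ABC[0, 0, 0])  # 3000
print(f_ABC[1, 1, 1])  # 1000
```

Note the size blowup: two 4-entry tables produce an 8-entry table, which is exactly why multiplying out all factors at once is a bad idea.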

slide-21
SLIDE 21

Operations on Factors: Maximization

Given a factor fU,V and a variable V, we can transform fU,V into fU by:

fU(u) = max_{v∈Val(V)} fU,V (u, v)

for all u ∈ Val(U).

A C | fA,C                        A B C | fA,B,C
0 0 | 3,000 (B = 0)               0 0 0 |  3,000
0 1 |   500 (B = 1)               0 0 1 |     30
1 0 |   100 (B = 0)   =  max_B    0 1 0 |      5
1 1 | 1,000 (B = 1)               0 1 1 |    500
                                  1 0 0 |    100
                                  1 0 1 |      1
                                  1 1 0 |     10
                                  1 1 1 |  1,000

21 / 77

slide-22
SLIDE 22

Operations on Factors: Marginalization

Given a factor fU,V and a variable V, we can transform fU,V into fU by:

fU(u) = ∑_{v∈Val(V)} fU,V (u, v)

for all u ∈ Val(U).

A C | fA,C                        A B C | fA,B,C
0 0 | 3,000 + 5 = 3,005           0 0 0 |  3,000
0 1 |  30 + 500 =   530           0 0 1 |     30
1 0 |  100 + 10 =   110   = ∑_B   0 1 0 |      5
1 1 | 1 + 1,000 = 1,001           0 1 1 |    500
                                  1 0 0 |    100
                                  1 0 1 |      1
                                  1 1 0 |     10
                                  1 1 1 |  1,000

22 / 77

slide-23
SLIDE 23

Operations on Factors: Marginalization

Given a factor fU,V and a variable V, we can transform fU,V into fU by:

fU(u) = ∑_{v∈Val(V)} fU,V (u, v)

for all u ∈ Val(U).

[factor tables: ∑_B fA,B,C, as on slide 22]

If you multiply out all the factors in a factor graph, then sum out each variable, one by one, until none are left, what do you get?

23 / 77
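Maximization and marginalization are the same loop with a different reduction. A sketch (the `eliminate` helper and dict encoding are mine) that reproduces the fA,C tables from slides 21–22:

```python
def eliminate(f, scope, var, op=sum):
    """Remove `var` from factor f (a dict keyed by value tuples) by
    applying `op` over its values: sum -> marginalization, max -> maximization."""
    i = scope.index(var)
    grouped = {}
    for vals, x in f.items():
        key = vals[:i] + vals[i + 1:]          # drop var's coordinate
        grouped.setdefault(key, []).append(x)
    return {k: op(v) for k, v in grouped.items()}, scope[:i] + scope[i + 1:]

# f_{A,B,C} as tabulated on slide 18.
f_ABC = {(0, 0, 0): 3000, (0, 0, 1): 30, (0, 1, 0): 5, (0, 1, 1): 500,
         (1, 0, 0): 100, (1, 0, 1): 1, (1, 1, 0): 10, (1, 1, 1): 1000}

f_sum, _ = eliminate(f_ABC, ["A", "B", "C"], "B", op=sum)
f_max, _ = eliminate(f_ABC, ["A", "B", "C"], "B", op=max)
print(f_sum)  # {(0, 0): 3005, (0, 1): 530, (1, 0): 110, (1, 1): 1001}
print(f_max)  # {(0, 0): 3000, (0, 1): 500, (1, 0): 100, (1, 1): 1000}
```

Passing the reduction as `op` foreshadows the sum-vs-max symmetry exploited later for marginal vs. MPE inference.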

slide-24
SLIDE 24

Factors are like numbers.

◮ Products are commutative: f1 · f2 = f2 · f1

24 / 77

slide-25
SLIDE 25

Factors are like numbers.

◮ Products are commutative: f1 · f2 = f2 · f1 ◮ Products are associative: (f1 · f2) · f3 = f1 · (f2 · f3)

25 / 77

slide-26
SLIDE 26

Factors are like numbers.

◮ Products are commutative: f1 · f2 = f2 · f1
◮ Products are associative: (f1 · f2) · f3 = f1 · (f2 · f3)
◮ Sums are commutative: ∑_X ∑_Y f = ∑_Y ∑_X f
26 / 77

slide-27
SLIDE 27

Factors are like numbers.

◮ Products are commutative: f1 · f2 = f2 · f1
◮ Products are associative: (f1 · f2) · f3 = f1 · (f2 · f3)
◮ Sums are commutative: ∑_X ∑_Y f = ∑_Y ∑_X f
◮ Maximizations are commutative: max_X max_Y f = max_Y max_X f

27 / 77

slide-28
SLIDE 28

Factors are like numbers.

◮ Products are commutative: f1 · f2 = f2 · f1
◮ Products are associative: (f1 · f2) · f3 = f1 · (f2 · f3)
◮ Sums are commutative: ∑_X ∑_Y f = ∑_Y ∑_X f
◮ Maximizations are commutative: max_X max_Y f = max_Y max_X f
◮ Multiplication distributes over marginalization and maximization:

∑_X (f1 · f2) = f1 · ∑_X f2          max_X (f1 · f2) = f1 · max_X f2

(assuming X is not in the scope of f1).

28 / 77

slide-29
SLIDE 29

Inference

Most general definition: “reason about some variables, optionally given values of some others.” Let O be the observed variables and U be the unobserved ones; V = O ∪ U. Three inference problems, all given O = o . . .

29 / 77

slide-30
SLIDE 30

Inference

Most general definition: “reason about some variables, optionally given values of some others.” Let O be the observed variables and U be the unobserved ones; V = O ∪ U. Three inference problems, all given O = o . . .

◮ Marginal inference: what is the marginal distribution over

Q ⊂ U? (p(Q | o), marginalizing out the rest.)

30 / 77

slide-31
SLIDE 31

Inference

Most general definition: “reason about some variables, optionally given values of some others.” Let O be the observed variables and U be the unobserved ones; V = O ∪ U. Three inference problems, all given O = o . . .

◮ Marginal inference: what is the marginal distribution over

Q ⊂ U? (p(Q | o), marginalizing out the rest.)

◮ Related: draw samples from that distribution.

31 / 77

slide-32
SLIDE 32

Inference

Most general definition: “reason about some variables, optionally given values of some others.” Let O be the observed variables and U be the unobserved ones; V = O ∪ U. Three inference problems, all given O = o . . .

◮ Marginal inference: what is the marginal distribution over

Q ⊂ U? (p(Q | o), marginalizing out the rest.)

◮ Related: draw samples from that distribution.

◮ Most probable explanation (MPE): what is the most

probable assignment to U? (argmaxu p(u | o))

32 / 77

slide-33
SLIDE 33

Inference

Most general definition: “reason about some variables, optionally given values of some others.” Let O be the observed variables and U be the unobserved ones; V = O ∪ U. Three inference problems, all given O = o . . .

◮ Marginal inference: what is the marginal distribution over

Q ⊂ U? (p(Q | o), marginalizing out the rest.)

◮ Related: draw samples from that distribution.

◮ Most probable explanation (MPE): what is the most

probable assignment to U? (argmaxu p(u | o))

◮ Related: what is the most dangerous assignment to U?

33 / 77

slide-34
SLIDE 34

Inference

Most general definition: “reason about some variables, optionally given values of some others.” Let O be the observed variables and U be the unobserved ones; V = O ∪ U. Three inference problems, all given O = o . . .

◮ Marginal inference: what is the marginal distribution over

Q ⊂ U? (p(Q | o), marginalizing out the rest.)

◮ Related: draw samples from that distribution.

◮ Most probable explanation (MPE): what is the most

probable assignment to U? (argmaxu p(u | o))

◮ Related: what is the most dangerous assignment to U?

◮ Maximum a posteriori (MAP): what is the most probable

assignment to Q ⊂ U? (argmaxq p(q | o))

34 / 77

slide-35
SLIDE 35

Inference

Most general definition: “reason about some variables, optionally given values of some others.” Let O be the observed variables and U be the unobserved ones; V = O ∪ U. Three inference problems, all given O = o . . .

◮ Marginal inference: what is the marginal distribution over Q ⊂ U? (p(Q | o), marginalizing out the rest.)
◮ Related: draw samples from that distribution.
◮ Most probable explanation (MPE): what is the most probable assignment to U? (argmax_u p(u | o))
◮ Related: what is the most dangerous assignment to U?
◮ Maximum a posteriori (MAP): what is the most probable assignment to Q ⊂ U? (argmax_q p(q | o))
◮ Related: what values of Q have the lowest expected cost?

35 / 77

slide-36
SLIDE 36

Marginal Inference

Given a factor graph with variables V , find the marginal distribution over some Vi ∈ V , p(Vi). Simple chain example, focusing on i = 4:

[diagram: chain factor graph V1 — V2 — V3 — V4 with factor tables fV1(V1), fV1,V2, fV2,V3, fV3,V4]

36 / 77

slide-37
SLIDE 37

Observations

◮ If we had a single fV4, we could easily renormalize it to get

p(V4).

37 / 77

slide-38
SLIDE 38

Observations

◮ If we had a single fV4, we could easily renormalize it to get

p(V4).

◮ Correct: fV4 = ∑_{V1} ∑_{V2} ∑_{V3} fV1 · fV1,V2 · fV2,V3 · fV3,V4

38 / 77

slide-39
SLIDE 39

Observations

◮ If we had a single fV4, we could easily renormalize it to get

p(V4).

◮ Correct: fV4 = ∑_{V1} ∑_{V2} ∑_{V3} fV1 · fV1,V2 · fV2,V3 · fV3,V4

◮ But that multiplied-out factor would have ∏_i |Val(Vi)| values!

39 / 77

slide-40
SLIDE 40

Observations

◮ If we had a single fV4, we could easily renormalize it to get

p(V4).

◮ Correct: fV4 = ∑_{V1} ∑_{V2} ∑_{V3} fV1 · fV1,V2 · fV2,V3 · fV3,V4

◮ But that multiplied-out factor would have ∏_i |Val(Vi)| values!

◮ Reorganize calculations:

∑_{V1} ∑_{V2} ∑_{V3} fV1 · fV1,V2 · fV2,V3 · fV3,V4 = ∑_{V3} fV3,V4 · ( ∑_{V2} fV2,V3 · ( ∑_{V1} fV1,V2 · fV1 ) )

40 / 77

slide-41
SLIDE 41

Marginal Inference

[diagram: chain V1 — V2 — V3 — V4 with factor tables fV1, fV1,V2, fV2,V3, fV3,V4]

∑_{V1} ∑_{V2} ∑_{V3} fV1 · fV1,V2 · fV2,V3 · fV3,V4 = ∑_{V3} fV3,V4 · ( ∑_{V2} fV2,V3 · ( ∑_{V1} fV1,V2 · fV1 ) )

41 / 77

slide-42
SLIDE 42

Marginal Inference

[diagram: chain with V1 eliminated; remaining factor tables fV2 (new), fV2,V3, fV3,V4]

∑_{V1} ∑_{V2} ∑_{V3} fV1 · fV1,V2 · fV2,V3 · fV3,V4 = ∑_{V3} fV3,V4 · ( ∑_{V2} fV2,V3 · fV2 )

42 / 77

slide-43
SLIDE 43

Marginal Inference

[diagram: chain with V1 and V2 eliminated; remaining factor tables fV3 (new), fV3,V4]

∑_{V1} ∑_{V2} ∑_{V3} fV1 · fV1,V2 · fV2,V3 · fV3,V4 = ∑_{V3} fV3,V4 · fV3

43 / 77

slide-44
SLIDE 44

Marginal Inference

[diagram: only V4 remains, with the single factor table fV4]

∑_{V1} ∑_{V2} ∑_{V3} fV1 · fV1,V2 · fV2,V3 · fV3,V4 = fV4

44 / 77

slide-45
SLIDE 45

Variable Elimination

Given a factor graph with factors f, eliminate variable V.

1. Let f_elim ⊆ f be the factors connected to V.
2. Let f_keep = f \ f_elim be the rest.
3. Let fnew = ∑_V ∏_{f∈f_elim} f.
4. Return f_keep ∪ {fnew}.

Uses the graph structure to avoid exponential blowup; this is an example of dynamic programming.

45 / 77
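The four steps above can be sketched directly. The `eliminate_var` helper and the dict-based factor encoding are my own; here it is applied to the four-person cycle from the earlier slides, eliminating everything but A:

```python
from itertools import product as cartesian

def eliminate_var(factors, var):
    """One step of variable elimination: multiply the factors touching `var`,
    sum `var` out of the product, and return the updated factor list."""
    elim = [(f, s) for f, s in factors if var in s]       # step 1: f_elim
    keep = [(f, s) for f, s in factors if var not in s]   # step 2: f_keep
    scope = []
    for _, s in elim:
        scope += [v for v in s if v not in scope]
    new_scope = [v for v in scope if v != var]
    fnew = {}
    for vals in cartesian((0, 1), repeat=len(scope)):     # step 3: fnew
        assign = dict(zip(scope, vals))
        p = 1
        for f, s in elim:
            p *= f[tuple(assign[v] for v in s)]
        key = tuple(assign[v] for v in new_scope)
        fnew[key] = fnew.get(key, 0) + p
    return keep + [(fnew, new_scope)]                     # step 4

# Misconception-style cycle from the earlier slides.
f_AB = {(0, 0): 30, (0, 1): 5, (1, 0): 1, (1, 1): 10}
f_BC = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}
f_CD = {(0, 0): 1, (0, 1): 100, (1, 0): 100, (1, 1): 1}
f_DA = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}
factors = [(f_AB, ["A", "B"]), (f_BC, ["B", "C"]),
           (f_CD, ["C", "D"]), (f_DA, ["D", "A"])]

for v in ["B", "C", "D"]:      # eliminate everything but A
    factors = eliminate_var(factors, v)
(fA, scope), = factors
Z = fA[(0,)] + fA[(1,)]
print(Z)                       # 7201840
print(fA[(0,)] / Z)            # p(A = 0)
```

No intermediate factor here ever has more than two variables in scope, which is the whole point: the same Z from slide 13 arrives without touching a 16-row joint table.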

slide-46
SLIDE 46

Marginal Inference by Variable Elimination (No Evidence)

Given a factor graph with variables V and factors f, find the marginal distribution over some V keep ⊂ V .

1. Order the variables in V \ V_keep.
2. For each V ∈ V \ V_keep:
   ◮ Eliminate V; i.e., remove the factors connected to V and replace them with the derived fnew.

The resulting factor graph is proportional to p(V_keep).

46 / 77

slide-47
SLIDE 47

Marginal Inference by Variable Elimination (No Evidence)

Given a factor graph with variables V and factors f, find the marginal distribution over some V keep ⊂ V .

1. Order the variables in V \ V_keep. The ordering can make a huge difference!
2. For each V ∈ V \ V_keep:
   ◮ Eliminate V; i.e., remove the factors connected to V and replace them with the derived fnew.

The resulting factor graph is proportional to p(V_keep).

47 / 77

slide-48
SLIDE 48

A Less Good Ordering

[diagram: chain V1 — V2 — V3 — V4]

∑_{V1} ∑_{V2} ∑_{V3} fV1 · fV1,V2 · fV2,V3 · fV3,V4 = ∑_{V1} fV1 · ( ∑_{V2} fV1,V2 · ( ∑_{V3} fV2,V3 · fV3,V4 ) )

48 / 77

slide-49
SLIDE 49

A Less Good Ordering

[diagram: chain V1 — V2 — V3 — V4]

∑_{V1} ∑_{V2} ∑_{V3} fV1 · fV1,V2 · fV2,V3 · fV3,V4 = ∑_{V1} fV1 · ( ∑_{V2} fV1,V2 · ( ∑_{V3} fV2,V3 · fV3,V4 ) )
                                                   = ∑_{V1} fV1 · ( ∑_{V2} fV1,V2 · fV2,V4 )

49 / 77

slide-50
SLIDE 50

What About Evidence?

Original problem: given O = o, what is the marginal distribution over Q ⊂ U? (I.e., p(Q | O = o).)

[diagram: nested sets — Q inside U inside V, with O = V \ U observed]

50 / 77

slide-51
SLIDE 51

What About Evidence?

Original problem: given O = o, what is the marginal distribution over Q ⊂ U? (I.e., p(Q | O = o).)

This adds a step at the beginning: reduce factors to “respect the evidence.”

51 / 77

slide-52
SLIDE 52

What About Evidence?

Original problem: given O = o, what is the marginal distribution over Q ⊂ U? (I.e., p(Q | O = o).)

This adds a step at the beginning: reduce factors to “respect the evidence.” This will remind you of a select . . . where operation in a database.

52 / 77

slide-53
SLIDE 53

Marginal Inference

Suppose V1 is observed to take value 1.

[diagram: chain V1 — V2 — V3 — V4 with factor tables; V1 is shaded as observed]

53 / 77

slide-54
SLIDE 54

Marginal Inference

Suppose V1 is observed to take value 1.

[diagram: the same chain, with the factors touching V1 reduced to respect V1 = 1]

54 / 77

slide-55
SLIDE 55

Marginal Inference

Suppose V1 is observed to take value 1.

[diagram: the same chain after reduction; fV1 is now a constant]

Note that fV1 is now a constant; since we renormalize at the end, we can ignore it. Observed nodes may create a “separation” between variables of interest and some factors.

55 / 77

slide-56
SLIDE 56

Marginal Inference by Variable Elimination with Evidence

Given a factor graph with variables V and factors f, and given O = o (where O ⊂ V ), find the marginal distribution over Q ⊆ U = V \ O.

1. Reduce the factors connected to O to respect the evidence.
2. Order the variables in U \ Q.
3. For each V ∈ U \ Q:
   ◮ Eliminate V; i.e., remove the factors connected to V and replace them with the derived fnew.

The resulting factor graph is proportional to p(Q | O = o).

56 / 77

slide-57
SLIDE 57

Remarks on Computational Complexity

In general, denser graphs are more expensive. Runtime and space depend on the size of the original and intermediate factors. (This is why ordering matters so much.) Finding the best ordering is NP-hard. Certain graphical structures allow inference in linear time with respect to the size of the original factors.

◮ Bayesian networks: polytrees
◮ Markov networks: chordal graphs

57 / 77

slide-58
SLIDE 58

Return to Hidden Markov Models

◮ Hidden Markov models are not (quite) Bayesian networks.

58 / 77

slide-59
SLIDE 59

Return to Hidden Markov Models

◮ Hidden Markov models are not (quite) Bayesian networks.

◮ Given an observed sequence x, however, an HMM provides a

pattern to construct a Bayesian network.

59 / 77

slide-60
SLIDE 60

Return to Hidden Markov Models

◮ Hidden Markov models are not (quite) Bayesian networks.

◮ Given an observed sequence x, however, an HMM provides a

pattern to construct a Bayesian network.

◮ Sometimes called “dynamic graphical models.”

60 / 77

slide-61
SLIDE 61

Return to Hidden Markov Models

◮ Hidden Markov models are not (quite) Bayesian networks.

◮ Given an observed sequence x, however, an HMM provides a

pattern to construct a Bayesian network.

◮ Sometimes called “dynamic graphical models.”

◮ Marginal inference for every Yi in an HMM can be

accomplished by variable elimination.

61 / 77

slide-62
SLIDE 62

Return to Hidden Markov Models

◮ Hidden Markov models are not (quite) Bayesian networks.

◮ Given an observed sequence x, however, an HMM provides a

pattern to construct a Bayesian network.

◮ Sometimes called “dynamic graphical models.”

◮ Marginal inference for every Yi in an HMM can be

accomplished by variable elimination.

◮ All variables share some computation with those to their right

and those to their left.

62 / 77

slide-63
SLIDE 63

Return to Hidden Markov Models

◮ Hidden Markov models are not (quite) Bayesian networks.

◮ Given an observed sequence x, however, an HMM provides a

pattern to construct a Bayesian network.

◮ Sometimes called “dynamic graphical models.”

◮ Marginal inference for every Yi in an HMM can be

accomplished by variable elimination.

◮ All variables share some computation with those to their right

and those to their left.

◮ This is called the forward-backward algorithm.

63 / 77

slide-64
SLIDE 64

Return to Hidden Markov Models

◮ Hidden Markov models are not (quite) Bayesian networks.

◮ Given an observed sequence x, however, an HMM provides a

pattern to construct a Bayesian network.

◮ Sometimes called “dynamic graphical models.”

◮ Marginal inference for every Yi in an HMM can be

accomplished by variable elimination.

◮ All variables share some computation with those to their right

and those to their left.

◮ This is called the forward-backward algorithm.
◮ This is useful when we want to apply EM to HMMs (unsupervised sequence modeling).

64 / 77

slide-65
SLIDE 65

Return to Hidden Markov Models

◮ Hidden Markov models are not (quite) Bayesian networks.
◮ Given an observed sequence x, however, an HMM provides a pattern to construct a Bayesian network.
◮ Sometimes called “dynamic graphical models.”
◮ Marginal inference for every Yi in an HMM can be accomplished by variable elimination.
◮ All variables share some computation with those to their right and those to their left.
◮ This is called the forward-backward algorithm.
◮ This is useful when we want to apply EM to HMMs (unsupervised sequence modeling).
◮ It is also useful in supervised learning.

65 / 77
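Forward-backward is exactly left-to-right and right-to-left variable elimination with shared intermediate factors. A sketch on a tiny two-state HMM whose parameters are entirely made up for illustration, cross-checked against brute-force enumeration:

```python
from itertools import product

# A tiny HMM (hypothetical numbers): 2 states, 2 symbols.
pi    = [0.6, 0.4]                # p(y1)
trans = [[0.7, 0.3], [0.2, 0.8]]  # p(y' | y)
emit  = [[0.9, 0.1], [0.3, 0.7]]  # p(x | y)
x = [0, 1, 1, 0]                  # observed sequence
n, K = len(x), 2

# Forward: alpha[i][y] = p(x[:i+1], Y_i = y); elimination left-to-right.
alpha = [[pi[y] * emit[y][x[0]] for y in range(K)]]
for i in range(1, n):
    alpha.append([emit[y][x[i]] * sum(alpha[-1][yp] * trans[yp][y]
                                      for yp in range(K)) for y in range(K)])

# Backward: beta[i][y] = p(x[i+1:] | Y_i = y); elimination right-to-left.
beta = [[1.0] * K for _ in range(n)]
for i in range(n - 2, -1, -1):
    beta[i] = [sum(trans[y][yn] * emit[yn][x[i + 1]] * beta[i + 1][yn]
                   for yn in range(K)) for y in range(K)]

Z = sum(alpha[-1])  # p(x)
posterior = [[alpha[i][y] * beta[i][y] / Z for y in range(K)] for i in range(n)]

# Cross-check against brute-force enumeration of all K^n state sequences.
def joint(ys):
    p = pi[ys[0]] * emit[ys[0]][x[0]]
    for i in range(1, n):
        p *= trans[ys[i - 1]][ys[i]] * emit[ys[i]][x[i]]
    return p

Zb = sum(joint(ys) for ys in product(range(K), repeat=n))
print(abs(Z - Zb) < 1e-12)   # True
```

One forward and one backward sweep yield the posterior marginal p(Yi | x) for every position at once, rather than re-running elimination per query variable.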

slide-66
SLIDE 66

Related Topics

◮ Conditional random fields ◮ MPE inference ◮ MAP inference ◮ Inexact inference

66 / 77

slide-67
SLIDE 67

Conditional Random Fields (Sequence Version)

Lafferty et al. (2001)

A nice confluence:

◮ Probabilistic graphical model-style reasoning, as in HMMs. ◮ Discriminative training, as with structured perceptron.

Local factors: fi(x, y, y′) = exp (w · φ(x, i, y, y′))

Log loss, where the graphical model parameterizes the probability distribution:

∑_{i=1}^{n} [ log ∑_{y∈L^{ℓi+1}} exp ( w · ∑_{j=1}^{ℓi+1} φ(xi, j, yj, yj−1) )   (“fear”)
              − w · ∑_{j=1}^{ℓi+1} φ(xi, j, y^i_j, y^i_{j−1}) ]                  (“hope”)

67 / 77

slide-68
SLIDE 68

Conditional Random Fields (General Version)

Factor graph consisting of “input” variables X (always observed) and “output” variables Y .

p(Y = y | X = x) = ∏_j fj(x, yIj) / ∑_{y′∈Val(Y)} ∏_j fj(x, y′Ij)

MLE:

∑_{i=1}^{n} [ log ∑_{y∈Val(Y)} ∏_j fj(xi, yIj)   (“fear”)   − log ∏_j fj(xi, y^i_Ij)   (“hope”) ]

Marginal inference is required for calculating the left term and its gradient with respect to w.

68 / 77

slide-69
SLIDE 69

MPE Inference

argmax_{u∈Val(U)} p(U = u | O = o)

69 / 77

slide-70
SLIDE 70

MPE Inference

argmax_{u∈Val(U)} p(U = u | O = o)

Variable elimination and exact inference are identical to the marginal case!

70 / 77

slide-71
SLIDE 71

MPE Inference

argmax_{u∈Val(U)} p(U = u | O = o)

Variable elimination and exact inference are identical to the marginal case! Just replace each sum operation with a max operation, and add bookkeeping to recover the most probable assignment.

71 / 77

slide-72
SLIDE 72

MPE Inference

argmax_{u∈Val(U)} p(U = u | O = o)

Variable elimination and exact inference are identical to the marginal case! Just replace each sum operation with a max operation, and add bookkeeping to recover the most probable assignment. The Viterbi algorithm is, of course, an instance of this. Each “si(∗)” is an intermediate factor.

72 / 77
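The sum-to-max swap with bookkeeping, instantiated as Viterbi on the same kind of tiny two-state HMM (all parameters here are made up), with the recovered path checked against brute-force search:

```python
from itertools import product

pi    = [0.6, 0.4]                # p(y1)        (hypothetical numbers)
trans = [[0.7, 0.3], [0.2, 0.8]]  # p(y' | y)
emit  = [[0.9, 0.1], [0.3, 0.7]]  # p(x | y)
x = [0, 1, 1, 0]
n, K = len(x), 2

# Viterbi: the forward recurrence with sum replaced by max, plus
# backpointers to recover the argmax assignment.
delta = [[pi[y] * emit[y][x[0]] for y in range(K)]]
back = []
for i in range(1, n):
    row, ptr = [], []
    for y in range(K):
        best_yp = max(range(K), key=lambda yp: delta[-1][yp] * trans[yp][y])
        ptr.append(best_yp)
        row.append(delta[-1][best_yp] * trans[best_yp][y] * emit[y][x[i]])
    delta.append(row)
    back.append(ptr)

# Follow the backpointers from the best final state.
y_last = max(range(K), key=lambda y: delta[-1][y])
path = [y_last]
for ptr in reversed(back):
    path.append(ptr[path[-1]])
path.reverse()

# Brute-force check: enumerate all K^n state sequences.
def joint(ys):
    p = pi[ys[0]] * emit[ys[0]][x[0]]
    for i in range(1, n):
        p *= trans[ys[i - 1]][ys[i]] * emit[ys[i]][x[i]]
    return p

best = max(product(range(K), repeat=n), key=joint)
print(path == list(best))   # True
```

Only two lines differ from the forward pass: the `sum` becomes a `max`, and the backpointer table is the added bookkeeping.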

slide-73
SLIDE 73

MPE Inference

argmax_{u∈Val(U)} p(U = u | O = o)

Variable elimination and exact inference are identical to the marginal case! Just replace each sum operation with a max operation, and add bookkeeping to recover the most probable assignment. The Viterbi algorithm is, of course, an instance of this. Each “si(∗)” is an intermediate factor. Specifically for sequence models, it should be clear how factors/features that depend on the observed sequence X don’t affect the asymptotics of exact inference.

73 / 77

slide-74
SLIDE 74

Rocket Science: True MAP

Given a factor graph with variables V and factors f, and given O = o (where O ⊂ V ), find the most probable assignment of Q ⊂ U = V \ O. Let R = U \ Q.

argmax_{q∈Val(Q)} p(Q = q | O = o) = argmax_{q∈Val(Q)} ∑_{r∈Val(R)} p(Q = q, R = r | O = o)

Solution: first use marginal inference to eliminate R, then use max inference to solve for Q.

74 / 77

slide-75
SLIDE 75

Alternative Inference Methods

Huge range of techniques! Exact:

◮ Integer linear programming

Inexact:

◮ Randomized (e.g., Gibbs sampling, importance sampling, simulated annealing)
◮ Deterministic (e.g., mean-field variational inference, loopy belief propagation, linear programming relaxations, dual decomposition, beam search)

75 / 77

slide-76
SLIDE 76

Readings and Reminders

◮ Koller et al. (2007)
◮ Submit a suggestion for an exam question by Friday at 5pm.
◮ Your project is due March 9.

76 / 77

slide-77
SLIDE 77

References I

Daphne Koller, Nir Friedman, Lise Getoor, and Ben Taskar. Graphical models in a nutshell, 2007. URL http://www.seas.upenn.edu/~taskar/pubs/gms-srl07.pdf.

John D. Lafferty, Andrew McCallum, and Fernando C. N. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proc. of ICML, 2001.

77 / 77