Natural Language Processing (CSE 517): Graphical Models
Noah Smith
© 2016 University of Washington. nasmith@cs.washington.edu
February 8–10, 2016
Notation: let V = ⟨V1, V2, . . . , Vn⟩ be a collection of random variables.
[Figure: factor graph for the running clinical example. Variables: Influenza (I), Allergies (A), Sinus Inflammation (S), Runny Nose (R), Headache (H). Factors: f_I(I), f_A(A), f_{S,I,A}(S, I, A), f_{R,S}(R, S), f_{H,S}(H, S), each a table assigning a nonnegative value to every binary assignment of its variables.]
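A complete assignment's unnormalized score is the product of the values its factors assign to it, and dividing by the sum over all assignments gives a joint distribution. A minimal sketch for the Influenza example, with hypothetical factor values (the slides' actual numbers are not reproduced here):

```python
from itertools import product

# Hypothetical factor tables for the (I)nfluenza, (A)llergies,
# (S)inus inflammation, (R)unny nose, (H)eadache example.
# Each maps an assignment tuple to a nonnegative score.
f_I = {(0,): 10.0, (1,): 1.0}
f_A = {(0,): 5.0, (1,): 2.0}
f_SIA = {(s, i, a): (8.0 if s == max(i, a) else 1.0)
         for s in (0, 1) for i in (0, 1) for a in (0, 1)}
f_RS = {(r, s): (6.0 if r == s else 1.0) for r in (0, 1) for s in (0, 1)}
f_HS = {(h, s): (4.0 if h == s else 1.0) for h in (0, 1) for s in (0, 1)}

def score(i, a, s, r, h):
    """Unnormalized score: the product of all five factor values."""
    return (f_I[(i,)] * f_A[(a,)] * f_SIA[(s, i, a)]
            * f_RS[(r, s)] * f_HS[(h, s)])

# The partition function Z sums the score over all 2^5 assignments;
# p(v) = score(v) / Z is then a proper joint distribution.
Z = sum(score(*v) for v in product((0, 1), repeat=5))
p_healthy = score(0, 0, 0, 0, 0) / Z
```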
[Figure: a Markov network over four people, Adrian (A), Brook (B), Chris (C), Dana (D), with Val(A) = Val(B) = Val(C) = Val(D) = {0, 1} and pairwise factors f_{A,B}, f_{B,C}, f_{C,D}, f_{D,A} along the cycle A-B-C-D-A.]
The factor tables (each row is an assignment; the last column is the factor's value):

A B | f_{A,B}      B C | f_{B,C}      C D | f_{C,D}      D A | f_{D,A}
0 0 |  30          0 0 | 100          0 0 |   1          0 0 | 100
0 1 |   5          0 1 |   1          0 1 | 100          0 1 |   1
1 0 |   1          1 0 |   1          1 0 | 100          1 0 |   1
1 1 |  10          1 1 | 100          1 1 |   1          1 1 | 100
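With these four pairwise factors, the joint distribution is the normalized product of factor values, and the partition function Z can be computed by brute force over the 2^4 assignments. A short sketch:

```python
from itertools import product

# Pairwise factor tables from the A-B-C-D cycle example.
f_AB = {(0, 0): 30, (0, 1): 5, (1, 0): 1, (1, 1): 10}
f_BC = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}
f_CD = {(0, 0): 1, (0, 1): 100, (1, 0): 100, (1, 1): 1}
f_DA = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}

def score(a, b, c, d):
    """Unnormalized score of one joint assignment."""
    return f_AB[(a, b)] * f_BC[(b, c)] * f_CD[(c, d)] * f_DA[(d, a)]

# Partition function: sum over all 2^4 assignments.
Z = sum(score(*v) for v in product((0, 1), repeat=4))
# Z == 7201840 for these tables.

# Joint probability of every assignment:
p = {v: score(*v) / Z for v in product((0, 1), repeat=4)}
```

Most of the mass sits on a few assignments, e.g. (0, 1, 1, 0) alone scores 5 · 100 · 100 · 100 = 5,000,000 of the 7,201,840 total.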
The product of f_{A,B} and f_{B,C} is a new factor over all three variables, f_{A,B,C}(a, b, c) = f_{A,B}(a, b) · f_{B,C}(b, c):

A B | f_{A,B}      B C | f_{B,C}
0 0 |  30          0 0 | 100
0 1 |   5          0 1 |   1
1 0 |   1          1 0 |   1
1 1 |  10          1 1 | 100

A B C | f_{A,B,C}
0 0 0 | 3,000
0 0 1 |    30
0 1 0 |     5
0 1 1 |   500
1 0 0 |   100
1 0 1 |     1
1 1 0 |    10
1 1 1 | 1,000

Max-marginalizing B keeps, for each (a, c), the largest value over b (recording which b achieved it):

A C | f_{A,C}
0 0 | 3,000  (B = 0)
0 1 |   500  (B = 1)
1 0 |   100  (B = 0)
1 1 | 1,000  (B = 1)

Sum-marginalizing (summing out) B instead adds the two values, f_{A,C}(a, c) = Σ_b f_{A,B,C}(a, b, c):

A C | f_{A,C}
0 0 | 3,000 + 5 = 3,005
0 1 | 30 + 500 = 530
1 0 | 100 + 10 = 110
1 1 | 1 + 1,000 = 1,001
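The factor product and both kinds of marginalization are a few lines each. A sketch using the f_{A,B} and f_{B,C} tables:

```python
f_AB = {(0, 0): 30, (0, 1): 5, (1, 0): 1, (1, 1): 10}
f_BC = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}

# Factor product: f_ABC(a, b, c) = f_AB(a, b) * f_BC(b, c)
f_ABC = {(a, b, c): f_AB[(a, b)] * f_BC[(b, c)]
         for a in (0, 1) for b in (0, 1) for c in (0, 1)}
assert f_ABC[(0, 0, 0)] == 3000 and f_ABC[(1, 1, 1)] == 1000

# Sum-marginalization: f_AC(a, c) = sum_b f_ABC(a, b, c)
f_AC_sum = {(a, c): sum(f_ABC[(a, b, c)] for b in (0, 1))
            for a in (0, 1) for c in (0, 1)}
# {(0,0): 3005, (0,1): 530, (1,0): 110, (1,1): 1001}

# Max-marginalization: keep the best b, recording which b achieved it.
f_AC_max = {(a, c): max((f_ABC[(a, b, c)], b) for b in (0, 1))
            for a in (0, 1) for c in (0, 1)}
# e.g. f_AC_max[(0, 0)] == (3000, 0): value 3,000, achieved at B = 0
```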
◮ Related: draw samples from that distribution.
◮ Related: what is the most dangerous assignment to U?
◮ Related: what values of Q have the lowest expected cost?
A chain-structured example: V1 - V2 - V3 - V4, with factors f_{V1}(V1), f_{V1,V2}(V1, V2), f_{V2,V3}(V2, V3), f_{V3,V4}(V3, V4) over binary variables.

◮ In principle we could multiply all the factors into one factor over (V1, V2, V3, V4).
◮ But that multiplied-out factor would have exponentially many entries, one for every joint assignment.

Eliminating one variable at a time avoids this. On the chain:

◮ Start: f_{V1}, f_{V1,V2}, f_{V2,V3}, f_{V3,V4}.
◮ Eliminate V1: f_{V2}(v2) = Σ_{v1} f_{V1}(v1) f_{V1,V2}(v1, v2); remaining factors: f_{V2}, f_{V2,V3}, f_{V3,V4}.
◮ Eliminate V2: f_{V3}(v3) = Σ_{v2} f_{V2}(v2) f_{V2,V3}(v2, v3); remaining factors: f_{V3}, f_{V3,V4}.
◮ Eliminate V3: f_{V4}(v4) = Σ_{v3} f_{V3}(v3) f_{V3,V4}(v3, v4); remaining factor: f_{V4}.
◮ Eliminate V; i.e., remove the factors connected to V and replace them with a single new factor: their product, with V summed out.
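On the chain, each elimination step multiplies the current unary factor into the next pairwise factor and sums out the left variable. A minimal sketch, with illustrative factor values (the slides' tables are not reproduced):

```python
from itertools import product

# Variable elimination on the chain V1 - V2 - V3 - V4.
# These factor values are hypothetical; the procedure is the point.
f_V1 = {(0,): 2.0, (1,): 1.0}
f_12 = {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}
f_23 = dict(f_12)
f_34 = dict(f_12)

def eliminate_first(unary, pairwise):
    """Sum out the left variable: g(w) = sum_v unary(v) * pairwise(v, w)."""
    return {(w,): sum(unary[(v,)] * pairwise[(v, w)] for v in (0, 1))
            for w in (0, 1)}

f_V2 = eliminate_first(f_V1, f_12)   # eliminates V1
f_V3 = eliminate_first(f_V2, f_23)   # eliminates V2
f_V4 = eliminate_first(f_V3, f_34)   # eliminates V3

# f_V4 is the unnormalized marginal over V4; check against
# brute-force summation over all assignments of V1, V2, V3.
brute = {(w,): sum(f_V1[(v1,)] * f_12[(v1, v2)] * f_23[(v2, v3)]
                   * f_34[(v3, w)]
                   for v1, v2, v3 in product((0, 1), repeat=3))
         for w in (0, 1)}
assert f_V4 == brute
```

Each step touches only two variables at a time, so the chain is handled in O(n) table operations rather than building a 2^n-entry joint factor.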
◮ Given an observed sequence x, however, an HMM provides a graphical model over the hidden label sequence.
◮ Sometimes called “dynamic graphical models.”
◮ All variables share some computation with those to their right and to their left.
◮ This is called the forward-backward algorithm.
◮ This is useful when we want to apply EM to HMMs.
◮ It is also useful in supervised learning.
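The forward-backward algorithm can be sketched for a tiny HMM as follows; the parameters (pi, A, B) here are hypothetical, and the returned posterior state marginals are exactly the quantities EM's E-step needs:

```python
# Forward-backward for a tiny two-state HMM with hypothetical parameters.
pi = [0.6, 0.4]                         # initial state probabilities
A = [[0.7, 0.3], [0.4, 0.6]]            # A[s][t] = p(next = t | cur = s)
B = {'a': [0.9, 0.2], 'b': [0.1, 0.8]}  # B[x][s] = p(obs = x | state = s)

def forward_backward(x):
    n, S = len(x), 2
    # Forward: alpha[t][s] = p(x_1..x_t, state_t = s)
    alpha = [[pi[s] * B[x[0]][s] for s in range(S)]]
    for t in range(1, n):
        alpha.append([B[x[t]][s] * sum(alpha[t - 1][r] * A[r][s]
                                       for r in range(S))
                      for s in range(S)])
    # Backward: beta[t][s] = p(x_{t+1}..x_n | state_t = s)
    beta = [[1.0] * S for _ in range(n)]
    for t in range(n - 2, -1, -1):
        beta[t] = [sum(A[s][r] * B[x[t + 1]][r] * beta[t + 1][r]
                       for r in range(S))
                   for s in range(S)]
    Z = sum(alpha[n - 1][s] for s in range(S))  # p(x)
    # Posterior marginals p(state_t = s | x):
    return [[alpha[t][s] * beta[t][s] / Z for s in range(S)]
            for t in range(n)]

gamma = forward_backward("aab")
# Each row of gamma is a distribution over states at that time step.
```

(A practical implementation would work in log space or rescale alpha and beta to avoid underflow on long sequences.)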