CS 6355: Structured Prediction
Inference: Graph Search
So far in the class
– Thinking about structures: a graph, a collection of parts that are labeled jointly, a collection of decisions
– Algorithms for learning
– Local learning
– An overview of what we have seen before
– Combinatorial optimization
– Different views of inference
– Dynamic programming, greedy algorithms, search
– Sampling
Viterbi is an instance of max-product variable elimination
Example: a chain y1, y2, y3, …, yn, where each variable takes a label from {A, B, C, D}

[Figure: trellis over y1, y2, y3, …, yn. Node y1 carries emissions(y_1) and the edge from y1 to y2 carries transitions(y_1, y_2)]

$$\text{score-local}(y_i, y_{i+1}) = \text{emissions}(y_{i+1}) + \text{transitions}(y_i, y_{i+1})$$

First eliminate y1
$$\text{score}_2(y_2) = \max_{y_1} \left[ \text{score}_1(y_1) + \text{score-local}(y_1, y_2) \right]$$
After this step, node y2 carries score_2(y_2) and the edge from y2 to y3 carries transitions(y_2, y_3)

Next eliminate y2
$$\text{score}_3(y_3) = \max_{y_2} \left[ \text{score}_2(y_2) + \text{score-local}(y_2, y_3) \right]$$
After this step, node y3 carries score_3(y_3) and the edge from y3 to y4 carries transitions(y_3, y_4)

Next eliminate y3
$$\text{score}_4(y_4) = \max_{y_3} \left[ \text{score}_3(y_3) + \text{score-local}(y_3, y_4) \right]$$

After n such steps, only yn remains, with $\text{score}_n(y_n)$
We have all the information to make a decision for yn
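The elimination steps above can be sketched in a few lines of Python. This is a minimal illustration, not the course's reference implementation. It assumes that the table for y1 starts out as its emission scores, and the names `eliminate_chain`, `demo_emissions` and `demo_transitions`, together with all the numbers, are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical toy scores for a 3-variable chain over the labels A and B.
demo_emissions: List[Dict[str, float]] = [
    {"A": 1.0, "B": 0.0},
    {"A": 0.0, "B": 2.0},
    {"A": 1.0, "B": 0.0},
]
demo_transitions: Dict[Tuple[str, str], float] = {
    ("A", "A"): 0.5, ("A", "B"): 0.0,
    ("B", "A"): 0.0, ("B", "B"): 0.5,
}


def eliminate_chain(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
) -> Tuple[str, float]:
    """Eliminate y1, y2, ... in order; return the best label for the last
    variable and the score of the best full assignment."""
    # Assumption: the score table for y1 is just its emission scores.
    score = dict(emissions[0])
    for emit in emissions[1:]:
        # Eliminate the previous variable by maximizing over its label:
        # new_score(cur) = max_prev [score(prev) + emit(cur) + trans(prev, cur)]
        score = {
            cur: max(
                score[prev] + emit[cur] + transitions[(prev, cur)]
                for prev in score
            )
            for cur in emit
        }
    best_label = max(score, key=score.get)
    return best_label, score[best_label]


print(eliminate_chain(demo_emissions, demo_transitions))
```

Each pass through the loop replaces the table over the previous variable with a table over the current one, so only one table of size |labels| is alive at any time.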
Challenge: What makes a good order?
– Viterbi, CKY algorithm, Dijkstra’s algorithm, and many more
– Memoization: Don’t re-compute something you already have
– Requires an ordering of the variables
– The hypergraph may not allow for the best ordering of the variables
– Existence of a dynamic programming algorithm does not mean polynomial time/space
Goal: To find the highest scoring path in this trellis
[Figure: a trellis laid out by time steps, with different labels for each step]
– No cycles
– Nodes and edges have a specific meaning
– Ordering helps
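Because the trellis has no cycles and its columns are ordered by time step, the highest scoring path can be found in one left-to-right sweep that remembers, for every node, which predecessor was best. The sketch below assumes node scores per time step and edge scores that depend only on the two labels involved. `best_path`, `node_score`, `edge_score` and the numbers are hypothetical names and values chosen for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical trellis: 3 time steps, labels A and B.
node_score: List[Dict[str, float]] = [
    {"A": 2.0, "B": 1.0},
    {"A": 0.0, "B": 3.0},
    {"A": 1.0, "B": 1.0},
]
edge_score: Dict[Tuple[str, str], float] = {
    ("A", "A"): 1.0, ("A", "B"): 0.0,
    ("B", "A"): 0.0, ("B", "B"): 2.0,
}


def best_path(
    nodes: List[Dict[str, float]],
    edges: Dict[Tuple[str, str], float],
) -> Tuple[List[str], float]:
    """Return the highest scoring path through the trellis and its score."""
    # best[label]: score of the best path ending at `label` in the current column.
    best = dict(nodes[0])
    backpointers: List[Dict[str, str]] = []
    for column in nodes[1:]:
        new_best: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur, s in column.items():
            # Pick the predecessor that gives the best path into `cur`.
            prev = max(best, key=lambda p: best[p] + edges[(p, cur)])
            new_best[cur] = best[prev] + edges[(prev, cur)] + s
            pointers[cur] = prev
        best = new_best
        backpointers.append(pointers)
    # Follow the backpointers from the best final node to recover the path.
    last = max(best, key=best.get)
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


print(best_path(node_score, edge_score))
```

The sweep visits every edge once, so the cost is proportional to (number of time steps) × (number of labels)², instead of enumerating every path.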
Example: inputs x1, x2, x3 with outputs y1, y2, y3
Suppose each y can be one of A, B or C

– Start state: No assignments: (-,-,-)
– Fill in a label in a slot: (A,-,-), (B,-,-), (C,-,-). The edge is scored by the factors that can be computed so far
– Keep assigning values to slots: (A,A,-), ….., (C,C,-)
– Till we reach a goal state: (A,A,A), ….., (C,C,C)

Note: Here we have assumed an ordering (y1, y2, y3)
How do the transitions get scored?

The goal of inference: To traverse this graph from the start state and reach the end state that has the best (highest/lowest) score
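One way to make this search graph concrete is sketched below. States are tuples with `None` for an empty slot, a successor fills the leftmost empty slot, and each edge is scored by the factors that become computable once that slot is filled. The chain-shaped factors (`EMIT`, `trans`), the left-to-right slot order and all the numbers are assumptions made for illustration. The exhaustive `best_goal` is only meant to show what "best end state" means on a tiny graph.

```python
from typing import Iterator, Optional, Tuple

State = Tuple[Optional[str], ...]
LABELS = ("A", "B", "C")

# Hypothetical factors: one emission table per slot, plus a transition bonus.
EMIT = [
    {"A": 1.0, "B": 2.0, "C": 0.0},
    {"A": 3.0, "B": 0.0, "C": 1.0},
    {"A": 0.0, "B": 1.0, "C": 0.0},
]


def trans(prev: str, cur: str) -> float:
    """Hypothetical transition factor: reward repeating the previous label."""
    return 0.5 if prev == cur else 0.0


def successors(state: State) -> Iterator[Tuple[State, int]]:
    """Yield (next_state, filled_slot) by filling the leftmost empty slot."""
    if None not in state:
        return  # goal state: nothing left to assign
    slot = state.index(None)
    for label in LABELS:
        yield state[:slot] + (label,) + state[slot + 1:], slot


def edge_score(new_state: State, slot: int) -> float:
    """Score an edge with the factors computable after filling `slot`."""
    score = EMIT[slot][new_state[slot]]
    if slot > 0:
        score += trans(new_state[slot - 1], new_state[slot])
    return score


def best_goal(state: State, so_far: float = 0.0) -> Tuple[State, float]:
    """Exhaustively explore the graph and return the highest scoring goal."""
    if None not in state:
        return state, so_far
    return max(
        (best_goal(nxt, so_far + edge_score(nxt, slot))
         for nxt, slot in successors(state)),
        key=lambda result: result[1],
    )


print(best_goal((None, None, None)))
```

With 3 labels and 3 slots the graph has 3 + 9 + 27 non-start states, which is why exhaustive traversal is only viable for toy problems.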
Questions?
Example: Suppose we have a beam of size k = 2

– At the beginning, the beam has the start state: (−, −, −)
– Expand all the states in the beam: (A, −, −), (B, −, −), (C, −, −)
– Score the newly created states [Figure: scores on the new states; legible values: 0.9, 10]
– The top k new states form the new beam (sorted): (B, −, −), (A, −, −)

Now we are ready for the next step

– Expand all the states in the beam: (B, A, −), (B, B, −), (B, C, −), (A, A, −), (A, B, −), (A, C, −)
– Score the newly created states [Figure: scores on the new states; legible values: 0.1, 10, 20, 4.1]
– The top k new states form the new beam (sorted): (A, A, −), (B, C, −)
– Repeating once more gives the beam (A, A, B), (B, C, C)

Final answer: Top of the beam at the end of search
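The expand, score, keep-top-k loop from this example can be sketched as follows. The scores here are hypothetical and are not the ones on the slides, so the intermediate beams differ from the example above. `EMIT`, `trans`, `expand` and `beam_search` are illustrative names, and slots are assumed to be filled left to right.

```python
import heapq
from typing import List, Optional, Tuple

State = Tuple[Optional[str], ...]
Scored = Tuple[float, State]
LABELS = ("A", "B", "C")

# Hypothetical factors: one emission table per slot, plus a transition bonus.
EMIT = [
    {"A": 1.0, "B": 2.0, "C": 0.0},
    {"A": 3.0, "B": 0.0, "C": 1.0},
    {"A": 0.0, "B": 1.0, "C": 0.0},
]


def trans(prev: str, cur: str) -> float:
    """Hypothetical transition factor: reward repeating the previous label."""
    return 0.5 if prev == cur else 0.0


def expand(score: float, state: State) -> List[Scored]:
    """Fill the leftmost empty slot with every label and score each result."""
    slot = state.index(None)
    out = []
    for label in LABELS:
        gain = EMIT[slot][label]
        if slot > 0:
            gain += trans(state[slot - 1], label)
        out.append((score + gain, state[:slot] + (label,) + state[slot + 1:]))
    return out


def beam_search(n_slots: int, k: int) -> List[Scored]:
    """Return the final beam, sorted from best to worst."""
    beam: List[Scored] = [(0.0, (None,) * n_slots)]  # start state only
    for _ in range(n_slots):
        # Expand all the states in the beam and score the new states.
        candidates = [c for score, state in beam for c in expand(score, state)]
        # The top k new states form the new beam (sorted, best first).
        beam = heapq.nlargest(k, candidates, key=lambda c: c[0])
    return beam


final_beam = beam_search(n_slots=3, k=2)
print(final_beam[0])  # top of the beam at the end of search
```

Because everything outside the top k is discarded at each step, beam search is not guaranteed to return the highest scoring goal state. A larger k trades more computation for a smaller chance of pruning the best path.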
Questions?