  1. Inference: Graph Search
     CS 6355: Structured Prediction

  2. So far in the class
     • Thinking about structures
       – A graph, a collection of parts that are labeled jointly, a collection of decisions
     • Algorithms for learning
       – Local learning
         • Learn parameters for individual components independently
         • Learning algorithm not aware of the full structure
       – Global learning
         • Learn parameters for the full structure
         • Learning algorithm “knows” about the full structure
     • Next: Prediction
       – Sets structured prediction apart from binary/multiclass

  3. Inference
     • What is inference?
       – An overview of what we have seen before
       – Combinatorial optimization
       – Different views of inference
     • Graph algorithms
       – Dynamic programming, greedy algorithms, search
     • Integer programming
     • Heuristics for inference
       – Sampling
     • Learning to search

  7. Variable elimination: Max-product
     We have a collection of inference variables that need to be assigned: 𝐳 = (z_1, z_2, …)
     General algorithm:
     – First fix an ordering of the variables, say (z_1, z_2, …)
     – Iteratively: find the best value for z_i given the values of its previously assigned neighbors
     – Use back pointers to recover the final answer
     Viterbi is an instance of max-product variable elimination
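The elimination loop above can be sketched in code. This is a minimal, hypothetical example for a chain (i.e., Viterbi): the label set and the `emissions`/`transitions` score tables are illustrative stand-ins, not values from the lecture.

```python
def viterbi(n, labels, emissions, transitions):
    """Return the highest-scoring assignment (z_1, ..., z_n) and its score.

    emissions[i][z]     : score of label z at position i (0-indexed)
    transitions[(a, b)] : score of moving from label a to label b
    """
    # score[z] = best total score of any prefix ending with label z
    score = {z: emissions[0][z] for z in labels}
    back = []  # back pointers, one dict per eliminated variable

    for i in range(1, n):
        new_score, pointers = {}, {}
        for z in labels:
            # Eliminate variable z_{i-1}: keep only the max over its values
            best_prev = max(labels,
                            key=lambda p: score[p] + transitions[(p, z)])
            pointers[z] = best_prev
            new_score[z] = (score[best_prev]
                            + transitions[(best_prev, z)]
                            + emissions[i][z])
        score, back = new_score, back + [pointers]

    # Recover the argmax by following back pointers from the last variable
    z = max(labels, key=lambda l: score[l])
    path = [z]
    for pointers in reversed(back):
        z = pointers[z]
        path.append(z)
    return list(reversed(path)), max(score.values())
```

The ordering here is the left-to-right chain order, which is what makes each elimination step touch only one neighboring factor.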

  10. Variable elimination example
      [Trellis figure: positions y_1, y_2, y_3, …, y_n, each with labels A, B, C, D; nodes carry emissions(z_1), edges carry transitions(z_1, z_2)]
      score-local(z_i, z_{i+1}) = emissions(z_{i+1}) + transitions(z_i, z_{i+1})
      score_2(z_2) = max_{z_1} [ score_1(z_1) + score-local(z_1, z_2) ]
      First eliminate y_1

  12. Variable elimination example
      [Trellis figure: positions y_2, y_3, …, y_n; the first column now carries score_2(z_2), edges carry transitions(z_2, z_3)]
      score-local(z_i, z_{i+1}) = emissions(z_{i+1}) + transitions(z_i, z_{i+1})
      score_3(z_3) = max_{z_2} [ score_2(z_2) + score-local(z_2, z_3) ]
      Next eliminate y_2

  14. Variable elimination example
      [Trellis figure: positions y_3, …, y_n; the first column now carries score_3(z_3), edges carry transitions(z_3, z_4)]
      score-local(z_i, z_{i+1}) = emissions(z_{i+1}) + transitions(z_i, z_{i+1})
      score_4(z_4) = max_{z_3} [ score_3(z_3) + score-local(z_3, z_4) ]
      Next eliminate y_3

  15. Variable elimination example
      [Trellis figure: only position y_n remains, carrying score_n(z_n)]
      After n such steps we have all the information to make a decision for y_n

  17. Variable elimination: Max-product
      We have a collection of inference variables that need to be assigned: 𝐳 = (z_1, z_2, …)
      General algorithm:
      – First fix an ordering of the variables, say (z_1, z_2, …)
        • Challenge: what makes a good order?
      – Iteratively: find the best value for z_i given the values of its previously assigned neighbors
      – Use back pointers to recover the final answer
      Viterbi is an instance of max-product variable elimination

  18. Max-product algorithm
      • Where is the “product” in max-product?
        max_𝐳 score(𝐱, 𝐳) = max_𝐳 Σ_i score-local(z_i, z_{i+1})
        The factors multiply in probability space; in log space the product becomes this sum
      • Generalizes beyond sequence models
        – Requires a clever ordering of the output variables
        – Exact inference when the output is a tree
          • If not, no guarantees
      • Also works for summing over all structures
        – Sum-product message passing
        – Belief propagation
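The sum-product variant mentioned above can be sketched by taking the same elimination loop and replacing max with summation, which on a chain is the forward algorithm. A minimal sketch in probability space; the score tables are hypothetical stand-ins, and the brute-force check below is only feasible because the example is tiny.

```python
import math

def forward_sum(n, labels, emissions, transitions):
    """Sum of exp(total score) over ALL label sequences of length n
    (the partition function), computed by eliminating variables in order."""
    # alpha[z] = sum over all prefixes ending in z of the product of factors
    alpha = {z: math.exp(emissions[0][z]) for z in labels}
    for i in range(1, n):
        alpha = {
            z: sum(alpha[p] * math.exp(transitions[(p, z)]) for p in labels)
               * math.exp(emissions[i][z])
            for z in labels
        }
    return sum(alpha.values())
```

Here the product of exponentiated factors is explicit, which is where the name comes from: max-product keeps the best term of this product, sum-product adds them all up.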

  20. Dynamic programming
      • General solution strategy for inference
      • Examples
        – Viterbi, CKY algorithm, Dijkstra’s algorithm, and many more
      • Key ideas:
        – Memoization: don’t re-compute something you already have
        – Requires an ordering of the variables
      • Remember:
        – The hypergraph may not allow for the best ordering of the variables
        – Existence of a dynamic programming algorithm does not mean polynomial time/space
          • The state space may be too big; use heuristics such as beam search
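The beam-search heuristic mentioned in the last bullet can be sketched as follows: keep only the top-k scoring prefixes at each step instead of all of them. The `score_step` function here is a hypothetical stand-in for whatever partial scores the model can compute.

```python
def beam_search(n, labels, score_step, beam_width=2):
    """Approximate argmax over label sequences of length n.

    score_step(i, prefix, z): score of extending `prefix` with label z
    at position i. Only the `beam_width` best prefixes survive each step.
    """
    beam = [((), 0.0)]  # list of (prefix, total score), best first
    for i in range(n):
        candidates = [
            (prefix + (z,), s + score_step(i, prefix, z))
            for prefix, s in beam
            for z in labels
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beam = candidates[:beam_width]  # prune: this is the heuristic part
    return beam[0]
```

Unlike exact dynamic programming, the pruning step can discard a prefix that would have led to the global optimum, so there is no optimality guarantee.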

  21. Graph algorithms for inference
      • Many graph algorithms you have seen are applicable for inference
      • Some examples
        – “Best” path. E.g.: Viterbi, parsing
        – Min-cut/max-flow. E.g.: Image segmentation
        – Maximum spanning tree. E.g.: Dependency parsing
        – Bipartite matching. E.g.: Aligning sequences

  22. Best path for inference
      • Broad description of the approach:
        – Construct a graph/hypergraph from the input and output
        – Decompose the total score along edges/hyperedges
        – Inference is finding the shortest/longest path in this weighted graph
      The Viterbi algorithm finds a shortest path in a specific graph!

  23. Viterbi algorithm as best path
      Goal: to find the highest-scoring path in this trellis
      [Trellis figure: time steps on the horizontal axis, the different labels for each step on the vertical axis]

  25. Viterbi algorithm as best path
      Goal: to find the highest-scoring path in this trellis
      [Trellis figure: different labels for each step]
      • No cycles
      • Nodes and edges have a specific meaning
      • Ordering helps

  26. Best path algorithms
      • Dijkstra’s algorithm
        – Cost functions should be non-negative
      • Bellman-Ford algorithm
        – Slower than Dijkstra’s algorithm, but works with negative weights
      • A* search
        – Applicable if you have a heuristic that estimates the future path cost from a state without over-estimating it
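The first of these can be sketched directly. A minimal Dijkstra implementation over an explicit weighted graph, as one might run on a trellis with non-negative edge costs; the graph below is a hypothetical example, not one from the slides.

```python
import heapq

def dijkstra(graph, start, goal):
    """Shortest path by cost from start to goal.

    graph: dict mapping node -> list of (neighbor, non-negative cost).
    Returns (total cost, path) or None if goal is unreachable.
    """
    heap = [(0.0, start, [start])]  # frontier ordered by total cost so far
    seen = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == goal:
            return cost, path  # first pop of goal is optimal: costs are >= 0
        if node in seen:
            continue
        seen.add(node)
        for nxt, w in graph.get(node, []):
            if nxt not in seen:
                heapq.heappush(heap, (cost + w, nxt, path + [nxt]))
    return None
```

The early-return on the goal node is exactly what the non-negativity requirement buys: with negative edges a cheaper path to the goal could still be waiting in the heap, which is why Bellman-Ford is needed in that case.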

  27. Inference as search: Setting
      • Predicting a graph as a sequence of decisions
      • Data structures:
        – State: encodes a partial structure
        – Transitions: move from one partial structure to another
        – Start state
        – End state: we have a full structure
          • There may be more than one end state
      • Each transition is scored with the learned model
      • Goal: find an end state that has the highest total score

  28. Example
      Suppose each y can be one of A, B, or C
      [Factor graph figure: outputs y_1, y_2, y_3 over inputs x_1, x_2, x_3]
      • State: triples (y_1, y_2, y_3), all possibly unknown
        – (A, -, -), (-, A, A), (-, -, -), …
      • Transition: fill in one of the unknowns
      • Start state: (-, -, -)
      • End state: all three y’s are assigned

  30. Example
      Suppose each y can be one of A, B, or C
      [Search figure: start state (-,-,-) branching to (A,-,-), (B,-,-), (C,-,-)]
      Fill in a label in a slot. The edge is scored by the factors that can be computed so far.
      • State: triples (y_1, y_2, y_3), all possibly unknown
      • Transition: fill in one of the unknowns
      • Start state: (-, -, -)
      • End state: all three y’s are assigned

  31. Example
      Suppose each y can be one of A, B, or C
      [Search figure: (-,-,-) branching to (A,-,-), (B,-,-), (C,-,-), then on to (A,A,-), …, (C,C,-)]
      Keep assigning values to slots
      • State: triples (y_1, y_2, y_3), all possibly unknown
      • Transition: fill in one of the unknowns
      • Start state: (-, -, -)
      • End state: all three y’s are assigned
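The search formulation in this example can be sketched as code: states are triples with possibly-unassigned slots (`None` standing in for “-”), and a transition fills in one slot. The `score` function is a hypothetical stand-in for the learned factors; this sketch explores the space exhaustively, which is only sensible for a toy problem this small.

```python
LABELS = ["A", "B", "C"]

def successors(state):
    """Transitions: fill in the first unassigned slot with each label."""
    for i, slot in enumerate(state):
        if slot is None:
            for z in LABELS:
                yield state[:i] + (z,) + state[i + 1:]
            return  # exactly one slot is filled per transition

def best_end_state(score):
    """Search from the start state (-,-,-) for the best-scoring end state."""
    frontier = [(None, None, None)]  # start state: no assignments
    best, best_score = None, float("-inf")
    while frontier:
        state = frontier.pop()
        if None not in state:  # end state: all slots assigned
            if score(state) > best_score:
                best, best_score = state, score(state)
        else:
            frontier.extend(successors(state))
    return best, best_score
```

In practice the point of the search view is that one can swap this exhaustive loop for greedy or beam-pruned expansion when the state space is too large to enumerate.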
