
Inference: Graph Search
CS 6355: Structured Prediction

So far in the class
• Thinking about structures
  – A graph, a collection of parts that are labeled jointly, a collection of decisions
• Algorithms for learning
  – Local learning
    • Learn parameters for individual components independently
    • Learning algorithm not aware of the full structure
  – Global learning
    • Learn parameters for the full structure
    • Learning algorithm "knows" about the full structure
• Next: Prediction
  – Sets structured prediction apart from binary/multiclass classification

Inference
• What is inference?
  – An overview of what we have seen before
  – Combinatorial optimization
  – Different views of inference
• Graph algorithms
  – Dynamic programming, greedy algorithms, search
• Integer programming
• Heuristics for inference
  – Sampling
• Learning to search

Variable elimination: Max-product
• We have a collection of inference variables that need to be assigned: $\mathbf{y} = (y_1, y_2, \ldots)$
• General algorithm
  – First fix an ordering of the variables, say $(y_1, y_2, \ldots)$
  – Iteratively:
    • Find the best value for $y_i$ given the values of the previous neighbors
  – Use back pointers to find the final answer
• Viterbi is an instance of max-product variable elimination

Variable elimination example
[Figure: a chain y_1, y_2, y_3, …, y_n, where each variable takes one of the labels A, B, C, D; emission scores attach to the nodes and transitions(y_1, y_2) to the first edge]
Local score: $\text{score-local}(y_i, y_{i+1}) = \text{emissions}(y_{i+1}) + \text{transitions}(y_i, y_{i+1})$
First eliminate $y_1$: $\text{score}_2(y_2) = \max_{y_1} \left[ \text{score}_1(y_1) + \text{score-local}(y_1, y_2) \right]$, where $\text{score}_1(y_1) = \text{emissions}(y_1)$

Variable elimination example
[Figure: the remaining chain y_2, y_3, …, y_n over the labels A, B, C, D, with score_2(y_2) attached to y_2 and transitions(y_2, y_3) on the first edge]
Next eliminate $y_2$: $\text{score}_3(y_3) = \max_{y_2} \left[ \text{score}_2(y_2) + \text{score-local}(y_2, y_3) \right]$

Variable elimination example
[Figure: the remaining chain y_3, …, y_n over the labels A, B, C, D, with score_3(y_3) attached to y_3 and transitions(y_3, y_4) on the first edge]
Next eliminate $y_3$: $\text{score}_4(y_4) = \max_{y_3} \left[ \text{score}_3(y_3) + \text{score-local}(y_3, y_4) \right]$

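Variable elimination example
[Figure: only y_n remains, with score_n(y_n) over the labels A, B, C, D]
After eliminating $y_1, \ldots, y_{n-1}$ in this way, we have all the information to make a decision for $y_n$: pick $y_n^* = \arg\max_{y_n} \text{score}_n(y_n)$, then follow the back pointers to recover the other labels.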
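The elimination steps above can be sketched in Python as follows. This is a minimal sketch, assuming $\text{score}_1(y_1) = \text{emissions}(y_1)$ and scores stored in plain dictionaries; the names `viterbi`, `path_score`, `emissions`, `transitions` and the toy numbers are illustrative, not from the course materials. Emissions are indexed by position here, since in practice they depend on the input.

```python
def path_score(path, emissions, transitions):
    """Total score of a full labeling: emissions(y_1) + sum_i score-local(y_i, y_{i+1})."""
    total = emissions[0][path[0]]
    for i in range(1, len(path)):
        total += emissions[i][path[i]] + transitions[path[i - 1]][path[i]]
    return total


def viterbi(labels, emissions, transitions):
    """Max-product variable elimination on a chain, eliminating y_1, y_2, ... in order.

    emissions[i][a]: hypothetical emission score of label a at (0-indexed) position i
    transitions[a][b]: hypothetical transition score from label a to label b
    Returns (best total score, best labeling).
    """
    # score_1(y_1) = emissions(y_1)
    score = {a: emissions[0][a] for a in labels}
    back = []  # back pointers: best value of the eliminated variable for each next label b
    for i in range(1, len(emissions)):
        new_score, pointers = {}, {}
        for b in labels:
            # slide notation (1-based): score_{i+1}(b) = max_a [score_i(a) + score-local(a, b)]
            best_a = max(labels, key=lambda a: score[a] + transitions[a][b])
            new_score[b] = score[best_a] + transitions[best_a][b] + emissions[i][b]
            pointers[b] = best_a
        score = new_score
        back.append(pointers)
    # Only y_n is left: decide it, then follow the back pointers
    last = max(labels, key=lambda b: score[b])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return score[last], path


labels = "ABCD"
emissions = [
    {"A": 1, "B": 0, "C": 2, "D": 0},
    {"A": 0, "B": 3, "C": 0, "D": 1},
    {"A": 2, "B": 0, "C": 0, "D": 1},
]
# Toy transitions: +1 for keeping the same label, 0 otherwise
transitions = {a: {b: (1 if a == b else 0) for b in labels} for a in labels}
print(viterbi(labels, emissions, transitions))  # (7, ['C', 'B', 'A'])
```

Each elimination step looks at every (previous label, next label) pair, so the pass costs O(n·|labels|²) instead of the |labels|ⁿ of enumerating every labeling.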

Variable elimination: Max-product
• The general algorithm fixes an ordering of the variables, eliminates them one at a time, and uses back pointers to find the final answer
• Challenge: What makes a good order?
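One common way to make "a good order" concrete: when a variable is eliminated, the new factor it produces ranges over all of its not-yet-eliminated neighbors, and those neighbors become connected to each other. The Python sketch below simulates this on an undirected graph and reports the size of the largest such factor; a cost of k means a table with |labels|^k entries. The helper `elimination_cost` and this cost measure are our assumptions for illustration, not the slides' own definition.

```python
def elimination_cost(neighbors, order):
    """Largest number of remaining neighbors any variable has when it is eliminated.

    neighbors: dict mapping each variable to the set of variables it shares a factor with
               (must be symmetric)
    order: elimination order containing every variable exactly once
    """
    graph = {v: set(ns) for v, ns in neighbors.items()}  # copy, since we add fill-in edges
    worst = 0
    for v in order:
        remaining = graph.pop(v)
        # The factor produced by maxing out v ranges over all of v's remaining neighbors
        worst = max(worst, len(remaining))
        for u in remaining:
            graph[u].discard(v)
            # ...so those neighbors are now linked to each other (fill-in edges)
            graph[u] |= remaining - {u}
    return worst


# A star: y_0 is connected to y_1, ..., y_4
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
print(elimination_cost(star, [0, 1, 2, 3, 4]))  # 4: center first creates a factor over 4 leaves
print(elimination_cost(star, [1, 2, 3, 4, 0]))  # 1: leaves first keeps every factor small
```

For a chain, eliminating left to right gives cost 1, which is why Viterbi only ever needs $\text{score}_i(y_i)$ over a single variable.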

Max-product algorithm
• Where is the "product" in max-product?
  $\mathbf{w}^T \phi(\mathbf{x}, \mathbf{y}) = \sum_i \text{score-local}(y_i, y_{i+1})$
  – Exponentiating the score turns this sum of local scores into a product of local factors, $\exp(\mathbf{w}^T \phi(\mathbf{x}, \mathbf{y})) = \prod_i \exp(\text{score-local}(y_i, y_{i+1}))$, so maximizing the sum is the same as maximizing the product
• Generalizes beyond sequence models
  – Requires a clever ordering of the output variables
  – Exact inference when the output is a tree
    • If not, no guarantees
• Also works for summing over all structures
  – Sum-product message passing
  – Belief propagation
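In log space the "product" becomes a sum, so max-product amounts to max-sum; replacing max with log-sum-exp gives sum-product, which computes the log of the sum of exp(score) over all structures. The Python sketch below runs the same chain elimination with either operation and checks both against brute-force enumeration. The helpers `chain_eliminate`, `logsumexp`, `all_scores` and the toy scores are illustrative assumptions.

```python
import math
from itertools import product


def logsumexp(xs):
    """log(sum(exp(x))), computed stably by factoring out the maximum."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def chain_eliminate(labels, emissions, transitions, combine):
    """Eliminate y_1, ..., y_{n-1} in order, then combine over y_n.

    combine=max gives max-product (max-sum in log space): the best total score.
    combine=logsumexp gives sum-product: log of the sum of exp(score) over all labelings.
    """
    score = {a: emissions[0][a] for a in labels}
    for i in range(1, len(emissions)):
        # emissions[i][b] does not depend on the eliminated variable, so it moves outside combine
        score = {
            b: emissions[i][b] + combine([score[a] + transitions[a][b] for a in labels])
            for b in labels
        }
    return combine(list(score.values()))


def all_scores(labels, emissions, transitions):
    """Brute force: yield the score of every labeling, for checking."""
    for ys in product(labels, repeat=len(emissions)):
        yield emissions[0][ys[0]] + sum(
            emissions[i][ys[i]] + transitions[ys[i - 1]][ys[i]] for i in range(1, len(ys))
        )


labels = "AB"
emissions = [{"A": 0.5, "B": -1.0}, {"A": 0.0, "B": 2.0}, {"A": 1.0, "B": 0.0}]
transitions = {"A": {"A": 0.3, "B": -0.2}, "B": {"A": 0.0, "B": 0.7}}
print(chain_eliminate(labels, emissions, transitions, max))        # best score
print(chain_eliminate(labels, emissions, transitions, logsumexp))  # log of the sum over structures
```

The only difference between the two runs is the `combine` operation; the elimination order and the bookkeeping are identical.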

Dynamic programming
• General solution strategy for inference
• Examples
  – Viterbi, CKY algorithm, Dijkstra's algorithm, and many more
• Key ideas:
  – Memoization: Don't re-compute something you already have
  – Requires an ordering of the variables
• Remember:
  – The hypergraph may not allow for the best ordering of the variables
  – Existence of a dynamic programming algorithm does not mean polynomial time/space
    • State space may be too big. Use heuristics such as beam search
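When the state space is too big, beam search keeps only a fixed number of partial structures at each step. A minimal Python sketch, assuming the same first-order chain scores as the Viterbi example (the function name and toy numbers are hypothetical); with beam width 1 it reduces to greedy left-to-right decoding.

```python
import heapq


def beam_search(labels, emissions, transitions, beam_width):
    """Left-to-right beam search over labelings of a chain.

    Keeps only the beam_width highest-scoring partial labelings after each step,
    so it is a heuristic: the best full labeling may be pruned early.
    Returns (score, labeling) of the best surviving hypothesis.
    """
    start = ((emissions[0][a], [a]) for a in labels)
    beam = heapq.nlargest(beam_width, start, key=lambda h: h[0])
    for i in range(1, len(emissions)):
        candidates = (
            (s + transitions[path[-1]][b] + emissions[i][b], path + [b])
            for s, path in beam
            for b in labels
        )
        beam = heapq.nlargest(beam_width, candidates, key=lambda h: h[0])
    return beam[0]


labels = "AB"
emissions = [{"A": 2, "B": 1}, {"A": 0, "B": 0}]
transitions = {"A": {"A": -5, "B": -4}, "B": {"A": 0, "B": 3}}
print(beam_search(labels, emissions, transitions, 1))  # (-2, ['A', 'B']): greedy commits to A
print(beam_search(labels, emissions, transitions, 2))  # (4, ['B', 'B'])
```

Here A looks best locally at the first position, but the transition scores make it lose; a wider beam keeps B alive and recovers the higher-scoring labeling. Unlike the dynamic program, beam search gives no exactness guarantee unless the beam is wide enough to keep every hypothesis.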

Graph algorithms for inference
• Many graph algorithms you have seen are applicable for inference
• Some examples
  – "Best" path, e.g., Viterbi, parsing
  – Min-cut/max-flow, e.g., image segmentation
  – Maximum spanning tree, e.g., dependency parsing
  – Bipartite matching, e.g., aligning sequences

Best path for inference
• Broad description of approach:
  – Construct a graph/hypergraph from the input and output
  – Decompose the total score along edges/hyperedges
  – Inference is finding the shortest/longest path in this weighted graph
• The Viterbi algorithm finds a best path in a specific graph! (the highest-scoring path, or equivalently the shortest path when the scores are negated into costs)

Viterbi algorithm as best path
Goal: To find the highest scoring path in this trellis
[Figure: a trellis with the time steps along one axis and the different labels for each step along the other]
• No cycles
• Nodes and edges have a specific meaning
• Ordering helps
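Under this view, Viterbi is a longest-path computation on a DAG. A Python sketch, assuming the trellis gets an extra start and end node and that each edge entering a (time step, label) node carries that node's emission plus the transition score; these modeling choices and the helper names are ours. Because there are no cycles, one pass over the nodes in time order suffices, and negative scores are fine.

```python
from collections import defaultdict


def build_trellis(labels, emissions, transitions):
    """Build the Viterbi trellis as a weighted DAG.

    Nodes are "start", "end" and (t, label). Each edge carries the score of the
    factors it introduces, so a start-to-end path weighs exactly one labeling's score.
    """
    n = len(emissions)
    edges = defaultdict(list)  # node -> list of (successor, weight)
    for a in labels:
        edges["start"].append(((0, a), emissions[0][a]))
        edges[(n - 1, a)].append(("end", 0))
    for t in range(1, n):
        for a in labels:
            for b in labels:
                edges[(t - 1, a)].append(((t, b), transitions[a][b] + emissions[t][b]))
    # No cycles: every edge goes from step t-1 to step t, so time order is a topological order
    order = ["start"] + [(t, a) for t in range(n) for a in labels] + ["end"]
    return edges, order


def longest_path(edges, order, source, target):
    """Highest-weight source-to-target path, relaxing edges in topological order."""
    best, parent = {source: 0}, {}
    for u in order:
        if u not in best:
            continue  # not reachable from source
        for v, w in edges[u]:
            if v not in best or best[u] + w > best[v]:
                best[v] = best[u] + w
                parent[v] = u
    nodes, node = [], parent[target]
    while node != source:
        nodes.append(node)
        node = parent[node]
    return best[target], [label for _, label in reversed(nodes)]


labels = "ABCD"
emissions = [
    {"A": 1, "B": 0, "C": 2, "D": 0},
    {"A": 0, "B": 3, "C": 0, "D": 1},
    {"A": 2, "B": 0, "C": 0, "D": 1},
]
transitions = {a: {b: (1 if a == b else 0) for b in labels} for a in labels}
edges, order = build_trellis(labels, emissions, transitions)
print(longest_path(edges, order, "start", "end"))  # (7, ['C', 'B', 'A'])
```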

Best path algorithms
• Dijkstra's algorithm
  – Cost functions should be non-negative
• Bellman-Ford algorithm
  – Slower than Dijkstra's algorithm but works with negative weights (as long as there are no negative cycles)
• A* search
  – Requires a heuristic that estimates the future path cost from a state but never over-estimates it
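To use shortest-path algorithms for inference, one can negate the model scores to get costs, and those costs may be negative; that is where the non-negativity requirement matters. The Python sketch below uses textbook versions of both algorithms (graph and names are illustrative) and shows Dijkstra settling a node too early on a graph with a negative edge, while Bellman-Ford gets it right.

```python
import heapq


def dijkstra(edges, source):
    """Shortest-path costs from source; correct only if every edge cost is non-negative."""
    dist, done = {source: 0}, set()
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale heap entry
        done.add(u)  # u is now final: its outgoing edges are never relaxed again
        for v, w in edges.get(u, []):
            if v not in dist or d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


def bellman_ford(edges, source):
    """Shortest-path costs from source; negative costs allowed, negative cycles are not."""
    nodes = set(edges) | {v for outs in edges.values() for v, _ in outs}
    dist = {source: 0}
    for _ in range(len(nodes) - 1):  # a shortest path uses at most |V| - 1 edges
        for u, outs in edges.items():
            if u in dist:
                for v, w in outs:
                    if v not in dist or dist[u] + w < dist[v]:
                        dist[v] = dist[u] + w
    for u, outs in edges.items():
        if u in dist and any(dist[u] + w < dist[v] for v, w in outs):
            raise ValueError("negative cycle reachable from source")
    return dist


# Costs could be negated model scores; the edge b -> a has a negative cost
graph = {"s": [("a", 1), ("b", 2)], "b": [("a", -2)], "a": [("t", 1)]}
print(dijkstra(graph, "s")["t"])      # 2: a's edges were relaxed before the cheaper route via b
print(bellman_ford(graph, "s")["t"])  # 1: s -> b -> a -> t costs 2 - 2 + 1
```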

Inference as search: Setting
• Predicting a graph as a sequence of decisions
• Data structures:
  – State: Encodes partial structure
  – Transitions: Move from one partial structure to another
  – Start state
  – End state: We have a full structure
    • There may be more than one end state
• Each transition is scored with the learned model
• Goal: Find an end state that has the highest total score

Example
Suppose each y can be one of A, B or C
[Figure: output variables y_1, y_2, y_3 over inputs x_1, x_2, x_3, with a search tree growing from the start state (-,-,-) to (A,-,-), (B,-,-), (C,-,-), then …, (A,A,-), …, (C,C,-)]
• State: Triples (y_1, y_2, y_3), all possibly unknown
  – e.g., (A,-,-), (-,A,A), (-,-,-), …
• Transition: Fill in one of the unknowns
  – Fill in a label in a slot. The edge is scored by the factors that can be computed so far
  – Keep assigning values to slots
• Start state: (-,-,-), i.e., no assignments
• End state: All three y's are assigned
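The search setting in this example can be sketched in Python as follows. This minimal sketch assumes the transitions fill slots left to right (so each edge can score the unary factor of the new slot and the pairwise factor with its left neighbor), represents an unknown slot by `None` instead of "-", uses hypothetical integer scores in place of a learned model, and simply explores every path; a smarter search strategy would replace the exhaustive loop.

```python
LABELS = "ABC"
# Hypothetical scores standing in for the learned model's factors
UNARY = [[3, 0, 1], [0, 3, 1], [1, 0, 2]]  # UNARY[i][k]: score of y_i = LABELS[k] (0-indexed i)


def unary(i, label):
    return UNARY[i][LABELS.index(label)]


def pairwise(prev, label):
    return 1 if prev == label else 0  # bonus when neighboring labels agree


def successors(state):
    """Transition: fill in the leftmost unknown slot (None) with each possible label."""
    i = state.index(None)
    for label in LABELS:
        new_state = state[:i] + (label,) + state[i + 1:]
        # Score the edge with only the factors that can be computed so far
        score = unary(i, label)
        if i > 0:
            score += pairwise(state[i - 1], label)
        yield new_state, score


def best_end_state(state=(None, None, None)):
    """Exhaustive depth-first search from `state`; returns (best total score, end state)."""
    if None not in state:  # end state: all three y's are assigned
        return 0, state
    best = None
    for next_state, edge_score in successors(state):
        rest, end = best_end_state(next_state)
        if best is None or edge_score + rest > best[0]:
            best = (edge_score + rest, end)
    return best


print(best_end_state())  # (8, ('A', 'B', 'C'))
```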
