SLIDE 1

Graphical Models

MAP inference

Siamak Ravanbakhsh, Winter 2018

SLIDE 2

Learning objectives

• MAP inference and its complexity
• exact & approximate MAP inference
• max-product and max-sum message passing
• relationship to LP relaxation
• graph-cuts for MAP inference

SLIDE 3-4

Definition & complexity

MAP (decision problem): given a Bayes-net, deciding whether p(x) > c for some x is NP-complete!

    x* = arg max_x p(x)

(NP: a non-deterministic Turing machine that accepts if a single path accepts.)

Marginal MAP (decision problem): given a Bayes-net for p(x, y), deciding whether p(x) > c for some x is complete for NP^PP

    x* = arg max_x ∑_y p(x, y)

(PP: a non-deterministic Turing machine that accepts if the majority of paths accept; NP^PP: NP with access to a PP oracle.)

Marginal MAP is NP-hard even for trees: we cannot use the distributive law.

[figure: side-chain prediction as MAP inference (Yanover & Weiss)]

SLIDE 5-7

Problem & terminology

MAP inference:

    arg max_x p(x) = arg max_x (1/Z) ∏_I ϕ_I(x_I)
                   ≡ arg max_x p̃(x) = arg max_x ∏_I ϕ_I(x_I)

ignore the normalization constant (aka max-product inference)

with evidence:

    arg max_x p(x ∣ e) = arg max_x p(x, e)/p(e) ≡ arg max_x p(x, e)

log domain:

    arg max_x p(x) ≡ arg max_x ∑_I ln ϕ_I(x_I) ≡ arg min_x −ln p̃(x)

aka max-sum inference; aka min-sum inference (energy minimization)
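As a quick numerical sanity check of these equivalences, the sketch below builds an unnormalized p̃ as a product of positive factors (the factor values are random and purely illustrative) and confirms that max-product, max-sum, and min-sum all pick the same assignment:

```python
import numpy as np

rng = np.random.default_rng(0)
# three positive factors over the same 8 joint assignments (toy setup)
phi = rng.uniform(0.1, 1.0, size=(3, 8))
p_tilde = phi.prod(axis=0)

x_prod = np.argmax(p_tilde)                  # max-product
x_sum = np.argmax(np.log(phi).sum(axis=0))   # max-sum (log domain)
x_min = np.argmin(-np.log(p_tilde))          # min-sum (energy minimization)
assert x_prod == x_sum == x_min
```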

SLIDE 8

Max-marginals

the marginal ∑_{x ∈ Val(X)} ϕ(x, y), used in sum-product inference, is replaced with the max-marginal max_{x ∈ Val(X)} ϕ(x, y)

example: ϕ(a, b, c) → ϕ′(a, c) = max_b ϕ(a, b, c)

SLIDE 9-10

Distributive law for MAP inference

save computation by factoring the operations (3 operations → 2 operations):

    ab + ac = a(b + c)                                sum-product inference
    max(ab, ac) = a max(b, c)                         max-product inference
    max(a + b, a + c) = a + max(b, c)                 max-sum inference
    max(min(a, b), min(a, c)) = max(a, min(b, c))     min-max inference

the same law in disguise:

    max_{x,y} f(x, y) g(y, z) = max_y g(y, z) max_x f(x, y)

assuming ∣Val(X)∣ = ∣Val(Y)∣ = ∣Val(Z)∣ = d, the complexity drops from O(d³) to O(d²)
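The factored identity above is easy to verify numerically; a small sketch with random tables f(x, y) and g(y, z) (names and sizes are arbitrary) computes the result both ways:

```python
import numpy as np

d = 5
rng = np.random.default_rng(1)
f = rng.uniform(size=(d, d))   # f(x, y)
g = rng.uniform(size=(d, d))   # g(y, z)

# naive: build the full d^3 table, then maximize over x and y -> O(d^3)
naive = (f[:, :, None] * g[None, :, :]).max(axis=(0, 1))

# factored: push max_x inside -> max_y g(y, z) * (max_x f(x, y)) -> O(d^2)
factored = (f.max(axis=0)[:, None] * g).max(axis=0)

assert np.allclose(naive, factored)
```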

SLIDE 11

Max-product variable elimination

input: a set of factors Φ_{t=0} = {ϕ_1, …, ϕ_K} (e.g. CPDs)
output: max_x p̃(x) = max_x ∏_I ϕ_I(x_I)

go over x_{i1}, …, x_{in} in some order; at step t:
• collect all the relevant factors: Ψ_t = {ϕ ∈ Φ_{t−1} ∣ x_{it} ∈ Scope[ϕ]}
• calculate their product: ψ_t = ∏_{ϕ ∈ Ψ_t} ϕ
• max-marginalize out x_{it}: ψ′_t = max_{x_{it}} ψ_t
• update the set of factors: Φ_t = Φ_{t−1} − Ψ_t + {ψ′_t}
return the scalar in Φ_{t=n} as max_x p̃(x)

the procedure is similar to VE for sum-product inference: eliminate all the variables

similar to the partition function Z = ∑_x p̃(x)
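A minimal sketch of this procedure on a 3-variable chain with two pairwise factors (the factor values are random and purely illustrative); the elimination result is checked against brute-force enumeration:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(2)
phi12 = rng.uniform(size=(2, 2))   # phi_{1,2}(x1, x2)
phi23 = rng.uniform(size=(2, 2))   # phi_{2,3}(x2, x3)

# eliminate x1: psi'_1(x2) = max_{x1} phi12(x1, x2)
m1 = phi12.max(axis=0)
# eliminate x2: psi'_2(x3) = max_{x2} m1(x2) * phi23(x2, x3)
m2 = (m1[:, None] * phi23).max(axis=0)
# eliminate x3: the remaining factor is the scalar max_x p~(x)
map_value = m2.max()

brute = max(phi12[a, b] * phi23[b, c]
            for a, b, c in product(range(2), repeat=3))
assert np.isclose(map_value, brute)
```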

SLIDE 12-14

Decoding the max-value

the elimination procedure above returns max_x p̃(x); we also need to recover the maximizing assignment x*

• keep the intermediate factors {ψ_{t=1}, …, ψ_{t=n}} produced during inference
• start from the last eliminated variable: ψ_{t=n} should be a function of x_{in} alone, so

    x*_{in} ← arg max_{x_{in}} ψ_n(x_{in})

• at this point ψ_{t=n−1} can only have x_{in−1}, x_{in} in its domain:

    x*_{in−1} ← arg max_{x_{in−1}} ψ_{n−1}(x_{in−1}, x*_{in})

• and so on, back to the first eliminated variable
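The traceback can be sketched on the same kind of 3-variable chain: the ψ tables from the forward pass are kept and then decoded in reverse (factor values are random, the result is checked against brute force):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(3)
phi12 = rng.uniform(size=(2, 2))   # phi_{1,2}(x1, x2)
phi23 = rng.uniform(size=(2, 2))   # phi_{2,3}(x2, x3)

# forward pass: keep the intermediate factors psi_t
psi1 = phi12                       # product of factors touching x1
m1 = psi1.max(axis=0)              # psi'_1(x2): x1 eliminated
psi2 = m1[:, None] * phi23         # product of factors touching x2
m2 = psi2.max(axis=0)              # psi'_2(x3): x2 eliminated
psi3 = m2                          # function of x3 alone

# backward pass: decode starting from the last eliminated variable
x3 = int(np.argmax(psi3))
x2 = int(np.argmax(psi2[:, x3]))
x1 = int(np.argmax(psi1[:, x2]))

best = max(product(range(2), repeat=3),
           key=lambda x: phi12[x[0], x[1]] * phi23[x[1], x[2]])
assert (x1, x2, x3) == best
```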

SLIDE 15-19

Marginal-MAP variable elimination

    arg max_{y_1,…,y_m} ∑_{x_1,…,x_n} ∏_I ϕ_I(x_I)

the procedure remains similar, but max and sum do not commute:

    max_x ∑_y ϕ(x, y) ≠ ∑_y max_x ϕ(x, y)

so we cannot use an arbitrary elimination order:
• first, eliminate {x_1, …, x_n} (sum-prod VE)
• then eliminate {y_1, …, y_m} (max-prod VE)
• decode the maximizing value

example: this constrained ordering can give exponential complexity despite low tree-width
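A two-by-two table is enough to see the failure of commutativity; the numbers below are hand-picked so the two orders disagree:

```python
import numpy as np

# phi(x, y) with binary x (rows) and y (columns)
phi = np.array([[3.0, 0.0],
                [2.0, 2.0]])

max_then_sum = phi.sum(axis=1).max()   # max_x sum_y phi(x, y) = 4
sum_then_max = phi.max(axis=0).sum()   # sum_y max_x phi(x, y) = 5
assert max_then_sum != sum_then_max

# marginal MAP over x (with y summed out) must sum first, then maximize
x_star = int(np.argmax(phi.sum(axis=1)))   # -> x = 1
```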

SLIDE 20-26

Max-product BP

in clique-trees, cluster-graphs and factor-graphs:
• building the chordal graph
• building the clique-tree
• tree-width (complexity of inference)
… all remain the same!

main differences:
• replacing sum with max
• decoding the maximizing assignment
• variational interpretation

example factor-graph: variables x1, …, x5 with factors ψ_{1,2,4}, ψ_{3,5}, and

    p(x) = (1/Z) ∏_I ψ_I(x_I)

variable-to-factor message:

    δ_{i→I}(x_i) ∝ ∏_{J ∣ i∈J, J≠I} δ_{J→i}(x_i)

factor-to-variable message:

    δ_{I→i}(x_i) ∝ max_{x_{I−i}} ψ_I(x_I) ∏_{j ∈ I−i} δ_{j→I}(x_j)

• approx. max-marginals:

    β(x_i) ∝ ∏_{J ∣ i∈J} δ_{J→i}(x_i)

use damping for convergence in loopy graphs
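A minimal sketch of these updates on a loop-free chain factor graph (x1 - ψ12 - x2 - ψ23 - x3, random factor values, purely illustrative); since there are no loops, a single sweep in each direction gives exact max-marginals:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(4)
psi12 = rng.uniform(size=(2, 2))   # psi_{1,2}(x1, x2)
psi23 = rng.uniform(size=(2, 2))   # psi_{2,3}(x2, x3)

# leaf variables send uniform messages, so the first factor-to-variable
# messages into x2 are plain max-marginalizations of the factors
d_f12_to_2 = psi12.max(axis=0)     # max_{x1} psi12(x1, x2)
d_f23_to_2 = psi23.max(axis=1)     # max_{x3} psi23(x2, x3)

# variable-to-factor messages from x2 exclude the recipient factor
d_2_to_f12 = d_f23_to_2
d_2_to_f23 = d_f12_to_2

# remaining factor-to-variable messages
d_f12_to_1 = (psi12 * d_2_to_f12[None, :]).max(axis=1)
d_f23_to_3 = (psi23 * d_2_to_f23[:, None]).max(axis=0)

# beliefs (max-marginals; exact here because the graph has no loops)
beta = {1: d_f12_to_1, 2: d_f12_to_2 * d_f23_to_2, 3: d_f23_to_3}

best = max(product(range(2), repeat=3),
           key=lambda x: psi12[x[0], x[1]] * psi23[x[1], x[2]])
for i in (1, 2, 3):
    assert int(np.argmax(beta[i])) == best[i - 1]
```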

SLIDE 27-29

Decoding exact max-marginals

(clique-trees & factor-graphs without any loops)

    x*_i = arg max_{x_i} β(x_i)

Single MAP assignment: if the MAP assignment x* = arg max_x p(x) is unique, the max-marginals are unambiguous.

Multiple MAP assignments: example

    p(x_1, x_2) = (1/2) I(x_1 = x_2)

    β(x_1 = 0) = β(x_1 = 1)
    β(x_2 = 0) = β(x_2 = 1)

⇒ a joint assignment x* exists that is locally optimal:

    β(x*_i) = max_{x_i} β(x_i)  ∀i
    β(x*_I) = max_{x_I} β(x_I)  ∀I

easy to find (how?)

SLIDE 30-32

Decoding pseudo max-marginals

(cluster-graphs, loopy factor-graphs)

best local assignments may be incompatible. example with binary variables a, b, c:

    β(a, b):  b=0  b=1      β(b, c):  b=0  b=1      β(a, c):  a=0  a=1
       a=0     1    2          c=0     1    2          c=0     1    2
       a=1     2    1          c=1     2    1          c=1     2    1

however, if the singleton beliefs m(a), m(b), m(c) have a unique max, a unique locally optimal belief exists

second example:

    β(a, b):  b=0  b=1      β(b, c):  b=0  b=1      β(a, c):  a=0  a=1
       a=0     3    2          c=0     3    2          c=0     3    2
       a=1     2    3          c=1     2    3          c=1     2    3

β(a), β(b), β(c) do not have a unique max, but a locally optimal assignment (a = b = c = 0) exists

so, it's complicated!

SLIDE 33

Decoding pseudo max-marginals (cont.)

(cluster-graphs, loopy factor-graphs)

given a set of cluster max-marginals {m_I(x_I)}, how to find a locally optimal x̂* (optimal in all m_I), if it exists:

    x̂*_I = arg max_{x_I} m_I(x_I)

• reduce to a Constraint Satisfaction Problem, or
• use decimation: run inference, fix a subset of variables, repeat until all vars are fixed
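A toy decimation pass on the tied beliefs from the second example above. In a real system one would re-run message passing after fixing each variable; here, for brevity, the remaining variables are decoded directly from the conditioned pairwise beliefs:

```python
import numpy as np

# second example from the previous slides: every singleton belief is tied
B = np.array([[3.0, 2.0], [2.0, 3.0]])
beta = {('a', 'b'): B, ('b', 'c'): B, ('a', 'c'): B}

fixed = {}
fixed['a'] = 0   # break the tie by fixing one variable (arbitrary choice)
# condition the pairwise beliefs on the fixed variable and decode b, c
fixed['b'] = int(np.argmax(beta[('a', 'b')][fixed['a'], :]))
fixed['c'] = int(np.argmax(beta[('a', 'c')][fixed['a'], :]))
assert fixed == {'a': 0, 'b': 0, 'c': 0}

# the joint assignment is locally optimal in every pairwise belief
for (u, v), b in beta.items():
    assert b[fixed[u], fixed[v]] == b.max()
```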

SLIDE 34

Optimality of max-product loopy BP

a locally optimal assignment x̂*, i.e.

    m(x̂*_i) = max_{x_i} m(x_i)  ∀i
    m(x̂*_I) = max_{x_I} m(x_I)  ∀I

is a strong local maximum of p(x): no better assignment exists in a large neighborhood of x̂*

• pick any subset of variables T ⊆ {1, …, n}
• build the maximal subgraph G_T s.t. each factor has a variable in T
• if this subgraph does not have more than one loop, then p(x̂*) cannot be improved by changing the vars in T

SLIDE 35-38

Using integer and linear programming

pairwise case: ln p̃(x) = ∑_{i,j ∈ E} ln ϕ_{i,j}(x_i, x_j); we are looking for an assignment x* maximizing this sum

integer-programming formulation:

    arg max_q ∑_{i,j ∈ E} ∑_{x_i,x_j} q_{i,j}(x_i, x_j) ln ϕ_{i,j}(x_i, x_j)

    q_{i,j}(x_i, x_j) ∈ {0, 1}             ∀i,j ∈ E, x_i, x_j
    ∑_{x_i} q_i(x_i) = 1                   ∀i                   (picks a single assignment for the vars in each factor)
    ∑_{x_i} q_{i,j}(x_i, x_j) = q_j(x_j)   ∀i,j ∈ E, x_j        (ensures that assignments to different factors are consistent)

the solution to this NP-hard program is the MAP assignment

SLIDE 39-41

Using integer and linear programming (cont.)

linear programming has a polynomial-time solution; relax the integrality constraint q_{i,j}(x_i, x_j) ∈ {0, 1} to

    q_{i,j}(x_i, x_j) ≥ 0                  ∀i,j ∈ E, x_i, x_j

keeping the objective and the constraints that ensure assignments to different factors are consistent:

    arg max_q ∑_{i,j ∈ E} ∑_{x_i,x_j} q_{i,j}(x_i, x_j) ln ϕ_{i,j}(x_i, x_j)

    ∑_{x_i} q_i(x_i) = 1                   ∀i
    ∑_{x_i} q_{i,j}(x_i, x_j) = q_j(x_j)   ∀i,j ∈ E, x_j

these are the local consistency constraints that we saw earlier: an outer-bound to the marginal polytope of globally consistent {q_{i,j}}
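The classic frustrated-triangle example shows why this outer bound can be loose: a fractional point of the local consistency polytope beats every integral assignment. A sketch (edge potentials hand-picked, ln ϕ = 1 on disagreement, 0 on agreement):

```python
import numpy as np
from itertools import product

# anti-ferromagnetic triangle over binary x0, x1, x2
edges = [(0, 1), (1, 2), (0, 2)]
ln_phi = np.array([[0.0, 1.0],
                   [1.0, 0.0]])   # rewards disagreement

# best integral assignment: a 3-cycle cannot have all three edges disagree
map_val = max(sum(ln_phi[x[i], x[j]] for i, j in edges)
              for x in product((0, 1), repeat=3))
assert map_val == 2.0

# fractional point: each edge puts mass 1/2 on the two disagreeing configs
q_edge = np.array([[0.0, 0.5],
                   [0.5, 0.0]])   # same table on every edge
q_node = np.array([0.5, 0.5])

# local consistency holds: normalized, and edge marginals match node marginals
assert np.isclose(q_edge.sum(), 1.0)
assert np.allclose(q_edge.sum(axis=0), q_node)
assert np.allclose(q_edge.sum(axis=1), q_node)

lp_val = sum((q_edge * ln_phi).sum() for _ in edges)
assert lp_val == 3.0 and lp_val > map_val   # strict upper bound on the MAP value
```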

SLIDE 42-43

Using integer and linear programming (cont.)

Marginal polytope (pairwise case): the convex hull of the sufficient statistics for all assignments to x

    M = conv{ [I[X_i = x_i, X_j = x_j]]_{i,j ∈ E, x_i, x_j} ∣ x }

alternative form: the set of vectors [q_{i,j}(x_i, x_j)]_{i,j ∈ E, x_i, x_j} such that

    ∃ q(x) s.t. ∑_{x ∖ {i,j}} q(x) = q_{i,j}(x_i, x_j)

Local consistency polytope: the set of [q_{i,j}(x_i, x_j)]_{i,j ∈ E, x_i, x_j} satisfying

    ∑_{x_i} q_{i,j}(x_i, x_j) = q_j(x_j)   ∀i,j ∈ E, x_j
    ∑_{x_i} q_i(x_i) = 1                   ∀i
    q_{i,j}(x_i, x_j) ≥ 0                  ∀i,j ∈ E, x_i, x_j

SLIDE 44-46

Using integer and linear programming (cont.)

why is this important?
• LP solutions are at corners of the polytope (why?)
• the LP over L is an upper-bound to the MAP value obtained using M

LP solution found using L vs LP solution found using M:
• the solution over M is integral (by definition) and gives the correct MAP assignment
• but M is difficult to specify

SLIDE 47-49

Recall: variational derivation of BP

    arg max_q ∑_{i,j ∈ E} H(q_{i,j}) − ∑_i (∣Nb_i∣ − 1) H(q_i) + ∑_{i,j ∈ E} ∑_{x_i,x_j} q_{i,j}(x_i, x_j) ln ϕ_{i,j}(x_i, x_j)

subject to locally consistent marginal distributions:

    ∑_{x_i} q_{i,j}(x_i, x_j) = q_j(x_j)   ∀i,j ∈ E, x_j
    ∑_{x_i} q_i(x_i) = 1                   ∀i
    q_{i,j}(x_i, x_j) ≥ 0                  ∀i,j ∈ E, x_i, x_j

the BP update is derived from "fixed-points" of the Lagrangian; BP messages are the (exponential form of the) Lagrange multipliers

SLIDE 50-51

Relationship between LP & BP

pairwise case: take a tempered model

    p(x) ∝ ∏_{i,j ∈ E} ϕ_{i,j}(x_i, x_j)^{1/T}

and replace it in the variational objective above (subject to the same local consistency constraints):

    arg max_q (1/T) ∑_{i,j ∈ E} ∑_{x_i,x_j} q_{i,j}(x_i, x_j) ln ϕ_{i,j}(x_i, x_j) + H(q)
      = arg max_q ∑_{i,j ∈ E} ∑_{x_i,x_j} q_{i,j}(x_i, x_j) ln ϕ_{i,j}(x_i, x_j) + T H(q)

the linear term is the LP objective; together with the entropy term it is the sum-product BP objective

SLIDE 52

Relationship between LP & BP (cont.)

consider the zero-temperature limit lim_{T→0} p(x)^{1/T}:
• sum-product BP for marginalization at the zero-temperature limit is similar to the LP relaxation of MAP inference
• sum-product BP at the zero-temperature limit is similar to max-product BP
• they are equivalent for concave entropy approximations

in practice, max-product BP can be much more efficient than LP: it uses the graph structure
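A small numerical illustration of the zero-temperature limit: raising an unnormalized distribution to the power 1/T and renormalizing concentrates it on the MAP assignment, so its ordinary (sum-product) marginals decode the MAP coordinates. The table below is hand-picked so the maximum is unique:

```python
import numpy as np

# unnormalized distribution over 3 binary variables, MAP at (1, 1, 1)
p_tilde = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.9]).reshape(2, 2, 2)
x_map = np.unravel_index(np.argmax(p_tilde), p_tilde.shape)

for T in (1.0, 0.1, 0.01):
    pT = p_tilde ** (1.0 / T)
    pT /= pT.sum()
    marg = [pT.sum(axis=tuple(j for j in range(3) if j != i)) for i in range(3)]

# at T = 0.01 the tempered distribution is essentially a point mass on x_map
assert pT[x_map] > 0.99
assert all(int(np.argmax(m)) == x_map[i] for i, m in enumerate(marg))
```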

slide-53
SLIDE 53

reduce MAP inference to min-cut problem use efficient & optimal min-cut solvers setting: binary pairwise MRF

using using graph cuts graph cuts

p(x) ∝ exp(−E(x)) E(x) = ϵ (x ) + ϵ (x , x ) ∑i

i i

∑i,j∈E

i,j i j

image: https://www.geeksforgeeks.org

graph-cut problem: partition the nodes into two sets that include source and target at min cost

slide-54
SLIDE 54

reduce MAP inference to min-cut problem use efficient & optimal min-cut solvers setting: binary pairwise MRF metric interactions

using using graph cuts graph cuts

p(x) ∝ exp(−E(x)) E(x) = ϵ (x ) + ϵ (x , x ) ∑i

i i

∑i,j∈E

i,j i j

reflexivity symmetry triangle inequality ϵ (x , x ) = 0 ⇔ x = x

i,j i j i j

ϵ (x , x ) = ϵ (x , x )

i,j i j j,i j i

ϵ (a, b) + ϵ (b, c) ≥ ϵ (a, c)

i,j i,j i,j

image: https://www.geeksforgeeks.org

graph-cut problem: partition the nodes into two sets that include source and target at min cost

slide-55
SLIDE 55

reduction reduction to graph-cuts to graph-cuts

x1 x2 x3 x4

ϵ (x ) = 2x

2 2 2

ϵ ( x ) = 7 ( 1 − x )

1 1 1

ϵ ( x ) = x

3 3 3

ϵ (x ) = 6x

4 4 4

ϵ (x , x ) = −6I(x = x )

1 , 2 1 2 1 2

ϵ (x , x ) = −6I(x = x )

2,3 2 3 2 3

ϵ (x , x ) = −2I(x = x )

3,4 3 4 3 4

ϵ (x , x ) = −I(x = x )

1,4 1 4 1 4

reduction through an example:

slide-56
SLIDE 56

source node's partition assignment of 0 target node's partition assignment of 1

reduction reduction to graph-cuts to graph-cuts

x1 x2 x3 x4

ϵ (x ) = 2x

2 2 2

ϵ ( x ) = 7 ( 1 − x )

1 1 1

ϵ ( x ) = x

3 3 3

ϵ (x ) = 6x

4 4 4

ϵ (x , x ) = −6I(x = x )

1 , 2 1 2 1 2

ϵ (x , x ) = −6I(x = x )

2,3 2 3 2 3

ϵ (x , x ) = −2I(x = x )

3,4 3 4 3 4

ϵ (x , x ) = −I(x = x )

1,4 1 4 1 4

⇒ ⇒

reduction through an example:

p(x) ∝ exp(−E(x))

E(x) = ϵ (x ) + ϵ (x , x ) ∑i

i i

∑i,j∈E

i,j i j

slide-57
SLIDE 57

source node's partition assignment of 0 target node's partition assignment of 1

reduction reduction to graph-cuts to graph-cuts

x1 x2 x3 x4

ϵ (x ) = 2x

2 2 2

ϵ ( x ) = 7 ( 1 − x )

1 1 1

ϵ ( x ) = x

3 3 3

ϵ (x ) = 6x

4 4 4

ϵ (x , x ) = −6I(x = x )

1 , 2 1 2 1 2

ϵ (x , x ) = −6I(x = x )

2,3 2 3 2 3

ϵ (x , x ) = −2I(x = x )

3,4 3 4 3 4

ϵ (x , x ) = −I(x = x )

1,4 1 4 1 4

⇒ ⇒

any metric MRF is reducible to this form reduction through an example:

p(x) ∝ exp(−E(x))

E(x) = ϵ (x ) + ϵ (x , x ) ∑i

i i

∑i,j∈E

i,j i j

slide-58
SLIDE 58

source node's partition assignment of 0 target node's partition assignment of 1

reduction reduction to graph-cuts to graph-cuts

x1 x2 x3 x4

ϵ (x ) = 2x

2 2 2

ϵ ( x ) = 7 ( 1 − x )

1 1 1

ϵ ( x ) = x

3 3 3

ϵ (x ) = 6x

4 4 4

ϵ (x , x ) = −6I(x = x )

1 , 2 1 2 1 2

ϵ (x , x ) = −6I(x = x )

2,3 2 3 2 3

ϵ (x , x ) = −2I(x = x )

3,4 3 4 3 4

ϵ (x , x ) = −I(x = x )

1,4 1 4 1 4

⇒ ⇒

any metric MRF is reducible to this form reduction through an example: non-optimal extensions to variables with higher cardinality

p(x) ∝ exp(−E(x))

E(x) = ϵ (x ) + ϵ (x , x ) ∑i

i i

∑i,j∈E

i,j i j
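A self-contained sketch of the reduction on this example: build the s-t graph (source side = label 0, target side = label 1), solve max-flow with a minimal Edmonds-Karp routine (not an optimized min-cut solver), and check the min-cut value against brute-force energy minimization. The constant offset comes from rewriting −w·I(x_i = x_j) as −w + w·I(x_i ≠ x_j):

```python
from collections import deque
from itertools import product

# unary energies eps_i as (eps_i(0), eps_i(1)); pairwise eps_ij = -w_ij * I(x_i = x_j)
theta = {1: (7, 0), 2: (0, 2), 3: (0, 1), 4: (0, 6)}
w = {(1, 2): 6, (2, 3): 6, (3, 4): 2, (1, 4): 1}

def energy(x):
    e = sum(theta[i][x[i]] for i in theta)
    e += sum(-wij * (x[i] == x[j]) for (i, j), wij in w.items())
    return e

# brute-force MAP = minimum-energy assignment
best = min((dict(zip(theta, bits)) for bits in product((0, 1), repeat=4)),
           key=energy)

# E(x) = -sum(w) + sum_i theta_i(x_i) + sum_ij w_ij * I(x_i != x_j), so the
# cut value of a labeling (s-side = 0, t-side = 1) equals E(x) + sum(w)
cap = {n: {} for n in ('s', 't', 1, 2, 3, 4)}
def add_edge(u, v, c):
    if c:
        cap[u][v] = cap[u].get(v, 0) + c
for i, (t0, t1) in theta.items():
    add_edge('s', i, t1)   # cut when x_i = 1
    add_edge(i, 't', t0)   # cut when x_i = 0
for (i, j), wij in w.items():
    add_edge(i, j, wij)    # cut (in one direction) when x_i != x_j
    add_edge(j, i, wij)

def max_flow(cap, s, t):   # Edmonds-Karp: BFS augmenting paths
    flow = 0
    while True:
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return flow
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        aug = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= aug
            cap[v][u] = cap[v].get(u, 0) + aug
        flow += aug

min_cut = max_flow(cap, 's', 't')
assert min_cut - sum(w.values()) == energy(best)   # min cut = min energy + 15
assert best == {1: 1, 2: 1, 3: 1, 4: 0}
```

For this instance the minimum energy is −9 at x = (1, 1, 1, 0), and the min cut accordingly has value 6.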

slide-59
SLIDE 59

variable elimination max-product belief propagation IP and LP relaxation graph-cuts dual decomposition branch and bound methods local search

Other methods for MAP inference Other methods for MAP inference

slide-60
SLIDE 60

MAP and marginal MAP are NP-hard distributive law extends to MAP inference

variable elimination clique-tree loopy BP

Summary Summary

an additional challenge of decoding

slide-61
SLIDE 61

MAP and marginal MAP are NP-hard distributive law extends to MAP inference

variable elimination clique-tree loopy BP

variational perspective, connects three approaches: max-product LBP (can find strong local optima!) sum-product LBP (theoretical zero temperature limit) LP relaxations

Summary Summary

an additional challenge of decoding

slide-62
SLIDE 62

MAP and marginal MAP are NP-hard distributive law extends to MAP inference

variable elimination clique-tree loopy BP

variational perspective, connects three approaches: max-product LBP (can find strong local optima!) sum-product LBP (theoretical zero temperature limit) LP relaxations for some family of loopy graphs, exact polynomial-time inference is possible (graph-cuts)

Summary Summary

an additional challenge of decoding