Probabilistic Graphical Models
MAP inference
Siamak Ravanbakhsh, Fall 2019

Learning objectives
- MAP inference and its complexity
- exact & approximate MAP inference
- max-product and max-sum message passing
- relationship to LP relaxation
- graph-cuts for MAP inference
MAP as an optimization problem

x^* = \arg\max_x f(x)
subject to  g_c(x) \geq 0 \; \forall c   and   h_d(x) = 0 \; \forall d

- may or may not have constraints
- continuous or discrete (combinatorial) ...

generic solution strategies:
- local search heuristics: hill-climbing, beam search, tabu search, ... (a sketch follows below)
- simulated annealing
- genetic algorithms
- integer programming
- branch and bound: when you can efficiently upper-bound partial assignments
what if f(x) is structured?
f(x) = \sum_I f_I(x_I)
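Before exploiting structure, here is what generic local search looks like for such an objective: a minimal hill-climbing sketch over binary variables (the factor representation and restart scheme are illustrative choices, not from the slides):

```python
import random

def hill_climb(factors, n_vars, n_restarts=20, seed=0):
    """Greedy local search for x maximizing f(x) = sum_I f_I(x_I).

    factors: list of (scope, table) pairs; scope is a tuple of variable
    indices and table maps a tuple of their values to a real score.
    A sketch: real solvers add tabu lists, beams, annealing schedules, etc.
    """
    rng = random.Random(seed)

    def score(x):
        return sum(table[tuple(x[i] for i in scope)] for scope, table in factors)

    best_x, best_f = None, float("-inf")
    for _ in range(n_restarts):
        x = [rng.randint(0, 1) for _ in range(n_vars)]
        f, improved = score(x), True
        while improved:                      # climb until no single flip helps
            improved = False
            for i in range(n_vars):
                x[i] ^= 1                    # try flipping variable i
                f_new = score(x)
                if f_new > f:
                    f, improved = f_new, True
                else:
                    x[i] ^= 1                # undo the flip
        if f > best_f:
            best_x, best_f = list(x), f
    return best_x, best_f

# example: f(x) = f(x0, x1) + f(x1, x2), rewarding disagreement on each edge
factors = [((0, 1), {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}),
           ((1, 2), {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0})]
print(hill_climb(factors, n_vars=3))         # e.g. ([0, 1, 0], 2)
```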
MAP inference in a graphical model

x^{MAP} = \arg\max_x p(x)
decision problem: given a Bayes-net, deciding whether p(x) > c for some x is NP-complete
(NP: a non-deterministic Turing machine that accepts if a single path accepts)

Marginal MAP

x^{MMAP} = \arg\max_x \sum_y p(x, y)
decision problem: given a Bayes-net for p(x, y), deciding whether \sum_y p(x, y) > c for some x is complete for NP^{PP}
(PP: a non-deterministic Turing machine that accepts if the majority of paths accept; NP^{PP} has access to a PP oracle)
marginal MAP is NP-hard even for trees

application: side-chain prediction as MAP inference (Yanover & Weiss)
Max-product inference

MAP inference:
\arg\max_x p(x) = \arg\max_x \frac{1}{Z} \prod_I \phi_I(x_I)
  \equiv \arg\max_x \tilde{p}(x) = \arg\max_x \prod_I \phi_I(x_I)
ignore the normalization constant: aka max-product inference

with evidence:
\arg\max_x p(x \mid e) = \arg\max_x \frac{p(x, e)}{p(e)} \equiv \arg\max_x p(x, e)

log domain:
\arg\max_x p(x) \equiv \arg\max_x \sum_I \ln \phi_I(x_I) \equiv \arg\min_x -\ln \tilde{p}(x)
aka max-sum inference / min-sum inference (energy minimization)
Distributive law

the marginal \sum_{x \in Val(X)} \phi(x, y) used in sum-product inference is replaced
with the max-marginal \max_{x \in Val(X)} \phi(x, y), e.g., \phi'(a, c) = \max_b \phi(a, b, c)

the same trick, factoring the operations, works in disguise:
  ab + ac = a(b + c)                               sum-product inference
  max(ab, ac) = a max(b, c)                        max-product inference
  max(a + b, a + c) = a + max(b, c)                max-sum inference
  max(min(a, b), min(a, c)) = max(a, min(b, c))    min-max inference
each turns 3 operations into 2 operations.

save computation by factoring the operations:
\max_{x, y} f(x, y) g(y, z) = \max_y g(y, z) \max_x f(x, y)
assuming |Val(X)| = |Val(Y)| = |Val(Z)| = d, the complexity drops from O(d^3) to O(d^2).
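A quick numerical check of this factorization (a numpy sketch; the random tables are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10
f = rng.random((d, d))    # f(x, y)
g = rng.random((d, d))    # g(y, z)

# brute force: build the O(d^3) table f(x, y) g(y, z), then max over (x, y)
brute = np.einsum('xy,yz->xyz', f, g).max(axis=(0, 1))

# factored: eliminate x first, then y -- two O(d^2) passes
factored = (g * f.max(axis=0)[:, None]).max(axis=0)

assert np.allclose(brute, factored)   # same max-marginal over z
```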
Max-product variable elimination

input: a set of factors (e.g., CPDs)  \Phi_{t=0} = \{\phi_1, \dots, \phi_K\}
go over the variables in some order x_{i_1}, \dots, x_{i_n}; at step t:
- collect all the relevant factors:  \Psi_t = \{\phi \in \Phi_{t-1} \mid x_{i_t} \in Scope[\phi]\}
- calculate their product:  \psi_t = \prod_{\phi \in \Psi_t} \phi
- max-marginalize out x_{i_t}:  \psi'_t = \max_{x_{i_t}} \psi_t
- update the set of factors:  \Phi_t = \Phi_{t-1} - \Psi_t + \{\psi'_t\}
return the product of the scalars left in \Phi_{t=n} as \max_x \tilde{p}(x) = \max_x \prod_I \phi_I(x_I)

the procedure is similar to VE for sum-product inference: eliminate all the variables.
\max_x \tilde{p}(x) plays a role similar to the partition function Z = \sum_x \tilde{p}(x).
Recovering the maximizing assignment

running max-product VE gives the value \max_x \tilde{p}(x); we still need to recover the
maximizing assignment x^*: keep the intermediate factors \psi_{t=1}, \dots, \psi_{t=n}
produced during inference.
start from the last eliminated variable: \psi_{t=n} should be a function of x_{i_n} alone, so
x^*_{i_n} \leftarrow \arg\max_{x_{i_n}} \psi_n(x_{i_n})
at this point \psi_{t=n-1} can only have x_{i_{n-1}} and x_{i_n} in its domain, and x^*_{i_n} is already fixed:
x^*_{i_{n-1}} \leftarrow \arg\max_{x_{i_{n-1}}} \psi_{n-1}(x_{i_{n-1}}, x^*_{i_n})
and so on...
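A compact sketch of max-product VE with traceback (my own layout: factors are stored as numpy arrays broadcast over all n variables, so the factor product is plain elementwise multiplication; cardinalities are assumed greater than 1):

```python
import numpy as np

def max_product_ve(factors, order, card):
    """Max-product variable elimination with traceback.

    factors: list of numpy arrays of shape (card[0], ..., card[n-1]),
             with size-1 axes for variables outside a factor's scope.
    order:   elimination order, a permutation of range(n).
    returns: (max_x of the product of factors, a maximizing assignment x*).
    """
    n = len(card)
    Phi = list(factors)
    trace = []                                       # (variable, psi_t) per step
    for v in order:
        Psi = [f for f in Phi if f.shape[v] > 1]     # factors mentioning v
        Phi = [f for f in Phi if f.shape[v] == 1]
        psi = np.ones([1] * n)
        for f in Psi:
            psi = psi * f                            # factor product (broadcast)
        trace.append((v, psi))
        Phi.append(psi.max(axis=v, keepdims=True))   # max-marginalize out v
    value = float(np.prod([f.item() for f in Phi]))  # leftover scalars

    x = [None] * n                                   # traceback, reverse order
    for v, psi in reversed(trace):
        idx = tuple(x[u] if (x[u] is not None and psi.shape[u] > 1)
                    else slice(None) for u in range(n))
        x[v] = int(np.argmax(psi[idx]))              # later vars already fixed
    return value, x

# example: phi(x0, x1) favoring agreement, plus a bias on x1
phi01 = np.array([[3., 1.], [1., 3.]])
bias1 = np.array([1., 2.]).reshape(1, 2)
print(max_product_ve([phi01, bias1], order=[0, 1], card=[2, 2]))  # (6.0, [1, 1])
```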
Marginal MAP

y^* = \arg\max_{y_1, \dots, y_m} \sum_{x_1, \dots, x_n} \prod_I \phi_I(x_I)

the procedure remains similar, but max and sum do not commute:
\max_x \sum_y \phi(x, y) \neq \sum_y \max_x \phi(x, y)

so we cannot use an arbitrary elimination order:
- first, eliminate \{x_1, \dots, x_n\} (sum-prod VE)
- then eliminate \{y_1, \dots, y_m\} (max-prod VE)
- decode the maximizing value

example: this constrained order can force exponential complexity despite low tree-width.
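A two-variable numpy sketch of why the order matters (the random table is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
phi = rng.random((4, 3))             # phi(x, y); x is summed out, y is maximized

y_star = phi.sum(axis=0).argmax()    # legal order: sum out x, then max over y
y_bad = phi.max(axis=0).argmax()     # illegal order can pick a different y

# max and sum do not commute; swapping them only increases the value:
assert phi.sum(axis=1).max() <= phi.max(axis=0).sum()
print(y_star, y_bad)
```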
From variable elimination to message passing

in clique-trees, cluster-graphs, and factor-graphs:
building the chordal graph, building the clique-tree, the tree-width (complexity of inference)
... all remain the same!

main differences:
- replacing sum with max
- decoding the maximizing assignment
- variational interpretation
Max-product belief propagation

example factor-graph: variables x_1, \dots, x_5 with factors including \psi_{\{1,2,4\}} and \psi_{\{3,5\}},
p(x) = \frac{1}{Z} \prod_I \psi_I(x_I)

variable-to-factor message:
\delta_{i \to I}(x_i) \propto \prod_{J \mid i \in J, J \neq I} \delta_{J \to i}(x_i)

factor-to-variable message:
\delta_{I \to i}(x_i) \propto \max_{x_{I - i}} \psi_I(x_I) \prod_{j \in I - i} \delta_{j \to I}(x_j)

max-marginal belief:
\beta(x_i) \propto \prod_{J \mid i \in J} \delta_{J \to i}(x_i)

use damping for convergence in loopy graphs
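A minimal max-product BP sketch, restricted to pairwise factors for brevity and written in the log domain (max-sum), with the damping mentioned above; the graph encoding and the damping factor of 0.5 are illustrative choices:

```python
import numpy as np

def max_product_bp(log_unary, log_pair, iters=50, damp=0.5):
    """Max-product BP (max-sum in the log domain) on a pairwise MRF.

    log_unary: dict i -> log phi_i, a vector over Val(x_i)
    log_pair:  dict (i, j) -> log phi_ij, a matrix [Val(x_i), Val(x_j)]
    returns:   beliefs beta[i] proportional to the log max-marginals.
    Exact on trees; a damped heuristic on loopy graphs.
    """
    nbrs = {i: [] for i in log_unary}
    for i, j in log_pair:
        nbrs[i].append(j)
        nbrs[j].append(i)
    # directed messages m[i -> j], uniform (all-zeros) initialization
    msg = {(i, j): np.zeros(len(log_unary[j])) for i in nbrs for j in nbrs[i]}

    def logpot(i, j):          # log phi_ij oriented as [x_i, x_j]
        return log_pair[(i, j)] if (i, j) in log_pair else log_pair[(j, i)].T

    for _ in range(iters):
        new = {}
        for (i, j), old in msg.items():
            pre = log_unary[i] + sum(msg[(k, i)] for k in nbrs[i] if k != j)
            m = np.max(logpot(i, j) + pre[:, None], axis=0)  # max over x_i
            m -= m.max()                                     # normalize
            new[(i, j)] = damp * old + (1 - damp) * m        # damping
        msg = new
    return {i: log_unary[i] + sum(msg[(k, i)] for k in nbrs[i]) for i in nbrs}

# example: binary chain x0 - x1 - x2, attractive pairs, evidence on x0
u = {0: np.array([0., 2.]), 1: np.zeros(2), 2: np.zeros(2)}
P = {(0, 1): np.log([[3., 1.], [1., 3.]]), (1, 2): np.log([[3., 1.], [1., 3.]])}
beta = max_product_bp(u, P)
print({i: int(b.argmax()) for i, b in beta.items()})   # -> {0: 1, 1: 1, 2: 1}
```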
Decoding from max-marginals

x^*_i = \arg\max_{x_i} \beta(x_i)

single MAP assignment (clique-trees & factor-graphs without any loops):
if the MAP assignment x^* = \arg\max_x p(x) is unique, the max-marginals are unambiguous.

multiple MAP assignments: the max-marginals can be ambiguous.
example: p(x_1, x_2) = I(x_1 = x_2) gives
\beta(x_1 = 0) = \beta(x_1 = 1) and \beta(x_2 = 0) = \beta(x_2 = 1),
so maximizing each belief independently may produce an assignment that is not a MAP.
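The ambiguity in this example, checked numerically (a small sketch):

```python
import numpy as np

phi = np.eye(2)              # p(x1, x2) = I(x1 = x2): MAPs are (0,0) and (1,1)
beta1 = phi.max(axis=1)      # max-marginal of x1 -> [1., 1.], a tie
beta2 = phi.max(axis=0)      # max-marginal of x2 -> [1., 1.], a tie
# breaking the ties independently can yield (x1, x2) = (0, 1), which has
# probability 0: with multiple MAPs, decoding must be done jointly.
print(beta1, beta2)
```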
Locally optimal assignments

an assignment x^* is locally optimal if it maximizes every belief:
\beta(x^*_i) = \max_{x_i} \beta(x_i) \; \forall i   and   \beta(x^*_I) = \max_{x_I} \beta(x_I) \; \forall I
such an x^* is easy to find (how?)

in cluster-graphs and loopy factor-graphs, the best local assignments may be incompatible.

example (incompatible: every pairwise belief prefers disagreement, which is impossible for three binary variables):

beta(a, b):          beta(b, c):          beta(a, c):
      b=0  b=1             b=0  b=1             a=0  a=1
a=0    1    2        c=0    1    2        c=0    1    2
a=1    2    1        c=1    2    1        c=1    2    1

... or compatible (every pairwise belief prefers agreement, e.g., a = b = c):

beta(a, b):          beta(b, c):          beta(a, c):
      b=0  b=1             b=0  b=1             a=0  a=1
a=0    3    2        c=0    3    2        c=0    3    2
a=1    2    3        c=1    2    3        c=1    2    3

if the singleton beliefs m(a), m(b), m(c) have unique maxima, a unique locally optimal assignment exists.
Finding a locally optimal assignment

given a set of cluster max-marginals \{m_I(x_I)\}, how to find \hat{x}^* that is locally optimal
(optimal in all m_I, i.e., \hat{x}^*_I = \arg\max_{x_I} m_I(x_I)), if it exists:
- reduce to a constraint satisfaction problem, or
- use decimation: run inference, fix a subset of variables, repeat until all vars are fixed
  (a sketch follows below)
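A decimation loop sketch built on the `max_product_bp` helper above; clamping a variable by adding a large log-unary bonus is my illustrative choice:

```python
import numpy as np

def decimate(log_unary, log_pair, per_round=1):
    """Repeatedly run BP and fix the most confident free variables."""
    log_unary = {i: u.copy() for i, u in log_unary.items()}
    fixed = {}
    while len(fixed) < len(log_unary):
        beta = max_product_bp(log_unary, log_pair)     # defined earlier
        free = [i for i in beta if i not in fixed]
        # confidence = gap between best and runner-up log-belief
        gap = {i: np.sort(beta[i])[-1] - np.sort(beta[i])[-2] for i in free}
        for i in sorted(free, key=lambda v: -gap[v])[:per_round]:
            fixed[i] = int(beta[i].argmax())
            log_unary[i][fixed[i]] += 100.0            # clamp x_i to that value
    return fixed
```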
Strong local optimality

a locally optimal assignment \hat{x}^*, i.e., one with
m(\hat{x}^*_i) = \max_{x_i} m(x_i) \; \forall i   and   m(\hat{x}^*_I) = \max_{x_I} m(x_I) \; \forall I,
is a strong local maximum of p(x): no better assignment exists in a large neighborhood of \hat{x}^*.

pick any subset of variables T \subseteq \{1, \dots, n\} and build the subgraph G_T with all
factors that have a variable in T; if this subgraph does not have more than one loop, then
p(\hat{x}^*) cannot be improved by changing the vars in T.
(from: Weiss & Freeman)
Integer-program formulation (pairwise case)

\ln \tilde{p}(x) = \sum_{i,j} \ln \phi_{i,j}(x_i, x_j); we are looking for an assignment x^* maximizing this sum.

integer-programming formulation:
\arg\max_{\{q\}} \sum_{i,j \in E} \sum_{x_i, x_j} q_{i,j}(x_i, x_j) \ln \phi_{i,j}(x_i, x_j)
subject to
q_{i,j}(x_i, x_j) \in \{0, 1\} \; \forall i,j \in E, x_i, x_j
\sum_{x_i} q_i(x_i) = 1 \; \forall i                               (picks a single assignment for vars in each factor)
\sum_{x_i} q_{i,j}(x_i, x_j) = q_j(x_j) \; \forall i,j \in E, x_j  (ensures that assignments to different factors are consistent)

the solution to this NP-hard program is the MAP assignment.
LP relaxation (pairwise case)

linear programming has a polynomial-time solution, so relax the integrality constraint
q_{i,j}(x_i, x_j) \in \{0, 1\}  to  q_{i,j}(x_i, x_j) \geq 0:

\arg\max_{\{q\}} \sum_{i,j \in E} \sum_{x_i, x_j} q_{i,j}(x_i, x_j) \ln \phi_{i,j}(x_i, x_j)
subject to
q_{i,j}(x_i, x_j) \geq 0 \; \forall i,j \in E, x_i, x_j
\sum_{x_i} q_i(x_i) = 1 \; \forall i
\sum_{x_i} q_{i,j}(x_i, x_j) = q_j(x_j) \; \forall i,j \in E, x_j

the \{q_{i,j}\} now range over the local consistency constraints that we saw earlier (the local polytope L).
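A sketch of this LP for binary variables using scipy.optimize.linprog (the variable indexing is my own; on a tree the solution comes out integral):

```python
import numpy as np
from scipy.optimize import linprog

def map_lp(log_phi, n):
    """LP relaxation of pairwise MAP for n binary variables.

    log_phi: dict (i, j) -> 2x2 array with entries ln phi_ij(x_i, x_j)
    variables: q_i(a) for each node i, then q_ij(a, b) for each edge.
    """
    edges = list(log_phi)
    nv = 2 * n + 4 * len(edges)
    node = lambda i, a: 2 * i + a
    pair = lambda e, a, b: 2 * n + 4 * e + 2 * a + b

    c = np.zeros(nv)                              # linprog minimizes, so negate
    for e, (i, j) in enumerate(edges):
        for a in (0, 1):
            for b in (0, 1):
                c[pair(e, a, b)] = -log_phi[(i, j)][a, b]

    A, rhs = [], []
    for i in range(n):                            # normalization: sum_a q_i(a) = 1
        row = np.zeros(nv)
        row[node(i, 0)] = row[node(i, 1)] = 1
        A.append(row); rhs.append(1.0)
    for e, (i, j) in enumerate(edges):            # local consistency
        for b in (0, 1):                          # sum_a q_ij(a, b) = q_j(b)
            row = np.zeros(nv)
            row[pair(e, 0, b)] = row[pair(e, 1, b)] = 1
            row[node(j, b)] = -1
            A.append(row); rhs.append(0.0)
        for a in (0, 1):                          # sum_b q_ij(a, b) = q_i(a)
            row = np.zeros(nv)
            row[pair(e, a, 0)] = row[pair(e, a, 1)] = 1
            row[node(i, a)] = -1
            A.append(row); rhs.append(0.0)

    res = linprog(c, A_eq=np.array(A), b_eq=np.array(rhs),
                  bounds=[(0, None)] * nv)        # q >= 0
    return -res.fun, res.x[:2 * n].reshape(n, 2)  # LP value, node marginals

# single edge with a unique MAP at (0, 0): the relaxation is tight
val, q = map_lp({(0, 1): np.log([[3., 1.], [1., 2.]])}, n=2)
print(val, q)       # -> about ln 3 = 1.0986, with integral node marginals
```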
Local vs. marginal polytope (pairwise case)

the constraints above define the local polytope L; the marginal polytope M is

M = conv\{ [\, I[X_i = x_i, X_j = x_j] \,]_{i,j \in E, x_i, x_j} \mid \text{all assignments } X \}

the convex hull of the sufficient statistics of all assignments to x.
alternative form: [q_{i,j}(x_i, x_j)]_{i,j \in E, x_i, x_j} \in M iff
\exists q(x) s.t. \sum_{x_{-i,j}} q(x) = q_{i,j}(x_i, x_j),
i.e., the q_{i,j} are the pairwise marginals of some joint distribution.
why is this important?
- LP solutions are at corners of the polytope (why? a linear objective over a polytope attains its optimum at a vertex)
- the LP over L gives an upper-bound to the MAP value obtained using M
- an LP solution found using M is integral (by definition, M's vertices are assignments) and gives the correct MAP assignment
- but M is difficult to specify
Variational view: Bethe objective (pairwise case)

\arg\max_{\{q\}} \sum_{i,j \in E} \sum_{x_i, x_j} q_{i,j}(x_i, x_j) \ln \phi_{i,j}(x_i, x_j)
              + \sum_{i,j \in E} H(q_{i,j}) - \sum_i (|Nb_i| - 1) H(q_i)

subject to the locally consistent marginal distributions:
\sum_{x_i} q_{i,j}(x_i, x_j) = q_j(x_j) \; \forall i,j \in E, x_j
\sum_{x_i} q_i(x_i) = 1 \; \forall i
q_{i,j}(x_i, x_j) \geq 0 \; \forall i,j \in E, x_i, x_j

the BP update is derived from the "fixed points" of the Lagrangian;
BP messages are (the exponential form of) the Lagrange multipliers.
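Where do the messages come from? A schematic of the Lagrangian calculation (standard in the variational-BP literature; the bookkeeping below is a sketch in my own notation). Attach a multiplier \lambda_{i \to j}(x_j) to each consistency constraint:

\mathcal{L} = \sum_{i,j \in E} \sum_{x_i, x_j} q_{i,j}(x_i, x_j) \ln \phi_{i,j}(x_i, x_j) + H_{\mathrm{Bethe}}(q)
  + \sum_{i,j \in E} \sum_{x_j} \lambda_{i \to j}(x_j) \Big( q_j(x_j) - \sum_{x_i} q_{i,j}(x_i, x_j) \Big) + (\text{normalization terms})

setting \partial \mathcal{L} / \partial q_{i,j} = 0 and \partial \mathcal{L} / \partial q_i = 0 and eliminating q recovers the BP updates, with messages \delta_{i \to j}(x_j) \propto \exp(\lambda_{i \to j}(x_j)).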
Zero-temperature limit (pairwise case)

sum-product BP objective:
\arg\max_{\{q\}} \sum_{i,j \in E} \sum_{x_i, x_j} q_{i,j}(x_i, x_j) \ln \phi_{i,j}(x_i, x_j) + H(q)
LP objective: the same expression without the entropy term H(q).

introduce a temperature T and replace p(x) \propto \prod_{i,j \in E} \phi_{i,j}(x_i, x_j)^{1/T} in the objective above:
\arg\max_{\{q\}} \frac{1}{T} \sum_{i,j \in E} \sum_{x_i, x_j} q_{i,j}(x_i, x_j) \ln \phi_{i,j}(x_i, x_j) + H(q)
 = \arg\max_{\{q\}} \sum_{i,j \in E} \sum_{x_i, x_j} q_{i,j}(x_i, x_j) \ln \phi_{i,j}(x_i, x_j) + T H(q)

as T \to 0 (marginalization of \lim_{T \to 0} p(x) \propto \prod_{i,j} \phi_{i,j}^{1/T}):
- sum-product BP becomes similar to the LP relaxation of MAP inference
- sum-product BP becomes similar to max-product BP
- they are equivalent for concave entropy approximations
- in practice, max-product BP can be much more efficient than LP: it uses the graph structure
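A tiny numerical illustration of the zero-temperature limit (a sketch; the distribution is arbitrary): raising p to the power 1/T and renormalizing concentrates all mass on the MAP assignment.

```python
import numpy as np

p = np.array([0.10, 0.25, 0.40, 0.25])   # arbitrary distribution over 4 assignments
for T in (1.0, 0.5, 0.1, 0.01):
    q = p ** (1.0 / T)
    q /= q.sum()                          # renormalized p(x)^(1/T)
    print(T, np.round(q, 3))              # mass concentrates on argmax_x p as T -> 0
```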
Graph-cuts for MAP inference

reduce MAP inference to the min-cut problem and use efficient & optimal min-cut solvers.

graph-cut problem: partition the nodes into two sets, one containing the source and one the target, at minimum cost
(image: https://www.geeksforgeeks.org)
- works on an arbitrary graph (i.e., large tree-width poses no problem)
- O(V E) algorithms exist

setting: binary pairwise MRF
p(x) \propto \exp(-E(x)),   E(x) = \sum_i \epsilon_i(x_i) + \sum_{i,j \in E} \epsilon_{i,j}(x_i, x_j)
with sub-modular \epsilon_{i,j}:
\epsilon_{i,j}(1, 1) + \epsilon_{i,j}(0, 0) \leq \epsilon_{i,j}(1, 0) + \epsilon_{i,j}(0, 1)
example: a binary MRF on x_1, \dots, x_4 with energies

\epsilon_1(x_1) = 7(1 - x_1)        \epsilon_{1,2}(x_1, x_2) = -6\, I(x_1 = x_2)
\epsilon_2(x_2) = 2 x_2             \epsilon_{2,3}(x_2, x_3) = -6\, I(x_2 = x_3)
\epsilon_3(x_3) = x_3               \epsilon_{3,4}(x_3, x_4) = -2\, I(x_3 = x_4)
\epsilon_4(x_4) = 6 x_4             \epsilon_{1,4}(x_1, x_4) = -\, I(x_1 = x_4)

construction: the source node's partition corresponds to an assignment of 0 and the
target node's partition to an assignment of 1; unary costs become edges to the
source/target and pairwise costs become edges between variable nodes, so the cost
of the min cut equals the minimum energy (up to a constant).
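A sketch solving this example with networkx's min-cut (the reduction follows the standard construction for submodular binary energies; the edge layout is my own):

```python
import networkx as nx

# source side <-> x_i = 0, target side <-> x_i = 1
G = nx.DiGraph()
s, t = 's', 't'

# unary terms: cost c for x_i = 1 -> edge s -> i with capacity c,
#              cost c for x_i = 0 -> edge i -> t with capacity c
G.add_edge(1, t, capacity=7)   # eps_1 = 7 (1 - x_1)
G.add_edge(s, 2, capacity=2)   # eps_2 = 2 x_2
G.add_edge(s, 3, capacity=1)   # eps_3 = x_3
G.add_edge(s, 4, capacity=6)   # eps_4 = 6 x_4

# Potts terms eps_ij = -w I(x_i = x_j): up to the constant -w this is a cost
# of w whenever x_i != x_j -> arcs in both directions with capacity w
for (i, j), w in {(1, 2): 6, (2, 3): 6, (3, 4): 2, (1, 4): 1}.items():
    G.add_edge(i, j, capacity=w)
    G.add_edge(j, i, capacity=w)

cut_value, (S, T) = nx.minimum_cut(G, s, t)
x = {i: int(i in T) for i in (1, 2, 3, 4)}    # target side -> x_i = 1
print(cut_value, x)   # 6 and x = {1: 1, 2: 1, 3: 1, 4: 0}, the MAP assignment
```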
(non-optimal extensions handle variables with higher cardinality)

Summary

tools for MAP inference: variable elimination, max-product belief propagation, IP and LP relaxation,
graph-cuts, dual decomposition, branch and bound methods, local search

- MAP and marginal MAP are NP-hard
- the distributive law extends to MAP inference: variable elimination, clique-trees, loopy BP,
  with the additional challenge of decoding
- the variational perspective connects three approaches:
  - max-product LBP (can find strong local optima!)
  - sum-product LBP (the theoretical zero-temperature limit)
  - LP relaxations
- for some families of loopy graphs, exact polynomial-time inference is possible (graph-cuts)