

1. Probabilistic Graphical Models: MAP inference. Siamak Ravanbakhsh, Fall 2019.

2. Learning objectives:
- MAP inference and its complexity
- exact and approximate MAP inference
- max-product and max-sum message passing
- relationship to LP relaxation
- graph-cuts for MAP inference

3. Optimization: find x* = arg max_x f(x).

4. Optimization: x* = arg max_x f(x), which may or may not have constraints g_c(x) ≥ 0 for all c and h_d(x) = 0 for all d; the variables may be continuous or discrete (combinatorial).

5. Optimization (continued): general-purpose approaches include local search heuristics (hill-climbing, beam search, tabu search, simulated annealing), genetic algorithms, integer programming, and branch and bound, which applies when you can efficiently upper-bound partial assignments.
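As an illustration of the local-search idea, here is a minimal hill-climbing sketch over discrete variables (a hypothetical helper, not from the slides): it repeatedly takes the best single-coordinate move until no move improves the objective, so it can stop at a local optimum.

```python
import random

def hill_climb(f, domains, iters=100, seed=0):
    """Greedy local search: take the best single-coordinate move
    until none improves f; may stop at a local optimum."""
    rng = random.Random(seed)
    x = [rng.choice(d) for d in domains]
    for _ in range(iters):
        fx, best = f(x), None
        for i, d in enumerate(domains):
            for v in d:
                if v != x[i]:
                    y = x[:i] + [v] + x[i + 1:]
                    if f(y) > fx:
                        fx, best = f(y), y
        if best is None:          # local optimum reached
            return x
        x = best
    return x

# toy factored objective: count agreements along a chain x0..x3
f = lambda x: sum(x[i] == x[i + 1] for i in range(len(x) - 1))
x = hill_climb(f, [[0, 1]] * 4)
```

For this toy objective the global maximum is 3 (all variables equal), but hill-climbing only guarantees a single-flip local optimum, e.g. it can get stuck at a configuration like [0, 0, 1, 1] with value 2.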

6. Optimization (continued): what if f(x) is structured, f(x) = Σ_I f_I(x_I)? Maximizing such a factored objective is exactly MAP inference in a graphical model.

7. Definition & complexity: MAP inference asks for arg max_x p(x). Given a Bayes net, the decision problem of whether p(x) > c for some x is NP-complete! Example application: side-chain prediction as MAP inference (Yanover & Weiss).

8. Definition & complexity (continued): marginal MAP asks for arg max_x Σ_y p(x, y). Given a Bayes net for p(x, y), the decision problem of whether p(x) > c for some x is NP^PP-complete; marginal MAP is NP-hard even for trees. Here NP is the class decided by a non-deterministic Turing machine that accepts if a single path accepts, PP is the class where the machine accepts if the majority of paths accept, and NP^PP is NP with access to a PP oracle.

9. Problem & terminology: MAP inference is
arg max_x p(x) = arg max_x (1/Z) Π_I φ_I(x_I) ≡ arg max_x p̃(x) = arg max_x Π_I φ_I(x_I),
where p̃(x) = Π_I φ_I(x_I) is the unnormalized distribution; the normalization constant Z can be ignored. This is also known as max-product inference.

10. Problem & terminology (continued): with evidence e,
arg max_x p(x | e) = arg max_x p(x, e) / p(e) ≡ arg max_x p(x, e).

11. Problem & terminology (continued): in the log domain,
arg max_x p̃(x) ≡ arg max_x Σ_I ln φ_I(x_I) ≡ arg min_x −Σ_I ln φ_I(x_I),
also known as max-sum inference and min-sum inference (energy minimization), respectively.
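A small check of this equivalence on a toy chain with two pairwise potentials (the potential values are made up): the arg max of the product of potentials equals the arg max of the sum of their logs.

```python
import itertools
import math

# toy pairwise potential phi(x_i, x_{i+1}) over binary variables
phi = {(0, 0): 2.0, (0, 1): 0.5, (1, 0): 1.0, (1, 1): 3.0}

def p_tilde(x):        # max-product objective: unnormalized product
    return phi[x[0], x[1]] * phi[x[1], x[2]]

def log_p(x):          # max-sum objective: sum of log-potentials
    return math.log(phi[x[0], x[1]]) + math.log(phi[x[1], x[2]])

states = list(itertools.product([0, 1], repeat=3))
x_prod = max(states, key=p_tilde)   # maximize the product
x_sum = max(states, key=log_p)      # maximize the sum of logs
```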

12. Max-marginals: the marginalization Σ_{x ∈ Val(X)} φ(x, y) used in sum-product inference is replaced with the max-marginal max_{x ∈ Val(X)} φ(x, y). For example, max-marginalizing b out of φ(a, b, c) gives φ'(a, c) = max_b φ(a, b, c).
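With table factors stored as arrays, swapping the sum for a max is a one-line change (a sketch using numpy; the table values here are random, and the axis layout is an assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
phi = rng.random((2, 2, 2))   # a table factor phi(a, b, c), axes = (a, b, c)

sum_marg = phi.sum(axis=1)    # sum-product: sum_b phi(a, b, c)
max_marg = phi.max(axis=1)    # max-product: phi'(a, c) = max_b phi(a, b, c)
```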

13. Distributive law for MAP inference:
- max(ab, ac) = a max(b, c) (for a ≥ 0): max-product inference
- max(a + b, a + c) = a + max(b, c): max-sum inference
- max(min(a, b), min(a, c)) = min(a, max(b, c)): min-max inference
- ab + ac = a(b + c): sum-product inference
In each identity, the left-hand side takes 3 operations and the right-hand side only 2.
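These identities are easy to sanity-check numerically (the two product-form identities require a ≥ 0, which holds for potentials; the sampled values here are arbitrary):

```python
import random

rng = random.Random(0)
checks = []
for _ in range(1000):
    a, b, c = (rng.uniform(0, 10) for _ in range(3))  # a >= 0 for the * forms
    checks.append(abs(max(a * b, a * c) - a * max(b, c)) < 1e-9)    # max-product
    checks.append(abs(max(a + b, a + c) - (a + max(b, c))) < 1e-9)  # max-sum
    checks.append(max(min(a, b), min(a, c)) == min(a, max(b, c)))   # min-max
    checks.append(abs(a * b + a * c - a * (b + c)) < 1e-9)          # sum-product
```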

14. Distributive law for MAP inference (continued): we can save computation by factoring the operations. The same law in disguise:
max_{x,y} f(x, y) g(y, z) = max_y g(y, z) max_x f(x, y).
Assuming |Val(X)| = |Val(Y)| = |Val(Z)| = d, the complexity (over all values of z) drops from O(d³) to O(d²).
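The saving is easy to see with numpy tables (a sketch; d and the potential values are arbitrary, and pushing max_x past g relies on g ≥ 0, which holds here):

```python
import numpy as np

d = 4
rng = np.random.default_rng(1)
f = rng.random((d, d))   # f(x, y), axes = (x, y)
g = rng.random((d, d))   # g(y, z), axes = (y, z)

# naive: materialize the d^3 table f(x,y)*g(y,z), max over x,y per z: O(d^3)
naive = (f[:, :, None] * g[None, :, :]).max(axis=(0, 1))

# factored: max_y g(y,z) * (max_x f(x,y)) -- two d^2 passes: O(d^2)
factored = (f.max(axis=0)[:, None] * g).max(axis=0)
```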

15. Max-product variable elimination: the procedure is similar to VE for sum-product inference; eliminate all the variables.

input: a set of factors (e.g. CPDs) Φ^{t=0} = {φ_1, …, φ_K}
output: max_x p̃(x) = max_x Π_I φ_I(x_I)
go over the variables in some order x_{i_1}, …, x_{i_n}; at step t:
- collect all the relevant factors: Ψ^t = {φ ∈ Φ^{t−1} | x_{i_t} ∈ Scope[φ]}
- calculate their product: ψ^t = Π_{φ ∈ Ψ^t} φ
- max-marginalize out x_{i_t}: ψ'^t = max_{x_{i_t}} ψ^t
- update the set of factors: Φ^t = Φ^{t−1} − Ψ^t + {ψ'^t}
return the product of the scalars in Φ^{t=n} as max_x p̃(x)

This is similar to computing the partition function Z = Σ_x p̃(x).
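The steps above can be sketched for table factors with numpy (a minimal implementation assuming each scope is a tuple of variable names with one array axis per variable; `expand` and `max_product_ve` are hypothetical helpers, not from the slides):

```python
import numpy as np
from functools import reduce

def expand(table, scope, joint):
    """Broadcast `table`, whose axes correspond to the variables in
    `scope`, onto one axis per variable in `joint` (scope within joint)."""
    perm = sorted(range(len(scope)), key=lambda i: joint.index(scope[i]))
    idx = tuple(slice(None) if v in scope else None for v in joint)
    return np.transpose(table, perm)[idx]

def max_product_ve(factors, order):
    """Max-product variable elimination over table factors.
    `factors`: list of (scope, table) pairs; returns max_x prod_I phi_I(x_I)."""
    factors = list(factors)
    for var in order:
        bucket = [(s, t) for s, t in factors if var in s]       # Psi^t
        factors = [(s, t) for s, t in factors if var not in s]
        joint = tuple(dict.fromkeys(v for s, _ in bucket for v in s))
        psi = reduce(lambda a, b: a * b,
                     (expand(t, s, joint) for s, t in bucket))  # psi^t
        s_new = tuple(v for v in joint if v != var)
        factors.append((s_new, psi.max(axis=joint.index(var)))) # psi'^t
    # every scope is now empty: return the product of the scalars
    return float(np.prod([t for _, t in factors]))

# example: chain x0 - x1 - x2 with the same pairwise potential on both edges
phi = np.array([[2.0, 0.5], [1.0, 3.0]])
val = max_product_ve([(("x0", "x1"), phi), (("x1", "x2"), phi)],
                     ["x0", "x1", "x2"])
```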

16. Decoding the max-value: the algorithm above computes max_x p̃(x), but we also need to recover the maximizing assignment x*. To do so, keep the intermediate products ψ^{t=1}, …, ψ^{t=n} produced during inference.

17. Decoding the max-value (continued): start from the last eliminated variable. ψ^{t=n} should be a function of x_{i_n} alone, so
x*_{i_n} ← arg max_{x_{i_n}} ψ^{t=n}(x_{i_n}).

18. Decoding the max-value (continued): at this point we have x*_{i_n}. ψ^{t=n−1} can only have x_{i_{n−1}} and x_{i_n} in its domain, so
x*_{i_{n−1}} ← arg max_{x_{i_{n−1}}} ψ^{t=n−1}(x_{i_{n−1}}, x*_{i_n}),
and so on, backwards through the elimination order.
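For a chain, this backward pass is a Viterbi-style traceback (a sketch with made-up potentials; psi0, psi1, psi2 stand for the intermediate products ψ^t kept during elimination):

```python
import numpy as np

phi = np.array([[2.0, 0.5], [1.0, 3.0]])  # phi(x_i, x_{i+1}), binary chain x0-x1-x2

# forward (elimination) pass, keeping each psi before max-marginalizing
psi0 = phi                     # psi^{t=1}(x0, x1): the factors containing x0
m0 = psi0.max(axis=0)          # max-marginalize out x0
psi1 = m0[:, None] * phi       # psi^{t=2}(x1, x2)
m1 = psi1.max(axis=0)          # max-marginalize out x1
psi2 = m1                      # psi^{t=3}(x2): a function of x2 alone

# backward pass: decode in reverse elimination order
x2 = int(psi2.argmax())
x1 = int(psi1[:, x2].argmax())
x0 = int(psi0[:, x1].argmax())
```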

19. Marginal-MAP variable elimination: the procedure remains similar for
max_{y_1,…,y_m} Σ_{x_1,…,x_n} Π_I φ_I(x_I),
but max and sum do not commute: max_x Σ_y φ(x, y) ≠ Σ_y max_x φ(x, y).

20. Marginal-MAP variable elimination (continued): consequently, we cannot use an arbitrary elimination order.

21. Marginal-MAP variable elimination (continued): first, eliminate {x_1, …, x_n} using sum-product VE.

22. Marginal-MAP variable elimination (continued): then eliminate {y_1, …, y_m} using max-product VE, and decode the maximizing value.
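A tiny numeric illustration of why the order is forced (the table values are made up): summing out y first and then maximizing over x generally differs from the other order, and only the first gives the marginal-MAP answer.

```python
import numpy as np

phi = np.array([[4.0, 1.0],
                [2.0, 3.5]])            # unnormalized phi(x, y), axes = (x, y)

# marginal MAP: sum out y first (sum-product step), then maximize over x
p_x = phi.sum(axis=1)                   # unnormalized p(x): [5.0, 5.5]
x_star = int(p_x.argmax())              # x* = 1

# max and sum do not commute:
sum_then_max = p_x.max()                # max_x sum_y phi(x, y) = 5.5
max_then_sum = phi.max(axis=0).sum()    # sum_y max_x phi(x, y) = 4.0 + 3.5
```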
