SLIDE 1

Marginal Inference in MRFs using Frank-Wolfe

David Belanger, Daniel Sheldon, Andrew McCallum

School of Computer Science, University of Massachusetts Amherst
{belanger,sheldon,mccallum}@cs.umass.edu

December 10, 2013

SLIDE 2

Table of Contents

1. Markov Random Fields
2. Frank-Wolfe for Marginal Inference
3. Optimality Guarantees and Convergence Rate
4. Beyond MRFs
5. Fancier FW

SLIDES 4–8

Markov Random Fields

$$\Phi_\theta(x) = \sum_{c \in \mathcal{C}} \theta_c(x_c)$$

$$P(x) = \exp\big(\Phi_\theta(x) - \log Z\big)$$

Overcomplete representation: $x \to \mu$, so that $\Phi_\theta(x) \to \langle \theta, \mu \rangle$.
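To make the overcomplete representation concrete, here is a minimal Python sketch (the toy model and all names such as `theta_c`, `phi`, and `mu_of_x` are hypothetical illustrations, not from the talk): μ(x) stacks one-hot indicator tables per clique, so the linear score ⟨θ, μ(x)⟩ reproduces Φθ(x).

```python
import numpy as np

# Toy MRF with a single pairwise clique c over binary (x0, x1);
# theta_c[a, b] is the log-potential of configuration (x0=a, x1=b).
theta_c = np.array([[0.5, -1.0],
                    [0.2,  0.8]])

def phi(x):
    """Phi_theta(x): sum of clique log-potentials (one clique here)."""
    return theta_c[x[0], x[1]]

def mu_of_x(x):
    """Overcomplete representation: one-hot indicator over clique configurations."""
    mu = np.zeros_like(theta_c)
    mu[x[0], x[1]] = 1.0
    return mu

x = (1, 0)
# The inner product <theta, mu(x)> recovers Phi_theta(x).
assert np.isclose(phi(x), np.sum(theta_c * mu_of_x(x)))
```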

SLIDES 9–12

Marginal Inference

$$\mu_{\text{MARG}} = \mathbb{E}_{P_\theta}[\mu]$$

Equivalently, as variational optimization over the marginal polytope $\mathcal{M}$ with the exact entropy $H_{\mathcal{M}}$:

$$\mu_{\text{MARG}} = \arg\max_{\mu \in \mathcal{M}} \langle \mu, \theta \rangle + H_{\mathcal{M}}(\mu)$$

Tractable surrogate over the local polytope $\mathcal{L}$ with a decomposed entropy $H_B$:

$$\bar{\mu}_{\text{approx}} = \arg\max_{\mu \in \mathcal{L}} \langle \mu, \theta \rangle + H_B(\mu), \qquad H_B(\mu) = \sum_{c \in \mathcal{C}} W_c H(\mu_c)$$
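A minimal sketch of the decomposed entropy surrogate (the counting weights `weights` and the list-of-tables layout are hypothetical; the slides leave both abstract):

```python
import numpy as np

def clique_entropy(mu_c):
    """Shannon entropy H(mu_c) of one clique marginal (any table shape)."""
    p = mu_c.ravel()
    p = p[p > 0]                        # 0 log 0 = 0 convention
    return -np.sum(p * np.log(p))

def H_B(clique_marginals, weights):
    """Decomposed entropy H_B(mu) = sum_c W_c * H(mu_c)."""
    return sum(w * clique_entropy(mu_c)
               for mu_c, w in zip(clique_marginals, weights))

# Example: two pairwise clique marginals with counting weights of 1.
mus = [np.array([[0.4, 0.1], [0.1, 0.4]]),
       np.array([[0.25, 0.25], [0.25, 0.25]])]
print(H_B(mus, weights=[1.0, 1.0]))
```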

SLIDES 13–15

MAP Inference

$$\mu_{\text{MAP}} = \arg\max_{\mu \in \mathcal{M}} \langle \mu, \theta \rangle$$

[Diagram: a Black Box MAP Solver maps $\theta$ to $\mu_{\text{MAP}}$; a Gray Box MAP Solver does the same.]

SLIDE 16

Marginal → MAP Reductions

Hazan and Jaakkola [2012]; Ermon et al. [2013]

SLIDE 18

Generic FW with Line Search

$$y_t = \arg\min_{y \in \mathcal{X}} \langle y, -\nabla f(x_{t-1}) \rangle$$

$$\gamma_t = \arg\max_{\gamma \in [0,1]} f\big((1-\gamma)\,x_{t-1} + \gamma\, y_t\big), \qquad x_t = (1-\gamma_t)\,x_{t-1} + \gamma_t\, y_t$$

SLIDE 19

Generic FW with Line Search

[Diagram: a cycle of Compute Gradient → Linear Minimization Oracle → Line Search: the gradient $\nabla f(x_{t-1})$ is handed to the oracle, which returns $y_t$; line search combines $x_{t-1}$ and $y_t$ into $x_t$.]
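For reference, a minimal generic Frank-Wolfe loop with line search, as a sketch (the callables `f`, `grad_f`, and `lmo` are hypothetical placeholders, and a grid stands in for exact 1-D maximization, which is safe here because the restriction of a concave f to a segment is concave):

```python
import numpy as np

def frank_wolfe(x0, f, grad_f, lmo, num_iters=100, grid=201):
    """Generic Frank-Wolfe for maximizing a concave f over a convex set X.

    lmo(g) must return argmax_{y in X} <y, g> (a vertex of X), which matches
    argmin_{y in X} <y, -g> on the slide.
    """
    x = x0
    for _ in range(num_iters):
        y = lmo(grad_f(x))                        # linear (MAP-like) oracle
        gammas = np.linspace(0.0, 1.0, grid)      # line search on a grid
        vals = [f((1 - g) * x + g * y) for g in gammas]
        gamma = gammas[int(np.argmax(vals))]
        x = (1 - gamma) * x + gamma * y
    return x
```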

SLIDE 20

FW for Marginal Inference

[Diagram: the same cycle specialized to inference: Compute Gradient forms $\tilde{\theta} = \nabla F(\mu_t) = \theta + \nabla H(\mu_t)$; a MAP Inference Oracle maps $\tilde{\theta}$ to $\tilde{\mu}_{\text{MAP}}$; Line Search produces $\mu_{t+1}$.]

SLIDES 21–22

Subproblem Parametrization

$$F(\mu) = \langle \mu, \theta \rangle + \sum_{c \in \mathcal{C}} W_c H(\mu_c)$$

$$\tilde{\theta} = \nabla F(\mu_t) = \theta + \sum_{c \in \mathcal{C}} W_c \nabla H(\mu_c)$$
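Since $H(\mu_c) = -\sum_i \mu_{c,i} \log \mu_{c,i}$, the gradient is $\nabla H(\mu_c) = -(1 + \log \mu_c)$ elementwise, so the subproblem potentials are the original potentials plus weighted entropy-gradient terms. A minimal sketch (the per-clique list layout and names are hypothetical); the clipping hints at the boundary issue revisited under curvature below:

```python
import numpy as np

def entropy_grad(mu_c, eps=1e-12):
    """Elementwise gradient of H(mu_c) = -sum mu log mu: -(1 + log mu).

    Clipping keeps log finite; the true gradient diverges at the boundary.
    """
    return -(1.0 + np.log(np.clip(mu_c, eps, None)))

def subproblem_potentials(theta_cliques, mu_cliques, weights):
    """theta_tilde_c = theta_c + W_c * grad H(mu_c), one table per clique."""
    return [th + w * entropy_grad(mu)
            for th, mu, w in zip(theta_cliques, mu_cliques, weights)]
```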

SLIDES 23–26

Line Search

[Diagram: $\mu_{t+1}$ lies on the segment between $\mu_t$ and $\tilde{\mu}_{\text{MAP}}$.]

Computing the line search objective can scale with:
- Bad: the number of possible values in the cliques.
- Good: the number of cliques in the graph (see paper).
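One way the good case can come about (my reconstruction of the "(see paper)" remark, under the assumption that $\tilde{\mu}_{\text{MAP}}$ is a polytope vertex, so each clique marginal moves toward a one-hot table; consult the paper for the authors' actual derivation): after a one-time precomputation of each clique's entropy and of the mass it places on the MAP configuration, every evaluation of $F((1-\gamma)\mu_t + \gamma\tilde{\mu}_{\text{MAP}})$ costs only $O(\#\text{cliques})$.

```python
import numpy as np

def line_objective(gamma, theta_dot_mu_t, theta_dot_mu_map, H_t, p_star, weights,
                   tiny=1e-300):
    """Evaluate F((1-gamma)*mu_t + gamma*mu_MAP) in O(#cliques).

    Precomputed once per clique c (cost proportional to the clique table):
      H_t[c]    -- entropy H(mu_c^t) of the current clique marginal;
      p_star[c] -- mass mu_c^t places on the clique's MAP configuration.
    Each clique marginal is interpolated toward a one-hot table, and the
    entropy of that mixture has a closed form in H_t[c] and p_star[c] alone.
    """
    lin = (1 - gamma) * theta_dot_mu_t + gamma * theta_dot_mu_map
    ent = 0.0
    for h_c, p, w in zip(H_t, p_star, weights):
        q = (1 - gamma) * p + gamma          # new mass on the MAP configuration
        ent += w * (-(1 - gamma) * (1 - p) * np.log(max(1 - gamma, tiny))
                    + (1 - gamma) * (h_c + p * np.log(max(p, tiny)))
                    - q * np.log(max(q, tiny)))
    return lin + ent
```

Because F is concave, its restriction to the segment is concave in γ, so a grid or golden-section search over [0, 1] finds the step with a handful of these O(#cliques) evaluations.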

SLIDE 27

Experiment #1

SLIDES 29–32

Convergence Rate

Convergence Rate of Frank-Wolfe [Jaggi, 2013]:

$$F(\mu^*) - F(\mu_t) \;\le\; \frac{2\,C_F}{t + 2}\,(1 + \delta)$$

where $\frac{\delta\, C_F}{t + 2}$ is the MAP suboptimality allowed at iteration $t$ → solving MAP to that accuracy is NP-hard.

How to deal with MAP hardness?
- Use a MAP solver and hope for the best [Hazan and Jaakkola, 2012].
- Relax to the local polytope.

SLIDES 33–34

Curvature + Convergence Rate

$$C_f = \sup_{x, s \in \mathcal{D};\; \gamma \in [0,1];\; y = x + \gamma(s - x)} \frac{2}{\gamma^2}\Big(f(y) - f(x) - \langle y - x, \nabla f(x) \rangle\Big)$$

[Diagram: $\mu_{t+1}$ on the segment between $\mu_t$ and $\tilde{\mu}_{\text{MAP}}$.]

[Plot: entropy as a function of prob $x = 1$.]
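A standard companion fact from Jaggi [2013] (not stated on these slides) ties curvature to gradient smoothness: if $\nabla f$ is $L$-Lipschitz over $\mathcal{D}$, then

$$C_f \;\le\; L \cdot \operatorname{diam}(\mathcal{D})^2.$$

The entropy plot points at the obstruction: $\nabla H(\mu) = -(1 + \log \mu)$ diverges as any coordinate of $\mu$ approaches 0, so no finite $L$ holds over the whole polytope and this bound gives no control near the boundary.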

SLIDE 35

Experiment #2

SLIDES 37–38

Beyond MRFs

Question: Are MRFs the right Gibbs distributions on which to use Frank-Wolfe?

| Problem Family | MAP Algorithm | Marginal Algorithm |
|---|---|---|
| Tree-structured graphical models | Viterbi | Forward-Backward |
| Loopy graphical models | Max-Product BP | Sum-Product BP |
| Directed spanning tree | Chu-Liu-Edmonds | Matrix Tree Theorem |
| Bipartite matching | Hungarian Algorithm | ✗ |

SLIDES 40–41

Fancier FW

Norm-regularized marginal inference [Harchaoui et al., 2013]:

$$\mu_{\text{MARG}} = \arg\max_{\mu \in \mathcal{M}} \langle \mu, \theta \rangle + H_{\mathcal{M}}(\mu) + \lambda R(\mu)$$

Local linear oracle for MRFs? [Garber and Hazan, 2013]:

$$\tilde{\mu}_t = \arg\max_{\mu \in \mathcal{M} \cap B_r(\mu_t)} \langle \mu, \theta \rangle$$

SLIDES 42–43

Conclusion

We need to figure out how to handle the entropy gradient. There are plenty of extensions to other Gibbs distributions and regularizers.

SLIDE 44

Further Reading I

- Stefano Ermon, Carla Gomes, Ashish Sabharwal, and Bart Selman. Taming the curse of dimensionality: Discrete integration by hashing and optimization. In Proceedings of the 30th International Conference on Machine Learning (ICML-13), pages 334–342, 2013.
- Dan Garber and Elad Hazan. A linearly convergent conditional gradient algorithm with applications to online and stochastic optimization. arXiv e-prints, January 2013.
- Zaid Harchaoui, Anatoli Juditsky, and Arkadi Nemirovski. Conditional gradient algorithms for norm-regularized smooth convex optimization. arXiv preprint arXiv:1302.2325, 2013.
- Tamir Hazan and Tommi S. Jaakkola. On the partition function and random maximum a-posteriori perturbations. In Proceedings of the 29th International Conference on Machine Learning (ICML-12), pages 991–998, 2012.
- Bert Huang and Tony Jebara. Approximating the permanent with belief propagation. arXiv preprint arXiv:0908.1769, 2009.

SLIDE 45

Further Reading II

- Mark Huber. Exact sampling from perfect matchings of dense regular bipartite graphs. Algorithmica, 44(3):183–193, 2006.
- Martin Jaggi. Revisiting Frank-Wolfe: Projection-free sparse convex optimization. In Proceedings of the 30th International Conference on Machine Learning (ICML-13), pages 427–435, 2013.
- James Petterson, Tiberio Caetano, Julian McAuley, and Jin Yu. Exponential family graph matching and ranking. 2009.
- Tim Roughgarden and Michael Kearns. Marginals-to-models reducibility. In Advances in Neural Information Processing Systems, pages 1043–1051, 2013.
- Maksims Volkovs and Richard S. Zemel. Efficient sampling for bipartite matching problems. In Advances in Neural Information Processing Systems, pages 1322–1330, 2012.
- Pascal O. Vontobel. The Bethe permanent of a non-negative matrix. In Communication, Control, and Computing (Allerton), 2010 48th Annual Allerton Conference on, pages 341–346. IEEE, 2010.

SLIDES 46–49

Finding the Marginal Matching

Sampling
- Expensive, but doable [Huber, 2006; Volkovs and Zemel, 2012].
- Used for maximum-likelihood learning [Petterson et al., 2009].

Sum-Product
- Also requires the Bethe approximation.
- Works well: in practice [Huang and Jebara, 2009], in theory [Vontobel, 2010].

Frank-Wolfe
- Basically the same algorithm as for graphical models.
- Same issue with curvature.
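To illustrate "basically the same algorithm" for the matching case, here is a minimal sketch (mine, not the talk's: it substitutes the plain entropy $-\sum_{ij}\mu_{ij}\log\mu_{ij}$ for the Bethe entropy, uses the standard $2/(t+2)$ step instead of line search, and leans on SciPy's `linear_sum_assignment` as the Hungarian-algorithm MAP oracle):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def matching_marginals(theta, num_iters=200, eps=1e-9):
    """FW marginal inference over the Birkhoff polytope (illustrative sketch).

    theta: n x n matrix of edge scores. Objective: <mu, theta> + H(mu), with
    the plain entropy standing in for the Bethe entropy the talk references.
    """
    n = theta.shape[0]
    mu = np.full((n, n), 1.0 / n)             # uniform doubly stochastic start
    for t in range(num_iters):
        grad = theta - (1.0 + np.log(np.clip(mu, eps, None)))
        rows, cols = linear_sum_assignment(grad, maximize=True)  # Hungarian LMO
        vertex = np.zeros_like(mu)
        vertex[rows, cols] = 1.0               # permutation matrix = MAP matching
        gamma = 2.0 / (t + 2.0)                # standard FW step (no line search)
        mu = (1 - gamma) * mu + gamma * vertex
    return mu                                  # mu[i, j] ~ P(i matched to j)
```

The Birkhoff polytope's vertices are exactly the permutation matrices, which is why a max-weight matching solver is the natural linear minimization oracle here.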