Marginal Inference in MRFs using Frank-Wolfe


  1. Marginal Inference in MRFs using Frank-Wolfe. David Belanger, Daniel Sheldon, Andrew McCallum. School of Computer Science, University of Massachusetts, Amherst. {belanger,sheldon,mccallum}@cs.umass.edu. December 10, 2013.

  2. Table of Contents: (1) Markov Random Fields; (2) Frank-Wolfe for Marginal Inference; (3) Optimality Guarantees and Convergence Rate; (4) Beyond MRFs; (5) Fancier FW.

  3. Table of Contents: (1) Markov Random Fields; (2) Frank-Wolfe for Marginal Inference; (3) Optimality Guarantees and Convergence Rate; (4) Beyond MRFs; (5) Fancier FW.

  4. Markov Random Fields

  5. Markov Random Fields: $\Phi_\theta(x) = \sum_{c \in \mathcal{C}} \theta_c(x_c)$

  6. Markov Random Fields: $\Phi_\theta(x) = \sum_{c \in \mathcal{C}} \theta_c(x_c)$; $P(x) = \exp(\Phi_\theta(x) - \log Z)$

  7. Markov Random Fields: $\Phi_\theta(x) = \sum_{c \in \mathcal{C}} \theta_c(x_c)$; $P(x) = \exp(\Phi_\theta(x) - \log Z)$; overcomplete representation $x \to \mu$

  8. Markov Random Fields: $\Phi_\theta(x) = \sum_{c \in \mathcal{C}} \theta_c(x_c)$; $P(x) = \exp(\Phi_\theta(x) - \log Z)$; $x \to \mu$; $\Phi_\theta(x) \to \langle \theta, \mu \rangle$
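To make the overcomplete representation above concrete, here is a minimal sketch (not from the slides; the two-variable, one-edge binary MRF and all its numbers are invented for illustration) that maps an assignment $x$ to indicator tables $\mu$ so that $\Phi_\theta(x) = \langle \theta, \mu \rangle$.

```python
# Illustrative sketch (not from the slides): overcomplete representation of a
# tiny pairwise binary MRF with a single edge (x1, x2), so that
# Phi_theta(x) = <theta, mu(x)>.
import numpy as np

# Clique potentials: unary tables for x1 and x2, plus one pairwise table.
theta = {
    "x1": np.array([0.2, -0.5]),            # theta_1(x1)
    "x2": np.array([0.0, 1.3]),             # theta_2(x2)
    ("x1", "x2"): np.array([[0.0, 0.7],     # theta_12(x1, x2)
                            [0.7, 0.0]]),
}

def mu_of(x):
    """Map an assignment x (dict of 0/1 values) to indicator tables mu."""
    mu = {}
    for clique, table in theta.items():
        ind = np.zeros_like(table)
        if isinstance(clique, tuple):
            ind[x[clique[0]], x[clique[1]]] = 1.0
        else:
            ind[x[clique]] = 1.0
        mu[clique] = ind
    return mu

def score(x):
    """Phi_theta(x) computed as the inner product <theta, mu(x)>."""
    mu = mu_of(x)
    return sum(np.sum(theta[c] * mu[c]) for c in theta)

print(score({"x1": 1, "x2": 1}))   # -0.5 + 1.3 + 0.0 = 0.8
```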

  9. Marginal Inference: $\mu_{\mathrm{MARG}} = \mathbb{E}_{P_\theta}[\mu]$

  10. Marginal Inference: $\mu_{\mathrm{MARG}} = \mathbb{E}_{P_\theta}[\mu]$; $\mu_{\mathrm{MARG}} = \arg\max_{\mu \in \mathcal{M}} \langle \mu, \theta \rangle + H_{\mathcal{M}}(\mu)$

  11. Marginal Inference: $\mu_{\mathrm{MARG}} = \mathbb{E}_{P_\theta}[\mu]$; $\mu_{\mathrm{MARG}} = \arg\max_{\mu \in \mathcal{M}} \langle \mu, \theta \rangle + H_{\mathcal{M}}(\mu)$; $\mu_{\mathrm{approx}} = \arg\max_{\mu \in \mathcal{L}} \langle \mu, \theta \rangle + H_B(\mu)$

  12. Marginal Inference: $\mu_{\mathrm{MARG}} = \mathbb{E}_{P_\theta}[\mu]$; $\mu_{\mathrm{MARG}} = \arg\max_{\mu \in \mathcal{M}} \langle \mu, \theta \rangle + H_{\mathcal{M}}(\mu)$; $\mu_{\mathrm{approx}} = \arg\max_{\mu \in \mathcal{L}} \langle \mu, \theta \rangle + H_B(\mu)$, where $H_B(\mu) = \sum_{c \in \mathcal{C}} W_c H(\mu_c)$
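For concreteness, a minimal sketch (my own illustration, not from the slides) of the weighted clique entropy $H_B(\mu) = \sum_c W_c H(\mu_c)$, evaluated on a small chain with Bethe-style counting numbers.

```python
# Minimal sketch (my illustration, not from the slides): the weighted clique
# entropy H_B(mu) = sum_c W_c * H(mu_c) over clique marginal tables.
import numpy as np

def entropy(p):
    """Shannon entropy of a probability table, with 0 * log 0 := 0."""
    p = np.asarray(p, dtype=float).ravel()
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz]))

def H_B(weighted_marginals):
    """weighted_marginals: iterable of (W_c, mu_c) pairs."""
    return sum(W_c * entropy(mu_c) for W_c, mu_c in weighted_marginals)

# Example: a 3-node binary chain x1 - x2 - x3 with Bethe-style counting
# numbers (edges get W = 1, node x2 gets W = 1 - degree = -1, leaves get 0).
mu_12 = np.array([[0.4, 0.1], [0.1, 0.4]])
mu_23 = mu_12.copy()
mu_2 = mu_12.sum(axis=0)          # marginal of x2, here [0.5, 0.5]
print(H_B([(1.0, mu_12), (1.0, mu_23), (-1.0, mu_2)]))
```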

  13. MAP Inference: $\mu_{\mathrm{MAP}} = \arg\max_{\mu \in \mathcal{M}} \langle \mu, \theta \rangle$

  14. MAP Inference: $\mu_{\mathrm{MAP}} = \arg\max_{\mu \in \mathcal{M}} \langle \mu, \theta \rangle$ [diagram: $\theta$ → Black-Box MAP Solver → $\mu_{\mathrm{MAP}}$]

  15. MAP Inference: $\mu_{\mathrm{MAP}} = \arg\max_{\mu \in \mathcal{M}} \langle \mu, \theta \rangle$ [diagram: $\theta$ → Black-Box MAP Solver → $\mu_{\mathrm{MAP}}$; $\theta$ → Gray-Box MAP Solver → $\mu_{\mathrm{MAP}}$]

  16. Marginal → MAP Reductions: Hazan and Jaakkola [2012]; Ermon et al. [2013]

  17. Table of Contents: (1) Markov Random Fields; (2) Frank-Wolfe for Marginal Inference; (3) Optimality Guarantees and Convergence Rate; (4) Beyond MRFs; (5) Fancier FW.

  18. Generic FW with Line Search: $y_t = \arg\min_{x \in X} \langle x, -\nabla f(x_{t-1}) \rangle$; $\gamma_t = \arg\min_{\gamma \in [0,1]} f((1-\gamma)\, x_{t-1} + \gamma\, y_t)$; $x_t = (1-\gamma_t)\, x_{t-1} + \gamma_t\, y_t$.
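The generic loop above fits in a few lines. Below is a minimal sketch (my own illustration, not the paper's code) that minimizes a toy quadratic over the probability simplex: the linear minimization oracle reduces to picking the best vertex, and the line search is a coarse grid over $\gamma$.

```python
# Minimal Frank-Wolfe sketch (my illustration, not the paper's code):
# minimize a smooth convex f over the probability simplex using a
# linear minimization oracle plus a grid-based line search.
import numpy as np

def frank_wolfe(grad_f, f, x0, iters=100):
    x = x0.copy()
    for t in range(iters):
        g = grad_f(x)
        # Linear minimization oracle over the simplex: best vertex.
        y = np.zeros_like(x)
        y[np.argmin(g)] = 1.0
        # Line search over gamma in [0, 1] (coarse grid for simplicity).
        gammas = np.linspace(0.0, 1.0, 101)
        vals = [f((1 - gm) * x + gm * y) for gm in gammas]
        gamma = gammas[int(np.argmin(vals))]
        x = (1 - gamma) * x + gamma * y
    return x

# Example: projecting a point b onto the simplex, f(x) = 0.5 * ||x - b||^2.
b = np.array([0.2, 1.4, -0.3])
f = lambda x: 0.5 * np.sum((x - b) ** 2)
grad_f = lambda x: x - b
x0 = np.ones(3) / 3
print(frank_wolfe(grad_f, f, x0))
```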

  19. Generic FW with Line Search [diagram of the loop: compute gradient $-\nabla f(x_{t-1})$ → linear minimization oracle returns $y_t$ → line search yields $x_t$ → repeat]

  20. FW for Marginal Inference [diagram of the loop: compute gradient $\tilde\theta = \nabla F(\mu_t) = \theta + \nabla H(\mu_t)$ → MAP inference oracle returns $\tilde\mu_{\mathrm{MAP}}$ → line search yields $\mu_{t+1}$ → repeat]

  21. Subproblem Parametrization: $F(\mu) = \langle \mu, \theta \rangle + \sum_{c \in \mathcal{C}} W_c H(\mu_c)$

  22. Subproblem Parametrization: $F(\mu) = \langle \mu, \theta \rangle + \sum_{c \in \mathcal{C}} W_c H(\mu_c)$; $\tilde\theta = \nabla F(\mu_t) = \theta + \sum_{c \in \mathcal{C}} W_c \nabla H(\mu_c)$
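Using $H(\mu_c) = -\sum_i \mu_{c,i} \log \mu_{c,i}$, the entrywise gradient is $\nabla H(\mu_c) = -(\log \mu_c + 1)$, so the perturbed potentials $\tilde\theta$ handed to the MAP oracle can be assembled as in the sketch below (my own illustration of the slide's formula; the dictionary layout and the small epsilon safeguard are assumptions, not the authors' code).

```python
# Sketch (not the authors' code): perturbed potentials for the MAP oracle,
#   theta_tilde_c = theta_c + W_c * dH(mu_c),  with  dH(mu) = -(log(mu) + 1)
# entrywise, from H(mu_c) = -sum mu log mu.
import numpy as np

def entropy_grad(mu_c, eps=1e-12):
    """Entrywise gradient of H(mu_c) = -sum(mu * log mu); eps avoids log(0)."""
    return -(np.log(mu_c + eps) + 1.0)

def perturbed_potentials(theta, weights, mu):
    """theta, mu: dicts clique -> table; weights: dict clique -> W_c."""
    return {c: theta[c] + weights[c] * entropy_grad(mu[c]) for c in theta}

# Tiny example with one unary clique and made-up numbers.
theta = {"x1": np.array([0.2, -0.5])}
weights = {"x1": 1.0}
mu_t = {"x1": np.array([0.5, 0.5])}
print(perturbed_potentials(theta, weights, mu_t))
```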

  23. Line Search [diagram: segment from $\mu_t$ toward $\tilde\mu_{\mathrm{MAP}}$, with $\mu_{t+1}$ on the segment]

  24. Line Search: Computing the line-search objective can scale with:

  25. Line Search: Computing the line-search objective can scale with: Bad: # possible values in cliques.

  26. Line Search: Computing the line-search objective can scale with: Bad: # possible values in cliques. Good: # cliques in graph. (see paper)
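One way to see the per-clique scaling (my own derivation; the paper's actual scheme may differ, as the slide defers the details to the paper): because the FW vertex $\tilde\mu_{\mathrm{MAP}}$ is integral, the entropy of each clique marginal along the segment depends only on two precomputed scalars, its current entropy and the mass it puts on the MAP value, so evaluating the line-search objective at a given $\gamma$ costs time proportional to the number of cliques.

```python
# Sketch (my own derivation of the per-clique scaling; the paper's exact
# scheme may differ). Along nu_c(g) = (1-g)*mu_c + g*e_c, where e_c is the
# one-hot MAP table, each clique's entropy is a closed-form function of g
# given two precomputed scalars: H_c = H(mu_c) and p_star = mu_c[MAP value].
import numpy as np

def clique_entropy_along_line(g, H_c, p_star):
    """H((1-g)*mu_c + g*e_c) from precomputed H_c and p_star."""
    if g >= 1.0:
        return 0.0                                 # entropy of a vertex
    top = (1 - g) * p_star + g                     # mass on the MAP entry
    rest = -(1 - g) * (1 - p_star) * np.log(1 - g) # non-MAP entries, rescaling part
    keep = (1 - g) * (H_c + p_star * np.log(max(p_star, 1e-300)))
    return keep + rest - top * np.log(top)

# Quick check against a direct computation on a random 5-value clique.
mu_c = np.random.dirichlet(np.ones(5))
i_star = 3
e_c = np.eye(5)[i_star]
H_c = -np.sum(mu_c * np.log(mu_c))
g = 0.37
nu = (1 - g) * mu_c + g * e_c
direct = -np.sum(nu * np.log(nu))
print(np.isclose(direct, clique_entropy_along_line(g, H_c, mu_c[i_star])))
```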

  27. Experiment #1

  28. Table of Contents: (1) Markov Random Fields; (2) Frank-Wolfe for Marginal Inference; (3) Optimality Guarantees and Convergence Rate; (4) Beyond MRFs; (5) Fancier FW.

  29. Convergence Rate of Frank-Wolfe [Jaggi, 2013]: $F(\mu_t) - F(\mu^*) \le \frac{2 C_F}{t+2}(1+\delta)$

  30. Convergence Rate of Frank-Wolfe [Jaggi, 2013]: $F(\mu_t) - F(\mu^*) \le \frac{2 C_F}{t+2}(1+\delta)$, where $\frac{\delta\, C_F}{t+2}$ is the allowed MAP suboptimality at iteration $t$.

  31. Convergence Rate of Frank-Wolfe [Jaggi, 2013]: $F(\mu_t) - F(\mu^*) \le \frac{2 C_F}{t+2}(1+\delta)$, where $\frac{\delta\, C_F}{t+2}$ is the allowed MAP suboptimality at iteration $t$. Exact MAP is NP-Hard.

  32. Convergence Rate of Frank-Wolfe [Jaggi, 2013]: $F(\mu_t) - F(\mu^*) \le \frac{2 C_F}{t+2}(1+\delta)$, where $\frac{\delta\, C_F}{t+2}$ is the allowed MAP suboptimality at iteration $t$. Exact MAP is NP-Hard. How to deal with MAP hardness? Use a MAP solver and hope for the best [Hazan and Jaakkola, 2012]. Relax to the local polytope.
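For reference, the role of $\delta$ as I understand Jaggi's approximate-oracle condition (consult Jaggi [2013] for the precise statement):

```latex
% My paraphrase of the approximate-oracle condition behind the (1 + \delta)
% factor; see Jaggi [2013] for the exact statement. With step size
% \gamma_t = 2/(t+2), the MAP call at iteration t may solve the linear
% subproblem suboptimally by an additive amount
\[
  \tfrac{1}{2}\,\delta\,\gamma_t\,C_F \;=\; \frac{\delta\,C_F}{t+2},
\]
% and the convergence bound above still holds with the extra factor (1 + \delta).
```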

  33. Curvature + Convergence Rate: $C_f = \sup_{x, s \in D;\ \gamma \in [0,1];\ y = x + \gamma(s-x)} \frac{2}{\gamma^2}\left(f(y) - f(x) - \langle y - x, \nabla f(x) \rangle\right)$

  34. Curvature + Convergence Rate (same definition of $C_f$ as above) [plot: binary entropy as a function of prob $x = 1$, with $\mu_t$, $\mu_{t+1}$, and $\tilde\mu_{\mathrm{MAP}}$ marked]
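A one-variable computation (my own illustration, not from the slides) shows why the region near integral vertices is the delicate part for entropy terms:

```latex
% One-variable illustration (my own, not from the slides): for a single
% binary variable with p = P(x = 1),
\[
  H(p) = -p \log p - (1 - p)\log(1 - p),
  \qquad
  H'(p) = \log\frac{1 - p}{p},
\]
% so the gradient of the entropy diverges as p -> 0 or p -> 1, i.e. near
% integral vertices such as \tilde{\mu}_{\mathrm{MAP}}, which is what makes
% the curvature constant C_f delicate for entropy-like objectives.
```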

  35. Experiment #2

  36. Table of Contents: (1) Markov Random Fields; (2) Frank-Wolfe for Marginal Inference; (3) Optimality Guarantees and Convergence Rate; (4) Beyond MRFs; (5) Fancier FW.

  37. Beyond MRFs. Question: Are MRFs the right Gibbs distribution on which to use Frank-Wolfe?

  38. Beyond MRFs. Question: Are MRFs the right Gibbs distribution on which to use Frank-Wolfe?

      Problem Family                   | MAP Algorithm       | Marginal Algorithm
      tree-structured graphical models | Viterbi             | Forward-Backward
      loopy graphical models           | Max-Product BP      | Sum-Product BP
      Directed Spanning Tree           | Chu-Liu-Edmonds     | Matrix Tree Theorem
      Bipartite Matching               | Hungarian Algorithm | ×

  39. Table of Contents: (1) Markov Random Fields; (2) Frank-Wolfe for Marginal Inference; (3) Optimality Guarantees and Convergence Rate; (4) Beyond MRFs; (5) Fancier FW.

  40. Norm-Regularized Marginal Inference: $\mu_{\mathrm{MARG}} = \arg\max_{\mu \in \mathcal{M}} \langle \mu, \theta \rangle + H_{\mathcal{M}}(\mu) + \lambda R(\mu)$ [Harchaoui et al., 2013]

  41. Norm-Regularized Marginal Inference: $\mu_{\mathrm{MARG}} = \arg\max_{\mu \in \mathcal{M}} \langle \mu, \theta \rangle + H_{\mathcal{M}}(\mu) + \lambda R(\mu)$ [Harchaoui et al., 2013]. Local linear oracle for MRFs? $\tilde\mu_t = \arg\max_{\mu \in \mathcal{M} \cap B_r(\mu_t)} \langle \mu, \theta \rangle$ [Garber and Hazan, 2013]

  42. Conclusion: We need to figure out how to handle the entropy gradient.

  43. Conclusion: We need to figure out how to handle the entropy gradient. There are plenty of extensions to other Gibbs distributions and regularizers.

  44. Further Reading I
      Stefano Ermon, Carla Gomes, Ashish Sabharwal, and Bart Selman. Taming the curse of dimensionality: Discrete integration by hashing and optimization. In Proceedings of the 30th International Conference on Machine Learning (ICML-13), pages 334-342, 2013.
      D. Garber and E. Hazan. A linearly convergent conditional gradient algorithm with applications to online and stochastic optimization. ArXiv e-prints, January 2013.
      Zaid Harchaoui, Anatoli Juditsky, and Arkadi Nemirovski. Conditional gradient algorithms for norm-regularized smooth convex optimization. arXiv preprint arXiv:1302.2325, 2013.
      Tamir Hazan and Tommi S. Jaakkola. On the partition function and random maximum a-posteriori perturbations. In Proceedings of the 29th International Conference on Machine Learning (ICML-12), pages 991-998, 2012.
      Bert Huang and Tony Jebara. Approximating the permanent with belief propagation. arXiv preprint arXiv:0908.1769, 2009.

  45. Further Reading II
      Mark Huber. Exact sampling from perfect matchings of dense regular bipartite graphs. Algorithmica, 44(3):183-193, 2006.
      Martin Jaggi. Revisiting Frank-Wolfe: Projection-free sparse convex optimization. In Proceedings of the 30th International Conference on Machine Learning (ICML-13), pages 427-435, 2013.
      James Petterson, Tiberio Caetano, Julian McAuley, and Jin Yu. Exponential family graph matching and ranking. 2009.
      Tim Roughgarden and Michael Kearns. Marginals-to-models reducibility. In Advances in Neural Information Processing Systems, pages 1043-1051, 2013.
      Maksims Volkovs and Richard S. Zemel. Efficient sampling for bipartite matching problems. In Advances in Neural Information Processing Systems, pages 1322-1330, 2012.
      Pascal O. Vontobel. The Bethe permanent of a non-negative matrix. In Communication, Control, and Computing (Allerton), 2010 48th Annual Allerton Conference on, pages 341-346. IEEE, 2010.

  46. Finding the Marginal Matching. Sampling: expensive, but doable [Huber, 2006; Volkovs and Zemel, 2012].
