
A Study of Nesterov's Scheme for Lagrangian Decomposition and MAP Labeling – PowerPoint PPT Presentation

  1. A Study of Nesterov's Scheme for Lagrangian Decomposition and MAP Labeling. Bogdan Savchynskyy, Jörg Kappes, Stefan Schmidt, Christoph Schnörr. Heidelberg Collaboratory for Image Processing (HCI), University of Heidelberg.

  2. MRF/MAP Inference – Applications
     $y^\ast = \operatorname*{arg\,min}_{y \in \mathcal{Y}^V} \Big[ \sum_{v \in V} \theta_v(y_v) + \sum_{vv' \in E} \theta_{vv'}(y_v, y_{v'}) \Big]$
     Applications: segmentation [Rother et al. 2004], [Nowozin, Lampert 2010]; multi-camera stereo [Kolmogorov, Zabih 2002]; stereo and motion [Kim et al. 2003]; clustering [Zabih, Kolmogorov 2004]; medical imaging [Raj et al. 2007]; pose estimation [Bergtholdt et al. 2010], [Bray et al. 2006]; ...
     Survey: R. Szeliski et al., "A comparative study of energy minimization methods for Markov random fields with smoothness-based priors," 2008.
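
     To make the objective concrete, here is a minimal brute-force sketch of MAP inference for a tiny pairwise model. The chain structure and all potential values are illustrative choices of ours, not taken from the talk.

```python
import itertools
import numpy as np

# Toy pairwise MRF: 3 nodes on a chain, 2 labels each (illustrative values only).
n_nodes, n_labels = 3, 2
unary = np.array([[0.0, 1.5],    # theta_v(y_v) for each node v and label y_v
                  [0.7, 0.2],
                  [1.0, 0.1]])
edges = [(0, 1), (1, 2)]
pairwise = {e: np.array([[0.0, 1.0],    # theta_vv'(y_v, y_v'), Potts-like penalty
                         [1.0, 0.0]]) for e in edges}

def energy(y):
    """E(theta, y) = sum_v theta_v(y_v) + sum_vv' theta_vv'(y_v, y_v')."""
    e = sum(unary[v, y[v]] for v in range(n_nodes))
    e += sum(pairwise[(u, v)][y[u], y[v]] for (u, v) in edges)
    return e

# Exhaustive MAP inference (only feasible for tiny models).
y_map = min(itertools.product(range(n_labels), repeat=n_nodes), key=energy)
print("MAP labeling:", y_map, "energy:", energy(y_map))
```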

  3. MRF/MAP Inference – Approaches
     Graph cuts [Boykov et al. 2001], [Kolmogorov, Zabih 2002], [Boykov, Kolmogorov 2004]: special types of potentials (sub-modularity).
     QPBO and roof duality [Hammer et al. 1984], [Boros, Hammer 2002], [Rother et al. 2007], [Kohli et al. 2008]: partial optimality.
     Combinatorial methods [Bergtholdt et al. 2006], [Schlesinger 2009], [Sanchez et al. 2008], [Marinescu, Dechter 2009]: exponential complexity in the worst case.

  4. MRF/MAP Inference – Approaches
     Message passing and belief propagation [Weiss, Freeman 2001], [Wainwright et al. 2002], [Kolmogorov 2005], [Globerson, Jaakkola 2007]: relaxation, dual decomposition; sub-optimal fixed points; stopping criterion?
     Sub-gradient optimization schemes [Komodakis et al. 2007], [Schlesinger, Giginyak 2007], [Kappes et al. 2010]: relaxation, dual decomposition; slow convergence; stopping criterion?
     Focus and contribution: local polytope / LP relaxation based on dual decomposition – similar to message passing and sub-gradient schemes; efficient iterations – outperforms sub-gradient; convergence to the optimum – outperforms message passing; stopping criterion based on the duality gap – novel!

  5. Dual Decomposition Approach
     $E(\theta, y) = E^1(\theta^1, y) + E^2(\theta^2, y)$, with $\theta^1 + \theta^2 = \theta$.
     $\max_{\theta^1 + \theta^2 = \theta} \Big[ \min_{y \in \mathcal{Y}^V} E^1(\theta^1, y) + \min_{y \in \mathcal{Y}^V} E^2(\theta^2, y) \Big] \le \min_{y \in \mathcal{Y}^V} E(\theta, y)$
     Simple subproblems, solvable in parallel. The resulting dual is concave, but non-smooth.
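
     The inequality holds because minimizing the two terms over separate labelings can only decrease the sum. A minimal numeric sketch of the bound; the toy potentials and the even split of the unaries are our own assumptions, not taken from the talk.

```python
import itertools
import numpy as np

# Toy 3-node chain with Potts edges, split into two single-edge subproblems.
n, K = 3, 2
unary = np.array([[0.0, 1.5], [0.7, 0.2], [1.0, 0.1]])
pair = np.array([[0.0, 1.0], [1.0, 0.0]])
labelings = list(itertools.product(range(K), repeat=n))

def sub_energy(y, theta_u, edge):
    """Energy of one subproblem: its share of the unaries plus a single edge term."""
    u, v = edge
    return sum(theta_u[w, y[w]] for w in range(n)) + pair[y[u], y[v]]

# Split the unaries evenly: theta^1 + theta^2 = theta.
theta1 = theta2 = unary / 2.0

dual_bound = (min(sub_energy(y, theta1, (0, 1)) for y in labelings) +
              min(sub_energy(y, theta2, (1, 2)) for y in labelings))
primal_opt = min(sub_energy(y, theta1, (0, 1)) + sub_energy(y, theta2, (1, 2))
                 for y in labelings)   # same y in both terms = the original energy E(theta, y)
print(dual_bound, "<=", primal_opt)
```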

  6. Large-Scale Convex Optimization
     Problem: dual decomposition → convex, large-scale, non-smooth.
     Sub-gradient schemes: [Komodakis et al. 2007], [Schlesinger, Giginyak 2007]
     Block-coordinate ascent: [Wainwright 2004], [Kolmogorov 2005], [Globerson, Jaakkola 2007]
     Smoothing + block-coordinate ascent: [Johnson et al. 2007], [Werner 2009]
     Smoothing technique + accelerated gradient methods: [Nesterov 2004, 2007]
     Proximal methods: [Combettes, Wajs 2005], [Beck, Teboulle 2009], [Ravikumar et al. 2010]
     Proximal primal-dual algorithms: [Esser et al. 2010]
     Solution direction: smooth and optimize.

  7. Smoothing Technique by Y. Nesterov
     $f(x) = \min_{y \in D} \big[ \langle Ax, y \rangle + \phi(y) \big] \;\longrightarrow\; \tilde{f}_\rho(x) = \min_{y \in D} \big[ \langle Ax, y \rangle + \phi(y) + \rho\, d(y) \big]$
     Left ($f$): concave, but non-smooth; convergence $t \approx O(1/\varepsilon^2)$.
     Right ($\tilde{f}_\rho$): Lipschitz-continuous gradient; convergence $t \approx O(1/\varepsilon)$.
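
     A common concrete instance of this smoothing is the entropic soft-min, which replaces a hard minimum over labels by a log-sum-exp. The sketch below (toy scores and rho values of our own choosing, not code from the talk) shows how the smoothed value approaches the non-smooth minimum as rho goes to zero.

```python
import numpy as np

# Entropy smoothing of a (concave, non-smooth) minimum over labels, a standard
# instance of Nesterov's technique; rho and the toy scores are illustrative choices.
def softmin(scores, rho):
    """Smooth approximation of min(scores): -rho * log(sum(exp(-scores / rho)))."""
    s = np.asarray(scores, dtype=float)
    m = s.min()                                   # shift for numerical stability
    return m - rho * np.log(np.exp(-(s - m) / rho).sum())

scores = [1.3, 0.4, 2.0]
for rho in (1.0, 0.1, 0.01):
    print(rho, softmin(scores, rho))              # approaches min(scores) = 0.4 as rho -> 0
```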

  8. Efficient Implementation of Nesterov's Method
     Basic scheme vs. our approach:
     – Stopping condition: worst-case number of steps → duality gap.
     – Smoothing selection: worst-case analysis → adaptive.
     – Lipschitz constant estimation (step-size selection): worst-case analysis → adaptive.

  9. Duality Gap and Stopping Condition
     $\min_x \max_y g(x, y) - \max_y \min_x g(x, y) \le \varepsilon$
     Dual decomposition approaches optimize the relaxed dual $\max_y \min_x g(x, y)$.
     Standard approach: estimate a non-relaxed primal (integer) solution.
     We estimate the relaxed primal $\min_x \max_y g(x, y)$ – difficult!
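
     The sketch below illustrates a gap-based stopping rule on a toy dual decomposition: plain sub-gradient ascent on the dual, with the primal bound taken from the best integer labeling seen so far (the "standard" estimate mentioned above, not the relaxed-primal estimate the talk advocates). The model, the split of the unaries, and the step sizes are our own illustrative choices.

```python
import itertools
import numpy as np

# Toy 3-node chain, 2 labels, decomposed into two single-edge subproblems.
n, K = 3, 2
unary = np.array([[0.0, 1.5], [0.7, 0.2], [1.0, 0.1]])
pair = np.array([[0.0, 1.0], [1.0, 0.0]])
labelings = list(itertools.product(range(K), repeat=n))

def full_energy(y):
    return sum(unary[v, y[v]] for v in range(n)) + pair[y[0], y[1]] + pair[y[1], y[2]]

def solve_sub(theta_u, edge):
    """Brute-force minimizer and value of one subproblem (fine for a toy model)."""
    u, v = edge
    val = lambda y: sum(theta_u[w, y[w]] for w in range(n)) + pair[y[u], y[v]]
    y_best = min(labelings, key=val)
    return y_best, val(y_best)

lam = np.zeros((n, K))      # dual variables: theta^1 = theta/2 + lam, theta^2 = theta/2 - lam
best_primal, eps = np.inf, 1e-3
for t in range(300):
    y1, d1 = solve_sub(unary / 2 + lam, (0, 1))
    y2, d2 = solve_sub(unary / 2 - lam, (1, 2))
    dual = d1 + d2                                   # lower bound on the MAP energy
    best_primal = min(best_primal, full_energy(y1), full_energy(y2))
    if best_primal - dual <= eps:                    # duality-gap stopping condition
        break
    g = np.zeros((n, K))                             # ascent direction for the concave dual
    for v in range(n):
        g[v, y1[v]] += 1.0
        g[v, y2[v]] -= 1.0
    lam += 0.5 / (1.0 + t) * g                       # diminishing-step sub-gradient ascent
print("iterations:", t + 1, "duality gap:", best_primal - dual)
```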

  10. Smoothing Selection
     Trade-off: strong smoothing gives fast optimization but low precision; weak smoothing gives slow optimization but high precision.

  11. Smoothing Selection
     Choose the smoothing so that the smoothing error $\delta$ matches the target precision: $\varepsilon = 2\delta$.
     Nesterov: worst-case estimate of $\delta$. Ours: adaptive estimate.
     (Plot: Tsukuba dataset, precision about 0.3%.)
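
     As a point of reference for the worst-case choice: with entropic smoothing over n terms the smoothing error is bounded by rho * ln(n), so setting rho = eps / (2 ln n) keeps it below delta = eps / 2. A tiny numeric check with toy scores follows; this is the conservative textbook bound that an adaptive rule improves on, not the authors' exact scheme.

```python
import numpy as np

# Worst-case smoothing choice: with entropy smoothing over n terms the approximation
# error is at most rho*log(n), so rho = eps/(2*log(n)) keeps it below delta = eps/2.
scores = np.array([1.3, 0.4, 2.0, 0.9])   # toy values, small enough that no shift is needed
eps = 0.1
rho = eps / (2 * np.log(len(scores)))
softmin = -rho * np.log(np.exp(-scores / rho).sum())
print("smoothing error:", scores.min() - softmin, "<= delta =", eps / 2)
```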

  12. Lipschitz Constant (Step-Size) Estimation
     Gradient step: $x = y + \frac{1}{L} \nabla f(y)$.
     Nesterov: worst-case estimate of $L$. Ours: adaptive estimate of $L$, without violating the theory!
     (Plot: Tsukuba dataset, precision about 3%.)
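
     A standard way to estimate L adaptively is backtracking: start from an optimistic L and double it whenever the quadratic model with that L fails to bound the objective at the trial point. The sketch below is written for minimization of a toy quadratic (the talk's dual maximization is the mirror image) and uses the generic FISTA-style rule, not necessarily the authors' exact scheme; the test function and constants are our own.

```python
import numpy as np

# Backtracking estimation of the Lipschitz constant L for a gradient step.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
F = lambda x: 0.5 * x @ A @ x - b @ x          # smooth convex quadratic, true L = lambda_max(A)
gradF = lambda x: A @ x - b

x, L = np.zeros(2), 0.1                         # start with an optimistic (too small) L
for t in range(50):
    g = gradF(x)
    while True:
        x_new = x - g / L                       # candidate gradient step with the current L
        # Accept if the quadratic model with constant L upper-bounds F at x_new.
        if F(x_new) <= F(x) + g @ (x_new - x) + 0.5 * L * np.dot(x_new - x, x_new - x):
            break
        L *= 2.0                                # estimate was too small: increase it
    x = x_new
print("estimated L:", L, "true L:", np.linalg.eigvalsh(A).max(), "minimizer:", x)
```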

  13. Comparison to Other Approaches
     (Plots: random synthetic 20x20 grid model with 5 labels, and the Tsukuba dataset.)

  14. Summary
     Contribution:
     – Improved convergence estimate: $O(1/\varepsilon)$ vs. $O(1/\varepsilon^2)$.
     – Sound stopping condition: $\min_x \max_y g(x, y) - \max_y \min_x g(x, y) \le \varepsilon$.
     – Fine-grained parallelization properties.
     – Applicable to arbitrary graphs and arbitrary potentials.
     Future work:
     – Examine the primal-dual viewpoint – EMMCVPR 2011.
     – Application in structured prediction and learning.

  15. Comparison to V. Jojic, S. Gould, and D. Koller, "Accelerated dual decomposition ...", 2010.
     (Plots: primal LP solution and primal integer solution; synthetic 20x20 grid, 5 labels.)
