HIGHLY PARALLEL METHODS FOR MACHINE LEARNING AND SIGNAL RECOVERY - PowerPoint PPT Presentation


  1. HIGHLY PARALLEL METHODS FOR MACHINE LEARNING AND SIGNAL RECOVERY Tom Goldstein

  2. TOPICS: Introduction; ADMM / Fast ADMM; Application: distributed computing; Automation & adaptivity

  3. FIRST-ORDER METHODS: minimize $F(x)$ over $x$. Generalizes gradient descent, $x^{k+1} = x^k - \tau \nabla F(x^k)$. [Figure: iterates $x_0, \dots, x_4$ zig-zagging across the level sets of $F$.] Pros: linear complexity, parallelizable, low memory requirements. Con: poor convergence rates. Solution: adaptivity and acceleration.
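
As a concrete sketch of the update above: a minimal NumPy gradient-descent loop. The quadratic test objective, step size, and iteration count are illustrative assumptions, not from the slides.

```python
import numpy as np

def gradient_descent(grad_F, x0, tau=0.1, iters=100):
    """Plain gradient descent: x^{k+1} = x^k - tau * grad F(x^k)."""
    x = x0.copy()
    for _ in range(iters):
        x = x - tau * grad_F(x)
    return x

# Example: F(x) = 0.5 ||Ax - b||^2 has gradient A^T (Ax - b).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x_min = gradient_descent(lambda x: A.T @ (A @ x - b), x0=np.zeros(2))
```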

  4. CONSTRAINED PROBLEMS: minimize $H(u) + G(v)$ over $u, v$ subject to $Au + Bv = b$. Big idea: Lagrange multipliers,
     $\max_{\lambda} \min_{u,v} \; H(u) + G(v) + \langle \lambda, b - Au - Bv \rangle + \frac{\tau}{2} \| b - Au - Bv \|^2.$
     Optimality for $\lambda$: $b - Au - Bv = 0$. Reduced energy: $H(u) + G(v)$. Saddle point = solution to the constrained problem.

  5. ADMM: minimize $H(u) + G(v)$ subject to $Au + Bv = b$. Big idea: Lagrange multipliers, via the augmented Lagrangian above. Alternating Direction Method of Multipliers:
     $u^{k+1} = \arg\min_u \; H(u) + \langle \lambda^k, -Au \rangle + \frac{\tau}{2} \| b - Au - Bv^k \|^2$
     $v^{k+1} = \arg\min_v \; G(v) + \langle \lambda^k, -Bv \rangle + \frac{\tau}{2} \| b - Au^{k+1} - Bv \|^2$
     $\lambda^{k+1} = \lambda^k + \tau ( b - Au^{k+1} - Bv^{k+1} )$
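
To make the three updates concrete, here is a minimal NumPy sketch for one instance where both subproblems have closed forms: $H(u) = \frac{1}{2}\|u - f\|^2$, $G(v) = |v|_1$, constraint $u - v = 0$ (so $A = I$, $B = -I$, $b = 0$). The problem instance and step size $\tau$ are illustrative assumptions.

```python
import numpy as np

def shrink(x, t):
    """Soft-thresholding: the proximal operator of t * |x|_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def admm_l1(f, tau=1.0, iters=200):
    """ADMM for  min_u 0.5||u - f||^2 + |u|_1,  split as
    H(u) = 0.5||u - f||^2, G(v) = |v|_1, subject to u - v = 0."""
    u, v, lam = f.copy(), np.zeros_like(f), np.zeros_like(f)
    for _ in range(iters):
        # u-step: quadratic, minimized in closed form
        u = (f + lam + tau * v) / (1.0 + tau)
        # v-step: l1 prox = soft-thresholding
        v = shrink(u - lam / tau, 1.0 / tau)
        # multiplier step: lam + tau*(b - Au - Bv) = lam + tau*(v - u)
        lam = lam + tau * (v - u)
    return u
```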

  6. EXAMPLE PROBLEMS: non-smooth problems.
     TV denoising: $\min |\nabla u| + \frac{\mu}{2} \| u - f \|^2$  [Figures: noisy image, clean image, total variation]
     TV deblurring: $\min |\nabla u| + \frac{\mu}{2} \| Ku - f \|^2$  [Figure: blurred image; $K$ is a convolution]
     General problem: $\min |\nabla u| + \frac{\mu}{2} \| Au - f \|^2$
     Goldstein & Osher, "Split Bregman," 2009

  7. WHY IS SPLITTING GOOD? Non-smooth problem: $\min |\nabla u| + \frac{\mu}{2} \| Au - f \|^2$.
     Make a change of variables: $v \leftarrow \nabla u$.
     'Split Bregman' form: minimize $|v| + \frac{\mu}{2} \| Au - f \|^2$ subject to $v - \nabla u = 0$.
     Augmented Lagrangian: $|v| + \frac{\mu}{2} \| Au - f \|^2 + \langle \lambda, v - \nabla u \rangle + \frac{\tau}{2} \| v - \nabla u \|^2$
     Goldstein & Osher, "Split Bregman," 2009

  8. WHY IS SPLITTING GOOD? ADMM for TV, applied to the augmented Lagrangian above:
     $u^{k+1} = \arg\min_u \; \frac{\mu}{2} \| Au - f \|^2 + \frac{\tau}{2} \| v^k - \nabla u - \lambda^k \|^2$
     $v^{k+1} = \arg\min_v \; |v| + \frac{\tau}{2} \| v - \nabla u^{k+1} - \lambda^k \|^2$
     $\lambda^{k+1} = \lambda^k + \tau ( \nabla u^{k+1} - v^{k+1} )$
     Goldstein & Osher, "Split Bregman," 2009
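
A small NumPy sketch of these three updates for 1-D TV denoising ($A = I$), with the gradient discretized as a forward-difference matrix $D$. The discretization, boundary handling, scaled multiplier update, and parameter values are assumptions for illustration.

```python
import numpy as np

def shrink(x, t):
    """Soft-thresholding: closed-form solution of the v-step."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def tv_denoise_1d(f, mu=10.0, tau=1.0, iters=100):
    """ADMM / split Bregman for  min_u |Du|_1 + (mu/2)||u - f||^2  in 1-D."""
    n = f.size
    D = np.eye(n, k=1) - np.eye(n)       # forward differences, playing the role of grad
    D[-1, :] = 0.0                       # no difference across the right boundary
    u, v, lam = f.copy(), np.zeros(n), np.zeros(n)
    M = mu * np.eye(n) + tau * D.T @ D   # normal-equations matrix for the u-step
    for _ in range(iters):
        # u-step: quadratic -> solve (mu*I + tau*D^T D) u = mu*f + tau*D^T (v - lam)
        u = np.linalg.solve(M, mu * f + tau * D.T @ (v - lam))
        # v-step: soft-threshold the shifted gradient
        v = shrink(D @ u + lam, 1.0 / tau)
        # scaled multiplier update
        lam = lam + D @ u - v
    return u
```

In 2-D with periodic boundaries, the u-step linear solve is typically done with FFTs rather than a dense factorization.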

  9. WHY IS SPLITTING BAD? TV denoising: $\min |\nabla u| + \frac{\mu}{2} \| u - f \|^2$.

  10. GRADIENT VS. NESTEROV: [Figure: iterate paths $x_0, \dots, x_4$ for gradient descent vs. Nesterov's method.]
      Gradient descent converges at rate $O(1/k)$; Nesterov's method at $O(1/k^2)$, which is optimal for first-order methods.
      Nemirovski and Yudin '83

  11. NESTEROV'S METHOD: minimize $F(x)$ over $x$.
      Gradient descent step: $x^{k+1} = y^k - \tau \nabla F(y^k)$
      Acceleration factor: $\alpha_{k+1} = \frac{1 + \sqrt{1 + 4 \alpha_k^2}}{2}$
      Prediction: $y^{k+1} = x^{k+1} + \frac{\alpha_k - 1}{\alpha_{k+1}} ( x^{k+1} - x^k )$
      Nesterov '83
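
The three formulas above translate directly into code; a minimal sketch, reusing the quadratic-test-objective idea from the earlier gradient-descent example, with illustrative step size and iteration count.

```python
import numpy as np

def nesterov(grad_F, x0, tau=0.1, iters=100):
    """Nesterov's accelerated gradient method."""
    x = x0.copy()
    y = x0.copy()
    alpha = 1.0
    for _ in range(iters):
        x_next = y - tau * grad_F(y)                             # gradient step at y^k
        alpha_next = (1.0 + np.sqrt(1.0 + 4.0 * alpha**2)) / 2.0  # acceleration factor
        y = x_next + (alpha - 1.0) / alpha_next * (x_next - x)   # prediction step
        x, alpha = x_next, alpha_next
    return x
```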

  12. ACCELERATED SPLITTING METHODS

  13. HOW TO MEASURE CONVERGENCE? There is no 'objective' to minimize. [Figures: an unconstrained convex objective vs. a constrained saddle-point surface.]

  14. RESIDUALS: minimize $H(u) + G(v)$ subject to $Au + Bv = b$.
      Lagrangian: $\min_{u,v} \max_{\lambda} \; H(u) + G(v) + \langle \lambda, b - Au - Bv \rangle$
      Derivative for $\lambda$: $b - Au - Bv = 0$. Derivative for $u$: $\partial H(u) - A^T \lambda = 0$.
      We have convergence when these derivatives are 'small':
      $r^k = b - Au^k - Bv^k$ (primal residual); $d^k = \partial H(u^k) - A^T \lambda^k$, which along the ADMM iterates takes the explicit form $d^k = \tau A^T B ( v^k - v^{k-1} )$ (dual residual).

  15. EXPLICIT RESIDUALS: explicit formulas for the residuals,
      $r^k = b - Au^k - Bv^k, \quad d^k = \tau A^T B ( v^k - v^{k-1} ).$
      Combined residual: $c^k = \| r^k \|^2 + \frac{1}{\tau} \| d^k \|^2$.
      ADMM/AMA converge at the rate $c^k \le O(1/k)$. Goal: $O(1/k^2)$.
      Goldstein, O'Donoghue, Setzer, Baraniuk 2012; He & Yuan 2012
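
A sketch of how the combined residual can drive a stopping test, reusing the $\ell_1$ instance from the ADMM example above ($A = I$, $B = -I$, $b = 0$); the tolerance is an illustrative assumption.

```python
import numpy as np

def shrink(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def admm_l1_stopped(f, tau=1.0, tol=1e-8, max_iters=500):
    """ADMM for min 0.5||u - f||^2 + |u|_1, stopped when the combined
    residual c^k = ||r^k||^2 + (1/tau)||d^k||^2 falls below tol."""
    u, v, lam = f.copy(), np.zeros_like(f), np.zeros_like(f)
    for _ in range(max_iters):
        v_old = v
        u = (f + lam + tau * v) / (1.0 + tau)
        v = shrink(u - lam / tau, 1.0 / tau)
        lam = lam + tau * (v - u)
        r = v - u                # primal residual  b - Au - Bv
        d = tau * (v - v_old)    # dual residual  tau * A^T B (v^k - v^{k-1}), up to sign
        if r @ r + (d @ d) / tau <= tol:
            break
    return u
```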

  16. FAST ADMM
      Require: $v^{-1} = \hat v^0 \in \mathbb{R}^{N_v}$, $\lambda^{-1} = \hat\lambda^0 \in \mathbb{R}^{N_b}$, $\tau > 0$, $\alpha_1 = 1$
      1: for $k = 1, 2, 3, \dots$ do
      2:   $u^k = \arg\min_u \; H(u) + \langle \hat\lambda^k, -Au \rangle + \frac{\tau}{2} \| b - Au - B \hat v^k \|^2$
      3:   $v^k = \arg\min_v \; G(v) + \langle \hat\lambda^k, -Bv \rangle + \frac{\tau}{2} \| b - Au^k - Bv \|^2$
      4:   $\lambda^k = \hat\lambda^k + \tau ( b - Au^k - Bv^k )$
      5:   $\alpha_{k+1} = \frac{1 + \sqrt{1 + 4 \alpha_k^2}}{2}$
      6:   $\hat v^{k+1} = v^k + \frac{\alpha_k - 1}{\alpha_{k+1}} ( v^k - v^{k-1} )$
      7:   $\hat\lambda^{k+1} = \lambda^k + \frac{\alpha_k - 1}{\alpha_{k+1}} ( \lambda^k - \lambda^{k-1} )$
      8: end for
      Goldstein, O'Donoghue, Setzer, Baraniuk 2012
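
A NumPy sketch of the algorithm above, specialized again to the $\ell_1$ instance used earlier ($H(u) = \frac{1}{2}\|u - f\|^2$, $G(v) = |v|_1$, $u - v = 0$); the instance and parameters are illustrative assumptions. The paper pairs this accelerated variant with a restart rule for problems that are not strongly convex, omitted here for brevity.

```python
import numpy as np

def shrink(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def fast_admm_l1(f, tau=1.0, iters=200):
    """Fast (Nesterov-accelerated) ADMM for min 0.5||u - f||^2 + |u|_1,
    split with H(u) = 0.5||u - f||^2, G(v) = |v|_1, constraint u - v = 0."""
    v_hat = np.zeros_like(f); v_prev = v_hat.copy()
    lam_hat = np.zeros_like(f); lam_prev = lam_hat.copy()
    alpha = 1.0
    for _ in range(iters):
        # Steps 2-4: ordinary ADMM updates at the predicted points
        u = (f + lam_hat + tau * v_hat) / (1.0 + tau)
        v = shrink(u - lam_hat / tau, 1.0 / tau)
        lam = lam_hat + tau * (v - u)
        # Steps 5-7: Nesterov acceleration of v and lambda
        alpha_next = (1.0 + np.sqrt(1.0 + 4.0 * alpha**2)) / 2.0
        v_hat = v + (alpha - 1.0) / alpha_next * (v - v_prev)
        lam_hat = lam + (alpha - 1.0) / alpha_next * (lam - lam_prev)
        v_prev, lam_prev, alpha = v, lam, alpha_next
    return u
```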
