Frank-Wolfe Splitting via Augmented Lagrangian Method


  1. Frank-Wolfe Splitting via Augmented Lagrangian Method
     Fabian Pedregosa (2), Simon Lacoste-Julien (1), Gauthier Gidel (1)
     (1) MILA, DIRO, Université de Montréal; (2) UC Berkeley & ETH Zurich
     April 2018
     Gauthier Gidel, FW Splitting via ALM, April 2018

  2. Why Frank-Wolfe is wonderful.
     ◮ Constrained optimization algorithm: $\min_{x \in \mathcal{C}} f(x)$, with $f$ convex and $\mathcal{C}$ convex and compact.
     ◮ Interesting for highly structured constraint sets:
       Permutahedron: [Lancia and Serafini, 2018], [Evangelopoulos et al., 2017].
       Alignment constraint: [Alayrac et al., 2016].

  5. Why Frank-Wolfe is wonderful.
     ◮ Constrained optimization algorithm: $\min_{x \in \mathcal{C}} f(x)$, with $f$ convex and $\mathcal{C}$ convex and compact.
     ◮ Interesting when projection is not practical.
       [Figure: two panels, "Projection" and "Linear Minimization Oracle"]
     ◮ When projection is practical, it is better to use the projected gradient method.

  6. Why Frank-Wolfe is sometimes not enough.
     ◮ FW requires a linear minimization oracle (LMO) over the constraint set: $\mathrm{LMO}(d) := \arg\min_{x \in \mathcal{C}} \langle d, x \rangle$.
     ◮ Intersection of constraint sets: $\mathcal{C}_1 \cap \mathcal{C}_2$.
     ◮ $\mathrm{LMO}_{\mathcal{C}_1 \cap \mathcal{C}_2}(d)$ may be too expensive.
     ◮ FW-AL just requires $\mathrm{LMO}_{\mathcal{C}_1}(d)$ and $\mathrm{LMO}_{\mathcal{C}_2}(d)$.
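
To make the oracle concrete, here is a minimal Python sketch of an LMO and of a plain Frank-Wolfe loop built on top of it. The ℓ1 ball as constraint set, the quadratic objective, the step size 2/(t+2), and the helper names `lmo_l1_ball` and `frank_wolfe` are illustrative assumptions and do not come from the slides.

```python
def lmo_l1_ball(d, radius=1.0):
    """Return a minimizer of <d, x> over the l1 ball {x : ||x||_1 <= radius}."""
    # A linear function over the l1 ball is minimized at a signed vertex:
    # put all the mass on the coordinate where |d| is largest.
    i = max(range(len(d)), key=lambda j: abs(d[j]))
    s = [0.0] * len(d)
    if d[i] != 0.0:
        s[i] = -radius if d[i] > 0 else radius
    return s


def frank_wolfe(grad, lmo, x0, n_iters=2000):
    """Plain Frank-Wolfe with the classical step size 2 / (t + 2) (an assumption)."""
    x = list(x0)
    for t in range(n_iters):
        s = lmo(grad(x))  # one LMO call per iteration, no projection
        gamma = 2.0 / (t + 2.0)
        # Convex combination keeps the iterate inside the constraint set.
        x = [(1.0 - gamma) * xi + gamma * si for xi, si in zip(x, s)]
    return x


# Hypothetical example: minimize 0.5 * ||x - b||^2 over the unit l1 ball.
b = [0.5, 0.2, 0.0]


def grad_quadratic(x):
    return [xi - bi for xi, bi in zip(x, b)]


x_fw = frank_wolfe(grad_quadratic, lmo_l1_ball, [0.0, 0.0, 0.0])
```

Each iterate is a convex combination of points of the set, so feasibility is maintained using only LMO calls.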

  9. Simultaneously sparse and low rank matrix recovery
     Proposed by Richard et al. [2012]:
     $$\min_{S \succeq 0,\ \|S\|_1 \le \beta_1,\ \|S\|_* \le \beta_2} \|S - \hat{\Sigma}\|_2^2 .$$
     ◮ Sparsity constraint: $\mathcal{C}_1 := \{S \succeq 0,\ \|S\|_1 \le \beta_1\}$. $\mathrm{LMO}_{\mathcal{C}_1}(D)$ = largest coefficient of the matrix: $O(d^2)$.
     ◮ Low rank constraint: $\mathcal{C}_2 := \{S \succeq 0,\ \|S\|_* \le \beta_2\}$. $\mathrm{LMO}_{\mathcal{C}_2}(D)$ = largest eigenvector: $O(d^2/\sqrt{\epsilon})$.
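
The two oracles named on this slide can be sketched in plain Python as follows. This is a rough illustration under stated assumptions: `lmo_l1_matrix` treats the sparsity set as the plain entrywise ℓ1 ball (the "largest coefficient" rule) and does not enforce the $S \succeq 0$ part; `lmo_psd_trace_ball` assumes a symmetric input and uses a shifted power iteration, which is one possible way to approximate the extreme eigenvector. Both function names are hypothetical.

```python
import math


def lmo_l1_matrix(D, beta):
    """Minimize <D, S> over the entrywise l1 ball {S : sum |S_ij| <= beta}."""
    n_rows, n_cols = len(D), len(D[0])
    # Scan all entries once: O(d^2) for a d x d matrix.
    i, j = max(((r, c) for r in range(n_rows) for c in range(n_cols)),
               key=lambda rc: abs(D[rc[0]][rc[1]]))
    S = [[0.0] * n_cols for _ in range(n_rows)]
    if D[i][j] != 0.0:
        S[i][j] = -beta if D[i][j] > 0 else beta
    return S


def lmo_psd_trace_ball(D, beta, n_iters=500):
    """Approximately minimize <D, S> over {S PSD, trace(S) <= beta}, D symmetric."""
    n = len(D)
    # Shift D so that its smallest eigenvalue becomes the largest one of B = c*I - D.
    c = max(sum(abs(a) for a in row) for row in D)  # Gershgorin bound
    B = [[(c if r == k else 0.0) - D[r][k] for k in range(n)] for r in range(n)]
    v = [float(k + 1) for k in range(n)]  # arbitrary deterministic start (assumption)
    norm = math.sqrt(sum(a * a for a in v))
    v = [a / norm for a in v]
    for _ in range(n_iters):  # power iteration on B
        w = [sum(a * x for a, x in zip(row, v)) for row in B]
        norm = math.sqrt(sum(a * a for a in w))
        if norm == 0.0:
            break
        v = [a / norm for a in w]
    Dv = [sum(a * x for a, x in zip(row, v)) for row in D]
    lam_min = sum(a * x for a, x in zip(v, Dv))  # Rayleigh quotient of D at v
    if lam_min >= 0.0:
        return [[0.0] * n for _ in range(n)]  # no descent direction: S = 0 is optimal
    return [[beta * a * x for x in v] for a in v]  # rank-one matrix beta * v v^T
```

The second oracle returns a rank-one matrix, which is why Frank-Wolfe iterates on this set stay low rank for a small number of iterations.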

  12. Multiple sequence alignment
      Proposed by Yen et al. [2016a]:
      $$\min_{W \in \mathcal{A} \cap \mathcal{P}} \langle W, D \rangle$$
      ◮ $W$: alignment of the sequences. $D$: cost matrix.
      ◮ $\mathcal{A}$: alignment constraint. Each alignment with the consensus sequence is valid.
      ◮ $\mathcal{P}$: consensus constraint. Alignments are consistent with each other.

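  16. Structured SVM
      Proposed by Yen et al. [2016b], dual problem:
      $$\min_{\alpha_f \in \Delta^{|\mathcal{Y}_f|}} \ \frac{1}{2} \sum_{F \in \mathcal{T}} \|A_F \alpha\|_2^2 - \sum_{j \in \mathcal{V}} \delta_j^\top \alpha_j \quad \text{s.t.} \quad M_{fi}\, \alpha_f = \alpha_i, \ f \in F,\ F \in \mathcal{T},\ i \in \mathcal{N}(f).$$
      ◮ $\mathcal{V}$: variables. $\mathcal{T}$: factor templates. $\mathcal{N}(f)$: neighbors of $f$.
      ◮ Consistency constraint: $M_{11} x^{(1)} = \alpha_1$, $M_{12} x^{(1)} = \alpha_2$, ...
      [Figure: diagram with nodes $x^{(1)}$, $x^{(2)}$, $\alpha_1$, $\alpha_2$, $\alpha_3$]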

  19. General Formulation
      $$\operatorname*{minimize}_{x^{(1)}, \dots, x^{(K)}} \ f(x^{(1)}, \dots, x^{(K)}) \quad \text{s.t.} \quad x^{(k)} \in \mathcal{C}_k,\ k \in [K], \quad \sum_{k=1}^{K} A_k x^{(k)} = 0 .$$
      ◮ $f$ is convex and smooth (Lipschitz gradient).
      ◮ $\mathcal{C}_k$, $k \in \{1, \dots, K\}$, are convex and compact.

  20. Augmented Lagrangian Method
      ◮ Augmented Lagrangian trick to get rid of the constraint $\sum_{k=1}^{K} A_k x^{(k)} = 0$.
      ◮ Take $M$ such that $Mx = 0 \Leftrightarrow \sum_{k=1}^{K} A_k x^{(k)} = 0$, and the functions
        $$\mathcal{L}(x, y) := f(x) + \langle y, Mx \rangle + \frac{\lambda}{2} \|Mx\|^2 ,$$
        $$p(x) := \max_{y \in \mathbb{R}^d} \mathcal{L}(x, y) = \begin{cases} f(x) & \text{if } Mx = 0, \\ +\infty & \text{otherwise.} \end{cases}$$
      ◮ Augmented Lagrangian formulation of our problem:
        $$\operatorname*{minimize}_{x} \ \max_{y \in \mathbb{R}^d} \mathcal{L}(x, y) \quad \text{s.t.} \quad x \in \mathcal{X} := \mathcal{C}_1 \times \dots \times \mathcal{C}_K .$$
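
The behaviour of $p(x)$ can be checked numerically with a small Python sketch. The two-block setup with $Mx = x^{(1)} - x^{(2)}$, the quadratic `f`, the value of `lam`, and the function name `augmented_lagrangian` are assumptions chosen for illustration only.

```python
def augmented_lagrangian(f, M, x, y, lam):
    """L(x, y) = f(x) + <y, M x> + (lam / 2) * ||M x||^2."""
    Mx = [sum(m * xi for m, xi in zip(row, x)) for row in M]
    inner = sum(yi * ri for yi, ri in zip(y, Mx))
    penalty = 0.5 * lam * sum(r * r for r in Mx)
    return f(x) + inner + penalty


def f(x):
    """Hypothetical smooth convex objective."""
    return 0.5 * sum(xi * xi for xi in x)


# Two blocks in R^2 stacked as x = (x1, x2); M x = x1 - x2.
M = [[1.0, 0.0, -1.0, 0.0],
     [0.0, 1.0, 0.0, -1.0]]

x_feasible = [1.0, 2.0, 1.0, 2.0]    # x1 == x2, so M x = 0
x_infeasible = [1.0, 2.0, 0.0, 2.0]  # M x = (1, 0)

# On the feasible point the value equals f(x) whatever y is.
vals_feasible = [augmented_lagrangian(f, M, x_feasible, [c, -c], 1.0)
                 for c in (0.0, 10.0, 1000.0)]
# On the infeasible point, y = c * M x makes the value grow without bound in c.
vals_infeasible = [augmented_lagrangian(f, M, x_infeasible, [c, 0.0], 1.0)
                   for c in (0.0, 10.0, 1000.0)]
```

The second list grows linearly in `c`, which is the mechanism behind $p(x) = +\infty$ whenever $Mx \neq 0$.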

  23. FW-AL algorithm
      $$\operatorname*{minimize}_{x} \ \max_{y \in \mathbb{R}^d} \mathcal{L}(x, y) \quad \text{s.t.} \quad x \in \mathcal{X} := \mathcal{C}_1 \times \dots \times \mathcal{C}_K .$$
      ◮ Standard AL method:
        $x_{t+1} = \arg\min_{x \in \mathcal{X}} \mathcal{L}(x, y_t)$ (argmin step),
        $y_{t+1} = y_t + \eta_t M x_{t+1}$ (gradient ascent step).
      ◮ Replace the arg min steps by FW steps. FW-AL:
        $x_{t+1} = \mathrm{FW}(x_t; \mathcal{L}(\cdot, y_t))$ (Frank-Wolfe step),
        $y_{t+1} = y_t + \eta_t M x_{t+1}$ (gradient ascent step).
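
The two-line FW-AL update can be sketched in Python on a toy intersection problem. Everything specific here is an assumption made for illustration: the sets (an ℓ1 ball and a box), the splitting $x^{(1)} - x^{(2)} = 0$ so that $M = [I, -I]$, the objective $0.5\,\|x^{(1)} - b\|^2$, a single FW step per iteration with step size 2/(t+2), and the dual step `eta = lam * gamma`. These step sizes are not claimed to be the schedule analyzed by the authors, and the names `fw_al`, `lmo_l1_ball`, and `lmo_box` are hypothetical.

```python
def lmo_l1_ball(d, radius):
    """Minimizer of <d, x> over {x : ||x||_1 <= radius}."""
    i = max(range(len(d)), key=lambda j: abs(d[j]))
    s = [0.0] * len(d)
    if d[i] != 0.0:
        s[i] = -radius if d[i] > 0 else radius
    return s


def lmo_box(d, radius):
    """Minimizer of <d, x> over the box {x : |x_i| <= radius}."""
    return [-radius if di > 0 else radius for di in d]


def fw_al(b, r1, r2, lam=1.0, n_iters=500):
    """Sketch of FW-AL for min 0.5*||x1 - b||^2 s.t. x1 in C1, x2 in C2, x1 = x2."""
    n = len(b)
    x1, x2, y = [0.0] * n, [0.0] * n, [0.0] * n
    for t in range(n_iters):
        gamma = 2.0 / (t + 2.0)
        resid = [u - v for u, v in zip(x1, x2)]  # M x = x1 - x2
        # Block gradients of L(., y) = f + <y, M x> + (lam / 2) * ||M x||^2.
        g1 = [(u - bi) + yi + lam * ri for u, bi, yi, ri in zip(x1, b, y, resid)]
        g2 = [-yi - lam * ri for yi, ri in zip(y, resid)]
        # Frank-Wolfe step: one LMO call per set, never on the intersection.
        s1, s2 = lmo_l1_ball(g1, r1), lmo_box(g2, r2)
        x1 = [(1.0 - gamma) * u + gamma * s for u, s in zip(x1, s1)]
        x2 = [(1.0 - gamma) * u + gamma * s for u, s in zip(x2, s2)]
        # Gradient ascent step on the dual variable, using the new iterate.
        eta = lam * gamma  # assumed step size
        y = [yi + eta * (u - v) for yi, u, v in zip(y, x1, x2)]
    return x1, x2, y


x1, x2, y = fw_al([0.9, 0.6, -0.1], r1=1.0, r2=0.5)
```

Each block stays inside its own set by construction; agreement between `x1` and `x2` is only encouraged through the multiplier `y` and the penalty term.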
