SLIDE 1

Projection onto Minkowski Sums with Application to Constrained Learning

Joong-Ho (Johann) Won¹, Jason Xu², Kenneth Lange³

¹Department of Statistics, Seoul National University; ²Department of Statistical Science, Duke University; ³Departments of Biomathematics, Human Genetics, and Statistics, UCLA

June 11, 2019 International Conference on Machine Learning

SLIDE 2

Outline

  • Minkowski sum and projection
  • Why are Minkowski sums useful for constrained learning?
  • Constrained learning via projection onto Minkowski sums
  • Minkowski projection algorithm
  • Applications to constrained learning
  • Conclusion

SLIDE 3

Minkowski sum of sets

A + B := {a + b : a ∈ A, b ∈ B},   A, B ⊂ ℝ^d

Image source: Christophe Weibel https://sites.google.com/site/christopheweibel/research/minkowski-sums
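For a concrete one-dimensional example (an illustration added here, not from the slides):

[−1, 1] + [2, 4] = {a + b : a ∈ [−1, 1], b ∈ [2, 4]} = [1, 5],

i.e., every point of the sum is reachable by picking one point from each summand.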

SLIDE 4

Projection onto Minkowski sums

P_{A+B}(x) = argmin_{u ∈ A+B} (1/2)‖u − x‖₂²,   x ∉ A + B   (P)

Image source: Christophe Weibel https://sites.google.com/site/christopheweibel/research/minkowski-sums

SLIDE 5

Why are Minkowski sums useful for constrained learning?

Many penalized or constrained learning problems are of the form

min_{x ∈ ℝ^d} f(x) + Σ_{i=1}^{k} σ_{C_i}(x)

  • σ_C(x) = sup_{y ∈ C} ⟨x, y⟩ is the support function of a convex set C.
  • Example: elastic net min_x f(x) + λ₁‖x‖₁ + λ₂‖x‖₂, with

C₁ = {x : ‖x‖_∞ ≤ λ₁},  C₂ = {x : ‖x‖₂ ≤ λ₂}  (dual norm balls; the identity is spelled out below)
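The dual-norm identity behind the elastic net example is a standard fact and can be spelled out as

σ_{C₁}(x) = sup_{‖y‖_∞ ≤ λ₁} ⟨x, y⟩ = Σ_{j=1}^{d} λ₁|x_j| = λ₁‖x‖₁,

and similarly σ_{C₂}(x) = sup_{‖y‖₂ ≤ λ₂} ⟨x, y⟩ = λ₂‖x‖₂ (Cauchy-Schwarz), so each penalty term is exactly the support function of the corresponding dual norm ball.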

SLIDE 6

Why are Minkowski sums useful for constrained learning?

Many penalized or constrained learning problems are of the form

min_{x ∈ ℝ^d} f(x) + Σ_{i=1}^{k} σ_{C_i}(x) = min_{x ∈ ℝ^d} f(x) + σ_{C₁+···+C_k}(x)   (1)

  • Support functions are additive over Minkowski sums (Hiriart-Urruty and Lemaréchal 2012); see the derivation below.
  • New perspective on the LHS: minimizing a sum of two (convex) functions instead of k + 1 functions.
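The additivity of support functions is a one-line computation:

σ_{A+B}(x) = sup_{a ∈ A, b ∈ B} ⟨a + b, x⟩ = sup_{a ∈ A} ⟨a, x⟩ + sup_{b ∈ B} ⟨b, x⟩ = σ_A(x) + σ_B(x),

since the two suprema can be taken independently; the same argument covers any finite number of summands.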

SLIDE 7

Multiple/overlapping norm penalties

ℓ_{1,p} group lasso/multitask learning (Yuan and Lin 2006) with overlaps allowed:

min_{x ∈ ℝ^d} f(x) + λ Σ_{i=1}^{k} ‖x_{i₁}‖_p,   p ≥ 1,

where x_{i₁} is the subvector of x indexed by the group i₁ ⊂ {1, . . . , d}.

  • Involved sets: ℓ_q-norm disks,

C_i = {y = (y_{i₁}, y_{i₂}) : ‖y_{i₁}‖_q ≤ λ, y_{i₂} = 0},   1/p + 1/q = 1,   i₂ = {1, . . . , d} \ i₁.   (2)

  • No distinction between overlapping vs. non-overlapping groups!

SLIDE 8

Conic constraints

min_{x ∈ ℝ^d} f(x) subject to x ∈ K₁* ∩ K₂* ∩ · · · ∩ K_k*

where K_i* = {y : ⟨x, y⟩ ≤ 0, ∀x ∈ K_i} is the polar cone of the closed convex cone K_i.

  • Use the fact ι_{K_i*}(x) = σ_{K_i}(x) (derived below) to express it as

min_{x ∈ ℝ^d} f(x) + Σ_{i=1}^{k} ι_{K_i*}(x) = min_{x ∈ ℝ^d} f(x) + Σ_{i=1}^{k} σ_{K_i}(x).

  • ι_S = 0/∞ indicator of the set S
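The identity used above holds because K_i is a cone:

σ_{K_i}(x) = sup_{y ∈ K_i} ⟨x, y⟩ = 0 if ⟨x, y⟩ ≤ 0 for all y ∈ K_i (the supremum is attained at y = 0), and +∞ otherwise (scale up any y ∈ K_i with ⟨x, y⟩ > 0),

which is exactly ι_{K_i*}(x), the 0/∞ indicator of the polar cone.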

SLIDE 9

Constrained lasso: mix-and-match

min_{x ∈ ℝ^d} f(x) + λ‖x‖₁ subject to Bx = 0, Cx ≤ 0,

which subsumes the generalized lasso (Tibshirani and Taylor 2011) as a special case (James, Paulson, and Rusmevichientong 2013; Gaines, Kim, and Zhou 2018).

  • Involved sets: cone, subspace, and ℓ_∞-norm ball,

C₁ = {x : Bx = 0}* = {x : Bx = 0}^⊥,   C₂ = {x : Cx ≤ 0}*,   C₃ = {x : ‖x‖_∞ ≤ λ}   (3)

SLIDE 10

Constrained learning via projection onto Minkowski sums

Contemporary methods for solving problem (1) (e.g., proximal gradient) require computing the proximity operator of σ_{C₁+···+C_k}:

prox_{γσ_{C₁+···+C_k}}(x) = argmin_{u ∈ ℝ^d} σ_{C₁+···+C_k}(u) + (1/(2γ))‖u − x‖₂²

  • Proximal gradient:

x^(t+1) = prox_{γ_t σ_{C₁+···+C_k}}(x^(t) − γ_t⁻¹ ∇f(x^(t)))

  • This prox can be computed via Minkowski projection.

SLIDE 11
  • Duality:

σ*_{C₁+···+C_k}(y) = ι_{C₁+···+C_k}(y)

(ι_S(u) = 0 if u ∈ S, ∞ otherwise) if C₁ + · · · + C_k is closed convex; g*(y) = sup_x ⟨x, y⟩ − g(x) is the Fenchel conjugate of g.

  • Moreau's decomposition:

x = prox_{γg}(x) + γ prox_{γ⁻¹g*}(γ⁻¹x)

In terms of Minkowski projection,

prox_{γσ_{C₁+···+C_k}}(x) = x − γ prox_{γ⁻¹ι_{C₁+···+C_k}}(γ⁻¹x) = x − γ P_{C₁+···+C_k}(γ⁻¹x)
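A small numerical sanity check of this identity, assuming a single summand C = {y : ‖y‖_∞ ≤ λ}, so that P_C is coordinate-wise clipping and σ_C(x) = λ‖x‖₁ (whose prox is soft-thresholding); the variable names below are illustrative, not from the paper.

```python
import numpy as np

gamma, lam = 0.7, 1.3
x = np.array([2.0, -0.5, 0.1, -3.0])

# P_C for C = {y : ||y||_inf <= lam} is coordinate-wise clipping
proj_C = lambda v: np.clip(v, -lam, lam)

# prox of gamma * sigma_C = gamma * lam * ||.||_1 via the Minkowski-projection identity
prox_via_projection = x - gamma * proj_C(x / gamma)

# reference answer: soft-thresholding, the known prox of gamma * lam * ||.||_1
soft_threshold = np.sign(x) * np.maximum(np.abs(x) - gamma * lam, 0.0)

assert np.allclose(prox_via_projection, soft_threshold)
```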

SLIDE 12

Minkowski projection algorithm

Goal: to develop an efficient method for computing P_{C₁+···+C_k}(x), in case projection P_{C_i}(x) onto each set is simple.

MM algorithm:

1: Input: external point x ∉ C₁ + · · · + C_k; projection operator P_{C_i} onto set C_i, i = 1, . . . , k; initial values a^(i)_0, i = 1, . . . , k; viscosity parameter ρ ≥ 0
2: Initialization: n ← 0
3: Repeat
4:   For i = 1, 2, . . . , k
5:     a^(i)_{n+1} ← P_{C_i}( (1/(1+ρ)) (x − Σ_{j<i} a^(j)_{n+1} − Σ_{j>i} a^(j)_n) + (ρ/(1+ρ)) a^(i)_n )
6:   End For
7:   n ← n + 1
8: Until convergence
9: Return Σ_{i=1}^{k} a^(i)_n
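A minimal Python sketch of this cyclic MM scheme (an illustration, not the authors' code); it assumes each entry of `proj` implements P_{C_i}, and the initialization, convergence test, and names `minkowski_project`, `proj`, `rho` are choices made here for concreteness.

```python
import numpy as np

def minkowski_project(x, proj, rho=0.0, max_iter=1000, tol=1e-10):
    """Cyclic MM sketch for P_{C_1 + ... + C_k}(x).

    x    : point to project (1-D array)
    proj : list of callables, proj[i](v) = projection of v onto C_i
    rho  : viscosity parameter (rho = 0 in the convex case)
    """
    k = len(proj)
    a = [proj[i](x / k) for i in range(k)]          # crude initial split of x
    for _ in range(max_iter):
        a_prev = [ai.copy() for ai in a]
        for i in range(k):
            # x minus the other summands: a[j] for j < i is already the (n+1)-iterate
            r = x - sum(a[j] for j in range(k) if j != i)
            a[i] = proj[i](r / (1.0 + rho) + (rho / (1.0 + rho)) * a[i])
        if max(np.linalg.norm(a[i] - a_prev[i]) for i in range(k)) < tol:
            break
    return sum(a)                                    # approximates P_{C_1+...+C_k}(x)

# toy usage: project onto the sum of the unit l2 ball and the unit l_inf box
ball = lambda v: v if np.linalg.norm(v) <= 1 else v / np.linalg.norm(v)
box  = lambda v: np.clip(v, -1.0, 1.0)
p = minkowski_project(np.array([5.0, -3.0]), [ball, box])
```

In the toy call, the ℓ₂-ball summand is strongly convex, so by Theorem 1 on the next slide a linear rate is expected with ρ = 0.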

SLIDE 13

Properties of the Algorithm

  • Assume k = 2 for exposition purposes: A = C₁, B = C₂.

Proposition 1. If both A and B are closed and convex, and A + B is closed, then the Algorithm with ρ = 0 generates a sequence converging to P_{A+B}(x).
≫ Proof: paracontraction (Elsner, Koltracht, and Neumann 1992; Lange 2013).

Theorem 1. If in addition either A or B is strongly convex, then the sequence generated by the Algorithm with ρ = 0 converges linearly to P_{A+B}(x).
≫ A set C ⊂ ℝ^d is α-strongly convex with respect to a norm ‖·‖ if there is a constant α > 0 such that for any a and b in C and any γ ∈ [0, 1], C contains a ball of radius r = γ(1 − γ)(α/2)‖a − b‖² centered at γa + (1 − γ)b (Garber and Hazan 2015).
≫ Ex) ℓ_q-norm ball for q ∈ (1, 2]

SLIDE 14

Theorem 2. If A and B are closed and subanalytic (possibly non-convex), and at least one of them is bounded, then the sequence generated by the Algorithm with ρ > 0 converges to a critical point of (P) regardless of the initial values.
≫ Proof: Kurdyka-Łojasiewicz inequality (Bolte, Daniilidis, and Lewis 2007).

Theorem 3. If A + B is polyhedral, then the Algorithm with ρ > 0 generates a sequence converging linearly to P_{A+B}(x).
≫ Proof: Luo-Tseng error bound (Karimi, Nutini, and Schmidt 2018).
≫ Ex) ℓ_{1,∞} overlapping group penalty/multitask learning; polyhedra are not strongly convex

SLIDE 15

Applications to constrained learning

SLIDE 16

Overlapping group penalties/multitask learning

min_{x ∈ ℝ^d} f(x) + λ Σ_{i=1}^{k} ‖x_{i₁}‖_p,   C_i = {y = (y_{i₁}, y_{i₂}) : ‖y_{i₁}‖_q ≤ λ, y_{i₂} = 0}

  • Overlaps automatically handled with Minkowski projection (see the sketch below).
  • If p ∈ [2, ∞), the dual ℓ_q-norm disks are strongly convex; if p = ∞, polyhedral (linear convergence).
  • Fast and reliable algorithms for projection onto ℓ_q-norm disks are available (Liu and Ye 2010).
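As a sketch of how the summand projections look, assume p = 2 (hence q = 2, so the disk projection is just rescaling); the group indices and names below are illustrative, and the call reuses the hypothetical `minkowski_project` sketch from the algorithm slide.

```python
import numpy as np

def group_disk_proj(group, lam):
    """P_{C_i} for C_i = {y : ||y_group||_2 <= lam, y = 0 off the group} (p = q = 2)."""
    def proj(v):
        y = np.zeros_like(v)
        g = v[group]
        nrm = np.linalg.norm(g)
        y[group] = g if nrm <= lam else (lam / nrm) * g
        return y
    return proj

# two overlapping groups on 5 coordinates; overlaps need no special handling
groups = [np.array([0, 1, 2]), np.array([2, 3, 4])]
proj = [group_disk_proj(g, lam=0.5) for g in groups]
x = np.array([1.0, -2.0, 0.3, 0.7, -1.5])
p = minkowski_project(x, proj)     # P_{C_1 + C_2}(x), via the earlier sketch
```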

SLIDE 17
  • Comparison to the dual projected gradient method used in SLEP (Yuan, Liu, and Ye 2011; Liu, Ji, and Ye 2011; Zhou, Zhang, and So 2015):

[Figure: overlapping group lasso, # groups = 20. Left panel: runtime (sec) vs. dimension (1e+03 to 1e+06) for SLEP and Minkowski. Right panel: difference in objective values (SLEP − Minkowski) vs. dimension, for no. of groups 10, 20, 50, 100.]

SLIDE 18

Constrained lasso

min_{x ∈ ℝ^d} f(x) + λ‖x‖₁ subject to Bx = 0, Cx ≤ 0

  • Zero-sum constrained lasso (Lin et al. 2014; Altenbuchinger et al. 2017): C₁ = {x : Σ_{j=1}^{d} x_j = 0}^⊥, C₂ = {0}, C₃ = {x : ‖x‖_∞ ≤ λ} (B = 1ᵀ, C = 0); the projections are sketched below.
  • Nonnegative lasso (Efron et al. 2004; El-Arini et al. 2013): C₁ = {0}, C₂ = {x : −x ≤ 0}*, C₃ = {x : ‖x‖_∞ ≤ λ} (B = 0, C = −I).
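For the zero-sum constrained lasso the summand projections are all closed-form, so the prox can be sketched as below (again illustrative, reusing the hypothetical `minkowski_project` from the algorithm slide; C₂ = {0} projects everything to 0 and can be dropped from the sum).

```python
import numpy as np

lam = 0.5
proj_C1 = lambda v: np.full_like(v, v.mean())   # projection onto span(1) = {x : 1^T x = 0}^perp
proj_C3 = lambda v: np.clip(v, -lam, lam)       # projection onto the l_inf ball of radius lam

def prox_zero_sum_lasso(x, gamma):
    """prox of gamma * sigma_{C1+C3} via prox_{gamma*sigma}(x) = x - gamma * P_{C1+C3}(x/gamma)."""
    return x - gamma * minkowski_project(x / gamma, [proj_C1, proj_C3])
```

Plugging this prox into the proximal gradient update from the earlier slide yields a solver for the zero-sum constrained lasso.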

SLIDE 19
  • Comparison to generic methods by Gaines, Kim, and Zhou (2018), including the path algorithm, ADMM, and the commercial solver Gurobi:

[Figure: algorithm runtime (sec) vs. problem size (n, d), from (100, 500) up to (8000, 16000). Left panel: zero-sum constrained lasso. Right panel: nonnegative lasso. Methods: path algorithm, Gurobi, ADMM, and Minkowski, each at λ = 0.2 λ_max and λ = 0.6 λ_max.]

SLIDE 20

Conclusion

  • Reconsider constrained learning problems:
    ≫ Structural complexities such as non-separability can be handled gracefully via formulations involving Minkowski sums.
  • Very simple and efficient algorithm for projecting points onto Minkowski sums of sets:
    ≫ Linear rate of convergence whenever at least one summand is strongly convex or the Luo-Tseng error bound condition is satisfied.
  • Our algorithm can serve as an inner loop in, e.g., proximal gradient:
    ≫ Competitive performance.
    ≫ Fast (inner-loop) convergence is crucial.

SLIDE 21

References

Altenbuchinger, Michael, Thorsten Rehberg, H. U. Zacharias, Frank Stämmler, Katja Dettmer, Daniela Weber, Andreas Hiergeist, Andre Gessner, Ernst Holler, Peter J. Oefner, et al. 2017. “Reference point insensitive molecular data analysis.” Bioinformatics 33 (2): 219–226.

El-Arini, Khalid, Min Xu, Emily B. Fox, and Carlos Guestrin. 2013. “Representing documents through their readers.” In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 14–22. ACM.

Bolte, Jérôme, Aris Daniilidis, and Adrian Lewis. 2007. “The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems.” SIAM Journal on Optimization 17 (4): 1205–1223.

SLIDE 22

Boyd, Stephen, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein. 2010. “Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers.” Foundations and Trends in Machine Learning 3 (1): 1–122.

Davis, Damek, and Wotao Yin. 2017. “A three-operator splitting scheme and its optimization applications.” Set-Valued and Variational Analysis 25 (4): 829–858.

Efron, Bradley, Trevor Hastie, Iain Johnstone, Robert Tibshirani, et al. 2004. “Least angle regression.” Annals of Statistics 32 (2): 407–499.

Elsner, Ludwig, Israel Koltracht, and Michael Neumann. 1992. “Convergence of sequential and asynchronous nonlinear paracontractions.” Numerische Mathematik 62 (1): 305–319.

Gaines, Brian R., Juhyun Kim, and Hua Zhou. 2018. “Algorithms for Fitting the Constrained Lasso.” Journal of Computational and Graphical Statistics 27 (4): 861–871.

SLIDE 23

Garber, Dan, and Elad Hazan. 2015. “Faster rates for the Frank-Wolfe method over strongly-convex sets.” In Proceedings of the 32nd International Conference on Machine Learning, 37:541–549.

Hiriart-Urruty, Jean-Baptiste, and Claude Lemaréchal. 2012. Fundamentals of Convex Analysis. Springer Science & Business Media.

James, Gareth M., Courtney Paulson, and Paat Rusmevichientong. 2013. “Penalized and constrained regression.” Unpublished manuscript, available at http://www-bcf.usc.edu/~gareth/research/Research.html.

Karimi, Hamed, Julie Nutini, and Mark Schmidt. 2018. “Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition.” arXiv preprint arXiv:1608.04636v3.

Lange, Kenneth. 2013. Optimization. 2nd ed. Springer.

SLIDE 24

Lin, Wei, Pixu Shi, Rui Feng, and Hongzhe Li. 2014. “Variable selection in regression with compositional covariates.” Biometrika 101 (4): 785–797.

Liu, Jun, Shuiwang Ji, and Jieping Ye. 2011. SLEP: Sparse Learning with Efficient Projections. Technical report. Arizona State University. https://github.com/jiayuzhou/SLEP.

Liu, Jun, and Jieping Ye. 2010. “Efficient ℓ1/ℓq norm regularization.” arXiv preprint arXiv:1009.4766.

Tibshirani, Ryan J., and Jonathan Taylor. 2011. “The solution path of the generalized lasso.” Annals of Statistics 39 (3): 1335–1371.

Yuan, Lei, Jun Liu, and Jieping Ye. 2011. “Efficient methods for overlapping group lasso.” In Advances in Neural Information Processing Systems, 352–360.

SLIDE 25

Yuan, Ming, and Yi Lin. 2006. “Model selection and estimation in regression with grouped variables.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 68 (1): 49–67.

Zhou, Zirui, Qi Zhang, and Anthony Man-Cho So. 2015. “ℓ1,p-Norm Regularization: Error Bounds and Convergence Rate Analysis of First-Order Methods.” In Proceedings of the 32nd International Conference on Machine Learning, 37:1501–1510.

SLIDE 26

Comparison to other algorithms

  • Splitting methods: ADMM (Boyd et al. 2010), Davis-Yin three-operator splitting (Davis and Yin 2017).
  • These do not produce descent algorithms, and they introduce additional variables as well as intermediate steps.
  • We do not know whether these methods can achieve a linear convergence rate under, e.g., strong convexity of a summand set.
  • Sublinear rates for non-strongly convex sets can be achieved with our algorithm with ρ > 0 as well.
