A primal-dual smooth perceptron-von Neumann algorithm


1. A primal-dual smooth perceptron-von Neumann algorithm. Javier Peña, Carnegie Mellon University (joint work with Negar Soheili). Shubfest, Fields Institute, May 2012.

2. Polyhedral feasibility problems. Given A := [a_1 a_2 ··· a_n] ∈ R^{m×n}, consider the alternative feasibility problems A^T y > 0 (D) and Ax = 0, x ≥ 0, x ≠ 0 (P). Theme: condition-based analysis of elementary algorithms for solving (P) and (D).

3. Perceptron Algorithm. Algorithm to solve A^T y > 0 (D).

Perceptron Algorithm (Rosenblatt, 1958):
  y := 0
  while A^T y ≯ 0
    y := y + a_j/‖a_j‖, where a_j^T y ≤ 0
  end while

Throughout this talk: ‖·‖ = ‖·‖_2.
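
For concreteness, here is a minimal NumPy sketch of this loop; the function name, iteration cap, and column-access conventions are mine, not from the talk.

```python
import numpy as np

def perceptron(A, max_iter=10000):
    """Rosenblatt perceptron for A^T y > 0: add a violated (normalized) column to y."""
    m, n = A.shape
    y = np.zeros(m)
    for _ in range(max_iter):
        scores = A.T @ y
        if np.all(scores > 0):
            return y                          # (D) is solved
        j = np.argmin(scores)                 # a column with a_j^T y <= 0
        y = y + A[:, j] / np.linalg.norm(A[:, j])
    return None                               # no certificate within max_iter
```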

4. Von Neumann's Algorithm. Algorithm to solve Ax = 0, x ≥ 0, x ≠ 0 (P).

Von Neumann's Algorithm (von Neumann, 1948):
  x_0 := (1/n)·1; y_0 := A x_0
  for k = 0, 1, …
    if a_j^T y_k := min_i a_i^T y_k > 0 then halt: (P) is infeasible
    λ_k := argmin_{λ∈[0,1]} ‖λ y_k + (1−λ) a_j‖ = (1 − a_j^T y_k)/(‖y_k‖² − 2 a_j^T y_k + 1)
    x_{k+1} := λ_k x_k + (1−λ_k) e_j, where j = argmin_i a_i^T y_k; y_{k+1} := A x_{k+1}
  end for
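
A sketch of this loop in the same NumPy style, assuming unit-norm columns as in the rest of the talk; the tolerance, iteration cap, and return convention are my additions.

```python
import numpy as np

def von_neumann(A, eps=1e-6, max_iter=100000):
    """Von Neumann's algorithm for Ax = 0 over the simplex (columns assumed unit-norm)."""
    m, n = A.shape
    x = np.full(n, 1.0 / n)
    y = A @ x
    for _ in range(max_iter):
        scores = A.T @ y
        j = np.argmin(scores)
        if scores[j] > 0:
            return 'infeasible', y            # A^T y > 0 certifies (P) infeasible
        # exact line search between y_k and the most violated column a_j
        lam = (1.0 - scores[j]) / (y @ y - 2.0 * scores[j] + 1.0)
        e_j = np.zeros(n)
        e_j[j] = 1.0
        x = lam * x + (1.0 - lam) * e_j
        y = A @ x
        if np.linalg.norm(y) < eps:
            return 'eps-solution', x
    return 'eps-solution', x
```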

5. Elementary algorithms. The perceptron and von Neumann's algorithms are "elementary" algorithms: "elementary" means that each iteration involves only simple computations. Why should we care about elementary algorithms? Some large-scale optimization problems (e.g., in compressive sensing) are not solvable via conventional Newton-based algorithms. In some cases, the entire matrix A may not be explicitly available at once. Elementary algorithms have been effective in these cases.

6. Conditioning. Throughout the sequel assume A = [a_1 ··· a_n], where ‖a_j‖ = 1, j = 1, …, n.

Key parameter: ρ(A) := max_{‖y‖=1} min_{j=1,…,n} a_j^T y.

Goffin-Cheung-Cucker condition number: C(A) := 1/|ρ(A)|. (This is closely related to Renegar's condition number.)
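
As a rough illustration only (not part of the talk), ρ(A) can be lower-bounded numerically by sampling random unit vectors y and keeping the best value of min_j a_j^T y:

```python
import numpy as np

def rho_lower_bound(A, n_samples=100000, seed=0):
    """Crude Monte Carlo lower bound on rho(A) = max_{||y||=1} min_j a_j^T y."""
    rng = np.random.default_rng(seed)
    m = A.shape[0]
    Y = rng.normal(size=(n_samples, m))
    Y /= np.linalg.norm(Y, axis=1, keepdims=True)   # rows are random unit vectors
    return (Y @ A).min(axis=1).max()                # best min_j a_j^T y seen
```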

7. Conditioning. Notice: A^T y > 0 is feasible ⇔ ρ(A) > 0, and Ax = 0, x ≥ 0, x ≠ 0 is feasible ⇔ ρ(A) ≤ 0.

Ill-posedness: A is ill-posed when ρ(A) = 0. In this case both A^T y > 0 and Ax = 0, x ≥ 0, x ≠ 0 are on the verge of feasibility.

Theorem (Cheung & Cucker, 2001): |ρ(A)| = min { max_i ‖ã_i − a_i‖ : Ã is ill-posed }.

8. Some geometry. When ρ(A) > 0, it is a measure of the thickness of the feasible cone:

ρ(A) = max_{‖y‖=1} { r : B(y, r) ⊆ { z : A^T z ≥ 0 } }.

[Figure: feasible cones illustrating small ρ(A) versus large ρ(A).]

9. More geometry. Let Δ_n := { x ≥ 0 : ‖x‖_1 = 1 }.

Proposition (from Renegar 1995 and Cheung-Cucker 2001): |ρ(A)| = dist(0, ∂{ Ax : x ∈ Δ_n }).

[Figure: the set { Ax : x ∈ Δ_n } in the cases ρ(A) > 0 and ρ(A) < 0.]

10. Condition-based complexity. Recall our problems of interest: A^T y > 0 (D), and Ax = 0, x ∈ Δ_n (P).

Theorem (Block-Novikoff, 1962): If ρ(A) > 0, then the perceptron algorithm terminates after at most 1/ρ(A)² = C(A)² iterations.

11. Condition-based complexity. Theorem (Dantzig, 1992): If ρ(A) < 0, then von Neumann's algorithm finds an ε-solution to (P), i.e., x ∈ Δ_n with ‖Ax‖ < ε, in at most 1/ε² iterations.

Theorem (Epelman & Freund, 2000): If ρ(A) < 0, then von Neumann's algorithm finds an ε-solution to (P) in at most (1/ρ(A)²)·log(1/ε) iterations.

12. Main Theorem. Theorem (Soheili & Peña, 2012): There is a smooth version of the perceptron/von Neumann algorithm such that:
(a) If ρ(A) > 0, then it finds a solution to A^T y > 0 in at most O((√n/ρ(A))·log(1/ρ(A))) iterations.
(b) If ρ(A) < 0, then it finds an ε-solution to Ax = 0, x ∈ Δ_n in at most O((√n/|ρ(A)|)·log(1/ε)) iterations.
(c) Iterations are elementary (not much more complicated than those of the perceptron or von Neumann's algorithms).

13. Perceptron algorithm again.

Perceptron Algorithm:
  y_0 := 0
  for k = 0, 1, …
    a_j^T y_k := min_i a_i^T y_k
    y_{k+1} := y_k + a_j
  end for

Observe: a_j^T y = min_i a_i^T y ⇔ a_j = A x(y), where x(y) = argmin_{x∈Δ_n} ⟨A^T y, x⟩. Hence in the above algorithm y_k = A x_k, where x_k ≥ 0 and ‖x_k‖_1 = k.

14. Normalized Perceptron Algorithm. Recall x(y) := argmin_{x∈Δ_n} ⟨A^T y, x⟩.

Normalized Perceptron Algorithm:
  y_0 := 0
  for k = 0, 1, …
    θ_k := 1/(k+1)
    y_{k+1} := (1 − θ_k) y_k + θ_k A x(y_k)
  end for

In this algorithm y_k = A x_k for x_k ∈ Δ_n.

15. Perceptron-von Neumann Template. Both the perceptron and von Neumann's algorithms perform similar iterations.

PVN Template:
  x_0 ∈ Δ_n; y_0 := A x_0
  for k = 0, 1, …
    x_{k+1} := (1 − θ_k) x_k + θ_k x(y_k)
    y_{k+1} := (1 − θ_k) y_k + θ_k A x(y_k)
  end for

Observe: this recovers the (normalized) perceptron if θ_k = 1/(k+1), and von Neumann's algorithm if θ_k = argmin_{λ∈[0,1]} ‖(1−λ) y_k + λ A x(y_k)‖.
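
A sketch of this template with a flag selecting either step-size rule; the helper x_of_y and all names are mine, but the two branches correspond to the normalized perceptron step and to von Neumann's line search noted above.

```python
import numpy as np

def x_of_y(A, y):
    """x(y) = argmin over the simplex of <A^T y, x>: the vertex e_j minimizing a_j^T y."""
    e = np.zeros(A.shape[1])
    e[np.argmin(A.T @ y)] = 1.0
    return e

def pvn_template(A, step='perceptron', max_iter=1000):
    """PVN template; step='perceptron' uses theta_k = 1/(k+1), otherwise an exact line search."""
    n = A.shape[1]
    x = np.full(n, 1.0 / n)
    y = A @ x
    for k in range(max_iter):
        xv = x_of_y(A, y)
        d = A @ xv
        if step == 'perceptron':
            theta = 1.0 / (k + 1)
        else:
            # minimize ||(1 - l) y + l d|| over l in [0, 1]
            denom = (y - d) @ (y - d)
            theta = np.clip((y @ (y - d)) / denom, 0.0, 1.0) if denom > 0 else 0.0
        x = (1 - theta) * x + theta * xv
        y = (1 - theta) * y + theta * d
    return x, y
```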

16. Smooth Perceptron-von Neumann Algorithm. Apply Nesterov's smoothing technique (Nesterov, 2005). Key step: use a smooth version of x(y) = argmin_{x∈Δ_n} ⟨A^T y, x⟩, namely

x_µ(y) := argmin_{x∈Δ_n} { ⟨A^T y, x⟩ + (µ/2)‖x − x̄‖² },

for some µ > 0 and x̄ ∈ Δ_n.
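
Completing the square shows that x_µ(y) is the Euclidean projection of x̄ − (A^T y)/µ onto Δ_n, so it can be computed with a standard simplex-projection routine. A sketch (the helper names are mine, not from the talk):

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the unit simplex (standard sorting method)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1))[0][-1]
    tau = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - tau, 0.0)

def x_mu(A, y, mu, x_bar):
    """Smoothed argmin of <A^T y, x> + (mu/2)||x - x_bar||^2 over the simplex."""
    return project_simplex(x_bar - (A.T @ y) / mu)
```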

17. Smooth Perceptron-von Neumann Algorithm. Assume x̄ ∈ Δ_n and δ > 0 are given inputs.

Algorithm SPVN(x̄, δ):
  y_0 := A x̄; µ_0 := n; x_0 := x_{µ_0}(y_0)
  for k = 0, 1, …
    θ_k := 2/(k+3)
    y_{k+1} := (1 − θ_k)(y_k + θ_k A x_k) + θ_k² A x_{µ_k}(y_k)
    µ_{k+1} := (1 − θ_k) µ_k
    x_{k+1} := (1 − θ_k) x_k + θ_k x_{µ_{k+1}}(y_{k+1})
    if A^T y_{k+1} > 0 then halt: y_{k+1} is a solution to (D)
    if ‖A x_{k+1}‖ ≤ δ then halt: x_{k+1} is a δ-solution to (P)
  end for
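
Putting the pieces together, here is a sketch of the SPVN loop that reuses the x_mu helper above; the iteration cap and return convention are my additions.

```python
import numpy as np

def spvn(A, x_bar, delta, max_iter=100000):
    """Sketch of Algorithm SPVN(x_bar, delta), reusing x_mu() defined earlier."""
    n = A.shape[1]
    y = A @ x_bar
    mu = float(n)
    x = x_mu(A, y, mu, x_bar)
    for k in range(max_iter):
        theta = 2.0 / (k + 3)
        # y-update uses the current (old) mu and y
        y = (1 - theta) * (y + theta * (A @ x)) + theta**2 * (A @ x_mu(A, y, mu, x_bar))
        mu = (1 - theta) * mu
        x = (1 - theta) * x + theta * x_mu(A, y, mu, x_bar)
        if np.all(A.T @ y > 0):
            return 'D', y                   # solution to (D)
        if np.linalg.norm(A @ x) <= delta:
            return 'P', x                   # delta-solution to (P)
    return 'P', x
```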

18. PVN update versus SPVN update.

Update in the PVN template:
  y_{k+1} := (1 − θ_k) y_k + θ_k A x(y_k)
  x_{k+1} := (1 − θ_k) x_k + θ_k x(y_k)

Update in Algorithm SPVN:
  y_{k+1} := (1 − θ_k)(y_k + θ_k A x_k) + θ_k² A x_{µ_k}(y_k)
  µ_{k+1} := (1 − θ_k) µ_k
  x_{k+1} := (1 − θ_k) x_k + θ_k x_{µ_{k+1}}(y_{k+1})

19. Theorem (Soheili & Peña, 2011). Assume x̄ ∈ Δ_n and δ > 0 are given.
(a) If δ < ρ(A), then Algorithm SPVN finds a solution to (D) in at most 2√(2n)/ρ(A) − 1 iterations.
(b) If ρ(A) < 0, then Algorithm SPVN finds a δ-solution to (P) in at most 2√(2n)/δ − 1 iterations.

20. Iterated Smooth Perceptron-von Neumann Algorithm. Assume γ > 1 is a given constant.

Algorithm ISPVN(γ):
  x̃_0 := (1/n)·1
  for i = 0, 1, …
    δ_i := ‖A x̃_i‖/γ
    x̃_{i+1} := SPVN(x̃_i, δ_i)
  end for
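
A sketch of this restart loop, calling the spvn sketch above; the outer stopping tests and the default γ are my additions, since the slide's loop has no explicit termination.

```python
import numpy as np

def ispvn(A, gamma=2.0, eps=1e-6, max_outer=100):
    """Sketch of Algorithm ISPVN(gamma): restart SPVN with delta_i = ||A x_i|| / gamma."""
    n = A.shape[1]
    x = np.full(n, 1.0 / n)
    for _ in range(max_outer):
        delta = np.linalg.norm(A @ x) / gamma
        kind, z = spvn(A, x, delta)
        if kind == 'D':
            return 'D', z                   # y with A^T y > 0
        x = z
        if np.linalg.norm(A @ x) <= eps:
            return 'P', x                   # eps-solution to (P)
    return 'P', x
```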

21. Main Theorem Again. Theorem (Soheili & Peña, 2012):
(a) If ρ(A) > 0, then each call to SPVN in Algorithm ISPVN halts in at most 2√(2n)/ρ(A) − 1 iterations. Consequently, Algorithm ISPVN finds a solution to (D) in at most (2√(2n)/ρ(A) − 1)·(log(1/ρ(A))/log(γ)) SPVN iterations.
(b) If ρ(A) < 0, then each call to SPVN in Algorithm ISPVN halts in at most 2γ√(2n)/|ρ(A)| − 1 iterations. Hence for ε > 0, Algorithm ISPVN finds an ε-solution to (P) in at most (2γ√(2n)/|ρ(A)| − 1)·(log(1/ε)/log(γ)) SPVN iterations.

22. Observe.
A "pure" SPVN (δ = 0):
  When ρ(A) > 0, it solves (D) in O(√n/ρ(A)) iterations.
  When ρ(A) < 0, it finds an ε-solution to (P) in O(√n/ε) iterations.
ISPVN (iterated SPVN with gradual reduction of δ):
  When ρ(A) > 0, it solves (D) in O((√n/ρ(A))·log(1/ρ(A))) iterations.
  When ρ(A) < 0, it finds an ε-solution to (P) in O((√n/|ρ(A)|)·log(1/ε)) iterations.

23. Perceptron and von Neumann's algorithms as subgradient algorithms. Let

φ(y) := −‖y‖²/2 + min_{x∈Δ_n} ⟨A^T y, x⟩.

Observe: max_y φ(y) = min_{x∈Δ_n} (1/2)‖Ax‖², which equals ρ(A)²/2 if ρ(A) > 0 and 0 if ρ(A) ≤ 0.

PVN Template: y_{k+1} = y_k + θ_k(−y_k + A x(y_k)) is a subgradient algorithm for max_y φ(y).

For µ > 0 and x̄ ∈ Δ_n let

φ_µ(y) := −‖y‖²/2 + min_{x∈Δ_n} { ⟨A^T y, x⟩ + (µ/2)‖x − x̄‖² } = −‖y‖²/2 + ⟨A^T y, x_µ(y)⟩ + (µ/2)‖x_µ(y) − x̄‖².
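
Both functions are cheap to evaluate with the helpers above, since a linear function attains its minimum over Δ_n at a vertex; a sketch:

```python
import numpy as np

def phi(A, y):
    """phi(y) = -||y||^2/2 + min over the simplex of <A^T y, x> = -||y||^2/2 + min_j a_j^T y."""
    return -0.5 * (y @ y) + np.min(A.T @ y)

def phi_mu(A, y, mu, x_bar):
    """Smoothed counterpart, evaluated at the smoothed minimizer x_mu(y) defined earlier."""
    xm = x_mu(A, y, mu, x_bar)
    return -0.5 * (y @ y) + (A.T @ y) @ xm + 0.5 * mu * np.sum((xm - x_bar) ** 2)
```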

24. Proof of Main Theorem. Apply Nesterov's excessive gap technique (Nesterov, 2005).

Claim: For all x ∈ Δ_n and y ∈ R^m we have φ(y) ≤ (1/2)‖Ax‖².

Claim: For all y ∈ R^m we have φ(y) ≤ φ_µ(y) ≤ φ(y) + 2µ.

Lemma: The iterates x_k ∈ Δ_n, y_k ∈ R^m, k = 0, 1, …, generated by the SPVN Algorithm satisfy the Excessive Gap Condition (1/2)‖A x_k‖² ≤ φ_{µ_k}(y_k).
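
A quick numerical spot-check of the two claims on a random instance, reusing project_simplex, x_mu, phi, and phi_mu from the sketches above (illustrative only; the data is mine, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 6))
A /= np.linalg.norm(A, axis=0)                 # unit-norm columns
x_bar = np.full(6, 1.0 / 6)
for _ in range(1000):
    y = rng.normal(size=3)
    x = project_simplex(rng.normal(size=6))    # an arbitrary point of the simplex
    mu = rng.uniform(0.01, 5.0)
    assert phi(A, y) <= 0.5 * np.dot(A @ x, A @ x) + 1e-9
    assert phi(A, y) - 1e-9 <= phi_mu(A, y, mu, x_bar) <= phi(A, y) + 2 * mu + 1e-9
```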

25. Proof of Main Theorem, part (a): ρ(A) > 0. Putting together the two claims and the lemma we get

(1/2)ρ(A)² ≤ (1/2)‖A x_k‖² ≤ φ_{µ_k}(y_k) ≤ φ(y_k) + 2µ_k.

So φ(y_k) ≥ (1/2)ρ(A)² − 2µ_k. In the algorithm

µ_k = n·(1/3)·(2/4)···(k/(k+2)) = 2n/((k+1)(k+2)) < 2n/(k+1)².

Thus φ(y_k) > 0, and consequently A^T y_k > 0, as soon as k ≥ 2√(2n)/ρ(A) − 1.
