A primal-dual smooth perceptron-von Neumann algorithm
Javier Peña, Carnegie Mellon University (joint work with Negar Soheili)
Shubfest, Fields Institute, May 2012
1 / 34
Polyhedral feasibility problems

Given A := [a_1 a_2 ⋯ a_n] ∈ R^{m×n}, consider the alternative feasibility problems

A^T y > 0,  (D)

and

Ax = 0, x ≥ 0, x ≠ 0.  (P)
Theme
Condition-based analysis of elementary algorithms for solving (P) and (D).
2 / 34
Algorithm to solve A^T y > 0. (D)
Perceptron Algorithm (Rosenblatt, 1958)
y := 0
while A^T y ≯ 0
    y := y + a_j/‖a_j‖, where a_j^T y ≤ 0
end while

Throughout this talk: ‖·‖ = ‖·‖_2.
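A minimal NumPy sketch of this loop (ours, for illustration; the function name and the max_iter guard, which makes the loop terminate on infeasible input, are our additions):

    import numpy as np

    def perceptron(A, max_iter=100000):
        """Seek y with A^T y > 0; the columns of A are the a_j."""
        m, n = A.shape
        norms = np.linalg.norm(A, axis=0)
        y = np.zeros(m)
        for _ in range(max_iter):
            s = A.T @ y
            j = np.argmin(s)                # least satisfied inequality
            if s[j] > 0:                    # A^T y > 0 holds: done
                return y
            y = y + A[:, j] / norms[j]      # perceptron update
        return None                         # gave up within the guard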
3 / 34
Algorithm to solve Ax = 0, x ≥ 0, x ≠ 0. (P)
Von Neumann’s Algorithm (von Neumann, 1948)
x_0 := (1/n)𝟙;  y_0 := Ax_0
for k = 0, 1, . . .
    j := argmin_i a_i^T y_k
    if a_j^T y_k > 0 then halt: (P) is infeasible
    λ_k := argmin_{λ∈[0,1]} ‖λ y_k + (1 − λ)a_j‖ = (1 − a_j^T y_k)/(‖y_k‖² − 2 a_j^T y_k + 1)
    x_{k+1} := λ_k x_k + (1 − λ_k)e_j
    y_{k+1} := Ax_{k+1}
end for
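A companion NumPy sketch (ours; assumes unit-norm columns so the closed-form λ_k above applies, and adds a tolerance eps and an iteration guard so the loop terminates):

    import numpy as np

    def von_neumann(A, eps=1e-8, max_iter=100000):
        """Seek x in ∆_n with Ax ≈ 0; assumes ‖a_j‖ = 1 for all j."""
        m, n = A.shape
        x = np.full(n, 1.0 / n)
        y = A @ x
        for _ in range(max_iter):
            j = np.argmin(A.T @ y)
            ajy = A[:, j] @ y
            if ajy > 0:
                return None                 # certificate: (P) is infeasible
            # exact line search: minimize ‖λ y + (1 − λ)a_j‖ over λ in [0, 1]
            lam = (1 - ajy) / (y @ y - 2 * ajy + 1)
            e = np.zeros(n); e[j] = 1.0
            x = lam * x + (1 - lam) * e
            y = A @ x
            if np.linalg.norm(y) <= eps:    # ε-solution to (P)
                return x
        return x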
4 / 34
The perceptron and von Neumann’s algorithms are “elementary” algorithms. “Elementary” means that each iteration involves only simple computations.
Why should we care about elementary algorithms?
Some large-scale optimization problems (e.g., in compressive sensing) are not solvable via conventional Newton-based algorithms. In some cases, the entire matrix A may not be explicitly available at once. Elementary algorithms have been effective in these cases.
5 / 34
Throughout the sequel assume A = [a_1/‖a_1‖ ⋯ a_n/‖a_n‖], i.e., the columns of A are normalized.
Key parameter
ρ(A) := max_{‖y‖=1} min_{j=1,...,n} a_j^T y.
Goffin-Cheung-Cucker condition number
C(A) := 1/|ρ(A)|. (This is closely related to Renegar’s condition number.)
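For a quick concrete example (ours, not from the slides): take A = I_2, whose columns e_1, e_2 are already unit vectors. Then ρ(A) = max_{‖y‖=1} min{y_1, y_2} = 1/√2, attained at y = (1/√2, 1/√2), so C(A) = √2 and A^T y > 0 is comfortably feasible.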
6 / 34
Notice
A^T y > 0 feasible ⇔ ρ(A) > 0.
Ax = 0, x ≥ 0, x ≠ 0 feasible ⇔ ρ(A) ≤ 0.
Ill-posedness
A is ill-posed when ρ(A) = 0. In this case both A^T y > 0 and Ax = 0, x ≥ 0, x ≠ 0 are on the verge of feasibility.
Theorem (Cheung & Cucker, 2001)
|ρ(A)| = min_{Ã} { max_i ‖ã_i − a_i‖ : Ã is ill-posed }.
7 / 34
When ρ(A) > 0, it is a measure of the thickness of the cone of feasible solutions to (D):

ρ(A) = max_{‖y‖=1} min_{j=1,...,n} a_j^T y.

[Figure: two feasible cones, one wide (large ρ(A)) and one narrow (small ρ(A)).]
8 / 34
Let ∆_n := {x ≥ 0 : ‖x‖_1 = 1}.
Proposition (From Renegar 1995 and Cheung-Cucker 2001)
|ρ(A)| = dist(0, ∂{Ax : x ∈ ∆_n}).

[Figure: the set {Ax : x ∈ ∆_n} relative to the origin, in the two cases ρ(A) > 0 and ρ(A) < 0.]
9 / 34
Recall our problems of interest:

A^T y > 0,  (D)

and

Ax = 0, x ∈ ∆_n.  (P)
Theorem (Block-Novikoff 1962)
If ρ(A) > 0, then the perceptron algorithm terminates after at most 1/ρ(A)² = C(A)² iterations.
10 / 34
Theorem (Dantzig, 1992)
If ρ(A) < 0, then von Neumann’s algorithm finds an ε-solution to (P), i.e., x ∈ ∆_n with ‖Ax‖ < ε, in at most 1/ε² iterations.
Theorem (Epelman & Freund, 2000)
If ρ(A) < 0, then von Neumann’s algorithm finds an ε-solution to (P) in at most (1/ρ(A)²) · log(1/ε) iterations.
11 / 34
Theorem (Soheili & P, 2012)
There is a smooth version of the perceptron/von Neumann algorithm such that:
(a) If ρ(A) > 0, then it finds a solution to A^T y > 0 in at most O(√n/ρ(A) · log(1/ρ(A))) iterations.
(b) If ρ(A) < 0, then it finds an ε-solution to Ax = 0, x ∈ ∆_n in at most O(√n/|ρ(A)| · log(1/ε)) iterations.
(c) Iterations are elementary (not much more complicated than those of the perceptron or von Neumann’s algorithms).
12 / 34
Perceptron Algorithm
y_0 := 0
for k = 0, 1, . . .
    a_j^T y_k := min_i a_i^T y_k
    y_{k+1} := y_k + a_j
end for
Observe
a_j^T y = min_i a_i^T y  ⇔  a_j = Ax(y), where x(y) = argmin_{x ∈ ∆_n} ⟨A^T y, x⟩.

Hence in the above algorithm y_k = Ax_k, where x_k ≥ 0 and ‖x_k‖_1 = k.
13 / 34
Recall x(y) := argmin_{x ∈ ∆_n} ⟨A^T y, x⟩.
Normalized Perceptron Algorithm
y_0 := 0
for k = 0, 1, . . .
    θ_k := 1/(k+1)
    y_{k+1} := (1 − θ_k)y_k + θ_k Ax(y_k)
end for

In this algorithm y_k = Ax_k for x_k ∈ ∆_n.
14 / 34
Both the perceptron and von Neumann’s algorithms perform similar iterations.
PVN Template
x_0 ∈ ∆_n;  y_0 := Ax_0
for k = 0, 1, . . .
    x_{k+1} := (1 − θ_k)x_k + θ_k x(y_k)
    y_{k+1} := (1 − θ_k)y_k + θ_k Ax(y_k)
end for
Observe
Recover the (normalized) perceptron if θ_k = 1/(k+1).
Recover von Neumann’s algorithm if θ_k = argmin_{λ∈[0,1]} ‖(1 − λ)y_k + λ Ax(y_k)‖.
(Both are instances of the template; see the sketch below.)
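A compact sketch of the template with both step-size rules (ours; assumes numpy imported as np and unit-norm columns, as in the earlier sketches):

    def pvn(A, rule="perceptron", max_iter=100000):
        """PVN template with x(y) = e_j, j = argmin_i a_i^T y."""
        m, n = A.shape
        x = np.full(n, 1.0 / n)
        y = A @ x
        for k in range(max_iter):
            s = A.T @ y
            if (s > 0).all():
                return y                    # y solves (D)
            j = np.argmin(s)
            a = A[:, j]                     # a_j = A x(y_k)
            if rule == "perceptron":
                theta = 1.0 / (k + 1)
            else:                           # von Neumann exact line search
                d = a - y
                dd = d @ d
                theta = 1.0 if dd == 0 else min(1.0, max(0.0, -(y @ d) / dd))
            e = np.zeros(n); e[j] = 1.0
            x = (1 - theta) * x + theta * e
            y = (1 - theta) * y + theta * a
        return x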
15 / 34
Apply Nesterov’s smoothing technique (Nesterov, 2005).

Key step: Use a smooth version of x(y) = argmin_{x ∈ ∆_n} ⟨A^T y, x⟩, namely

x_µ(y) := argmin_{x ∈ ∆_n} { ⟨A^T y, x⟩ + (µ/2)‖x − x̄‖² },

for some µ > 0 and x̄ ∈ ∆_n.
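Completing the square shows x_µ(y) = proj_{∆_n}(x̄ − A^T y/µ), a Euclidean projection onto the simplex. A sketch (ours; the projection routine is the standard sort-based algorithm, not something from the slides):

    import numpy as np

    def proj_simplex(v):
        """Euclidean projection of v onto ∆_n = {x ≥ 0 : sum(x) = 1}."""
        u = np.sort(v)[::-1]
        cs = np.cumsum(u)
        idx = np.arange(1, len(v) + 1)
        rho = np.nonzero(u * idx > cs - 1)[0][-1]
        tau = (cs[rho] - 1) / (rho + 1)
        return np.maximum(v - tau, 0.0)

    def x_mu(A, y, mu, xbar):
        """argmin over ∆_n of <A^T y, x> + (µ/2)‖x − x̄‖²."""
        return proj_simplex(xbar - (A.T @ y) / mu)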
16 / 34
Assume x̄ ∈ ∆_n and δ > 0 are given inputs.

Algorithm SPVN(x̄, δ)

y_0 := Ax̄;  µ_0 := n;  x_0 := x_{µ_0}(y_0)
for k = 0, 1, . . .
    θ_k := 2/(k+3)
    y_{k+1} := (1 − θ_k)(y_k + θ_k Ax_k) + θ_k² Ax_{µ_k}(y_k)
    µ_{k+1} := (1 − θ_k)µ_k
    x_{k+1} := (1 − θ_k)x_k + θ_k x_{µ_{k+1}}(y_{k+1})
    if A^T y_{k+1} > 0 then halt: y_{k+1} is a solution to (D)
    if ‖Ax_{k+1}‖ ≤ δ then halt: x_{k+1} is a δ-solution to (P)
end for
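A sketch of SPVN reusing proj_simplex and x_mu from the previous sketch (ours; the max_iter guard is an addition so the loop always terminates):

    def spvn(A, xbar, delta, max_iter=100000):
        """Smooth PVN; returns ('D', y) or ('P', x) per the two halting tests."""
        n = A.shape[1]
        mu = float(n)
        y = A @ xbar
        x = x_mu(A, y, mu, xbar)
        for k in range(max_iter):
            theta = 2.0 / (k + 3)
            y = (1 - theta) * (y + theta * (A @ x)) + theta**2 * (A @ x_mu(A, y, mu, xbar))
            mu = (1 - theta) * mu
            x = (1 - theta) * x + theta * x_mu(A, y, mu, xbar)
            if (A.T @ y > 0).all():
                return ("D", y)
            if np.linalg.norm(A @ x) <= delta:
                return ("P", x)
        return ("P", x)                     # best iterate when the guard trips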
17 / 34
Update in PVN template
y_{k+1} := (1 − θ_k)y_k + θ_k Ax(y_k)
x_{k+1} := (1 − θ_k)x_k + θ_k x(y_k)
Update in Algorithm SPVN
y_{k+1} := (1 − θ_k)(y_k + θ_k Ax_k) + θ_k² Ax_{µ_k}(y_k)
µ_{k+1} := (1 − θ_k)µ_k
x_{k+1} := (1 − θ_k)x_k + θ_k x_{µ_{k+1}}(y_{k+1})
18 / 34
Theorem (Soheili and P, 2011)
Assume x̄ ∈ ∆_n and δ > 0 are given.
(a) If δ < ρ(A), then Algorithm SPVN finds a solution to (D) in at most 2√(2n)/ρ(A) − 1 iterations.
(b) If ρ(A) < 0, then Algorithm SPVN finds a δ-solution to (P) in at most 2√(2n)/δ − 1 iterations.
19 / 34
Assume γ > 1 is a given constant.
Algorithm ISPVN(γ)
x̃_0 := (1/n)𝟙
for i = 0, 1, . . .
    δ_i := ‖Ax̃_i‖/γ
    x̃_{i+1} := SPVN(x̃_i, δ_i)
end for
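A sketch of the outer loop reusing spvn above (ours; the eps stopping test and outer-iteration cap are additions, since the loop as written on the slide runs indefinitely when ρ(A) < 0):

    def ispvn(A, gamma=2.0, eps=1e-6, max_outer=100):
        n = A.shape[1]
        x = np.full(n, 1.0 / n)
        for _ in range(max_outer):
            status, z = spvn(A, x, np.linalg.norm(A @ x) / gamma)
            if status == "D":
                return ("D", z)             # certificate for (D)
            x = z
            if np.linalg.norm(A @ x) <= eps:
                return ("P", x)             # ε-solution to (P)
        return ("P", x)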
20 / 34
Theorem (Soheili & P, 2012)
(a) If ρ(A) > 0, then each call to SPVN in Algorithm ISPVN halts in at most 2√(2n)/ρ(A) − 1 iterations. Consequently, Algorithm ISPVN finds a solution to (D) in at most

(2√(2n)/ρ(A) − 1) · log(1/ρ(A))/log(γ)

SPVN iterations.
(b) If ρ(A) < 0, then each call to SPVN in Algorithm ISPVN halts in at most 2γ√(2n)/|ρ(A)| − 1 iterations. Hence for ε > 0, Algorithm ISPVN finds an ε-solution to (P) in at most

(2γ√(2n)/|ρ(A)| − 1) · log(1/ε)/log(γ)

SPVN iterations.
21 / 34
Observe
A “pure” SPVN (δ = 0):
When ρ(A) > 0, it solves (D) in O(√n/ρ(A)) iterations.
When ρ(A) < 0, it finds an ε-solution to (P) in O(√n/ε) iterations.

ISPVN (iterated SPVN with gradual reduction of δ):
When ρ(A) > 0, it solves (D) in O(√n/ρ(A) · log(1/ρ(A))) iterations.
When ρ(A) < 0, it finds an ε-solution to (P) in O(√n/|ρ(A)| · log(1/ε)) iterations.
22 / 34
Let φ(y) := −‖y‖²/2 + min_{x ∈ ∆_n} ⟨A^T y, x⟩.
Observe

max_y φ(y) = min_{x ∈ ∆_n} (1/2)‖Ax‖² = (1/2)ρ(A)² if ρ(A) > 0, and = 0 if ρ(A) ≤ 0.

PVN Template: y_{k+1} = y_k + θ_k(−y_k + Ax(y_k)) is a subgradient algorithm for max_y φ(y).

For µ > 0 and x̄ ∈ ∆_n let

φ_µ(y) := −‖y‖²/2 + min_{x ∈ ∆_n} { ⟨A^T y, x⟩ + (µ/2)‖x − x̄‖² }
        = −‖y‖²/2 + ⟨A^T y, x_µ(y)⟩ + (µ/2)‖x_µ(y) − x̄‖².
23 / 34
Apply Nesterov’s excessive gap technique (Nesterov, 2005).
Claim
For all x ∈ ∆_n and y ∈ R^m we have φ(y) ≤ (1/2)‖Ax‖².
Claim
For all y ∈ R^m we have φ(y) ≤ φ_µ(y) ≤ φ(y) + 2µ.
Lemma
The iterates x_k ∈ ∆_n, y_k ∈ R^m, k = 0, 1, . . . generated by the SPVN Algorithm satisfy the Excessive Gap Condition: (1/2)‖Ax_k‖² ≤ φ_{µ_k}(y_k).
24 / 34
Putting together the two claims and the lemma we get

(1/2)ρ(A)² ≤ (1/2)‖Ax_k‖² ≤ φ_{µ_k}(y_k) ≤ φ(y_k) + 2µ_k.

So φ(y_k) ≥ (1/2)ρ(A)² − 2µ_k. In the algorithm

µ_k = n · (1/3)(2/4) ⋯ (k/(k+2)) = 2n/((k+1)(k+2)) < 2n/(k+1)².

Thus φ(y_k) > 0, and consequently A^T y_k > 0, as soon as k ≥ 2√(2n)/ρ(A) − 1.
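A quick plain-Python check of the closed form for µ_k (ours, illustrative only):

    n, K = 7, 50
    mu = float(n)                                    # µ_0 = n
    for k in range(K):
        assert abs(mu - 2 * n / ((k + 1) * (k + 2))) < 1e-12  # µ_k = 2n/((k+1)(k+2))
        mu *= (k + 1) / (k + 3)                      # µ_{k+1} = (1 − θ_k)µ_k, θ_k = 2/(k+3)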
25 / 34
Suppose now ρ(A) < 0, i.e., (P) is feasible. Let S := {x ∈ ∆_n : Ax = 0}, and for v ∈ R^n let dist(v, S) := min{‖v − x‖ : x ∈ S}.
Lemma
If ρ(A) < 0 then for all v ∈ ∆_n: dist(v, S) ≤ 2‖Av‖/|ρ(A)|.
26 / 34
As in part (a), at iteration k of Algorithm SPVN we have

(1/2)‖Ax_k‖² ≤ φ_{µ_k}(y_k) ≤ min_{x ∈ S} { −‖y_k‖²/2 + ⟨A^T y_k, x⟩ + (µ_k/2)‖x − x̄‖² }
≤ (µ_k/2) min_{x ∈ S} ‖x − x̄‖² = (µ_k/2) dist(x̄, S)²

(the last inequality because ⟨A^T y_k, x⟩ = ⟨y_k, Ax⟩ = 0 for x ∈ S).

Thus by the previous lemma and the fact that µ_k < 2n/(k+1)² we get

‖Ax_k‖² ≤ µ_k · dist(x̄, S)² ≤ 4µ_k‖Ax̄‖²/ρ(A)² ≤ 8n‖Ax̄‖²/((k+1)²ρ(A)²).

So when k ≥ 2γ√(2n)/|ρ(A)| − 1 we have ‖Ax_k‖ ≤ ‖Ax̄‖/γ and Algorithm SPVN halts.
27 / 34
We could instead use the entropy function d(x) = Σ_{j=1}^n x_j log(x_j).

Bregman distance: h(x, x̄) := d(x) − d(x̄) − ⟨∇d(x̄), x − x̄⟩.

Given µ > 0 and x̄ ∈ ∆_n, smooth x(y) = argmin_{x ∈ ∆_n} ⟨A^T y, x⟩ to

x_µ(y) := argmin_{x ∈ ∆_n} { ⟨A^T y, x⟩ + µ h(x, x̄) }.

That is, replace (1/2)‖x − x̄‖² with h(x, x̄).
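With the entropy prox the smoothed minimizer has a softmax-style closed form, x_µ(y)_j ∝ x̄_j exp(−(A^T y)_j/µ) (a standard computation, not spelled out on the slides). A sketch:

    import numpy as np

    def x_mu_entropy(A, y, mu, xbar):
        """argmin over ∆_n of <A^T y, x> + µ·h(x, x̄), via its closed form."""
        c = A.T @ y
        w = xbar * np.exp(-(c - c.min()) / mu)   # shift by c.min() for stability
        return w / w.sum()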
28 / 34
With the entropy we get a stronger result for SPVN:
Theorem (Soheili and P, 2011)
Assume x̄ ∈ ∆_n and δ > 0 are given.
(a) If δ < ρ(A), then Algorithm SPVN finds a solution to (D) in at most 2√(log n)/ρ(A) − 1 iterations.
(b) If ρ(A) < 0, then Algorithm SPVN finds a δ-solution to (P) in at most 2√(log n)/δ − 1 iterations.

However, the proof of the Main Theorem (b) for ISPVN breaks down.
29 / 34
Given A ∈ R^{m×n} and a regular closed convex cone K ⊆ R^n, consider the alternative feasibility problems

A^T y ∈ int(K*),  (D)

and

Ax = 0, x ∈ K, x ≠ 0.  (P)
Assume
For some 𝟙 ∈ int(K*), we have an oracle that solves x(y) := argmin_x { ⟨A^T y, x⟩ : x ∈ K, ⟨𝟙, x⟩ = 1 }.
30 / 34
Recall Renegar’s condition number:

C(A) = ‖A‖ / inf_{Ã} { ‖A − Ã‖ : Ã ill-posed }.
Theorem (Epelman & Freund, 2000)
A generalized von Neumann algorithm solves (D) in O(β · C(A)²) iterations, or finds an ε-solution to (P) in O(β · C(A)² · log(‖A‖/ε)) iterations. Here β is a constant depending on the specific choice of norms and of 𝟙 ∈ int(K*).
31 / 34
Assume
For some fixed 𝟙 ∈ int(K*), we have an oracle that solves

argmin_x { ⟨A^T y, x⟩ + (µ/2)‖x‖² : x ∈ K, ⟨𝟙, x⟩ = 1 }.
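For instance (our addition, not on the slides), when K = R^n_+ and 𝟙 = (1, . . . , 1), the base {x ∈ K : ⟨𝟙, x⟩ = 1} is ∆_n and the oracle is once more a simplex projection; reusing proj_simplex from the earlier sketch:

    def oracle_orthant(A, y, mu):
        """argmin <A^T y, x> + (µ/2)‖x‖² over {x ≥ 0 : sum(x) = 1}."""
        # completing the square: minimize (µ/2)‖x + A^T y/µ‖² over ∆_n
        return proj_simplex(-(A.T @ y) / mu)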
Theorem (Soheili & P, 2012)
A smooth generalized von Neumann algorithm solves (D) in O(β√n · C(A) · log(C(A))) iterations, or finds an ε-solution to (P) in O(β√n · C(A) · log(‖A‖/ε)) iterations.
32 / 34
The smooth perceptron-von Neumann algorithm improves the condition-based complexity roughly from C(A)² to C(A).
The smooth version preserves most of the original algorithms’ simplicity.
There seems to be room for sharper complexity results.
33 / 34
34 / 34