

SLIDE 1

On max-k-sums

Michael J. Todd January 10, 2018

School of Operations Research and Information Engineering, Cornell University http://people.orie.cornell.edu/∼miketodd/todd.html 11th US-Mexico Workshop on Optimization and its Applications, Huatulco, January 2018

SLIDE 2

1. Definitions

Given scalars y_1, …, y_n ∈ ℝ, define their max-k-sum as

M^k(y) := max_{|K|=k} Σ_{i∈K} y_i = Σ_{j=1}^{k} y_[j]

and their min-k-sum as

m^k(y) := min_{|K|=k} Σ_{i∈K} y_i = Σ_{j=n−k+1}^{n} y_[j],

where y_[1], …, y_[n] denote the y_i's in nonincreasing order. These arise in

  • constraints in scenario-based conditional value-at-risk computation (giving a convex problem; restricting k out of n gives a MIP),
  • penalties for peak demand in electricity modelling,
  • and are related to OWL norms used in regularization in machine learning problems.

Given functions f_1, …, f_n on ℝ^d, define F^k(t) := M^k(f_1(t), …, f_n(t)) and f^k(t) := m^k(f_1(t), …, f_n(t)).
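In code, both definitions reduce to a sort; a minimal sketch (the function names are mine):

```python
def max_k_sum(y, k):
    """M^k(y): the sum of the k largest entries of y."""
    return sum(sorted(y, reverse=True)[:k])

def min_k_sum(y, k):
    """m^k(y): the sum of the k smallest entries of y."""
    return sum(sorted(y)[:k])

y = [4.0, -1.0, 2.5, 0.0]
print(max_k_sum(y, 2))  # y_[1] + y_[2] = 4.0 + 2.5 = 6.5
print(min_k_sum(y, 2))  # -1.0 + 0.0 = -1.0
```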

SLIDE 3

SLIDE 4

2. Two Questions

a) How can we define smooth approximations to F^k and f^k, maintaining certain properties of the unsmoothed functions?

b) How can we define (original or smoothed) max-k-sums [min-k-sums] if the y_i's lie in a vector space ordered by a convex cone, again preserving properties of the real case?

Note that F^k (f^k) is the composition of M^k (m^k) with the map f from t to (f_1(t), …, f_n(t)), so most of the time we address only the latter functions.

Desirable Properties

  • 0-consistency: M^0(y) = m^0(y) = 0;
  • n-consistency: M^n(y) = m^n(y) = Σ_i y_i;
  • sign-reversal: m^k(y) = −M^k(−y);
  • summability: M^k(y) + m^{n−k}(y) = Σ_i y_i;
  • translation invariance: M^k(y + η1) = M^k(y) + kη, m^k(y + η1) = m^k(y) + kη;
  • scale invariance: for α > 0, M^k(αy) = αM^k(y), m^k(αy) = αm^k(y);
  • convexity: if f_1, …, f_n are convex, so is F^k; if they are concave, so is f^k.
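These properties are easy to sanity-check numerically; a self-contained sketch (the helper functions are mine):

```python
def max_k_sum(y, k):
    return sum(sorted(y, reverse=True)[:k])

def min_k_sum(y, k):
    return sum(sorted(y)[:k])

y, n, k, eta, alpha = [3.0, -2.0, 0.5, 1.5], 4, 2, 0.7, 2.0
assert max_k_sum(y, 0) == 0 and max_k_sum(y, n) == sum(y)        # 0-/n-consistency
assert min_k_sum(y, k) == -max_k_sum([-v for v in y], k)         # sign-reversal
assert max_k_sum(y, k) + min_k_sum(y, n - k) == sum(y)           # summability
assert abs(max_k_sum([v + eta for v in y], k)
           - (max_k_sum(y, k) + k * eta)) < 1e-12                # translation invariance
assert max_k_sum([alpha * v for v in y], k) == alpha * max_k_sum(y, k)  # scale invariance
```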
SLIDE 5

3. Smoothing via Randomization in the Domain

A classical technique is to approximate a nonsmooth function h via a convolution, or as an expectation:

h̃(t) := E_s h(t − s) = ∫ h(t − s) φ(s) ds,

where φ is the probability density function of a localized random variable s ∈ ℝ^d. However, this shrinks the domain dom h := {t : h(t) < ∞}, which is inappropriate in some cases, and it requires a computationally burdensome d-dimensional integration.
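As a one-dimensional illustration (my own toy example, not from the slides): smoothing h(t) = |t| by uniform noise s ~ U[−a, a] gives the closed form h̃(t) = (t² + a²)/(2a) for |t| ≤ a, which a Monte Carlo estimate of the expectation reproduces:

```python
import random

def smoothed_abs_mc(t, a, n_samples=200_000, seed=0):
    """Monte Carlo estimate of h-tilde(t) = E_s |t - s|, s ~ Uniform[-a, a]."""
    rng = random.Random(seed)
    return sum(abs(t - rng.uniform(-a, a)) for _ in range(n_samples)) / n_samples

t, a = 0.3, 1.0
closed_form = (t * t + a * a) / (2 * a)   # valid only for |t| <= a
assert abs(smoothed_abs_mc(t, a) - closed_form) < 0.02
```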

SLIDE 6

4. A Modification

Instead, we randomize in the range of the functions: let ξ_1, …, ξ_n be iid random variables distributed like the (continuous) random variable Ξ and set

M̄^k(y) := E_{ξ_1,…,ξ_n} max_{|K|=k} Σ_{i∈K} (y_i − ξ_i) + k EΞ,
m̄^k(y) := E_{ξ_1,…,ξ_n} min_{|K|=k} Σ_{i∈K} (y_i − ξ_i) + k EΞ,

and then F̄^k(t) := M̄^k(f(t)) and f̄^k(t) := m̄^k(f(t)). These functions inherit the smoothness of the f_i's. Moreover, they inherit the domains of the nonsmooth functions. Further, they satisfy 0- and n-consistency, summability, translation invariance, and convexity, and the approximation bounds

M^k(y) ≤ M̄^k(y) ≤ M^k(y) + M̄^k(0) ≤ M^k(y) + min(k M̄^1(0), −(n − k) m̄^1(0))

and

m^k(y) ≥ m̄^k(y) ≥ m^k(y) + m̄^k(0) ≥ m^k(y) − min((n − k) M̄^1(0), −k m̄^1(0)).

They do not satisfy sign reversal or scale invariance, but m̄^k(y; Ξ) = −M̄^k(−y; −Ξ) and M̄^k(αy; αΞ) = α M̄^k(y; Ξ) (and similarly for m̄^k) for positive α.
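For intuition, here is a Monte Carlo check of the k = 1 case (a sketch of my own, anticipating the Gumbel choice of Ξ made on the next slide, under which M̄^1 turns out to be the log-sum-exp of y):

```python
import math, random

def smoothed_max1_mc(y, n_samples=100_000, seed=1):
    """Monte Carlo estimate of M-bar^1(y) = E max_i (y_i - xi_i) + E[Xi],
    where P(Xi > x) = exp(-exp(x)), so xi = ln(-ln U) for U ~ U(0,1)
    and E[Xi] = -gamma (Euler-Mascheroni constant)."""
    gamma = 0.5772156649015329
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        total += max(yi - math.log(-math.log(rng.random())) for yi in y)
    return total / n_samples - gamma      # "+ k E[Xi]" with k = 1

y = [0.3, -1.2, 0.8]
lse = math.log(sum(math.exp(v) for v in y))
assert abs(smoothed_max1_mc(y) - lse) < 0.05   # matches log-sum-exp
```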

SLIDE 7

5. Evaluation

To enable fairly efficient evaluation, we choose Gumbel random variables: P(Ξ > x) = exp(−exp(x)), EΞ = −γ. Recall that z_[k] denotes the kth largest component of a vector z ∈ ℝ^n. We are interested in q_k := E((y − ξ)_[k]). It turns out that

q_k = ⋯ = Σ_{|K|<k} (−1)^{k−|K|−1} C(n−|K|−1, k−|K|−1) ln( Σ_{h∉K} exp(y_h) ) + γ.

From this, we obtain

Theorem 1. M̄^k(y) = Σ_{|K|<k} (−1)^{k−|K|−1} C(n−|K|−2, k−|K|−1) ln( Σ_{h∉K} exp(y_h) ). ⊓⊔

(Here C(p, q) denotes the binomial coefficient, with the conventions C(−1, 0) := 1 and otherwise C(p, q) := 0 if p < q.)

We have reduced the work from an n-dimensional integration to a sum over O(n^{k−1}) terms. Note that almost all the terms disappear for k = n, and we get M̄^n(y) = M^n(y) as expected.
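A direct implementation of Theorem 1 (my own sketch, using the binomial-coefficient conventions as reconstructed above); the k = 1 and k = n cases collapse to log-sum-exp and the plain sum, as the slide indicates:

```python
import math
from itertools import combinations

def binom(p, q):
    # slide conventions: C(-1, 0) := 1, and C(p, q) := 0 if p < q
    if p == -1 and q == 0:
        return 1
    if p < q:
        return 0
    return math.comb(p, q)

def smoothed_max_k(y, k):
    """M-bar^k(y) by the closed-form sum of Theorem 1 (Gumbel smoothing)."""
    n = len(y)
    total = 0.0
    for m in range(k):                       # subsets K with |K| = m < k
        for K in combinations(range(n), m):
            rest = math.log(sum(math.exp(y[h]) for h in range(n) if h not in K))
            total += (-1) ** (k - m - 1) * binom(n - m - 2, k - m - 1) * rest
    return total

y = [0.4, -0.9, 1.1, 0.0]
assert abs(smoothed_max_k(y, 1)
           - math.log(sum(math.exp(v) for v in y))) < 1e-9   # k = 1: log-sum-exp
assert abs(smoothed_max_k(y, 4) - sum(y)) < 1e-9             # k = n: plain sum
```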

SLIDE 8

6. Examples

k = 1: Here only K = ∅ contributes to the sum, so we obtain

M̄^1(y) = ln( Σ_h exp(y_h) ).

Such functions have been used as potential functions in theoretical computer science, starting with Shahrokhi-Matula and Grigoriadis-Khachiyan, and are discussed by Tunçel and Nemirovski in the context of barrier functions. They also appear in the economic literature on consumer choice, dating back to the 1960s (e.g., Luce and Suppes). This function is sometimes called the soft maximum of the y_j's. This term is also used for the weight vector

( exp(y_i) / Σ_h exp(y_h) )_{i=1,…,n}.

Note that this weight vector is the gradient of M̄^1, and thus the gradient of F̄^1 is the weighted combination of the gradients of the f_j's using these weights for y = f(t).
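The gradient claim is easy to verify by finite differences (a quick sketch of my own):

```python
import math

def soft_max(y):
    """M-bar^1(y) = ln sum_h exp(y_h)."""
    return math.log(sum(math.exp(v) for v in y))

def soft_max_weights(y):
    """The softmax weight vector exp(y_i) / sum_h exp(y_h)."""
    s = sum(math.exp(v) for v in y)
    return [math.exp(v) / s for v in y]

y, eps = [1.0, -0.5, 0.3], 1e-6
for i, w in enumerate(soft_max_weights(y)):
    bumped = y[:]
    bumped[i] += eps
    fd = (soft_max(bumped) - soft_max(y)) / eps   # forward difference in coordinate i
    assert abs(fd - w) < 1e-5                     # matches the i-th weight
```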
SLIDE 9

k = 2: Here K can be the empty set or any singleton, and we find

M̄^2(y) = −(n − 2) ln( Σ_h exp(y_h) ) + Σ_i ln( Σ_{h≠i} exp(y_h) )
        = ln( Σ_{h≠1} exp(y_[h]) ) + ln( Σ_{h≠2} exp(y_[h]) ) + Σ_{i>2} ln( 1 − exp(y_[i]) / Σ_h exp(y_h) ).

Bounds

Theorem 2. M^k(y) ≤ M̄^k(y) ≤ M^k(y) + k ln n.

If we want a closer (but "rougher") approximation, we can scale the Gumbel random variables by α < 1, or equivalently scale the vector y by α^{−1}, apply the formulae above, and then scale the result by α. If the y_i's differ by orders of magnitude, the above expressions need to be evaluated carefully, but at the same time we may be able to ignore many of the terms.
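The two expressions for M̄^2 above can be checked against each other, along with the Theorem 2 bound (a numerical sketch of mine; `lse` is log-sum-exp):

```python
import math

def lse(vals):
    return math.log(sum(math.exp(v) for v in vals))

def mbar2_first(y):
    """M-bar^2 as -(n-2) lse(y) + sum_i lse(y without y_i)."""
    n = len(y)
    return -(n - 2) * lse(y) + sum(
        lse([y[h] for h in range(n) if h != i]) for i in range(n))

def mbar2_second(y):
    """M-bar^2 in the sorted form with the ln(1 - ...) correction terms."""
    s = sorted(y, reverse=True)               # s[0] = y_[1], s[1] = y_[2], ...
    total = lse(s[1:]) + lse([s[0]] + s[2:])  # drop y_[1], then drop y_[2]
    denom = sum(math.exp(v) for v in y)
    total += sum(math.log(1.0 - math.exp(s[i]) / denom) for i in range(2, len(y)))
    return total

y = [0.7, -1.3, 0.2, 1.9]
m2 = sum(sorted(y, reverse=True)[:2])         # M^2(y) = 2.6
assert abs(mbar2_first(y) - mbar2_second(y)) < 1e-9
assert m2 <= mbar2_first(y) <= m2 + 2 * math.log(len(y))   # Theorem 2 with k = 2
```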

SLIDE 10

7. Formulation via (Continuous) Optimization Problems

We note that M^1(y) can be obtained as the optimal value of

P(M^1): min{ x : x ≥ y_i for all i }

and

D(M^1): max{ Σ_i u_i y_i : Σ_i u_i = 1, u_i ≥ 0 for all i };

either the smallest upper bound on the y_i's or their largest convex combination. These are probably the simplest and most intuitive dual linear programming problems of all! Analogously, M^k(y) is the optimal value of

D(M^k): max{ Σ_i u_i y_i : Σ_i u_i = k, 0 ≤ u_i ≤ 1 for all i },

with feasible region U := U^k, whose dual is

P(M^k): min{ kx + Σ_i z_i : x + z_i ≥ y_i, z_i ≥ 0, for all i }.

(Note that there is a slight abuse of notation: for k = 1, these are not the same problems as above, but they can be seen to be equivalent.) We can similarly obtain m^1(y) and m^k(y).

SLIDE 11

8. Smoothing via Perturbation (à la Nesterov)

We define M̂^k(y) to be the optimal value of

D̂(M^k): max{ Σ_i u_i y_i − g∗(u) : u ∈ U },

where g∗ := g∗^k is a strongly convex function on U := U^k satisfying certain properties: it is +∞ off {u : Σ_i u_i = k}, with minimum 0 and maximum ∆ on U. We define m̂^k(y), F̂^k(t), and f̂^k(t) analogously. We then have 0- and n-consistency, sign reversal, translation invariance, and summability as long as g∗^{n−k}(u) = g∗^k(1 − u) for u ∈ U^{n−k}. Moreover, M̂^k is Lipschitz continuously differentiable. We also have scale invariance in the form M̂^k(αy, αg∗) = α M̂^k(y, g∗), the convexity property for F̂^k and f̂^k, and the bounds

M^k(y) − ∆ ≤ M̂^k(y) ≤ M^k(y), m^k(y) ≤ m̂^k(y) ≤ m^k(y) + ∆.

The dual of D̂(M^k) is

P̂(M^k): min{ kx + Σ_i z_i + g(w) : x + z_i ≥ y_i − w_i, z_i ≥ 0, for all i (and Σ_i w_i = 0) },

where g is the convex conjugate of g∗.

SLIDE 12

9. Examples

Quadratic function: Let g∗(u) := g∗^k(u) := (β/2)‖u‖₂² − βk²/(2n). Then we can show that D̂(M^k) is solved by u_i = mid(0, y_i/β − λ, 1) for all i, for some λ, and we can solve the problem in O(n ln n) time by sorting and a binary search.

Single-sided entropic function: Next we let g∗(u) := g∗^k(u) := Σ_i u_i ln u_i + k ln(n/k) for nonnegative u_i's summing to k. Now we can find the optimal u from u_i = min(exp(y_i − λ), 1) for all i, for some λ, so the problem can again be solved in O(n ln n) time by sorting and a binary search.

Interestingly, M̂^1(y) = M̄^1(y) − ln n, but there is no such relation for k > 1, and the M̂^k's are much easier to evaluate than the M̄^k's.
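A sketch of the entropic case (my own implementation; plain bisection on λ stands in for the sort-and-binary-search of the slide), checked against the stated relation M̂^1(y) = M̄^1(y) − ln n:

```python
import math

def entropic_smoothed_max_k(y, k, iters=100):
    """M-hat^k(y) with g*(u) = sum_i u_i ln u_i + k ln(n/k):
    the optimal u_i = min(exp(y_i - lam), 1) with sum_i u_i = k."""
    n = len(y)
    lo, hi = min(y) - math.log(n) - 1.0, max(y) + math.log(n) + 1.0
    for _ in range(iters):                # bisection on lam; sum u_i decreases in lam
        lam = 0.5 * (lo + hi)
        if sum(min(math.exp(v - lam), 1.0) for v in y) > k:
            lo = lam
        else:
            hi = lam
    u = [min(math.exp(v - lam), 1.0) for v in y]
    g_star = sum(ui * math.log(ui) for ui in u) + k * math.log(n / k)
    return sum(ui * yi for ui, yi in zip(u, y)) - g_star

y = [0.9, -0.4, 1.6, 0.1]
lse = math.log(sum(math.exp(v) for v in y))
assert abs(entropic_smoothed_max_k(y, 1) - (lse - math.log(len(y)))) < 1e-6
```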

SLIDE 13

10. Max-k-Sums in General Spaces

Now suppose y_1, …, y_n lie in a finite-dimensional real vector space E ordered by a closed convex pointed cone K with nonempty interior. Let E∗ denote the dual space, with dual cone K∗ := {u ∈ E∗ : ⟨u, x⟩ ≥ 0 for all x ∈ K}. Then x ⪰ z, for x, z ∈ E, means x − z ∈ K, and u ⪰∗ v, for u, v ∈ E∗, means u − v ∈ K∗. We also write z ⪯ x and v ⪯∗ u with the obvious definitions. We would like to define the max-k-sum and the min-k-sum of the y_i's in E, and smooth approximations to them, to conform with their definitions in ℝ. We write ((y_i)) for (y_1, …, y_n) ∈ E^n for ease of notation. Our prime examples for E and K are:

  • ℝ and ℝ₊;
  • ℝ^p and ℝ^p₊;
  • the space of real (complex) symmetric (Hermitian) d × d matrices, and the cone of positive semidefinite matrices; and
  • ℝ^{1+p} and the second-order cone {(ξ; x) ∈ ℝ^{1+p} : ξ ≥ ‖x‖₂}.

Some results below hold just for symmetric cones; all those above are symmetric.

SLIDE 14

11. “Smoothing via Randomization”

This makes no sense, since we don’t yet know how to define the max-k-sum to add randomization to! But we can use the formulae we derived for the case of reals if exp and ln are defined. And they are for symmetric cones! For example, for symmetric matrices, if A = V D Vᵀ is the eigenvalue decomposition of A, then exp(A) = V exp(D) Vᵀ, and if A is positive definite, ln(A) = V ln(D) Vᵀ. Here exp and ln are defined for diagonal matrices by applying the scalar version to each diagonal entry. We can show that

ln( Σ_i exp(y_i) ) ⪰ y_j for each j

(but not a similar result for k = 2). These formulae satisfy translation invariance for ((y_i + ηe)), where e is the unit element in the symmetric cone.
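A sketch of these formulae for real symmetric matrices (mine; assumes numpy is available). The final assertions check the displayed dominance property ln(Σ_i exp(Y_i)) ⪰ Y_j by testing that the difference is positive semidefinite:

```python
import numpy as np

def apply_spectral(A, f):
    """Apply the scalar function f to a symmetric matrix via A = V D V^T."""
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.T               # V diag(f(w)) V^T

def matrix_soft_max(Ys):
    """ln(sum_i exp(Y_i)) for symmetric matrices Y_i."""
    S = sum(apply_spectral(Y, np.exp) for Y in Ys)
    return apply_spectral(S, np.log)      # S is positive definite, so ln is defined

Y1 = np.array([[1.0, 0.3], [0.3, -0.5]])
Y2 = np.array([[0.2, -0.1], [-0.1, 0.8]])
M = matrix_soft_max([Y1, Y2])
for Y in (Y1, Y2):                        # M - Y_j is positive semidefinite
    assert np.linalg.eigvalsh(M - Y).min() >= -1e-10
```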

SLIDE 15

12. Definition via Optimization Formulations

If we directly translate P(M^k) to this setting, we find the objective function is not a scalar, so we choose v ∈ int(K∗) and then define

P(M^k((y_i))): min{ k⟨v, x⟩ + Σ_i ⟨v, z_i⟩ : x + z_i ⪰ y_i, z_i ⪰ 0, for all i }

and

D(M^k((y_i))): max{ Σ_i ⟨u_i, y_i⟩ : Σ_i u_i = kv, 0 ⪯∗ u_i ⪯∗ v for all i },

with feasible region U := U^k in (E∗)^n. We again choose a suitable strongly convex g∗ on U, with convex conjugate g, and then define

P̂(M^k((y_i))): min{ k⟨v, x⟩ + Σ_i ⟨v, z_i⟩ + g((w_i)) : x + z_i ⪰ y_i − w_i, z_i ⪰ 0, for all i }

and

D̂(M^k((y_i))): max{ Σ_i ⟨u_i, y_i⟩ − g∗((u_i)) : Σ_i u_i = kv, 0 ⪯∗ u_i ⪯∗ v for all i }.

Our conditions on g∗ imply that we can add the constraint Σ_i w_i = 0 without loss of generality.

SLIDE 16

Of course, the values of all these problems are scalars, and so will not provide the definitions we need. We therefore set

M^k((y_i)) := { kx + Σ_i z_i : (x, (z_i)) ∈ Argmin(P(M^k((y_i)))) }

and analogously m^k((y_i)) using Argmax. (Here Argmin and Argmax denote the sets of all optimal solutions to the problem given.) For the perturbed problems, we add the extra constraint Σ_i w_i = 0 to remove the ambiguity from x, and define

M̂^k((y_i)) := { kx + Σ_i z_i : (x, (z_i), (w_i)) ∈ Argmin(P̂(M^k((y_i)))), Σ_i w_i = 0 }

and analogously m̂^k((y_i)).

SLIDE 17

13. Properties

These functions satisfy:

  • 0- and n-consistency, in the sense that M^0((y_i)) = {0}, M^n((y_i)) = { Σ_i y_i }, etc.;
  • sign-reversal;
  • summability, in the sense that M^k((y_i)) = { Σ_i y_i } − m^{n−k}((y_i)), and, if g∗^{n−k}((u_i)) := g∗^k((v − u_i)), similarly for M̂^k and m̂^{n−k};
  • translation invariance for any η ∈ E;
  • positive scaling invariance in the natural sense;
  • dominance: for any K of cardinality k and any y ∈ M^k((y_i)), y ⪰ Σ_{i∈K} y_i; and
  • respect of product structure: if K is a product of cones (and g is separable), then M^k((y_i)) (and M̂^k((y_i))) are products of the M^k's (and M̂^k's) for the constituent cones.

SLIDE 18

Computation

To calculate M^k((y_i)) or M̂^k((y_i)) requires the solution of a linear or convex conic programming problem. One case is easier: if the cone is symmetric and n = 2, k = 1, we have

M^1(y_1, y_2) = { (y_1 + y_2)/2 + abs((y_1 − y_2)/2) },

where abs is defined using the eigenvalue decomposition like exp and ln.

Remarks

  • Simple arguments show that M^k((y_i)) may not be a singleton, and may depend on v.
  • An alternative way to define M^1((y_i)) is as the limit (if it exists)

    lim_{α↓0} α ln( Σ_i exp(y_i/α) ).

    This does not agree with the definition above.
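The n = 2, k = 1 formula is straightforward to implement for symmetric matrices (a sketch of mine, assuming numpy); the dominance property M ⪰ Y_1, M ⪰ Y_2 then holds by construction:

```python
import numpy as np

def spectral_abs(A):
    """abs of a symmetric matrix via its eigenvalue decomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.abs(w)) @ V.T

def matrix_max1(Y1, Y2):
    """M^1(Y1, Y2) = (Y1 + Y2)/2 + abs((Y1 - Y2)/2) for the PSD cone."""
    return (Y1 + Y2) / 2 + spectral_abs((Y1 - Y2) / 2)

Y1 = np.array([[2.0, 0.5], [0.5, -1.0]])
Y2 = np.array([[0.0, 1.0], [1.0, 1.0]])
M = matrix_max1(Y1, Y2)
for Y in (Y1, Y2):                        # M - Y_j is positive semidefinite
    assert np.linalg.eigvalsh(M - Y).min() >= -1e-10
```

For 1 × 1 matrices this reduces to the scalar identity max(y_1, y_2) = (y_1 + y_2)/2 + |y_1 − y_2|/2.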

SLIDE 19

14. Conclusions

The simple max-k-sum can be smoothed either by randomization or by perturbing an optimization formulation of the function. The latter approach suggests a way to generalize the function to the case of general cones.

Final remark: contrary to God, Kronecker, and Backus, k need not be an integer in the second approach, and the same properties hold!

All the best, Don!