

SLIDE 1

Efficient Bregman Projections Onto the Simplex

Walid Krichene, Syrine Krichene, Alexandre Bayen
Electrical Engineering and Computer Sciences, UC Berkeley; ENSIMAG and Criteo Labs, France

December 16, 2015

SLIDE 2

Outline

1. Introduction
2. Projection Algorithms
3. Numerical experiments


SLIDES 4–5

Bregman Projections onto the simplex

Bregman projections are the building block of mirror descent (Nemirovski and Yudin) and dual averaging (Nesterov). They arise in:

Convex optimization: $\min_{x \in \mathcal{X}} f(x)$.
Online learning (regret minimization).

Algorithm 2: Mirror descent method
1: for $\tau \in \mathbb{N}$ do
2:   Query a subgradient vector $g^{(\tau)} \in \partial f(x^{(\tau)})$ (or loss vector)
3:   Update
     $x^{(\tau+1)} = \arg\min_{x \in \mathcal{X}} D_\psi\left(x, (\nabla\psi)^{-1}\left(\nabla\psi(x^{(\tau)}) - \eta_\tau g^{(\tau)}\right)\right)$   (1)

$\psi$: strongly convex distance-generating function. $D_\psi$: Bregman divergence.
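To make the update concrete, here is a minimal sketch of one mirror descent step, assuming the entropic distance-generating function $\psi(x) = \sum_i x_i \ln x_i$, for which update (1) reduces to the well-known exponentiated-gradient closed form (the $\epsilon = 0$ special case treated later in the deck):

```python
import numpy as np

# Minimal sketch: one mirror descent step on the simplex with the entropic
# DGF psi(x) = sum_i x_i ln x_i. For this choice, update (1) has the closed
# form x_i <- x_i * exp(-eta * g_i) / Z (exponentiated gradient).
def entropic_md_step(x, g, eta):
    y = x * np.exp(-eta * g)   # gradient step taken in the dual space
    return y / y.sum()         # Bregman (KL) projection back onto the simplex

x = np.ones(4) / 4                     # start at the uniform distribution
g = np.array([0.5, -0.2, 0.1, 0.0])    # a subgradient / loss vector
x = entropic_md_step(x, g, eta=0.1)
assert abs(x.sum() - 1.0) < 1e-12
```

For general potentials the projection has no such closed form; computing it efficiently is exactly what the projection algorithms in the next section address.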

SLIDE 6

Illustration of Bregman projections

Figure: Illustration of a mirror descent iteration. $\nabla\psi$ maps $x^{(\tau)} \in E$ to the dual space $E^*$, the gradient step $-\eta_\tau g^{(\tau)}$ is taken there, and $(\nabla\psi)^{-1}$ maps the result back to $E$, where it is projected onto $\mathcal{X}$ to give $x^{(\tau+1)}$:

$x^{(\tau+1)} = \arg\min_{x \in \mathcal{X}} D_\psi\left(x, (\nabla\psi)^{-1}\left(\nabla\psi(x^{(\tau)}) - \eta_\tau g^{(\tau)}\right)\right)$

SLIDES 7–8

More precisely

The feasible set is the simplex (or a Cartesian product of simplexes):
$\Delta = \{ x \in \mathbb{R}^d_+ : \sum_i x_i = 1 \}$
Motivation: online learning, optimization over probability distributions.

The distance-generating function is induced by a potential:
$\psi(x) = \sum_i f(x_i)$, with $f(x) = \int_1^x \phi^{-1}(u)\,du$, where $\phi$, an increasing function, is called the potential.

Consequence: known expressions for $\nabla\psi$ and $(\nabla\psi)^{-1}$; both act coordinatewise, with $(\nabla\psi(x))_i = \phi^{-1}(x_i)$ and $((\nabla\psi)^{-1}(y))_i = \phi(y_i)$.
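As a small illustration of this last point (a sketch; the entropic potential is our choice of example), the mirror maps can be written in a couple of lines:

```python
import numpy as np

# Potential-induced DGF: psi(x) = sum_i f(x_i) with f' = phi^{-1}, so the
# mirror maps act coordinatewise. Example with the entropic potential
# phi(u) = exp(u - 1), for which f(x) = x ln x (negative entropy).
phi = lambda u: np.exp(u - 1.0)       # ((grad psi)^{-1}(y))_i = phi(y_i)
phi_inv = lambda x: 1.0 + np.log(x)   # (grad psi(x))_i = phi^{-1}(x_i)

grad_psi = phi_inv
grad_psi_inv = phi
```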

SLIDE 9

Outline

1. Introduction
2. Projection Algorithms
3. Numerical experiments

SLIDE 10

Projection algorithms

General strategy:
1. Derive optimality conditions.
2. Design an algorithm that satisfies those conditions.

SLIDES 11–12

Optimality conditions

$x^\star = \arg\min_{x \in \mathcal{X}} D_\psi\left(x, (\nabla\psi)^{-1}(\nabla\psi(\bar{x}) - \bar{g})\right)$

Optimality conditions: $x^\star$ is optimal if and only if there exists $\nu^\star \in \mathbb{R}$ such that
$\forall i,\ x^\star_i = \left(\phi(\phi^{-1}(\bar{x}_i) - \bar{g}_i + \nu^\star)\right)_+$ and $\sum_{i=1}^d x^\star_i = 1$.

Proof: write the KKT conditions, then eliminate complementary slackness.

Comments:
This reduces a problem in dimension d to a problem in dimension 1.
The function $c : \nu \mapsto \sum_i \left(\phi(\phi^{-1}(\bar{x}_i) - \bar{g}_i + \nu)\right)_+$ is increasing.
One can therefore solve for $\nu^\star$ using bisection.

SLIDE 13

Bisection algorithm for general divergences

Algorithm 3: Bisection method to compute the projection $x^\star$ with precision $\epsilon$.
1: Input: $\bar{x}$, $\bar{g}$, $\epsilon$.
2: Initialize $\bar{\nu} = \phi^{-1}(1) - \max_i\left(\phi^{-1}(\bar{x}_i) - \bar{g}_i\right)$ and $\underline{\nu} = \phi^{-1}(1/d) - \max_i\left(\phi^{-1}(\bar{x}_i) - \bar{g}_i\right)$
3: while $c(\bar{\nu}) - c(\underline{\nu}) > \epsilon$ do
4:   Let $\nu_+ \leftarrow \frac{\bar{\nu} + \underline{\nu}}{2}$
5:   if $c(\nu_+) > 1$ then
6:     $\bar{\nu} \leftarrow \nu_+$
7:   else
8:     $\underline{\nu} \leftarrow \nu_+$
9: Return $\tilde{x}(\bar{\nu})$, with $\tilde{x}_i(\bar{\nu}) = \left(\phi(\phi^{-1}(\bar{x}_i) - \bar{g}_i + \bar{\nu})\right)_+$

Theorem: The algorithm terminates after $O(\ln\frac{1}{\epsilon})$ iterations, and outputs $\tilde{x}$ such that $\|\tilde{x}(\bar{\nu}) - x^\star\| \leq \epsilon$.
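A minimal Python sketch of Algorithm 3 follows, instantiated (our assumption, for concreteness) with the entropic potential $\phi(u) = e^{u-1}$; any increasing potential with a known inverse can be substituted:

```python
import numpy as np

# Sketch of Algorithm 3 (bisection), instantiated with the entropic potential
# phi(u) = exp(u - 1), phi^{-1}(x) = 1 + ln(x). Any increasing potential with
# a known inverse could be swapped in.
phi = lambda u: np.exp(u - 1.0)
phi_inv = lambda x: 1.0 + np.log(x)

def bisection_projection(x_bar, g_bar, eps=1e-10):
    z = phi_inv(x_bar) - g_bar                         # dual-space point
    c = lambda nu: np.maximum(phi(z + nu), 0.0).sum()  # increasing in nu
    # Initial bracket: c(nu_lo) <= 1 <= c(nu_hi) by construction (step 2).
    nu_hi = phi_inv(1.0) - z.max()
    nu_lo = phi_inv(1.0 / len(x_bar)) - z.max()
    while c(nu_hi) - c(nu_lo) > eps:
        nu_mid = 0.5 * (nu_hi + nu_lo)
        if c(nu_mid) > 1:
            nu_hi = nu_mid
        else:
            nu_lo = nu_mid
    return np.maximum(phi(z + nu_hi), 0.0)

x = bisection_projection(np.ones(4) / 4, np.array([0.5, -0.2, 0.1, 0.0]))
print(x, x.sum())  # sums to 1 up to the chosen precision
```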

SLIDES 14–17

Exact projections for exponential divergences

Special case 1: $\psi(x) = \|x\|^2$: the solution can be computed exactly [1].

Special case 2: exponential divergence, given by the potential
$\phi_\epsilon : (-\infty, +\infty) \to (-\epsilon, +\infty)$, $u \mapsto e^{u-1} - \epsilon$.

For $\epsilon = 0$: $\psi(x) = H(x) = \sum_i x_i \ln x_i$ (negative entropy), and $D_\psi(x, y) = D_{KL}(x, y)$.
For $\epsilon > 0$: $\psi(x) = H(x + \epsilon)$, and $D_\psi(x, y) = D_{KL}(x + \epsilon, y + \epsilon)$.

[1] J. Duchi, S. Shalev-Shwartz, Y. Singer, T. Chandra, Efficient Projections onto the $\ell_1$ Ball for Learning in High Dimensions, ICML 2008.

SLIDES 18–19

Motivation

The Bregman projection with the KL divergence underlies:
the Hedge algorithm in online learning,
the multiplicative weights algorithm,
exponentiated gradient descent.
It has a closed-form solution computable in O(d).

However:
$D_{KL}(x, y)$ is unbounded on the simplex (problematic for stochastic mirror descent).
$H(x)$ is not a smooth function (problematic for accelerated mirror descent).
Taking $\epsilon > 0$ solves both issues (a numerical sketch follows the figure).

Figure: $D_{KL}(x, y_0)$ versus the smoothed $D_{KL,\epsilon}(x, y_0)$, the latter sandwiched between $\frac{\ell_\epsilon}{2}\|x - y_0\|_1^2$ and $\frac{L_\epsilon}{2}\|x - y_0\|_1^2$.
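A small numerical illustration of the boundedness point (a sketch; the example vectors are ours):

```python
import numpy as np

# D_{KL,eps}(x, y) = D_KL(x + eps, y + eps): shifting both arguments keeps
# every coordinate bounded away from 0, so the divergence stays bounded on
# the simplex even as y approaches a vertex.
def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

def kl_eps(x, y, eps):
    return kl(x + eps, y + eps)

x = np.ones(3) / 3
y = np.array([0.9999, 5e-5, 5e-5])   # y near a vertex of the simplex
print(kl(x, y))                       # grows without bound as y -> vertex
print(kl_eps(x, y, 0.1))              # stays bounded thanks to the shift
```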

SLIDE 20

Optimality conditions

Recall the general optimality condition: $x^\star_i = \left(\phi(\phi^{-1}(\bar{x}_i) - \bar{g}_i + \nu^\star)\right)_+$.

Optimality conditions with exponential divergence: let $x^\star$ be the solution and $I = \{i : x^\star_i > 0\}$ its support. Then

$\forall i \in I,\ x^\star_i = -\epsilon + \frac{(\bar{x}_i + \epsilon)e^{-\bar{g}_i}}{Z^\star}$, where $Z^\star = \frac{\sum_{i \in I}(\bar{x}_i + \epsilon)e^{-\bar{g}_i}}{1 + |I|\epsilon}$.   (2)

Furthermore, if $\bar{y}_i = (\bar{x}_i + \epsilon)e^{-\bar{g}_i}$, then
$(i \in I \text{ and } \bar{y}_j > \bar{y}_i) \Rightarrow j \in I$,
i.e. the support consists of the largest entries of $\bar{y}$.

SLIDE 21

A sorting-based algorithm

Algorithm 4: Sorting method to compute the Bregman projection with $D_{\psi_\epsilon}$
1: Input: $\bar{x}$, $\bar{g}$
2: Output: $x^\star$
3: Form the vector $\bar{y}_i = (\bar{x}_i + \epsilon)e^{-\bar{g}_i}$
4: Sort $\bar{y}$; let $\bar{y}_{(i)}$ be the i-th smallest element of $\bar{y}$.
5: Let $j^\star$ be the smallest index $j$ for which $(1 + \epsilon(d - j + 1))\,\bar{y}_{(j)} - \epsilon \sum_{i \geq j} \bar{y}_{(i)} > 0$
6: Set $Z = \frac{\sum_{i \geq j^\star} \bar{y}_{(i)}}{1 + \epsilon(d - j^\star + 1)}$
7: Set $x^\star_i = \left(-\epsilon + \frac{\bar{y}_i}{Z}\right)_+$

Complexity: O(d ln d)
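A direct Python transcription of Algorithm 4 (a sketch with our own variable names; the reference implementation lives at github.com/walidk):

```python
import numpy as np

# Sketch of Algorithm 4 (SortProjection), 0-indexed.
def sort_projection(x_bar, g_bar, eps):
    d = len(x_bar)
    y = (x_bar + eps) * np.exp(-g_bar)             # step 3
    y_sorted = np.sort(y)                          # step 4, ascending
    suffix = np.cumsum(y_sorted[::-1])[::-1]       # suffix[j] = sum_{i >= j} y_(i)
    sizes = d - np.arange(d)                       # |{j, ..., d-1}|
    # Step 5: smallest j with (1 + eps*sizes[j]) * y_(j) - eps * suffix[j] > 0.
    # Such a j always exists: at j = d-1 the condition reads y_(d-1) > 0.
    j_star = int(np.argmax((1 + eps * sizes) * y_sorted - eps * suffix > 0))
    Z = suffix[j_star] / (1 + eps * sizes[j_star]) # step 6
    return np.maximum(-eps + y / Z, 0.0)           # step 7

x = sort_projection(np.ones(4) / 4, np.array([0.5, -0.2, 0.1, 0.0]), eps=0.05)
print(x, x.sum())  # a point on the simplex: nonnegative, sums to 1
```

For $\epsilon = 0$ the condition in step 5 already holds at $j = 1$, and the output reduces to the closed-form multiplicative-weights update $\bar{y} / \sum_i \bar{y}_i$.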

SLIDE 22

A randomized-pivot algorithm

Adapted from the QuickSelect algorithm for selecting the i-th smallest element of a vector $\bar{y}$. One can sort and then return the i-th element in O(d ln d); QuickSelect instead runs in expected O(d) time (worst case O(d^2)). The animation below walks through an example run; a code sketch follows it.

SLIDES 23–32

A randomized-pivot algorithm

(Animation: QuickSelect run on the array 9 1 4 8 7 2 3 5 6 with k = 5. Each step partitions around a pivot, discards the side that cannot contain the k-th smallest element, and updates k accordingly: first k = 5 on the full array, then k = 3 on the sub-array 9 4 8 7 3 5 6, then k = 3 on 4 3 5, ending at the answer, 5.)
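The QuickProject algorithm itself is not reproduced on the slides; below is a sketch of the underlying QuickSelect primitive it adapts (our understanding is that QuickProject additionally maintains partial sums of the discarded side, so that $j^\star$ and $Z$ from Algorithm 4 are recovered without ever sorting the whole vector):

```python
import random

# Randomized QuickSelect: k-th smallest element (k is 1-indexed), expected
# O(d) time, worst case O(d^2).
def quickselect(a, k):
    pivot = random.choice(a)
    lo = [v for v in a if v < pivot]   # elements smaller than the pivot
    eq = [v for v in a if v == pivot]
    hi = [v for v in a if v > pivot]
    if k <= len(lo):
        return quickselect(lo, k)      # answer lies on the small side
    if k <= len(lo) + len(eq):
        return pivot                   # the pivot is the k-th element
    return quickselect(hi, k - len(lo) - len(eq))  # discard left, shift k

print(quickselect([9, 1, 4, 8, 7, 2, 3, 5, 6], 5))  # 5, as in the animation
```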

SLIDE 33

Outline

1. Introduction
2. Projection Algorithms
3. Numerical experiments

SLIDE 34

Scaling of the SortProject and QuickProject algorithms

Figure: Average run time (s) of SortProjection and QuickProjection as a function of the problem dimension d (left panel: log-log scale, d from 10^2 to 10^7; right panel: linear scale, d up to 3·10^6).

SLIDE 35

Accelerated entropic descent with and without smoothing

Figure: Entropic descent, with and without smoothing [2]. (Shown as a video in the original presentation.)

[2] W. Krichene, A. Bayen, P. Bartlett, Accelerated Mirror Descent in Continuous and Discrete Time, NIPS 2015.

SLIDES 36–37

Summary

Bregman projection        Method             Complexity
General divergence        Bisection          O(ln(1/ε))
Exponential divergence    SortProjection     O(d ln d)
Exponential divergence    QuickProjection    O(d) in expectation

Used for:
Convex optimization on the simplex.
Online learning.
Accelerated entropic descent.

Code implementation: github.com/walidk

Thank you!
eecs.berkeley.edu/~walid/

SLIDE 38 (backup)

Accelerated entropic descent with and without smoothing

Figure: Entropic descent, with and without smoothing.