
On the Convergence of No-regret Learning in Selfish Routing

ICML 2014 - Beijing

Walid Krichene¹, Benjamin Drighès², Alexandre Bayen³

UC Berkeley · École Polytechnique

June 23, 2014

¹walid@cs.berkeley.edu ²benjamin.drighes@polytechnique.edu ³bayen@berkeley.edu


Introduction

Routing game: players choose routes.
Population distributions: µ(t) ∈ ∆^{P₁} × ··· × ∆^{P_K}.
Set of Nash equilibria: N.
Under no-regret dynamics, the time average µ̄(t) = (1/t) ∑_{τ≤t} µ(τ) converges to N.
Question: does µ(t) itself converge to N?


Outline

1. Online learning in the routing game
2. Convergence of µ̄(t)
3. Convergence of µ(t)


Routing game

Figure: Example network.

Directed graph (V, E).
Population X_k, with set of paths P_k.
Player x ∈ X_k: distribution over paths π(x) ∈ ∆^{P_k}.
Population distribution over paths µ^k ∈ ∆^{P_k}, given by µ^k = ∫_{X_k} π(x) dm(x).
Loss on path p: ℓ^k_p(µ).


Online learning model

Each player maintains a distribution π(t) ∈ ∆^{P₁}. At each round:
1. Sample a path p ∼ π(t).
2. Discover the loss vector ℓ(t) ∈ [0, 1]^{P₁}.
3. Update to π(t+1).


The Hedge algorithm

Hedge algorithm: update the distribution according to the observed loss,

$$\pi^{(t+1)}_p \propto \pi^{(t)}_p\, e^{-\eta_t\, \ell^{k,(t)}_p}$$
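To make the update concrete, here is a minimal NumPy sketch of one Hedge step (my own illustration; the loss values and the schedule η_t = 1/(10 + t) below are placeholders, not prescribed by the talk):

```python
import numpy as np

def hedge_update(pi, loss, eta):
    # Multiplicative-weights step: reweight by exp(-eta * loss), renormalize.
    w = pi * np.exp(-eta * loss)
    return w / w.sum()

# Illustrative run: 3 paths, uniform start, decaying rate eta_t = 1/(10 + t).
pi = np.ones(3) / 3
for t in range(200):
    loss = np.array([0.9, 0.5, 0.1])  # placeholder for l^k(mu^(t)), values in [0, 1]
    pi = hedge_update(pi, loss, eta=1.0 / (10 + t))
print(pi)  # mass concentrates on the lowest-loss path
```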


Nash equilibria

µ ∈ N is a Nash equilibrium if for every k and every p ∈ P_k with positive mass,

$$\ell^k_p(\mu) \le \ell^k_{p'}(\mu) \quad \forall p' \in P_k.$$

How to compute Nash equilibria? A convex formulation.

Convex potential function:

$$V(\mu) = \sum_e \int_0^{(M\mu)_e} c_e(u)\, du$$

(M is the path-to-edge incidence operator, c_e the congestion cost on edge e.) V is convex, and ∇_{µ^k} V(µ) = ℓ^k(µ). The minimizer is not unique. How do players find a Nash equilibrium? Iterative play; ideally distributed, with reasonable information requirements.
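To make V concrete, a small sketch (my own; the incidence matrix and affine cost coefficients below are invented for illustration) that evaluates the potential and its gradient in closed form for affine edge costs c_e(u) = a_e u + b_e:

```python
import numpy as np

# Illustrative setup: 3 paths, 4 edges; M[e, p] = 1 iff path p uses edge e.
M = np.array([[1, 0, 1],
              [0, 1, 1],
              [1, 1, 0],
              [0, 0, 1]])
a = np.array([1.0, 2.0, 0.5, 1.0])   # slopes of the affine edge costs
b = np.array([0.1, 0.0, 0.3, 0.2])   # offsets

def potential(mu):
    # V(mu) = sum_e int_0^{(M mu)_e} c_e(u) du; for c_e(u) = a_e u + b_e
    # the integral evaluates to a_e x_e^2 / 2 + b_e x_e with x = M mu.
    x = M @ mu
    return np.sum(0.5 * a * x**2 + b * x)

def path_losses(mu):
    # Gradient identity grad V = M^T c(M mu): a path's loss is the sum of
    # the costs of its edges.
    x = M @ mu
    return M.T @ (a * x + b)

mu = np.ones(3) / 3
print(potential(mu), path_losses(mu))
```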


Assume sublinear regret dynamics

Losses are in [0, 1]. The expected loss of player x at time t is ⟨π(t)(x), ℓ^k(µ(t))⟩.

Discounted regret:

$$\bar r^{(T)}(x) = \frac{\sum_{t \le T} \gamma_t \langle \pi^{(t)}(x), \ell^k(\mu^{(t)})\rangle - \min_p \sum_{t \le T} \gamma_t\, \ell^{k,(t)}_p}{\sum_{t \le T} \gamma_t}$$

Assumptions on the discount factors: γ_t > 0, γ_t ↓ 0, and ∑_t γ_t = ∞.
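As an illustration (my own sketch, with synthetic play and loss histories), the discounted regret of a single player can be computed from a trajectory as follows:

```python
import numpy as np

def discounted_regret(pis, losses, gammas):
    """Discounted regret of one player.

    pis    : (T, P) array of play distributions pi^(t)
    losses : (T, P) array of path losses l^k(mu^(t)), values in [0, 1]
    gammas : (T,) array of positive discount factors gamma_t
    """
    expected = np.sum(gammas * np.einsum('tp,tp->t', pis, losses))
    best_fixed = np.min(losses.T @ gammas)   # best single path in hindsight
    return (expected - best_fixed) / gammas.sum()

# Synthetic example: T = 100 rounds, 3 paths, gamma_t = 1/(10 + t).
T, P = 100, 3
rng = np.random.default_rng(0)
pis = rng.dirichlet(np.ones(P), size=T)
losses = rng.random((T, P))
gammas = 1.0 / (10 + np.arange(T))
print(discounted_regret(pis, losses, gammas))
```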

Convergence to Nash equilibria

Population regret:

$$\bar r^{\,k,(T)} = \frac{1}{m(X_k)} \int_{X_k} \bar r^{(T)}(x)\, dm(x)$$

Convergence of averages to Nash equilibria: if an update has sublinear population regret, then the discounted average

$$\bar\mu^{(T)} = \frac{\sum_{t \le T} \gamma_t\, \mu^{(t)}}{\sum_{t \le T} \gamma_t}$$

converges: lim_{T→∞} d(µ̄(T), N) = 0.

Proof idea: show that V(µ̄(T)) − V(µ*) ≤ ∑_k r̄^{k,(T)}. A similar result appears in Blum et al. (2006).
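The key inequality combines Jensen's inequality with first-order convexity; a sketch of the chain of bounds (my reconstruction of the standard argument, not copied from the slides):

```latex
% Jensen, with weights \gamma_t / \sum_{s \le T} \gamma_s, since V is convex:
V(\bar\mu^{(T)}) \;\le\; \frac{\sum_{t \le T} \gamma_t\, V(\mu^{(t)})}{\sum_{t \le T} \gamma_t}
% First-order convexity at each \mu^{(t)}, using \nabla_{\mu^k} V = \ell^k:
V(\mu^{(t)}) - V(\mu^\ast)
  \;\le\; \sum_k \big\langle \ell^k(\mu^{(t)}),\; \mu^{k,(t)} - \mu^{\ast k} \big\rangle
% Averaging with weights \gamma_t, then lower-bounding the terms in \mu^{\ast k}
% by the best fixed path, bounds the average by the population regrets
% (up to the population masses m(X_k)), which vanish as T grows.
```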


Convergence of a dense subsequence

Proposition: under any algorithm with sublinear discounted regret, a dense subsequence of (µ(t))_t converges to N. A subsequence (µ(t))_{t∈𝒯} is dense if

$$\lim_{T\to\infty} \frac{\sum_{t \in \mathcal{T},\, t \le T} \gamma_t}{\sum_{t \le T} \gamma_t} = 1.$$

Proof: absolute Cesàro convergence implies convergence of a dense subsequence.
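One way to see the proof step (my paraphrase of the standard Markov-type argument): if the nonnegative distances a_t = d(µ(t), N) converge to 0 in the γ-weighted Cesàro sense, then for each ε > 0 the rounds with a_t ≥ ε have vanishing γ-density:

```latex
% With a_t := d(\mu^{(t)}, \mathcal{N}) \ge 0 and weighted averages \to 0:
\frac{\sum_{t \le T,\; a_t \ge \varepsilon} \gamma_t}{\sum_{t \le T} \gamma_t}
  \;\le\; \frac{1}{\varepsilon}\,
  \frac{\sum_{t \le T} \gamma_t\, a_t}{\sum_{t \le T} \gamma_t}
  \;\longrightarrow\; 0 \quad (T \to \infty),
% so \mathcal{T}_\varepsilon = \{ t : a_t < \varepsilon \} has \gamma-density 1;
% a diagonal argument over \varepsilon \downarrow 0 yields a dense
% subsequence converging to \mathcal{N}.
```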


Example: Hedge with learning rates γτ

$$\pi^{(t+1)}_p \propto \pi^{(t)}_p\, e^{-\eta_t\, \ell^{k,(t)}_p}$$

Regret bound: under Hedge with η_t = γ_t,

$$\bar r^{(T)}(x) \le \frac{\rho \ln \frac{1}{\pi^{(0)}_{\min}(x)} + c \sum_{t \le T} \gamma_t^2}{\sum_{t \le T} \gamma_t}$$

In particular, for γ_t = 1/(10 + t) we have ∑_t γ_t = ∞ while ∑_t γ_t² < ∞, so the discounted regret is sublinear.

Simulations

Figure: Example network.


Figure: Path losses ℓ^k_p(µ(τ)) and strategies under the Hedge algorithm with γ_τ = 1/(10 + τ).
Population 1: paths p₀ = (v₀, v₄, v₅, v₁), p₁ = (v₀, v₄, v₆, v₁), p₂ = (v₀, v₁); µ¹(0) uniform; lim_{τ→∞} µ¹(τ) is a Nash equilibrium.
Population 2: paths p₃ = (v₂, v₄, v₅, v₃), p₄ = (v₂, v₄, v₆, v₃), p₅ = (v₂, v₃); µ²(0) uniform; lim_{τ→∞} µ²(τ) is a Nash equilibrium.
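A sketch of a simulation in this spirit (my own: the edge list is reconstructed from the path lists above, and since the talk does not specify the congestion costs, the affine coefficients are invented and the losses are not rescaled to [0, 1]):

```python
import numpy as np

# Edge list reconstructed from the paths p0..p5 above.
edges = [("v0","v4"), ("v4","v5"), ("v5","v1"), ("v4","v6"), ("v6","v1"),
         ("v0","v1"), ("v2","v4"), ("v5","v3"), ("v6","v3"), ("v2","v3")]
e_idx = {e: i for i, e in enumerate(edges)}

paths = {1: [("v0","v4","v5","v1"), ("v0","v4","v6","v1"), ("v0","v1")],
         2: [("v2","v4","v5","v3"), ("v2","v4","v6","v3"), ("v2","v3")]}

def path_edges(p):
    return [e_idx[(p[i], p[i+1])] for i in range(len(p) - 1)]

# Invented affine congestion costs c_e(u) = a_e * u + b_e.
rng = np.random.default_rng(1)
a = rng.uniform(0.5, 2.0, len(edges))
b = rng.uniform(0.0, 0.5, len(edges))

mu = {k: np.ones(3) / 3 for k in paths}        # uniform initial distributions
for t in range(500):
    load = np.zeros(len(edges))
    for k in paths:                             # aggregate edge loads
        for j, p in enumerate(paths[k]):
            load[path_edges(p)] += mu[k][j]
    cost = a * load + b
    gamma = 1.0 / (10 + t)                      # schedule from the figure
    for k in paths:                             # population-level Hedge step
        losses = np.array([cost[path_edges(p)].sum() for p in paths[k]])
        w = mu[k] * np.exp(-gamma * losses)
        mu[k] = w / w.sum()

print(mu[1], mu[2])   # both distributions settle near a Nash equilibrium
```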


Sufficient conditions for convergence of (µ(t))t

We already have µ̄(t) → N.

Sufficient condition: if V(µ(t)) converges (µ(t) itself need not converge), then V(µ(t)) → V*, and hence µ(t) → N, since V is continuous and µ(t) lives in the compact set ∆. (The dense convergent subsequence above forces the limit of V(µ(t)) to be V*.)


Replicator dynamics

Imagine an underlying continuous time, in which update t occurs at time γ₁ + ··· + γ_t.

Figure: Underlying continuous time.

In the update equation π^{(t+1)}_p ∝ π^{(t)}_p e^{−γ_t ℓ^{(t)}_p}, take γ_t → 0. We obtain an autonomous ODE, the replicator equation:

$$\forall p \in P_k, \quad \frac{d\mu^k_p}{dt} = \mu^k_p \left( \langle \ell^k(\mu), \mu^k \rangle - \ell^k_p(\mu) \right) \qquad (1)$$

This equation also appears in evolutionary game theory.
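A quick numerical sketch (my own, with made-up constant losses) of the Euler discretization of equation (1); note each step preserves the simplex, since the increments sum to zero:

```python
import numpy as np

def replicator_step(mu, losses, dt):
    # Euler step of the replicator ODE (1):
    # d mu_p / dt = mu_p * (<losses, mu> - losses_p)
    avg = mu @ losses
    return mu + dt * mu * (avg - losses)

mu = np.ones(3) / 3
losses = np.array([0.9, 0.5, 0.1])   # placeholder constant losses
for _ in range(2000):
    mu = replicator_step(mu, losses, dt=0.01)
print(mu)  # mass flows toward below-average-loss paths
```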

Theorem (Fischer and Vöcking, 2004): every solution of the ODE (1) converges to the set of its stationary points.

Proof: V is a Lyapunov function.


AREP update

A discretization of the continuous-time replicator dynamics:

$$\pi^{(t+1)}_p - \pi^{(t)}_p = \eta_t\, \pi^{(t)}_p \left( \langle \ell^k(\mu^{(t)}), \pi^{(t)} \rangle - \ell^k_p(\mu^{(t)}) \right) + \eta_t\, U^{k,(t+1)}_p$$

where (U^{(t)})_{t≥1} are perturbations satisfying, for all T > 0,

$$\lim_{\tau_1 \to \infty}\; \max_{\tau_2 :\, \sum_{t=\tau_1}^{\tau_2} \eta_t < T} \left\| \sum_{t=\tau_1}^{\tau_2} \eta_t\, U^{(t+1)} \right\| = 0$$

(Benaïm, 1999).


Convergence to Nash equilibria

Theorem: under any no-regret algorithm which is approximate replicator (AREP), µ(t) → N.

The proof uses two facts:
the affine interpolation of (µ(t))_t is an asymptotic pseudo-trajectory of the ODE (1);
V is a Lyapunov function for the set of Nash equilibria.


REP update

A particular case, the REP update: take U = 0,

$$\pi^{(t+1)}_p - \pi^{(t)}_p = \eta_t\, \pi^{(t)}_p \left( \langle \ell^k(\mu^{(t)}), \pi^{(t)} \rangle - \ell^k_p(\mu^{(t)}) \right)$$

For comparison, Hedge written in the same incremental form:

$$\pi^{(t+1)}_p - \pi^{(t)}_p = \eta_t\, \pi^{(t)}_p\, \frac{1}{\eta_t} \left( \frac{e^{-\eta_t \ell^k_p(\mu^{(t)})}}{\sum_{p'} \pi^{(t)}_{p'}\, e^{-\eta_t \ell^k_{p'}(\mu^{(t)})}} - 1 \right)$$

so the Hedge increment matches the REP increment to first order in η_t.
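A numerical check of that first-order agreement (my own illustration, with made-up losses): the gap between the two increments is of order η², so the relative gap shrinks with η:

```python
import numpy as np

def rep_increment(pi, losses, eta):
    # REP increment: eta * pi_p * (<losses, pi> - losses_p)
    return eta * pi * (pi @ losses - losses)

def hedge_increment(pi, losses, eta):
    # Hedge increment: pi_p * (exp(-eta * l_p) / Z - 1), Z the normalizer.
    w = pi * np.exp(-eta * losses)
    return pi * (np.exp(-eta * losses) / w.sum() - 1.0)

pi = np.array([0.5, 0.3, 0.2])
losses = np.array([0.9, 0.5, 0.1])
for eta in (0.5, 0.05, 0.005):
    gap = np.abs(hedge_increment(pi, losses, eta) - rep_increment(pi, losses, eta))
    print(eta, gap.max() / eta)   # relative gap shrinks linearly in eta
```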


Mirror Descent

Consider the convex problem: minimize V(µ) over µ ∈ ∆.

Algorithm 1: Mirror Descent Method
1: for t ∈ ℕ do
2:   µ(t+1) = argmin_{µ∈∆} ⟨∇V(µ(t)), µ⟩ + (1/η_t) D_ψ(µ, µ(t))
3: end for

where D_ψ is a Bregman divergence: D_ψ(µ, ν) = ψ(µ) − ψ(ν) − ⟨∇ψ(ν), µ − ν⟩.

Figure: Mirror Descent iteration; each step minimizes the local model V(µ(t)) + ⟨∇V(µ(t)), µ − µ(t)⟩ + (1/η) D_ψ(µ, µ(t)).

Hedge = Mirror Descent on V. Take D_ψ(µ, ν) = ∑_k D_KL(µ^k, ν^k). The update becomes

$$\mu^{(t+1)} = \operatorname*{arg\,min}_{\mu \in \Delta^{P_1} \times \cdots \times \Delta^{P_K}}\; \sum_k \langle \ell^k(\mu^{(t)}), \mu^k \rangle + \frac{1}{\eta_t} D_{KL}(\mu^k, \mu^{k,(t)})$$

whose solution is the Hedge update with learning rate η_t:

$$\mu^{k,(t+1)}_p \propto \mu^{k,(t)}_p\, e^{-\eta_t\, \ell^{k,(t)}_p}$$

General result: µ̄(T) = ∑_{t≤T} η_t µ(t) / ∑_{t≤T} η_t converges to N for any Mirror Descent method.
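Why the KL geometry yields exactly Hedge: a short Lagrangian sketch (my reconstruction of the standard mirror descent computation, not copied from the slides):

```latex
% Minimize <l, mu> + (1/eta) KL(mu, mu^(t)) over the simplex.
% Lagrangian with multiplier lambda for the constraint sum_p mu_p = 1:
L(\mu, \lambda) = \sum_p \mu_p \ell_p
  + \tfrac{1}{\eta} \sum_p \mu_p \ln\tfrac{\mu_p}{\mu_p^{(t)}}
  + \lambda \Big( \sum_p \mu_p - 1 \Big)
% Stationarity: \ell_p + \tfrac{1}{\eta}\big(\ln\tfrac{\mu_p}{\mu_p^{(t)}} + 1\big) + \lambda = 0
% \Rightarrow \mu_p = \mu_p^{(t)} e^{-\eta \ell_p}\, e^{-1 - \eta \lambda},
% and the constraint fixes the constant, giving the Hedge update:
\mu_p^{(t+1)} = \frac{\mu_p^{(t)} e^{-\eta \ell_p}}{\sum_{p'} \mu_{p'}^{(t)} e^{-\eta \ell_{p'}}}
```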


Strong convergence of Mirror Descent

Suppose V is convex with L-Lipschitz gradient. If η_t is small enough, the MD update guarantees V(µ(t+1)) ≤ V(µ(t)).

Figure: Mirror Descent iteration for a function with L-Lipschitz gradient: (a) with a large η, the regularized model need not upper-bound V; (b) with a small η, it does, so each step decreases V.

V(µ(t)) is then monotone, hence converges, so µ(t) → N.
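A small self-contained check of that monotonicity (my own illustration; the one-population potential with independent edges and invented affine cost coefficients is an assumption, as is the step size η = 0.1):

```python
import numpy as np

# One population, 3 paths, each on its own edge with cost c_p(u) = a_p u + b_p,
# so V(mu) = sum_p (a_p mu_p^2 / 2 + b_p mu_p) and grad V = a * mu + b.
a = np.array([1.0, 2.0, 0.5])
b = np.array([0.1, 0.0, 0.3])
V = lambda mu: np.sum(0.5 * a * mu**2 + b * mu)
grad = lambda mu: a * mu + b

mu = np.array([0.2, 0.7, 0.1])
eta = 0.1                              # small relative to L = max(a)
values = []
for _ in range(100):
    w = mu * np.exp(-eta * grad(mu))   # entropic mirror descent = Hedge step
    mu = w / w.sum()
    values.append(V(mu))
# V decreases monotonically along the iterates:
assert all(x >= y - 1e-9 for x, y in zip(values, values[1:]))
print(values[-1], mu)
```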


Summary

Convergence of µ̄(t) under no-regret updates.
Convergence of a dense subsequence (µ(t))_{t∈𝒯}.
Convergence of µ(t) for no-regret AREP updates (e.g. Hedge, REP).
Convergence of µ(t) for MD updates, with a convergence rate (e.g. Hedge).

Future work: the bandit setting; stochastic perturbations on the losses.


Thank you. Poster M43.
walid@cs.berkeley.edu · benjamin.drighes@polytechnique.edu · bayen@berkeley.edu