On the Convergence of No-regret Learning in Selfish Routing

  1. On the Convergence of No-regret Learning in Selfish Routing
     ICML 2014, Beijing. June 23, 2014.
     Walid Krichene (UC Berkeley, walid@cs.berkeley.edu), Benjamin Drighès (École Polytechnique, benjamin.drighes@polytechnique.edu), Alexandre Bayen (UC Berkeley, bayen@berkeley.edu)

  2. Introduction
     - Routing game: players choose routes.
     - Population distributions: $\mu^{(t)} \in \Delta^{\mathcal{P}_1} \times \cdots \times \Delta^{\mathcal{P}_K}$.
     - Nash equilibria: the set $\mathcal{N}$.
     - Under no-regret dynamics, the averages converge: $\bar\mu^{(t)} = \frac{1}{t} \sum_{\tau \le t} \mu^{(\tau)} \to \mathcal{N}$.
     - Does $\mu^{(t)} \to \mathcal{N}$?

  3. Outline
     1. Online learning in the routing game
     2. Convergence of $\bar\mu^{(t)}$
     3. Convergence of $\mu^{(t)}$

  4. Routing game
     - Figure: Example network (nodes $v_0, \ldots, v_6$).
     - Directed graph $(V, E)$.
     - Population $X_k$, with path set $\mathcal{P}_k$.

  5. Routing game (continued)
     - Player $x \in X_k$: distribution over paths $\pi(x) \in \Delta^{\mathcal{P}_k}$.
     - Population distribution over paths: $\mu^k \in \Delta^{\mathcal{P}_k}$, with $\mu^k = \int_{X_k} \pi(x)\, dm(x)$.
     - Loss on path $p$: $\ell^k_p(\mu)$.
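
To make these objects concrete, here is a minimal sketch of the routing-game data for a single population on a hypothetical two-link network; the network, cost functions, and numbers are illustrative assumptions, not the deck's example network.

```python
import numpy as np

# Hypothetical network: two parallel edges between one source and one sink.
edge_costs = [lambda u: 1.0 + 2.0 * u,   # c_0(u)
              lambda u: 2.0 + 0.5 * u]   # c_1(u)
paths = [[0], [1]]                        # each path is a list of edge indices
mass = 1.0                                # total mass of the single population

def path_losses(mu):
    """Path losses ell_p(mu) given the path distribution mu."""
    # Edge loads (M mu)_e: mass of all paths that use edge e.
    loads = np.zeros(len(edge_costs))
    for prob, edges in zip(mu, paths):
        for e in edges:
            loads[e] += mass * prob
    # Path loss = sum of edge costs along the path, at the current loads.
    return np.array([sum(edge_costs[e](loads[e]) for e in edges)
                     for edges in paths])

mu = np.array([0.5, 0.5])    # uniform distribution over the two paths
print(path_losses(mu))       # [2.0, 2.25]
```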

  7. Online learning model
     - Maintain a strategy $\pi^{(t)} \in \Delta^{\mathcal{P}_1}$.
     - Sample a path $p \sim \pi^{(t)}$.
     - Discover the loss vector $\ell^{(t)} \in [0, 1]^{\mathcal{P}_1}$.
     - Update to $\pi^{(t+1)}$.

  8. The Hedge algorithm
     - Update the distribution according to the observed loss:
       $\pi^{(t+1)}_p \propto \pi^{(t)}_p \, e^{-\eta_t \ell^{k(t)}_p}$
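
A minimal sketch of one round of the online learning model with the Hedge update; the number of paths, the loss values, and the learning rate are made-up assumptions.

```python
import numpy as np

def hedge_update(pi, losses, eta):
    """One Hedge step: reweight by exp(-eta * loss) and renormalize."""
    w = pi * np.exp(-eta * losses)
    return w / w.sum()

rng = np.random.default_rng(0)
pi = np.ones(3) / 3                     # pi^(t): uniform over 3 paths
p = rng.choice(len(pi), p=pi)           # sample a path p ~ pi^(t)
losses = np.array([0.4, 0.7, 0.2])      # discovered loss vector in [0, 1]^P
pi = hedge_update(pi, losses, eta=0.5)  # pi^(t+1): shifts mass toward path 2
```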

  9. Nash equilibria
     - Nash equilibrium: $\mu \in \mathcal{N}$ if for every $k$ and every $p \in \mathcal{P}_k$ with positive mass, $\ell^k_p(\mu) \le \ell^k_{p'}(\mu)$ for all $p' \in \mathcal{P}_k$.
     - How to compute Nash equilibria?

  10. Nash equilibria (continued)
     - How to compute Nash equilibria? Through a convex formulation.
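
As a numerical sanity check on the definition, a small sketch that tests the equilibrium condition; the argument names and the tolerance are assumptions of this sketch.

```python
import numpy as np

def is_nash(mu_list, loss_list, tol=1e-8):
    """Check that, in every population k, every path carrying positive mass
    has (approximately) minimal loss among that population's paths."""
    for mu_k, ell_k in zip(mu_list, loss_list):
        best = ell_k.min()
        if np.any((mu_k > tol) & (ell_k > best + tol)):
            return False
    return True
```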

  11. Nash equilibria: convex potential function
     - $V(\mu) = \sum_e \int_0^{(M\mu)_e} c_e(u)\, du$
     - $V$ is convex, and $\nabla_{\mu^k} V(\mu) = \ell^k(\mu)$.
     - The minimizer is not unique.
     - How do players find a Nash equilibrium? Iterative play.

  12. Nash equilibria: convex potential function (continued)
     - Ideally, the iterative play is distributed and has reasonable information requirements.
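
A sketch of the potential in the single-population case, reusing the hypothetical two-link network from the earlier sketch; the edge integrals are evaluated numerically.

```python
import numpy as np
from scipy.integrate import quad

def potential(mu, paths, edge_costs, mass=1.0):
    """Convex potential V(mu) = sum_e integral_0^{(M mu)_e} c_e(u) du
    for a single population (multiple populations would add up edge loads)."""
    loads = np.zeros(len(edge_costs))
    for prob, edges in zip(mu, paths):
        for e in edges:
            loads[e] += mass * prob
    return sum(quad(c, 0.0, load)[0] for c, load in zip(edge_costs, loads))

# Hypothetical two-link network from the earlier sketch:
edge_costs = [lambda u: 1.0 + 2.0 * u, lambda u: 2.0 + 0.5 * u]
paths = [[0], [1]]
print(potential(np.array([0.5, 0.5]), paths, edge_costs))  # 0.75 + 1.0625 = 1.8125
```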

  13. Assume sublinear regret dynamics
     - Losses are in $[0, 1]$.
     - The expected loss of player $x$ at round $t$ is $\langle \pi^{(t)}(x), \ell^k(\mu^{(t)}) \rangle$.
     - Discounted regret:
       $\bar r^{(T)}(x) = \dfrac{\sum_{t \le T} \gamma_t \langle \pi^{(t)}(x), \ell^k(\mu^{(t)}) \rangle - \min_p \sum_{t \le T} \gamma_t \ell^k_p(\mu^{(t)})}{\sum_{t \le T} \gamma_t}$

  14. Assume sublinear regret dynamics (continued)
     - Assumptions on the discount sequence: $\gamma_t > 0$, $\gamma_t \downarrow 0$, and $\sum_t \gamma_t = \infty$.
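
A sketch of the discounted-regret computation for one player, assuming the full loss vectors $\ell^{(t)}$ are recorded at every round; the array names are assumptions of this sketch.

```python
import numpy as np

def discounted_regret(gammas, pis, losses):
    """Discounted regret after T rounds.
    gammas: (T,) discount weights gamma_t; pis: (T, P) strategies pi^(t);
    losses: (T, P) loss vectors ell^(t) observed at each round."""
    gammas, pis, losses = map(np.asarray, (gammas, pis, losses))
    incurred = np.sum(gammas * np.einsum("tp,tp->t", pis, losses))
    best_fixed = np.min(gammas @ losses)   # best single path in hindsight
    return (incurred - best_fixed) / gammas.sum()
```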

  15. Convergence to Nash equilibria
     - Population regret: $\bar r^{k(T)} = \dfrac{1}{m(X_k)} \int_{X_k} \bar r^{(T)}(x)\, dm(x)$
     - Convergence of averages to Nash equilibria: if an update rule has sublinear population regret, then $\bar\mu^{(T)} = \sum_{t \le T} \gamma_t \mu^{(t)} / \sum_{t \le T} \gamma_t$ converges, $\lim_{T \to \infty} d\big(\bar\mu^{(T)}, \mathcal{N}\big) = 0$.

  16. Convergence to Nash equilibria (continued)
     - Proof: show that $V(\bar\mu^{(T)}) - V(\mu^\star) \le \sum_k \bar r^{k(T)}$.
     - A similar result appears in Blum et al. (2006).
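
A sketch of the convexity argument behind the inequality, with the population masses $m(X_k)$ normalized to 1 for readability (an assumption of this sketch, not of the deck):

```latex
% Convexity of V and \nabla_{\mu^k} V = \ell^k give, for every round t,
%   V(\mu^{(t)}) - V(\mu^\star)
%     \le \sum_k \langle \ell^k(\mu^{(t)}),\, \mu^{k(t)} - \mu^{\star k} \rangle .
% Average with weights \gamma_t / \sum_{\tau \le T} \gamma_\tau, bound the
% comparator term by the best fixed path, and use convexity once more,
% V(\bar\mu^{(T)}) \le \sum_{t \le T} \tfrac{\gamma_t}{\sum_{\tau \le T} \gamma_\tau} V(\mu^{(t)}):
\begin{align*}
V(\bar\mu^{(T)}) - V(\mu^\star)
  \;\le\; \sum_k \Bigg(
     \frac{\sum_{t \le T} \gamma_t \langle \mu^{k(t)}, \ell^k(\mu^{(t)}) \rangle}
          {\sum_{t \le T} \gamma_t}
     \;-\;
     \min_{p \in \mathcal{P}_k}
     \frac{\sum_{t \le T} \gamma_t\, \ell^k_p(\mu^{(t)})}
          {\sum_{t \le T} \gamma_t}
  \Bigg)
  \;\le\; \sum_k \bar r^{k(T)} .
\end{align*}
```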

  17. Convergence of a dense subsequence
     - Proposition: under any algorithm with sublinear discounted regret, a dense subsequence of $(\mu^{(t)})_t$ converges to $\mathcal{N}$.
     - Here the subsequence $(\mu^{(t)})_{t \in \mathcal{T}}$ converges, where $\mathcal{T}$ is dense in the sense that $\lim_{T \to \infty} \dfrac{\sum_{t \in \mathcal{T},\, t \le T} \gamma_t}{\sum_{t \le T} \gamma_t} = 1$.

  18. Convergence of a dense subsequence (continued)
     - Proof: absolute Cesàro convergence implies convergence along a dense subsequence.

  19. Example: Hedge with learning rates $\gamma_\tau$
     - $\pi^{(t+1)}_p \propto \pi^{(t)}_p \, e^{-\eta_t \ell^{k(t)}_p}$
     - Regret bound: under Hedge with $\eta_t = \gamma_t$,
       $\bar r^{(T)}(x) \le \rho \, \dfrac{-\ln \pi^{(0)}_{\min}(x) + c \sum_{t \le T} \gamma_t^2}{\sum_{t \le T} \gamma_t}$

  20. Simulations
     - Figure: Example network (the same network as before).

  21. Simulations (continued)
     - Population 1: paths $p_0 = (v_0, v_4, v_5, v_1)$, $p_1 = (v_0, v_4, v_6, v_1)$, $p_2 = (v_0, v_1)$; $\mu^{1(0)}$ uniform; $\lim_{\tau \to \infty} \mu^{1(\tau)}$ is a Nash equilibrium.
     - Population 2: paths $p_3 = (v_2, v_4, v_5, v_3)$, $p_4 = (v_2, v_4, v_6, v_3)$, $p_5 = (v_2, v_3)$; $\mu^{2(0)}$ uniform; $\lim_{\tau \to \infty} \mu^{2(\tau)}$ is a Nash equilibrium.
     - Figure: Path losses $\ell^k_p(\mu^{(\tau)})$ and strategies for the Hedge algorithm with $\gamma_\tau = 1/(10 + \tau)$.
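
A self-contained sketch in the spirit of this experiment, run on the hypothetical two-link network from the earlier sketches (not the deck's example network), with the same schedule $\gamma_\tau = 1/(10 + \tau)$ and $\eta_\tau = \gamma_\tau$:

```python
import numpy as np

def path_losses(mu):
    """Losses on the two hypothetical parallel links: c_0(u) = 1 + 2u, c_1(u) = 2 + 0.5u."""
    return np.array([1.0 + 2.0 * mu[0], 2.0 + 0.5 * mu[1]])

mu = np.array([0.5, 0.5])                # mu^(0): uniform over the two paths
for tau in range(2000):
    gamma = 1.0 / (10.0 + tau)
    ell = path_losses(mu)                # ell_p(mu^(tau))
    mu = mu * np.exp(-gamma * ell)       # Hedge step with eta_tau = gamma_tau
    mu /= mu.sum()
print(mu)  # slowly approaches the split where both path losses equalize (about [0.6, 0.4] here)
```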

  22. Sufficient conditions for convergence of $(\mu^{(t)})_t$
     - So far, we have $\bar\mu^{(t)} \to \mathcal{N}$.

  23. Sufficient conditions for convergence of $(\mu^{(t)})_t$ (continued)
     - Sufficient condition: if $V(\mu^{(t)})$ converges (the sequence $\mu^{(t)}$ itself need not converge), then $V(\mu^{(t)}) \to V^\star$ (the limit must be $V^\star$, since a dense subsequence already converges to $\mathcal{N}$), and hence $\mu^{(t)} \to \mathcal{N}$, because $V$ is continuous and $\mu^{(t)}$ lies in the compact set $\Delta$.

  24. Replicator dynamics
     - Imagine an underlying continuous time: updates happen at times $\gamma_1$, $\gamma_1 + \gamma_2$, ...
     - Figure: Underlying continuous time.

  25. Replicator dynamics (continued)
     - In the update equation $\mu^{(t+1)}_p \propto \mu^{(t)}_p \, e^{-\gamma_t \ell^{(t)}_p}$, take $\gamma_t \to 0$.
     - We obtain an autonomous ODE, the replicator equation: for all $p \in \mathcal{P}_k$,
       $\dfrac{d\mu^k_p}{dt} = \mu^k_p \big( \langle \ell^k(\mu), \mu^k \rangle - \ell^k_p(\mu) \big)$   (1)
     - The same equation appears in evolutionary game theory.
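
A sketch of a forward-Euler discretization of equation (1) for one population, with a fixed, made-up loss vector; in the routing game the losses would be re-evaluated at the current $\mu$ on every step.

```python
import numpy as np

def replicator_step(mu, ell, dt):
    """Forward-Euler step of d mu_p / dt = mu_p * (<ell, mu> - ell_p)."""
    avg = float(mu @ ell)
    return mu + dt * mu * (avg - ell)

mu = np.array([0.2, 0.3, 0.5])     # distribution over three paths
ell = np.array([0.9, 0.5, 0.2])    # fixed losses, for illustration only
for _ in range(1000):
    mu = replicator_step(mu, ell, dt=0.01)
print(mu)  # mass concentrates on the lowest-loss path (index 2)
```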

  26. Replicator dynamics (continued)
     - Replicator equation (restated): for all $p \in \mathcal{P}_k$, $\dfrac{d\mu^k_p}{dt} = \mu^k_p \big( \langle \ell^k(\mu), \mu^k \rangle - \ell^k_p(\mu) \big)$.

  27. Replicator dynamics (continued)
     - Theorem (Fischer and Vöcking, 2004): every solution of the ODE (1) converges to the set of its stationary points.

  28. Replicator dynamics (continued)
     - Proof: $V$ is a Lyapunov function.
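
A sketch of the Lyapunov computation, using $\nabla_{\mu^k} V(\mu) = \ell^k(\mu)$ together with equation (1):

```latex
\begin{align*}
\frac{d}{dt} V(\mu(t))
  &= \sum_k \big\langle \ell^k(\mu),\, \dot{\mu}^k \big\rangle
   = \sum_k \sum_{p \in \mathcal{P}_k}
     \ell^k_p(\mu)\, \mu^k_p \big( \langle \ell^k(\mu), \mu^k \rangle - \ell^k_p(\mu) \big) \\
  &= \sum_k \Big( \langle \ell^k(\mu), \mu^k \rangle^2
     - \sum_{p \in \mathcal{P}_k} \mu^k_p\, \ell^k_p(\mu)^2 \Big)
   = - \sum_k \operatorname{Var}_{p \sim \mu^k}\!\big[ \ell^k_p(\mu) \big]
   \;\le\; 0 ,
\end{align*}
% with equality exactly when, in every population, all paths carried by mu^k
% have equal loss, i.e. at stationary points of (1).
```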
