
On the Convergence of No-regret Learning in Selfish Routing

ICML 2014 - Beijing

Walid Krichene¹, Benjamin Drighès², Alexandre Bayen³

UC Berkeley · École Polytechnique

June 23, 2014

¹walid@cs.berkeley.edu ²benjamin.drighes@polytechnique.edu ³bayen@berkeley.edu


Introduction

Routing game: players choose routes.
Population distributions: µ(t) ∈ ∆^{P₁} × ··· × ∆^{P_K}.
Set of Nash equilibria: N.
Under no-regret dynamics, the time average µ̄(t) = (1/t) ∑_{τ≤t} µ(τ) converges to N.
Question: does µ(t) itself converge to N?


Outline

1. Online learning in the routing game
2. Convergence of µ̄(t)
3. Convergence of µ(t)


Routing game

Figure: Example network.

Directed graph (V, E).
Population X_k, with set of paths P_k.
Player x ∈ X_k: distribution over paths π(x) ∈ ∆^{P_k}.
Population distribution over paths µ^k ∈ ∆^{P_k}, given by µ^k = ∫_{X_k} π(x) dm(x).
Loss on path p: ℓ^k_p(µ).


Online learning model

Each player maintains a distribution π(t) ∈ ∆^{P₁}. At each round:
1. Sample a path p ∼ π(t).
2. Discover the loss vector ℓ(t) ∈ [0, 1]^{P₁}.
3. Update to π(t+1).


The Hedge algorithm

Hedge algorithm: update the distribution according to the observed loss,

$$\pi^{(t+1)}_p \propto \pi^{(t)}_p\, e^{-\eta_t\, \ell^{k,(t)}_p}$$
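To make the update concrete, here is a minimal NumPy sketch of one Hedge step (my own illustration; the loss values and the schedule η_t = 1/(10 + t) below are placeholders, not prescribed by the talk):

```python
import numpy as np

def hedge_update(pi, loss, eta):
    # Multiplicative-weights step: reweight by exp(-eta * loss), renormalize.
    w = pi * np.exp(-eta * loss)
    return w / w.sum()

# Illustrative run: 3 paths, uniform start, decaying rate eta_t = 1/(10 + t).
pi = np.ones(3) / 3
for t in range(200):
    loss = np.array([0.9, 0.5, 0.1])  # placeholder for l^k(mu^(t)), values in [0, 1]
    pi = hedge_update(pi, loss, eta=1.0 / (10 + t))
print(pi)  # mass concentrates on the lowest-loss path
```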


Nash equilibria

µ ∈ N is a Nash equilibrium if for every k and every p ∈ P_k with positive mass,

$$\ell^k_p(\mu) \le \ell^k_{p'}(\mu) \quad \forall p' \in P_k.$$

How to compute Nash equilibria? A convex formulation.

Convex potential function:

$$V(\mu) = \sum_e \int_0^{(M\mu)_e} c_e(u)\, du$$

(M is the path-to-edge incidence operator, c_e the congestion cost on edge e.) V is convex, and ∇_{µ^k} V(µ) = ℓ^k(µ). The minimizer is not unique. How do players find a Nash equilibrium? Iterative play; ideally distributed, with reasonable information requirements.
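To make V concrete, a small sketch (my own; the incidence matrix and affine cost coefficients below are invented for illustration) that evaluates the potential and its gradient in closed form for affine edge costs c_e(u) = a_e u + b_e:

```python
import numpy as np

# Illustrative setup: 3 paths, 4 edges; M[e, p] = 1 iff path p uses edge e.
M = np.array([[1, 0, 1],
              [0, 1, 1],
              [1, 1, 0],
              [0, 0, 1]])
a = np.array([1.0, 2.0, 0.5, 1.0])   # slopes of the affine edge costs
b = np.array([0.1, 0.0, 0.3, 0.2])   # offsets

def potential(mu):
    # V(mu) = sum_e int_0^{(M mu)_e} c_e(u) du; for c_e(u) = a_e u + b_e
    # the integral evaluates to a_e x_e^2 / 2 + b_e x_e with x = M mu.
    x = M @ mu
    return np.sum(0.5 * a * x**2 + b * x)

def path_losses(mu):
    # Gradient identity grad V = M^T c(M mu): a path's loss is the sum of
    # the costs of its edges.
    x = M @ mu
    return M.T @ (a * x + b)

mu = np.ones(3) / 3
print(potential(mu), path_losses(mu))
```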


Assume sublinear regret dynamics

Losses are in [0, 1]. The expected loss of player x at time t is ⟨π(t)(x), ℓ^k(µ(t))⟩.

Discounted regret:

$$\bar r^{(T)}(x) = \frac{\sum_{t \le T} \gamma_t \langle \pi^{(t)}(x), \ell^k(\mu^{(t)})\rangle - \min_p \sum_{t \le T} \gamma_t\, \ell^{k,(t)}_p}{\sum_{t \le T} \gamma_t}$$

Assumptions on the discount factors: γ_t > 0, γ_t ↓ 0, and ∑_t γ_t = ∞.
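As an illustration (my own sketch, with synthetic play and loss histories), the discounted regret of a single player can be computed from a trajectory as follows:

```python
import numpy as np

def discounted_regret(pis, losses, gammas):
    """Discounted regret of one player.

    pis    : (T, P) array of play distributions pi^(t)
    losses : (T, P) array of path losses l^k(mu^(t)), values in [0, 1]
    gammas : (T,) array of positive discount factors gamma_t
    """
    expected = np.sum(gammas * np.einsum('tp,tp->t', pis, losses))
    best_fixed = np.min(losses.T @ gammas)   # best single path in hindsight
    return (expected - best_fixed) / gammas.sum()

# Synthetic example: T = 100 rounds, 3 paths, gamma_t = 1/(10 + t).
T, P = 100, 3
rng = np.random.default_rng(0)
pis = rng.dirichlet(np.ones(P), size=T)
losses = rng.random((T, P))
gammas = 1.0 / (10 + np.arange(T))
print(discounted_regret(pis, losses, gammas))
```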

Convergence to Nash equilibria

Population regret:

$$\bar r^{\,k,(T)} = \frac{1}{m(X_k)} \int_{X_k} \bar r^{(T)}(x)\, dm(x)$$

Convergence of averages to Nash equilibria: if an update has sublinear population regret, then the discounted average

$$\bar\mu^{(T)} = \frac{\sum_{t \le T} \gamma_t\, \mu^{(t)}}{\sum_{t \le T} \gamma_t}$$

converges: lim_{T→∞} d(µ̄(T), N) = 0.

Proof idea: show that V(µ̄(T)) − V(µ*) ≤ ∑_k r̄^{k,(T)}. A similar result appears in Blum et al. (2006).
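The key inequality combines Jensen's inequality with first-order convexity; a sketch of the chain of bounds (my reconstruction of the standard argument, not copied from the slides):

```latex
% Jensen, with weights \gamma_t / \sum_{s \le T} \gamma_s, since V is convex:
V(\bar\mu^{(T)}) \;\le\; \frac{\sum_{t \le T} \gamma_t\, V(\mu^{(t)})}{\sum_{t \le T} \gamma_t}
% First-order convexity at each \mu^{(t)}, using \nabla_{\mu^k} V = \ell^k:
V(\mu^{(t)}) - V(\mu^\ast)
  \;\le\; \sum_k \big\langle \ell^k(\mu^{(t)}),\; \mu^{k,(t)} - \mu^{\ast k} \big\rangle
% Averaging with weights \gamma_t, then lower-bounding the terms in \mu^{\ast k}
% by the best fixed path, bounds the average by the population regrets
% (up to the population masses m(X_k)), which vanish as T grows.
```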


Convergence of a dense subsequence

Proposition: under any algorithm with sublinear discounted regret, a dense subsequence of (µ(t))_t converges to N. A subsequence (µ(t))_{t∈𝒯} is dense if

$$\lim_{T\to\infty} \frac{\sum_{t \in \mathcal{T},\, t \le T} \gamma_t}{\sum_{t \le T} \gamma_t} = 1.$$

Proof: absolute Cesàro convergence implies convergence of a dense subsequence.
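One way to see the proof step (my paraphrase of the standard Markov-type argument): if the nonnegative distances a_t = d(µ(t), N) converge to 0 in the γ-weighted Cesàro sense, then for each ε > 0 the rounds with a_t ≥ ε have vanishing γ-density:

```latex
% With a_t := d(\mu^{(t)}, \mathcal{N}) \ge 0 and weighted averages \to 0:
\frac{\sum_{t \le T,\; a_t \ge \varepsilon} \gamma_t}{\sum_{t \le T} \gamma_t}
  \;\le\; \frac{1}{\varepsilon}\,
  \frac{\sum_{t \le T} \gamma_t\, a_t}{\sum_{t \le T} \gamma_t}
  \;\longrightarrow\; 0 \quad (T \to \infty),
% so \mathcal{T}_\varepsilon = \{ t : a_t < \varepsilon \} has \gamma-density 1;
% a diagonal argument over \varepsilon \downarrow 0 yields a dense
% subsequence converging to \mathcal{N}.
```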


Example: Hedge with learning rates γτ

$$\pi^{(t+1)}_p \propto \pi^{(t)}_p\, e^{-\eta_t\, \ell^{k,(t)}_p}$$

Regret bound: under Hedge with η_t = γ_t,

$$\bar r^{(T)}(x) \le \frac{\rho \ln \frac{1}{\pi^{(0)}_{\min}(x)} + c \sum_{t \le T} \gamma_t^2}{\sum_{t \le T} \gamma_t}$$

In particular, for γ_t = 1/(10 + t) we have ∑_t γ_t = ∞ while ∑_t γ_t² < ∞, so the discounted regret is sublinear.

Simulations

Figure: Example network.


Figure: Path losses ℓ^k_p(µ(τ)) and strategies under the Hedge algorithm with γ_τ = 1/(10 + τ).
Population 1: paths p₀ = (v₀, v₄, v₅, v₁), p₁ = (v₀, v₄, v₆, v₁), p₂ = (v₀, v₁); µ¹(0) uniform; lim_{τ→∞} µ¹(τ) is a Nash equilibrium.
Population 2: paths p₃ = (v₂, v₄, v₅, v₃), p₄ = (v₂, v₄, v₆, v₃), p₅ = (v₂, v₃); µ²(0) uniform; lim_{τ→∞} µ²(τ) is a Nash equilibrium.
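A sketch of a simulation in this spirit (my own: the edge list is reconstructed from the path lists above, and since the talk does not specify the congestion costs, the affine coefficients are invented and the losses are not rescaled to [0, 1]):

```python
import numpy as np

# Edge list reconstructed from the paths p0..p5 above.
edges = [("v0","v4"), ("v4","v5"), ("v5","v1"), ("v4","v6"), ("v6","v1"),
         ("v0","v1"), ("v2","v4"), ("v5","v3"), ("v6","v3"), ("v2","v3")]
e_idx = {e: i for i, e in enumerate(edges)}

paths = {1: [("v0","v4","v5","v1"), ("v0","v4","v6","v1"), ("v0","v1")],
         2: [("v2","v4","v5","v3"), ("v2","v4","v6","v3"), ("v2","v3")]}

def path_edges(p):
    return [e_idx[(p[i], p[i+1])] for i in range(len(p) - 1)]

# Invented affine congestion costs c_e(u) = a_e * u + b_e.
rng = np.random.default_rng(1)
a = rng.uniform(0.5, 2.0, len(edges))
b = rng.uniform(0.0, 0.5, len(edges))

mu = {k: np.ones(3) / 3 for k in paths}        # uniform initial distributions
for t in range(500):
    load = np.zeros(len(edges))
    for k in paths:                             # aggregate edge loads
        for j, p in enumerate(paths[k]):
            load[path_edges(p)] += mu[k][j]
    cost = a * load + b
    gamma = 1.0 / (10 + t)                      # schedule from the figure
    for k in paths:                             # population-level Hedge step
        losses = np.array([cost[path_edges(p)].sum() for p in paths[k]])
        w = mu[k] * np.exp(-gamma * losses)
        mu[k] = w / w.sum()

print(mu[1], mu[2])   # both distributions settle near a Nash equilibrium
```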


Sufficient conditions for convergence of (µ(t))t

We already have µ̄(t) → N.

Sufficient condition: if V(µ(t)) converges (µ(t) itself need not converge), then V(µ(t)) → V*, and hence µ(t) → N, since V is continuous and µ(t) lives in the compact set ∆. (The dense convergent subsequence above forces the limit of V(µ(t)) to be V*.)


Replicator dynamics

Imagine an underlying continuous time, in which update t occurs at time γ₁ + ··· + γ_t.

Figure: Underlying continuous time.

In the update equation π^{(t+1)}_p ∝ π^{(t)}_p e^{−γ_t ℓ^{(t)}_p}, take γ_t → 0. We obtain an autonomous ODE, the replicator equation:

$$\forall p \in P_k, \quad \frac{d\mu^k_p}{dt} = \mu^k_p \left( \langle \ell^k(\mu), \mu^k \rangle - \ell^k_p(\mu) \right) \qquad (1)$$

This equation also appears in evolutionary game theory.
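A quick numerical sketch (my own, with made-up constant losses) of the Euler discretization of equation (1); note each step preserves the simplex, since the increments sum to zero:

```python
import numpy as np

def replicator_step(mu, losses, dt):
    # Euler step of the replicator ODE (1):
    # d mu_p / dt = mu_p * (<losses, mu> - losses_p)
    avg = mu @ losses
    return mu + dt * mu * (avg - losses)

mu = np.ones(3) / 3
losses = np.array([0.9, 0.5, 0.1])   # placeholder constant losses
for _ in range(2000):
    mu = replicator_step(mu, losses, dt=0.01)
print(mu)  # mass flows toward below-average-loss paths
```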

Theorem (Fischer and Vöcking, 2004): every solution of the ODE (1) converges to the set of its stationary points.

Proof: V is a Lyapunov function.


AREP update

A discretization of the continuous-time replicator dynamics:

$$\pi^{(t+1)}_p - \pi^{(t)}_p = \eta_t\, \pi^{(t)}_p \left( \langle \ell^k(\mu^{(t)}), \pi^{(t)} \rangle - \ell^k_p(\mu^{(t)}) \right) + \eta_t\, U^{k,(t+1)}_p$$

where (U^{(t)})_{t≥1} are perturbations satisfying, for all T > 0,

$$\lim_{\tau_1 \to \infty}\; \max_{\tau_2 :\, \sum_{t=\tau_1}^{\tau_2} \eta_t < T} \left\| \sum_{t=\tau_1}^{\tau_2} \eta_t\, U^{(t+1)} \right\| = 0$$

(Benaïm, 1999).


Convergence to Nash equilibria

Theorem: under any no-regret algorithm which is approximate replicator (AREP), µ(t) → N.

The proof uses two facts:
the affine interpolation of (µ(t))_t is an asymptotic pseudo-trajectory of the ODE (1);
V is a Lyapunov function for the set of Nash equilibria.


REP update

A particular case, the REP update: take U = 0,

$$\pi^{(t+1)}_p - \pi^{(t)}_p = \eta_t\, \pi^{(t)}_p \left( \langle \ell^k(\mu^{(t)}), \pi^{(t)} \rangle - \ell^k_p(\mu^{(t)}) \right)$$

For comparison, Hedge written in the same incremental form:

$$\pi^{(t+1)}_p - \pi^{(t)}_p = \eta_t\, \pi^{(t)}_p\, \frac{1}{\eta_t} \left( \frac{e^{-\eta_t \ell^k_p(\mu^{(t)})}}{\sum_{p'} \pi^{(t)}_{p'}\, e^{-\eta_t \ell^k_{p'}(\mu^{(t)})}} - 1 \right)$$

so the Hedge increment matches the REP increment to first order in η_t.
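A numerical check of that first-order agreement (my own illustration, with made-up losses): the gap between the two increments is of order η², so the relative gap shrinks with η:

```python
import numpy as np

def rep_increment(pi, losses, eta):
    # REP increment: eta * pi_p * (<losses, pi> - losses_p)
    return eta * pi * (pi @ losses - losses)

def hedge_increment(pi, losses, eta):
    # Hedge increment: pi_p * (exp(-eta * l_p) / Z - 1), Z the normalizer.
    w = pi * np.exp(-eta * losses)
    return pi * (np.exp(-eta * losses) / w.sum() - 1.0)

pi = np.array([0.5, 0.3, 0.2])
losses = np.array([0.9, 0.5, 0.1])
for eta in (0.5, 0.05, 0.005):
    gap = np.abs(hedge_increment(pi, losses, eta) - rep_increment(pi, losses, eta))
    print(eta, gap.max() / eta)   # relative gap shrinks linearly in eta
```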


Mirror Descent

Consider the convex problem: minimize V(µ) over µ ∈ ∆.

Algorithm 1: Mirror Descent Method
1: for t ∈ ℕ do
2:   µ(t+1) = argmin_{µ∈∆} ⟨∇V(µ(t)), µ⟩ + (1/η_t) D_ψ(µ, µ(t))
3: end for

where D_ψ is a Bregman divergence: D_ψ(µ, ν) = ψ(µ) − ψ(ν) − ⟨∇ψ(ν), µ − ν⟩.

Figure: Mirror Descent iteration; each step minimizes the local model V(µ(t)) + ⟨∇V(µ(t)), µ − µ(t)⟩ + (1/η) D_ψ(µ, µ(t)).

Hedge = Mirror Descent on V. Take D_ψ(µ, ν) = ∑_k D_KL(µ^k, ν^k). The update becomes

$$\mu^{(t+1)} = \operatorname*{arg\,min}_{\mu \in \Delta^{P_1} \times \cdots \times \Delta^{P_K}}\; \sum_k \langle \ell^k(\mu^{(t)}), \mu^k \rangle + \frac{1}{\eta_t} D_{KL}(\mu^k, \mu^{k,(t)})$$

whose solution is the Hedge update with learning rate η_t:

$$\mu^{k,(t+1)}_p \propto \mu^{k,(t)}_p\, e^{-\eta_t\, \ell^{k,(t)}_p}$$

General result: µ̄(T) = ∑_{t≤T} η_t µ(t) / ∑_{t≤T} η_t converges to N for any Mirror Descent method.
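Why the KL geometry yields exactly Hedge: a short Lagrangian sketch (my reconstruction of the standard mirror descent computation, not copied from the slides):

```latex
% Minimize <l, mu> + (1/eta) KL(mu, mu^(t)) over the simplex.
% Lagrangian with multiplier lambda for the constraint sum_p mu_p = 1:
L(\mu, \lambda) = \sum_p \mu_p \ell_p
  + \tfrac{1}{\eta} \sum_p \mu_p \ln\tfrac{\mu_p}{\mu_p^{(t)}}
  + \lambda \Big( \sum_p \mu_p - 1 \Big)
% Stationarity: \ell_p + \tfrac{1}{\eta}\big(\ln\tfrac{\mu_p}{\mu_p^{(t)}} + 1\big) + \lambda = 0
% \Rightarrow \mu_p = \mu_p^{(t)} e^{-\eta \ell_p}\, e^{-1 - \eta \lambda},
% and the constraint fixes the constant, giving the Hedge update:
\mu_p^{(t+1)} = \frac{\mu_p^{(t)} e^{-\eta \ell_p}}{\sum_{p'} \mu_{p'}^{(t)} e^{-\eta \ell_{p'}}}
```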


Strong convergence of Mirror Descent

Suppose V is convex with L-Lipschitz gradient. If η_t is small enough, the MD update guarantees V(µ(t+1)) ≤ V(µ(t)).

Figure: Mirror Descent iteration for a function with L-Lipschitz gradient: (a) with a large η, the regularized model need not upper-bound V; (b) with a small η, it does, so each step decreases V.

V(µ(t)) is then monotone, hence converges, so µ(t) → N.
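A small self-contained check of that monotonicity (my own illustration; the one-population potential with independent edges and invented affine cost coefficients is an assumption, as is the step size η = 0.1):

```python
import numpy as np

# One population, 3 paths, each on its own edge with cost c_p(u) = a_p u + b_p,
# so V(mu) = sum_p (a_p mu_p^2 / 2 + b_p mu_p) and grad V = a * mu + b.
a = np.array([1.0, 2.0, 0.5])
b = np.array([0.1, 0.0, 0.3])
V = lambda mu: np.sum(0.5 * a * mu**2 + b * mu)
grad = lambda mu: a * mu + b

mu = np.array([0.2, 0.7, 0.1])
eta = 0.1                              # small relative to L = max(a)
values = []
for _ in range(100):
    w = mu * np.exp(-eta * grad(mu))   # entropic mirror descent = Hedge step
    mu = w / w.sum()
    values.append(V(mu))
# V decreases monotonically along the iterates:
assert all(x >= y - 1e-9 for x, y in zip(values, values[1:]))
print(values[-1], mu)
```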


Summary

Convergence of µ̄(t) under no-regret updates.
Convergence of a dense subsequence (µ(t))_{t∈𝒯}.
Convergence of µ(t) for no-regret AREP updates (e.g. Hedge, REP).
Convergence of µ(t) for MD updates, with a convergence rate (e.g. Hedge).

Future work: the bandit setting; stochastic perturbations on the losses.


Thank you. Poster M43.
walid@cs.berkeley.edu · benjamin.drighes@polytechnique.edu · bayen@berkeley.edu