

SLIDE 1

www.quanticol.eu

Mean-field methods: what can go wrong?

The decoupling assumption: a zoom on the fixed point and on mean-field games
Nicolas Gast (Inria) and Luca Bortolussi (UNITS)
Inria, Grenoble, France
SFM, Bertinoro, June 21, 2016


SLIDE 2

Markov chains

Transition graph: S →(1)→ I →(2)→ R →(0.1)→ S.

Infinitesimal generator:
Q = ( −1    1    0
       0   −2    2
      0.1   0  −0.1 )

SLIDE 3

Transient and steady-state analysis

Q = ( −1    1    0
       0   −2    2
      0.1   0  −0.1 )

SLIDE 4

Transient and steady-state analysis

Q = ( −1    1    0
       0   −2    2
      0.1   0  −0.1 )

Transient analysis: the master equation

If X is a CTMC (continuous-time Markov chain) with generator Q:
d/dt P_i(t) = Σ_{j∈S} P_j(t) Q_{ji},   where P_i(t) = P(X(t) = i).
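The master equation above can be checked numerically. A minimal Python sketch (the generator Q is the reconstructed 3-state example from these slides; forward Euler is used for simplicity, not the slides' method):

```python
import numpy as np

# Generator of the 3-state chain from the slides:
# S -> I at rate 1, I -> R at rate 2, R -> S at rate 0.1.
Q = np.array([[-1.0, 1.0, 0.0],
              [0.0, -2.0, 2.0],
              [0.1, 0.0, -0.1]])

def transient(P0, Q, t, dt=1e-3):
    """Integrate the master equation dP/dt = P Q by forward Euler."""
    P = np.array(P0, dtype=float)
    for _ in range(int(t / dt)):
        P = P + dt * (P @ Q)
    return P

P = transient([1.0, 0.0, 0.0], Q, t=50.0)
print(P)  # close to the stationary distribution for large t
```

Since the rows of Q sum to zero, the scheme preserves the total probability mass exactly.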

SLIDE 5

Transient and steady-state analysis

Q = ( −1    1    0
       0   −2    2
      0.1   0  −0.1 )

Transient analysis: the master equation

If X is a CTMC (continuous-time Markov chain) with generator Q: d/dt P(t) = P(t) Q, where P_i(t) = P(X(t) = i).

SLIDE 6

Transient and steady-state analysis

Q = ( −1    1    0
       0   −2    2
      0.1   0  −0.1 )

Steady-state analysis

SLIDE 7

Transient and steady-state analysis

Q = ( −1    1    0
       0   −2    2
      0.1   0  −0.1 )

Steady-state analysis

If the chain is irreducible:
The equation πQ = 0 has a unique solution such that Σ_i π_i = 1.
lim_{t→∞} P_i(t) = π_i.
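In practice π can be computed by replacing one balance equation of πQ = 0 with the normalization constraint. A Python sketch (the Q below is the 3-state example from the slides):

```python
import numpy as np

Q = np.array([[-1.0, 1.0, 0.0],
              [0.0, -2.0, 2.0],
              [0.1, 0.0, -0.1]])

# Solve pi Q = 0 together with sum(pi) = 1: the columns of Q give
# the balance equations; replace the last one by the normalization.
A = Q.T.copy()
A[-1, :] = 1.0                 # last row now encodes sum_i pi_i = 1
b = np.zeros(3)
b[-1] = 1.0
pi = np.linalg.solve(A, b)
print(pi)  # stationary distribution of the chain
```

For this cyclic chain the flows balance as π_S·1 = π_I·2 = π_R·0.1, giving π = (2/23, 1/23, 20/23).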

SLIDE 8

The state space explosion

3^13 ≈ 10^6 states. We need to keep track of the S^N probabilities P(X1(t) = i1, . . . , XN(t) = iN); the generator Q has S^N × S^N entries.

SLIDE 9

The state space explosion

3^13 ≈ 10^6 states. We need to keep track of the S^N probabilities P(X1(t) = i1, . . . , XN(t) = iN); the generator Q has S^N × S^N entries.

The decoupling assumption is:

P(X1(t) = i1, . . . , XN(t) = iN)   [S^N variables]
≈ P(X1(t) = i1) · · · P(XN(t) = iN)   [N × S variables]

Question: when is this (not) valid?

SLIDE 10

Outline

1. The decoupling method: finite and infinite time horizon (illustration of the method; finite time horizon: some theory; steady-state regime)
2. Rate of convergence
3. Optimal control and mean-field games (centralized control; decentralized control and games)
4. Conclusion and recap

SLIDE 11

Outline

1. The decoupling method: finite and infinite time horizon (illustration of the method; finite time horizon: some theory; steady-state regime)
2. Rate of convergence
3. Optimal control and mean-field games (centralized control; decentralized control and games)
4. Conclusion and recap

SLIDE 12

A cache-replacement policy

(Gast and Van Houdt, 2015)

[Diagram: an application sends requests to a cache backed by a data source.]

SLIDE 13

A cache-replacement policy

(Gast and Van Houdt, 2015)

[Diagram: requests that find the item in the cache are hits.]

SLIDE 14

A cache-replacement policy

(Gast and Van Houdt, 2015)

[Diagram: on a hit, the item is served from the cache; on a miss, the item is fetched from the data source and one cached item is replaced (at random).]

SLIDE 15

A cache-replacement policy

(Gast and Van Houdt, 2015)

[Diagram: on a hit, the item is served from the cache; on a miss, one cached item is replaced (at random).]

Model: all items have the same size. The cache can store m items. There are n items in total. Item i is requested with probability p_i.

Goal

Compute P(item 1 is in the cache). Compute the hit probability.
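The model can be simulated directly. A Python sketch of the RANDOM replacement policy (the Zipf-like popularities and the parameters n = 20, m = 5 are illustrative choices, not from the slides):

```python
import random

def simulate_random_cache(p, m, requests, seed=0):
    """Estimate per-item in-cache probabilities under RANDOM
    replacement: on a miss, a uniformly chosen item is evicted."""
    rng = random.Random(seed)
    n = len(p)
    cache = set(range(m))              # warm start: items 0..m-1
    time_in_cache = [0] * n
    for _ in range(requests):
        i = rng.choices(range(n), weights=p)[0]    # requested item
        if i not in cache:                         # miss
            cache.remove(rng.choice(sorted(cache)))
            cache.add(i)
        for j in cache:                            # accumulate occupancy
            time_in_cache[j] += 1
    return [c / requests for c in time_in_cache]

# illustrative Zipf-like popularity over n = 20 items, cache of m = 5
p = [1.0 / (k + 1) for k in range(20)]
s = sum(p)
p = [pk / s for pk in p]
x = simulate_random_cache(p, m=5, requests=20000)
print(round(sum(x), 2))  # 5.0: the cache always holds exactly m items
```

The per-item estimates x[k] approximate P(item k is in the cache); popular items stay in the cache more often.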

SLIDE 16

A cache-replacement policy

(Gast and Van Houdt, 2015)

[Diagram: on a hit, the item is served from the cache; on a miss, one cached item is replaced (at random).]

Markov model

State space: sets of m distinct items. Transitions: {i1, . . . , im} → {i1, . . . , ik−1, j, ik+1, . . . , im} with probability p_j/m.

SLIDE 17

A cache-replacement policy

(Gast and Van Houdt, 2015)

[Diagram: on a hit, the item is served from the cache; on a miss, one cached item is replaced (at random).]

Markov model

State space: sets of m distinct items. Transitions: {i1, . . . , im} → {i1, . . . , ik−1, j, ik+1, . . . , im} with probability p_j/m.

Decoupling assumption

P(i1, . . . , im) ≈ P(i1) · · · P(im), writing x_i := P(item i is in the cache).

SLIDE 18

A cache-replacement policy

(Gast and Van Houdt, 2015)

[Diagram: on a hit, the item is served from the cache; on a miss, one cached item is replaced (at random).]

Decoupling assumption

P(i1, . . . , im) ≈ P(i1) · · · P(im), writing x_i := P(item i is in the cache).

If we zoom on object k: out → in at rate p_k; in → out when another item j is requested and k is the evicted slot.

SLIDE 19

A cache-replacement policy

(Gast and Van Houdt, 2015)

[Diagram: on a hit, the item is served from the cache; on a miss, one cached item is replaced (at random).]

Decoupling assumption

P(i1, . . . , im) ≈ P(i1) · · · P(im), writing x_i := P(item i is in the cache).

If we zoom on object k: out → in at rate p_k; in → out at rate (1/m) Σ_j p_j(1 − x_j).

SLIDE 20

A cache-replacement policy

(Gast and Van Houdt, 2015)

[Diagram: on a hit, the item is served from the cache; on a miss, one cached item is replaced (at random).]

If we zoom on object k: out → in at rate p_k; in → out at rate (1/m) Σ_j p_j(1 − x_j).

Mean-field model

Let x_k := P(item k is in the cache). Then
ẋ_k = p_k (1 − x_k) − (Σ_ℓ p_ℓ(1 − x_ℓ) / m) x_k.
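The mean-field ODE can be integrated with forward Euler. A Python sketch (the popularities and parameters are the same illustrative choices as before; note that the dynamics preserves Σ_k x_k = m):

```python
import numpy as np

def cache_ode(p, m, x0, t_end, dt=1e-2):
    """Forward-Euler integration of
       dx_k/dt = p_k (1 - x_k) - (sum_l p_l (1 - x_l) / m) * x_k."""
    p = np.asarray(p, float)
    x = np.asarray(x0, float)
    for _ in range(int(t_end / dt)):
        miss_rate = np.dot(p, 1.0 - x) / m
        x = x + dt * (p * (1.0 - x) - miss_rate * x)
    return x

p = np.array([1.0 / (k + 1) for k in range(20)])
p /= p.sum()
x0 = np.full(20, 5 / 20)          # start with sum(x) = m = 5
x = cache_ode(p, m=5, x0=x0, t_end=200.0)
print(round(float(x.sum()), 3))   # 5.0: the ODE preserves sum_k x_k
```

Indeed d/dt Σ_k x_k = (Σ_k p_k(1 − x_k))(1 − Σ_k x_k/m), which vanishes whenever Σ_k x_k = m.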

SLIDE 21

A cache-replacement policy: simulation

[Figures: probability in cache vs. number of requests (0–10000) — simulation (1 list (200), 4 lists (50/50/50/50)) and the mean-field approximation ẋ = xQ(x).]

Figure: Popularities of objects change every 2000 steps.

SLIDE 22

A cache-replacement policy: simulation

[Figures: probability in cache vs. number of requests (0–10000) — simulation and mean-field approximation ẋ = xQ(x) overlaid, for 1 list (200) and 4 lists (50/50/50/50).]

Figure: Popularities of objects change every 2000 steps.

SLIDE 23

Stationary distribution

Fixed point equation

0 = ẋ_k = p_k (1 − x_k) − (Σ_ℓ p_ℓ(1 − x_ℓ) / m) x_k,   with Σ_k x_k = m.

(ref: Dan and Towsley; Gast and Van Houdt; ...)

SLIDE 24

Stationary distribution

Fixed point equation

0 = ẋ_k = p_k (1 − x_k) − (Σ_ℓ p_ℓ(1 − x_ℓ) / m) x_k,   with Σ_k x_k = m.

(ref: Dan and Towsley; Gast and Van Houdt; ...)

Algorithm: easy to solve:
1. Define x_k(T) as the solution of p_k(1 − x_k) − T x_k = 0, i.e. x_k(T) = p_k/(p_k + T).
2. Find T such that Σ_k x_k(T) = m.
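A Python sketch of this two-step algorithm, assuming x_k(T) = p_k/(p_k + T) as derived from step 1, with T found by bisection (the popularity vector is an illustrative choice; Σ_k x_k(T) is decreasing in T, so bisection applies):

```python
def fixed_point_cache(p, m, tol=1e-12):
    """Find T such that sum_k x_k(T) = m, where
       x_k(T) = p_k / (p_k + T) solves p_k(1 - x_k) - T x_k = 0."""
    def total(T):
        return sum(pk / (pk + T) for pk in p)
    lo, hi = 1e-12, 1e6          # total() is decreasing in T
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if total(mid) > m:
            lo = mid
        else:
            hi = mid
    T = (lo + hi) / 2
    return T, [pk / (pk + T) for pk in p]

p = [1.0 / (k + 1) for k in range(20)]
s = sum(p)
p = [pk / s for pk in p]
T, x = fixed_point_cache(p, m=5)
print(round(sum(x), 6))  # 5.0
```

At the solution, Σ_ℓ p_ℓ(1 − x_ℓ) = T·Σ_ℓ x_ℓ = T·m, so T is exactly the eviction rate of the mean-field fixed point equation.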

SLIDE 25

Outline

1. The decoupling method: finite and infinite time horizon (illustration of the method; finite time horizon: some theory; steady-state regime)
2. Rate of convergence
3. Optimal control and mean-field games (centralized control; decentralized control and games)
4. Conclusion and recap

SLIDE 26

Decoupling and ẋ = xQ(x)

P(X1(t) = i1, . . . , XN(t) = iN) ≈ P(X1(t) = i1) · · · P(XN(t) = iN), writing x_{n,i}(t) := P(Xn(t) = i).

When we zoom on one object:
P(X1(t + dt) = j | X1(t) = i) ≈ E[ P(X1(t + dt) = j | X1(t) = i, X2 . . . XN) ]
≈ Q^(1)_{i,j}(x) := Σ_{i2...iN} K_{(i,i2...iN)→(j,i2...iN)} x_{2,i2} · · · x_{N,iN}

We then get: d/dt x_{1,j}(t) ≈ Σ_i x_{1,i}(t) Q^(1)_{i,j}(x).

SLIDE 27

Transient regime

Theorem (Sznitman (99), Kurtz (70s), Benaim and Le Boudec (08), ...)

For fixed t, the decoupling assumption is equivalent to mean-field convergence. For example (remember Luca’s talk), if x ↦ xQ(x) is Lipschitz-continuous then, as the number of objects N goes to infinity:
lim_{N→∞} P(Xk(t) = i) = x_{k,i}(t),
where x satisfies ẋ = xQ(x).

SLIDE 28

1. The decoupling method: finite and infinite time horizon (illustration of the method; finite time horizon: some theory; steady-state regime)
2. Rate of convergence
3. Optimal control and mean-field games (centralized control; decentralized control and games)
4. Conclusion and recap

SLIDE 29

The fixed point method

Transient regime → (t → ∞) → stationary:
Markov chain: ṗ = pK → πK = 0

1. Performance analysis of the IEEE 802.11 distributed coordination function.
2. Fixed point analysis of single cell IEEE 802.11e WLANs: Uniqueness, multistability.
3. Performance analysis of exponential backoff.
4. New insights from a fixed-point analysis of single cell IEEE 802.11 WLANs.

SLIDE 30

The fixed point method

Transient regime → (t → ∞) → stationary:
Markov chain: ṗ = pK → πK = 0
Mean-field: ẋ = xQ(x) → (N → ∞?) → fixed points xQ(x) = 0

The method was used in many papers: Bianchi 00 [1], Ramaiyan et al. 08 [2], Kwak et al. 05 [3], Kumar et al. 08 [4].

SLIDE 31

Does it always work? [5, 6]

SIRS model:
A node S becomes I at rate 1 (external infection).
An S is additionally infected by the I's at rate 10·I/(S + a).
An I recovers at rate 5.
A node R becomes S either by meeting a node S (rate 10·S) or alone (at rate 10⁻³).

5. Benaim and Le Boudec 08.
6. Cho, Le Boudec, Jiang, On the Asymptotic Validity of the Decoupling Assumption for Analyzing 802.11 MAC Protocol, 2010.

SLIDE 32

Does it always work? [5, 6]

SIRS model:
A node S becomes I at rate 1 (external infection).
An S is additionally infected by the I's at rate 10·I/(S + a).
An I recovers at rate 5.
A node R becomes S either by meeting a node S (rate 10·S) or alone (at rate 10⁻³).

[Transition diagram: S → I at rate 1 + 10·I/(S + a); I → R at rate 5; R → S at rate 10·S + 10⁻³.]
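A Python sketch of the corresponding mean-field ODE, with the rates read off the transition diagram (this reading of the rates is an assumption; the initial condition and time horizon are illustrative):

```python
def sirs_ode(a, x0, t_end, dt=1e-4):
    """Forward-Euler integration of the SIRS mean-field model,
    with assumed rates: S->I at 1 + 10*I/(S+a), I->R at 5,
    R->S at 10*S + 1e-3, applied to proportions (S, I, R)."""
    S, I, R = x0
    traj = []
    for _ in range(int(t_end / dt)):
        s_to_i = (1 + 10 * I / (S + a)) * S
        i_to_r = 5 * I
        r_to_s = (10 * S + 1e-3) * R
        S += dt * (r_to_s - s_to_i)
        I += dt * (s_to_i - i_to_r)
        R += dt * (i_to_r - r_to_s)
        traj.append((S, I, R))
    return traj

traj = sirs_ode(a=0.1, x0=(0.5, 0.3, 0.2), t_end=30.0)
total = sum(traj[-1])
print(round(total, 6))  # 1.0: the total mass is conserved
```

Plotting the S-component of `traj` over time reproduces the oscillatory behaviour discussed in the next slides for small a.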


SLIDE 33

Does it always work? [5, 6]

[Transition diagram: S → I at rate 1 + 10·I/(S + a); I → R at rate 5; R → S at rate 10·S + 10⁻³.]

The Markov chain is irreducible, and xQ(x) = 0 has a unique fixed point.

SLIDE 34

Does it always work? [5, 6]

[Transition diagram: S → I at rate 1 + 10·I/(S + a); I → R at rate 5; R → S at rate 10·S + 10⁻³.]

The Markov chain is irreducible, and xQ(x) = 0 has a unique fixed point.

Fixed point xQ(x) = 0 vs. stationary measure (N = 1000):
          xS      xI   |   πS      πI
a = .3   0.209  0.234  |  0.209  0.234

SLIDE 35

Does it always work? [5, 6]

[Transition diagram: S → I at rate 1 + 10·I/(S + a); I → R at rate 5; R → S at rate 10·S + 10⁻³.]

The Markov chain is irreducible, and xQ(x) = 0 has a unique fixed point.

Fixed point xQ(x) = 0 vs. stationary measure (N = 1000):
          xS      xI   |   πS      πI
a = .3   0.209  0.234  |  0.209  0.234
a = .1   0.078  0.126  |  0.11   0.13

SLIDE 36

What happened?

[Figure: xS vs. time (0 to 3) — the mean-field trajectory xS.]

SLIDE 37

What happened?

[Figure: xS vs. time (0 to 3) — the mean-field trajectory xS and the stochastic trajectory X^N_S for N = 1000.]

SLIDE 38

What happened?

[Figure: xS vs. time (5 to 30) — the mean-field trajectory xS and the stochastic trajectory X^N_S for N = 1000.]

SLIDE 39

What happened?

(xS = 0.078, xI = 0.126), (πS = 0.11, πI = 0.13)

[Simplex plot over (R, I, S).]

SLIDE 40

What happened?

(xS = 0.078, xI = 0.126), (πS = 0.11, πI = 0.13)

[Simplex plot over (R, I, S): fixed point vs. true stationary distribution.]

SLIDE 41

What happened?

(xS = 0.078, xI = 0.126), (πS = 0.11, πI = 0.13)

[Simplex plot over (R, I, S): fixed point, true stationary distribution, and limit cycle.]

SLIDE 42

Fixed points?

Transient regime → (t → ∞) → stationary:
Markov chain: ṗ = pK → πK = 0

SLIDE 43

Fixed points?

Transient regime → (t → ∞) → stationary:
Markov chain: ṗ = pK → πK = 0
Mean-field: ẋ = xQ(x) → (N → ∞?) → fixed points xQ(x) = 0

SLIDE 44

Fixed points?

Transient regime → (t → ∞) → stationary:
Markov chain: ṗ = pK → πK = 0
Mean-field: ẋ = xQ(x) → fixed points xQ(x) = 0
Taking N → ∞ then t → ∞, or t → ∞ then N → ∞: do both routes lead to xQ(x) = 0?

SLIDE 45

Fixed points?

Transient regime → (t → ∞) → stationary:
Markov chain: ṗ = pK → πK = 0
Mean-field: ẋ = xQ(x) → fixed points xQ(x) = 0
(if the trajectories converge, then so do the stationary distributions)

Theorem ((i) Benaim and Le Boudec 08, (ii) Le Boudec 12)

The stationary distribution π^N concentrates on the fixed points if:
(i) all trajectories of the ODE converge to the fixed points, or
(ii) the Markov chain is reversible.

SLIDE 46

Steady-state: theorem

Theorem

Consider a mean-field model for which x^N converges to the solution of ẋ = f(x). Then: if all trajectories converge to a unique fixed point x∗, then π^N concentrates on x∗.

Note: in that case the decoupling assumption also holds in steady state: lim_{N→∞} P(X1 = i1, X2 = i2) = x∗_{i1} x∗_{i2}.

SLIDE 47

Quiz

Consider the SIRS model: [simplex plot: fixed point, true stationary distribution, limit cycle].

Under the stationary distribution π^N:
(A) As there is no fixed point, there is no such stationary distribution.
(B) P(X1 = S, X2 = S) ≈ P(X1 = S)P(X2 = S)
(C) P(X1 = S, X2 = S) > P(X1 = S)P(X2 = S)
(D) P(X1 = S, X2 = S) < P(X1 = S)P(X2 = S)

SLIDE 48

Quiz

Consider the SIRS model: [simplex plot: fixed point, true stationary distribution, limit cycle — positive correlation].

Under the stationary distribution π^N:
(A) As there is no fixed point, there is no such stationary distribution.
(B) P(X1 = S, X2 = S) ≈ P(X1 = S)P(X2 = S)
(C) P(X1 = S, X2 = S) > P(X1 = S)P(X2 = S)
(D) P(X1 = S, X2 = S) < P(X1 = S)P(X2 = S)

Answer: C

On the limit cycle, P(X1(t) = S, X2(t) = S) ≈ xS(t)²; averaging over the cycle, E[xS(t)²] ≥ (E[xS(t)])². Thus: positively correlated.

SLIDE 49

Lyapunov functions

How to show that trajectories converge to a fixed point?

SLIDE 50

Lyapunov functions

How to show that trajectories converge to a fixed point?

A solution of d/dt x(t) = x(t)Q(x(t)) converges to the fixed points of xQ(x) = 0 if there exists a Lyapunov function f, that is:
Lower bounded: inf_x f(x) > −∞.
Decreasing along trajectories: d/dt f(x(t)) < 0 whenever x(t)Q(x(t)) ≠ 0.

SLIDE 51

Lyapunov functions

How to show that trajectories converge to a fixed point?

A solution of d/dt x(t) = x(t)Q(x(t)) converges to the fixed points of xQ(x) = 0 if there exists a Lyapunov function f, that is:
Lower bounded: inf_x f(x) > −∞.
Decreasing along trajectories: d/dt f(x(t)) < 0 whenever x(t)Q(x(t)) ≠ 0.

How to find a Lyapunov function

Energy? Distance? Entropy? Luck?

SLIDE 52

The relative entropy is a Lyapunov function for Markov chains

Let Q be the generator of an irreducible Markov chain and π be its stationary distribution. Let P(t) be the solution of d/dt P(t) = P(t)Q.

Theorem (e.g. Budhiraja et al. 15, Dupuis and Fischer 11)

The relative entropy R(P ‖ π) = Σ_i P_i log(P_i/π_i) is a Lyapunov function:
d/dt R(P(t) ‖ π) ≤ 0, with equality if and only if P(t) = π.
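This monotonicity is easy to observe numerically. A Python sketch using the reconstructed 3-state generator from the beginning of the tutorial (forward Euler for the master equation):

```python
import numpy as np

Q = np.array([[-1.0, 1.0, 0.0],
              [0.0, -2.0, 2.0],
              [0.1, 0.0, -0.1]])
pi = np.array([2/23, 1/23, 20/23])   # stationary distribution: pi Q = 0

def rel_entropy(P, pi):
    """Relative entropy R(P || pi) = sum_i P_i log(P_i / pi_i)."""
    return float(np.sum(P * np.log(P / pi)))

P = np.array([0.8, 0.15, 0.05])
dt = 1e-3
values = []
for _ in range(5000):
    values.append(rel_entropy(P, pi))
    P = P + dt * (P @ Q)             # one Euler step of dP/dt = P Q

# R(P(t) || pi) decreases along the master equation
print(values[0] > values[-1])  # True
```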

SLIDE 53

Relative entropy for mean-field models

Assume that Q(x) is the generator of an irreducible Markov chain for every x, and let π(x) be its stationary distribution. Let P(t) be the solution of d/dt P(t) = P(t)Q(P(t)). Then

d/dt R(P(t) ‖ π(t)) = (d/dt P(t)) · ∂/∂P R(P(t), π(t))   [≤ 0]
                    + (d/dt π(t)) · ∂/∂π R(P(t), π(t))   [= −Σ_i x_i(t) d/dt log π_i(t)]
                    ≤ −Σ_i x_i(t) d/dt log π_i(t)
SLIDE 54

Relative entropy for mean-field models

Assume that Q(x) is the generator of an irreducible Markov chain for every x, and let π(x) be its stationary distribution. Let P(t) be the solution of d/dt P(t) = P(t)Q(P(t)). Then

d/dt R(P(t) ‖ π(t)) = (d/dt P(t)) · ∂/∂P R(P(t), π(t))   [≤ 0]
                    + (d/dt π(t)) · ∂/∂π R(P(t), π(t))   [= −Σ_i x_i(t) d/dt log π_i(t)]
                    ≤ −Σ_i x_i(t) d/dt log π_i(t)

Theorem

If there exists a lower-bounded integral F(x) of −Σ_i x_i(t) d/dt log π_i(t), then x ↦ R(x ‖ π(x)) + F(x) is a Lyapunov function for the mean-field model.

SLIDE 55

The decoupling assumption: conclusion

Decoupling ≈ mean-field convergence.
If the rates are continuous, convergence holds for the transient regime.
The stationary regime should be handled with care: the uniqueness of the fixed point is not enough, and Lyapunov functions can help but are not easy to find.

SLIDE 56

Outline

1. The decoupling method: finite and infinite time horizon (illustration of the method; finite time horizon: some theory; steady-state regime)
2. Rate of convergence
3. Optimal control and mean-field games (centralized control; decentralized control and games)
4. Conclusion and recap

SLIDE 57

A martingale argument

The drift of a mean-field model X(t) satisfies:
lim_{dt→0} (1/dt) E[ X(t + dt) − X(t) | X(t) = x ] = f(x)
lim_{dt→0} (1/dt) var[ X(t + dt) − X(t) | X(t) = x ] ≤ C/N

This means that M(t) = X(t) − x0 − ∫₀ᵗ f(X(s)) ds is such that:
E[M(t) | F_s] = M(s)   (M(t) is a martingale)
var[M(t)] ≤ Ct/N   (small variance).

SLIDE 58

Martingale concentration results

Let M(t) be such that E[M(t) | F_s] = M(s) (M(t) is a martingale) and var[M(t)] ≤ C/N (small variance). Then, by Doob's inequality:
P( sup_{t≤T} ‖M(t)‖ ≥ ε ) ≤ C/(N ε²).

SLIDE 59

Mean-field convergence

Going back to the martingale decomposition of the previous slides, we have:
X(t) = x0 + ∫₀ᵗ f(X(s)) ds + M(t), with M(t) small by the previous slide.

SLIDE 60

Mean-field convergence

Going back to the martingale decomposition of the previous slides, we have:
X(t) = x0 + ∫₀ᵗ f(X(s)) ds + M(t), with M(t) small by the previous slide.

Is X(t) close to the solution of ẋ = f(x)?

SLIDE 61

The initial value problem

“Dynamical systems 101”

The initial value problem: ẋ = f(x), x(0) = x0 ∈ R^d. Existence and uniqueness of the solution are guaranteed by the Cauchy–Lipschitz (Picard–Lindelöf) theorem:
If f is Lipschitz-continuous on R^d, then there exists a unique solution on [0, T].

SLIDE 62

Uniqueness of solution

“Dynamical systems 101 (ctd)”

Reminder: f is Lipschitz-continuous if there exists L such that for all x, y ∈ R^d: ‖f(x) − f(y)‖ ≤ L ‖x − y‖.

SLIDE 63

Uniqueness of solution

“Dynamical systems 101 (ctd)”

Reminder: f is Lipschitz-continuous if there exists L such that for all x, y ∈ R^d: ‖f(x) − f(y)‖ ≤ L ‖x − y‖.

If x(t) = x0 + ∫₀ᵗ f(x(s)) ds and y(t) = y0 + ∫₀ᵗ f(y(s)) ds + ε(t) with ‖ε(t)‖ ≤ ε, then
‖x(t) − y(t)‖ ≤ L ∫₀ᵗ ‖x(s) − y(s)‖ ds + ‖x0 − y0‖ + ε.

SLIDE 64

Uniqueness of solution

“Dynamical systems 101 (ctd)”

Reminder: f is Lipschitz-continuous if there exists L such that for all x, y ∈ R^d: ‖f(x) − f(y)‖ ≤ L ‖x − y‖.

If x(t) = x0 + ∫₀ᵗ f(x(s)) ds and y(t) = y0 + ∫₀ᵗ f(y(s)) ds + ε(t) with ‖ε(t)‖ ≤ ε, then
‖x(t) − y(t)‖ ≤ L ∫₀ᵗ ‖x(s) − y(s)‖ ds + ‖x0 − y0‖ + ε.

Gronwall's Lemma: this implies that ‖x(t) − y(t)‖ ≤ (‖x0 − y0‖ + ε) e^{Lt}.

SLIDE 65

Consequence

Theorem

If X^N(0) = x0, then:
E[ sup_{t≤T} ‖X^N(t) − x(t)‖ ] ≤ O(1/√N) · e^{LT}.
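A Monte Carlo illustration of this O(1/√N) behaviour, using N independent copies of the reconstructed 3-state chain from the beginning of the tutorial (a discrete-time scheme whose one-step matrix I + dt·Q makes the ODE the exact mean of the empirical measure):

```python
import random

def empirical_error(N, steps=200, dt=0.01, seed=1):
    """Simulate N independent copies of the 3-state chain
    (S -> I rate 1, I -> R rate 2, R -> S rate 0.1) and record the
    sup-deviation between the empirical fraction of S-nodes and
    the matching Euler solution of the ODE."""
    rng = random.Random(seed)
    jump = {0: (1.0, 1), 1: (2.0, 2), 2: (0.1, 0)}  # state: (rate, next)
    Q = [[-1.0, 1.0, 0.0], [0.0, -2.0, 2.0], [0.1, 0.0, -0.1]]
    states = [0] * N                 # all nodes start in state S
    p = [1.0, 0.0, 0.0]
    worst = 0.0
    for _ in range(steps):
        for n in range(N):
            rate, nxt = jump[states[n]]
            if rng.random() < rate * dt:
                states[n] = nxt
        p = [p[j] + dt * sum(p[i] * Q[i][j] for i in range(3))
             for j in range(3)]
        worst = max(worst, abs(states.count(0) / N - p[0]))
    return worst

errs = [empirical_error(N) for N in (10, 1000)]
print(errs)  # the sup-deviation shrinks as N grows
```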

SLIDE 66

Rate of convergence: recap and some extensions

The speed of convergence can be extended to:
non-smooth dynamics (one-sided Lipschitz functions);
steady-state (if f is C² with a unique attractor);
E[X(t)].

It cannot be extended to:
general non-Lipschitz dynamics;
steady-state with no attractor.

SLIDE 67

Outline

1. The decoupling method: finite and infinite time horizon (illustration of the method; finite time horizon: some theory; steady-state regime)
2. Rate of convergence
3. Optimal control and mean-field games (centralized control; decentralized control and games)
4. Conclusion and recap

SLIDE 68

Optimal control

Stochastic optimal control: closed-loop policies, action(t + 1) = function(state(t)).
Deterministic optimal control: open-loop policies are optimal.

SLIDE 69

Markov decision processes

Reference: Puterman (2014)

Definition: a Markov decision process (MDP)

State space

/ action space

Transition probabilities : p(X(t + 1) = j|X(t) = i, action) Instantaneous cost: cost(t, state, action). Objective:

min E [cost(t, Xt, action)]

SLIDE 70

Markov decision processes

Reference: Puterman (2014)

Example: you can throw a six-sided die up to 5 times; you win the number shown on the last throw. When should you stop?

Definition: a Markov decision process (MDP)

State space {1, . . . , 6} / action space {stop, continue}.
Transition probabilities: p(X(t + 1) = j | X(t) = i, action):
p(X(t + 1) = i) = 1/6 if continue; p(X(t + 1) = X(t)) = 1 if stop.
Instantaneous cost: cost(t, state, action).
Objective: min E[ Σ_t cost(t, X_t, action_t) ].

SLIDE 71

Example of Markov decision process

You can throw a six-sided die up to 5 times; you win the number shown on the last throw. When should you stop?

Value iteration (Bellman’s equation):
V_t(i) = max_action { cost(t, i, action) + E[ V_{t+1}(X(t + 1)) | X(t) = i, action ] }.

Example: fill the table of V_t(i) for t = 1, . . . , 5 and i = 1, . . . , 6.

SLIDE 72

Example of Markov decision process

You can throw a six-sided die up to 5 times; you win the number shown on the last throw. When should you stop?

Value iteration (Bellman’s equation):
V_t(i) = max_action { cost(t, i, action) + E[ V_{t+1}(X(t + 1)) | X(t) = i, action ] }.

Example: V_5(i) = i (at the last throw, stop).

SLIDE 73

Example of Markov decision process

You can throw a six-sided die up to 5 times; you win the number shown on the last throw. When should you stop?

Value iteration (Bellman’s equation):
V_t(i) = max_action { cost(t, i, action) + E[ V_{t+1}(X(t + 1)) | X(t) = i, action ] }.

Example: V_5(i) = i; V_4(i) = max(i, 3.5).

SLIDE 74

Example of Markov decision process

You can throw a six-sided die up to 5 times; you win the number shown on the last throw. When should you stop?

Value iteration (Bellman’s equation):
V_t(i) = max_action { cost(t, i, action) + E[ V_{t+1}(X(t + 1)) | X(t) = i, action ] }.

Example: V_5(i) = i; V_4(i) = max(i, 3.5); V_3(i) = max(i, 4.25).

SLIDE 75

Example of Markov decision process

You can throw a six-sided die up to 5 times; you win the number shown on the last throw. When should you stop?

Value iteration (Bellman’s equation):
V_t(i) = max_action { cost(t, i, action) + E[ V_{t+1}(X(t + 1)) | X(t) = i, action ] }.

Example: V_5(i) = i; V_4(i) = max(i, 3.5); V_3(i) = max(i, 4.25); V_2(i) = max(i, 4.66).

SLIDE 76

Example of Markov decision process

You can throw a six-sided die up to 5 times; you win the number shown on the last throw. When should you stop?

Value iteration (Bellman’s equation):
V_t(i) = max_action { cost(t, i, action) + E[ V_{t+1}(X(t + 1)) | X(t) = i, action ] }.

Example (V_t(i)):
   t:    1     2     3     4    5
 i=1   4.95  4.66  4.25  3.5   1
 i=2   4.95  4.66  4.25  3.5   2
 i=3   4.95  4.66  4.25  3.5   3
 i=4   4.95  4.66  4.25   4    4
 i=5    5     5     5     5    5
 i=6    6     6     6     6    6
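The continuation values in the table can be reproduced by backward induction. A Python sketch using exact arithmetic:

```python
from fractions import Fraction

def dice_values(throws=5):
    """Backward induction for the dice-stopping problem:
    at the last throw V(i) = i; earlier, V_t(i) = max(i, E[V_{t+1}])."""
    cont = []                                 # continuation values
    V = [Fraction(i) for i in range(1, 7)]    # last throw: keep i
    for _ in range(throws - 1):
        ev = sum(V) / 6                       # value of re-throwing
        cont.append(ev)
        V = [max(Fraction(i), ev) for i in range(1, 7)]
    return cont, V

cont, V = dice_values()
print([round(float(c), 3) for c in cont])  # [3.5, 4.25, 4.667, 4.944]
```

These are exactly 7/2, 17/4, 14/3 and 89/18, i.e. the columns t = 4, 3, 2, 1 of the table (4.66 and 4.95 on the slide are rounded).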

SLIDE 77

The curse of dimensionality

To solve Bellman’s equation, we need to iterate over the whole state space:
V_t(i) = min_action { cost(t, i, action) + E[ V_{t+1}(X(t + 1)) | X(t) = i, action ] }.

SLIDE 78

The curse of dimensionality

To solve Bellman’s equation, we need to iterate over the whole state space:
V_t(i) = min_action { cost(t, i, action) + E[ V_{t+1}(X(t + 1)) | X(t) = i, action ] }.

Alternatives:
Approximate dynamic programming (learning).
Mean-field optimal control.

SLIDE 79

Example of mean-field control

MDP: find π(t, X) to minimize V^{π,N} = E[ Σ_t cost(X_t, π(t, X_t)) ], subject to P(X_{t+1} = i | X_t = j, π(·) = a) = P_{i,j,a}.
Mean-field optimization: find a(t) to minimize V^a = ∫₀ᵀ cost(x_t, a_t) dt, subject to ẋ_t = f(x_t, a_t).

SLIDE 80

Example of mean-field control

MDP: find π(t, X) to minimize V^{π,N} = E[ Σ_t cost(X_t, π(t, X_t)) ], subject to P(X_{t+1} = i | X_t = j, π(·) = a) = P_{i,j,a}.
Mean-field optimization: find a(t) to minimize V^a = ∫₀ᵀ cost(x_t, a_t) dt, subject to ẋ_t = f(x_t, a_t).

Theorem (Gast, Gaujal, Le Boudec 2012)

If the drift and costs are Lipschitz, then V^{N,∗} → V^∗, and an open-loop policy a∗ is asymptotically optimal.

SLIDE 81

Mean-field control: example

[Figure: proportions of infected, susceptible, and recovered/vaccinated population over time, comparing OPT and MFE under the settings OPT: MAX / MFE: MAX, OPT: MAX / MFE: NO, OPT: NO / MFE: NO. Model: states S, I, R, V with rates γ m_I(t), π(t), ρ.]

SLIDE 82

Outline

1. The decoupling method: finite and infinite time horizon (illustration of the method; finite time horizon: some theory; steady-state regime)
2. Rate of convergence
3. Optimal control and mean-field games (centralized control; decentralized control and games)
4. Conclusion and recap

SLIDE 83

Motivation

Mean field games (Lasry and Lions, 2007, and Caines et al., 2007) capture the dynamic evolution of a large population of strategic players.

SLIDE 84

Game Taxonomy

Static games: a payoff matrix per player; the strategy of one player is a (randomized) action.
Population games: an infinite number of identical players; player profiles are replaced by action profiles.
Stochastic (repeated) games: the payoff is the (discounted) sum from 0 to T; the strategy of a player is a policy (function).
Mean field games: dynamic games over an infinite number of players.
SLIDE 85

Game Taxonomy

Static games: a payoff matrix per player; the strategy of one player is a (randomized) action. Solution concept: Nash equilibrium.
Population games: an infinite number of identical players; player profiles are replaced by action profiles. Solution concept: Wardrop equilibrium.
Stochastic (repeated) games: the payoff is the (discounted) sum from 0 to T; the strategy of a player is a policy (function). Solution concept: subgame-perfect equilibrium + folk theorem.
Mean field games: dynamic games over an infinite number of players. Solution concept: mean field equilibrium.
SLIDE 86

Static game example

The prisoner’s dilemma

Two possible actions: {C, D}. The cost matrix is:
        C     D
  C   1, 1  3, 0
  D   0, 3  2, 2   (1)

SLIDE 87

Static game example

The prisoner’s dilemma

Two possible actions: {C, D}. The cost matrix is:
        C     D
  C   1, 1  3, 0
  D   0, 3  2, 2   (1)

Lemma

There exists a unique Nash equilibrium, which consists in both players playing D.

SLIDE 88

Do the equilibria converge?

Static game (N players) → repeat → stochastic (repeated) games; static game → (N → ∞) → population games → repeat → mean-field games.

SLIDE 89

Do the equilibria converge?

Static game (N players) → repeat → stochastic (repeated) games; static game → (N → ∞) → population games → repeat → mean-field games. Do the equilibria of the N-player stochastic games converge to the mean-field equilibrium as N → +∞?

SLIDE 90

Stochastic Games with Identical Players

Introduced by Shapley, 1953. Here, players are interchangeable: the dynamics, the costs and the strategies depend only on the population distribution. State at time t: X(t) = (X1(t), . . . , Xn(t), . . . , XN(t)), with Xn(t) ∈ S (a finite set).

SLIDE 91

Stochastic Games with Identical Players

Introduced by Shapley, 1953. Here, players are interchangeable: the dynamics, the costs and the strategies depend only on the population distribution. State at time t: X(t) = (X1(t), . . . , Xn(t), . . . , XN(t)), with Xn(t) ∈ S (a finite set). The state evolves in continuous time: player n takes actions An(t) ∈ A at instants distributed according to a Poisson process, independently of the others.

SFM, Bertinoro, June 21, 2016 46 / 59


SLIDE 93

Stochastic Games

Dynamics and costs

Players interact according to a mean-field model:

P( Xn(t + dt) = j | Xn(t) = i, An(t) = a, M(t) = m ) = P_{ij}(a, m) dt.

Strategy of a player: π : (Xn(t), m) ↦ An(t). Instantaneous cost: C(Xn(t), An(t), M(t)). Player n chooses a strategy πn to minimize her expected β-discounted payoff, knowing the strategies of the others:

V^N(πn, π) = E[ ∫₀^∞ e^{−βt} C(Xn(t), An(t), M(t)) dt ],

where An is distributed according to πn and An′ according to π for n′ ≠ n.
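A minimal simulation sketch of this player model. The two-state space, the jump-rate function and the absence of explicit actions below are illustrative assumptions, not the talk's model; M(t) is the empirical distribution of the N players:

```python
# Simulate N interchangeable players whose jump rates depend on the
# empirical distribution M(t), with superposed Poisson decision clocks.
import random

random.seed(1)
N = 500
states = [0] * N                 # S = {0, 1}; everyone starts in state 0

def jump_rate(i, m1):
    """Rate at which a player in state i flips, given the fraction m1 in state 1."""
    return 1.0 + m1 if i == 0 else 1.0

t, horizon = 0.0, 5.0
while True:
    m1 = sum(states) / N
    rates = [jump_rate(s, m1) for s in states]
    t += random.expovariate(sum(rates))       # time to the next decision instant
    if t > horizon:
        break
    n = random.choices(range(N), weights=rates)[0]
    states[n] = 1 - states[n]                 # the selected player jumps
print(sum(states) / N)                        # empirical fraction in state 1
```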

SLIDE 94

Stochastic Games

Nash Equilibria

Definition (Nash Equilibrium)

For a given set of strategies Π, a strategy π ∈ Π is called a symmetric Nash equilibrium in Π for the N-player game if, for any strategy πn ∈ Π, V^N(π, π) ≤ V^N(πn, π).

Existence is guaranteed when the dynamics and the costs are continuous functions of the population distribution (Fink, 1964).


SLIDE 96

Mean-Field Game Model

In the mean-field limit, the population distribution mπ(t) ∈ P(S) satisfies the mean-field equation:

ṁπ_j(t) = Σ_{i∈S} Σ_{a∈A} mπ_i(t) Q_{ij}(a, mπ(t)) π_{i,a}(mπ(t)).      (2)

We focus on a particular player, whom we call Player 0. Thanks to the decoupling assumption, x_j = P(X0 = j) satisfies:

ẋ_j(t) = Σ_{i∈S} Σ_{a∈A} x_i(t) Q_{ij}(a, mπ(t)) π0_{i,a}(t).      (3)
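Equation (2) can be integrated numerically once Q and the policy are fixed. A sketch with a forward Euler scheme; the two-state rate matrix, the action names and the deterministic policy are invented for illustration:

```python
# Euler integration of the mean-field equation (2) for a hypothetical
# two-state model with a fixed deterministic policy.
def Q(a, m):
    """Rate matrix Q_ij(a, m); the rate out of state 0 depends on m and a."""
    r01 = 1.0 + m[1] if a == "fast" else 0.5
    return [[-r01, r01], [1.0, -1.0]]

def policy(i, m):
    """Deterministic policy: pi_{i,a}(m) puts mass 1 on the chosen action."""
    return "fast" if i == 0 else "slow"

m = [1.0, 0.0]                       # initial population distribution
dt, T = 0.001, 10.0
for _ in range(int(T / dt)):
    a0, a1 = policy(0, m), policy(1, m)
    dm0 = m[0] * Q(a0, m)[0][0] + m[1] * Q(a1, m)[1][0]
    dm1 = m[0] * Q(a0, m)[0][1] + m[1] * Q(a1, m)[1][1]
    m = [m[0] + dm0 * dt, m[1] + dm1 * dt]
print(round(m[0], 3), round(m[1], 3))   # approximate fixed point of (2)
```

For this particular choice the fixed point solves m1² + m1 − 1 = 0, i.e. m1 ≈ 0.618.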

SLIDE 97

Mean-Field Game Model

Instantaneous cost and mean-field equilibria

The discounted cost of Player 0 is:

V(π0, π) = ∫₀^∞ Σ_{i∈S} Σ_{a∈A} x_i(t) C_{i,a}(mπ(t)) π0_{i,a}(mπ(t)) e^{−βt} dt.

Definition (Mean-Field Equilibrium)

A strategy πMFE is a (symmetric) mean-field equilibrium if, for all strategies π, V(πMFE, πMFE) ≤ V(π, πMFE).

SLIDE 98

Convergence of continuous policies

Theorem (Existence of equilibrium, Doncel, G., Gaujal 2016)

Assume that Q_{ij}(a, m) and C_{i,a}(m) are continuous in m. Then there always exists a mean-field equilibrium.

The proof applies the Kakutani fixed-point theorem for infinite-dimensional spaces to the population distribution (instead of directly to the strategies). It does not require convexity assumptions, as in Gomes, Mohr, Souza, 2013.

Theorem (Convergence, Tembine et al., 2009)

If C_{i,a}(m), Q_{ij}(a, m) and the policy π_i(m) are continuous in m, then the population of the finite game converges to the solution of the differential equation (2), and the evolution of one player converges to the solution of (3).

Question: where is the catch?

SLIDE 99

Non-convergence in General

We consider a matching-game version of the prisoner's dilemma. The state space is S = {C, D} and A = S. The population distribution is m = (mC, mD). Cost of a player:

C_{i,i}(m) = mC + 3 mD   if i = C
             2 mD        if i = D

This is the expected cost of a player matched with another player at random, using the cost matrix:

        C      D
  C   1, 1   3, 0
  D   0, 3   2, 2      (4)

Lemma

There exists a unique mean-field equilibrium π∞, which consists in always playing D.
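The lemma follows from a direct computation: against any population distribution m, the cost gap between C and D is (mC + 3mD) − 2mD = mC + mD = 1 > 0, so D is strictly dominant. A quick numerical sketch of this check (the grid over m is arbitrary):

```python
# For every population distribution m = (mC, mD), playing C costs exactly
# 1 more than playing D, so always playing D is the unique mean-field
# equilibrium. Illustrative check over a grid of distributions.
for k in range(101):
    mC = k / 100.0
    mD = 1.0 - mC
    cost_C = mC + 3.0 * mD     # expected cost of a cooperator
    cost_D = 2.0 * mD          # expected cost of a defector
    assert abs((cost_C - cost_D) - 1.0) < 1e-9
print("D is strictly dominant against every m")
```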


SLIDE 101

Non-convergence in General (II)

Let us define the following stationary strategy for the N-player game:

πN(M) = C   if MC = 1
        D   if MC < 1

"Play C as long as everyone else is playing C; play D as soon as another player deviates to D."

Lemma

For β < 1 and N large, πN is a sub-game perfect equilibrium of the N-player stochastic game.



SLIDE 104

Non-convergence in General (proofs)

Assume that all players except Player 0 play the strategy πN, and let us compute the best response of Player 0.

If at time t0, MC < 1, then the best response of Player 0 is to play D.

If MC = 1, then following π costs

(1/N) Σ_{i≥0} e^{−β i/N} = ∫₀^∞ e^{−βt} dt + O(1/N) = 1/β + O(1/N).

If Player 0 chooses action D instead, all the other players will also play D after their next decision instant. This implies that MD(t) ≈ 1 − e^{−t}, and Player 0 will suffer a cost equal to

∫₀^∞ (xC(t) + 2 − 2 e^{−t}) e^{−βt} dt + O(1/N) ≥ 2/(β(β + 1)) + O(1/N).

Since 1/β ≤ 2/(β(β + 1)) exactly when β ≤ 1, this shows that when β < 1, Player 0 has no incentive to deviate from the strategy πN, so πN is a sub-game perfect equilibrium.
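The comparison at the heart of the proof is easy to check numerically: cooperating forever costs 1/β, while triggering defection costs at least 2/(β(β+1)), which is larger exactly when β < 1. A sketch (the Riemann step size and truncation horizon are arbitrary choices of ours):

```python
# Numeric check of the two discounted costs in the proof above.
import math

def defect_cost_lower_bound(beta, dt=1e-3, T=50.0):
    """Riemann-sum approximation of the integral of (2 - 2 e^{-t}) e^{-beta t}."""
    total, t = 0.0, 0.0
    while t < T:
        total += (2 - 2 * math.exp(-t)) * math.exp(-beta * t) * dt
        t += dt
    return total

for beta in (0.2, 0.5, 0.9):
    coop = 1 / beta                          # cost of following pi^N
    closed_form = 2 / (beta * (beta + 1))    # lower bound after defecting
    numeric = defect_cost_lower_bound(beta)
    assert abs(numeric - closed_form) < 1e-2   # integral matches 2/(beta(beta+1))
    assert coop < closed_form                  # cooperating is cheaper when beta < 1
print("pi^N is an equilibrium for beta < 1")
```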


SLIDE 106

Mean-field Games: Conclusion

In a repeated game with a finite number of players, it is possible to define many equilibria by using the "tit for tat" principle (folk theorem).

When the number of players is infinite, the deviation of a single player is not visible to the population, so the equilibria based on the "tit for tat" principle do not survive in the mean-field limit.

This is all the more damaging because these equilibria have very good social costs: mean-field games fail to describe the best equilibria. Are mean-field games good models?

SLIDE 107

Outline

1. The decoupling method: finite and infinite time horizon
   Illustration of the method
   Finite time horizon: some theory
   Steady-state regime
2. Rate of convergence
3. Optimal control and mean-field games
   Centralized control
   Decentralized control and games
4. Conclusion and recap

SLIDE 108

Recap

1. Mean-field ≈ decoupling assumption
   Valid for finite time. The infinite horizon should be handled with care.
2. Rate of convergence
   O(1/√N) under a Lipschitz condition.
3. Controlled problems
   OK for centralized control. Not so OK for games.

SLIDE 109

Thank you!

http://mescal.imag.fr/membres/nicolas.gast
nicolas.gast@inria.fr

Mean-field and decoupling

Benaïm, Le Boudec 08: A class of mean field interaction models for computer and communication systems, M. Benaïm and J.-Y. Le Boudec, Performance Evaluation, 2008.

Le Boudec 10: The stationary behaviour of fluid limits of reversible processes is concentrated on stationary points, J.-Y. Le Boudec, arXiv:1009.5021, 2010.

Darling, Norris 08: Differential equation approximations for Markov chains, R. W. R. Darling and J. R. Norris, Probability Surveys, 2008.

G. 16: Construction of Lyapunov functions via relative entropy with application to caching, N. Gast, ACM MAMA, 2016.

Budhiraja et al. 15: Limits of relative entropies associated with weakly interacting particle systems, A. Budhiraja, P. Dupuis, M. Fischer, and K. Ramanan, Electronic Journal of Probability, 20, 2015.

SLIDE 110

References (continued)

Optimal control and mean-field games:

G., Gaujal, Le Boudec 12: Mean field for Markov decision processes: from discrete to continuous optimization, N. Gast, B. Gaujal, J.-Y. Le Boudec, IEEE TAC, 2012.

G., Gaujal 12: Markov chains with discontinuous drifts have differential inclusion limits, N. Gast and B. Gaujal, Performance Evaluation, 2012.

Puterman: Markov decision processes: discrete stochastic dynamic programming, M. L. Puterman, John Wiley & Sons, 2014.

Lasry, Lions: Mean field games, J.-M. Lasry and P.-L. Lions, Japanese Journal of Mathematics, 2007.

Tembine et al. 09: Mean field asymptotics of Markov decision evolutionary games and teams, H. Tembine, J.-Y. Le Boudec, R. El-Azouzi, and E. Altman, GameNets, 2009.

Applications: caches, bikes

Dan, Towsley 90: An approximate analysis of the LRU and FIFO buffer replacement schemes, A. Dan and D. Towsley, SIGMETRICS, 1990.

G., Van Houdt 15: Transient and steady-state regime of a family of list-based cache replacement algorithms, N. Gast and B. Van Houdt, ACM SIGMETRICS, 2015.

Fricker, Gast 14: Incentives and redistribution in homogeneous bike-sharing systems with stations of finite capacity, C. Fricker and N. Gast, EJTL, 2014.

Fricker et al. 13: Mean field analysis for inhomogeneous bike sharing systems, C. Fricker, N. Gast, H. Mohamed, Discrete Mathematics and Theoretical Computer Science (DMTCS).

G. et al. 15: Probabilistic forecasts of bike-sharing systems for journey planning, N. Gast, G. Massonnet, D. Reijsbergen, and M. Tribastone, CIKM, 2015.