MEAN FIELD FOR MARKOV DECISION PROCESSES: FROM DISCRETE TO CONTINUOUS OPTIMIZATION
Nicolas Gast, Bruno Gaujal, Jean-Yves Le Boudec, Jan 24, 2012
Contents
1. Mean Field Interaction Model
2. Mean Field Interaction Model with Central
$(X^N_1(t), \dots, X^N_N(t))$ is Markov
[Figure: transition diagram over the states S, I, R, D, with transition labels βI, α, b, q]
N nodes, homogeneous, pairwise meetings
One interaction per time slot, I(N) = 1/N; mean field limit is an ODE
Occupancy measure is M(t) = (S(t), I(t), R(t), D(t)) with S(t) + I(t) + R(t) + D(t) = 1
S(t) = proportion of nodes in state 'S'
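The occupancy measure defined above can be illustrated with a minimal Python sketch. It only shows how M(t) is obtained from a snapshot of per-node states. The function name `occupancy_measure` and the 10-node snapshot are hypothetical and not taken from the slides, and no transition dynamics are modelled here.

```python
from collections import Counter

# State labels as used on the slide: S, I, R, D.
STATES = ("S", "I", "R", "D")

def occupancy_measure(node_states):
    """Return (S, I, R, D): the fraction of nodes in each state.

    Hypothetical helper for illustration; `node_states` is a sequence
    holding one state label per node.
    """
    n = len(node_states)
    if n == 0:
        raise ValueError("need at least one node")
    unknown = set(node_states) - set(STATES)
    if unknown:
        raise ValueError(f"unknown state labels: {sorted(unknown)}")
    counts = Counter(node_states)
    return tuple(counts[s] / n for s in STATES)

# Hypothetical snapshot of N = 10 nodes (made-up data, not from the slides).
snapshot = ["S"] * 5 + ["I"] * 3 + ["R"] * 1 + ["D"] * 1
m = occupancy_measure(snapshot)
```

By construction the four components are non-negative and sum to 1, which matches the constraint S(t) + I(t) + R(t) + D(t) = 1.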
[Figure: trajectories in the (S+R, I) and (S, I) planes compared with the mean field limit, for α = 0.1 and α = 0.7; N = 100, q = b = 0.1, β = 0.6; dead nodes indicated]
$(X^N_1(t), \dots, X^N_N(t))$ -> action
[Figure: three panels, for θ = 0.68, θ = 0.8, and θ = 0.65]
[Equation labels: optimal value for system with N objects (MDP); optimal value for fluid limit]
[Equation labels: value of this policy; optimal value for system with N objects (MDP)]
(taken from Tsitsiklis, Xu 11)
The drift is:
[Gast 2011] N. Gast, B. Gaujal, and J.Y. Le Boudec. Mean field for Markov Decision Processes: from Discrete to Continuous Optimization. To appear in IEEE Transactions on Automatic Control, 2012.
[Gast 2011b] N. Gast and B. Gaujal. Markov chains with discontinuous drifts have differential inclusions limits. application to stochastic stability and mean field
Short version: N. Gast and B. Gaujal. Mean field limit of non-smooth systems and differential
[Ethier and Kurtz (2005)] Stewart Ethier and Thomas Kurtz. Markov Processes: Characterization and Convergence. Wiley, 2005.
[Benaim and Le Boudec (2008)] M. Benaim and J.Y. Le Boudec. A class of mean field interaction models for computer and communication systems. Performance Evaluation, 65(11-12): 823-838, 2008.
[Khouzani 2010] M.H.R. Khouzani, S. Sarkar, and E. Altman. Maximum damage malware attack in mobile wireless networks. In IEEE Infocom, San Diego, 2010.